TCP-IP

By Kristen Porter,2014-05-21 03:10

    Natural Sciences

    Article ID: 1007-1202(2007)01-0151-04

    DOI: 10.1007/s11859-006-0263-2

    Vol. 12 No. 1 2007 151-154

    TCP/IP Feature Reduction in Intrusion Detection

    LIU Yuling, WANG Huiran, TIAN Junfeng

    College of Mathematics and Computer, Hebei University, Baoding 071002, Hebei, China

    Abstract: Because the amount of data that an IDS needs to examine is very large, it is necessary to reduce the audit features and neglect the redundant features. We therefore investigated the performance of reducing TCP/IP features with the decision tree rule-based statistical method (DTRS). Its main idea is to create n decision trees on n data subsets, extract the rules, work out the relatively important features in accordance with how frequently the different features are used, and demonstrate by experimental results that the reduced features perform better than the primary features.

    Keywords: intrusion detection; feature reduction; decision tree; data mining

    CLC number: TP393.08

    Received date: 2006-05-20

    Foundation item: Supported by the Natural Science Foundation of Hebei Province (F2004000133)

    Biography: LIU Yuling (1964-), female, Associate professor, research direction: information security and algorithm analysis. E-mail: lyl@mail.hbu.edu.cn

    0 Introduction

    A computer system intrusion is seen as any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource. Although preventative techniques such as access control and authentication attempt to keep intruders out, these can fail, and intrusion detection has been introduced as a second line of defense. Intrusion detection refers to the mechanisms that are developed to detect violations of system security policy [5].

    Many intrusion detection systems (IDSs) are far from ideal security solutions. Lack of accuracy, due to a low detection rate and a high false alarm rate, is still a major shortcoming of IDSs. Even the more accurate IDSs often waste time and resources to detect and respond to attacks; they overload security staff and increase the maintenance and expense of IDSs [2]. Therefore, extracting the useful information in the audit data is one of the efficient ways for IDSs to compute efficiently and to detect attacks accurately.

    In Ref. [6], Srinivas Mukkamala and Andrew H. Sung use the performance-based method for ranking importance (PFRM) to identify the important features of the TCP/IP data, and conclude that using only the important features achieves a higher accuracy than using all the features. However, PFRM neglects the relationships among the features. In Ref. [3], Srilatha Chebrolu and Johnson P. Thomas use Bayesian learning and Markov blanket modeling of input features (BLMB) and Classification and Regression Trees (CART) to identify the important features. Their results show that BLMB and CART have different accuracies for the four types of attacks (DoS, Probe, U2r, R2l); in particular, the prediction of U2r is worse than with PFRM.

    This paper introduces another method for feature reduction, named Decision tree rule-based statistic (DTRS). Experiments on the dataset from the MIT Lincoln Laboratory show that the performance of DTRS is comparatively satisfying in terms of both feature reduction and prediction accuracy.

    1 Decision Tree Rule-Based Statistic and Testing Methods

    1.1 The Basic Theory of DTRS

    Decision tree (DT) is one method of data mining (DM), which generally refers to the process of extracting useful models from large stores of data [7]. DT is a method of approximating a discrete-valued target function, in which the function is expressed as a DT, and the DT can also be expressed as a set of if-then rules [8,9]. The rules extracted from a DT use the features with different frequencies for the different classes to be predicted: some rules may use all the features, while other rules use only several of them. We can therefore pick out the important features according to how frequently they appear in the rules. This is the basis of the following method: Decision Tree Rule-based Statistic.
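As a rough illustration of how a decision tree can be rewritten as if-then rules, and how feature usage can then be read off those rules, the following Python sketch walks a small hand-made tree. The tree, its feature labels and thresholds, and the helper names `extract_rules` and `features_used` are hypothetical; they are not taken from the paper or from See5.

```python
from collections import Counter

# Hypothetical toy tree. An internal node is (feature, threshold, left, right):
# `left` handles feature <= threshold, `right` handles feature > threshold.
# A leaf is simply a class label string.
TREE = ("E", 10,
        ("C", 0, "Normal", "Probe"),
        ("J", 5, "DoS", ("E", 50, "DoS", "Normal")))


def extract_rules(node, conditions=()):
    """Return one (conditions, class_label) pair per root-to-leaf path."""
    if isinstance(node, str):  # leaf: emit the accumulated path as a rule
        return [(list(conditions), node)]
    feature, threshold, left, right = node
    return (extract_rules(left, conditions + ((feature, "<=", threshold),))
            + extract_rules(right, conditions + ((feature, ">", threshold),)))


def features_used(rule):
    """Set of distinct features tested by one rule."""
    conditions, _label = rule
    return {feature for feature, _op, _value in conditions}


rules = extract_rules(TREE)
# Count, for each feature, the number of rules that test it.
usage = Counter(f for rule in rules for f in features_used(rule))
print(usage.most_common())  # [('E', 5), ('J', 3), ('C', 2)]
```

In this toy tree, feature E appears in every rule while C appears in only two, which is the kind of difference in frequency that DTRS uses to rank features.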

    1.2 DTRS's Algorithm

    The DTRS method divides the large dataset into many subsets, builds a decision tree on each subset, extracts the rule set from each decision tree, computes the frequency of each feature in each rule set for the different classes, and finally takes the union of each feature's frequency over all the rule sets. The algorithm is shown in Fig. 1.

    Fig. 1 DTRS flowchart

    The steps of the algorithm are as follows:

    Step 1 Divide the large TCP/IP dataset into n subsets, on which n decision trees are built (DT1, DT2, ..., DTn).

    Step 2 Extract each rule set (r1, r2, ..., rn) from each decision tree, compute the number of times each feature is used in ri, and take the union over the rule sets according to the different classes. Given that there are m features (m[1], m[2], ..., m[m]) and k classes (k1, k2, ..., kk), and that the counter ki[j] is set up for class ki and feature m[j], the key algorithm is as follows:

    Input: r1, r2, ..., rn;
    Output: k1[j], k2[j], ..., kk[j];
    Begin
      Label1: While ri not null            // i = 1, 2, ..., n
        For firstRule to lastRule do
        begin
          if class = k1
          begin
            while (m[j] not null) and (j <= m)   // j = 1, 2, ..., m
              k1[j] plus 1;
          end;
          ...
          if class = kk
          begin
            while (m[j] not null) and (j <= m)   // j = 1, 2, ..., m
              kk[j] plus 1;
          end;
        end;
        i plus 1;
        If i <= n
          Jump Label1;
    End;

    Step 3 For each class (k1, k2, ..., kk), according to the above statistical result, order the m features by frequency from high to low. A feature that is used fewer than 2 times can be ignored (an empirical value), keeping only the f features that form the important feature set (SET). We will use experiments to show that the performance with f features is better than with m features.
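Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Delphi implementation: each rule is assumed to be already reduced to the set of features it tests plus its class, a feature is counted once per rule that uses it, the per-class counters are summed to obtain the overall count, and the function names and toy rule sets are hypothetical.

```python
from collections import Counter, defaultdict


def dtrs_counts(rule_sets):
    """Count, per class, how many rules use each feature across all rule sets.

    rule_sets: list of rule sets; each rule is (features, class_label).
    Returns {class_label: Counter({feature: count})}.
    """
    counts = defaultdict(Counter)
    for rule_set in rule_sets:              # r1, r2, ..., rn
        for features, label in rule_set:    # first rule ... last rule
            counts[label].update(set(features))
    return counts


def rank_features(counter, min_count=2):
    """Order features by count, high to low, dropping those below min_count."""
    kept = [(f, c) for f, c in counter.items() if c >= min_count]
    # Sort by descending count, then by name so that ties are deterministic.
    kept.sort(key=lambda fc: (-fc[1], fc[0]))
    return [f for f, _count in kept]


# Toy rule sets (illustrative only, not the paper's data).
rule_sets = [
    [({"E", "C"}, "Probe"), ({"E", "J"}, "DoS"), ({"E"}, "Normal")],
    [({"C", "J"}, "Probe"), ({"E", "AH"}, "DoS"), ({"E", "C"}, "Probe")],
]

per_class = dtrs_counts(rule_sets)
all_types = sum(per_class.values(), Counter())  # sum of the per-class counters
important = rank_features(all_types)            # the retained feature set

print(rank_features(per_class["Probe"]))  # ['C', 'E']
print(important)                          # ['E', 'C', 'J']
```

Here AH is used only once overall, so the threshold of 2 drops it from the retained set.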

    1.3 Testing Method

    The testing method still uses DT to build classifiers, as follows:

    Step 1 Divide the large dataset into a training dataset and a test dataset, and extract n rules from the decision tree built on the training dataset. Given a rule such as (A > 3, B <= 7, C = 0 -> class G [0.833]), when the features of a test record satisfy (A > 3, B <= 7, C = 0), the reliability of class G for this record is 0.833. We test each record of the test dataset against the n rules and assign it the class whose reliability is the maximum among the n rules.

    Step 2 Reduce the number of features of the training dataset from m to i, and extract two different rule sets from the decision trees built on the training datasets, one with m features and the other with i features; then run the experiment with the two rule sets on the test dataset.

    2 Experiment Emulation and Analysis of the Results

    2.1 The Data of the Experiment

    In the 1998 DARPA intrusion detection evaluation program, an environment was set up to acquire raw TCP/IP dump data for a network by simulating a typical U.S. Air Force LAN. The LAN was operated as a real environment, but one that was being blasted with multiple attacks. For each TCP/IP connection, 41 features were extracted. Attack types fall into four main categories: DoS (denial of service), Probing (surveillance and other probing), U2r (unauthorized access to local superuser (root) privileges), and R2l (unauthorized access from a remote machine).

    2.2 Feature Reduction (DTRS)

    We use the dataset from the Tcpdump database mentioned above, dividing it into three subsets and building nine different decision trees on the data subsets by varying the parameter settings when the See5 software creates the trees; from these we extract nine rule sets. The statistical results are summarized in the following tables; they are computed by the DTRS algorithm, implemented in the Delphi editor. Table 1 shows the number of times each feature is used in all of the rules. Table 2 shows the ordering of the features according to the efficiency (frequency of use) of each feature in the rules: a feature whose efficiency is high is comparatively important, while a feature whose efficiency is low is less important. In Table 2, the feature names are replaced by the corresponding serial numbers shown in Table 1. In Table 2, the item "All types" refers to the ordering of the features over all the types, attacks and normal together, and we delete from this item the features whose count is less than 2. According to this ranking method, the 41 features are finally reduced to 24 features (E, C, J, AH, AJ, F, AI, AG, AK, M, A, B, V, AM, AN, AO, W, AA, AC, AF, L, X, K, D), which constitute the optimized feature subset.

    Table 1 The efficiency of features statistics

    Table 2 Ranking of the features

    Types       Ranking of the features
    Normal      AH, E, J, AJ, C, F, A, W, B, M, AC, AK, AF, AG, AI, AN, AA, AM, AO
    DoS         AH, E, J, C, AI, AJ
    Probe       E, C, AJ, J, F, AH, B, AG, AI, AK, V, AM, AO, M, AC, AN, AK, W, AF, AA, G, H, L, P, X, AD, AE
    U2r         C, J, E, AJ, AM, M, F, AH, AG, AF, AC, AN, AA, V, A, W, L, D
    R2l         E, C, J, AH, AJ, AI, AG, AK, F, A, L, M, V, AA, AO, X, AC, AN, W, AM, AF, D, AE
    All types   E, C, J, AH, AJ, F, AI, AG, AK, M, A, B, V, AM, AN, AO, W, AA, AC, AF, L, X, K, D, AE, G, H, P, AD
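To apply the reduced feature set to a record, the serial labels must be mapped back to column positions. The Python sketch below assumes that the labels follow spreadsheet-style lettering over the 41 columns (A to Z, then AA to AO); this mapping is an assumption made for illustration, and the helper names `label_to_index` and `project` are hypothetical.

```python
def label_to_index(label):
    """Convert a spreadsheet-style label ('A', 'Z', 'AA', 'AO') to a 0-based index."""
    index = 0
    for ch in label:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# The 24 features retained by DTRS, in ranked order.
SELECTED = ["E", "C", "J", "AH", "AJ", "F", "AI", "AG", "AK", "M", "A", "B",
            "V", "AM", "AN", "AO", "W", "AA", "AC", "AF", "L", "X", "K", "D"]


def project(record, labels=SELECTED):
    """Keep only the selected feature columns of one 41-feature record."""
    return [record[label_to_index(label)] for label in labels]


record = list(range(41))   # dummy record whose values equal their column index
reduced = project(record)
print(len(reduced))        # 24
print(reduced[:4])         # [4, 2, 9, 33]
```

Under this assumed lettering, the last label AO maps to index 40, which matches a record of exactly 41 features.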

    2.3 The Testing Experiment and Result

    2.3.1 Preparation of data

    The same dataset (Tcpdump dataset) described in Section 2.1 is used, of which 60% is used for training and 40% for testing.
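A 60/40 split of this kind might look as follows in Python. The paper does not say how records were assigned to the two parts, so the seeded random shuffle here is an assumption, and `split_dataset` is a hypothetical helper name.

```python
import random


def split_dataset(records, train_fraction=0.6, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_dataset(range(1000))
print(len(train), len(test))  # 600 400
```

Because the class sizes differ greatly in this dataset, a plain random split can leave very few records of the rare classes in either part; splitting within each class separately is a common alternative.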

    2.3.2 The process of experiment

    Step 1 First, we cut the features of the training data from 41 to 24 (E, C, J, AH, AJ, F, AI, AG, AK, M, A, B, V, AM, AN, AO, W, AA, AC, AF, L, X, K, D). Second, we classify the training data into five categories (Normal, DoS, Probe, U2r, R2l). Third, we extract rules from the decision trees created by the See5 software on the training data; this is noted as rule set 1 and contains 36 rules.

    Step 2 In the same way, we reduce the same training dataset from the 41 features to the 23 features (A, B, C, D, E, F, J, L, Q, W, X, Z, AA, AB, AC, AE, AF, AG, AH, AJ, AL, AM) obtained by PFRM. Then we extract rules from the decision trees created by the See5 software on this training dataset; this is noted as rule set 2 and contains 44 rules.

    Step 3 We extract rules from the decision trees created by the See5 software on the training dataset with all 41 features; this is noted as rule set 3 and contains 47 rules.

    Step 4 Using the Delphi software as the testing tool, we test each TCP connection in the testing data according to the rules from the training data. The results are given in Section 2.3.3.
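The rule-based test of Step 4, which follows the testing method of Section 1.3, might be sketched in Python as below. This is an illustrative sketch rather than the authors' Delphi tool: the rule encoding, the second and third rules, and the names `OPS`, `matches` and `classify` are hypothetical, and the comparison on B in the first rule is taken to be `<=`, which is an assumption.

```python
import operator

# Comparison operators allowed in rule conditions.
OPS = {">": operator.gt, "<=": operator.le, "=": operator.eq}

# Each rule: (list of (feature, op, value) conditions, class label, reliability).
RULES = [
    ([("A", ">", 3), ("B", "<=", 7), ("C", "=", 0)], "G", 0.833),
    ([("A", ">", 3)], "H", 0.600),    # hypothetical extra rule
    ([("A", "<=", 3)], "H", 0.750),   # hypothetical extra rule
]


def matches(conditions, record):
    """True if the record satisfies every condition of a rule."""
    return all(OPS[op](record[feature], value)
               for feature, op, value in conditions)


def classify(record, rules, default=None):
    """Return (class, reliability) of the matching rule with the highest reliability."""
    best = (default, 0.0)
    for conditions, label, reliability in rules:
        if matches(conditions, record) and reliability > best[1]:
            best = (label, reliability)
    return best


print(classify({"A": 5, "B": 6, "C": 0}, RULES))  # ('G', 0.833)
print(classify({"A": 5, "B": 9, "C": 0}, RULES))  # ('H', 0.6)
```

The first record satisfies both the first and the second rule, and class G is chosen because its reliability of 0.833 is the larger of the two.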

    2.3.3 Results of experiment

    Table 3 compares the performance of the 24-feature dataset reduced by DTRS, the 23-feature dataset reduced by PFRM, and the original 41-feature dataset on the test portion of the Tcpdump dataset for the known attacks.

    Table 3 Result of test (%)

    In Table 3, N. accuracy, D. accuracy, P. accuracy, U. accuracy, R. accuracy, and A. accuracy represent Normal accuracy, DoS accuracy, Probe accuracy, U2r accuracy, R2l accuracy, and attack accuracy, respectively. As the table shows, except for the U2r type, the 24-feature dataset performed well for all the other classes, and for the U2r class, the 24-feature dataset performed better than the other datasets. Using the reduced datasets also shortened the testing time to a certain extent.
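For reference, per-class figures of the kind reported in Table 3 could be computed along the following lines. The paper does not define its accuracy measures precisely, so this Python sketch assumes that a class accuracy is the percentage of records of that class assigned the correct class, and that attack accuracy is the same percentage taken over all attack records together. The function names and the toy labels are hypothetical.

```python
from collections import Counter

ATTACKS = {"DoS", "Probe", "U2r", "R2l"}


def per_class_accuracy(true_labels, predicted_labels):
    """Percentage of records of each true class that were predicted correctly."""
    totals, correct = Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: 100.0 * correct[label] / totals[label] for label in totals}


def attack_accuracy(true_labels, predicted_labels):
    """Percentage of attack records assigned their correct attack class."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t in ATTACKS]
    if not pairs:
        return 0.0
    return 100.0 * sum(t == p for t, p in pairs) / len(pairs)


# Toy labels (illustrative only, not the paper's results).
truth = ["Normal", "Normal", "DoS", "DoS", "DoS", "DoS", "Probe", "U2r"]
pred = ["Normal", "DoS", "DoS", "DoS", "DoS", "Normal", "Probe", "R2l"]

print(per_class_accuracy(truth, pred))
# {'Normal': 50.0, 'DoS': 75.0, 'Probe': 100.0, 'U2r': 0.0}
print(round(attack_accuracy(truth, pred), 2))  # 66.67
```

With a class as small as U2r, a single misclassified record moves the class accuracy by a large amount, which is worth keeping in mind when comparing the feature sets.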

    3 Conclusion

    Because data mining can automatically mine the relationships among the data, DTRS extracts the important information from the rules and then constructs the detection model based on the reduced features. We have also demonstrated performance comparisons using different reduced datasets. However, we know that the differences in the accuracy figures tend to be very small and may not be statistically significant, especially in view of the fact that the different classes of patterns differ tremendously in their sizes. Our future research will be directed towards analyzing more comprehensive sets of network traffic data in real environments and developing more accurate classifiers.

    References

    [1] Heady R, Luger G, Maccabe A, et al. The Architecture of a Network Level Intrusion Detection System. (Technical Report CS90-20)[R], University of New Mexico: Department of Computer Science, 1990.

    [2] Bierman E, Cloete E, Venter L M. A Comparison of Intrusion Detection Systems [J]. Computers & Security, 2001, 20(8): 676-683.

    [3] Chebrolu S, Abraham A, Thomas J P. Feature Deduction and Ensemble Design of Intrusion Detection Systems [J]. Computers & Security, 2005, 24(4): 295-307.

    [4] Mukkamala S, Sung A H, Abraham A. Intrusion Detection Using an Ensemble of Intelligent Paradigms [J]. Journal of Network and Computer Applications, 2005, 28(2): 167-182.

    [5] Wenke L, Fan Wei, Miller M, et al. Toward Cost-Sensitive Modeling for Intrusion Detection. (Technical Report CUCS-002-00)[R], Columbia University: Computer Science, 2000.

    [6] Mukkamala S, Sung A H. Identifying Significant Features for Network Forensic Analysis Using Artificial Intelligent Techniques [J]. International Journal of Digital Evidence,
