
Policy Optimization Study Based on Evolutionary Learning

By Samuel Walker,2014-02-18 11:33

    Journal of Donghua University (Eng. Ed.) Vol. 26, No. 6 (2009) 621

    LIU Su-ping (刘素平), DING Yong-sheng (丁永生)

    1 College of Information Science and Technology, Donghua University, Shanghai 201620, China

    2 Informatization Office, Donghua University, Shanghai 201620, China

    3 Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, Shanghai 201620, China

    Abstract: In order to achieve an intelligent and automated self-management network, dynamic policy configuration and selection are needed. A certain policy only suits a certain network environment. If the network environment changes, that policy does not suit it any more. Thereby, policy-based management should also have a similar "natural selection" process. Useful policies will be retained, and policies which have lost their effectiveness are eliminated. A policy optimization method based on evolutionary learning is proposed. For different shooting times, the priority of a policy with high shooting times is improved, while a policy with a low rate has lower priority, and a policy with no shooting for a long term will be dormant. Thus the strategy of survival of the fittest is realized, and the degree of self-learning in policy management is improved.

    Key words: policy-based management; evolution learning; policy optimization

    CLC number: TP393.07    Document code: A

    Article ID: 1672-5220(2009)06-0621-04

    Introduction

    Policy-based management has become a promising solution in large-scale distributed network management. It is developed to deliver simplification and automation of the network management process. By providing uniform cross-product policy definition and management infrastructure, a manager may simply control the business objectives and the system will make it on demand. Network policy is an aggregation of policy rules. Each policy is made up of a series of conditions and a corresponding set of formally defined actions. When a set of associated conditions is met, the specified action must be taken.

    The advantage of using policy is that the expansibility and agility of the management system are improved. Although policy-based management has been the subject of autonomic computing (AC) research, its static policy configurations cannot accord with the aim of self-management. Proposed solutions are often restricted to condition-action rules, so manual intervention is required to cater for configuration changes and enable policy deployment.

    Policy-based network management achieves distributed and dynamic adaptive management by policy. In the process of policy-based management, in order to achieve a more intelligent and automated self-management network, dynamic policy configuration and selection are needed. Dynamic choice of the policy can also be seen as a process similar to the biological evolution process of "natural selection, survival of the fittest".

    A framework to realize policy adaptation dynamically is presented, which supports automated policy deployment and flexible event triggers to permit dynamic policy configuration. But their work focuses only on two categories of policy adaptation, and the third category of policy adaptation, policy adaptation by learning which are the most suitable policy configuration strategies from the system behavior, has not been realized in their work. Inspired by classical conditioning, which is the basic learning mode of biological systems, we present a dynamic policy adaptation framework, which is founded with several simple building blocks composing a complete reflex arc. It is an extension of the Internet Engineering Task Force (IETF) framework for policy-based management. In order to learn the most suitable configuration policy from system behavior, network rules specified within this framework are dynamically triggered by comparing training stimulus of the experiments. Some typical examples are introduced to testify how network management policies cater for the dynamic management of network security, and the selected network security policy is varied with changing environment at run-time.

    Received date: 2008-09-27

    Foundation items: National Natural Science Foundation of China (No. 60534020); Cultivation Fund of the Key Scientific and Technical Innovation Project from Ministry of Education of China (No. 706024); International Science Cooperation Foundation of Shanghai, China (No. 061307041)

    * Correspondence should be addressed to LIU Su-ping, E-mail: lsp@dhu.edu.cn

    In this paper, a policy optimization based on evolution learning is proposed, so as to achieve a more intelligent and automated self-management network.

    1 Temporal Effectiveness of Policy

    Policy is a group of rules which guide how to control and distribute network resources. To dynamically adjust system behavior, we do not need to change anything within the system management components; all we need to do is establish a new policy. But a certain policy only suits a certain network environment. This is the temporal effectiveness of policy.

    We take the spread of network worms as an example. Rapid spread is a major feature of Internet worms. The spread uses up a lot of network resources, or even leads to network breakdown. In 2001, the Code Red virus broke out, and it attacked 250000 computers in 9 h. Most Internet worms use loopholes in Microsoft systems to transmit, and the worm itself is time sensitive. Security policy can be applied to prevent the worm from transmitting. After system administrators have installed the corresponding patch, the network worm loses its transmission path. Therefore, after a period of time the virus attack becomes weaker, and the existence value of the corresponding security policies becomes lower too.

    Thereby, policy-based management should also have a similar "natural selection" process. Useful policies will be retained, and policies which have lost their effectiveness are eliminated.

    2 Evolutionary Policy Optimization

    We regard the policy set implemented in certain equipment as a population, namely, $P(t)$. Here $t$ is the evolutionary step. An element of $P(t)$ represents one policy or a set of policies, namely, $x_1(t)$, $x_2(t)$, and so on. The element $x_1(t)$ represents the first policy, and $x_2(t)$ expresses the second policy. The number of elements in $P(t)$ is counted by $N$.

    We assume that there are no pros and cons in the population at the beginning, and each individual enjoys the same level of treatment. The implemented order of the policies does not matter, and every policy is randomly given a priority.

    To express the degree to which a policy matches the environment, we take the implementation number of the policy as a parameter to assess the useful level of the policy in the population. The implementation number expresses the fitness of individuals, namely, $c_i(t)$. For example, $c_1(t)$ represents the fitness of the first policy. A bigger value of $c_i(t)$ shows a larger implementation number of the policy, and a more useful policy. A smaller value of $c_i(t)$ shows that fewer events match the policy.

    For a policy with a big $c_i(t)$ value, higher priority should be allocated, so it can be executed before a policy with a small $c_i(t)$ value. For a policy with a small $c_i(t)$ value, lower priority should be distributed, and this lets the policy with big $c_i(t)$ be implemented first. This is similar to biological groups, where the individuals adapted to the environment are retained but those which are not suited are eliminated. The policy optimization algorithm is shown in Fig. 1.

    Fig. 1 Policy optimization algorithm

    One life cycle in the algorithm is defined as the time counter $c(t)$ counting from 0 to the maximal count, or several times of that time period.

    Sometimes some policies are associated with each other, and they must be performed in a fixed order. They can be classified as a group of policies handled as a whole in the optimization process. So the implementation order will not change within the group, and all policies can be applied to this method and optimized.

    When a group of related policies is compared with another policy, the largest value of $c_i(t)$ is taken from the group, and this ensures that as long as one policy of the group is surviving, the other policies of the group also have the necessity to survive.
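    The reordering rule of this section, including the group rule, written out as an added example; it is not the authors' algorithm from Fig. 1. The `Policy` class, the `optimize` function, the policy labels, the group name and the hit counts are all constructed for illustration, and the choice to reuse the existing precedence numbers (rather than allocate new ones) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    name: str                    # illustrative label, not a real ACL name
    precedence: int              # smaller number = evaluated first
    hits: int = 0                # c_i(t): matches counted in the last life cycle
    group: Optional[str] = None  # members of a group keep their relative order


def optimize(policies):
    """Hand the existing precedence slots to policies in descending fitness."""
    slots = sorted(p.precedence for p in policies)
    units = {}                   # one unit per ungrouped policy or per group
    for p in sorted(policies, key=lambda p: p.precedence):
        key = ("group", p.group) if p.group else ("policy", p.name)
        units.setdefault(key, []).append(p)
    # A group competes with its largest c_i(t). The sort is stable, so units
    # with equal fitness keep their current order.
    ranked = sorted(units.values(), key=lambda u: -max(p.hits for p in u))
    ordered = [p for unit in ranked for p in unit]
    for p, slot in zip(ordered, slots):
        p.precedence = slot
        p.hits = 0               # counters restart for the next life cycle
    return ordered


def sample_policies():
    """Constructed hit counts for seven deny rules (assumed data)."""
    return [
        Policy("tcp135", 1012, 0, "grp-a"),
        Policy("tcp139", 1014, 5),
        Policy("tcp445", 1016, 12),
        Policy("tcp593", 1018, 0),
        Policy("tcp4444", 1020, 40, "grp-a"),
        Policy("udp69", 1022, 3, "grp-a"),
        Policy("udp1434", 1024, 930),
    ]


if __name__ == "__main__":
    for p in optimize(sample_policies()):
        print(p.precedence, p.name)
```

    Note that `tcp135` had no hits of its own but moves up with its group, because the group is ranked by its best member.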

    3 Method of Policy Sleep and Wakeup

    Sleep maintains the self-inhibition function of the normal brain, and organisms can get promotion of mental and physical rehabilitation through sleep. Since policy is time-sensitive, we make dormant the individual whose fitness value $c_i(t)$ stays at zero for a long term. This is like a kind of biological group with survival of the fittest, namely, those best adapted to the environment are retained but those which are not suited are temporarily or permanently washed out. The policy sleep algorithm is shown in Fig. 2.

    { s_i(t) = 0; m = t; e = 1;
      While (c_i(t) = 0 and e = 1) do
      { While (t - m) = 1 do
        { s_i(t) = s_i(t) + 1; }  // count for whose c_i(t) keeps long-term zero;
        m = t;
        t = t + 1;
        Wait for a life cycle;
        Get fitness value c_i(t+1) from individuals of P(t+1);
        While s_i(t) = 2047 do  // c_i(t) keeps zero continually for 2048 life cycles;
        { disable x_i(t);  // disable the policy, and make it sleep;
          s_i(t) = 0; e = 0;
          Break; }
      }
    }

    Fig. 2 Policy sleep algorithm

    With the changing environment, the sleeping policy may awake again. We define the period of dormancy, and we can wake the policy up after the dormancy period. If the enabled sleep policy is set dormant again, it shows that the policy has still lost its temporal effectiveness. Then its dormant period is extended, and the policy enters the deep sleep. If the enabled deep sleep policy is set dormant for the second time, its dormant period is further extended, and it collapses into a coma period. If the enabled coma policy is set dormant for the third time, the policy is finally deleted, and it gets the death sentence. The process is described as follows:

    Sleep → deep sleep → coma → death

    In each of the four processes, the judgment life cycle, which determines whether the policy should sleep or not, should be extended along with the extended sleep period accordingly. For example, if $c_i(t)$ stays at zero continually for 2048 life cycles, then the policy goes to sleep. After the sleep period, the policy wakes up. The condition to make this policy enter deep sleep is that $c_i(t)$ stays at zero continually for 4096 life cycles. This treatment is to prevent the misjudgment of deleting a policy.
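    The sleep, deep sleep, coma and death sequence as a small state machine, added here as an example and not taken from the paper. The 2048 and 4096 judgment windows come from the text above; the further doubling to 8192 and 16384, the dormancy lengths, the fact that hits after wake-up do not reset the stage, and all names (`PolicyLifecycle`, `transitions`) are assumptions.

```python
STAGES = ("active", "sleep", "deep sleep", "coma", "dead")


class PolicyLifecycle:
    """Sleep / wake-up bookkeeping for one policy x_i; one tick() per life cycle."""

    def __init__(self, base_idle=2048, base_dormancy=1024):
        self.base_idle = base_idle          # idle cycles before the first sleep (page: 2048)
        self.base_dormancy = base_dormancy  # assumed: the page gives no dormancy length
        self.level = 0                      # how many times the policy was made dormant
        self.idle = 0                       # s_i(t): consecutive cycles with c_i(t) == 0
        self.dormant_left = 0
        self.enabled = True

    @property
    def stage(self):
        return STAGES[self.level]

    def tick(self, hits):
        """Feed c_i(t) for one life cycle; return True if the stage changed."""
        if self.stage == "dead":
            return False
        if not self.enabled:                # dormant: no matching happens, just wait
            self.dormant_left -= 1
            if self.dormant_left == 0:
                self.enabled, self.idle = True, 0   # wake up and judge again
            return False
        if hits > 0:
            self.idle = 0
            return False
        self.idle += 1
        # Judgment window doubles per stage: 2048, 4096 (page), 8192, 16384 (assumed)
        if self.idle < self.base_idle << self.level:
            return False
        self.level += 1
        self.idle = 0
        self.enabled = False                # at "dead" the policy is deleted for good
        self.dormant_left = self.base_dormancy << (self.level - 1)
        return True


def transitions(hit_trace, **kw):
    """Replay per-cycle hit counts; return [(cycle, new_stage), ...]."""
    policy, out = PolicyLifecycle(**kw), []
    for cycle, hits in enumerate(hit_trace, start=1):
        if policy.tick(hits):
            out.append((cycle, policy.stage))
    return out
```

    Because each stage needs a longer run of idle cycles than the previous one, a single quiet period cannot carry a policy from active to deleted, which is the misjudgment the text is guarding against.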

    4 Case Study

    Access Control Lists (ACL) can be used to realize flow control, and they can decide which kind of data can access the network. In this way, the security of the network can be improved. ACL policies can be enabled in most switches.

    In this section, we select some anti-worm ACL policies in a certain switch as an example to study (shown in Table 1).

    Table 1 Anti-worm ACL policies

    No.  Policy
    1    Create access list tcp135_d-de tcp destination any ip-port 135 source any ip-port any deny ports any precedence 1012
    2    Create access list tcp139_d-de tcp destination any ip-port 139 source any ip-port any deny ports any precedence 1014
    3    Create access list tcp445_d-de tcp destination any ip-port 445 source any ip-port any deny ports any precedence 1016
    4    Create access list tcp593_d-de tcp destination any ip-port 593 source any ip-port any deny ports any precedence 1018
    5    Create access list tcp4444_d-de tcp destination any ip-port 4444 source any ip-port any deny ports any precedence 1020
    6    Create access list udp69_d-de udp destination any ip-port 69 source any ip-port any deny ports any precedence 1022
    7    Create access list udp1434_d-de udp destination any ip-port 1434 source any ip-port any deny ports any precedence 1024

    ACL policies are executed in sequence from high precedence to low precedence. In Table 1, the smaller the precedence number is, the higher the policy precedence is. So the executed sequence in Table 1 is from No. 1 policy to No. 7 policy.

    In Table 1, No. 7 policy is a policy to prevent the MS-SQL Server Worm from spreading, and the other policies are used to block the W32/Blaster worm.

    According to the characteristics of these worms, the virus protection policies block the communication ports they need to visit, so they protect networks against attacks from certain worms.

    After policy optimization, a policy with a big fitness value is allocated higher priority, so the policy with higher priority can be executed before a policy with a small fitness value. In this example, if the MS-SQL Server Worm breaks out, the precedence of No. 7 policy is set from 1024 to 1011, so this policy is executed before the other policies. It is shown in Table 2.

    Table 2 Optimized anti-worm ACL policies

    No.  Policy
    1    Create access list udp1434_d-de udp destination any ip-port 1434 source any ip-port any deny ports any precedence 1011
    2    Create access list tcp135_d-de tcp destination any ip-port 135 source any ip-port any deny ports any precedence 1012
    3    Create access list tcp139_d-de tcp destination any ip-port 139 source any ip-port any deny ports any precedence 1014


    For policies whose worms have passed their popular period, the policy is automatically disabled, so this method can greatly speed up the system's response time and improve the degree of intelligence. If there are no W32/Blaster worms for a long time, the related policies are disabled, as shown in Table 3.
