DOC

Immune algorithm for discretization of decision systems in rough set theory

By Catherine Dunn,2014-06-02 02:20
11 views 0
Immune algorithm for discretization of decision systems in rough set theoryof,in,for,rough,set,Rough,Set,sets

    Immune algorithm for discretization of

    decision systems in rough set theory 602Jiaeta1./JZhejiangUnivSCIENCEA20067l41:602606

    JournalofZhejiangUniversitySCIENCEA

    ISSN10093095(Print);ISSN18621775(Online)

    www.zju.edu.czUS;www.springerlink.tom

    Email:jzus@zju.edu.cn

    Immunealgorithmfordiscretizationofdecisionsystemsinroughsettheory JIAPing,DAIJian.huat

    ,CHENWei.dong,PANYun.he,ZHUMiao.1iang

    (InstituteofArtificiallntelligence,Zh~iangUniversity,Hangzhou310027,China) E-mail:jhdai@zju.edu.cn

    ReceivedJuly8,2005;revisionacceptedNov.20,2005

    Abstract:Roughsettheoryplaysanimportantroleinknowledgediscovery,butcannotdealwithcontinuousattributes.thus

    discretizationisaproblemwhichwecannotneglect.Anddiscretizationofdecisionsystemsinroughsettheoryhassomeparticular

    characteristics.Consistencymustbesatisfiedandcutsfordiscretizationisexpectedtobeassmallaspossible.Consistentand

    minimaldiscretizationproblemisNP

    complete.Inthispaper.animmunealgorithmfortheproblemisproposed.Thecorrectness andeffectivenesswereshowninexperiments.ThediscretizationmethodpresentedinthisPaDercanalsobeusedasadatapre.

    treatingstepforothersymbolicknowledgediscoveryormachinelearningmethodsotherthanroughsettheory.

    Keywords:Roughsets,Discretization,Immunealgorithm,Decisionsystem doi:10.1631/jzus.2006.A0602Documentcode:ACLCnumber:TP18

INTRODUCTION

    Asaneweffectivemathematicaltooltodeal

    withvaguenessanduncertainty,theroughsettheory wasfirstproposedbyPawlakin1982.Theoretical researchonroughsettheoryischaracterizedbymany achievements(Dai,2004b;2005;Daieta1..2004).In recentyearsroughsettheoryhasbeensuccessfully appliedinmanyfieldssuchasmachineIeaming. pattemrecognition,decisionsupportanddatamining. Butroughsettheorycannotdealwithcontinuous attributesalthoughaverylargeproportionofreal datasetsincludecontinuousvariables.Onesolutionto thisproblemistopartitionnumericaIvariablesintoa numberofintervalsandtreateachintervalasacate. gory.Thisprocessofpartitioningcontinuousvari

    ablesintocategoriesisusuallytermeddiscretization. Discretizationinroughsettheoryhasnewcontent whenconsideringindiscemibilityrelation.Generally ProjectsupposedbytheNationalBasicResearchProgram(973) ofChina(No.2002CB312106),ChinaPostdoctoralScienceFounda- tion(No.2004035715),theScience&TechnologyProgramofZhe- jiangProvince(No.2004C31098),andthePostdoctoralFoundationof

    ZhejiangProvince(No.2004?bsh023),China

    speaking,weshouldseekpossiblyminimumnumber ofdiscreteintervals,andatthesametimeshouldnot weakentheindiscemibilityability.Inotherwords.we wanttogetconsistentandminimumdiscretization. Dai(2004a)presentedageneticalgorithmforthe problem

    Immunealgorithm(IA)(Chuneta1.,1997;1998)

    isanewoptimizationalgorithmimitatingtheimmune systemtosolvemanyproblemsintherealworld.In thispaper,weproposeanimmunealgorithmfordis

    cretizationofdecisionsystemsinroughsettheory. DESCRIPTIONOFDESCRETIZAT1ONBASED

    0NROUGHSETTHEORY

    Let5<ubeadecisionsystem,

    while1,2.,x)isobjectsset,A={al,02.,

    }isconditionattributesset,disdecisionattribute. ForanyaeA,thereisinformationmappingU, whereisthevaluedomain.Weassume:[,

    .WealsoassumethatSisaconsistentdecision systeminthispaper.

Jiaeta1.,JzhinngUnivSCIENCEA20067(4):602606

    DefinitionIAnypaira,c),whereaEAandcER, definesapartitionofintoleft-hand-sideand right-hand-sideinterva1.Andthepair(a,c)iscalleda cuton.

    Letusfixanattribute口?.Anysetofcuts

    =

    {,Cl.),(a,c2),.(,ca)),(1)

    wherekaeN,andla=CO.<el.<C2<<c<c+1,

    definesapartitiononintosubintervals,i.e., [Co.,Clu[c1,c2.uu[ca,c+l.(2)

    So,anysetofcutsonconditionattributes lIDtransformstheoriginaldecisionsystemS,_," aeA

    intodiscretedecisionsystem=<Aua,,>,

where)f.)?[ci.,l,xEf?{0,1,,

    ).Afterdiscretization,theoriginaldecisionsystem isreplacedwiththenewone.Anddifferentsetsof cutswillconstructdifferentnewdecisionsystems. Itisobviousthatdiscretizationprocessisasso

    ciatedwithlossofinformation.Usually,thetaskof discretizationistodetermineaminimalsetofcuts fromagivendecisionsystemandkeepingthedis

    cernibilitybetweenobjects.Andtherationalityofthe selectedcutscanbeevaluatedbythefollowingcrite

    ria(Nguyen,1997;1998):

    (a)Consistency.Foranyobjects",vEsatis- fying:?if,varediscernedbyA)then,varedis

    cernedbyD1.

    (b)Minimum.ThereisnoDc,satisfyingthe consistency.

    (c)Optimality.ForanyDsatisfyingconsistency, itfollowsthatcard(D)<card(D),thenDisoptimal cut.

    Theorem1OptimalDiscretizationProblemis NP-complete(Nguyen,1997).

    IMMUNEALGORTIHMFORDISCRETIZAION

    OFDECISIONSYSTEMSINROUGHSET

    THEORY

    Immunealgorithm(IA)isakindofeffective searchingandoptimizingtechniqueandhasbeen appliedtovariousfields.Theimmunesystemisa 603

    basicdefensesystemagainstbacteria,virusesand otherdisease.causingorganismsandhasdramaticand

    complexmechanismsthatrecombinegenestocope withtheinvadingantigens,producingantibodies againstantigens.Byusingthismechanism,IAper

    formswellasanoptimizationalgorithm.Wedesign animmunealgorithmforoptimaldiscretization problemmentionedabove.

    Frameofalgorithm:

    determinecandidatecuts

    constructthenewdecisiontable=<己广,R,,

    f>fromtheoriginaldecisiontableS--<U, Aud,

    initializetheantigenandantibodiespop(t); t=-1;

    while(notterminate)do

    {

    workouttheaffinitiesbetweenantigenand antibodiesinpop(t);

    workouttheamnitiesbetweenantibodiesand thedensityvaluesinpop(t);

    changememorycellsmere(t);

    crossoverpop(t);

    mutatepop(t);

    f=f+1:

    )

    Theantigenandantibodyofanyimmunesystem correspondtoobjectivefunctionandoptimalsolution ofimmunealgorithmrespectively.Memorycells mem(t)constituteapartofthepopulationpop(t). Determinationofcandidatecuts

Let<Awd,beadecisionsystem.An

    arbitraryconditionattribute口?.definesasequence

    V<a<

    <,where{V,V2a,...,}={):??,

    thenthesetofallpossiblecutsonaisdefinedby: fr={l,lV+

    2

    +

    2

    ]}.(3

    Thesetofpossiblecutsonallattributesisde- notedby:

    c=U4EA

    ,,????????,

    \,???一,,

    \,???一,,

Jiaeta1.IJZhejiangUnivSCIENCEA200674).602606

    Definition2LetusassumethattheobjectsinUare sortedinascendingorderoverthevaluesofa,(a,c)is acutofa,(a,c)iscalledaboundcutifandonlyifthe followingaresatisfied:

    (1)Sx?d(xi)~d(xj),satisfying)<c<

    );

    (2)ThereisnoxEUbetweenxiand,satisfying f)<)<).

    Theorem2LetSbeadecisionsystem,assuming thatthesetofboundcutsofcutsCAdefinedbyEq.(2) isB,Bcandiscernalltheobjectsfromdifferent decisionclasses.

    BasedonTheorem2,wegetthecandidatecuts CAconstitutingthesetofalltheboundcutsinCA. TheTheorem2detailscanbefoundin(DaiandLi, 2002).

    ConstructthenewdecisiontablefromS

    Let<Awd,beadecisionsystem,and

    constructthenewdecisionsystem=<(/.,BCA,, }>bythefollowing:

    =

    {f,)f)?撕)}

    ={la~A},Pdistherthboundcut(,Cra)

    onattributea

    Forany,ifCraE{min[a(xi),)],max[a(xi),

    )]},thenf(,f,)1,

    elsef[P/,(xi,=o

    Todescribetheprocedureaboveclearly,we discussanexampledecisiontable(Table1)inwhich theconditionattributesetC={al,a2}andthedecision attributeisItiseasytofindthatthetwocondition attributesmustbediscretized.

    Table1Adecisionsystem

    Basedonthediscussionabove,wecangetthe possiblecutsandthecandidatecutsasfollows: CA={(al,0.9),(al,1.15),(al,1.35),(al,1.45),(al, 1.55),(02,0.75),(a2,1.5),(02,2.5)}, BCA={(1,0.9),(al,1.15),(al,1.35),(al,1.45), (2,0.75),(02,1.5),(02,2.5)}.

    ByusingthecandidatecutsBCA,wegetthenew decisiontable(seeTable2)fromtheoriginaldecision system(Table1).

Table2ThenewdecisiontableforTable1

    Representationofantibodies

    Tocalculatetheminimalcutsistofindthe minimalsubsetmaintainingthediscretizationability ofthewholeNcandidatecuts.Itiseasytorepresent anantibodyasabinarystringoflengthNwhereNis thenumberofcandidatecuts,i.e.N=card(B).1 meansthatthecorrespondingattributeispresentand 0meansitisnot.

    TakingTable2asanexample,wehave7can

    didatecuts{,.,.,,,,}i.e.{(al,

    0.9),(a2,1.15),(l,1.35),(1,1.45),(a2,0.75),(a2,

    1.5),(02,2.5)}.Antibody0111010representsthe determinedcutssetasfollows:

    /)=--{(al,1.15),(al,1.35),(al,1.45),(02,1.5)} Computingofaffinitiesanddensities

    LetAbk---<Abkl,Abk2,,Abe>beanantibody,

    whereAbkt=Oor1,l=12._IN.Thentheaffinitybe

    tween6JandAbjisdefinedas:

    (1I]/Ni.(5)4()=I?lI.(5)H/

    Accordingtothedefinitionofoptimaldiscreti

    zation,weknowthatthefitnessfunctiondependson thenumberofcuts(whichwewishtokeepaslowas

Jiaeta1./JZhejiangUnivSCIENCEA20067f4),'602606

    possible)andtheconsistency(whichwewishtokeep ashighaspossible).So,wecandefinetheaffinity betweenantigenAgandanantibodyAbkas: (Abe)(?_]/N,(6)=tIJ7v一?I+?,(6)

    whereisthenthbitofantibodyAbk,Disthe

    objectspairssetdiscernedbyantibodyAbk,card(.) isthenumberoftheobjectspairsinthenewdecision systemS,andpareweightfactors.Basedonthe definitionoftheaffinitybetweenantigenandanti- bodies,wecandefinethedensityDenbk)ofan antibodyAbkas

    DP():card(

    {AbtlAff(

    Ab~,A

    

    bt

    

    )>-P})

    ,(7)

    antlOoales

    i,e.,theproportionoftheantibodiesAblsatisfying Afj(Abk,Abt)>-p,l=l,2,...,#antibodies,0.8l,tothe wholeantibodypopulation.

    Crossoverandmutation

    Weuseclassical,onepointcrossoverprocess

    thataffectsantibodyselectedtoreproducewith probabilityQtP

    Weusestrategyself-adaptivemutationratein mutationprocess.Bythisstrategy,weuseahigher probabilityofmutatingfrom"1"to"0"thanthatin theoppositedirection,sinceourgoalisminimaldis

    cretization.

    Letcbeanantibody.Wedefineones(c)isthe numberof"l"inc.Andwedefinethemutationrate ofanantibodyc,as:

i=1

    wherempj(/=1,2.,popsize)isthemutationprob

    ability.

    EXPERIMENTALSTUDY

    Nguyen(1997)proposedthenameddiscretiza

    tionapproachbasedonroughsetmethodsandBoo- leanreasoning.Themainideaistofindpossibly minimumnumberofdiscreteintervals,andatthe 605

    sametimenotweakentheindiscemibilityability.In thissection,wewillmakecomparativeexperiment betweenouralgorithmandNguyen(1997)'smethod. FromTable1'sdecisionsystem,wegetthefol

    lowingresultcutsbyNguyen'smethod:

    D=((1,1.15),(al,1.35),(al,1.45),(02,1.5)) Theresulting4cutsandthediscretizedresultare showninTable3.

    Table3ThediscretizedresultofTable1byNguyen's method

    WealsodiscretizedTable1'sdecisionsystemby ourmethod.Thenumbercandidatecutsdetermined thestringlengthas7.Thepopulationsize,thesizeof memorycells,theparameterandthecrossingover

    ratePcwere20,5,0.9and0.7respectively.The weightfactorsweresetas1and2.5respectively.The algorithmstopswhentherewasnovariationofthe averagefitnessincertainnumberofgenerations.In theexperiment.theantibodiesalltendedtobeiden

    ticalr0101010)andthecorrespondingcutssetwas: D=-{(al,1.15),(al,1.45),(02,1.5)).

Report this document

For any questions or suggestions please email
cust-service@docsford.com