
HIGH TECHNOLOGY LETTERS, Vol.13 No.4, Dec. 2007

Attribute reduction based on background knowledge and its application in classification of astronomical spectra data*

Zhang Jifu (张继福)**, Li Yinhua, Zhang Sulan

(School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, P.R. China)

(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, P.R. China)

Abstract

To improve the efficiency of attribute reduction, we present an attribute reduction algorithm based on background knowledge and information entropy, by making use of background knowledge from research fields. Under the condition of known background knowledge, the algorithm can not only greatly improve the efficiency of attribute reduction, but also avoid the defect that information entropy is partial to attributes with many values. The experimental results verify that the algorithm is effective. In the end, the algorithm produces better results when applied to the classification of star spectra data.

Keywords: rough set theory, background knowledge, information entropy, attribute reduction, astronomical spectra data

Rough set theory, proposed by Pawlak in 1982, is a very useful mathematical tool for dealing with imprecise, uncertain and incomplete data. Attribute reduction and decision rule extraction are very important tasks in rough set theory. At present, algorithms for this task can roughly be divided into two kinds: one is the algorithm based on the discernibility matrix and discernibility function; the other is the heuristic algorithm based on information entropy. For the former, the basic way is to deduce the discernibility function from the discernibility matrix and then to compute the corresponding disjunctive normal form; every item in the normal form so obtained is one reduction of the system, and the core and all reductions can be calculated. Because large numbers of repetitious elements appear in the discernibility matrix, the efficiency of the algorithm is reduced. Existing results prove that finding the minimal reduction of an information system is an NP-hard problem, and the traditional discernibility-matrix method is difficult to adapt to reduction computation on large data sets. So many researchers have devoted themselves to looking for heuristic reduction algorithms. For the heuristic algorithm, the basic way is to compute the core first and then, according to the importance degree of the other attributes, to add attributes to the core sequentially, or to delete sequentially those attributes that have no influence on classification, by the degree of dependence of the decision attributes on the condition attributes, until the categorization abilities of the reduction results are the same as those of the original system (or decision table). But not all of the reductions can be found by using the heuristic algorithm.
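As an illustration of the discernibility-matrix approach just described, the following sketch builds the matrix entries for a toy decision table and reads the core off the singleton entries. The table, names, and simplifications (no full disjunctive-normal-form computation) are our own assumptions, not the cited algorithms:

```python
from itertools import combinations

def discernibility_matrix(rows, cond, d_attr):
    """For each pair of objects with different decisions, record the
    set of condition attributes on which the two objects differ."""
    entries = []
    for r1, r2 in combinations(rows, 2):
        if r1[d_attr] != r2[d_attr]:
            diff = frozenset(a for a in cond if r1[a] != r2[a])
            if diff:
                entries.append(diff)
    return entries

def core(entries):
    """Attributes appearing as singleton entries belong to every reduction."""
    return {next(iter(e)) for e in entries if len(e) == 1}

# a hypothetical 4-object decision table: condition attrs "a","b", decision "d"
rows = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
```

Here only attribute "a" separates the two decision classes on its own, so it forms the core.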

Because the attributes and instances in an information system with a large data set are numerous, there is a great need to design a kind of attribute reduction algorithm that improves the efficiency of attribute reduction. In practical applications, domain experts or users who have accumulated experience through long-term work often have a deeper understanding of the data in the information system, because they know which attributes are important for classification and which attributes are useless at all. If this experience knowledge is applied to attribute reduction, the efficiency of attribute reduction will be greatly improved. At the same time, the fault that an attribute with a lot of values (even if it is important by condition entropy, but has no influence on classification) will be preferentially selected into the reduction results can be avoided.

Now a large telescope called the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), a great national science project of China, is under construction at the National Observatory in Beijing. After its scheduled completion at the end of 2006, it is expected to collect more than 40000 spectra in a single observation night, namely the total amount of spectral data will be up to 4TB. Such voluminous data demand automatic spectra processing and data mining. Therefore, there is great application value in studying automatic classification technology of celestial spectra based on rough set theory.

In this paper, by making use of background knowledge in problem fields, we present an attribute reduction algorithm based on background knowledge and information

* Supported by the National Natural Science Foundation of China (No.60573075), the National High Technology Research and Development Program of China (No.2003AA133060) and the Natural Science Foundation of Shanxi Province (No.200601104).

** To whom correspondence should be addressed. Email: jifuzh@sina.com
Received on Aug. 15, 2006

entropy. For some unimportant attributes among all the condition attributes, we do not judge their importance by information entropy. Accordingly, the efficiency of the attribute reduction algorithm can be greatly improved, and the deficiency that information entropy is partial to attributes with many values can also be avoided. The algorithm is effective when applied to the classification of star spectra data.

1 Information entropy-based attribute reduction

Rough set theory regards knowledge as a partition of the universe and carries on rigorous analysis and treatment of knowledge. Miao and Wang, in terms of information theory, have set up the relationship between knowledge and information entropy, and put forward the concepts of knowledge entropy and condition entropy.

Definition 1: Let X, Y be the partitions induced by P, Q with respect to U (X = {X1, X2, ..., Xn}, Y = {Y1, Y2, ..., Ym}). The probability distributions of P, Q on the subsets of U are:

[P; p] = (X1, X2, ..., Xn; p(X1), p(X2), ..., p(Xn))
[Q; p] = (Y1, Y2, ..., Ym; p(Y1), p(Y2), ..., p(Ym))

where p(Xi) = |Xi|/|U|, i = 1, 2, ..., n; p(Yj) = |Yj|/|U|, j = 1, 2, ..., m; |U| is the cardinal number of U. After the probability distribution of knowledge is defined, the information entropy and condition entropy of knowledge can be defined according to information theory.

Definition 2: The information entropy H(P) of P is:

H(P) = -Σi p(Xi) log(p(Xi))

Definition 3: The condition entropy H(Q/P) of Q (= {Y1, Y2, ..., Ym}) relative to the knowledge P (= {X1, X2, ..., Xn}) is:

H(Q/P) = -Σi p(Xi) Σj p(Yj/Xi) log(p(Yj/Xi))

where p(Yj/Xi) = |Yj ∩ Xi| / |Xi|, i = 1, 2, ..., n; j = 1, 2, ..., m.

Definition 4: The importance degree of attribute a is:

SGF(a, R, D) = H(D/R) - H(D/R ∪ {a})

where H(D/R) is the condition entropy of D relative to R.
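The three definitions above can be computed directly from a decision table. The following is a minimal sketch under our own assumptions about the table layout (a list of attribute-value dicts); the function names are illustrative and not from the paper:

```python
from collections import Counter, defaultdict
from math import log2

def partition(rows, attrs):
    """Group row indices into blocks by their values on the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def entropy(rows, attrs):
    """Definition 2: H(P) for the partition induced by attrs."""
    n = len(rows)
    return -sum((len(b) / n) * log2(len(b) / n) for b in partition(rows, attrs))

def condition_entropy(rows, q_attrs, p_attrs):
    """Definition 3: H(Q/P), summing over the blocks Xi of P."""
    n = len(rows)
    h = 0.0
    for xi in partition(rows, p_attrs):
        # distribution of Q-values inside the block Xi
        counts = Counter(tuple(rows[i][a] for a in q_attrs) for i in xi)
        for c in counts.values():
            p_x = len(xi) / n            # p(Xi)
            p_y_x = c / len(xi)          # p(Yj/Xi)
            h -= p_x * p_y_x * log2(p_y_x)
    return h

def sgf(a, r, d, rows):
    """Definition 4: SGF(a,R,D) = H(D/R) - H(D/R∪{a})."""
    return condition_entropy(rows, d, r) - condition_entropy(rows, d, r + [a])

# a hypothetical decision table: condition attrs "a","b", decision "d"
rows = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
```

On this toy table, "a" determines "d" completely (SGF = 1 bit), while "b" contributes nothing (SGF = 0), which is exactly the information the reduction algorithm exploits.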

2 Attribute reduction-oriented background knowledge

Data mining is a course of extracting useful knowledge from large amounts of data, which are generally organized and stored using relational tables. So data-mining-oriented background knowledge is naturally a priori information that describes the restriction relationship between objects and attributes, and also the restriction relationship among attributes in the data sets.

Predicate logic is a kind of formal language system that studies the law of reasoning with the logic method. It is useful for expressing factual knowledge, such as the state, attribute and concept of things. It can also be used to show the exact causality between things, i.e. the rule. Because of its naturalness, accuracy and ease of realization, predicate logic is a widely used knowledge-expressing technology. It is feasible to adopt predicate logic to express the background knowledge in data mining. Similarly, it is fit for expressing the background knowledge in attribute reduction with predicate logic.

In a decision table with a practical background, let C be the condition attribute set. According to the expert's experience, we give the attribute subset A = {a1, a2, ..., as} (A ⊆ C) which has influence on classification. The importance degree of the attribute subset A relative to classification, from high to low, can be expressed as the following attribute groups: {a1,1, a1,2, ..., a1,n1}, {a2,1, a2,2, ..., a2,n2}, ..., {ak,1, ak,2, ..., ak,nk}, where the importance degree to classification of the attributes within one attribute group {ai,1, ai,2, ..., ai,ni} (1 ≤ i ≤ k) is decided by computing the information entropy.

Definition 5: Let x be an arbitrary object in the decision table. On the basis of the expert's experience, the remaining condition attribute subset A = {a1, a2, ..., as}, obtained after eliminating the unimportant attributes from all the condition attributes C, is denoted with the predicate Select(x, (a1, a2, ..., as)).

Definition 6: On the basis of the expert's experience, the importance degree of the attribute subset A = {a1, a2, ..., as} relative to classification, from high to low, is divided into several groups {a1,1, a1,2, ..., a1,n1}, {a2,1, a2,2, ..., a2,n2}, ..., {ak,1, ak,2, ..., ak,nk}, and is denoted with the predicate Prior({a1,1, a1,2, ..., a1,n1}, {a2,1, a2,2, ..., a2,n2}, ..., {ak,1, ak,2, ..., ak,nk}).

Definition 7: Attribute reduction-oriented background knowledge (a priori information) is denoted with the following predicate formula:

∀x (Select(x, (a1, a2, ..., as)) ∧ Prior({a1,1, a1,2, ..., a1,n1}, {a2,1, a2,2, ..., a2,n2}, ..., {ak,1, ak,2, ..., ak,nk}))

where x is an object in the decision table, {a1, a2, ..., as} is the condition attribute set given by the experts, and the groups {aj,1, aj,2, ..., aj,nj} (1 ≤ j ≤ k) are pairwise disjoint with {a1,1, ..., a1,n1} ∪ {a2,1, ..., a2,n2} ∪ ... ∪ {ak,1, ..., ak,nk} = {a1, a2, ..., as}.
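The background knowledge of Definitions 5-7 can be represented as plain data together with a well-formedness check (the groups must be pairwise disjoint and together cover exactly the selected subset A). This is an illustrative sketch, not the paper's formalism; names and attribute values are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackgroundKnowledge:
    """Select(x,(a1..as)) and Prior(group1,...,groupk) as plain data."""
    selected: frozenset   # A: attributes the expert says influence classification
    groups: tuple         # tuples of attributes, most important group first

    def is_well_formed(self):
        """Definition 7's side condition: groups are disjoint and union to A."""
        seen = set()
        for g in self.groups:
            if seen & set(g):     # overlap between groups
                return False
            seen |= set(g)
        return seen == set(self.selected)

# hypothetical expert knowledge: "color" matters most, then "shape"/"size"
bk = BackgroundKnowledge(
    selected=frozenset({"color", "shape", "size"}),
    groups=(("color",), ("shape", "size")),
)
```

A reduction routine can reject ill-formed knowledge up front instead of discovering the inconsistency mid-computation.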

3 Background knowledge and information entropy-based attribute reduction

3.1 The idea and method of attribute reduction

From the above description, we know that if background knowledge is applied to attribute reduction, the efficiency of attribute reduction can be improved. The basic idea and method are described as follows: let C be the condition attribute set in a decision table with a practical background. Firstly, according to the expert's experience, we select from C all the condition attributes A = {a1, a2, ..., as} that influence the classification (notice: the condition that the remaining attributes C - A have no influence on the classification must be satisfied), then divide them into several groups {a1,1, a1,2, ..., a1,n1}, {a2,1, a2,2, ..., a2,n2}, ..., {ak,1, ak,2, ..., ak,nk} according to the importance degree of the attribute subset A to classification, from high to low, and regard these groups as the attribute reduction-oriented background knowledge. Secondly, we compute the core and add attributes into the core from the first group sequentially (the importance of the attributes in the same group is judged by the condition information entropy). We carry on this course until the condition attribute subset R has the same discernibility capability as the original decision table, i.e. H(D/R) = H(D/C). Having got the attribute subset that satisfies this condition, we need not compute the remaining condition attributes by use of the condition information entropy. Thus the calculation amount of information entropy is reduced, and the efficiency of attribute reduction is improved.
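The two-step method above can be sketched as a greedy loop that adds attributes group by group, most important group first, and stops as soon as the finishing condition H(D/R) = H(D/C) holds. This is our own simplified reading of the procedure (it omits the core computation for brevity), with hypothetical attribute names:

```python
from collections import Counter, defaultdict
from math import log2

def cond_entropy(rows, d_attr, r_attrs):
    """H(D/R) over a decision table given as a list of attribute-value dicts."""
    n = len(rows)
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in r_attrs)].append(row[d_attr])
    h = 0.0
    for vals in blocks.values():
        for c in Counter(vals).values():
            h -= (len(vals) / n) * (c / len(vals)) * log2(c / len(vals))
    return h

def reduce_with_background(rows, d_attr, groups):
    """Grow a reduct group by group, stopping when H(D/R) reaches H(D/C)."""
    all_attrs = [a for g in groups for a in g]
    target = cond_entropy(rows, d_attr, all_attrs)   # H(D/C)
    r = []
    for group in groups:
        remaining = list(group)
        while remaining:
            if cond_entropy(rows, d_attr, r) <= target + 1e-12:
                return r                              # finishing condition met
            # within one group, pick the attribute with the largest entropy gain
            best = max(remaining,
                       key=lambda a: cond_entropy(rows, d_attr, r)
                                     - cond_entropy(rows, d_attr, r + [a]))
            r.append(best)
            remaining.remove(best)
    return r

# hypothetical table: "d" depends only on "a"; expert ranks "a" above "b"
rows = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
```

With the correct grouping (("a",), ("b",)) the loop stops after one attribute, never evaluating "b"; with the grouping reversed, it evaluates every attribute, which mirrors the worst case of Theorem 3 below.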

Theorem 1: When the remaining attributes in C - A have no influence on the classification, the above reduction process can certainly find a reduction of the condition attribute set.

Proof: Since the attributes in C - A have no influence on classification, the above reduction process gives the corresponding equivalent decision table and the core is correct. If the core does not change and the condition attribute set that influences classification does not change either, then the finishing condition H(D/R) = H(D/C) does not change. Thus, a reduction of the condition attribute set can certainly be found through the above reduction process.

Theorem 2: When the background knowledge is empty, the time efficiency of the above reduction process is the same as that of the information entropy-based attribute reduction algorithm.

Proof: When the background knowledge is empty, the condition attribute set is still the original one; it need not be grouped, and the above process degrades to the information entropy-based attribute reduction algorithm. Thus, when the background knowledge is empty, the time efficiency of the above reduction process is the same as that of the information entropy-based attribute reduction algorithm.

Theorem 3: When the background knowledge is wrong (wrong grouping only), the above process can certainly find a reduction of the condition attribute set; also, the time efficiency in the worst case is the same as that of the information entropy-based attribute reduction algorithm.

Proof: If the grouping is wrong, the core can still be computed from the equivalent decision table of the original decision table S = (U, A ∪ D, V, f). The core does not change, the condition attribute set which influences classification does not change, and the finishing condition H(D/R) = H(D/C) does not change either. Thus a reduction of the condition attribute set can certainly be found. Furthermore, we need to compute the condition information entropy of all the attributes when this situation appears (in fact the attributes that have great influence on classification are then in the last group). So the time efficiency in the worst case is the same as that of the information entropy-based attribute reduction algorithm.

3.2 Attribute reduction algorithm

According to the above method and the information entropy-based attribute reduction algorithm, an attribute reduction algorithm based on the background knowledge and information entropy is described as follows:

Input: S = (U, C ∪ D, V, f), the background knowledge, the attribute set A (attributes influencing the classification, given by the expert through experience) and the groups {a1,1, a1,
