
Application Research of Robust LS-SVM Regression Model in Forecasting Patent Application Counts

By Danny Hawkins,2014-01-26 10:18

Journal of Beijing Institute of Technology, 2009, Vol. 18, No. 4

ZHANG Li-wei (张丽玮), ZHANG Qian (张茜), WANG Xue-feng (汪雪锋), ZHU Dong-hua (朱东华)

(1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China; 2. Information College, Capital University of Economics and Business, Beijing 100070, China; 3. Department of Computer Science, Kyungwon University, Seongnam-si 461-701, Korea)

Abstract: A forecasting system for patent application counts is studied in this paper. The optimization model proposed in the research is based on support vector machines (SVM), in which a cross-validation algorithm is used for parameter selection. Results of data simulation show that the proposed method has higher forecasting precision and stronger generalization ability than the BP neural network and the RBF neural network. In addition, it is feasible and effective in forecasting patent application counts.

Keywords: support vector machine; cross-validation algorithm; patent application count; forecasting

CLC number: G303  Document code: A  Article ID: 1004-0579(2009)04-0497-05

Patents are regarded by many enterprises as the core element and most valuable asset in the technological arena. The quantity and quality of patent applications reflect the depth of R&D endeavors, the level of technological innovation and the technical strength of a country, region or company. It is therefore significant to use the support vector machine (SVM), one of the newest mathematical tools, to forecast patent application counts through analysis of the factors influencing patent applications, so as to supply important feedback to governing bodies for making strategic patent policies. The SVM theory was proposed by Vapnik in the 1990s, and has been widely used ever since in many fields such as face recognition, image processing and data mining. This paper is based on statistical data gathered from large- and medium-sized enterprises and hi-tech firms in China from 1997 to 2006. The SVM theory is applied with indices such as financial expenditure and manpower input in patent application as input, and the count of applications as output.

1 Choice of Patent Indices and Influence Factors

We have chosen the count of patent applications instead of that of patent authorizations as the patent index. Our reasons are as follows. Firstly, there is a strong correlation between these two numbers, and the information contained in the former covers that in the latter to a great extent. Secondly, the time lag of the authorization count is comparatively longer. Thirdly, using the count of patent authorizations as an index tends to cause information distortion. The gap between the two is caused by factors such as the relevant technologies in patent applications not being satisfactory enough, patent agencies not functioning ideally, etc. [].

There exist many factors which affect the technological development of patents. In consideration of the completeness of patent data, four types of indices are used, i.e., financial expenditure, manpower input, patent propensity and degree of economic richness. Financial expenditure on patents consists of spending on relevant sci-tech activities, expenses on R&D activities, the ratio of expenditure on sci-tech activities to sales income, and the ratio of expenditure on R&D activities to sales income; manpower input into patents includes the number of people engaged in sci-tech activities, the headcount of persons involved in R&D activities, and the proportion of sci-tech personnel in the overall employee count [2-3]; the degree of economic richness is related to GDP per capita and the GDP growth rate []; and patenting propensity is reflected by the proportion of GDP of the secondary sector of the economy in the entire industry []. Details are shown in Tab. 1.

Received 2009-03-29
Sponsored by the "985" Philosophy and Social Science Innovation Base of the Ministry of Education of China (107008200400024)
Biographies: ZHANG Li-wei (1981-), doctoral student, zhangliwei19810@126.com; ZHU Dong-hua (1963-), professor, doctoral adviser

Tab. 1 List of indices having influence on patent application count

upper-grade indicators         lower-grade indicators
-----------------------------  ---------------------------------------------------------------
financial expenditure          expenditure spent on sci-tech activities
                               expenditure invested on R&D activities
                               expenditure defrayed in sci-tech activities vs. income from sales
                               expenditure used in R&D activities vs. income from sales
patent propensity              ratio of GDP from secondary sector of economy to total
manpower input                 number of people engaged in sci-tech activities
                               number of people engaged in R&D activities
                               persons engaged in sci-tech activities as a percentage of total employee count
degree of economic richness    GDP per capita
                               annual growth rate of GDP

2 Mathematical Model Using Support Vector Regression

SVMs have been introduced to solve problems in pattern recognition and nonlinear function estimation [].

In regression, the goal is to choose a hyperplane with a small norm while simultaneously minimizing the distance from the data points to the hyperplane. This is solved through quadratic programming (QP). For nonlinear regression problems, a kernel function is used to map the data set into a high-dimensional space, in which the data can be fitted by linear regression. In the prediction of patent application counts, our objective is to find a function $f(x)$ that deviates by at most $\varepsilon$ from the actually obtained targets $y_i$ for all training data and is at the same time as flat as possible.

The least squares (LS) version of SVM for regression has already been proposed. In this LS-SVM version, one can get the needed results by solving a linear system instead of dealing with quadratic programming. This is due to using equality constraints rather than inequality ones. Such linear systems conform to the Karush-Kuhn-Tucker (KKT) conditions, and their numerical stability has been investigated [].

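Given a training set containing $N$ samples $(x_i, y_i)$ ($i = 1, 2, \dots, N$; $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$), the support vector method aiming at constructing a regression model has the form:

$$y(x) = \omega^T \varphi(x) + b, \quad b \in \mathbb{R}, \qquad (1)$$

where $\varphi(\cdot)$ is the nonlinear mapping into the feature space associated with the kernel function, $\omega$ is the weight vector in that space, and $b$ is a real constant. The LS-SVM regression hyperplane is obtained by solving the following optimization problem:

$$\min_{\omega, b, \xi} J(\omega, \xi) = \frac{1}{2}\omega^T\omega + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 \qquad (2)$$

$$\text{s.t.} \quad y_i = \omega^T\varphi(x_i) + b + \xi_i, \quad i = 1, 2, \dots, N.$$

From the Lagrange function we can get

$$L(\omega, b, \xi, \alpha) = J(\omega, \xi) - \sum_{i=1}^{N}\alpha_i\left[\omega^T\varphi(x_i) + b + \xi_i - y_i\right], \qquad (3)$$

with $\alpha_i$ being the Lagrange multipliers, which can be either positive or negative because of the equality constraints, as per the KKT conditions. The conditions for optimality are as follows: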

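$$\begin{cases}
\dfrac{\partial L}{\partial \omega} = 0 \;\rightarrow\; \omega = \sum_{i=1}^{N}\alpha_i\varphi(x_i), \\[4pt]
\dfrac{\partial L}{\partial b} = 0 \;\rightarrow\; \sum_{i=1}^{N}\alpha_i = 0, \\[4pt]
\dfrac{\partial L}{\partial \xi_i} = 0 \;\rightarrow\; \alpha_i = C\xi_i, \\[4pt]
\dfrac{\partial L}{\partial \alpha_i} = 0 \;\rightarrow\; \omega^T\varphi(x_i) + b + \xi_i - y_i = 0.
\end{cases} \qquad (4)$$

Substituting $\omega = \sum_j \alpha_j\varphi(x_j)$ and $\xi_i = \alpha_i / C$ into the last condition gives $\sum_j \alpha_j \varphi(x_j)^T\varphi(x_i) + b + \alpha_i / C = y_i$. With $\omega$ and $\xi$ eliminated in this way, Eq. (1) can be rewritten as $y(x) = \sum_{i=1}^{N}\alpha_i K(x, x_i) + b$, where $b$ and $\alpha$ solve the linear system

$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + I/C \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix} =
\begin{bmatrix} 0 \\ Y \end{bmatrix},$$

where $\mathbf{1} = (1, 1, \dots, 1)^T$, $Y = (y_1, y_2, \dots, y_N)^T$, $K_{ij} = \varphi(x_i)^T\varphi(x_j) = K(x_i, x_j)$ for $i, j = 1, 2, \dots, N$, $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_N)^T$, and $I$ is the identity matrix. The kernel function is any symmetric function satisfying the Mercer conditions [].

As is usually the case, more than one kernel function is suitable for mapping the data set into a feature space, but we cannot say that one certain kernel outperforms the others. Therefore, a particular situation may need a specific kernel function. Validation techniques such as bootstrapping and cross-validation can be used to determine it. In predicting patent application counts, the RBF kernel function, defined as $K(x, x_i) = \exp\left(-\|x - x_i\|^2 / \sigma^2\right)$, can result in good performance.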
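The LS-SVM linear system described in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' Matlab code: the function names, the toy sine data, and the kernel width `sigma = 0.3` are assumptions made for the example, and the RBF kernel is assumed to use $\sigma^2$ in the denominator.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Kernel matrix with entries exp(-||A[i] - B[j]||^2 / sigma^2) (assumed form)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def lssvm_fit(X, y, C, sigma):
    """Solve [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y] for b and alpha."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # first row: sum of alpha equals 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = K + np.eye(n) / C       # regularized kernel block
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, alpha

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data (hypothetical, not the patent statistics from the paper).
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
C, sigma = 100.0, 0.3

b, alpha = lssvm_fit(X, y, C, sigma)
y_hat = lssvm_predict(X, b, alpha, sigma, X)
print("max training residual:", np.max(np.abs(y - y_hat)))
```

Two properties of the solution follow directly from the optimality conditions and make a convenient sanity check: the multipliers sum to zero, and each training residual equals $\alpha_i / C$.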

3 Use of the Model in Patent Application Count Forecasting

3.1 Experiments Based on Support Vector Regression Model

The forecasting procedure for patent application counts given below is complicated because of the multiplex nature of the relevant factors. The ten important factors from four major categories listed in Tab. 1 are the results of thorough theoretical analysis and trial-and-error choices. Tab. 2 is a summary of statistical results for numerous large- and medium-sized enterprises and hi-tech firms in China during 1997-2006, grouped by these factors and categories.

Tab. 2 Statistical data from large and medium-sized enterprises and hi-tech firms in China during 1997-2006, grouped by indices

Abbreviations in Tab. 2: EST, financial expenditure on sci-tech activities; ERD, financial expenditure on R&D activities; RSTE/S, ratio of expenditure on sci-tech activities to sales income; RRDE/S, ratio of expenditure on R&D activities to sales income; NST, number of people engaged in sci-tech activities; NRD, number of people engaged in R&D activities; RST/E, ratio of people engaged in sci-tech activities to employees; GPC, GDP per capita; GRG, annual growth rate of GDP; GRS/T, GDP ratio of secondary sector of economy to total.

Data shown in Tab. 2 are used in our forecasting model. In the experimentation, we start from the following dynamical system:

$$y_i = f(x_1, x_2, \dots, x_n),$$

where $y_i$ is the patent application count at time $i$, and $n$ in our case equals 10, meaning the ten main factors. In our system, we focus on application counts by the year. A Matlab implementation of the SVR algorithm is used. We give special attention to the following issues in our experiments. First is the determination of several parameters before running the application. We choose $\varepsilon$, $C$ and the specific kernel parameter. For the first two, we set their experimental values to $\varepsilon = 0$ and $C = 100$.

In order to choose the proper parameter for the kernel function, cross-validation can be used simply to estimate the generalization error of a given model, or it can be used for model selection by choosing the model having the smallest estimated generalization error. Here, we focus on leave-one-out cross-validation (LOOCV) among the many cross-validation methods. In LOOCV, the samples are split into as many subsets as there are samples; in each round, only one of them is used for validation, and the rest are all used for training. We set the kernel parameter to 0.001 as an initial value. In order to optimize the parameters in the model, a grid-searching approach is applied. After optimization, the kernel parameter turns out to be 0.0002. Test results before optimization are shown in Fig. 1, while those after optimization are displayed in Fig. 2. The absolute errors between the original values and the predicted results are given in Tab. 3.

Fig. 1 Original and predicted values using SVR before optimization

Fig. 2 Original and predicted values using SVR after optimization

Tab. 3 Errors between original data and predicted results after optimization
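The combination of LOOCV and grid search used for choosing the kernel parameter can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Matlab SVR procedure used in the experiments: it fits an LS-SVM by solving its linear system, the helper names and the candidate grid `sigmas` are hypothetical, and the data are a synthetic sine curve rather than the statistics of Tab. 2.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Kernel matrix with entries exp(-||A[i] - B[j]||^2 / sigma^2) (assumed form)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def fit_predict(X_tr, y_tr, X_te, C, sigma):
    """Fit an LS-SVM on the training part and predict the held-out part."""
    n = X_tr.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X_tr, X_tr, sigma) + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y_tr)))
    b, alpha = sol[0], sol[1:]
    return rbf_kernel(X_te, X_tr, sigma) @ alpha + b

def loocv_mse(X, y, C, sigma):
    """Leave each sample out once and average the squared validation error."""
    n = len(y)
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i                      # all samples except i
        pred = fit_predict(X[keep], y[keep], X[i:i + 1], C, sigma)
        sq_errors.append((pred[0] - y[i]) ** 2)
    return float(np.mean(sq_errors))

def grid_search(X, y, C, sigmas):
    """Return the candidate with the smallest LOOCV error, plus all scores."""
    scores = {s: loocv_mse(X, y, C, s) for s in sigmas}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic example data and a hypothetical candidate grid.
X = np.linspace(0.0, 1.0, 15).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
sigmas = [0.01, 0.1, 0.3, 1.0, 10.0]

best_sigma, scores = grid_search(X, y, C=100.0, sigmas=sigmas)
for s in sigmas:
    print(f"sigma = {s:<5} LOOCV MSE = {scores[s]:.6f}")
print("selected sigma:", best_sigma)
```

With only ten yearly samples, as in the setting described here, LOOCV costs just ten model fits per candidate value, which keeps an exhaustive grid search cheap.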

3.2 Comparison of SVM with Neural Network

Both neural networks (NN) and support vector machines (SVM) are used in nonlinear optimization. However, neural networks identify nonlinear relationships by adopting an architecture tuned through a user-designed search algorithm, while SVM does the work by changing a single parameter in the convex optimization; SVM optimization offers a global solution, while NN makes the optimization locally. Data analysis on small samples (less than 5000) shows that SVM is better than NN in accuracy, generalization performance and running time. However, for huge data sets, NN gets the upper hand.

Being limited by available sources, we choose the data from 1997 to 2005 as training data, and those of 2006 for testing. In our model based on SVM, the result is 67891, with an absolute error of 1108. Compared with the result based on BPNN (70741) and its absolute error (1732), the model based on SVM is better than that based on BPNN. We then test the model based on RBFNN and get a predicted value of 11320, much lower than the true value of 69009. Our comparison results are shown in Tab. 4. Here abs. error means absolute error, and rel. error stands for relative error as a percentage of the original data.

Tab. 4 Comparison among SVM, BPNN and RBFNN

We have realized that in order to obtain a good model, the NN algorithm needs a more complex structure with multiple layers. Models based on either SVM or BPNN are both much better than those based on RBFNN. Patent application count forecasting does not produce voluminous predictions, only one number a year.
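The two error measures used in the comparison can be recomputed from the figures quoted in the text. The sketch below assumes the usual definitions (absolute difference, and that difference as a percentage of the true value); the function names are hypothetical. For the SVM figures as quoted, the difference comes to 1118 rather than the stated 1108, so one of the two SVM numbers may have been mis-transcribed.

```python
def abs_error(actual, predicted):
    """Absolute error between the true and predicted counts."""
    return abs(actual - predicted)

def rel_error_pct(actual, predicted):
    """Relative error as a percentage of the true count."""
    return 100.0 * abs(actual - predicted) / abs(actual)

# Values quoted in the text for the 2006 test year.
actual_2006 = 69009
predictions = {"SVM": 67891, "BPNN": 70741, "RBFNN": 11320}

errors = {
    name: (abs_error(actual_2006, p), rel_error_pct(actual_2006, p))
    for name, p in predictions.items()
}

for name, (a, r) in errors.items():
    print(f"{name}: abs. error = {a}, rel. error = {r:.2f}%")
```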
