Accuracy

By Joanne Chavez,2014-07-06 05:23

    China Ocean Engineering, Vol. 22, No. 3, pp. 421-429

    © 2008 China Ocean Press, ISSN 0890-5487

    Accuracy Evaluation of a Diagnostic Test by Detecting Outliers and Influential Observations

    Hsien-Chueh Peter YANG, Tsung-Hao CHEN, Cheng-Wu CHEN, Chen-Yuan CHEN and Chun-Te HU

    Department of Risk Management and Insurance, National Kaohsiung First University of Science and Technology, Kaohsiung 811, China

    Department of Business Administration, Shu-Te University, Kaohsiung 82445, China

    Department of Logistics Management, Shu-Te University, Kaohsiung 82445, China

    Department of Management Information, Yung-Ta Institute of Technology and Commerce, Pingtung 90941, China

    Doctoral Program in Management, National Kaohsiung First University of Science and Technology, Kaohsiung 811, China

    (Received 12 March 2007; received revised form 18 February 2008; accepted 25 March 2008)

    Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, outliers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices, is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations on data from laboratory experiments are then detected. Specifically, we take account of the interaction of an internal solitary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. As concluded, the goodness-of-fit of the revised logit regression model is better than that of the model without this approach.

    Key words: diagnostic testing; outliers; influential observations; internal solitary wave

    1. Introduction

    It is generally accepted that internal solitary waves (ISWs) can be characterized as the motion of an interface. In the ocean, ISWs occur in a stratified fluid of different densities. The amplitude of an ISW in the Andaman Sea or Sulu Sea can exceed 50 m while that in the South China Sea can exceed 110 m (Osborne and Burch, 1980; Apel et al., 1985; Hu et al., 1985, 1998; Hsu and Liu, 2000; Hsu et al., 2003; Liu and Hsu, 2004; Zeng and Alpers, 2004). Many reports indicate that seabed topography induces energy dissipation in the ISW; energy is reduced by the interaction of nonlinear ISWs with the seabed topography (Helfrich, 1992; Wessels and Hutter, 1996; Michallet and Ivey, 1999; Chen et al., 2006a, 2006b; Chen, 2007; Chen et al., 2007a; Chen et al., 2008a). Since energy dissipation has an important effect on many aspects of water and sediment movement in coastal oceans, it is necessary to have a better fitting and more appropriate prediction model for ISW propagation. Chen et al. (2007b) developed a preliminary statistical method with a linear regression to provide a conceptually simple algorithm for examining the functional relationships among variables. The results are quite consistent with experimental results, and are applicable to the naturally occurring reflection of internal solitary waves from sloping bottoms. Recently, Chen et al. (2008b) concluded that the goodness-of-fit and predictive ability of the cumulative logistic regression models are better than those of the binary logistic regression models. However, the transmission of ISWs is dependent on important physical factors, and reports on statistical manipulations related to this theme can rarely be found.

    * The work was financially supported by Science Council of Taiwan Province under Grant Nos. NSC 96-2628-E-366-004-MY2 and 96-2628-E-132-001-MY2.

    1 Corresponding author. E-mail: hcyang@ccms.nkfust.edu.tw

    Not only does strong internal wave mixing have an important influence on climatic changes, but it also causes sediment movement on continental shelves and slopes (Cacchione et al., 2002). Wave-ridge interactions can be classified as weak or strong in accordance with the amplitude-based transmission rate. In this study, the weighted influence of factors including ridge height, lower layer water depth and potential energy is examined using a binary logistic regression model. First, an efficient statistical approach, including the plotting of several indices, is selected to diagnose the lack-of-fit of a binary logistic regression model; second, the outliers and influential observations in some experimental data are detected; third, these outliers and influential observations are removed from the data, and the newly revised model is refitted based on the remaining observations; finally, conclusions are drawn by comparisons of the two logistic models. The results related to the adequacy of the model are analyzed, and the fit to the ISW propagation data is assessed.

    The propagation of an ISW over a submarine ridge and in stratified water of different densities results in energy dispersion. To clarify the effect of the hydrodynamic interaction between a wave and a ridge on the marine environment, in this study we develop a logistic regression model to analyze the effects of the ridge height, the lower layer water depth and the potential energy during wave-ridge interaction.

    2.1 Logistic Regression Model

    Logistic regression is a commonly used procedure for the analysis of data with binary target variables. Binary data are generally the most common form of categorical data. The response variable for a typical logistic regression is dichotomous (Walker, 2002). The binary logistic regression model allows many observed factors to affect the dependent variable, in this case the amplitude-based transmission rate of the ISW. The binary logistic regression model for this population can be written as:

    $$E(Y_i) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik},$$

    where $\alpha$ is the intercept, $\beta_1$ is the parameter associated with $x_{i1}$, $\beta_2$ is the parameter associated with $x_{i2}$, and so on.

    For a binary response $Y_i$ and the explanatory variables $X_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$, let $\pi(X_i)$ denote the probability of "success" given $X_i$, where $i$ denotes the i-th observation. To simplify the notation, $\pi(X_i) = E(Y_i \mid X_i)$ is adopted to represent the conditional mean of $Y_i$ given $X_i$ when the logistic distribution is used:

    $$\operatorname{logit}[\pi(X_i)] = \ln(\text{odds}_i) = \ln\left[\frac{\pi(X_i)}{1 - \pi(X_i)}\right] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}. \tag{1}$$

    The logit is the log of the odds. In this model, it takes the form of the log odds of the strong level versus the weak level for the incident rate.
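The relation between the linear predictor in Eq. (1) and the success probability can be illustrated with a minimal Python sketch. The function names and the numerical coefficients below are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
import math


def logit(p):
    """Log odds ln(p / (1 - p)) for a probability p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a linear predictor eta back to a probability in (0, 1)."""
    # Branch on the sign of eta so that exp() never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def linear_predictor(alpha, betas, x):
    """alpha + beta_1 * x_1 + ... + beta_k * x_k for one observation x."""
    if len(betas) != len(x):
        raise ValueError("betas and x must have the same length")
    return alpha + sum(b * xi for b, xi in zip(betas, x))


# Hypothetical intercept, slopes and covariates (illustration only).
eta = linear_predictor(0.5, [1.0, -2.0], [0.3, 0.1])
p = inverse_logit(eta)
```

Applying `logit` to `p` recovers `eta`, which is the sense in which the logit links the probability scale (0, 1) to the unbounded scale of the linear predictor.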

    2.2 Goodness of Fit and Diagnostics

    Several problems can cause overdispersion during the fitting of a model, such as a large residual deviance relative to the number of degrees of freedom or when the model lacks important explanatory variables (Collett, 2003). Allison (1999) suggested that the causes of lack-of-fit could be associated with a lack of independent observations.

    Two conventional goodness-of-fit tests are the Pearson $\chi^2$ and the likelihood ratio $\chi^2$, which is also known as deviance $\chi^2$ (Allison, 1999; Myers et al., 2002; Lawal, 2003). Pregibon (1981) carried out the necessary theoretical work to extend linear regression diagnostics to logistic regression.

    Discussed below is the utilization of residual statistics to assess the outliers and influential observations in logistic regression. The first diagnostic step after fitting the model is usually to examine both the Chi-square ($\chi^2$) and deviance residual values. Obviously, the residuals are useful for detecting those observations most poorly fitted by the model.

    While the raw residuals do not have unit variances, the standardized residuals do have unit variances. Collett (2003) proposed to use the standardized deviance residuals for routine model checking, and Lawal (2003) further stated that any standardized residual outside [-2, 2] is unsatisfactory. Allison (1999) suggested that standardized residuals smaller than 2 are not worth mentioning, and only those larger than or equal to 3 merit serious attention.
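The two screening rules quoted above can be combined into a simple check. The following Python sketch is only one possible reading: it treats the thresholds as applying to the absolute value of a standardized residual, and the labels, function names and example residuals are hypothetical.

```python
def classify_residual(r):
    """Label a standardized residual using the cut-offs cited in the text.

    |r| >= 3      : merits serious attention (Allison, 1999)
    2 < |r| < 3   : outside [-2, 2], unsatisfactory (Lawal, 2003)
    otherwise     : acceptable
    """
    size = abs(r)
    if size >= 3.0:
        return "serious"
    if size > 2.0:
        return "unsatisfactory"
    return "acceptable"


def flag_cases(residuals):
    """Return (case number, residual, label) for every non-acceptable case.

    Case numbers start at 1, as in an index plot.
    """
    flagged = []
    for case, r in enumerate(residuals, start=1):
        label = classify_residual(r)
        if label != "acceptable":
            flagged.append((case, r, label))
    return flagged


# Hypothetical standardized residuals (illustration only).
example = flag_cases([0.5, -2.4, 3.1, 1.9])
```

Here `example` lists the second and third cases, since only their residuals fall outside [-2, 2].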

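    2.2.1 Pearson and the Likelihood Ratio Test

    (1) Pearson test

    The Pearson $\chi^2$ test uses the statistic

    $$\chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j}, \tag{2}$$

    where $O_j$ denotes the observed frequencies, and $E_j$ is the expected frequency in cell $j$.

    The Pearson residual has an advantage over the deviance residual because it has larger values, meaning that outlying cases tend to stand out more sharply (Menard, 1995). The Pearson residuals are given by

    $$r_i = \frac{y_i - n_i\hat{p}_i}{\sqrt{n_i\hat{p}_i(1 - \hat{p}_i)}}. \tag{3}$$

    The standardized Pearson residuals can be written as:

    $$r_{si} = \frac{y_i - \hat{y}_i}{\sqrt{\hat{v}_i(1 - h_i)}}, \tag{4}$$

    where $\hat{y}_i = n_i\hat{p}_i$, $\hat{v}_i = n_i\hat{p}_i(1 - \hat{p}_i)$, and $h_i$ is the i-th diagonal element of the n by n matrix $H = W^{1/2}X(X'WX)^{-1}X'W^{1/2}$.

    (2) Likelihood ratio test

    The likelihood ratio $\chi^2$ test uses the statistic

    $$G^2 = 2\sum_j O_j \ln\left(\frac{O_j}{E_j}\right), \tag{5}$$

    where $O_j$ indicates the observed frequencies, and $E_j$ is the expected frequency in cell $j$.

    The deviance residual $d_i$ can be expressed as:

    $$d_i = \operatorname{sgn}(y_i - \hat{y}_i)\left[2y_i\ln\left(\frac{y_i}{\hat{y}_i}\right) + 2(n_i - y_i)\ln\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right)\right]^{1/2}. \tag{6}$$

    [Eq. (7) is illegible in the source.]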
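As an illustration of the quantities in this subsection, the sketch below computes the Pearson and likelihood ratio statistics from observed and expected frequencies, and the Pearson and deviance residuals for a single binomial observation, using the standard textbook forms. The function names and the small numerical inputs are hypothetical.

```python
import math


def pearson_chi2(observed, expected):
    """Sum over cells of (O_j - E_j)^2 / E_j."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_chi2(observed, expected):
    """2 * sum over cells of O_j * ln(O_j / E_j)."""
    # Cells with O_j = 0 contribute nothing (limit of O ln O as O -> 0).
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def pearson_residual(y, n, p_hat):
    """(y - n p) / sqrt(n p (1 - p)) for y successes out of n trials."""
    return (y - n * p_hat) / math.sqrt(n * p_hat * (1.0 - p_hat))


def deviance_residual(y, n, p_hat):
    """Signed square root of the contribution of one case to the deviance."""
    y_hat = n * p_hat
    term1 = y * math.log(y / y_hat) if y > 0 else 0.0
    term2 = (n - y) * math.log((n - y) / (n - y_hat)) if y < n else 0.0
    sign = 1.0 if y >= y_hat else -1.0
    # max() guards against a tiny negative value from rounding.
    return sign * math.sqrt(max(0.0, 2.0 * (term1 + term2)))


# Hypothetical frequencies and a hypothetical poorly fitted case.
x2 = pearson_chi2([10, 20], [15, 15])
g2 = likelihood_ratio_chi2([10, 20], [15, 15])
r_pearson = pearson_residual(1, 1, 0.2)
r_deviance = deviance_residual(1, 1, 0.2)
```

For this poorly fitted case (a success observed where the fitted probability is 0.2), the Pearson residual is larger in magnitude than the deviance residual, which is consistent with the remark attributed to Menard (1995).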

    2.2.2 Leverage Statistics

    Pregibon (1981) derived a linear approximation for the fitted values, which yields a hat matrix for logistic regression. The leverage in a logistic regression can be similarly derived (Hosmer and Lemeshow, 2000). It ranges from 0 (no influence) to 1 (completely determining the model parameters). Cases with a hat-value ($h_i$) larger than $(k+1)/N$ are considered influential (Menard, 1995). This matrix is given by

    $$H = W^{1/2}X(X'WX)^{-1}X'W^{1/2}, \tag{8}$$

    where $W$ is a $J \times J$ diagonal matrix with general element $\hat{v}_j = n_j\hat{p}_j(1 - \hat{p}_j)$.
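A minimal pure-Python sketch of the leverage computation follows. It evaluates the diagonal of the hat matrix case by case as $h_i = w_i x_i'(X'WX)^{-1}x_i$, which equals the i-th diagonal element of Eq. (8). The function names, the small design matrix and the fitted probabilities are hypothetical; the cut-off $(k+1)/N$ is the one quoted from Menard (1995).

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'WX is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(design, weights):
    """Diagonal of H = W^(1/2) X (X'WX)^(-1) X' W^(1/2)."""
    p = len(design[0])
    xtwx = [[sum(w * row[a] * row[b] for row, w in zip(design, weights))
             for b in range(p)] for a in range(p)]
    hats = []
    for row, w in zip(design, weights):
        z = solve_linear_system(xtwx, row)  # z = (X'WX)^(-1) x_i
        hats.append(w * sum(xi * zi for xi, zi in zip(row, z)))
    return hats


def influential_cases(hats, k):
    """Case numbers (from 1) whose hat-value exceeds (k + 1) / N."""
    threshold = (k + 1) / len(hats)
    return [i for i, h in enumerate(hats, start=1) if h > threshold]


# Hypothetical design (intercept column plus one covariate) and fitted
# probabilities; each case is a single binary trial, so w_i = p_i (1 - p_i).
design = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 10.0]]
fitted = [0.2, 0.4, 0.6, 0.9]
weights = [p * (1.0 - p) for p in fitted]
example_hats = leverages(design, weights)
```

A useful sanity check on any such implementation is that the hat-values sum to the number of fitted parameters (here 2), since the trace of $H$ equals the number of columns of $X$.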

    3. Results of Analysis

    For a binary logistic regression model, the dependent variable can be classified into two groups, weak or strong, according to the amplitude-based incident rate (cm/cm). The level is considered strong when the incident rate is larger than 0.5 and weak when the incident rate is smaller than 0.5. The frequencies for the strong level and weak level groups are 35 and 28, respectively.

    3.1 Test of Global Null Hypothesis: $\beta = 0$

    The three Chi-square statistics for testing the global null hypothesis $\beta = 0$ (likelihood ratio, score, and Wald test) are listed in Table 1. Under the same null hypothesis, the explanatory variables will have coefficients of zero. There is no reason to prefer any particular type of statistic for large samples. In contrast, analytical results indicate that the Chi-square likelihood ratio is preferred for small samples (Jennings, 1986). The p-values associated with the three Chi-square analyses are all approximately zero, indicating that not all explanatory coefficients are zero. In other words, these statistics indicate that the binary assumption is valid (see Table 1).

    Table 1 Testing the global null hypothesis: $\beta = 0$
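The likelihood ratio statistic compares the log-likelihood of the intercept-only model with that of the fitted model. The sketch below computes the intercept-only log-likelihood from the group frequencies given above (35 strong, 28 weak). The log-likelihood of the fitted model is not reported in the text available here, so the value passed in the usage line is a hypothetical placeholder, and the function names are likewise hypothetical.

```python
import math


def null_log_likelihood(n_strong, n_weak):
    """Log-likelihood of the intercept-only model for binary data.

    Under beta = 0 the fitted probability of the strong level is simply
    the observed proportion of strong cases.
    """
    n = n_strong + n_weak
    p = n_strong / n
    return n_strong * math.log(p) + n_weak * math.log(1.0 - p)


def likelihood_ratio_statistic(loglik_null, loglik_full):
    """-2 (ln L_null - ln L_full), referred to a chi-square distribution
    whose degrees of freedom equal the number of coefficients tested."""
    return -2.0 * (loglik_null - loglik_full)


loglik_null = null_log_likelihood(35, 28)

# Hypothetical fitted-model log-likelihood, for illustration only.
lr = likelihood_ratio_statistic(loglik_null, -20.0)
```

With three explanatory variables, the resulting statistic would be compared with a chi-square distribution on 3 degrees of freedom.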

    3.2 Goodness of Fit and Diagnostics

    3.2.1 Parameter Estimation

    Table 2 presents the analytical results of maximum likelihood estimates, in which all of the explanatory variables have significant effects on the amplitude-based transmission rate of ISWs. The ridge height ($x_1$), the lower layer water depth ($x_2$) and the potential energy ($x_3$) are all significant factors, with P-values of 0.0005, 0.0001 and 0.0014, respectively. The fitted logistic regression line is

    $$\operatorname{logit}(\hat{p}) = \hat{\alpha} + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 = 4.2180 - 0.3311x_1 + 0.4320x_2 - 0.2302x_3. \tag{9}$$

    Table 2 Analysis of the maximum likelihood estimates

    Figure 1 shows the graph of the fitted logistic regression model (Eq. (9)). The logistic transformation of the success probability $p$ is $\ln[p/(1-p)]$, which is written as $\operatorname{logit}(p)$. The horizontal axis is labeled as:

    $$\operatorname{logit}(p) = 4.2180 - 0.3311x_1 + 0.4320x_2 - 0.2302x_3. \tag{10}$$

    This equation contains the estimates for transformation values. The vertical axis indicates all probabilities of predictive values. The graph indicates all the values of $p$ in the range (0, 1) corresponding to the value of $\operatorname{logit}(p)$ in $(-\infty, \infty)$.
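To show how the fitted line maps covariates to a probability of the strong level, the sketch below evaluates the logit and its inverse and also reports the odds ratios exp(beta). Two assumptions should be noted: the minus signs on the $x_1$ and $x_3$ terms are inferred, because the operators in the printed equation are not legible in the source, and the covariate values in the usage line are hypothetical.

```python
import math

# Coefficients read from the fitted equation. The negative signs on the
# x1 and x3 terms are an assumption (operators illegible in the source).
ALPHA = 4.2180
BETAS = (-0.3311, 0.4320, -0.2302)  # ridge height, lower layer depth, potential energy


def fitted_logit(x1, x2, x3):
    """Linear predictor of the fitted model for one observation."""
    return ALPHA + BETAS[0] * x1 + BETAS[1] * x2 + BETAS[2] * x3


def predicted_probability(x1, x2, x3):
    """Inverse logit of the linear predictor: a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-fitted_logit(x1, x2, x3)))


def odds_ratios():
    """Multiplicative change in the odds per unit increase of each factor."""
    return tuple(math.exp(b) for b in BETAS)


# Hypothetical covariate values (illustration only).
p_strong = predicted_probability(10.0, 5.0, 8.0)
```

Under these assumed signs, an odds ratio below 1 for $x_1$ and $x_3$ means that larger values of those factors lower the odds of the strong level, while the odds ratio above 1 for $x_2$ means the opposite.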

    3.2.2 Goodness-of-Fit Statistics

    Two conventional goodness-of-fit tests are the Pearson $\chi^2$ and the likelihood ratio $\chi^2$. Pearson $\chi^2$ is distributed as $\chi^2$, with $\{(r-1)(s-1)-t\}$ degrees of freedom, where $t$ is the number of explanatory variables, $r$ is the number of response levels, and $s$ is the number of subpopulations (Allison, 1999).

    Fig. 1. Logistic regression curves.

    The goodness-of-fit statistics are presented in Table 3. The dispersion parameter, labeled Value/DF, is 0.7268 for the deviance and 1.2157 for the Pearson $\chi^2$. The statistic values of the likelihood ratio (deviance) and the Pearson $\chi^2$ are 36.3380 and 60.7842, respectively, with 50 [(2 - 1) × (54 - 1) - 3 = 50] degrees of freedom. The Value/DF ratio should ideally be very close to 1.00. The P-values for the likelihood ratio and the Pearson $\chi^2$ are both larger than 0.05 (0.9260 and 0.1412, respectively). The Pearson dispersion parameter (1.2157) is slightly larger than 1, i.e., the Pearson $\chi^2$ statistic slightly exceeds its degrees of freedom. These findings indicate that this model has an acceptable fit with the data, but there is still slight overdispersion, and thus the model still needs modification.

    Table 3 Deviance and Pearson goodness-of-fit statistics
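The figures in this subsection can be cross-checked with a short calculation. Dividing 36.3380 and 60.7842 by 50 reproduces the reported dispersion values 0.7268 and 1.2157, which identifies the former as the deviance and the latter as the Pearson statistic. The sketch below also evaluates the upper-tail chi-square probabilities with the closed-form series that holds for an even number of degrees of freedom; the function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df.

    Uses the identity Q(x; 2m) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # j = 0 term of the series
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Values quoted in the text: r response levels, s subpopulations,
# t explanatory variables, and the two goodness-of-fit statistics.
r, s, t = 2, 54, 3
df = (r - 1) * (s - 1) - t

deviance, pearson = 36.3380, 60.7842
deviance_dispersion = deviance / df
pearson_dispersion = pearson / df

p_deviance = chi2_upper_tail_even_df(deviance, df)
p_pearson = chi2_upper_tail_even_df(pearson, df)
```

Both tail probabilities come out above 0.05, in line with the reported P-values, while the Pearson ratio above 1 is the sign of slight overdispersion discussed above.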

    3.2.3 Regression Diagnostics

    Collett (2003) proposed to use the standardized deviance residuals for routine model checking, and Lawal (2003) further stated that any standardized residual outside [-2, 2] is unsatisfactory. Cases where the standardized residual is smaller than -2 or larger than +2 are selected for examination in the present study.

    Landwehr et al. (1984) and Hosmer and Lemeshow (2000) discussed graphical techniques for logistic regression diagnostics. The index plots are useful for the identification of extreme values (SAS Institute, 1999). The index plots of the Pearson residuals (see Fig. 2) and the deviance residuals (see Fig. 3) indicate that cases 11 and 27 are poorly accounted for by the model. The index plot of the diagonal elements of the hat matrix (see Fig. 4) indicates that case 49 is an extreme point in the design.
