Table A.5 Probabilistic dependence measures

Chernoff:
$$J_C = \sum_{i=1}^{C} p_i \left[ -\log \int p^s(x \mid \omega_i)\, p^{1-s}(x)\, dx \right]$$

Bhattacharyya:
$$J_B = \sum_{i=1}^{C} p_i \left[ -\log \int \left( p(x \mid \omega_i)\, p(x) \right)^{1/2} dx \right]$$

Joshi:
$$J_D = \sum_{i=1}^{C} p_i \int \left[ p(x \mid \omega_i) - p(x) \right] \log \frac{p(x \mid \omega_i)}{p(x)}\, dx$$

Patrick-Fischer:
$$J_P = \sum_{i=1}^{C} p_i \left\{ \int \left[ p(x \mid \omega_i) - p(x) \right]^2 dx \right\}^{1/2}$$

Table A.5 gives the probabilistic dependence measures corresponding to the probabilistic distance measures in Table A.4. In practice, application of probabilistic dependence measures is limited because, even for normally distributed classes, the expressions given in Table A.5 cannot be evaluated analytically, since the mixture distribution $p(x)$ is not normal.

A.3 Discussion

This appendix has reviewed some of the distance and dissimilarity measures used in Chapter 9 on feature selection and extraction and Chapter 10 on clustering. Of course, the list is not exhaustive, and those that are presented may not be the best for your problem. We cannot make rigid recommendations as to which ones you should use, since the choice is highly problem-specific. However, it may be advantageous from a computational point of view to use one that simplifies for normal distributions, even if your data are not normally distributed.

The book by Gordon (1999) provides a good introduction to classification methods. Its chapter on dissimilarity measures also highlights difficulties encountered in practice with real data sets. There are many other books and papers on clustering which list other measures of dissimilarity: for example, Diday and Simon (1976), Cormack (1971) and Clifford and Stephenson (1975); the last is written primarily for biologists, but the issues it treats occur in many areas of scientific endeavour. The papers by Kittler (1975b, 1986) provide very good introductions to feature selection and list some of the more commonly used distance measures. Others may be found in Chen (1976). A good account of probabilistic distance and dependence measures can be found in the book by Devijver and Kittler (1982).
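Because these expressions cannot, in general, be evaluated analytically, in practice they are computed numerically. The following is a minimal sketch, not from the text, that evaluates the Bhattacharyya dependence measure $J_B$ from Table A.5 by quadrature; the class priors and the univariate normal class-conditional densities are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative two-class, one-dimensional problem (assumed parameters).
priors = [0.6, 0.4]                          # class priors p_i
classes = [norm(0.0, 1.0), norm(2.0, 1.5)]   # p(x | omega_i)

def mixture(x):
    """Mixture density p(x) = sum_i p_i p(x | omega_i); not normal."""
    return sum(p * c.pdf(x) for p, c in zip(priors, classes))

def j_bhattacharyya():
    """J_B = sum_i p_i [ -log int (p(x|omega_i) p(x))^(1/2) dx ]."""
    total = 0.0
    for p, c in zip(priors, classes):
        integral, _ = quad(lambda x: np.sqrt(c.pdf(x) * mixture(x)),
                           -np.inf, np.inf)
        total += p * (-np.log(integral))
    return total

print(f"J_B = {j_bhattacharyya():.4f}")
```

The measure is zero when $p(x \mid \omega_i) = p(x)$ for every class (each integral is then 1), i.e. when the feature carries no class information, and increases with the dependence between feature and class.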
B Parameter estimation

B.1 Parameter estimation

B.1.1 Properties of estimators

Perhaps before we begin to discuss some of the desirable properties of estimators, we ought to define what an estimator is. For example, in a measurement experiment we may assume that the observations are normally distributed, but with unknown mean, $\mu$, and variance, $\sigma^2$. The problem then is to estimate the values of these two parameters from the set of observations. An estimator $\hat{\theta}$ of $\theta$ is therefore defined as any function of the sample values which is calculated to be close in some sense to the true value of the unknown parameter $\theta$. This is a problem in point estimation, by which is meant deriving a single-valued function of a set of observations to represent the unknown parameter (or a function of the unknown parameters), without explicitly stating the precision of the estimate. The estimation of a confidence interval (the limits within which we expect the parameter to lie) is an exercise in interval estimation. For a detailed treatment of estimation we refer to Stuart and Ord (1991).

Unbiased estimate

The estimator $\hat{\theta}$ of the parameter $\theta$ is unbiased if its expectation over the sampling distribution is equal to $\theta$, i.e.
$$E[\hat{\theta}] \triangleq \int \hat{\theta}\, p(x_1, \ldots, x_n)\, dx_1 \ldots dx_n = \theta$$
where $\hat{\theta}$ is a function of the sample vectors $x_1, \ldots, x_n$ drawn from the distribution $p(x_1, \ldots, x_n)$. We would always want estimators to be at least approximately unbiased, but there is no reason why we should insist on exact unbiasedness.

Consistent estimate

An estimator $\hat{\theta}$ of a parameter $\theta$ is consistent if it converges in probability (or converges stochastically) to $\theta$ as the number of observations $n \to \infty$. That is, for all $\epsilon, \delta > 0$,
$$p(\|\hat{\theta} - \theta\| < \epsilon) > 1 - \delta \quad \text{for } n > n_0$$
or
$$\lim_{n \to \infty} p(\|\hat{\theta} - \theta\| > \epsilon) = 0$$

Efficient estimate

The efficiency, $\eta$, of one estimator $\hat{\theta}_2$ relative to another $\hat{\theta}_1$ is defined as the ratio of the variances of the estimators:
$$\eta = \frac{E[\|\hat{\theta}_1 - \theta\|^2]}{E[\|\hat{\theta}_2 - \theta\|^2]}$$
$\hat{\theta}_1$ is an efficient estimator if it has the smallest variance (in large samples) compared with all other estimators, i.e. $\eta \le 1$ for all $\hat{\theta}_2$.

Sufficient estimate

A statistic $\hat{\theta}_1 = \hat{\theta}_1(x_1, \ldots, x_n)$ is termed a sufficient statistic if, for any other statistic $\hat{\theta}_2$,
$$p(\theta \mid \hat{\theta}_1, \hat{\theta}_2) = p(\theta \mid \hat{\theta}_1) \quad \text{(B.1)}$$
that is, all the relevant information for the estimation of $\theta$ is contained in $\hat{\theta}_1$, and the additional knowledge of $\hat{\theta}_2$ makes no contribution. An equivalent condition for a distribution to possess a sufficient statistic is the factorability of the likelihood function (Stuart and Ord, 1991; Young and Calvert, 1974):
$$p(x_1, \ldots, x_n \mid \theta) = g(\hat{\theta} \mid \theta)\, h(x_1, \ldots, x_n) \quad \text{(B.2)}$$
where $h$ is a function of $x_1, \ldots, x_n$ (essentially $p(x_1, \ldots, x_n \mid \hat{\theta})$) that does not depend on $\theta$, and $g$ is a function of the statistic $\hat{\theta}$ and $\theta$. Equation (B.2) is also the condition for reproducing densities (Spragins, 1976; Young and Calvert, 1974) or conjugate priors (Lindgren, 1976): a probability density of $\theta$, $p(\theta)$, is a reproducing density with respect to the conditional density $p(x_1, \ldots, x_n \mid \theta)$ if the posterior density $p(\theta \mid x_1, \ldots, x_n)$ and the prior density $p(\theta)$ are of the same functional form. The family is called closed under sampling, or conjugate, with respect to $p(x_1, \ldots, x_n \mid \theta)$. Conditional densities that admit sufficient statistics of fixed dimension for any sample size (and hence reproducing densities) include the normal, binomial and Poisson density functions (Spragins, 1976); a numerical sketch of such a conjugate update is given at the end of this appendix.

Example 1

The sample mean $\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i$ is an unbiased estimator of the population mean, $\mu$, since
$$E[\bar{x}_n] = E\left[\frac{1}{n} \sum_{i=1}^{n} x_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[x_i] = \mu$$
but the sample
variance, $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$, is a biased estimator of the population variance, $\sigma^2$, since
$$E\left[\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \frac{1}{n} \sum_{j=1}^{n} x_j\right)^2\right] = \frac{n-1}{n} \sigma^2 \ne \sigma^2$$
An unbiased estimator of $\sigma^2$ is obtained by replacing the divisor $n$ with $n - 1$.
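The bias in Example 1 is easily checked by simulation. The following is a minimal sketch, not part of the original text; the population parameters, sample size and number of trials are arbitrary illustrative choices. It averages both estimators over many repeated samples:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 200_000  # assumed illustrative values

samples = rng.normal(mu, sigma, size=(trials, n))
means = samples.mean(axis=1)
# Sample variance with divisor n (ddof=0): the biased estimator above.
vars_n = samples.var(axis=1, ddof=0)

print(f"E[x_bar]    ~ {means.mean():.4f}  (mu = {mu})")
print(f"E[s^2, 1/n] ~ {vars_n.mean():.4f}  (sigma^2 = {sigma**2}, "
      f"(n-1)/n * sigma^2 = {(n - 1) / n * sigma**2})")
```

With $n = 10$ and $\sigma^2 = 4$, the average of the divisor-$n$ variance estimates settles near $(n-1)\sigma^2/n = 3.6$ rather than 4, while the average of the sample means settles at $\mu$.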
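Finally, the numerical sketch of conjugacy referred to under sufficient statistics above: for observations $x_i \sim N(\theta, \sigma^2)$ with $\sigma^2$ known and a normal prior on $\theta$, the posterior is again normal, so the prior and posterior have the same functional form and the normal family is closed under sampling. The parameter values below are illustrative assumptions, not material from the book.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: x_i ~ N(theta, sigma^2), sigma known; prior theta ~ N(m0, v0).
sigma, m0, v0 = 1.0, 0.0, 4.0
theta_true, n = 1.5, 20
x = rng.normal(theta_true, sigma, size=n)

# Conjugate update: the posterior is N(m_n, v_n), the same functional form
# as the prior. Only the sufficient statistics (n, sum of x_i) enter.
v_n = 1.0 / (1.0 / v0 + n / sigma**2)
m_n = v_n * (m0 / v0 + x.sum() / sigma**2)

print(f"prior:     N({m0}, {v0})")
print(f"posterior: N({m_n:.3f}, {v_n:.3f})")
```

Repeating the update with further batches of data keeps the posterior within the normal family, which is exactly the reproducing-density property described by equation (B.2).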