Markov Chain Monte Carlo (MCMC) methods are now an indispensable tool in scientific computing. This book discusses recent developments of MCMC methods with an emphasis on those making use of past sample information during simulations. The application examples are drawn from diverse fields such as bioinformatics, machine learning, social science, combinatorial optimization, and computational physics.
Key Features:
* Presents the latest developments in Monte Carlo research.
* Provides a toolkit for simulating complex systems using MCMC.
* Introduces a wide range of algorithms, including the Gibbs sampler, Metropolis-Hastings, and an overview of sequential Monte Carlo algorithms.
This book can be used as a textbook or a reference book for a one-semester graduate course in statistics, computational biology, engineering, and computer sciences. Applied or theoretical researchers will also find this book beneficial.
Faming Liang, Associate Professor, Department of Statistics, Texas A&M University.
Chuanhai Liu, Professor, Department of Statistics, Purdue University.
Raymond J. Carroll, Distinguished Professor, Department of Statistics, Texas A&M University.
1.1 Bayes
Bayesian inference is a probabilistic inferential method. In the last two decades, it has become more popular than ever due to affordable computing power and recent advances in Markov chain Monte Carlo (MCMC) methods for approximating high dimensional integrals.
Bayesian inference can be traced back to Thomas Bayes (1764), who derived the inverse probability of the success probability θ in a sequence of independent Bernoulli trials, where θ was taken from the uniform distribution on the unit interval (0, 1) but treated as unobserved. For later reference, we describe his experiment using familiar modern terminology as follows.
* Example 1.1 The Bernoulli (or Binomial) Model With Known Prior
Suppose that θ ~ Unif(0, 1), the uniform distribution over the unit interval (0, 1), and that x1, ..., xn is a sample from Bernoulli(θ), which has the sample space X = {0, 1} and probability mass function (pmf)
Pr (X = 1|θ) = θ and Pr (X = 0|θ) = 1 - θ, (1.1)
where X denotes the Bernoulli random variable (r.v.) with X = 1 for success and X = 0 for failure. Write N = Σ_{i=1}^n x_i, the observed number of successes in the n Bernoulli trials. Then N|θ ~ Binomial(n, θ), the Binomial distribution with parameters size n and probability of success θ.
The inverse probability of θ given x1, ..., xn, known as the posterior distribution, is obtained from Bayes' theorem, or more rigorously in modern probability theory, the definition of conditional distribution, as the Beta distribution Beta(1 + N, 1 + n-N) with probability density function (pdf)
π(θ|x_1, ..., x_n) = θ^N (1 - θ)^(n-N) / B(1 + N, 1 + n - N) (0 < θ < 1), (1.2)
where B(·, ·) stands for the Beta function.
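As an aside to Example 1.1 (not part of the original text), the Beta posterior (1.2) is easy to work with numerically. The following minimal Python sketch assumes SciPy and NumPy are available and uses a small, made-up data set x; the variable names are purely illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical Bernoulli data; theta is unknown and the prior is Unif(0, 1).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n, N = len(x), int(x.sum())          # n trials, N observed successes

# Posterior (1.2): theta | x_1, ..., x_n ~ Beta(1 + N, 1 + n - N).
posterior = stats.beta(1 + N, 1 + n - N)

print("posterior mean:", posterior.mean())          # (1 + N) / (2 + n)
print("posterior density at theta = 0.5:", posterior.pdf(0.5))

# A direct Monte Carlo sample from the posterior, e.g. for later summaries.
draws = posterior.rvs(size=10_000, random_state=0)
print("Monte Carlo estimate of the posterior mean:", draws.mean())
```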
1.1.1 Specification of Bayesian Models
Real-world problems in statistical inference involve an unknown quantity θ and observed data X. For different views on the philosophical foundations of the Bayesian approach, see Savage (1967a, b), Berger (1985), Rubin (1984), and Bernardo and Smith (1994). As far as the mathematical description of a Bayesian model is concerned, Bayesian data analysis amounts to
(i) specifying a sampling model for the observed data X, conditioned on an unknown quantity θ,
X ~ f(X|θ) (X ∈ X, θ ∈ Θ), (1.3)
where f(X|θ) stands for either pdf or pmf as appropriate, and
(ii) specifying a marginal distribution π(θ) for θ, called the prior distribution or simply the prior for short,
θ ~ π(θ) (θ ∈ Θ). (1.4)
Technically, data analysis for producing inferential results on assertions of interest is reduced to computing integrals with respect to the posterior distribution, or posterior for short,
π(θ|X) = L(θ|X) π(θ) / ∫_Θ L(θ|X) π(θ) dθ (θ ∈ Θ), (1.5)
where L(θ|X) ∝ f(X|θ) in θ is called the likelihood of θ given X. Our focus in this book is on efficient and accurate approximations to these integrals for scientific inference. Thus, only a limited discussion of Bayesian inference itself is needed here.
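To make (1.5) concrete, here is a hedged sketch (an aside, not from the book) that approximates a posterior expectation E[h(θ)|X] for the Bernoulli model of Example 1.1 by one-dimensional quadrature. The data values and the choice h(θ) = θ are illustrative only; with a Unif(0, 1) prior the exact answer is the posterior mean (1 + N)/(2 + n).

```python
from scipy import integrate, stats

n, N = 10, 7                        # illustrative data: 7 successes in 10 trials
prior = stats.uniform(0, 1)         # pi(theta) = Unif(0, 1)

def likelihood(theta):
    # L(theta | X), proportional to theta^N (1 - theta)^(n - N)
    return theta**N * (1.0 - theta)**(n - N)

def h(theta):
    # the function whose posterior expectation we want; here h(theta) = theta
    return theta

# Posterior expectation as a ratio of two integrals, as in (1.5):
num, _ = integrate.quad(lambda t: h(t) * likelihood(t) * prior.pdf(t), 0.0, 1.0)
den, _ = integrate.quad(lambda t: likelihood(t) * prior.pdf(t), 0.0, 1.0)
print("E[h(theta) | X] ~", num / den)   # exact value: (1 + N) / (2 + n) = 2/3
```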
1.1.2 The Jeffreys Priors and Beyond
By its nature, Bayesian inference is necessarily subjective because specification of the full Bayesian model amounts to practically summarizing available information in terms of precise probabilities. Specification of probability models is unavoidable even for frequentist methods, which require specification of the sampling model, either parametric or non-parametric, for the observed data X. In addition to the sampling model of the observed data X for developing frequentist procedures concerning the unknown quantity θ, Bayesian inference demands a fully specified prior for θ. This is natural when prior information on θ is available and can be summarized precisely by a probability distribution. For situations where such information is neither available nor easily quantified with a precise probability distribution, especially for high dimensional problems, a commonly used method in practice is the Jeffreys method, which suggests a prior of the form
π_J(θ) ∝ |I(θ)|^(1/2) (θ ∈ Θ), (1.6)
where I(θ) denotes the Fisher information
I(θ) = E[ (∂ log f(X|θ)/∂θ) (∂ log f(X|θ)/∂θ)' | θ ].
The Jeffreys priors have the appealing property that they are invariant under reparameterization. A theoretical justification in terms of frequency properties in the context of large samples can be found in Welch and Peers (1963). Note that prior distributions do not need to be proper as long as the posteriors are proper and produce sensible inferential results. The following Gaussian example shows that the Jeffreys prior is sensible for single parameters.
* Example 1.2 The Gaussian N(μ, 1) Model
Suppose that a sample is considered to have been taken from the Gaussian population N(μ, 1) with unit variance and unknown mean μ to be inferred. The Fisher information is obtained as
I(μ) = ∫ (∂ log φ(x - μ)/∂μ)² φ(x - μ) dx = ∫ (x - μ)² φ(x - μ) dx = 1,
where φ(x - μ) = (2π)^(-1/2) exp{-(x - μ)²/2} is the pdf of N(μ, 1). It follows that the Jeffreys prior for μ is the flat prior
π_J(μ) ∝ 1 (-∞ < μ < ∞), (1.7)
resulting in the corresponding posterior distribution of μ given X
π_J(μ|X) = N(X, 1). (1.8)
Care must be taken when using the Jeffreys rule. For example, it is easy to show that applying the Jeffreys rule to the Gaussian model N(μ, σ²) with both mean μ and variance σ² unknown leads to the prior
π_J(μ, σ²) ∝ 1/σ³ (-∞ < μ < ∞, σ² > 0).
However, this is not the commonly used prior that has better frequency properties (for inference about μ or σ) and is given by
π(μ, σ²) ∝ 1/σ² (-∞ < μ < ∞, σ² > 0),
that is, μ and σ² are independent and the distributions for both μ and ln σ² are flat. For high dimensional problems with small samples, the Jeffreys rule often becomes even less appealing. There are also different perspectives, provided by the extensive work on reference priors by José Bernardo and James Berger (see, e.g., Bernardo, 1979; Berger, 1985). For more discussion of prior specifications, see Kass and Wasserman (1996).
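As a small numerical aside (not from the original text), the Jeffreys rule (1.6) can be checked mechanically for the Bernoulli model of Example 1.1: the Fisher information is I(θ) = 1/(θ(1 - θ)), so π_J(θ) ∝ θ^(-1/2)(1 - θ)^(-1/2), which is the Beta(1/2, 1/2) prior used later in Example 1.3. The sketch below computes I(θ) as the expected squared score and compares |I(θ)|^(1/2) with the Beta(1/2, 1/2) density; it assumes SciPy and NumPy are available.

```python
import numpy as np
from scipy import stats

def fisher_info_bernoulli(theta):
    # I(theta) = E[(d/dtheta log f(X|theta))^2 | theta], with X in {0, 1}
    score = lambda x: x / theta - (1 - x) / (1 - theta)
    return theta * score(1)**2 + (1 - theta) * score(0)**2

thetas = np.linspace(0.05, 0.95, 7)
jeffreys = np.sqrt([fisher_info_bernoulli(t) for t in thetas])  # |I(theta)|^(1/2)

# Both quantities are proportional to theta^(-1/2) (1 - theta)^(-1/2),
# so their ratio is constant (equal to B(1/2, 1/2) = pi) across theta.
beta_half_pdf = stats.beta(0.5, 0.5).pdf(thetas)
print(np.round(jeffreys / beta_half_pdf, 6))
```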
For practical purposes, we refer to Box and Tiao (1973) and Gelman et al. (2004) for discussion on specification of prior distributions. The general guidance for specification of priors when no prior information is available, as is typical in Bayesian analysis, is to find priors that lead to posteriors having good frequency properties (see, e.g., Rubin, 1984; Dawid, 1985). Materials on probabilistic inference without using difficult-to-specify priors are available but beyond the scope of Bayesian inference and therefore will not be discussed in this book. Readers interested in this fascinating area are referred to Fisher (1973), Dempster (2008), and Martin et al. (2009). We note that MCMC methods can be applied there as well.
1.2 Bayes Output
Bayesian analysis for scientific inference does not end with posterior derivation and computation. It is thus critical for posterior distributions to have clear interpretation. For the sake of clarity, probability used in this book has a long-run frequency interpretation in repeated experiments. Thus, standard probability theory, such as conditioning and marginalization, can be applied. Interpretation also suggests how to report Bayesian output as our assessment of assertions of interest on quantities in the specified model. In the following two subsections, we discuss two types of commonly used Bayes output, credible intervals for estimation and Bayes factors for hypothesis testing.
1.2.1 Credible Intervals and Regions
Credible intervals are simply posterior probability intervals. They are used for purposes similar to those of confidence intervals in frequentist statistics and thereby are also known as Bayesian confidence intervals. For example, the 95% left-sided Bayesian credible interval for the parameter μ in the Gaussian Example 1.2 is (-∞, X + 1.64], meaning that the posterior probability that μ lies in the interval from -∞ to X + 1.64 is 0.95. Similar to frequentist construction of two-sided intervals, for given α ∈ (0, 1), a 100(1 - α)% two-sided Bayesian credible interval for a single parameter θ with equal posterior tail probabilities is defined as
[θ_(α/2), θ_(1-α/2)], (1.9)
where the two end points are the α/2 and 1 - α/2 quantiles of the (marginal) posterior distribution of θ. For the Gaussian Example 1.2, the two-sided 95% Bayesian credible interval is [X - 1.96, X + 1.96].
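A minimal sketch (not in the original text) of the equal-tail interval (1.9): the end points are simply posterior quantiles, so they can be read off any posterior with a quantile function. The data values below are illustrative.

```python
from scipy import stats

alpha = 0.05

# Beta posterior from Example 1.1 with illustrative data n = 10, N = 7.
post_beta = stats.beta(1 + 7, 1 + 3)
print("equal-tail 95% interval (Beta posterior):",
      post_beta.ppf(alpha / 2), post_beta.ppf(1 - alpha / 2))

# Gaussian posterior (1.8), mu | X ~ N(X, 1), with an illustrative X = 0.3.
X = 0.3
post_norm = stats.norm(loc=X, scale=1.0)
print("equal-tail 95% interval (Gaussian posterior):",
      post_norm.ppf(alpha / 2), post_norm.ppf(1 - alpha / 2))   # X -/+ 1.96
```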
In dealing simultaneously with more than one unknown quantity, the term credible region is used in place of credible interval. For a more general term, we refer to credible intervals and regions as credible sets. Constructing credible sets is somewhat subjective and usually depends on the problems of interest. A common way is to choose the region with highest posterior density (h.p.d.). The 100(1 - α)% h.p.d. region is given by
{θ ∈ Θ : π(θ|X) ≥ π(θ_(1-α)|X)} (1.10)
for some θ_(1-α) satisfying
Pr(π(θ|X) ≥ π(θ_(1-α)|X) | X) = 1 - α.
For the Gaussian Example 1.2, the 95% h.p.d. interval is [X - 1.96, X + 1.96], the same as the two-sided 95% Bayesian credible interval, because the posterior of μ is unimodal and symmetric. We note that the concept of h.p.d. can also be used for functions of θ, such as components of θ in high dimensional situations.
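For posteriors without a closed-form h.p.d. set, (1.10) can be approximated on a grid: sort the grid points by posterior density and keep the highest-density points until their accumulated probability reaches 1 - α. A hedged sketch, again using the illustrative Beta posterior from Example 1.1:

```python
import numpy as np
from scipy import stats

alpha = 0.05
post = stats.beta(1 + 7, 1 + 3)            # illustrative posterior, n = 10, N = 7

grid = np.linspace(0.0, 1.0, 100_001)
dens = post.pdf(grid)
cell = grid[1] - grid[0]

# Keep the highest-density grid points until they carry mass 1 - alpha.
order = np.argsort(dens)[::-1]
cum_mass = np.cumsum(dens[order]) * cell
keep = order[: np.searchsorted(cum_mass, 1 - alpha) + 1]

# For a unimodal posterior the h.p.d. set is an interval, so report its ends.
hpd_points = grid[keep]
print("approximate 95% h.p.d. interval:", hpd_points.min(), hpd_points.max())
```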
For a given probability content (1 - α), the h.p.d. region has the smallest volume in the space of θ. This is attractive, but the resulting region depends on the parameterization of the unknown quantity, for example, whether we work with θ or θ². An alternative credible set is obtained by replacing the posterior density π(θ|X) in (1.10) with the likelihood L(θ|X):
{θ ∈ Θ : L(θ|X) ≥ L(θ_(1-α)|X)} (1.11)
for some θ_(1-α) satisfying
Pr(L(θ|X) ≥ L(θ_(1-α)|X) | X) = 1 - α.
The likelihood-based credible region is invariant under reparameterization of θ. This is appealing, in particular when no prior information on θ is available, that is, when the specified prior serves merely as a working prior leading to inference with good frequency properties.
1.2.2 Hypothesis Testing: Bayes Factors
While the use of credible intervals is a Bayesian alternative to frequentist confidence intervals, the use of Bayes factors has been a Bayesian alternative to classical hypothesis testing. Bayes factors have also been used to develop Bayesian methods for model comparison and selection. Here we review the basics of Bayes factors. For more discussion on Bayes factors, including their history, applications, and difficulties, see Kass and Raftery (1995), Gelman et al. (2004), and references therein.
The concept of Bayes factors is introduced in the situation with common observed data X and two competing hypotheses, denoted by H1 and H2. A full Bayesian analysis requires
(i) specifying a prior distribution on H1 and H2, denoted by Pr(H1) and Pr(H2), and
(ii) for each k = 1 and 2, specifying the likelihood Lk(θk|X) = fk(X|θk) and prior π(θk|Hk) for θk, conditioned on the truth of Hk, where θk is the parameter under Hk.
Integrating out θk yields
Pr(X|H_k) = ∫ f_k(X|θ_k) π(θ_k|H_k) dθ_k (1.12)
for k = 1 and 2. The Bayes factor is the posterior odds of one hypothesis when the prior probabilities of the two hypotheses are equal. More precisely, the Bayes factor in favor of H1 over H2 is defined as
B12 = Pr (X|H1)/Pr (X|H2). (1.13)
The use of Bayes factors for hypothesis testing is similar to the likelihood ratio test, but instead of maximizing the likelihood, Bayesians who favor Bayes factors average it over the parameters. According to the definition of Bayes factors, proper priors are often required. Thus, care must be taken in specification of priors so that inferential results are meaningful. In addition, the use of Bayes factors departs from the fully probabilistic character of Bayesian inference: it is consistent with the likelihood principle, but it lacks a metric or probability scale for measuring the strength of evidence. For a summary of evidence provided by data in favor of H1 over H2, Jeffreys (1961) (see also Kass and Raftery (1995)) proposed to interpret the Bayes factor as shown in Table 1.1.
The use of Bayes factors is illustrated by the following binomial example.
* Example 1.3 The Binomial Model (continued with a numerical example)
Suppose we take a sample of n = 100 from Bernoulli(θ) with unknown θ, and observe N = 63 successes and n - N = 37 failures. Suppose that two competing hypotheses are
H1 : θ = 1/2 and H2 : θ ≠ 1/2. (1.14)
Under H1, the likelihood is calculated according to the binomial distribution:
Pr(X|H1) = C(100, 63) (1/2)^63 (1/2)^37 = C(100, 63) (1/2)^100 ≈ 0.0027.
Under H2, instead of the uniform over the unit interval we consider the Jeffreys prior
π_J(θ) = θ^(-1/2) (1 - θ)^(-1/2) / B(1/2, 1/2) (0 < θ < 1),
the proper Beta distribution with shape parameters 1/2 and 1/2. Hence, we have
Pr(X|H2) = ∫_0^1 C(100, 63) θ^63 (1 - θ)^37 π_J(θ) dθ = C(100, 63) B(63.5, 37.5) / B(1/2, 1/2) ≈ 0.0067.
The Bayes factor is then B12 ≈ 0.4, that is, log10(B12) ≈ -0.4, which is 'barely worth mentioning' even though it points very slightly towards H2.
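The numbers in Example 1.3 can be reproduced in a few lines; the sketch below (an aside, not from the book) computes Pr(X|H1), Pr(X|H2) under the Beta(1/2, 1/2) prior, and log10(B12), working on the log scale for numerical stability.

```python
import numpy as np
from scipy.special import betaln, comb

n, N = 100, 63

# Pr(X | H1): Binomial(n, 1/2) likelihood at N successes.
log_m1 = np.log(comb(n, N, exact=True)) + n * np.log(0.5)

# Pr(X | H2): Binomial likelihood averaged over the Beta(1/2, 1/2) prior,
#   Pr(X | H2) = C(n, N) * B(N + 1/2, n - N + 1/2) / B(1/2, 1/2).
log_m2 = (np.log(comb(n, N, exact=True))
          + betaln(N + 0.5, n - N + 0.5) - betaln(0.5, 0.5))

log10_B12 = (log_m1 - log_m2) / np.log(10.0)
print("log10(B12) ~", round(float(log10_B12), 2))   # about -0.4, as in the text
```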
It has been recognized that the Bayes factor can be sensitive to the prior, which is related to what is known as Lindley's paradox (see Shafer (1982)).
This is shown in Figure 1.1 for a class of Beta priors Beta(α, 1 - α) for 0 ≤ α ≤ 1. The Bayes factor is infinite at the two extreme priors corresponding to α = 0 and α = 1. It can be shown that this class of priors is necessary in the context of imprecise Bayes for producing inferential results that have desired frequency properties. This supports the idea that care must be taken in interpreting Bayes factors in scientific inference.
Bayes factors are not the same as classical likelihood ratio tests. A frequentist hypothesis test of H1, treated as a null hypothesis, would have produced a more dramatic result, namely that H1 could be rejected at the 1% significance level: the probability of observing 63 or more successes in a sample of 100 when θ = 1/2 is 0.0060, and the corresponding normal-approximation two-tailed probability of a count as extreme as or more extreme than 63 is 0.0093. Note that 63 is more than two standard deviations away from 50, the expected count under H1.
1.3 Monte Carlo Integration
1.3.1 The Problem
Let ν be a probability measure defined on the Borel σ-field of the sample space X ⊆ R^d, where R^d denotes the d-dimensional Euclidean space. A commonly encountered challenging problem is to evaluate integrals of the form
E_ν[h(x)] = ∫_X h(x) ν(dx), (1.15)
where h(x) is a measurable function. Suppose that ν has a pdf f(x). Then (1.15) can be written as
E_f[h(x)] = ∫_X h(x) f(x) dx. (1.16)
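As a preview of where the chapter is heading (an aside, not from the book), the plain Monte Carlo estimate of (1.16) draws x_1, ..., x_m from f and averages h(x_i). The choices of f (standard normal) and h(x) = x² below are illustrative, chosen so that the exact answer E_f[h(x)] = 1 is known.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 100_000
x = rng.standard_normal(m)              # x_1, ..., x_m drawn from f = N(0, 1)
h = x**2                                # h(x) = x^2, so E_f[h(x)] = 1 exactly

estimate = h.mean()                     # (1/m) * sum_i h(x_i)
std_error = h.std(ddof=1) / np.sqrt(m)  # Monte Carlo standard error

print("Monte Carlo estimate of E_f[h(x)]:", estimate)
print("approximate standard error:", std_error)
```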
(Continues...)
Excerpted from Advanced Markov Chain Monte Carlo Methods by Faming Liang, Chuanhai Liu, and Raymond J. Carroll. Copyright © 2010 by John Wiley & Sons, Ltd. Excerpted by permission of John Wiley & Sons. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.