--- Start Class on 4-3-2014 (session 7)
Once you have computed (or simulated from) \(\pi(\theta | y)\) and \(p(\tilde y | y)\), you are basically done.
The posterior distribution is a sufficient statistic. If the model is correct, you can throw away your data and stick only with \(\pi(\theta | y)\).
One can claim that the Bayesian estimate of \(\theta\) is \(\pi(\theta | y)\). One can claim that the Bayesian estimate of \(\tilde y\) is \(p_{\pi}(\tilde y | y)\).
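A minimal sketch of the two-step simulation behind \(p(\tilde y | y)\); the conjugate Beta-Binomial model here is my own illustrative choice, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative conjugate model: y ~ Binomial(n, theta), theta ~ Beta(a, b)
n, a, b = 20, 1.0, 1.0
y_obs = 13

# Step 1: simulate from the posterior pi(theta | y) = Beta(a + y, b + n - y)
theta_draws = rng.beta(a + y_obs, b + n - y_obs, size=10_000)

# Step 2: for each theta draw, simulate a new observation -> draws from p(y_tilde | y)
y_tilde_draws = rng.binomial(n, theta_draws)

print("posterior mean of theta:   ", theta_draws.mean())
print("predictive mean of y_tilde:", y_tilde_draws.mean())
```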
Why does any Bayesian need anything else (point estimation, interval estimator, test, predictive intervals)?
Imagine that your parameter is \(\theta = (\theta_1, \dots, \theta_p) \in R^p\) and you only care about \(\theta_1\). What do you do if all you have is the likelihood
\(\ell_{Y = y}(\theta_1, \dots, \theta_p)\)?
One (frequentist) answer is the profile likelihood \(\tilde \ell(\theta_1) = \max_{\theta_2, \dots, \theta_p} \ell_{Y = y}(\theta_1, \dots, \theta_p)\).
Why is this a good answer?
Bayesian answer will be the marginal:
\(\pi(\theta_1 | y) = \int \dots \int \pi(\theta_1, \dots, \theta_p | y)\, d\theta_2 \dots d\theta_p\)
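A sketch of how the marginal comes for free once you simulate from the joint posterior; the correlated normal "posterior" below is just a stand-in for whatever draws you actually have:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for draws from the joint posterior pi(theta_1, ..., theta_p | y), here p = 3.
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 2.0, 0.3],
                [0.2, 0.3, 1.5]])
theta_draws = rng.multivariate_normal(mean=[0.0, 1.0, -1.0], cov=cov, size=10_000)

# Draws from the marginal pi(theta_1 | y): just keep the first coordinate.
theta1_draws = theta_draws[:, 0]
print("marginal posterior mean and sd of theta_1:", theta1_draws.mean(), theta1_draws.std())
```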
Imagine that you have worked things out for \(\theta\) and now you ask questions about \(g(\theta)\).
Frequentists have difficulty translating inferences about \(\theta\) into inferences about \(g(\theta)\).
A Bayesian will do \(\pi(\theta | y) \rightarrow \pi(g(\theta) | y)\): just a change of variables.
If \(\theta^{(1)}, \dots, \theta^{(m)}\) is a sample from \(\pi(\theta | y)\), then \(g(\theta^{(1)}), \dots, g(\theta^{(m)})\) is a sample from \(\pi(g(\theta) | y)\).
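A sketch of this change of variables by simulation; the Gamma draws and the choice \(g(\theta) = \log \theta\) are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for posterior draws of a positive parameter (e.g., a rate).
theta_draws = rng.gamma(shape=3.0, scale=2.0, size=10_000)

# Draws from pi(g(theta) | y) for g(theta) = log(theta): apply g pointwise.
g_draws = np.log(theta_draws)

print("posterior mean of g(theta):", g_draws.mean())
```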
How do we summarize \(\pi(\theta | y)\) with a number, if we must?
Observation:
You can only use the median if you deal with a real-valued \(\theta\). If \(\theta \in \Omega \subset R^2\) you can't sort \(\Omega\).
What happens if your posterior is bimodal?
What do you give as a point estimate? Do you even want to give one? Probably not.
If you are in \(R^{10}\), though, you probably do want to give a point estimate (you cannot look at the whole posterior).
An estimator is neither Bayesian nor frequentist. An estimator is a function of the data, \(\hat\theta(y)\), that hopefully will be close to the truth \(\theta^*\) most of the time.
What will be Bayesian or Frequentist is how you judge (assess) the estimator.
We define a region with posterior credibility \(p\) to be a subset of \(\Omega\), \(C_{p}(y) \subseteq \Omega\), such that \(\int_{C_{p}(y)} \pi(\theta | y)\, d\theta = p\).
http://en.wikipedia.org/wiki/Credible_interval
The same concept applies to the predictive distribution \(p(\tilde y | y)\).
Credible regions are useful as summaries of the uncertainty in \(\pi(\theta | y)\), \(\pi(\theta_1 | y)\), \(\pi(g(\theta) | y)\), and \(p(\tilde y | y)\).
There are two families of credibility regions.
Highest posterior density (HPD) regions: you restrict yourself to picking the values of \(\theta\) with the highest posterior density.
They might not be connected. (fact)
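A sketch of why an HPD region can be disconnected, assuming a made-up bimodal posterior density evaluated on a grid (illustration only):

```python
import numpy as np
from scipy import stats

# Made-up bimodal posterior density, evaluated on a grid (illustration only).
grid = np.linspace(-6, 8, 2001)
dx = grid[1] - grid[0]
dens = 0.6 * stats.norm.pdf(grid, loc=-2, scale=0.8) + \
       0.4 * stats.norm.pdf(grid, loc=4, scale=1.0)
dens /= dens.sum() * dx                      # normalize on the grid

p = 0.90
order = np.argsort(dens)[::-1]               # highest-density grid points first
mass = np.cumsum(dens[order]) * dx
keep = order[: np.searchsorted(mass, p) + 1] # smallest set with mass >= p

in_hpd = np.zeros(grid.size, dtype=bool)
in_hpd[keep] = True
n_pieces = (np.diff(in_hpd.astype(int)) == 1).sum() + int(in_hpd[0])
print("HPD region splits into", n_pieces, "disjoint interval(s)")
```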
--- Start Class on 6-3-2014 (session 8)
Quantile (equal-tail) intervals: let \(\theta^{(1)} \le \theta^{(2)} \le \dots \le \theta^{(m)}\) be a sample simulated from \(\pi(\theta | y)\), ordered from small to big.
\(\hat q^{\frac{1-p}{2}} = \theta^{(\lceil m \frac{1-p}{2} \rceil)}\), \(\hat q^{\frac{1+p}{2}} = \theta^{(\lceil m \frac{1+p}{2} \rceil)}\); the interval \([\hat q^{\frac{1-p}{2}}, \hat q^{\frac{1+p}{2}}]\) has posterior credibility approximately \(p\).
Invariant under (monotone) re-parametrization. (strength)
They have to be intervals. (fact)
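A sketch of the equal-tail interval from simulated draws; the Gamma sample below is a stand-in for posterior draws, and `np.quantile` does the sorting:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for m draws simulated from pi(theta | y).
theta_draws = rng.gamma(shape=4.0, scale=1.5, size=10_000)

p = 0.95
lo, hi = np.quantile(theta_draws, [(1 - p) / 2, (1 + p) / 2])
print(f"{p:.0%} equal-tail credible interval: [{lo:.3f}, {hi:.3f}]")
```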
An interval \([a(y), b(y)]\) is neither Bayesian nor frequentist. What is Bayesian or frequentist is the way in which you assess (judge) it.
A Bayesian will judge the interval through the posterior \(\pi(\theta | y)\):
\(P_{\theta | y}(\theta \in [a(y), b(y)] \,|\, y) = p\), the credibility of the interval; \(\theta\) unknown (random), \(y\) fixed.
A frequentist will judge it through repeated sampling from \(M = \{P(y | \theta^*),\ \theta^* \in \Omega\}\):
\(p(\theta^*) = P_{y|\theta^*}(\theta^* \in [a(y), b(y)] \,|\, \theta^*)\) is the coverage at \(\theta^*\); the confidence of the interval is \(\inf_{\theta^* \in \Omega} p(\theta^*)\).
\(\theta^*\) fixed, \(y\) random.
It is extremely rare that the credibility \(p\) and the confidence \(1 - \alpha\) coincide for a given \([a(y), b(y)]\).
There is a temptation to sell a 95% confidence interval as if it were a 95% credible interval. This is cheating.
Confidence is not a probability.
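A small simulation sketch of the distinction, using a Beta-Binomial setup of my own: the interval has 95% posterior credibility by construction, and its frequentist coverage at a fixed \(\theta^*\) is checked by repeated sampling.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

n, a, b = 10, 1.0, 1.0     # y ~ Binomial(n, theta), theta ~ Beta(a, b) prior
theta_star = 0.05          # fixed "true" value for the repeated-sampling check
p = 0.95

reps, covered = 5_000, 0
for _ in range(reps):
    y = rng.binomial(n, theta_star)
    # 95% equal-tail credible interval from the Beta(a + y, b + n - y) posterior
    lo, hi = stats.beta.ppf([(1 - p) / 2, (1 + p) / 2], a + y, b + n - y)
    covered += (lo <= theta_star <= hi)

print("posterior credibility:", p)
print("estimated frequentist coverage at theta* =", theta_star, ":", covered / reps)
```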
\(\Omega = \Omega_1 \cup \Omega_2\)
\(M = \{P(y | \theta),\ \theta \in \Omega\} = \{P(y | \theta),\ \theta \in \Omega_1\} \cup \{P(y | \theta),\ \theta \in \Omega_2\} = M_1 \cup M_2\)
\(\left\{\begin{matrix}H_{1}: \theta \in \Omega_{1}\\H_{2}: \theta \in \Omega_{2}\end{matrix}\right.\)
Under \(H_1\): \(\tilde y \sim M_1\); under \(H_2\): \(\tilde y \sim M_2\).
\(P(H_i | y) = \frac{P(H_i)\, P(y | H_i)}{P(H_1)\, P(y | H_1) + P(H_2)\, P(y | H_2)}, \quad i = 1, 2\)
You will choose the hypothesis \(H_i\) that has the largest posterior probability.
\(\frac{P(H_1 | y)}{P(H_2 | y)}\): the posterior odds.
Note that we are treating the null and the alternative symmetrically: compute both probabilities and choose the one that is larger.
There is no difference between \(H_0\) and \(H_a\).
\(\underline{Example 1}\)
Simple against simple.
\(M = \{P(y | \theta),\ \theta \in \{\theta_1, \theta_2\}\} = \{p(y | \theta_1), p(y | \theta_2)\}\) is a dichotomy.
\(\left\{\begin{matrix}H_{1}: \theta = \theta_{1}, & P(H_1)\\H_{2}: \theta = \theta_{2}, & P(H_2)\end{matrix}\right.\)
\(P(H_1 | y) = \frac{P(H_1)\, p(y | \theta_1)}{P(H_1)\, p(y | \theta_1) + P(H_2)\, p(y | \theta_2)}\)  \(P(H_2 | y) = \frac{P(H_2)\, p(y | \theta_2)}{P(H_1)\, p(y | \theta_1) + P(H_2)\, p(y | \theta_2)}\)
Posterior odds \(\frac{P(H_1 | y)}{P(H_2 | y)} = \frac{P(H_1 | y)}{1 - P(H_1 | y)} = \frac{P(H_1)}{P(H_2)}\frac{P(y | \theta_1)}{P(y | \theta_2)} = \frac{P(H_1)}{P(H_2)}\frac{\ell_y(\theta_1)}{\ell_y(\theta_2)}\)
Posterior odds = prior odds x likelihood ratio (Bayes factor)
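A sketch of the simple-vs-simple calculation with an illustrative Poisson dichotomy (the data and the two \(\theta\) values are made up): posterior odds = prior odds × likelihood ratio.

```python
import numpy as np
from scipy import stats

# Illustrative dichotomy: y_i ~ Poisson(theta), H1: theta = 2.0 vs H2: theta = 3.5
y = np.array([3, 2, 4, 1, 3, 5, 2])
theta1, theta2 = 2.0, 3.5
prior_H1, prior_H2 = 0.5, 0.5

# Likelihoods p(y | theta_i): product over independent observations
lik1 = np.prod(stats.poisson.pmf(y, theta1))
lik2 = np.prod(stats.poisson.pmf(y, theta2))

bayes_factor = lik1 / lik2                      # likelihood ratio
post_odds = (prior_H1 / prior_H2) * bayes_factor
post_H1 = post_odds / (1 + post_odds)           # back to P(H1 | y)

print("Bayes factor (H1 vs H2):", bayes_factor)
print("posterior odds:", post_odds, " P(H1 | y):", post_H1)
```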
Neyman-Pearson states that it is optimal to have a rejection region based on
\(\frac{\ell_y(\theta_1)}{\ell_y(\theta_2)} = \frac{P(y | \theta_1)}{P(y | \theta_2)}\)
C is a constant that depends on the size of your test. If the ratio is \(> C\), choose \(H_1\); if it is \(< C\), choose \(H_2\).
\(p\)-value = the probability, under the null, of observing data at least as extreme as the observed \(y\).
Only works for simple against simple.
Often we act as if the \(p\)-value were \(P(H_1 | y)\), i.e. the posterior probability of the null, \(P(H_0 | y)\).
Instances where a \(p\)-value is approximately equal to \(P(H_1 | y)\) are rare.
\(\underline{Example 2}\)
Choice between two submodels.
\(M = M_1 \cup M_2 = \{P_1(y | \theta),\ \theta \in \Omega_1\} \cup \{P_2(y | \theta),\ \theta \in \Omega_2\}\)
e.g. \(\{\mathrm{Poisson}(\lambda),\ \lambda \in (0, \infty)\} \cup \{\mathrm{NegativeBinomial}(r, \theta),\ \theta \in (0,1)\}\)
\(\left\{\begin{matrix}H_{1}: \theta \in \Omega_{1}\\H_{2}: \theta \in \Omega_{2}\end{matrix}\right.\)
\(M_1: \tilde y \sim P_1(y | \theta)\), with \(P(H_1)\) and \(\pi(\theta | H_1)\)
\(M_2: \tilde y \sim P_2(y | \theta)\), with \(P(H_2)\) and \(\pi(\theta | H_2)\)
\(P(H_1 | y) = \frac{P(H_1) \int_{\Omega_1} P_1(y | \theta)\, \pi(\theta | H_1)\, d\theta}{P(H_1) \int_{\Omega_1} P_1(y | \theta)\, \pi(\theta | H_1)\, d\theta + P(H_2) \int_{\Omega_2} P_2(y | \theta)\, \pi(\theta | H_2)\, d\theta}\)
Complex problems will be dealt with in the same way as the simple case.
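A sketch of the Poisson vs. Negative Binomial comparison via the marginal likelihoods \(P(y | H_i) = \int P_i(y | \theta)\, \pi(\theta | H_i)\, d\theta\); the priors, the fixed \(r\), and the data below are my own illustrative assumptions, and the integrals are done numerically:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

y = np.array([0, 2, 1, 3, 0, 1, 2, 5, 1, 0])
prior_H1, prior_H2 = 0.5, 0.5
r = 2                                              # fixed NegBin size (assumed known)

# Marginal likelihood under M1: y_i ~ Poisson(lambda), lambda ~ Gamma(2, 1) (my choice)
def integrand_m1(lam):
    return np.prod(stats.poisson.pmf(y, lam)) * stats.gamma.pdf(lam, a=2.0, scale=1.0)

# Marginal likelihood under M2: y_i ~ NegBin(r, theta), theta ~ Uniform(0, 1) (my choice)
def integrand_m2(theta):
    return np.prod(stats.nbinom.pmf(y, r, theta)) * stats.uniform.pdf(theta)

marg1, _ = quad(integrand_m1, 0, 50)
marg2, _ = quad(integrand_m2, 0, 1)

post_odds = (prior_H1 / prior_H2) * (marg1 / marg2)
post_H1 = post_odds / (1 + post_odds)
print("P(y | H1) =", marg1, " P(y | H2) =", marg2)
print("posterior odds:", post_odds, " P(H1 | y):", post_H1)
```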