We treat this as an inference problem in Naive Bayes: observations are independent of each other given the parameter.
Instead of trying to compute a single point estimate of $\theta$, we compute a distribution over $\theta$.

That is, we want to get some:

$$p(\theta \mid o_1, \dots, o_n)$$

“for each value of $\theta$, how probable is that value?”

To do this, we desire:

$$p(\theta \mid o_{1:n}) = \frac{p(o_{1:n} \mid \theta)\, p(\theta)}{p(o_{1:n})}$$

“what’s the probability of theta being at a certain value given the observations we had.”
And to obtain the actual value, we calculate the expectation of this distribution:

$$\hat{\theta} = \mathbb{E}[\theta \mid o_{1:n}] = \int_0^1 \theta \, p(\theta \mid o_{1:n}) \, d\theta$$

If it’s not possible to obtain such an expected value, we then calculate just the mode of the distribution (i.e., where the peak probability of $\theta$ lies).
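To make this concrete, here is a minimal sketch (not from the original notes) that approximates the posterior over $\theta$ on a discrete grid and reads off its expectation and mode; the observation list is a made-up example.

```python
import numpy as np

thetas = np.linspace(0, 1, 1001)   # grid of candidate theta values
obs = [1, 1, 0, 1]                 # hypothetical binary observations

# Likelihood of the data at each grid point, assuming independent observations
likelihood = np.ones_like(thetas)
for o in obs:
    likelihood *= thetas if o == 1 else (1 - thetas)

prior = np.ones_like(thetas)       # uniform prior over theta
posterior = likelihood * prior
posterior /= posterior.sum()       # normalize into a discrete distribution

expectation = (thetas * posterior).sum()   # E[theta | observations]
mode = thetas[np.argmax(posterior)]        # peak of the posterior
print(expectation, mode)
```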
Bayesian Parameter Learning on Binary Distributions
We are working in a Naive Bayes environment, where we assume that each observation $o_i$ is independent of the others given the parameter $\theta$.
Using the same steps as inference with Naive Bayes and some algebra:

$$p(\theta \mid o_{1:n}) \propto p(\theta)\, \theta^{n}(1-\theta)^{m}$$

where $n$ is the number of positive outcomes and $m$ is the number of negative outcomes.

Now, we would like to normalize this function for $\theta$ so that it integrates to $1$ over $\theta \in [0, 1]$.

Normalizing the output (assuming a uniform prior $p(\theta)$), we have that:

$$p(\theta \mid o_{1:n}) = \mathrm{Beta}(\theta \mid n+1,\ m+1)$$

where $\mathrm{Beta}(\alpha, \beta)$ is the beta distribution, whose density is proportional to $\theta^{\alpha - 1}(1-\theta)^{\beta - 1}$.
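As a quick check, here is a sketch of this closed form using SciPy; the counts $n$ and $m$ are my own example numbers:

```python
from scipy.stats import beta

n, m = 3, 1                        # hypothetical counts of positive/negative
posterior = beta(n + 1, m + 1)     # Beta(4, 2), per the formula above
print(posterior.mean())            # E[theta] = (n + 1) / (n + m + 2) = 2/3
```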
Beta Distribution
Suppose you had a non-uniform prior:

- Prior: $\mathrm{Beta}(\alpha, \beta)$
- Observe: $n$ positive outcomes, $m$ negative outcomes
- Posterior: $\mathrm{Beta}(\alpha + n,\ \beta + m)$
That is: for binary outcomes, the beta distribution can be updated without doing any math: just add the observed counts to the parameters.
For instance, say we had a prior $\mathrm{Beta}(\alpha, \beta)$, and we observed that the outcome was positive: the posterior is $\mathrm{Beta}(\alpha + 1, \beta)$.

Instead, if we observed that the outcome was negative, the posterior would be $\mathrm{Beta}(\alpha, \beta + 1)$.
Essentially: the MAGNITUDE of the beta distribution’s parameters ($\alpha + \beta$) governs how small the spread is (higher magnitude, smaller spread), and the balance between the two values represents how much skew there is.
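A tiny sketch of this claim; the parameter choices below are my own examples:

```python
from scipy.stats import beta

print(beta(2, 2).std())     # low magnitude -> wide spread (~0.224)
print(beta(20, 20).std())   # same balance, higher magnitude -> narrow (~0.078)
print(beta(8, 2).mean())    # imbalance -> mass skewed toward theta = 1 (0.8)
```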
Beta is a special distribution which takes parameters $\alpha$ and $\beta$, with mean:

$$\mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta}$$

and variance:

$$\mathrm{Var}[\theta] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

and has mode:

$$\frac{\alpha - 1}{\alpha + \beta - 2}$$

when $\alpha, \beta > 1$.
This means that, at $\alpha = \beta = 1$ (the uniform prior), the mode is not well defined, and we fall back to the mean.
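Translating these formulas directly into code (the parameter values below are hypothetical):

```python
a, b = 6.0, 2.0                              # hypothetical parameters
mean = a / (a + b)                           # alpha / (alpha + beta)
var = a * b / ((a + b) ** 2 * (a + b + 1))   # variance formula above
mode = (a - 1) / (a + b - 2)                 # valid since a, b > 1
print(mean, var, mode)                       # 0.75, ~0.0208, ~0.833
```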
Laplace Smoothing
Laplace Smoothing is a prior where:

$$p(\theta) = \mathrm{Beta}(2, 2)$$

so you just add $1$ to each of your counts (one pseudo-observation of each outcome); this keeps the model from assigning zero probability to an outcome it has never seen.
see also Laplace prior, where you use Laplace Smoothing for your prior
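A small sketch of the effect; the counts are a made-up example:

```python
# A Beta(2, 2) prior adds one pseudo-count to each outcome, so the MAP
# estimate becomes (n + 1) / (n + m + 2) rather than n / (n + m).
n, m = 5, 0                           # hypothetical: 5 positives, 0 negatives
map_raw = n / (n + m)                 # unsmoothed estimate: exactly 1.0
map_smoothed = (n + 1) / (n + m + 2)  # mode of Beta(n + 2, m + 2): 6/7
print(map_raw, map_smoothed)
```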
Total Probability in beta distributions
Recall, by the law of total probability, the probability that the next observation is positive is:

$$p(\text{positive} \mid o_{1:n}) = \int_0^1 p(\text{positive} \mid \theta)\, p(\theta \mid o_{1:n})\, d\theta = \mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta}$$
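Here is a quick numerical sanity check of that identity; the parameters are my own example:

```python
import numpy as np
from scipy.stats import beta

a, b = 4, 2
thetas = np.linspace(0, 1, 100001)
# Riemann-sum approximation of the integral of theta * p(theta) over [0, 1]
integral = np.mean(thetas * beta(a, b).pdf(thetas))
print(integral, a / (a + b))   # both approximately 0.6667
```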
Choosing a prior
- do it with only the problem and no knowledge of the data
- uniform typically works well, but if you have any reason to believe it won’t be uniform (say, a coin flip, which you’d expect to be near fair), you should count accordingly, such as making the distribution more bell-shaped around $1/2$ with larger, balanced pseudo-counts (e.g., $\mathrm{Beta}(10, 10)$)
Dirichlet Distribution
We can generalize the Bayesian Parameter Learning on Binary Distributions with the Dirichlet Distribution.
For $n$ possible outcomes, we keep one parameter $\theta_i$ per outcome, with $\sum_{i=1}^{n} \theta_i = 1$.

Now:

$$\mathrm{Dir}(\theta \mid \alpha_1, \dots, \alpha_n) \propto \prod_{i=1}^{n} \theta_i^{\alpha_i - 1}$$

whereby, after observing $m_i$ counts of each outcome $i$, the posterior is:

$$\mathrm{Dir}(\alpha_1 + m_1, \dots, \alpha_n + m_n)$$

for each outcome $i = 1, \dots, n$, whereby the $\alpha_i$ form your prior (initial distribution). If your prior is uniform, then every $\alpha_i$ equals one.

The expectation for each $\theta_i$ is:

$$\mathbb{E}[\theta_i] = \frac{\alpha_i}{\sum_{j=1}^{n} \alpha_j}$$

and, with $\alpha_i > 1$, the mode of each $\theta_i$ is:

$$\frac{\alpha_i - 1}{\sum_{j=1}^{n} \alpha_j - n}$$
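A short sketch of this update; the counts are hypothetical:

```python
import numpy as np

alpha = np.ones(3)                    # uniform prior: every alpha_i = 1
counts = np.array([5, 2, 1])          # hypothetical observed counts m_i
posterior_alpha = alpha + counts      # posterior is Dir(6, 3, 2)
print(posterior_alpha / posterior_alpha.sum())   # E[theta_i], sums to 1
```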
expectation of a distribution
For the Beta Distribution and the Dirichlet Distribution, the expectation of the distribution is simply its mean, which has the closed forms given above.
If you, say, want to know the probability that the next outcome is positive:

$$p(\text{positive} \mid o_{1:n}) = \int_0^1 p(\text{positive} \mid \theta)\, p(\theta \mid o_{1:n})\, d\theta$$

The first thing inside the integral is just the actual value of $\theta$, since $p(\text{positive} \mid \theta) = \theta$.

This, of course, just adds up to the expected value of $\theta$:

$$p(\text{positive} \mid o_{1:n}) = \mathbb{E}[\theta \mid o_{1:n}]$$
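In code, this is a one-liner; the posterior parameters below are hypothetical:

```python
from scipy.stats import beta

posterior = beta(4, 2)     # hypothetical posterior over theta
print(posterior.mean())    # p(next outcome is positive) = 4/6 ~ 0.667
```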