information
Information is the amount of surprise a message provides.
Shannon (1948): “A Mathematical Theory of Communication.”
information value
The information value, or entropy, of an information source is the probability-weighted average surprise over all possible outcomes:
\begin{equation} H(X) = \sum_{x \in X} s(P(X=x)) \, P(X=x) \end{equation}
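As a concrete illustration (a minimal sketch, not from the lecture; the function and distribution names are my own), we can compute \(H(X)\) directly from this definition, using the surprise function \(s(p) = \log_{2} \frac{1}{p}\) defined below:

```python
import math

def surprise(p: float) -> float:
    # s(p) = log2(1/p), measured in bits (defined later in these notes)
    return math.log2(1 / p)

def entropy(dist: dict) -> float:
    # H(X) = sum over outcomes x of s(P(X=x)) * P(X=x)
    # zero-probability outcomes contribute nothing, so they are skipped
    return sum(surprise(p) * p for p in dist.values() if p > 0)

# hypothetical three-outcome source
X = {"a": 0.5, "b": 0.25, "c": 0.25}
print(entropy(X))  # 1.5 bits
```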
properties of entropy
- entropy is non-negative: \(H(X) \geq 0\)
- entropy of uniform: for \(M \sim CatUni[1, …, n]\), \(p_{i} = \frac{1}{|M|} = \frac{1}{n}\), and \(H(M) = \log_{2} |M| = \log_{2} n\)
- entropy is bounded: \(0 \leq H(X) \leq H(M)\) where \(|X| = |M|\) and \(M \sim CatUni[1, …, n]\) (“the uniform distribution has the highest entropy”); the upper bound is reached iff \(X\) is uniformly distributed.
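A quick numerical check of the bound (a sketch with made-up distributions): on a support of size \(n = 4\), the uniform distribution attains \(\log_{2} 4 = 2\) bits, while a skewed distribution on the same support stays strictly below it.

```python
import math

def entropy(probs):
    # H(X) = sum_x P(X=x) * log2(1/P(X=x)), skipping zero-probability outcomes
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))  # 2.0 = log2(4): upper bound, attained iff uniform
print(entropy(skewed))   # ~1.36 < 2.0: non-uniform falls below the bound
```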
binary entropy function
For a binary outcome \(X \in \{1,2\}\) with \(P(X=1) = p_1\) and \(P(X=2) = 1-p_1\), we can write:
\begin{equation} H_{2}(p_1) = p_1 \log_{2} \frac{1}{p_1} + (1-p_1) \log_{2} \frac{1}{1-p_1} \end{equation}
If we plot this, we get a cap-like (concave) curve: \(H_{2}(0) = 0\) and \(H_{2}(1) = 0\), but \(H_{2}(0.5) = 1\); an information source is most effective when what’s communicated is ambiguous.
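Evaluating \(H_{2}\) at a few points (example values of my own choosing) reproduces this shape: zero at the endpoints and maximal at \(p_1 = 0.5\).

```python
import math

def h2(p1: float) -> float:
    # binary entropy H2(p1), using the convention 0 * log2(1/0) = 0
    if p1 in (0.0, 1.0):
        return 0.0
    return p1 * math.log2(1 / p1) + (1 - p1) * math.log2(1 / (1 - p1))

for p1 in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p1, round(h2(p1), 3))  # 0.0, 0.469, 1.0, 0.469, 0.0
```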
information source
We model an information source as a random variable. A random variable can take on any of a range of values; only once you GET the information does it resolve to an exact value. Each source has a range of possible values it can communicate, which is the support of the random variable representing the information source.
We then define the surprise of a piece of information as a decreasing function of the probability of the event of receiving that information.
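As a sketch of this modeling step (the outcomes and weights here are hypothetical), an information source is just a support of possible values together with a probability for each; receiving a message corresponds to the random variable resolving to one exact value.

```python
import random

# support of the random variable: the values the source can communicate
support = ["sunny", "rainy", "snowy"]
probs = [0.6, 0.3, 0.1]  # P(X = x) for each value in the support

# before the message arrives, X is uncertain; receiving it fixes one value
received = random.choices(support, weights=probs, k=1)[0]
print(received)
```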
surprise
IMPORTANTLY: this class uses base \(2\), but the choice of base is unimportant (it only changes the units).
\begin{equation} s(p) = \log_{2} \frac{1}{p} \end{equation}
Properties of Surprise
- log-base-2 surprise has units “bits”
- \(s(1) = 0\)
- \(p \to 0, s(p) \to \infty\)
- \(s(p) \geq 0\)
- “joint surprise” of independent events: \(s(pq) = s(p) + s(q)\)
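These properties can be checked numerically with a small sketch (the probabilities are example values):

```python
import math

def s(p: float) -> float:
    # surprise in bits: s(p) = log2(1/p)
    return math.log2(1 / p)

print(s(1.0))   # 0.0: a certain event carries no surprise
print(s(0.5))   # 1.0 bit
print(s(1e-9))  # ~29.9 bits: rare events are very surprising
# joint surprise of two independent events with probabilities p and q
p, q = 0.5, 0.25
print(s(p * q), s(p) + s(q))  # both print 3.0
```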
Facts about Surprise
- surprise should decrease with increasing \(p\)
- surprise should be continuous in \(p\)
- for two independent events with probabilities \(p_{i}\) and \(q_{j}\), the “joint surprise” is \(s(p_{i} q_{j}) = s(p_{i}) + s(q_{j})\)
- surprise should capture that if something with probability \(0\) happens, we should be infinitely surprised; if something happens with increasingly high probability, the surprise should be low
The surprise function \(s(p) = \log_{2} \frac{1}{p}\) is the unique function (up to the choice of logarithm base) that satisfies all of the above properties.
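For instance, the additivity fact is a one-line consequence of the logarithm: for independent events with probabilities \(p_{i}\) and \(q_{j}\),
\begin{equation} s(p_{i} q_{j}) = \log_{2} \frac{1}{p_{i} q_{j}} = \log_{2} \frac{1}{p_{i}} + \log_{2} \frac{1}{q_{j}} = s(p_{i}) + s(q_{j}) \end{equation}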
additional information
Information is relative to the domain one is attempting to communicate about.