information
Information is the amount of surprise a message provides.
Shannon (1948): “A Mathematical Theory of Communication.”
information value
The information value, or entropy, of an information source is the probability-weighted average surprise over all possible outcomes:
\begin{equation} H(X) = \sum_{x \in X} s(P(X=x)) \, P(X=x) \end{equation}
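As a concrete illustration (a minimal sketch, not from the lecture; the function and distribution names are my own), we can compute \(H(X)\) directly from this definition, using the surprise function \(s(p) = \log_{2} \frac{1}{p}\) defined below:

```python
import math

def surprise(p: float) -> float:
    # s(p) = log2(1/p), measured in bits (defined later in these notes)
    return math.log2(1 / p)

def entropy(dist: dict) -> float:
    # H(X) = sum over outcomes x of s(P(X=x)) * P(X=x)
    # zero-probability outcomes contribute nothing, so they are skipped
    return sum(surprise(p) * p for p in dist.values() if p > 0)

# hypothetical three-outcome source
X = {"a": 0.5, "b": 0.25, "c": 0.25}
print(entropy(X))  # 1.5 bits
```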
properties of entropy
- entropy is non-negative: \(H(X) \geq 0\)
- entropy of uniform: for \(M \sim CatUni[1, …, n]\), \(p_{i} = \frac{1}{|M|} = \frac{1}{n}\), and \(H(M) = \log_{2} |M| = \log_{2} n\)
- entropy is bounded: \(0 \leq H(X) \leq H(M)\) where \(|X| = |M|\) and \(M \sim CatUni[1, …, n]\) (“the uniform distribution has the highest entropy”); the upper bound is reached iff \(X\) is uniformly distributed.
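A quick numerical check of the bound (a sketch with made-up distributions): on a support of size \(n = 4\), the uniform distribution attains \(\log_{2} 4 = 2\) bits, while a skewed distribution on the same support stays strictly below it.

```python
import math

def entropy(probs):
    # H(X) = sum_x P(X=x) * log2(1/P(X=x)), skipping zero-probability outcomes
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))  # 2.0 = log2(4): upper bound, attained iff uniform
print(entropy(skewed))   # ~1.36 < 2.0: non-uniform falls below the bound
```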
binary entropy function
For a binary outcome \(X \in \{1,2\}\) with \(P(X=1) = p_1\) and \(P(X=2) = 1-p_1\), we can write:
\begin{equation} H_{2}(p_1) = p_1 \log_{2} \frac{1}{p_1} + (1-p_1) \log_{2} \frac{1}{1-p_1} \end{equation}
If we plot this, we get a cap-like (concave) curve: \(H_{2}(0) = 0\) and \(H_{2}(1) = 0\), but \(H_{2}(0.5) = 1\); an information source is most effective when what’s communicated is ambiguous.
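Evaluating \(H_{2}\) at a few points (example values of my own choosing) reproduces this shape: zero at the endpoints and maximal at \(p_1 = 0.5\).

```python
import math

def h2(p1: float) -> float:
    # binary entropy H2(p1), using the convention 0 * log2(1/0) = 0
    if p1 in (0.0, 1.0):
        return 0.0
    return p1 * math.log2(1 / p1) + (1 - p1) * math.log2(1 / (1 - p1))

for p1 in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p1, round(h2(p1), 3))  # 0.0, 0.469, 1.0, 0.469, 0.0
```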
information source
We model an information source as a random variable. A random variable can take on any of a range of values; only once you GET the information does it resolve to an exact value. Each source has a range of possible values it can communicate, which is the support of the random variable representing the information source.
We then define the surprise of a piece of information as a decreasing function of the probability of the event of receiving that information.
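As a sketch of this modeling step (the outcomes and weights here are hypothetical), an information source is just a support of possible values together with a probability for each; receiving a message corresponds to the random variable resolving to one exact value.

```python
import random

# support of the random variable: the values the source can communicate
support = ["sunny", "rainy", "snowy"]
probs = [0.6, 0.3, 0.1]  # P(X = x) for each value in the support

# before the message arrives, X is uncertain; receiving it fixes one value
received = random.choices(support, weights=probs, k=1)[0]
print(received)
```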
surprise
IMPORTANTLY: this class uses base \(2\), but the choice of base is unimportant (it only changes the units).
\begin{equation} s(p) = \log_{2} \frac{1}{p} \end{equation}
Properties of Surprise
- log-base-2 surprise has units “bits”
- \(s(1) = 0\)
- \(p \to 0, s(p) \to \infty\)
- \(s(p) \geq 0\)
- “joint surprise” of independent events: \(s(pq) = s(p) + s(q)\)
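These properties can be checked numerically with a small sketch (the probabilities are example values):

```python
import math

def s(p: float) -> float:
    # surprise in bits: s(p) = log2(1/p)
    return math.log2(1 / p)

print(s(1.0))   # 0.0: a certain event carries no surprise
print(s(0.5))   # 1.0 bit
print(s(1e-9))  # ~29.9 bits: rare events are very surprising
# joint surprise of two independent events with probabilities p and q
p, q = 0.5, 0.25
print(s(p * q), s(p) + s(q))  # both print 3.0
```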
Facts about Surprise
- surprise should decrease with increasing \(p\)
- surprise should be continuous in \(p\)
- for two independent events with probabilities \(p_{i}\) and \(q_{j}\), the “joint surprise” is \(s(p_{i} q_{j}) = s(p_{i}) + s(q_{j})\)
- surprise should capture that if something with probability \(0\) happens, we should be infinitely surprised; if something happens with increasingly high probability, the surprise should be low
The surprise function \(s(p) = \log_{2} \frac{1}{p}\) is the unique function (up to the choice of logarithm base) that satisfies all of the above properties.
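For instance, the additivity fact is a one-line consequence of the logarithm: for independent events with probabilities \(p_{i}\) and \(q_{j}\),
\begin{equation} s(p_{i} q_{j}) = \log_{2} \frac{1}{p_{i} q_{j}} = \log_{2} \frac{1}{p_{i}} + \log_{2} \frac{1}{q_{j}} = s(p_{i}) + s(q_{j}) \end{equation}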
additional information
Information is relative to the domain one is attempting to communicate about.