Kolomogorov Complexity is a “universal theory of information”. “how much information is contained in a string”
The Kolomogorov Complexity of a string \(x\) is the length of the shortest description, \(|d(x)|\)
information as description
Key idea: the more we can compress a string, the more information it contains. The amount of information in a string \(x\) is the length of the shortest description of \(x\).
aside
For some \((M,w)\), we are about to write short strings of it; how do we encode it?
Well, we need some encoding such that \(Z = (M,w)\) iff \(\pi_{1}(Z) = M\), \(\pi_{2}(Z) = w\).
Let us define:
\begin{equation} (M,w) := 0^{|M|} 1 M w \end{equation}
“give the length of \(M\) and so then we can break \(Mw\)”.
description
A description of \(x\) is a string \(M, w\) such that \(M\) on the input \(w\) halts with only \(x\) in its tape
shortest description
shortest description of \(x\), written as \(d(x)\), is the lexicographically shortest string \(M, w\) such that \(M(w)\) halts with only \(x\) on its tape.
additional information
unprovable komogorov complexity
For every interesting consistent \(\mathcal{F}\), there is a \(t\) such that for all \(x\), \(K(x) > t\) is unprovable in \(\mathcal{F}\).
To do this, let’s define an \(M\) that treats all of its input as a integer, whereby:
\(M(k)\) searches over all strings \(x\) and proofs \(P\) for a proof \(P\) in \(\mathcal{F}\) that \(K(x) >k\); output \(x\) if found. If \(M(k)\) halts, it must print \(x’\); whereby \(K(x’) \leq c + \log k\). However, \(k \leq c + \log k\) is only true for small \(k\); choose \(t\) for which this is not the case. Notice this makes \(K(x) > t c + \log k\), meaning \(M\) can’t halt.
Therefore, \(K(x) > t\) has no proof in this language.
random unprovable truths
For every interesting consistent \(\mathcal{F}\), there is a \(t\) such that for all \(x\), we have \(K(x) > t\) is unprovable in \(\mathcal{F}\).
Yet, for randomly chosen \(x\) of length \(t + 100\), we have that \(K(x) > t\) is true with probability at least \(1 - \frac{1}{2^{100}}\). So, we have thees exceedingly likely to be true statements which are unprovable.
determining compressibility
for some
\begin{equation} COMPRESS = \qty {(x,c) | K(x) \leq c} \end{equation}
is actually undecidable.
Proof idea: if decidable, we could just print the first incompressible string of length \(n\); but then; that actually describes our supposedly “incompressible” string through our constant machine size and \(n\) in binary; and then we reach contradiction.
Atm is STILL not decidable
because \(COMPRESS\) is reducible to ATM.
Define \(M_{x,c}\): on input \(w\), for all pairs \((M’, w’)\) with \(|(M’, w’) |\leq c\), simulate \(M’\) on \(w’\) in parallel; if some \(M’\) halts and prints \(x\), accept. Meaning, \(K(x) \leq c \Leftrightarrow M_{x,c}\) accepts \(\varepsilon\).
Meaning this reduces to the set \(A_{TM}\).
interpreter
an interpreter is a semi-computable function:
\begin{equation} p: \Sigma^{*} \to \Sigma^{*} \end{equation}
which takes programs as input, and may print their outputs. In particular let \(x \in \qty {0,1}^{*}\), the shortest description of \(x\) under \(p\), called \(d_{p}(x)\), is the lexicographically shortest string \(w\) for which \(p(w)=x\)
Let \(K_{p}\) complexity of \(x\) be \(K_{p}(x) := |d_{p}(x)|\)
Theorem: for interpreter \(p\), there is a fixed \(c\) so that for all \(x \in \qty {0,1}^{*}\), there is \(K(x) \leq K_{p}(x) + c\)
Proof:
let’s define \(M(w)\) for which on \(w\), we simulate \(p(w)\) and write its outputs to the tape. Meaning, \((M, d_{p}(x))\) is a description of \(x\). Notice that the \(K(x) \leq 2|M|+ K_{p}(X) + 1 \leq c + K_{p}(x)\).
incompressible strings
For every \(n\), there is as string \(x \in \qty {0,1}^{n}\) such that \(K(x) \geq n\). “there are incompressible strings of every length”.
Number of binary strings of length \(n\) is \(2^{n}\), yet the number of descriptions that could result in \(K(x) < n\) is the number of descriptions of length \(< n\) which is bounded by the number of binary strings \(< n\) meaning \(2^{n-1}\); there are therefore less “sufficiently-short” descriptions than strings we want to describe; so there’s at least one \(n\) bit string on \(x\) that doesn’t have a description \(<n\)
random strings is incompressible
the probably of a random string having Kolomogorov Complexity is lower-bounded:
for all \(n\) and \(c > 1\), we have:
\begin{equation} P_{x \in \qty {0,1}^{n}} \qty[K(x) \geq n-c] \geq 1- \frac{1}{2^{c}} \end{equation}
proof uses the same thing as incompressible strings; the number of descriptions of length \(<n-c\) is \(2^{n-c}-1\); so the probability that a random string satisfies this is at most \(\frac{(2^{n-c}-1)}{2^{n}} < \frac{1}{2^{c}}\)
simple upper bound
There’s a fixed \(c\) so that all \(x\) in \(\qty {0,1}^{*}\), there exists:
\begin{equation} K(x) \leq |x|+c \end{equation}
“the amount of information in \(x\) isn’t much more than \(|x|\)”. Because we can always define a turing machine, for which “on input \(w\), halt”. Meaning, for any string \(x\), \(M(x)\) halts with \(x\) on its tape (i.e. immediate).
So, \((M,x)\) is a description of \(x\), and by the paring given above, we have \(2|M| + |x| + 1 \leq |x|+c\) .
repetitive strings
there’s a fixed constant \(c\) so that for all \(n\geq 2\), and all \(x \in \qty {0,1}^{*}\), we have \(K(x^{n}) \leq K(x) + c \log n\).
Because we can define the Turing machine \(N=(n, (M,w))\) for which we write \(x = M(w)\) and “print \(x\) for \(n\) times”
So, for \(K(x^{n}) \leq K((N, (n, (M,w))) \leq 2|N| + d \log n + K(x) \leq c \log n + K(x)\)
Recall Church-Turing thesis
hypothesis: “Everyone’s intuitive notion of algorithms is a Turing-machine”