we need to keep two sequences aligned; so in addition to minimum edit distance we need to know how to transform one sequence into another.
To do this, we keep a pointer of what cell we came from.
This is similar to edit distance with DP, but we keep a pointer of each cell of the action: point DOWN (less j) if inserting, point LEFT (less i) if deleting, and point diagonally if substituting.
For two strings, let’s define:
- \(X\) of length \(n\)
- \(Y\) of length \(m\)
we define some \(D(i,j)\) as the edit distance between substring \(X[1:i]\) and \(Y[1:j]\).
Let:
- \(D(i,0) = i\)
- \(D(0,j) = j\)
for i in range(1,M):
for j in range(1,N):
# deletion: ignoring one char of previous string
d1 = D(i-1,j) + 1 # (cost)
# insertion: insertion into string before using rest of j
d2 = D(i,j-1) + 1 # (cost)
# keep same if char is same or substitute current
d3 = D(i-1,j-1) + (0 if X[i] == Y[j] else 2)
# cache
D(i,j) = min(d1, d2, d3)