“minimum edit distance” is one approach to solving the problem of “how similar are these two strings”? minimum edit distance is defined by the smallest number of editing operations (insertion, deletion, substitution) needed to transform one string into another.
There are two technical definitions. Both definitions are grounded upon “minimum number of operations it takes to transform a string into another, where”
DP costs \(O(nm)\), backtrace costs \(O(n+m)\).
Standard Edit Distance
insertion, deletion, and substitution cost 1
Levenshtein Distance
insertion and deletion cost 1; substitution cost 2
Example
For instance: “graffe”. Is…
- graf
- graft
- grail
- giraffe
the closest?
Common Use
- machine translation + speech recognition uses edit distance to evaluate output quality
- coreference and NER uses edit distance as a baseline check