Currently, dedupe uses an affine gap distance. This edit distance is very similar to the Levenshtein distance, except that extending a gap (a run of deletions or insertions) costs less than opening one.
This works really well for the kinds of strings we deal with in record linkage. How would we implement this for pyhacrf?
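
For reference, here is a minimal sketch of the affine gap distance described above, written as a Gotoh-style dynamic program in plain Python. It is only an illustration of the idea, not dedupe's actual implementation and not a pyhacrf model. The function name `affine_gap_distance` and the default costs (`mismatch=1.0`, `gap_open=1.0`, `gap_extend=0.5`) are assumptions chosen for the example.

```python
import math


def affine_gap_distance(a, b, mismatch=1.0, gap_open=1.0, gap_extend=0.5):
    """Edit distance where the first character of a gap costs `gap_open`
    and every further character of the same gap costs `gap_extend`.

    The cost values are illustrative assumptions, not dedupe's settings.
    """
    n, m = len(a), len(b)
    inf = math.inf

    # Three tables, one per "state" of the alignment at position (i, j):
    #   M: a[i-1] is aligned with b[j-1] (match or substitution)
    #   X: a[i-1] is aligned with a gap (deletion from a)
    #   Y: b[j-1] is aligned with a gap (insertion into a)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Staying in the same gap state is cheaper than entering it.
            X[i][j] = min(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = min(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    return min(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    # One contiguous gap of length 2 vs. two separate gaps of length 1.
    print(affine_gap_distance("abcd", "ab"))
    print(affine_gap_distance("abcd", "bd"))
```

The three tables are what distinguish this from plain Levenshtein: the cost of a deletion or insertion depends on whether the previous step was already inside the same kind of gap. With these assumed costs, one contiguous gap of two characters is cheaper than two separate single-character gaps.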