-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathspellchecker.py
More file actions
82 lines (67 loc) · 3.88 KB
/
spellchecker.py
File metadata and controls
82 lines (67 loc) · 3.88 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
"""Spell checking program. Runs through an infinite loop, prompts user for a string, and prints either a
spell checked english word or 'NO SUGGESTION' according to the specification at
http://www.twitch.tv/problems/spellcheck.
The idea here is to preprocess a complete list of english words into two dictionaries. One of these
dictionaries contains just correct words, so we don't autocorrect if it's unnecessary. The other contains
keys of misspellings according to a certain algorithm, explained below. The value for each key is a
correct english word that can be reached using the misspellings allowd in the specification.
These "incorrect" keys are generated by running the following steps for each word in /usr/share/dict/words:
1. lowercase each word. Working in the lowercase domain makes it easy to handle case errors.
2. change all vowels into a's. This effectively makes it so vowels are ignored when comparing an input
word to the complete list of words. 'a' was chosen arbitrarily; any placeholder character would do.
3. remove repeated characters until each character has no adjacent identical character.
These steps result in a set of misspellings that map back to a correct word that can be reached by the
misspelling rules in the specification. We can apply these same steps to the word input by the user, and
merely look up the key in the dictionary. If there is no such key, we can be confident that the word typed
by the user cannot reach a correct word through manipulation according to the rules. In this case, print
"NO SUGGESTION". If a key is found, print the correct word it maps to.
"""
import itertools
from sys import stdin
words = open('words', 'r')
correct_word_dict = {}
word_misspelling_dict = {} # this dict will map a modified word to the original word. It will include:
# all lowercase versions of words (to handle casing errors),
# all versions of words with repeated letters trimmed off (i.e. written -> writen),
# all versions of vowel substitutions for each word (i.e. fade -> fede, fodo, ...)
def remove_repeats(word):
if len(word) == 1:
return word
listed_word = list(word)
for i in range(0, len(word) - 1):
if listed_word[i] == listed_word[i + 1]:
listed_word[i] = None # None is placeholder for repeated character
return ''.join([letter for letter in listed_word if letter is not None])
def substitute_vowels(word):
"""Takes in a word, and returns it with 'a' substituted for all vowels
"""
listed_word = list(word)
vowels = 'aeiou'
for i in range(0, len(listed_word)):
if listed_word[i] in vowels:
listed_word[i] = 'a'
return ''.join(listed_word)
def word_to_common_misspelling(word):
"""Takes a word, and returns a list of the misspellings of the word generated by the algorithm
enumerated in the docstring at the top of the file. It first lowercases the word, then finds
all possible vowel substitutions, and removes duplicate letters from them each of these
"""
word = word.lower() # work in lowercase domain (handles case errors)
vowels_substituted = substitute_vowels(word)
return remove_repeats(vowels_substituted)
def word_suggestion(word):
if correct_word_dict.get(word, None) is not None:
return word # word was not misspelled, so return it
elif word_misspelling_dict.get(word_to_common_misspelling(word), None) is not None:
return word_misspelling_dict[word_to_common_misspelling(word)]
else:
return 'NO RESPONSE'
# Parse a complete list of English words, setting correct words and a mapping to a set of incorrect words as keys
for word in words:
word = word.strip() # strip out newlines from file reading
correct_word_dict[word] = word # make correct words a key (so we avoid correcting if unnecessary)
# add mapping of [a subset of] possible misspellings of each word in the dictionary
word_misspelling_dict[word_to_common_misspelling(word)] = word
while(True):
word = raw_input('>')
print word_suggestion(word)