This repository was archived by the owner on Jan 10, 2023. It is now read-only.
It would be nice if there was a builtin way to iterate over the internal nodes in the trie like this:

    def _iternodes(self):
        """ Generates all nodes in the trie """
        stack = deque([[self._root]])
        while stack:
            for node in stack.pop():
                yield node
                stack.append(node.children.values())

That would make it much easier to write the Shortest Unique Prefix algorithm:

    def shortest_unique_prefixes(items):
        """
        The shortest unique prefix algorithm.

        Args:
            items (list of str): returned prefixes will be unique with
                respect to this set

        Returns:
            list of str: a prefix for each item that uniquely identifies it
                with respect to the original items.

        References:
            http://www.geeksforgeeks.org/find-all-shortest-unique-prefixes-to-represent-each-word-in-a-given-list/
            https://github.com/Briaares/InterviewBit/blob/master/Level6/Shortest%20Unique%20Prefix.cpp

        Requires:
            pip install pygtrie

        Doctest:
            >>> from pysseg.fnameproc import *
            >>> items = ["zebra", "dog", "duck", "dove"]
            >>> shortest_unique_prefixes(items)
            ['z', 'dog', 'du', 'dov']

        Timing:
            >>> # make numbers larger to stress test
            >>> # L = max length of a string, N = number of strings,
            >>> # C = smallest guaranteed common length
            >>> # (if the setting N=10000, L=100, C=20 is feasible, we are good)
            >>> import random
            >>> def make_data(N, L, C):
            >>>     rng = random.Random(0)
            >>>     return [''.join(['a' if i < C else chr(rng.randint(97, 122))
            >>>                      for i in range(L)]) for _ in range(N)]
            >>> items = make_data(N=1000, L=10, C=0)
            >>> %timeit shortest_unique_prefixes(items)
            17.5 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
            >>> items = make_data(N=1000, L=100, C=0)
            >>> %timeit shortest_unique_prefixes(items)
            141 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
            >>> items = make_data(N=1000, L=100, C=70)
            >>> %timeit shortest_unique_prefixes(items)
            141 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
            >>> items = make_data(N=10000, L=250, C=20)
            >>> %timeit shortest_unique_prefixes(items)
            3.55 s ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        """
        import pygtrie
        from collections import deque
        if len(set(items)) != len(items):
            raise ValueError('inputs must be unique')

        # construct tree
        tree = pygtrie.CharTrie.fromkeys(items, value=0)

        # Hack into the internal structure and insert frequencies at each node
        def _iternodes(self):
            """ Generates all nodes in the trie """
            stack = deque([[self._root]])
            while stack:
                for node in stack.pop():
                    yield node
                    stack.append(node.children.values())

        # Set the value (frequency) of all (except the root) nodes to zero.
        for node in _iternodes(tree):
            if node is not tree._root:
                node.value = 0

        # For each item trace its path and increment frequencies
        for item in items:
            final_node, trace = tree._get_node(item)
            for key, node in trace[1:]:
                node.value += 1

        # Query for the first prefix with frequency 1 for each item.
        # This is the shortest unique prefix over all items.
        unique = []
        for item in items:
            freq = None
            for prefix, freq in tree.prefixes(item):
                if freq == 1:
                    break
            assert freq == 1, 'item={} has no unique prefix'.format(item)
            unique.append(prefix)
        return unique

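
For comparison, here is a minimal sketch of the same frequency-counting idea that avoids pygtrie's private attributes (`_root`, `_get_node`) by building a small trie out of plain lists and dicts. The function name `shortest_unique_prefixes_dict` and the `[count, children]` node layout are assumptions made for illustration; they are not part of pygtrie.

```python
def shortest_unique_prefixes_dict(items):
    """Hypothetical stand-alone variant: shortest unique prefix per item."""
    if len(set(items)) != len(items):
        raise ValueError('inputs must be unique')

    # Each node is a two-element list: [count, children], where count is
    # the number of items whose path passes through the node.
    root = [0, {}]
    for item in items:
        node = root
        for ch in item:
            node = node[1].setdefault(ch, [0, {}])
            node[0] += 1

    unique = []
    for item in items:
        node = root
        for depth, ch in enumerate(item, start=1):
            node = node[1][ch]
            if node[0] == 1:
                # First node visited by this item only: shortest unique prefix.
                unique.append(item[:depth])
                break
        else:
            # The item is empty or is itself a prefix of another item.
            raise ValueError('item={!r} has no unique prefix'.format(item))
    return unique


print(shortest_unique_prefixes_dict(["zebra", "dog", "duck", "dove"]))
```

This only shows that the counting step is simple once every node on a key's path can be visited, which is what a built-in node iterator would make possible without reaching into the internals.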