Iterator over internal node structure (feature request)

It would be nice if there was a builtin way to iterate over the internal nodes in the trie like this: 

```python
            def _iternodes(self):
                """
                Generates all nodes in the trie
                """
                stack = deque([[self._root]])
                while stack:
                    for node in stack.pop():
                        yield node
                        stack.append(node.children.values())
```

That would make it much easier to write the Shortest Unique Prefix algorithm:

```python
def shortest_unique_prefixes(items):
    """
    The shortest unique prefix algorithm.

    Args:
        items (list of str): returned prefixes will be unique wrt this set

    Returns:
        list of str: a prefix for each item that uniquely identifies it
           wrt to the original items.

    References:
        http://www.geeksforgeeks.org/find-all-shortest-unique-prefixes-to-represent-each-word-in-a-given-list/
        https://github.com/Briaares/InterviewBit/blob/master/Level6/Shortest%20Unique%20Prefix.cpp

    Requires:
        pip install pygtrie

    Doctest:
        >>> from pysseg.fnameproc import *
        >>> items = ["zebra", "dog", "duck", "dove"]
        >>> shortest_unique_prefixes(items)
        ['z', 'dog', 'du', 'dov']

    Timeing:
        >>> # make numbers larger to stress test
        >>> # L = max length of a string, N = number of strings,
        >>> # C = smallest gaurenteed common length
        >>> # (the setting N=10000, L=100, C=20 is feasible we are good)
        >>> import random
        >>> def make_data(N, L, C):
        >>>     rng = random.Random(0)
        >>>     return [''.join(['a' if i < C else chr(rng.randint(97, 122))
        >>>                      for i in range(L)]) for _ in range(N)]
        >>> items = make_data(N=1000, L=10, C=0)
        >>> %timeit shortest_unique_prefixes(items)
        17.5 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        >>> items = make_data(N=1000, L=100, C=0)
        >>> %timeit shortest_unique_prefixes(items)
        141 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        >>> items = make_data(N=1000, L=100, C=70)
        >>> %timeit shortest_unique_prefixes(items)
        141 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        >>> items = make_data(N=10000, L=250, C=20)
        >>> %timeit shortest_unique_prefixes(items)
        3.55 s ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    """
    import pygtrie
    from collections import deque
    if len(set(items)) != len(items):
        raise ValueError('inputs must be unique')

    # construct tree
    tree = pygtrie.CharTrie.fromkeys(items, value=0)

    # Hack into the internal structure and insert frequencies at each node
    def _iternodes(self):
        """
        Generates all nodes in the trie
        """
        stack = deque([[self._root]])
        while stack:
            for node in stack.pop():
                yield node
                stack.append(node.children.values())

    # Set the value (frequency) of all (except the root) nodes to zero.
    for node in _iternodes(tree):
        if node is not tree._root:
            node.value = 0

    # For each item trace its path and increment frequencies
    for item in items:
        final_node, trace = tree._get_node(item)
        for key, node in trace[1:]:
            node.value += 1

    # Query for the first prefix with frequency 1 for each item.
    # This is the shortest unique prefix over all items.
    unique = []
    for item in items:
        freq = None
        for prefix, freq in tree.prefixes(item):
            if freq == 1:
                break
        assert freq == 1, 'item={} has no unique prefix'.format(item)
        unique.append(prefix)
    return unique
``` 



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterator over internal node structure (feature request) #20

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Iterator over internal node structure (feature request) #20

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions