Skip to content

Latest commit

 

History

History
33 lines (23 loc) · 1.35 KB

File metadata and controls

33 lines (23 loc) · 1.35 KB

Basic Grep Implementation

In this, implement grep like functionality

Inspiration

Inspired by codecrafters.

Dependencies

Tested with Python 3.12+

Features

Supports:

  • Basic character support, including \d, \w and wildcards
  • Character Classes - both positive and negative [abc] and [^abc]
  • Beginner and End Carats (^, $)
  • Quantifiers like *, +, ?
  • Parenthesis support so (a*b)+ ops are possible
  • Support for Alternation |

Design Approach and History

  • We started with very basic recursive algorithm
  • However, we soon realised that we needed to tokenise pattern since it make quantifier search pretty easy For example, with no tokenisation, it is hard to check for \\w+, [abc]* or 2? since each is different length But with tokens, you can check that if token[n+1] is quantifier, then token[n] would be thing where this applies
  • This approach scaled very well for lot of use cases
  • However, once we tried to handle parenthesis - we started running into issues. It quickly became apparant that our token array would quickly became nested structure
  • Looking and investigating into that, it turned out that we would be better served by converting Token Arrays wholesale into AST-like structure
  • At this stage, we converted Parser, AST and Matcher into their own separate modules