TXT Engine C: Educational text engine in pure C with optional Python bindings, designed for learning parsing, tokenization, text statistics, and internal text analysis architecture.

geniusinsanity/TXT-Engine-C

TXT Engine C

A modular, educational text processing engine written in pure C99, designed to teach core software engineering concepts: modularity, state machines, and API design.

🏗 Architecture

The engine is built as a pipeline: Raw Text -> [Scanner] -> Chars -> [Tokenizer] -> Tokens -> [App]

  • Scanner: Reads raw text sources safely.
  • Tokenizer: Groups characters into Words, Numbers, or Punctuation.
  • Stats: Computes metrics (Word count, etc.).
  • Normalizer: Standardizes text (e.g., Lowercasing).
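
To make the Scanner -> Tokenizer stage more concrete, here is a minimal sketch of how such a pipeline could be written as a small state machine. It is not the project's actual implementation. The names `scanner_init`, `tokenizer_init`, `tokenizer_next`, `TOKEN_WORD`, and `TOKEN_END` mirror the usage example in this README, but everything else is an assumption made for illustration: the struct layouts, `TOKEN_NUMBER`, `TOKEN_PUNCT`, `scanner_peek`, `scanner_advance`, and the ASCII-only classification.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical types for illustration; the real headers may differ. */
typedef enum { TOKEN_WORD, TOKEN_NUMBER, TOKEN_PUNCT, TOKEN_END } TokenType;

typedef struct {
    TokenType type;
    const char *start; /* points into the source text */
    size_t length;     /* number of characters in the token */
} Token;

typedef struct {
    const char *cursor; /* next unread character */
} Scanner;

typedef struct {
    Scanner *scanner;
} Tokenizer;

void scanner_init(Scanner *s, const char *text) { s->cursor = text; }

/* Look at the next character without consuming it; '\0' marks the end. */
static char scanner_peek(const Scanner *s) { return *s->cursor; }

/* Consume one character, never moving past the terminating '\0'. */
static void scanner_advance(Scanner *s) {
    if (*s->cursor != '\0') s->cursor++;
}

void tokenizer_init(Tokenizer *t, Scanner *s) { t->scanner = s; }

/* Return the next token; the first character decides which state to enter. */
Token tokenizer_next(Tokenizer *t) {
    Scanner *s = t->scanner;

    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)scanner_peek(s))) scanner_advance(s);

    Token tok = { TOKEN_END, s->cursor, 0 };
    unsigned char c = (unsigned char)scanner_peek(s);
    if (c == '\0') return tok;

    if (isalpha(c)) {
        tok.type = TOKEN_WORD;   /* state: inside a run of letters */
        while (isalpha((unsigned char)scanner_peek(s))) scanner_advance(s);
    } else if (isdigit(c)) {
        tok.type = TOKEN_NUMBER; /* state: inside a run of digits */
        while (isdigit((unsigned char)scanner_peek(s))) scanner_advance(s);
    } else {
        tok.type = TOKEN_PUNCT;  /* any other single character */
        scanner_advance(s);
    }
    tok.length = (size_t)(s->cursor - tok.start);
    return tok;
}
```

In this sketch the Scanner is the only component that touches the raw string, and the Tokenizer only peeks and advances. That separation is what would allow a different input source to be swapped in later without changing the tokenizing logic.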

🚀 Usage

Building the Project

We use a Makefile for automation.

# Build the library and demo
make

# Run the tests
make test

# Clean up
make clean

Example Code

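#include <stdio.h>      // printf
#include "tokenizer.h"

// ... inside main ...
Scanner scanner;
scanner_init(&scanner, "Hello World 123");

Tokenizer tokenizer;
tokenizer_init(&tokenizer, &scanner);

// Pull tokens until the tokenizer reports the end of input.
Token token;
while ((token = tokenizer_next(&tokenizer)).type != TOKEN_END) {
    if (token.type == TOKEN_WORD) {
        // %.*s prints exactly token.length characters starting at token.start.
        // The precision argument must be an int, hence the cast.
        printf("Word found: %.*s\n", (int)token.length, token.start);
    }
}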
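
The same kind of loop is where the Stats and Normalizer modules from the architecture list would plausibly do their work. The following self-contained sketch shows the idea without depending on the engine's headers. `TextStats`, `normalize_lower`, and `stats_compute` are hypothetical names chosen for this example, not the project's API, and the code assumes ASCII input.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type; the engine's actual Stats API may differ. */
typedef struct {
    size_t words;       /* runs of letters */
    size_t numbers;     /* runs of digits */
    size_t punctuation; /* single punctuation characters */
} TextStats;

/* Lowercase an ASCII string in place (a stand-in for the Normalizer). */
void normalize_lower(char *text) {
    for (; *text != '\0'; text++) {
        *text = (char)tolower((unsigned char)*text);
    }
}

/* Walk the text once, counting each run of letters or digits as one item. */
TextStats stats_compute(const char *text) {
    TextStats st = { 0, 0, 0 };
    const unsigned char *p = (const unsigned char *)text;

    while (*p != '\0') {
        if (isalpha(*p)) {
            st.words++;
            while (isalpha(*p)) p++; /* consume the rest of the word */
        } else if (isdigit(*p)) {
            st.numbers++;
            while (isdigit(*p)) p++; /* consume the rest of the number */
        } else {
            if (ispunct(*p)) st.punctuation++;
            p++;                     /* whitespace and other bytes are skipped */
        }
    }
    return st;
}
```

Calling `normalize_lower` on a writable buffer holding `"Hello, World 123!"` and then passing it to `stats_compute` would count two words, one number, and two punctuation characters under these rules.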

📂 Project Structure

  • include/: Public API (Header files).
  • src/: Implementation (Source code).
  • examples/: Demo programs.
  • tests/: Unit tests.
  • docs/: Educational step-by-step guides.

📚 Educational Guides

  1. Architecture
  2. Module Definitions
  3. Header Design
  4. Implementation
  5. Build System
  6. Example Program
  7. Testing
  8. Git Best Practices

🔮 Roadmap

  • UTF-8 Support
  • Python Bindings (ctypes)
  • Streaming file input (not just strings)
