S T A R R Y N O T E
Cybernetic Knowledge Architecture System
Transform raw academic chaos into structured, exam-ready study guides, powered by Gemma 3 running locally on Apple Silicon.
- What Is StarryNote?
- Key Features
- System Architecture
- Project Structure
- Prerequisites
- Installation
- Usage
- Pipeline Deep Dive
- The Master Template
- Post-Processing Pipeline
- Knowledge Architect Prompt
- Terminal UI
- Testing
- Documentation
- Configuration
- Contributing
StarryNote is a local-first, AI-powered knowledge synthesis engine that transforms raw study materials (lecture notes, code files, PDFs, screenshots) into professional-grade, structured study guides with zero cloud dependency.
Unlike generic summarizers, StarryNote acts as a Knowledge Architect: it doesn't just restate your input; it synthesizes original code examples, mathematical proofs, Mermaid diagrams, and exam questions that explain the source material at a deeper level.
The Philosophy: Your notes are fragments. StarryNote turns them into architecture.
| Problem | StarryNote's Solution |
|---|---|
| Notes are scattered across formats | Universal MIME scanner processes any file type |
| AI summaries are surface-level | Knowledge Architect prompt forces synthesis > summary |
| Cloud AI raises privacy concerns | Runs 100% locally on Apple Silicon via MLX |
| Output varies wildly | Master Template enforces consistent, exam-ready output |
| No way to self-assess | Metacognitive Calibration with confidence meters |
| LLM output has rendering bugs | Triple-layer PostProcessor auto-fixes every output |
```mermaid
graph TD
    classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff
    classDef highlight fill:#2a0a3a,stroke:#00f3ff,stroke-width:2px,color:#bc13fe
    classDef input fill:#1a1a1a,stroke:#ff6ec7,stroke-width:2px,color:#ff6ec7
    classDef output fill:#1a1a1a,stroke:#39ff14,stroke-width:2px,color:#39ff14
    A["Raw Study Materials"]:::input --> B["StarryScanner<br/>MIME Detection · DFS Walk"]
    B --> C{"File Type Router"}
    C -->|"image/*"| D["Image Analyzer<br/>PIL · Multimodal"]
    C -->|"application/pdf"| E["PDF Analyzer<br/>PyMuPDF · OCR"]
    C -->|"text/*"| F["Text Analyzer<br/>UTF-8 Read"]
    D --> G["Gemma 3 Engine"]:::highlight
    E --> G
    F --> G
    G --> H["PromptBuilder<br/>System Rules + Template"]:::highlight
    H --> I["PostProcessor<br/>Mermaid Fix · Clean · Validate"]:::highlight
    I --> J["StarryFormatter<br/>Instructions/ Output"]
    J --> K["Study Guides"]:::output
```
```mermaid
graph LR
    classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff
    classDef highlight fill:#2a0a3a,stroke:#00f3ff,stroke-width:2px,color:#bc13fe
    main[main.py] --> engine[StarryEngine]
    main --> scanner[StarryScanner]
    main --> formatter[StarryFormatter]
    engine --> tl[TemplateLoader]:::highlight
    engine --> pb[PromptBuilder]:::highlight
    engine --> pp[PostProcessor]:::highlight
    formatter --> pp
    pp --> mf[MermaidFixer]
    pp --> oc[OutputCleaner]
    pp --> ov[OutputValidator]
```
```mermaid
sequenceDiagram
    participant U as User
    participant M as main.py<br/>TUI Hub
    participant S as StarryScanner
    participant E as StarryEngine
    participant PB as PromptBuilder
    participant G as Gemma 3<br/>MLX Metal
    participant PP as PostProcessor
    participant F as StarryFormatter
    U->>M: python main.py
    M->>E: Initialize (load model)
    E->>G: Load weights into Unified Memory
    G-->>E: Model ready
    M->>S: scan(cwd)
    S-->>M: ScanResult{resources, stats}
    loop For each resource
        M->>E: process_resource(resource)
        E->>PB: build(template, content)
        PB-->>E: Complete prompt
        E->>G: stream_generate(prompt)
        G-->>E: Raw Markdown
        E->>PP: PostProcessor.process(raw)
        PP-->>E: Clean Markdown
        E-->>M: guide_content
        M->>F: save_guide(path, content)
        F->>PP: PostProcessor.process(content)
        F-->>M: output_path
    end
    M-->>U: Mission Report + Constellation
```
```
StarryNote/
├── main.py                     # TUI entry point (4-phase pipeline)
├── requirements.txt            # Python dependencies
├── README.md                   # You are here
├── .gitignore                  # Git exclusion rules
│
├── src/                        # Core engine modules (6 modules, 13 classes)
│   ├── __init__.py             # Package initializer
│   ├── scanner.py              # UniversalResource + ScanResult + StarryScanner
│   ├── template_loader.py      # Template I/O, cleaning, and compaction
│   ├── prompt_builder.py       # Knowledge Architect prompt construction
│   ├── model_engine.py         # MimeClassifier + TextExtractor + StarryEngine
│   ├── postprocessor.py        # MermaidFixer + OutputCleaner + OutputValidator + PostProcessor
│   └── formatter.py            # Post-process + save to Instructions/
│
├── templates/                  # AI output templates
│   └── master_template.md      # 10-section study guide scaffold
│
├── tests/                      # Test suite (382 tests across 12 files)
│   ├── __init__.py             # Package initializer
│   ├── test_engine.py          # StarryEngine prompt + routing tests (22)
│   ├── test_file_types.py      # MimeClassifier + TextExtractor + routing (92)
│   ├── test_postprocessor.py   # MermaidFixer + Cleaner + Validator (27)
│   ├── test_prompt_builder.py  # PromptBuilder rules tests (24)
│   ├── test_template_loader.py # TemplateLoader I/O tests (14)
│   ├── test_template.py        # Master template structure tests (33)
│   ├── test_formatter.py       # Formatter + post-processing tests (15)
│   ├── test_scanner.py         # Scanner + ScanResult tests (22)
│   ├── test_edge_cases.py      # Cross-module edge cases (19)
│   ├── test_tui.py             # TUI utility + animation tests (112)
│   ├── test_model.py           # GPU + Metal validation (1, requires GPU)
│   ├── test_universal_scanner.py # Integration smoke test (1)
│   └── sample_note.txt         # Test fixture
│
├── docs/                       # Documentation
│   ├── TestLog.md              # Complete test execution log
│   ├── TraceabilityMatrix.md   # Requirements → Code → Tests mapping
│   └── FunctionExplanations.md # Detailed function documentation
│
├── .github/                    # CI/CD
│   └── workflows/
│       └── main.yml            # GitHub Actions: pytest on push/PR
│
├── models/                     # MLX model weights (auto-downloaded, gitignored)
└── Instructions/               # Generated study guides (created at runtime)
```
| Requirement | Minimum | Recommended |
|---|---|---|
| macOS | 13.0 (Ventura) | 14.0+ (Sonoma) |
| Chip | Apple M1 | Apple M3 / M4 |
| RAM | 8 GB Unified | 16 GB+ Unified |
| Python | 3.11 | 3.12+ |
| Disk | ~5 GB (model weights) | 10 GB+ |
| libmagic | Required | brew install libmagic |
⚠️ Apple Silicon Required. StarryNote uses MLX, Apple's Metal-optimized ML framework. It will not run on Intel Macs or Linux/Windows without modifying the engine.
```bash
git clone https://github.com/NikanEidi/StarryNote.git
cd StarryNote

# libmagic is required for MIME type detection
brew install libmagic

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python -c "import mlx.core as mx; print(f'Metal GPU: {mx.metal.is_available()}')"
# Expected output: Metal GPU: True
```

💡 First Run Note: Gemma 3 weights (~5 GB) are downloaded automatically from Hugging Face on the first execution. Subsequent runs load from cache.
Navigate to any directory containing study materials, then run:
```bash
cd /path/to/your/study/materials
python /path/to/StarryNote/main.py
```

Or from the StarryNote directory itself:

```bash
python main.py
```

The run then proceeds through four phases:

- Phase 1 (Neural Initialization): loads Gemma 3 into Apple Silicon's unified memory
- Phase 2 (Deep Scan): DFS traversal that discovers all files via MIME detection
- Phase 3 (Knowledge Synthesis): processes each file through the Knowledge Architect pipeline
- Phase 4 (Mission Report): displays a results table with timing and density ratings
Study guides are saved to an Instructions/ folder in the current working directory:
```
Instructions/
├── lecture_notes_StudyGuide.md
├── algorithm_code_StudyGuide.md
└── exam_review_StudyGuide.md
```

Every saved guide is automatically post-processed: Mermaid diagrams are fixed, leaked instructions are stripped, and the output is validated.
```mermaid
graph LR
    classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff
    A["os.walk()"] --> B["python-magic<br/>MIME detection"]
    B --> C{"Classify"}
    C -->|"image/jpeg"| D["UniversalResource"]
    C -->|"application/pdf"| E["UniversalResource"]
    C -->|"text/x-python"| F["UniversalResource"]
    C -->|"text/plain"| G["UniversalResource"]
```
The StarryScanner uses libmagic to read binary headers and determine the true MIME type. Each file is packaged into a UniversalResource dataclass:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UniversalResource:
    file_path: str       # Absolute path to the file
    mime_type: str       # e.g., 'image/jpeg', 'application/pdf'
    raw_data: Any        # Path reference for downstream processing
    size_bytes: int = 0  # File size in bytes
```

The enhanced scan() method returns a ScanResult with full statistics:

```python
result = scanner.scan("/path/to/notes")
print(f"Found {result.count} files, {result.total_bytes} bytes")
print(f"Skipped {result.skipped_count}, Errors: {result.error_count}")
```

The engine routes each UniversalResource through the appropriate analyzer:

| MIME Type | Analyzer | Strategy |
|---|---|---|
| image/* | _analyze_image() | PIL → RGB conversion → multimodal prompt |
| application/pdf | _analyze_pdf() | PyMuPDF text extraction → OCR fallback if < 100 chars |
| text/* | _analyze_text() | Direct content injection into prompt |
All three analyzers run PostProcessor.process() on the raw output before returning.
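The scan-and-route flow above can be approximated in a few lines. This is a minimal sketch, not the project's StarryScanner or StarryEngine: it assumes the skip set from the Configuration section, and it swaps python-magic for the standard-library mimetypes module, which guesses from file extensions rather than reading header bytes. The names walk_resources and route are hypothetical, and route only returns an analyzer name instead of calling one.

```python
import mimetypes
import os
from dataclasses import dataclass
from typing import Any, Iterator

# Assumed skip set, copied from the Configuration example
DEFAULT_SKIP = frozenset({
    "Instructions", ".venv", "__pycache__", ".git",
    ".DS_Store", ".idea", "node_modules",
})


@dataclass
class UniversalResource:
    file_path: str
    mime_type: str
    raw_data: Any
    size_bytes: int = 0


def walk_resources(root: str, skip_patterns=DEFAULT_SKIP) -> Iterator[UniversalResource]:
    """Hypothetical depth-first walk that prunes skipped directories and files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into skipped dirs
        dirnames[:] = sorted(d for d in dirnames if d not in skip_patterns)
        for name in sorted(filenames):
            if name in skip_patterns:
                continue
            path = os.path.abspath(os.path.join(dirpath, name))
            # Stand-in for python-magic: an extension-based guess, not a header read
            mime, _ = mimetypes.guess_type(path)
            yield UniversalResource(
                file_path=path,
                mime_type=mime or "application/octet-stream",
                raw_data=path,  # path reference, as in the dataclass above
                size_bytes=os.path.getsize(path),
            )


def route(resource: UniversalResource) -> str:
    """Return the analyzer name the routing table assigns to this MIME type."""
    if resource.mime_type.startswith("image/"):
        return "_analyze_image"
    if resource.mime_type == "application/pdf":
        return "_analyze_pdf"
    if resource.mime_type.startswith("text/"):
        return "_analyze_text"
    return "unsupported"  # the real MimeClassifier covers many more types
```

Because os.walk is top-down by default, pruning dirnames keeps directories such as .git or .venv from ever being visited, rather than filtering their files out afterwards.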
StarryFormatter (src/formatter.py):

- Creates an Instructions/ directory in the current working directory
- Generates filenames of the form {original_name}_StudyGuide.md
- Automatically post-processes every guide before saving (Mermaid fixing, instruction stripping)
- Provides validate_guide() for checking the structural completeness of saved files
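The naming rule above can be sketched with pathlib. This assumes {original_name} means the file stem (extension dropped), which matches the lecture_notes_StudyGuide.md example but is not confirmed by the source; guide_path and save_guide_sketch are hypothetical names, and the post-processing StarryFormatter runs before writing is deliberately left out.

```python
from pathlib import Path


def guide_path(source_file: str, out_root: str = ".") -> Path:
    """Map a source file to Instructions/{original_name}_StudyGuide.md.

    Assumes {original_name} is the file stem (extension dropped).
    """
    return Path(out_root) / "Instructions" / f"{Path(source_file).stem}_StudyGuide.md"


def save_guide_sketch(source_file: str, content: str, out_root: str = ".") -> Path:
    """Hypothetical stand-in for StarryFormatter.save_guide (post-processing omitted)."""
    path = guide_path(source_file, out_root)
    path.parent.mkdir(parents=True, exist_ok=True)  # create Instructions/ on first save
    path.write_text(content, encoding="utf-8")      # UTF-8 keeps symbols and emoji intact
    return path
```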
Every generated study guide follows a strict 10-section structure:
```mermaid
graph TD
    classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff
    classDef highlight fill:#1a1a1a,stroke:#39ff14,stroke-width:2px,color:#39ff14
    A["I. Executive Summary"] --> B["II. Core Concepts"]
    B --> C["III. Visual Knowledge Graph"]
    C --> D["IV. Technical Deep Dive"]
    D --> E["V. Annotated Glossary"]
    E --> F["VI. Exam Preparation"]
    F --> G["VII. Knowledge Connections"]
    G --> H["VIII. Quick Reference Card"]:::highlight
    H --> I["IX. Metacognitive Calibration"]:::highlight
    I --> J["X. Source Archive"]
```
| # | Section | Purpose | Unique Feature |
|---|---|---|---|
| I | Executive Summary | Abstract + Central Thesis + Applied Context | Forces non-obvious insight extraction |
| II | Core Concepts | Concept Register table + Comparative Analysis | Requires specific "Common Pitfall" per concept |
| III | Visual Knowledge Graph | Auto-generated Mermaid diagram | Cyberpunk styling: #bc13fe stroke, #00f3ff text |
| IV | Technical Deep Dive | Code (CS) / LaTeX (Math) / Source Analysis (Humanities) | Auto-selects block type by subject classification |
| V | Annotated Glossary | Domain terms with etymology & related terms | Requires linguistic root for scientific terms |
| VI | Exam Preparation | 3-tier questions: Application → Analysis → Synthesis | Collapsible answers with reasoning chains |
| VII | Knowledge Connections | Dependencies, next topics, cross-domain links | Maps learning pathways |
| VIII | Quick Reference Card | Condensed cheat sheet: takeaways + formulas + traps | Pre-exam checklist |
| IX | Metacognitive Calibration | Confidence Meter (🔴🟡🟢🔵) per concept | Personalized study prescriptions |
| X | Source Archive | Verbatim original input (read-only) | Audit trail for review |
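A structural check in the spirit of validate_guide() or OutputValidator could confirm that all ten sections are present. This sketch assumes each section is emitted as a Markdown heading of the form "## I. Executive Summary"; the real template's heading format and the project's validator logic may differ, and missing_sections is a hypothetical name.

```python
import re

# Section numerals and titles from the Master Template table
SECTIONS = [
    ("I", "Executive Summary"),
    ("II", "Core Concepts"),
    ("III", "Visual Knowledge Graph"),
    ("IV", "Technical Deep Dive"),
    ("V", "Annotated Glossary"),
    ("VI", "Exam Preparation"),
    ("VII", "Knowledge Connections"),
    ("VIII", "Quick Reference Card"),
    ("IX", "Metacognitive Calibration"),
    ("X", "Source Archive"),
]


def missing_sections(guide: str) -> list[str]:
    """Return 'I. Executive Summary'-style labels for sections with no heading.

    Assumes headings look like '## I. Executive Summary' (hypothetical format).
    """
    missing = []
    for numeral, title in SECTIONS:
        # Anchoring on '#' + numeral + '.' keeps 'I.' from matching 'II.'
        pattern = rf"^#{{1,6}}\s*{numeral}\.\s*{re.escape(title)}"
        if not re.search(pattern, guide, flags=re.MULTILINE | re.IGNORECASE):
            missing.append(f"{numeral}. {title}")
    return missing
```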
StarryNote uses a triple-layer defense to keep the output clean even when the LLM deviates from its instructions:
```mermaid
graph LR
    classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff
    classDef highlight fill:#2a0a3a,stroke:#00f3ff,stroke-width:2px,color:#bc13fe
    A["Raw LLM Output"] --> B["OutputCleaner<br/>Strip leaked instructions"]:::highlight
    B --> C["MermaidFixer<br/>Fix diagrams + inject classDef"]:::highlight
    C --> D["OutputValidator<br/>Check sections + warnings"]:::highlight
    D --> E["Clean Study Guide"]
```
All rules are baked into the system prompt, so the model is instructed to generate clean output from the start.
Even if the LLM ignores the rules, PostProcessor.process() auto-fixes the output:
| Fixer | What It Does |
|---|---|
| OutputCleaner | Strips `<!-- AI INSTRUCTION -->`, `[[AI INSTRUCTION]]`, `**RULES:**`, and unfilled `{{PLACEHOLDERS}}` |
| MermaidFixer | Replaces sequenceDiagram/mindmap/classDiagram with graph TD, injects the cyberpunk classDef, removes semicolons and inline style statements |
| OutputValidator | Logs warnings for missing sections, missing mermaid, missing exam questions |
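The cleaning and Mermaid-fixing rules in the table can be approximated with regular expressions. This is a hedged sketch, not src/postprocessor.py: the leak patterns, the diagram-type set, and the classDef string (copied from the diagrams in this README) are assumptions, and the header swap only rewrites the first line, so a real fixer would also have to translate sequence or mindmap syntax into flowchart edges.

```python
import re

FENCE = "`" * 3  # built at runtime so the pattern below doesn't contain a literal fence

# classDef copied from this README's diagrams; assumed to match the injected one
CLASSDEF = "classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff"

# Assumed shapes of leaked prompt scaffolding, based on the OutputCleaner row
LEAK_PATTERNS = [
    re.compile(r"<!--\s*AI INSTRUCTION.*?-->", re.DOTALL),
    re.compile(r"\[\[AI INSTRUCTION.*?\]\]", re.DOTALL),
    re.compile(r"^[ \t]*\*\*RULES:\*\*.*$", re.MULTILINE),
    re.compile(r"\{\{[A-Z_]+\}\}"),
]
MERMAID_BLOCK = re.compile(FENCE + r"mermaid\n(.*?)" + FENCE, re.DOTALL)
UNSUPPORTED = {"sequenceDiagram", "mindmap", "classDiagram"}


def clean(text: str) -> str:
    """Strip leaked instructions and unfilled {{PLACEHOLDERS}}."""
    for pattern in LEAK_PATTERNS:
        text = pattern.sub("", text)
    return text


def fix_mermaid_body(body: str) -> str:
    """Swap unsupported headers, drop style lines and trailing ';', add classDef."""
    lines = body.strip().splitlines()
    if lines and lines[0].strip() in UNSUPPORTED:
        lines[0] = "graph TD"  # header only; the body syntax is left unchanged
    fixed = [
        ln.rstrip().rstrip(";")
        for ln in lines
        if not ln.strip().startswith("style ")
    ]
    if not any(ln.strip().startswith("classDef default") for ln in fixed):
        fixed.insert(1, "    " + CLASSDEF)
    return "\n".join(fixed) + "\n"


def post_process(text: str) -> str:
    """Clean first, then repair every fenced mermaid block."""
    text = clean(text)
    return MERMAID_BLOCK.sub(
        lambda m: FENCE + "mermaid\n" + fix_mermaid_body(m.group(1)) + FENCE, text
    )
```

Because the same pipeline runs once in the engine and again in save_guide(), it should be idempotent; in this sketch a second pass over already-fixed output returns it unchanged.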
StarryFormatter.save_guide() runs the full PostProcessor pipeline again before writing to disk, as the final safety net.
The AI follows 4 Core Directives defined in src/prompt_builder.py:
| Directive | Rule |
|---|---|
| AUTHORSHIP | Set Author to "S T A R R Y N O T E" |
| SYNTHESIS > SUMMARY | Create original examples, proofs, and diagrams rather than just repeating the input |
| FORMATTING | Follow the Master Template exactly, generate ALL 10 sections |
| ACADEMIC TONE | Scholarly, precise, no conversational filler |
Plus section-specific rules for each of the 10 sections, Mermaid rules with exact classDef values, and explicit output rules forbidding HTML comments and instruction markers.
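How directives, template, and input might be combined into a single prompt can be sketched as follows. The rule wording is a condensed paraphrase of the directives table, the "You are a Knowledge Architect" opener is invented for illustration, and build_prompt is a hypothetical stand-in for PromptBuilder.build(), whose real signature and ordering are not shown in this README.

```python
# Hypothetical condensed wording of the four Core Directives; the real text
# lives in src/prompt_builder.py and also carries section and Mermaid rules.
CORE_DIRECTIVES = {
    "AUTHORSHIP": 'Set Author to "S T A R R Y N O T E".',
    "SYNTHESIS > SUMMARY": "Create original examples, proofs, and diagrams; do not just repeat the input.",
    "FORMATTING": "Follow the Master Template exactly and generate ALL 10 sections.",
    "ACADEMIC TONE": "Scholarly, precise, no conversational filler.",
}


def build_prompt(template: str, content: str) -> str:
    """Assemble system rules, template, and source material into one prompt string."""
    rules = "\n".join(f"- {name}: {rule}" for name, rule in CORE_DIRECTIVES.items())
    return (
        "You are a Knowledge Architect.\n\n"  # invented opener, for illustration only
        f"CORE DIRECTIVES:\n{rules}\n\n"
        f"MASTER TEMPLATE:\n{template}\n\n"
        f"SOURCE MATERIAL:\n{content}\n"
    )
```

Placing the source material last is one common choice, since it keeps the user's content closest to where generation begins; the actual ordering in src/prompt_builder.py may differ.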
StarryNote's TUI is built with Rich and follows a 4-phase pipeline design:
| Phase | Name | Visual Elements |
|---|---|---|
| 1 | Neural Initialization | Animated spinner while loading Gemma 3 into unified memory |
| 2 | Deep Scan | Resource table with MIME-type icons and file sizes |
| 3 | Knowledge Synthesis | Per-file and overall progress bars, elapsed time, density rating |
| 4 | Mission Report | Results table, summary panel, constellation footer |
The density rating measures AI amplification: how much original content the AI generated relative to the input size.
| Rating | Ratio | Meaning |
|---|---|---|
| ✦ | < 1× | Minimal expansion |
| ✦✦ | 1–2× | Moderate synthesis |
| ✦✦✦ | 3–4× | Strong synthesis |
| ✦✦✦✦ | 5–7× | Deep synthesis |
| ✦✦✦✦✦ | 8×+ | Maximum amplification |
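The rating bands can be expressed as a small lookup. This sketch assumes the ratio is output size divided by input size and that ratios falling between the listed bands (for example 2.5×) go to the lower band; density_rating is a hypothetical name, not the TUI's actual function.

```python
def density_rating(input_size: int, output_size: int) -> str:
    """Map the output/input size ratio to the star bands in the table above.

    Assumes in-between ratios (e.g. 2.5x) fall into the lower band.
    """
    if input_size <= 0:
        raise ValueError("input_size must be positive")
    ratio = output_size / input_size
    if ratio < 1:
        stars = 1
    elif ratio < 3:
        stars = 2
    elif ratio < 5:
        stars = 3
    elif ratio < 8:
        stars = 4
    else:
        stars = 5
    return "✦" * stars
```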
```bash
source .venv/bin/activate
pytest tests/ -v
```

| File | Tests | What It Covers |
|---|---|---|
| test_engine.py | 22 | Engine prompt building, MIME routing, token budget |
| test_file_types.py | 92 | MimeClassifier (50+ MIME types), TextExtractor (all readers), routing (24 formats) |
| test_postprocessor.py | 27 | MermaidFixer, OutputCleaner, OutputValidator, pipeline |
| test_prompt_builder.py | 24 | All rules, Mermaid classDef, structural rules, table format rules |
| test_template_loader.py | 14 | Template I/O, clean, compact, recovery mode |
| test_template.py | 33 | Master template structure, sections, placeholders |
| test_formatter.py | 15 | Save, naming, UTF-8, post-processing integration |
| test_scanner.py | 22 | Resources, ScanResult, filtering, errors |
| test_edge_cases.py | 19 | Symlinks, Unicode, nested dirs, realistic dirty output |
| test_tui.py | 112 | Icons, sizing, density, starfield, glitch, matrix rain, waveform, orbital, neon pulse, gradient bar, design system |
| test_model.py | 1 | GPU validation (requires Apple Silicon) |
| test_universal_scanner.py | 1 | Integration smoke test |
| TOTAL | 382 | 100% pass rate |
GitHub Actions runs pytest tests/ on every push to main/master and on pull requests. See .github/workflows/main.yml.
⚠️ Note: test_model.py requires Apple Silicon with a Metal GPU; it is skipped in CI (Ubuntu runner).
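One way a GPU-only test can skip itself on a CI runner is a unittest skip decorator guarded by an availability check. This is a sketch, not the contents of test_model.py: it assumes mlx.core.metal.is_available() as the probe (the same call used in the installation check) and uses importlib.util.find_spec so the module still imports on machines without MLX. The names metal_available and TestMetalGPU are hypothetical.

```python
import importlib.util
import unittest


def metal_available() -> bool:
    """True only when MLX is installed and reports a Metal device."""
    if importlib.util.find_spec("mlx") is None:
        return False  # e.g. an Ubuntu CI runner without MLX
    import mlx.core as mx
    return bool(mx.metal.is_available())


@unittest.skipUnless(metal_available(), "requires Apple Silicon with a Metal GPU")
class TestMetalGPU(unittest.TestCase):
    def test_metal_is_available(self):
        self.assertTrue(metal_available())
```

pytest honours unittest skip decorators, so pytest tests/ would report this class as skipped rather than failed on a machine without Metal.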
| Document | Path | Description |
|---|---|---|
| Test Log | docs/TestLog.md | Complete test execution results with all 196 tests |
| Traceability Matrix | docs/TraceabilityMatrix.md | Maps 53 requirements → implementations → 196 tests |
| Function Explanations | docs/FunctionExplanations.md | Detailed documentation of every class and method |
Change the model in src/model_engine.py:

```python
engine = StarryEngine(model_path="google/gemma-3-4b-it")   # Default
engine = StarryEngine(model_path="google/gemma-3-12b-it")  # Alternative: larger (needs 16 GB+ RAM)
```

Adjust MAX_TOKENS in src/model_engine.py:

```python
MAX_TOKENS = 8192   # Default: full 10-section guide
MAX_TOKENS = 12000  # Alternative: longer, more detailed guides
```

Customize skip patterns in src/scanner.py:

```python
scanner = StarryScanner(skip_patterns={
    "Instructions", ".venv", "__pycache__", ".git",
    ".DS_Store", ".idea", "node_modules",
})
```

- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Commit with clear messages: git commit -m "feat: add X"
- Push to your fork: git push origin feature/my-feature
- Open a Pull Request
Before opening a pull request, format the code and run the suite:

```bash
black src/ main.py tests/
pytest tests/ -v
# All 382 tests should pass
```

```mermaid
graph LR
    classDef default fill:#1a1a1a,stroke:#bc13fe,stroke-width:2px,color:#00f3ff
    subgraph "AI Layer"
        A["Gemma 3 4B-IT"] --> B["MLX Framework"]
        B --> C["Metal GPU"]
    end
    subgraph "Processing Layer"
        D["python-magic"] --> E["StarryScanner"]
        F["PyMuPDF"] --> G["PDF Analyzer"]
        H["Pillow"] --> I["Image Analyzer"]
    end
    subgraph "Safety Layer"
        J["MermaidFixer"] --> K["PostProcessor"]
        L["OutputCleaner"] --> K
        M["OutputValidator"] --> K
    end
    subgraph "Presentation Layer"
        N["Rich"] --> O["Cyberpunk TUI"]
        P["Master Template"] --> Q["Markdown Output"]
    end
    E --> A
    G --> A
    I --> A
    A --> P
    A --> K
    K --> Q
```
| Module | Classes | Responsibility |
|---|---|---|
| scanner.py | UniversalResource, ScanResult, StarryScanner | DFS file discovery, MIME detection, skip filtering, stats |
| template_loader.py | TemplateLoader | Template I/O, cleaning, compaction, recovery mode |
| prompt_builder.py | PromptBuilder | System prompt with all rules (single source of truth) |
| model_engine.py | MimeClassifier, TextExtractor, StarryEngine | MIME classification, universal file reading, LLM orchestration |
| postprocessor.py | MermaidFixer, OutputCleaner, OutputValidator, PostProcessor | Output sanitization pipeline |
| formatter.py | StarryFormatter | Post-process + save to disk + validation |
S T A R R Y N O T E · Knowledge Architecture System · v2.1
Gemma 3 · Apple Silicon · MLX · 382 Tests · 13 Classes
Structured for clarity. Engineered for mastery. Calibrated for you.
Made with ✦ by Nikan Eidi