Test Fixtures for Multi-Language Verification

Test fixture repositories live in ~/dev/test-fixtures/ and are used to verify semfora-engine across multiple programming languages. All repos are shallow clones made with --depth 1.

Available Fixtures

Rust: tokio (tokio-rs/tokio)

  • Location: ~/dev/test-fixtures/tokio
  • Size on disk: 9.5 MB
  • Index Results:
    • Files found: 808 | Processed: 808 | Errors: 0
    • Modules: 121 | Symbols: 7,857
  • Exercises: Async patterns, trait bounds, macro expansions, unsafe code blocks, complex call graphs

TypeScript/JavaScript: ts-vscode (microsoft/vscode)

  • Location: ~/dev/test-fixtures/ts-vscode
  • Size on disk: 206 MB
  • Index Results:
    • Files found: 8,288 | Processed: 8,283 | Errors: 5 (99.94% success)
    • Modules: 1,926 | Symbols: 89,692
  • Exercises: TypeScript types, large TS codebase, extension architecture, editor patterns

TypeScript/JavaScript: next.js (vercel/next.js)

  • Location: ~/dev/test-fixtures/next.js
  • Size on disk: 334 MB
  • Index Results:
    • Files found: 23,453 | Processed: 23,452 | Errors: 1 (99.996% success)
    • Modules: 10,647 | Symbols: 91,708
  • Exercises: Monorepo structure, JSX/TSX, ES modules, React patterns, mixed JS/TS

Go: kubernetes (kubernetes/kubernetes)

  • Location: ~/dev/test-fixtures/kubernetes
  • Size on disk: 384 MB
  • Index Results:
    • Files found: 23,670 | Processed: 23,670 | Errors: 0
    • Modules: 4,307 | Symbols: 296,919
  • Exercises: Go interfaces, struct embedding, goroutines, large-scale package organization

C: c-linux (torvalds/linux)

  • Location: ~/dev/test-fixtures/c-linux
  • Size on disk: 2.0 GB
  • Index Results:
    • Files found: 72,518 | Processed: 72,518 | Errors: 0
    • Modules: 0 | Symbols: 4,463,505
  • Exercises: C macros, header dependencies, kernel patterns, massive scale
  • Note: Indexing takes 40+ minutes and uses ~16 GB RAM. Modules count is 0 because C files lack a module system — symbols are extracted at file level.

C/C++: llvm-project (llvm/llvm-project)

  • Location: ~/dev/test-fixtures/llvm-project
  • Size on disk: 2.7 GB
  • Index Results:
    • Files found: 72,334 | Processed: 72,320 | Errors: 14 (99.98% success)
    • Modules: 8,273 | Symbols: 1,175,855
  • Exercises: C++ templates, namespaces, header-heavy patterns, compiler infrastructure
  • Note: Indexing takes 15-20 minutes and uses ~5 GB RAM. 14 errors likely from unusual preprocessor constructs.

Java: spring-boot (spring-projects/spring-boot)

  • Location: ~/dev/test-fixtures/spring-boot
  • Size on disk: 110 MB
  • Index Results:
    • Files found: 5,365 | Processed: 5,365 | Errors: 0
    • Modules: 1,683 | Symbols: 42,674
  • Exercises: Java annotations, generics, inheritance hierarchies, Maven/Gradle patterns, dependency injection

Ruby: rails (rails/rails)

  • Location: ~/dev/test-fixtures/rails
  • Size on disk: 61 MB
  • Index Results:
    • Files found: 472 | Processed: 472 | Errors: 0
    • Modules: 117 | Symbols: 966
  • Exercises: Ruby metaprogramming, DSLs, dynamic method generation, module mixins

Mixed: mixed-nickel-rs (nickel-org/nickel.rs)

  • Location: ~/dev/test-fixtures/mixed-nickel-rs
  • Size on disk: 880 KB
  • Index Results:
    • Files found: 82 | Processed: 82 | Errors: 0
    • Modules: 17 | Symbols: 359
  • Exercises: Small multi-language project, quick smoke tests, Rust web framework patterns

Quick Reference

| Repo | Language | Files | Symbols | Errors | Disk |
|---|---|---|---|---|---|
| mixed-nickel-rs | Rust (mixed) | 82 | 359 | 0 | 880 KB |
| rails | Ruby | 472 | 966 | 0 | 61 MB |
| tokio | Rust | 808 | 7,857 | 0 | 9.5 MB |
| spring-boot | Java | 5,365 | 42,674 | 0 | 110 MB |
| ts-vscode | TypeScript | 8,288 | 89,692 | 5 | 206 MB |
| next.js | TypeScript/JS | 23,453 | 91,708 | 1 | 334 MB |
| kubernetes | Go | 23,670 | 296,919 | 0 | 384 MB |
| c-linux | C | 72,518 | 4,463,505 | 0 | 2.0 GB |
| llvm-project | C/C++ | 72,334 | 1,175,855 | 14 | 2.7 GB |

Total disk usage: ~5.8 GB (shallow clones)

Indexing Errors Summary

Across all 9 repos, semfora-engine encountered 20 errors out of 206,990 files (99.99% success rate):

  • next.js: 1 error (23,453 files)
  • ts-vscode: 5 errors (8,288 files)
  • llvm-project: 14 errors (72,334 files)
  • All other repos: 0 errors

No crashes or panics observed during any indexing run.
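The aggregate figures can be recomputed from the per-repo rows. The sketch below is illustrative only: the `FIXTURES` mapping and the `summarize` helper are hypothetical names, and the numbers are copied by hand from the Quick Reference table rather than read from any semfora-engine output.

```python
# Per-repo (files found, errors), copied from the Quick Reference table.
FIXTURES = {
    "mixed-nickel-rs": (82, 0),
    "rails": (472, 0),
    "tokio": (808, 0),
    "spring-boot": (5_365, 0),
    "ts-vscode": (8_288, 5),
    "next.js": (23_453, 1),
    "kubernetes": (23_670, 0),
    "c-linux": (72_518, 0),
    "llvm-project": (72_334, 14),
}


def summarize(fixtures):
    """Return (total_files, total_errors, success_percent)."""
    total_files = sum(files for files, _ in fixtures.values())
    total_errors = sum(errors for _, errors in fixtures.values())
    success = 100.0 * (total_files - total_errors) / total_files
    return total_files, total_errors, success


if __name__ == "__main__":
    files, errors, success = summarize(FIXTURES)
    print(f"{errors} errors / {files:,} files ({success:.2f}% success)")
```

Re-running this after updating a row is a quick way to keep the summary line consistent with the table.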

Usage Guidelines

Quick Smoke Tests

For rapid verification, use the smallest repos:

  • mixed-nickel-rs (82 files) — Rust, under 1 second
  • rails (472 files) — Ruby
  • tokio (808 files) — Rust

Standard Cross-Language Check

Test changes against at least 2-3 repos from different language families:

  • tokio (Rust) + kubernetes (Go) + spring-boot (Java) — good default set

Stress Testing

For performance and scale verification:

  • kubernetes (~297K symbols) — large Go project
  • c-linux (~4.5M symbols) — extreme scale, 40+ min indexing, ~16 GB RAM
  • llvm-project (~1.2M symbols) — large C/C++, 15-20 min indexing, ~5 GB RAM

Regenerating Indexes

Indexes are stored in ~/.cache/semfora/ (hashed by project path), not inside the repos.

```shell
cd ~/dev/test-fixtures/<repo-name>
semfora-engine index generate
```
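To regenerate every fixture in one pass, the same command can be run from each repo directory in turn. This is a minimal sketch, not part of semfora-engine: `regenerate_all` and its `run` parameter are hypothetical, and it assumes every subdirectory of the fixtures root is a fixture repo.

```python
import subprocess
from pathlib import Path

# The index command documented in this file.
INDEX_CMD = ["semfora-engine", "index", "generate"]


def regenerate_all(root, run=subprocess.run):
    """Run the index command inside every directory directly under root.

    Returns the names of the repos processed, in sorted order.
    `run` is injectable so the loop can be exercised without the real binary.
    """
    root = Path(root).expanduser()
    processed = []
    for repo in sorted(p for p in root.iterdir() if p.is_dir()):
        # check=True raises CalledProcessError if the command exits non-zero.
        run(INDEX_CMD, cwd=repo, check=True)
        processed.append(repo.name)
    return processed


if __name__ == "__main__":
    print(regenerate_all("~/dev/test-fixtures"))
```

A full pass includes c-linux and llvm-project, so expect it to take over an hour and need roughly 16 GB of RAM at peak, based on the per-repo notes in this document.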

Adding New Fixtures

  1. Clone to ~/dev/test-fixtures/ with --depth 1
  2. Run semfora-engine index generate and capture output
  3. Update this file with actual stats from the indexing run
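The first two steps could be scripted roughly as follows. This is a sketch under assumptions: `add_fixture` is a hypothetical helper, and it assumes the upstream repo is hosted on GitHub and named by an `owner/repo` slug, as the fixtures listed in this document are. Step 3 remains manual, since the exact output format of the indexer is not described here.

```python
import subprocess
from pathlib import Path


def add_fixture(slug, name, root="~/dev/test-fixtures", run=subprocess.run):
    """Shallow-clone a repo, index it, and return the indexer's stdout.

    slug: "owner/repo" string (the GitHub URL form is an assumption).
    name: directory name to use under the fixtures root.
    """
    dest = Path(root).expanduser() / name
    url = f"https://github.com/{slug}.git"
    # Step 1: shallow clone, matching how the other fixtures were created.
    run(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    # Step 2: index from inside the repo and capture the output so the
    # stats can be copied into this file by hand.
    result = run(
        ["semfora-engine", "index", "generate"],
        cwd=dest,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(add_fixture("tokio-rs/tokio", "tokio"))
```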

Language Coverage

| Language | Repos | Status |
|---|---|---|
| Rust | tokio, mixed-nickel-rs | ✅ Indexed, 0 errors |
| TypeScript/JS | ts-vscode, next.js | ✅ Indexed, 6 errors total |
| Go | kubernetes | ✅ Indexed, 0 errors |
| C | c-linux | ✅ Indexed, 0 errors |
| C/C++ | llvm-project | ✅ Indexed, 14 errors |
| Java | spring-boot | ✅ Indexed, 0 errors |
| Ruby | rails | ✅ Indexed, 0 errors |
| Python | (use existing pytorch fixture) | |

Verification Checklist

Before submitting PRs, developers should verify:

  • Changes tested against at least 2-3 repos from different language families
  • No new indexing errors introduced
  • Symbol counts remain stable (±1% acceptable variance)
  • No crashes or panics during indexing
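The error and symbol-count items can be checked mechanically once the before and after stats are written down. The sketch below is one possible way to do that; `symbols_stable` and `check_run` are hypothetical helpers, and the stats are assumed to be entered by hand as `{repo: (symbols, errors)}` mappings.

```python
def symbols_stable(baseline, current, tolerance=0.01):
    """True if current is within +/- tolerance (as a fraction) of baseline."""
    if baseline == 0:
        return current == 0
    return abs(current - baseline) <= tolerance * baseline


def check_run(baseline, current):
    """Compare two {repo: (symbols, errors)} mappings.

    Returns a list of human-readable problems; an empty list means the
    error and symbol-count checklist items pass for the repos in `current`.
    """
    problems = []
    for repo, (base_syms, base_errs) in baseline.items():
        if repo not in current:
            continue  # repo was not part of this run
        syms, errs = current[repo]
        if errs > base_errs:
            problems.append(f"{repo}: errors rose from {base_errs} to {errs}")
        if not symbols_stable(base_syms, syms):
            problems.append(f"{repo}: symbols changed from {base_syms} to {syms}")
    return problems


if __name__ == "__main__":
    baseline = {"tokio": (7_857, 0), "kubernetes": (296_919, 0)}
    current = {"tokio": (7_860, 0), "kubernetes": (296_919, 0)}
    print(check_run(baseline, current) or "all checks passed")
```

Crashes and panics are not covered by this comparison; they would show up as a failed indexing run before any stats exist.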