THE NEXT 80 SUBJECTS

Ranked by the gap between reputation and substrate truth.

Every repo below was selected on the same axiom: the more trust a system has accumulated, the more interesting its confession.

Selection Logic

The first 20 subjects mapped the obvious terrain — the tools everyone uses and nobody reads. The next 80 go deeper: into the protocols beneath the protocols, the agent frameworks being deployed before anyone understands them, the crypto contracts holding billions with 500 lines of Solidity, and the compilers that compile the compilers.

The forensic lens does not change. The targets get harder.

Category 01 — AI Agents & Orchestration (10)

The fastest-moving category in software. Ship velocity and security rigor are inversely correlated. These repos are the blast radius.

#	Repo	Stars	Forensic Angle
21	microsoft/autogen	~35k	Multi-agent trust boundaries — when an agent delegates to another agent, who validates the output?
22	crewAIInc/crewAI	~27k	Role-based agent permissions with no enforcement substrate beneath the abstraction.
23	microsoft/semantic-kernel	~23k	Microsoft's enterprise AI glue layer — the attack surface of every Fortune 500 AI integration.
24	OpenBMB/ChatDev	~26k	Agents writing code that gets executed — the supply chain attack writes itself.
25	joaomdmoura/crewAI-examples	~5k	Examples become templates become production. The assumption baked into the example is the assumption baked into the system.
26	pydantic/pydantic-ai	~8k	Validation as the last line of defense in agent pipelines — what happens when the schema is wrong?
27	BerriAI/litellm	~17k	Universal LLM proxy — a single abstraction layer routing secrets and prompts to every major provider.
28	letta-ai/letta	~14k	Stateful memory agents — persistent state is persistent attack surface.
29	composiohq/composio	~12k	Tool-use substrate for agents — the integration layer that touches OAuth tokens, APIs, and filesystems.
30	e2b-dev/e2b	~7k	Code execution sandboxes for AI agents — the boundary between the model's suggestion and the host OS.
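
Subject 26's question can be made concrete. What follows is a minimal sketch, not pydantic-ai's API: `ReadFileCall`, `validate_shape`, and `validate_policy` are hypothetical names, and the `/workspace` root is an assumption. It shows a tool call that satisfies a shape-only schema while still escaping the directory the agent was meant to stay in.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class ReadFileCall:
    """Hypothetical tool-call schema: the agent asks to read one file."""
    path: str


def validate_shape(raw: dict) -> ReadFileCall:
    # Shape check only: the field exists and has the right type.
    if not isinstance(raw.get("path"), str):
        raise ValueError("path must be a string")
    return ReadFileCall(path=raw["path"])


def validate_policy(call: ReadFileCall, workspace: str = "/workspace") -> ReadFileCall:
    # Policy check: after normalisation the path must stay inside the workspace.
    resolved = posixpath.normpath(posixpath.join(workspace, call.path))
    if resolved != workspace and not resolved.startswith(workspace + "/"):
        raise ValueError(f"path escapes workspace: {resolved}")
    return call


# A model-produced argument that is well-typed and still hostile.
suspicious = validate_shape({"path": "../etc/passwd"})  # accepted by the schema
benign = validate_policy(validate_shape({"path": "notes/todo.txt"}))
```

The shape check accepts `../etc/passwd` because it is a string; only the policy check rejects it. A schema that encodes types but not intent validates the attack.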

Category 02 — LLM Infrastructure & Serving (8)

The metal beneath the model. These repos decide whether a prompt becomes a response or a production incident.

#	Repo	Stars	Forensic Angle
31	ggerganov/llama.cpp	~73k	C++ inference at the edge — memory management, no garbage collector, quantization math nobody re-derives.
32	microsoft/DeepSpeed	~37k	Distributed training orchestration — the assumptions baked into ZeRO that every large model inherits.
33	ray-project/ray	~35k	Distributed compute substrate — the scheduling primitives beneath half of ML infra.
34	openai/triton	~14k	GPU kernel compiler — the layer between PyTorch and CUDA that nobody audits because nobody understands it.
35	lm-sys/FastChat	~37k	The backbone of the LMSYS Chatbot Arena — how benchmarks get gamed starts here.
36	skypilot-org/skypilot	~8k	Multi-cloud LLM job orchestration — the IAM credentials are the target.
37	guidance-ai/guidance	~19k	Structured generation control — constrained decoding as a new attack surface category.
38	unslothai/unsloth	~25k	Fine-tuning acceleration — the assumption that the dataset is clean is load-bearing.
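
The quantization math in subject 31 is short enough to re-derive. Below is a minimal sketch of symmetric absmax quantization over one block. It assumes a generic scheme rather than any specific llama.cpp format, and `quantize_block` and `dequantize_block` are illustrative names.

```python
def quantize_block(values, bits=8):
    """Map floats to signed integers that share one scale (symmetric absmax)."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale, quants):
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quants]


def max_roundtrip_error(values, bits):
    scale, quants = quantize_block(values, bits)
    restored = dequantize_block(scale, quants)
    return scale, max(abs(a - b) for a, b in zip(values, restored))


block = [0.1, -0.5, 0.25, 1.0]              # made-up weights for illustration
scale8, err8 = max_roundtrip_error(block, bits=8)
scale4, err4 = max_roundtrip_error(block, bits=4)
```

Under this scheme the round-trip error per value is bounded by half the scale, and the scale is set by the largest magnitude in the block. One outlier therefore coarsens every other value that shares its block.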

Category 03 — Security: Offensive (5)

Five more tools that made offense cheap. The democratization of capability is a forensic event.

#	Repo	Stars	Forensic Angle
39	bettercap/bettercap	~17k	Network attack Swiss Army knife — what the codebase reveals about the author's threat model.
40	tcpdump/tcpdump	~2.5k	40 years of packet parsing — the CVE history is a geological record of assumptions.
41	gentilkiwi/mimikatz	~19k	The credential extractor that redefined red teaming — every Windows security assumption in one binary.
42	pwndbg/pwndbg	~8k	Exploit development environment — the tools that write the exploits have their own attack surface.
43	hashcat/hashcat	~22k	Password recovery at GPU scale — the assumptions about entropy that the codebase makes explicit.
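
The entropy assumption in subject 43 is plain arithmetic. The sketch below computes keyspace size and entropy for a uniformly random password. The guess rate is a hypothetical round number, not a hashcat benchmark, and the function names are illustrative.

```python
import math


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly this length."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits, valid only if every character is chosen uniformly."""
    return length * math.log2(alphabet_size)


def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return keyspace(alphabet_size, length) / guesses_per_second


ASSUMED_RATE = 1e10  # hypothetical guesses per second, chosen for illustration

lower8 = keyspace(26, 8)                      # eight lowercase letters
lower8_bits = entropy_bits(26, 8)
lower8_seconds = seconds_to_exhaust(26, 8, ASSUMED_RATE)
```

Eight lowercase letters give 208,827,064,576 candidates, about 37.6 bits. The uniformity assumption is the weak point. Human-chosen passwords are not uniform, so the keyspace is an upper bound on the attacker's work.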

Category 04 — Security: Defensive (5)

The blue team's substrate. The tools that are supposed to catch everything.

#	Repo	Stars	Forensic Angle
44	hashicorp/vault	~31k	The secrets manager that holds the keys to every other system — its own key management assumptions are the confession.
45	getsentry/sentry	~39k	Error monitoring at scale — the irony of a system that catches other systems' errors having its own.
46	falcosecurity/falco	~7k	Runtime security via eBPF — the kernel observer that can itself be observed.
47	aquasecurity/trivy	~24k	Vulnerability scanning — the scanner's own dependency chain is the attack vector.
48	openssl/openssl	~25k	The cryptographic substrate of the internet — Heartbleed is the ghost that never leaves this repo.
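
The ghost in subject 48 belongs to a bug class that fits in a few lines. The sketch below is a Python model of a length field trusted without a bounds check, not OpenSSL's code. The `memory` layout, the function names, and the secret string are all invented for illustration.

```python
def heartbeat_trusting(memory: bytes, offset: int, declared_len: int) -> bytes:
    """Echo back declared_len bytes, believing the sender's length field."""
    return memory[offset:offset + declared_len]


def heartbeat_checked(memory: bytes, offset: int, declared_len: int, received_len: int):
    """Echo only if the declared length fits inside what was actually received."""
    if declared_len > received_len:
        return None  # discard the request instead of answering it
    return memory[offset:offset + declared_len]


payload = b"ping"
# Hypothetical process memory: the request payload sits next to unrelated secrets.
memory = payload + b"|session-key=hunter2|"

leaked = heartbeat_trusting(memory, 0, declared_len=24)
safe = heartbeat_checked(memory, 0, declared_len=24, received_len=len(payload))
honest = heartbeat_checked(memory, 0, declared_len=4, received_len=len(payload))
```

The trusting handler returns the four payload bytes plus whatever lies after them. The checked handler compares the claim against the bytes received and drops the request.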

Category 05 — Databases & Storage (8)

Where the data actually lives. The graveyard of every access control assumption.

#	Repo	Stars	Forensic Angle
49	redis/redis	~67k	In-memory by design, persistent by accident — the gap between intended and actual durability guarantees.
50	postgres/postgres	~16k	35 years of SQL — the execution planner's assumptions are load-bearing for every ORM built on top.
51	cockroachdb/cockroach	~30k	Distributed SQL with serializable isolation claims — the CAP theorem confession is always in the edge cases.
52	clickhouse/ClickHouse	~38k	Columnar analytics at scale — the performance claims are the attack surface.
53	qdrant/qdrant	~21k	Vector database — the HNSW index approximation assumptions every RAG system inherits.
54	chroma-core/chroma	~16k	The default vector DB for LangChain demos becoming production systems — default configs as permanent state.
55	apache/cassandra	~8.5k	Eventual consistency by design — the "eventually" is the ghost in every distributed system built on it.
56	milvus-io/milvus	~32k	Enterprise vector DB — the dependency chain between Go, Python, and C++ is the fault line.
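
The approximation in subject 53 can be shown without a full HNSW implementation. This sketch reduces the greedy graph walk that graph-based indexes build on to a single layer with a beam width of one. It is not Qdrant's code, and the points and edges are hand-picked so that the walk gets stuck.

```python
def greedy_search(points, edges, entry, query):
    """Walk the graph, always moving to the neighbour closest to the query."""
    current = entry
    while True:
        best = min(edges[current], key=lambda n: abs(points[n] - query), default=current)
        if abs(points[best] - query) >= abs(points[current] - query):
            return current  # local minimum: no neighbour improves on current
        current = best


def exact_search(points, query):
    """Brute-force scan over every point, for comparison."""
    return min(points, key=lambda n: abs(points[n] - query))


# One-dimensional toy data; "d" is the true nearest neighbour of the query.
points = {"a": 0.0, "b": 4.0, "c": 9.0, "d": 5.0}
edges = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
query = 5.2

approx = greedy_search(points, edges, entry="a", query=query)
exact = exact_search(points, query)
```

Starting from `a`, the walk moves to `b` and stops there, while the exact scan finds `d`. Real indexes narrow this gap with multiple layers and wider candidate lists, but the result remains a recall trade-off rather than a guarantee.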

Category 06 — Developer Toolchain (8)

The tools that build the tools that build the internet. Meta-layer forensics.

#	Repo	Stars	Forensic Angle
57	microsoft/TypeScript	~101k	The type system that 40% of npm depends on — what the compiler assumes about soundness.
58	vitejs/vite	~68k	The build tool that replaced webpack by being fast — speed and correctness are in tension, and the debt is in Rollup interop.
59	oven-sh/bun	~74k	JavaScript runtime written in Zig — compatibility claims vs. substrate reality.
60	denoland/deno	~100k	Node's security model, inverted — the permission system is the thesis, the escape hatches are the confession.
61	webpack/webpack	~65k	10 years of bundler assumptions — the module resolution logic is a fossil record.
62	rust-lang/rust	~99k	Memory safety via ownership — the unsafe blocks are the geological record of where the model breaks down.
63	llvm/llvm-project	~29k	The compiler that compiles the compilers — IR optimizations as a source of undefined behavior archaeology.
64	docker/buildx	~3.5k	Multi-platform build orchestration — the cache poisoning surface nobody thinks about.

Category 07 — Web Frontend (7)

The browser-facing layer. Where billions of users meet billions of lines of assumption.

#	Repo	Stars	Forensic Angle
65	vuejs/core	~47k	Reactivity via Proxy — the difference between the mental model and the actual scheduler is the bug report backlog.
66	sveltejs/svelte	~80k	Compile-time reactivity — the compiler output is what ships, and most developers never read it.
67	angular/angular	~96k	Zone.js as the change detection substrate — the performance assumptions baked into enterprise Angular apps.
68	trpc/trpc	~35k	Type-safe APIs — the gap between TypeScript types and runtime validation.
69	prisma/prisma	~40k	ORM that generates SQL — the query planner does not know what Prisma promised.
70	shadcn-ui/ui	~78k	Copy-paste component library — the trust model is "you own the code," which means you own the debt.
71	tailwindlabs/tailwindcss	~84k	Utility-first CSS — the PostCSS plugin chain is the actual execution environment nobody reads.

Category 08 — Blockchain & Crypto (7)

Immutable systems where the ghost cannot be patched. The confession is permanent.

#	Repo	Stars	Forensic Angle
72	ethereum/go-ethereum	~47k	The reference EVM implementation — every EIP's assumption becomes consensus law.
73	bitcoin/bitcoin	~80k	15 years of hardened C++ — the script interpreter assumptions are load-bearing for $1T of value.
74	solana-labs/solana	~13k	High-throughput consensus — the performance is the design, and the design is the attack surface.
75	OpenZeppelin/openzeppelin-contracts	~25k	The standard library that every smart contract inherits — one assumption flaw, infinite blast radius.
76	Uniswap/v3-core	~4.5k	Concentrated liquidity AMM — the math is correct, the oracle manipulation assumptions are not.
77	foundry-rs/foundry	~8k	Smart contract testing framework — the testing tool's assumptions become the contract's untested assumptions.
78	gakonst/ethers-rs	~3k	Rust Ethereum primitives — the type system encoding of EVM semantics vs. actual EVM behavior.

Category 09 — Communication & Protocol (5)

The channels. Where data moves and identity claims propagate.

#	Repo	Stars	Forensic Angle
79	matrix-org/synapse	~12k	Federated messaging — the trust model between homeservers is the attack surface for the entire network.
80	signalapp/Signal-Android	~26k	The gold standard for encrypted messaging — the gap between protocol correctness and implementation correctness.
81	binwiederhier/ntfy	~19k	Self-hosted push notification server — authentication as an afterthought in the default config.
82	caddyserver/caddy	~58k	Automatic HTTPS web server — the certificate management assumptions, the JSON config attack surface.
83	cloudflare/quiche	~9.5k	QUIC and HTTP/3 implementation — the protocol that replaces TCP has its own assumption inventory.

Category 10 — Systems & Low-Level (7)

The bedrock. The assumptions here propagate upward through every layer.

#	Repo	Stars	Forensic Angle
84	torvalds/linux	~185k	The kernel — not the whole thing, but the eBPF verifier and the scheduler. Two subsystems that touch everything.
85	moby/moby	~68k	Docker's engine — the namespace and cgroup isolation primitives that container security rests on.
86	containerd/containerd	~17k	The container runtime beneath Kubernetes — the OCI spec assumptions made concrete.
87	cilium/cilium	~20k	eBPF-based networking — kernel-level packet processing with no garbage collector and all of the consequences.
88	WebAssembly/wabt	~7k	The WebAssembly binary toolkit — the spec interpreter is what browser implementations diverge from.
89	bytecodealliance/wasmtime	~15k	Production Wasm runtime — the sandbox escape surface is the spec gap.
90	openzfs/zfs	~10k	Copy-on-write filesystem — the data integrity guarantees and the edge cases where they don't hold.

Category 11 — Data & Analytics (5)

The pipelines. Where raw events become decisions. The transformation is the vulnerability.

#	Repo	Stars	Forensic Angle
91	apache/kafka	~28k	Distributed log — the offset management assumptions every stream processing system inherits.
92	apache/airflow	~37k	Workflow orchestration — DAG serialization, the pickle attack surface, the default credential storage.
93	apache/spark	~39k	Distributed data processing — the RDD lineage assumptions and what happens when they're wrong at scale.
94	dbt-labs/dbt-core	~10k	SQL transformation layer — the assumption that your warehouse's SQL dialect matches dbt's model.
95	great-expectations/great_expectations	~10k	Data quality validation — the system that validates data has no external validator.
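
The pickle surface in subject 92 rests on a documented property of the format: loading a pickle can invoke a callable named in the byte stream. The sketch below uses only the standard library. `Payload` is a hypothetical class and the expression is deliberately harmless. It says nothing about how any particular Airflow version is configured.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form asks the loader to call eval()."""

    def __reduce__(self):
        # pickle records (callable, args); the loader later calls callable(*args).
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload. It runs eval("6 * 7") in the loading process.
loaded = pickle.loads(blob)
```

The loader never sees the `Payload` class; it sees a reference to `eval` and an argument tuple. Swap in a different callable and the same mechanism runs whatever the byte stream's author chose, which is why unpickling untrusted data is treated as code execution.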

Category 12 — Cloud Native & GitOps (5)

The operators. The systems that manage systems. Recursive attack surfaces.

#	Repo	Stars	Forensic Angle
96	argoproj/argo-cd	~18k	GitOps delivery — git as the source of truth, and the RBAC model that wraps it.
97	fluxcd/flux2	~6.5k	Continuous delivery operator — the reconciliation loop assumptions and what happens when state diverges.
98	istio/istio	~36k	Service mesh — mTLS as an assumption, the Envoy sidecar as the enforcement point, the config as the attack surface.
99	open-telemetry/opentelemetry-collector	~4.5k	Observability pipeline — the system that sees everything is itself invisible to most security reviews.
100	prometheus/prometheus	~55k	The monitoring system that became the default — the TSDB compaction assumptions, the scrape model's trust boundary.
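
The reconciliation loop in subject 97 reduces to a diff between desired and observed state. This is a minimal sketch under assumed names: `reconcile`, `prune`, and the resource dictionaries are hypothetical, not Flux's API. It shows where divergence has to be decided rather than merely detected.

```python
def reconcile(desired: dict, actual: dict, prune: bool = False) -> list:
    """Return the actions needed to move actual state toward desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    if prune:
        # Resources that exist but were never declared are removed only on request.
        for name in sorted(actual):
            if name not in desired:
                actions.append(("delete", name))
    return actions


# Declared in git (desired) versus observed in the cluster (actual); both invented.
desired = {"api": {"replicas": 3}, "worker": {"replicas": 2}}
actual = {"api": {"replicas": 5}, "debug-shell": {"replicas": 1}}

plan = reconcile(desired, actual)
plan_pruned = reconcile(desired, actual, prune=True)
```

Without pruning, the out-of-band `debug-shell` survives every pass and the loop still reports convergence on everything it was told about. Whether undeclared state is drift or intent is a policy decision that the loop cannot infer.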

The forensic lens does not change with the target. Every codebase has a confession. The only variable is how long it takes to ask the right question.
