This document covers the public library API exported from
code-dna/lib. For CLI usage, see the README. For MCP server usage, see MCP.md.
```bash
npm install code-dna
```

Requires Node.js 20 or later.
Import from the library entry point:
```typescript
import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';
```

Run the full 4-layer code-dna analysis pipeline on a project directory.

```typescript
async function analyze(
  rootPath: string,
  options: Partial<AnalysisOptions>,
): Promise<DnaOutput>
```

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `rootPath` | `string` | Absolute or relative path to the project root to analyse. Resolved to an absolute path internally. |
| `options` | `Partial<AnalysisOptions>` | Analysis configuration. All fields are optional; omitted fields use defaults or values from `.codedna.yaml`. |

Returns: `Promise<DnaOutput>`, a fully populated DNA output object ready for formatting.
Example:
```typescript
import { analyze } from 'code-dna/lib';

const dna = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  tokenBudget: 8000,
  gitDepth: 500,
});
console.log(dna.metadata.projectName);
console.log(dna.metadata.totalFiles);
```

Layer selection:
```typescript
// Structural skeleton only (no git, no patterns, no risk)
const skeletonOnly = await analyze('/path/to/project', { layers: [1] });

// Skip git archaeology
const withoutGit = await analyze('/path/to/project', { layers: [1, 3, 4] });

// Language filter
const filtered = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  languages: ['typescript', 'python'],
});
```

Error handling:
Layer 2 (git) failures do not abort the pipeline. If git history cannot be read, the analysis continues with empty git results. Other errors reject the promise.
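```typescript
try {
  const dna = await analyze('/path/to/project', {});
} catch (err) {
  // `err` is `unknown` under strict TypeScript, so narrow it before reading `.message`
  console.error('Analysis failed:', err instanceof Error ? err.message : err);
}
```

Format a `DnaOutput` as Markdown using an explicit token budget allocation.

```typescript
function formatMarkdown(dna: DnaOutput, budget: SectionBudget): string
```

Parameters:
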
| Parameter | Type | Description |
|---|---|---|
| `dna` | `DnaOutput` | Fully populated DNA output from `analyze()`. |
| `budget` | `SectionBudget` | Token allocation per section. Use `allocateTokenBudget()` to compute this. |

Returns: `string`, a UTF-8 Markdown document.
Example:
```typescript
import { analyze, formatMarkdown, allocateTokenBudget } from 'code-dna/lib';

const dna = await analyze('/path/to/project', {});
const budget = allocateTokenBudget(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15, conventions: 15,
    hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  {
    architecture: dna.architecture.layers.length * 50 + 100,
    moduleMap: (dna.skeleton.root.children?.length ?? 1) * 40,
    dependencies: dna.dependencies.edges.length * 30,
    conventions: 200,
    hotFiles: dna.hotFiles.length * 60,
    riskSurface: dna.riskSurface.length * 70,
    apiSurface: dna.apiSurface.length * 50,
  },
);
const markdown = formatMarkdown(dna, budget);
```

Format a `DnaOutput` as YAML. Unlike Markdown, YAML output includes the full dataset with no token-based truncation.

```typescript
function formatYaml(dna: DnaOutput): string
```

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `dna` | `DnaOutput` | Fully populated DNA output from `analyze()`. |

Returns: `string`, a UTF-8 YAML document suitable for machine consumption or as input to diff.
Example:
```typescript
import { analyze, formatYaml } from 'code-dna/lib';
import fs from 'node:fs';

const dna = await analyze('/path/to/project', {});
const yaml = formatYaml(dna);
fs.writeFileSync('CODEBASE-DNA.yaml', yaml, 'utf8');
```

Compare two `DnaOutput` snapshots and return a structured diff.

```typescript
function computeDnaDiff(oldDna: DnaOutput, newDna: DnaOutput): DnaDiff
```

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `oldDna` | `DnaOutput` | The baseline DNA snapshot. |
| `newDna` | `DnaOutput` | The newer DNA snapshot to compare against the baseline. |

Returns: `DnaDiff`, a structured diff covering file changes, symbol changes, dependency changes, risk movements, and convention shifts.
Example:
```typescript
import { analyze } from 'code-dna/lib';
import { computeDnaDiff } from 'code-dna/dist/core/diff-engine.js';

const before = await analyze('/path/to/project', {});
// ... time passes, code changes ...
const after = await analyze('/path/to/project', {});
const diff = computeDnaDiff(before, after);
console.log(`Files added: ${diff.summary.filesAdded}`);
console.log(`Symbols removed: ${diff.summary.symbolsRemoved}`);
```

Note: `computeDnaDiff` is exported from `src/core/diff-engine.ts`, not from `src/lib.ts`, so it cannot be imported from `code-dna/lib`. Import it directly:

```typescript
import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';
```

Format a `DnaDiff` as a Markdown report suitable for human review or inclusion in a pull-request description.

```typescript
function formatDnaDiff(diff: DnaDiff): string
```

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `diff` | `DnaDiff` | Diff produced by `computeDnaDiff()`. |

Returns: `string`, a UTF-8 Markdown diff report with sections for file changes, symbol changes, dependency changes, risk surface movements, and convention changes.
Example:
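```typescript
import fs from 'node:fs';
import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';

// `before` and `after` are DnaOutput snapshots returned by analyze()
const diff = computeDnaDiff(before, after);
const report = formatDnaDiff(diff);
fs.writeFileSync('DNA-DIFF.md', report, 'utf8');
```

Distribute a total token budget across DNA output sections, proportional to configured weights and estimated data sizes.

```typescript
function allocateTokenBudget(
  totalBudget: number,
  weights: Record<string, number>,
  dataSizes: Record<string, number>,
): SectionBudget
```

Parameters:
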
| Parameter | Type | Description |
|---|---|---|
| `totalBudget` | `number` | Total token count to distribute. |
| `weights` | `Record<string, number>` | Relative weight per section (higher = more tokens). |
| `dataSizes` | `Record<string, number>` | Estimated raw data size per section (used to avoid over-allocating to sparse sections). |

Returns: `SectionBudget`, the token allocation per section. Each value is at least 200 tokens and at most 40% of `totalBudget`.
Constraints:
- Minimum per section: 200 tokens
- Maximum per section: 40% of total budget
- Truncation order when budget is tight (lowest priority first): `apiSurface`, `hotFiles`, `riskSurface`, `dependencies`, `conventions`, `moduleMap`, `architecture`
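
To make the floor and ceiling concrete, here is a minimal sketch that assumes plain proportional allocation by weight followed by clamping. The function name `sketchAllocate` is hypothetical, and this is not code-dna's actual algorithm: it ignores `dataSizes` and the truncation order entirely.

```typescript
// Hypothetical illustration of the documented 200-token floor and 40% ceiling.
// NOT the library's implementation: dataSizes and truncation order are ignored.
function sketchAllocate(
  totalBudget: number,
  weights: Record<string, number>,
): Record<string, number> {
  const sections = Object.keys(weights);
  const totalWeight = sections.reduce((sum, s) => sum + weights[s], 0);
  if (totalWeight <= 0) {
    throw new Error('weights must sum to a positive number');
  }
  const floor = 200;                             // documented minimum per section
  const ceiling = Math.floor(totalBudget * 0.4); // documented maximum per section
  const result: Record<string, number> = {};
  for (const section of sections) {
    // Proportional share first, then clamp into [floor, ceiling].
    const share = Math.floor((totalBudget * weights[section]) / totalWeight);
    result[section] = Math.min(ceiling, Math.max(floor, share));
  }
  return result;
}

const sketchBudget = sketchAllocate(8000, {
  architecture: 15, moduleMap: 25, dependencies: 15, conventions: 15,
  hotFiles: 10, riskSurface: 10, apiSurface: 5,
});
console.log(sketchBudget.moduleMap, sketchBudget.apiSurface);
```

Clamped values in this sketch need not sum to `totalBudget`; a real allocator would have to redistribute whatever the clamps add or remove.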
Example:
```typescript
const budget = allocateTokenBudget(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15,
    conventions: 15, hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  { architecture: 300, moduleMap: 1200, dependencies: 600,
    conventions: 200, hotFiles: 400, riskSurface: 350, apiSurface: 150 },
);
// budget.moduleMap will be larger than budget.apiSurface
```

Estimate the token count of a string using a simple character-count heuristic (approximately 4 characters per token).

```typescript
function estimateTokens(text: string): number
```

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `text` | `string` | The text to estimate. |

Returns: `number`, the estimated token count.
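
As a rough illustration of the 4-characters-per-token heuristic, the sketch below uses a hypothetical stand-in named `sketchEstimateTokens`. Rounding up is an assumption; the library may round differently.

```typescript
// Hypothetical stand-in for estimateTokens(); the rounding rule is an assumption.
function sketchEstimateTokens(text: string): number {
  const charsPerToken = 4; // heuristic stated in the description above
  return Math.ceil(text.length / charsPerToken);
}

// 18 characters, so the sketch reports 5 tokens.
console.log(sketchEstimateTokens('const answer = 42;'));
```

Note that `text.length` counts UTF-16 code units, so text heavy in emoji or other astral characters would be estimated slightly high by this sketch.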
Options passed to the top-level analyze() function.
```typescript
interface AnalysisOptions {
  path: string;
  format: 'md' | 'yaml';
  output?: string;
  layers: number[];
  languages?: string[];
  tokenBudget: number;
  gitDepth?: number;
  scope?: string;
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `path` | `string` | none | Absolute or relative path to the project root. |
| `format` | `'md' \| 'yaml'` | `'md'` | Output format. |
| `output` | `string?` | `undefined` | Destination file path. `undefined` means stdout. |
| `layers` | `number[]` | `[1, 2, 3, 4]` | Which pipeline layers to execute. |
| `languages` | `string[]?` | all | Restrict to these language IDs, e.g. `['typescript', 'python']`. |
| `tokenBudget` | `number` | `8000` | Target token count for Markdown output. |
| `gitDepth` | `number?` | `1000` | Maximum commits to traverse during git archaeology. |
| `scope` | `string?` | `undefined` | Subdirectory to scope the analysis to (relative to `path`). |

All fields are optional when passed to analyze(), which accepts Partial<AnalysisOptions>.
The fully-populated DNA output produced by the core engine after all layers have run.
```typescript
interface DnaOutput {
  metadata: GenomeHeader;
  skeleton: ModuleMap;
  dependencies: DependencyGraph;
  conventions: ConventionReport;
  hotFiles: HotFile[];
  riskSurface: RiskEntry[];
  apiSurface: ApiEntry[];
  architecture: ArchitectureReport;
}
```

| Field | Type | Description |
|---|---|---|
| `metadata` | `GenomeHeader` | Project metadata and aggregate statistics. |
| `skeleton` | `ModuleMap` | Tree-shaped module structure with symbols. |
| `dependencies` | `DependencyGraph` | Import/export dependency graph. |
| `conventions` | `ConventionReport` | Detected coding conventions. |
| `hotFiles` | `HotFile[]` | Files with highest commit and author churn. |
| `riskSurface` | `RiskEntry[]` | Risk-ranked files with scores and factor breakdowns. |
| `apiSurface` | `ApiEntry[]` | Public API surface: exported symbols and HTTP endpoints. |
| `architecture` | `ArchitectureReport` | Architectural style and framework detection. |

Project-level metadata at the top of the DNA output.
```typescript
interface GenomeHeader {
  projectName: string;
  version: string;
  generatedAt: string;
  gitRemote?: string;
  branch?: string;
  commitSha?: string;
  languages: LanguageStat[];
  totalFiles: number;
  totalLoc: number;
}
```

| Field | Type | Description |
|---|---|---|
| `projectName` | `string` | Project name from `package.json` or directory name. |
| `version` | `string` | Semver version of the code-dna tool that produced this output. |
| `generatedAt` | `string` | ISO 8601 timestamp. |
| `gitRemote` | `string?` | Git remote URL if available. |
| `branch` | `string?` | Active git branch at analysis time. |
| `commitSha` | `string?` | HEAD commit SHA at analysis time. |
| `languages` | `LanguageStat[]` | Language breakdown sorted by file count descending. |
| `totalFiles` | `number` | Total source files analysed (unknown-language files excluded). |
| `totalLoc` | `number` | Total lines of code across all analysed files. |

Per-language aggregate statistics.
```typescript
interface LanguageStat {
  id: string;
  files: number;
  loc: number;
  percentage: number;
}
```

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Language identifier, e.g. `'typescript'`, `'python'`. |
| `files` | `number` | Number of source files in this language. |
| `loc` | `number` | Total lines of code in this language. |
| `percentage` | `number` | Percentage of total project LOC (0–100). |

Entry point for the tree-shaped structural skeleton.
```typescript
interface ModuleMap {
  root: ModuleNode;
}
```

A single node in the module tree: either a directory or a source file.

```typescript
interface ModuleNode {
  name: string;
  path: string;
  type: 'directory' | 'file';
  role?: FileRole;
  language?: string;
  loc?: number;
  symbols?: SymbolEntry[];
  children?: ModuleNode[];
}
```

| Field | Type | Description |
|---|---|---|
| `name` | `string` | Base name of the file or directory. |
| `path` | `string` | Full path relative to the project root. |
| `type` | `'directory' \| 'file'` | Node type. |
| `role` | `FileRole?` | Inferred semantic role (file nodes only). |
| `language` | `string?` | Programming language of the file. |
| `loc` | `number?` | Lines of code (file nodes only). |
| `symbols` | `SymbolEntry[]?` | Top-level exported symbols (file nodes only). |
| `children` | `ModuleNode[]?` | Child nodes (directory nodes only). |

FileRole values: controller, route, handler, service, usecase, model, entity, schema, repository, dao, middleware, util, helper, lib, test, spec, config, env, migration, type, interface, component, page, layout, entry, main, unknown.
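
Because the skeleton is a tree, most consumers will want to flatten it at some point. The sketch below shows one way to do that with a depth-first walk. `SketchModuleNode` is a trimmed-down local mirror of the documented shape and `collectFiles` is a hypothetical helper; neither is exported by code-dna, and the sample tree is hand-written rather than real analyzer output.

```typescript
// Local mirror of the documented ModuleNode shape (only the fields used here).
interface SketchModuleNode {
  name: string;
  path: string;
  type: 'directory' | 'file';
  role?: string;
  loc?: number;
  children?: SketchModuleNode[];
}

// Depth-first walk that returns every file node at or below `node`.
function collectFiles(node: SketchModuleNode): SketchModuleNode[] {
  if (node.type === 'file') {
    return [node];
  }
  let files: SketchModuleNode[] = [];
  for (const child of node.children ?? []) {
    files = files.concat(collectFiles(child));
  }
  return files;
}

// Hand-written sample tree for illustration.
const sampleRoot: SketchModuleNode = {
  name: 'demo', path: '.', type: 'directory',
  children: [
    { name: 'index.ts', path: 'index.ts', type: 'file', role: 'entry', loc: 12 },
    {
      name: 'services', path: 'services', type: 'directory',
      children: [
        { name: 'user.ts', path: 'services/user.ts', type: 'file', role: 'service', loc: 80 },
      ],
    },
  ],
};

const sampleFiles = collectFiles(sampleRoot);
const servicePaths = sampleFiles
  .filter((f) => f.role === 'service')
  .map((f) => f.path);
const sampleLoc = sampleFiles.reduce((sum, f) => sum + (f.loc ?? 0), 0);
console.log(sampleFiles.length, servicePaths, sampleLoc);
```

With a real result, the same walk would presumably start from `dna.skeleton.root`.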
A named symbol extracted from a source file.
```typescript
interface SymbolEntry {
  name: string;
  kind: 'function' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'method';
  exported: boolean;
  signature?: string;
  decorators?: string[];
  loc: { start: number; end: number };
}
```

| Field | Type | Description |
|---|---|---|
| `name` | `string` | Identifier as written in source. |
| `kind` | `string` | Category of the symbol. |
| `exported` | `boolean` | Whether the symbol is exported from its module. |
| `signature` | `string?` | Human-readable type signature, e.g. `(req: Request) => void`. |
| `decorators` | `string[]?` | Decorator names applied to the symbol, e.g. `['Controller', 'Get']`. |
| `loc` | `{ start: number; end: number }` | Source line range (1-indexed). |

Import/export relationships between all source files.
```typescript
interface DependencyGraph {
  nodes: DependencyNode[];
  edges: DependencyEdge[];
  circularDeps: string[][];
}
```

| Field | Type | Description |
|---|---|---|
| `nodes` | `DependencyNode[]` | One node per file that participates in the import graph. |
| `edges` | `DependencyEdge[]` | Directed import edges. |
| `circularDeps` | `string[][]` | Groups of file paths that form circular dependency cycles. |

A single file in the dependency graph with degree metrics.
```typescript
interface DependencyNode {
  path: string;
  inDegree: number;
  outDegree: number;
}
```

| Field | Type | Description |
|---|---|---|
| `path` | `string` | File path relative to the project root. |
| `inDegree` | `number` | Number of other files that import this file (fan-in). |
| `outDegree` | `number` | Number of files this file imports (fan-out). |

A directed edge in the dependency graph.
```typescript
interface DependencyEdge {
  from: string;
  to: string;
  type: 'import' | 'reexport' | 'dynamic';
}
```

| Field | Type | Description |
|---|---|---|
| `from` | `string` | Importing file path. |
| `to` | `string` | Imported module path (resolved to a project file where possible). |
| `type` | `string` | How the dependency was introduced. |

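
The relationship between edges and the fan-in/fan-out metrics can be sketched as follows. `SketchEdge`, `SketchDegree`, and `degreesFromEdges` are hypothetical local names, not library exports. The sketch assumes each edge counts once, so duplicate edges between the same pair of files would inflate the counts; whether code-dna deduplicates is not stated here.

```typescript
// Local mirrors of the documented DependencyEdge and DependencyNode shapes.
interface SketchEdge {
  from: string;
  to: string;
  type: 'import' | 'reexport' | 'dynamic';
}
interface SketchDegree {
  path: string;
  inDegree: number;
  outDegree: number;
}

// Derive fan-in and fan-out per file from a list of directed edges,
// sorted so the most-imported files come first.
function degreesFromEdges(edges: SketchEdge[]): SketchDegree[] {
  // Null-prototype object so odd path names cannot collide with built-ins.
  const byPath: Record<string, SketchDegree> = Object.create(null);
  const touch = (path: string): SketchDegree => {
    if (!byPath[path]) {
      byPath[path] = { path, inDegree: 0, outDegree: 0 };
    }
    return byPath[path];
  };
  for (const edge of edges) {
    touch(edge.from).outDegree += 1;
    touch(edge.to).inDegree += 1;
  }
  return Object.keys(byPath)
    .map((p) => byPath[p])
    .sort((a, b) => b.inDegree - a.inDegree);
}

// Hand-written sample edges for illustration.
const sampleEdges: SketchEdge[] = [
  { from: 'a.ts', to: 'c.ts', type: 'import' },
  { from: 'b.ts', to: 'c.ts', type: 'import' },
  { from: 'c.ts', to: 'd.ts', type: 'dynamic' },
];
const sampleDegrees = degreesFromEdges(sampleEdges);
console.log(sampleDegrees[0]);
```

Files with high fan-in are the ones whose changes ripple furthest, which is why they are a natural first stop when reading `dependencies.nodes`.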
Coding conventions observed across the project.
```typescript
interface ConventionReport {
  naming: NamingConvention;
  fileOrganization: string;
  importStyle: ImportStyle;
  errorHandling: string;
  testPattern: string;
  exportStyle: string;
  confidence: number;
}
```

| Field | Type | Description |
|---|---|---|
| `naming` | `NamingConvention` | Identifier and file naming conventions. |
| `fileOrganization` | `string` | One of: `'by-feature'`, `'by-layer'`, `'by-type'`, `'hybrid'`. |
| `importStyle` | `ImportStyle` | Import ordering and path style. |
| `errorHandling` | `string` | One of: `'try-catch'`, `'result-type'`, `'callback'`, `'mixed'`. |
| `testPattern` | `string` | One of: `'co-located'`, `'mirror-directory'`, `'flat'`, `'mixed'`. |
| `exportStyle` | `string` | One of: `'named'`, `'default'`, `'barrel'`, `'mixed'`. |
| `confidence` | `number` | Confidence score for the detected conventions (0–1). |

NamingConvention fields: files, functions, classes, variables, constants — each a human-readable style name (e.g. 'camelCase', 'kebab-case', 'PascalCase', 'snake_case').
ImportStyle fields: ordering (one of 'external-first', 'internal-first', 'alphabetical', 'unordered'), pathStyle (one of 'relative', 'absolute', 'alias', 'mixed').
A file identified as a churn hotspot via git archaeology.
```typescript
interface HotFile {
  path: string;
  commitCount: number;
  authorCount: number;
  lastModified: string;
  reason: string;
}
```

| Field | Type | Description |
|---|---|---|
| `path` | `string` | File path relative to the project root. |
| `commitCount` | `number` | Total commits touching this file. |
| `authorCount` | `number` | Number of distinct authors. |
| `lastModified` | `string` | ISO 8601 timestamp of the most recent commit. |
| `reason` | `string` | Human-readable explanation of why this file is considered hot. |

A file ranked by its composite risk score.
```typescript
interface RiskEntry {
  path: string;
  score: number;
  factors: RiskFactor[];
}
```

| Field | Type | Description |
|---|---|---|
| `path` | `string` | File path relative to the project root. |
| `score` | `number` | Composite risk score (0–100). Higher is riskier to change. |
| `factors` | `RiskFactor[]` | Individual factors contributing to the score. |

A single dimension of risk for a file.
```typescript
interface RiskFactor {
  type: 'high-centrality' | 'low-test-coverage' | 'high-churn' | 'many-authors' | 'high-complexity';
  value: number;
  description: string;
}
```

| Field | Type | Description |
|---|---|---|
| `type` | `string` | Which risk dimension this factor measures. |
| `value` | `number` | Normalised numeric value for this factor. |
| `description` | `string` | Human-readable explanation. |

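
One plausible use of the risk surface is a triage list: keep only files above a score threshold and report the factor that contributes most. The sketch below assumes factor values are comparable across dimensions (they are described as normalised, but the exact scale is not given). The names `topRisks`, `SketchRiskEntry`, and `SketchRiskFactor`, the threshold of 70, and the sample data are all hypothetical.

```typescript
// Local mirrors of the documented RiskFactor and RiskEntry shapes.
interface SketchRiskFactor {
  type: string;
  value: number;
  description: string;
}
interface SketchRiskEntry {
  path: string;
  score: number;
  factors: SketchRiskFactor[];
}

// Entries at or above `threshold`, highest score first, each paired with the
// type of its largest factor ('none' when the entry lists no factors).
function topRisks(entries: SketchRiskEntry[], threshold: number) {
  return entries
    .filter((e) => e.score >= threshold) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .map((e) => {
      const dominant = e.factors.reduce<SketchRiskFactor | undefined>(
        (best, f) => (best === undefined || f.value > best.value ? f : best),
        undefined,
      );
      return { path: e.path, score: e.score, dominant: dominant ? dominant.type : 'none' };
    });
}

// Hand-written sample data for illustration.
const sampleRisks: SketchRiskEntry[] = [
  { path: 'src/a.ts', score: 40, factors: [] },
  {
    path: 'src/core/engine.ts', score: 85,
    factors: [
      { type: 'high-churn', value: 0.6, description: 'changed often' },
      { type: 'high-centrality', value: 0.9, description: 'imported widely' },
    ],
  },
  { path: 'src/b.ts', score: 72, factors: [] },
];
const triage = topRisks(sampleRisks, 70);
console.log(triage);
```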
A single entry on the public API surface.
```typescript
interface ApiEntry {
  path: string;
  symbol: string;
  kind: 'endpoint' | 'export' | 'public-method';
  signature: string;
  httpMethod?: string;
  route?: string;
}
```

| Field | Type | Description |
|---|---|---|
| `path` | `string` | File declaring this API entry. |
| `symbol` | `string` | Symbol name as declared in source. |
| `kind` | `string` | Category of API entry. |
| `signature` | `string` | Type signature of the symbol. |
| `httpMethod` | `string?` | HTTP method for endpoint entries (e.g. `'GET'`, `'POST'`). |
| `route` | `string?` | URL route pattern for endpoint entries (e.g. `'/users/:id'`). |

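
Since `httpMethod` and `route` are only meaningful for endpoint entries, a consumer will typically filter on `kind` first. A minimal sketch, using a hypothetical helper `listEndpoints`, a local mirror type `SketchApiEntry`, and hand-written sample entries:

```typescript
// Local mirror of the documented ApiEntry shape.
interface SketchApiEntry {
  path: string;
  symbol: string;
  kind: 'endpoint' | 'export' | 'public-method';
  signature: string;
  httpMethod?: string;
  route?: string;
}

// One "METHOD route -> symbol" line per endpoint entry, sorted alphabetically.
function listEndpoints(entries: SketchApiEntry[]): string[] {
  return entries
    .filter((e) => e.kind === 'endpoint')
    .map((e) => `${e.httpMethod ?? '?'} ${e.route ?? '?'} -> ${e.symbol}`)
    .sort();
}

// Hand-written sample data for illustration.
const sampleApi: SketchApiEntry[] = [
  { path: 'src/lib.ts', symbol: 'analyze', kind: 'export', signature: '(rootPath: string) => Promise<unknown>' },
  { path: 'src/users.ts', symbol: 'createUser', kind: 'endpoint', signature: '(req: Request) => void', httpMethod: 'POST', route: '/users' },
  { path: 'src/users.ts', symbol: 'getUser', kind: 'endpoint', signature: '(req: Request) => void', httpMethod: 'GET', route: '/users/:id' },
];
const endpointLines = listEndpoints(sampleApi);
console.log(endpointLines.join('\n'));
```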
High-level architectural characterisation of the project.
```typescript
interface ArchitectureReport {
  style: string;
  framework: FrameworkInfo;
  layers: LayerInfo[];
  confidence: number;
}
```

| Field | Type | Description |
|---|---|---|
| `style` | `string` | One of: `'mvc'`, `'hexagonal'`, `'layered'`, `'event-driven'`, `'monolith'`, `'unknown'`. |
| `framework` | `FrameworkInfo` | Detected framework and detection evidence. |
| `layers` | `LayerInfo[]` | Logical layers found in the project. |
| `confidence` | `number` | Confidence score for the architecture detection (0–1). |

FrameworkInfo fields: name (e.g. 'nextjs', 'express', 'fastapi', 'spring-boot', 'nestjs'), version?, markers: string[] (evidence files/packages).
LayerInfo fields: name, directories: string[], fileCount: number, role: string.
The full structural diff between two DnaOutput snapshots, produced by computeDnaDiff().
```typescript
interface DnaDiff {
  summary: DnaDiffSummary;
  fileChanges: {
    added: string[];
    removed: string[];
    modified: string[];
  };
  symbolChanges: {
    added: Array<{ name: string; kind: string; file: string; line: number }>;
    removed: Array<{ name: string; kind: string; file: string }>;
  };
  dependencyChanges: {
    added: string[];
    removed: string[];
  };
  riskChanges: RiskChange[];
  conventionChanges: string[];
}
```

`DnaDiffSummary` fields: `filesAdded`, `filesRemoved`, `filesModified`, `symbolsAdded`, `symbolsRemoved`, `riskIncreased`, `riskDecreased`, all of type `number`.

RiskChange fields: file: string, oldScore: number, newScore: number, direction: 'increased' | 'decreased'.
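
A diff like this lends itself to a review or CI gate. The sketch below picks out files whose risk score rose by at least a given amount. `riskRegressions` and `SketchRiskChange` are hypothetical local names, and the 10-point threshold and sample scores are illustrative only.

```typescript
// Local mirror of the documented RiskChange shape.
interface SketchRiskChange {
  file: string;
  oldScore: number;
  newScore: number;
  direction: 'increased' | 'decreased';
}

// Changes whose score rose by at least `minDelta`, largest increase first.
function riskRegressions(changes: SketchRiskChange[], minDelta: number): SketchRiskChange[] {
  return changes
    .filter((c) => c.direction === 'increased' && c.newScore - c.oldScore >= minDelta)
    .sort((a, b) => (b.newScore - b.oldScore) - (a.newScore - a.oldScore));
}

// Hand-written sample data for illustration.
const sampleChanges: SketchRiskChange[] = [
  { file: 'src/a.ts', oldScore: 40, newScore: 45, direction: 'increased' },
  { file: 'src/b.ts', oldScore: 50, newScore: 75, direction: 'increased' },
  { file: 'src/c.ts', oldScore: 80, newScore: 60, direction: 'decreased' },
];
const regressions = riskRegressions(sampleChanges, 10);
for (const r of regressions) {
  console.log(`${r.file}: ${r.oldScore} -> ${r.newScore}`);
}
```

With a real diff, the input would presumably be `diff.riskChanges` from `computeDnaDiff()`.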
Token allocation for each DNA output section.
```typescript
interface SectionBudget {
  architecture: number;
  moduleMap: number;
  dependencies: number;
  conventions: number;
  hotFiles: number;
  riskSurface: number;
  apiSurface: number;
}
```

All values are positive integers representing allocated token counts. Produced by `allocateTokenBudget()` and consumed by `formatMarkdown()`.