code-dna Programmatic API Reference

This document covers the public library API exported from code-dna/lib. For CLI usage, see the README. For MCP server usage, see MCP.md.

Installation
Functions
Types

Installation

npm install code-dna

Requires Node.js 20 or later.

Import from the library entry point:

import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';

Functions

analyze

Run the full 4-layer code-dna analysis pipeline on a project directory.

async function analyze(
  rootPath: string,
  options: Partial<AnalysisOptions>,
): Promise<DnaOutput>

Parameters:

Parameter	Type	Description
`rootPath`	`string`	Absolute or relative path to the project root to analyse. Resolved to an absolute path internally.
`options`	`Partial<AnalysisOptions>`	Analysis configuration. All fields are optional; omitted fields use defaults or values from `.codedna.yaml`.

Returns: Promise<DnaOutput> — A fully-populated DNA output object ready for formatting.

Example:

import { analyze } from 'code-dna/lib';

const dna = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  tokenBudget: 8000,
  gitDepth: 500,
});

console.log(dna.metadata.projectName);
console.log(dna.metadata.totalFiles);

Layer selection:

// Structural skeleton only (no git, no patterns, no risk)
const skeletonOnly = await analyze('/path/to/project', { layers: [1] });

// Skip git archaeology
const withoutGit = await analyze('/path/to/project', { layers: [1, 3, 4] });

// Language filter
const tsAndPython = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  languages: ['typescript', 'python'],
});

Error handling:

Layer 2 (git) failures do not abort the pipeline. If git history cannot be read, the analysis continues with empty git results. Other errors reject the promise.

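try {
  const dna = await analyze('/path/to/project', {});
  console.log(`Analysed ${dna.metadata.totalFiles} files`);
} catch (err) {
  // In TypeScript, `err` is `unknown`; narrow it before reading `message`.
  console.error('Analysis failed:', err instanceof Error ? err.message : err);
}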
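The non-fatal handling of layer 2 described above can be pictured as a small wrapper. This is a minimal illustration, not code-dna's implementation: `runPipelineSketch`, `GitResults`, and `PipelineResult` are hypothetical names, and the pipeline is reduced to two synchronous callbacks.

```typescript
// Hypothetical sketch: none of these names are exported by code-dna.
// Synchronous for brevity; the real analyze() is async and rejects instead of throwing.

interface GitResults {
  hotFiles: string[];
  commitCount: number;
}

interface PipelineResult {
  files: string[];
  git: GitResults;
  gitError?: string;
}

function runPipelineSketch(
  scanFiles: () => string[],        // stands in for layer 1
  readGitHistory: () => GitResults, // stands in for layer 2 (git archaeology)
): PipelineResult {
  // Layer 1 errors are not caught, so they propagate to the caller.
  const files = scanFiles();

  try {
    return { files, git: readGitHistory() };
  } catch (err) {
    // Layer 2 errors are swallowed: continue with empty git results.
    const message = err instanceof Error ? err.message : String(err);
    return { files, git: { hotFiles: [], commitCount: 0 }, gitError: message };
  }
}
```

Recording the git error next to the empty results (the `gitError` field) is a choice made for this sketch only; the documented DnaOutput has no such field.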

formatMarkdown

Format a DnaOutput as Markdown using an explicit token budget allocation.

function formatMarkdown(dna: DnaOutput, budget: SectionBudget): string

Parameters:

Parameter	Type	Description
`dna`	`DnaOutput`	Fully-populated DNA output from `analyze()`.
`budget`	`SectionBudget`	Token allocation per section. Use `allocateTokenBudget()` to compute this.

Returns: string — UTF-8 Markdown document.

Example:

import { analyze, formatMarkdown, allocateTokenBudget } from 'code-dna/lib';

const dna = await analyze('/path/to/project', {});

const budget = allocateTokenBudget(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15, conventions: 15,
    hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  {
    architecture: dna.architecture.layers.length * 50 + 100,
    moduleMap: (dna.skeleton.root.children?.length ?? 1) * 40,
    dependencies: dna.dependencies.edges.length * 30,
    conventions: 200,
    hotFiles: dna.hotFiles.length * 60,
    riskSurface: dna.riskSurface.length * 70,
    apiSurface: dna.apiSurface.length * 50,
  },
);

const markdown = formatMarkdown(dna, budget);

formatYaml

Format a DnaOutput as YAML. Unlike Markdown, YAML output includes the full dataset with no token-based truncation.

function formatYaml(dna: DnaOutput): string

Parameters:

Parameter	Type	Description
`dna`	`DnaOutput`	Fully-populated DNA output from `analyze()`.

Returns: string — UTF-8 YAML document. Suitable for machine consumption or as input to diff.

Example:

import { analyze, formatYaml } from 'code-dna/lib';
import fs from 'node:fs';

const dna = await analyze('/path/to/project', {});
const yaml = formatYaml(dna);
fs.writeFileSync('CODEBASE-DNA.yaml', yaml, 'utf8');

computeDnaDiff

Compare two DnaOutput snapshots and return a structured diff.

function computeDnaDiff(oldDna: DnaOutput, newDna: DnaOutput): DnaDiff

Parameters:

Parameter	Type	Description
`oldDna`	`DnaOutput`	The baseline DNA snapshot.
`newDna`	`DnaOutput`	The newer DNA snapshot to compare against the baseline.

Returns: DnaDiff — Structured diff covering file changes, symbol changes, dependency changes, risk movements, and convention shifts.

Example:

import { analyze } from 'code-dna/lib';
import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';

const before = await analyze('/path/to/project', {});
// ... time passes, code changes ...
const after = await analyze('/path/to/project', {});

const diff = computeDnaDiff(before, after);
console.log(`Files added: ${diff.summary.filesAdded}`);
console.log(`Symbols removed: ${diff.summary.symbolsRemoved}`);

Note: computeDnaDiff and formatDnaDiff are exported from src/core/diff-engine.ts, not from src/lib.ts, so import them from the compiled module directly:

import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';
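If you only need file-level changes, the added/removed/modified split reported in `DnaDiff.fileChanges` can be approximated with plain set logic. This is a hypothetical sketch (`diffFilesSketch` and `FileSnapshot` are illustrative names). It assumes a file counts as modified when its line count changes; the real diff engine may use a different criterion.

```typescript
// Hypothetical sketch of set-based file diffing; not the diff-engine implementation.

interface FileSnapshot {
  path: string;
  loc: number;
}

interface FileChangesSketch {
  added: string[];
  removed: string[];
  modified: string[];
}

function diffFilesSketch(oldFiles: FileSnapshot[], newFiles: FileSnapshot[]): FileChangesSketch {
  const oldByPath = new Map(oldFiles.map((f) => [f.path, f] as const));
  const newByPath = new Map(newFiles.map((f) => [f.path, f] as const));

  // Present only in the new snapshot.
  const added = newFiles.filter((f) => !oldByPath.has(f.path)).map((f) => f.path);
  // Present only in the old snapshot.
  const removed = oldFiles.filter((f) => !newByPath.has(f.path)).map((f) => f.path);
  // Assumption: present in both, with a different line count.
  const modified = newFiles
    .filter((f) => {
      const prev = oldByPath.get(f.path);
      return prev !== undefined && prev.loc !== f.loc;
    })
    .map((f) => f.path);

  return { added: added.sort(), removed: removed.sort(), modified: modified.sort() };
}
```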

formatDnaDiff

Format a DnaDiff as a Markdown report suitable for human review or inclusion in a pull-request description.

function formatDnaDiff(diff: DnaDiff): string

Parameters:

Parameter	Type	Description
`diff`	`DnaDiff`	Diff produced by `computeDnaDiff()`.

Returns: string — UTF-8 Markdown diff report with sections for file changes, symbol changes, dependency changes, risk surface movements, and convention changes.

Example:

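import fs from 'node:fs';
import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';
// `before` and `after` are DnaOutput snapshots returned by analyze()
const diff = computeDnaDiff(before, after);
const report = formatDnaDiff(diff);
fs.writeFileSync('DNA-DIFF.md', report, 'utf8');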

allocateTokenBudget

Distribute a total token budget across DNA output sections, proportional to configured weights and estimated data sizes.

function allocateTokenBudget(
  totalBudget: number,
  weights: Record<string, number>,
  dataSizes: Record<string, number>,
): SectionBudget

Parameters:

Parameter	Type	Description
`totalBudget`	`number`	Total token count to distribute.
`weights`	`Record<string, number>`	Relative weight per section (higher = more tokens).
`dataSizes`	`Record<string, number>`	Estimated raw data size per section (used to avoid over-allocating to sparse sections).

Returns: SectionBudget — Token allocation per section. Each value is at least 200 tokens and at most 40% of totalBudget.

Constraints:

Minimum per section: 200 tokens
Maximum per section: 40% of total budget
Truncation order when budget is tight (lowest priority first): apiSurface, hotFiles, riskSurface, dependencies, conventions, moduleMap, architecture

Example:

const budget = allocateTokenBudget(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15,
    conventions: 15, hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  { architecture: 300, moduleMap: 1200, dependencies: 600,
    conventions: 200, hotFiles: 400, riskSurface: 350, apiSurface: 150 },
);
// budget.moduleMap will be larger than budget.apiSurface
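To make the constraints concrete, here is a hypothetical allocator that respects them. It is not the library's allocateTokenBudget(). `allocateSketch` is an illustrative name, and capping each section at its estimated data size is an assumption about how dataSizes is used.

```typescript
// Hypothetical sketch of the documented constraints; not code-dna's allocateTokenBudget().

const MIN_SECTION_TOKENS = 200;  // documented floor per section
const MAX_SECTION_SHARE = 0.4;   // documented ceiling: 40% of the total budget

function allocateSketch(
  totalBudget: number,
  weights: Record<string, number>,
  dataSizes: Record<string, number>,
): Record<string, number> {
  const weightSum = Object.values(weights).reduce((sum, w) => sum + w, 0);
  const maxPerSection = Math.floor(totalBudget * MAX_SECTION_SHARE);
  const budget: Record<string, number> = {};

  for (const [section, weight] of Object.entries(weights)) {
    // Proportional share of the total, by relative weight.
    const proportional = weightSum > 0 ? (totalBudget * weight) / weightSum : 0;
    // Assumption: never give a section more tokens than its estimated data needs.
    const needed = dataSizes[section] ?? proportional;
    const share = Math.floor(Math.min(proportional, needed));
    // Apply the floor, then the ceiling.
    budget[section] = Math.min(maxPerSection, Math.max(MIN_SECTION_TOKENS, share));
  }
  return budget;
}
```

With the example inputs above, this sketch hands out 3,250 of the 8,000 tokens and leaves the rest unused. Whether the real allocator redistributes leftover tokens is not specified in this reference.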

estimateTokens

Estimate the token count of a string using a simple character-count heuristic (approximately 4 characters per token).

function estimateTokens(text: string): number

Parameters:

Parameter	Type	Description
`text`	`string`	The text to estimate.

Returns: number — Estimated token count.
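A heuristic like this is essentially a one-liner. The sketch below is a hypothetical re-implementation for reasoning about budgets. The rounding mode (rounding up) is an assumption, as is the `fitsBudget` helper.

```typescript
// Hypothetical re-implementation of the ~4 characters/token heuristic.
// Assumption: the result is rounded up, so any non-empty string costs at least 1 token.
const CHARS_PER_TOKEN = 4;

function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: check a rendered document against a target budget.
function fitsBudget(text: string, budget: number): boolean {
  return estimateTokensSketch(text) <= budget;
}
```

Note that `text.length` counts UTF-16 code units, so emoji and other astral-plane characters count as two characters.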

Types

AnalysisOptions

Options passed to the top-level analyze() function.

interface AnalysisOptions {
  path: string;
  format: 'md' | 'yaml';
  output?: string;
  layers: number[];
  languages?: string[];
  tokenBudget: number;
  gitDepth?: number;
  scope?: string;
}

Field	Type	Default	Description
`path`	`string`	—	Absolute or relative path to the project root.
`format`	`'md' \| 'yaml'`	`'md'`	Output format.
`output`	`string?`	`undefined`	Destination file path. `undefined` means stdout.
`layers`	`number[]`	`[1, 2, 3, 4]`	Which pipeline layers to execute.
`languages`	`string[]?`	all	Restrict to these language IDs, e.g. `['typescript', 'python']`.
`tokenBudget`	`number`	`8000`	Target token count for Markdown output.
`gitDepth`	`number?`	`1000`	Maximum commits to traverse during git archaeology.
`scope`	`string?`	`undefined`	Subdirectory to scope the analysis to (relative to `path`).

All fields are optional when passed to analyze(), which accepts Partial<AnalysisOptions>.
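Because analyze() accepts Partial<AnalysisOptions>, missing fields have to be filled in from defaults. The sketch below shows one way to apply the defaults from the table. `resolveOptionsSketch` is hypothetical. It ignores `.codedna.yaml` and assumes the positional `rootPath` argument takes precedence over any `options.path`.

```typescript
// Copy of the documented interface.
interface AnalysisOptions {
  path: string;
  format: 'md' | 'yaml';
  output?: string;
  layers: number[];
  languages?: string[];
  tokenBudget: number;
  gitDepth?: number;
  scope?: string;
}

// Defaults taken from the table above.
const DEFAULT_LAYERS: readonly number[] = [1, 2, 3, 4];
const DEFAULT_TOKEN_BUDGET = 8000;
const DEFAULT_GIT_DEPTH = 1000;

// Hypothetical helper: fill a Partial<AnalysisOptions> with defaults.
// Uses ?? per field so an explicit `undefined` also falls back to the default.
function resolveOptionsSketch(rootPath: string, options: Partial<AnalysisOptions>): AnalysisOptions {
  return {
    path: rootPath, // assumption: the positional rootPath wins over options.path
    format: options.format ?? 'md',
    output: options.output, // undefined means stdout
    layers: [...(options.layers ?? DEFAULT_LAYERS)], // copy so callers cannot mutate the default
    languages: options.languages, // undefined means all languages
    tokenBudget: options.tokenBudget ?? DEFAULT_TOKEN_BUDGET,
    gitDepth: options.gitDepth ?? DEFAULT_GIT_DEPTH,
    scope: options.scope,
  };
}
```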

DnaOutput

The fully-populated DNA output produced by the core engine after all layers have run.

interface DnaOutput {
  metadata: GenomeHeader;
  skeleton: ModuleMap;
  dependencies: DependencyGraph;
  conventions: ConventionReport;
  hotFiles: HotFile[];
  riskSurface: RiskEntry[];
  apiSurface: ApiEntry[];
  architecture: ArchitectureReport;
}

Field	Type	Description
`metadata`	`GenomeHeader`	Project metadata and aggregate statistics.
`skeleton`	`ModuleMap`	Tree-shaped module structure with symbols.
`dependencies`	`DependencyGraph`	Import/export dependency graph.
`conventions`	`ConventionReport`	Detected coding conventions.
`hotFiles`	`HotFile[]`	Files with highest commit and author churn.
`riskSurface`	`RiskEntry[]`	Risk-ranked files with scores and factor breakdowns.
`apiSurface`	`ApiEntry[]`	Public API surface — exported symbols and HTTP endpoints.
`architecture`	`ArchitectureReport`	Architectural style and framework detection.

GenomeHeader

Project-level metadata at the top of the DNA output.

interface GenomeHeader {
  projectName: string;
  version: string;
  generatedAt: string;
  gitRemote?: string;
  branch?: string;
  commitSha?: string;
  languages: LanguageStat[];
  totalFiles: number;
  totalLoc: number;
}

Field	Type	Description
`projectName`	`string`	Project name from `package.json` or directory name.
`version`	`string`	Semver version of the code-dna tool that produced this output.
`generatedAt`	`string`	ISO 8601 timestamp.
`gitRemote`	`string?`	Git remote URL if available.
`branch`	`string?`	Active git branch at analysis time.
`commitSha`	`string?`	HEAD commit SHA at analysis time.
`languages`	`LanguageStat[]`	Language breakdown sorted by file count descending.
`totalFiles`	`number`	Total source files analysed (unknown-language files excluded).
`totalLoc`	`number`	Total lines of code across all analysed files.

LanguageStat

Per-language aggregate statistics.

interface LanguageStat {
  id: string;
  files: number;
  loc: number;
  percentage: number;
}

Field	Type	Description
`id`	`string`	Language identifier, e.g. `'typescript'`, `'python'`.
`files`	`number`	Number of source files in this language.
`loc`	`number`	Total lines of code in this language.
`percentage`	`number`	Percentage of total project LOC (0–100).
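The percentage and sort order documented for LanguageStat can be reproduced from per-file records. In this hypothetical sketch, `FileRecord` and `computeLanguageStatsSketch` are illustrative names, and rounding the percentage to one decimal place is an assumption.

```typescript
// Copy of the documented interface.
interface LanguageStat {
  id: string;
  files: number;
  loc: number;
  percentage: number;
}

// Hypothetical per-file input record.
interface FileRecord {
  language: string;
  loc: number;
}

function computeLanguageStatsSketch(files: FileRecord[]): LanguageStat[] {
  const byLanguage = new Map<string, { files: number; loc: number }>();
  let totalLoc = 0;
  for (const file of files) {
    const entry = byLanguage.get(file.language) ?? { files: 0, loc: 0 };
    entry.files += 1;
    entry.loc += file.loc;
    byLanguage.set(file.language, entry);
    totalLoc += file.loc;
  }

  const stats: LanguageStat[] = [];
  byLanguage.forEach((agg, id) => {
    stats.push({
      id,
      files: agg.files,
      loc: agg.loc,
      // Share of total project LOC (0-100), rounded to one decimal place.
      percentage: totalLoc > 0 ? Math.round((agg.loc / totalLoc) * 1000) / 10 : 0,
    });
  });

  // Documented order for GenomeHeader.languages: file count, descending.
  return stats.sort((a, b) => b.files - a.files);
}
```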

ModuleMap

Entry point for the tree-shaped structural skeleton.

interface ModuleMap {
  root: ModuleNode;
}

ModuleNode

A single node in the module tree — either a directory or a source file.

interface ModuleNode {
  name: string;
  path: string;
  type: 'directory' | 'file';
  role?: FileRole;
  language?: string;
  loc?: number;
  symbols?: SymbolEntry[];
  children?: ModuleNode[];
}

Field	Type	Description
`name`	`string`	Base name of the file or directory.
`path`	`string`	Full path relative to the project root.
`type`	`'directory' \| 'file'`	Node type.
`role`	`FileRole?`	Inferred semantic role (file nodes only).
`language`	`string?`	Programming language of the file.
`loc`	`number?`	Lines of code (file nodes only).
`symbols`	`SymbolEntry[]?`	Top-level exported symbols (file nodes only).
`children`	`ModuleNode[]?`	Child nodes (directory nodes only).

FileRole values: controller, route, handler, service, usecase, model, entity, schema, repository, dao, middleware, util, helper, lib, test, spec, config, env, migration, type, interface, component, page, layout, entry, main, unknown.
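File nodes carry symbols and directory nodes carry children, so most queries over the skeleton are a depth-first walk. The hypothetical helpers below (`collectExports`, `sumLoc`) use a trimmed copy of the interfaces above that omits `role`, `language`, `signature`, and `decorators`.

```typescript
// Trimmed copies of the documented types.
type SymbolKind = 'function' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'method';

interface SymbolEntry {
  name: string;
  kind: SymbolKind;
  exported: boolean;
  loc: { start: number; end: number };
}

interface ModuleNode {
  name: string;
  path: string;
  type: 'directory' | 'file';
  loc?: number;
  symbols?: SymbolEntry[];
  children?: ModuleNode[];
}

interface ExportRef {
  file: string;
  symbol: string;
}

// Hypothetical helper: depth-first collection of exported symbols.
function collectExports(node: ModuleNode, out: ExportRef[] = []): ExportRef[] {
  if (node.type === 'file') {
    for (const sym of node.symbols ?? []) {
      if (sym.exported) out.push({ file: node.path, symbol: sym.name });
    }
  }
  for (const child of node.children ?? []) {
    collectExports(child, out);
  }
  return out;
}

// Hypothetical helper: sum file LOC beneath a node (directories have no loc of their own).
function sumLoc(node: ModuleNode): number {
  if (node.type === 'file') return node.loc ?? 0;
  return (node.children ?? []).reduce((sum, child) => sum + sumLoc(child), 0);
}
```

With a real DnaOutput, the walk would start at `dna.skeleton.root`.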

SymbolEntry

A named symbol extracted from a source file.

interface SymbolEntry {
  name: string;
  kind: 'function' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'method';
  exported: boolean;
  signature?: string;
  decorators?: string[];
  loc: { start: number; end: number };
}

Field	Type	Description
`name`	`string`	Identifier as written in source.
`kind`	`string`	Category of the symbol.
`exported`	`boolean`	Whether the symbol is exported from its module.
`signature`	`string?`	Human-readable type signature, e.g. `(req: Request) => void`.
`decorators`	`string[]?`	Decorator names applied to the symbol, e.g. `['Controller', 'Get']`.
`loc`	`{ start: number; end: number }`	Source line range (1-indexed).

DependencyGraph

Import/export relationships between all source files.

interface DependencyGraph {
  nodes: DependencyNode[];
  edges: DependencyEdge[];
  circularDeps: string[][];
}

Field	Type	Description
`nodes`	`DependencyNode[]`	One node per file that participates in the import graph.
`edges`	`DependencyEdge[]`	Directed import edges.
`circularDeps`	`string[][]`	Groups of file paths that form circular dependency cycles.

DependencyNode

A single file in the dependency graph with degree metrics.

interface DependencyNode {
  path: string;
  inDegree: number;
  outDegree: number;
}

Field	Type	Description
`path`	`string`	File path relative to the project root.
`inDegree`	`number`	Number of other files that import this file (fan-in).
`outDegree`	`number`	Number of files this file imports (fan-out).
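The degree fields follow directly from the edge list: each edge adds one to the importer's fan-out and one to the imported file's fan-in. The sketch below (`computeDegrees` is hypothetical) assumes that repeated edges between the same pair of files, such as an import plus a re-export, are counted once; code-dna may count them differently.

```typescript
// Copies of the documented types.
interface DependencyEdge {
  from: string;
  to: string;
  type: 'import' | 'reexport' | 'dynamic';
}

interface DependencyNode {
  path: string;
  inDegree: number;
  outDegree: number;
}

// Hypothetical helper: derive per-file fan-in/fan-out from edges, highest fan-in first.
function computeDegrees(edges: DependencyEdge[]): DependencyNode[] {
  const nodes = new Map<string, DependencyNode>();
  const nodeFor = (path: string): DependencyNode => {
    let node = nodes.get(path);
    if (!node) {
      node = { path, inDegree: 0, outDegree: 0 };
      nodes.set(path, node);
    }
    return node;
  };

  // Assumption: duplicate from->to pairs count once, whatever their edge type.
  const seen = new Set<string>();
  for (const edge of edges) {
    const key = `${edge.from}\u0000${edge.to}`;
    if (seen.has(key)) continue;
    seen.add(key);
    nodeFor(edge.from).outDegree += 1;
    nodeFor(edge.to).inDegree += 1;
  }
  return Array.from(nodes.values()).sort((a, b) => b.inDegree - a.inDegree);
}
```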

DependencyEdge

A directed edge in the dependency graph.

interface DependencyEdge {
  from: string;
  to: string;
  type: 'import' | 'reexport' | 'dynamic';
}

Field	Type	Description
`from`	`string`	Importing file path.
`to`	`string`	Imported module path (resolved to a project file where possible).
`type`	`string`	How the dependency was introduced.

ConventionReport

Coding conventions observed across the project.

interface ConventionReport {
  naming: NamingConvention;
  fileOrganization: string;
  importStyle: ImportStyle;
  errorHandling: string;
  testPattern: string;
  exportStyle: string;
  confidence: number;
}

Field	Type	Description
`naming`	`NamingConvention`	Identifier and file naming conventions.
`fileOrganization`	`string`	One of: `'by-feature'`, `'by-layer'`, `'by-type'`, `'hybrid'`.
`importStyle`	`ImportStyle`	Import ordering and path style.
`errorHandling`	`string`	One of: `'try-catch'`, `'result-type'`, `'callback'`, `'mixed'`.
`testPattern`	`string`	One of: `'co-located'`, `'mirror-directory'`, `'flat'`, `'mixed'`.
`exportStyle`	`string`	One of: `'named'`, `'default'`, `'barrel'`, `'mixed'`.
`confidence`	`number`	Confidence score for the detected conventions (0–1).

NamingConvention fields: files, functions, classes, variables, constants — each a human-readable style name (e.g. 'camelCase', 'kebab-case', 'PascalCase', 'snake_case').

ImportStyle fields: ordering (one of 'external-first', 'internal-first', 'alphabetical', 'unordered'), pathStyle (one of 'relative', 'absolute', 'alias', 'mixed').

HotFile

A file identified as a churn hotspot via git archaeology.

interface HotFile {
  path: string;
  commitCount: number;
  authorCount: number;
  lastModified: string;
  reason: string;
}

Field	Type	Description
`path`	`string`	File path relative to the project root.
`commitCount`	`number`	Total commits touching this file.
`authorCount`	`number`	Number of distinct authors.
`lastModified`	`string`	ISO 8601 timestamp of the most recent commit.
`reason`	`string`	Human-readable explanation of why this file is considered hot.

RiskEntry

A file ranked by its composite risk score.

interface RiskEntry {
  path: string;
  score: number;
  factors: RiskFactor[];
}

Field	Type	Description
`path`	`string`	File path relative to the project root.
`score`	`number`	Composite risk score (0–100). Higher is riskier to change.
`factors`	`RiskFactor[]`	Individual factors contributing to the score.

RiskFactor

A single dimension of risk for a file.

interface RiskFactor {
  type: 'high-centrality' | 'low-test-coverage' | 'high-churn' | 'many-authors' | 'high-complexity';
  value: number;
  description: string;
}

Field	Type	Description
`type`	`string`	Which risk dimension this factor measures.
`value`	`number`	Normalised numeric value for this factor.
`description`	`string`	Human-readable explanation.
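A common use of riskSurface is a short list of the riskiest files. The hypothetical `topRisks` helper below sorts by score and reports the factor with the largest normalised value as the main driver. Treating factor values as comparable across factor types is an assumption based on their being normalised.

```typescript
// Copies of the documented types.
type RiskFactorType =
  | 'high-centrality'
  | 'low-test-coverage'
  | 'high-churn'
  | 'many-authors'
  | 'high-complexity';

interface RiskFactor {
  type: RiskFactorType;
  value: number;
  description: string;
}

interface RiskEntry {
  path: string;
  score: number;
  factors: RiskFactor[];
}

interface RiskSummary {
  path: string;
  score: number;
  mainFactor?: RiskFactorType;
}

// Hypothetical helper: the `limit` riskiest files scoring at least `minScore`.
function topRisks(entries: RiskEntry[], limit: number, minScore = 0): RiskSummary[] {
  return entries
    .filter((e) => e.score >= minScore) // filter() copies, so sort() does not mutate the input
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((e) => {
      // Pick the factor with the largest normalised value, if any.
      let main: RiskFactor | undefined;
      for (const f of e.factors) {
        if (main === undefined || f.value > main.value) main = f;
      }
      return { path: e.path, score: e.score, mainFactor: main?.type };
    });
}
```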

ApiEntry

A single entry on the public API surface.

interface ApiEntry {
  path: string;
  symbol: string;
  kind: 'endpoint' | 'export' | 'public-method';
  signature: string;
  httpMethod?: string;
  route?: string;
}

Field	Type	Description
`path`	`string`	File declaring this API entry.
`symbol`	`string`	Symbol name as declared in source.
`kind`	`string`	Category of API entry.
`signature`	`string`	Type signature of the symbol.
`httpMethod`	`string?`	HTTP method for endpoint entries (e.g. `'GET'`, `'POST'`).
`route`	`string?`	URL route pattern for endpoint entries (e.g. `'/users/:id'`).

ArchitectureReport

High-level architectural characterisation of the project.

interface ArchitectureReport {
  style: string;
  framework: FrameworkInfo;
  layers: LayerInfo[];
  confidence: number;
}

Field	Type	Description
`style`	`string`	One of: `'mvc'`, `'hexagonal'`, `'layered'`, `'event-driven'`, `'monolith'`, `'unknown'`.
`framework`	`FrameworkInfo`	Detected framework and detection evidence.
`layers`	`LayerInfo[]`	Logical layers found in the project.
`confidence`	`number`	Confidence score for the architecture detection (0–1).

FrameworkInfo fields: name (e.g. 'nextjs', 'express', 'fastapi', 'spring-boot', 'nestjs'), version?, markers: string[] (evidence files/packages).

LayerInfo fields: name, directories: string[], fileCount: number, role: string.

DnaDiff

The full structural diff between two DnaOutput snapshots, produced by computeDnaDiff().

interface DnaDiff {
  summary: DnaDiffSummary;
  fileChanges: {
    added: string[];
    removed: string[];
    modified: string[];
  };
  symbolChanges: {
    added: Array<{ name: string; kind: string; file: string; line: number }>;
    removed: Array<{ name: string; kind: string; file: string }>;
  };
  dependencyChanges: {
    added: string[];
    removed: string[];
  };
  riskChanges: RiskChange[];
  conventionChanges: string[];
}

DnaDiffSummary fields: filesAdded, filesRemoved, filesModified, symbolsAdded, symbolsRemoved, riskIncreased, riskDecreased — all number.

RiskChange fields: file: string, oldScore: number, newScore: number, direction: 'increased' | 'decreased'.
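The riskChanges list can be understood as a join of the old and new risk surfaces on file path. This is a hypothetical sketch (`diffRiskSketch` and `RiskEntryLite` are illustrative names). Skipping files present in only one snapshot and the optional `threshold` for ignoring small movements are assumptions, not documented behaviour of computeDnaDiff().

```typescript
// Hypothetical trimmed risk entry and the documented RiskChange shape.
interface RiskEntryLite {
  path: string;
  score: number;
}

interface RiskChange {
  file: string;
  oldScore: number;
  newScore: number;
  direction: 'increased' | 'decreased';
}

function diffRiskSketch(oldRisk: RiskEntryLite[], newRisk: RiskEntryLite[], threshold = 0): RiskChange[] {
  const oldScores = new Map(oldRisk.map((r) => [r.path, r.score] as const));
  const changes: RiskChange[] = [];
  for (const { path, score } of newRisk) {
    const previous = oldScores.get(path);
    // Assumption: files missing from either snapshot belong to fileChanges, not riskChanges.
    if (previous === undefined) continue;
    const delta = score - previous;
    // Ignore unchanged scores and movements no larger than the threshold.
    if (Math.abs(delta) <= threshold) continue;
    changes.push({ file: path, oldScore: previous, newScore: score, direction: delta > 0 ? 'increased' : 'decreased' });
  }
  // Largest movements first.
  return changes.sort((a, b) => Math.abs(b.newScore - b.oldScore) - Math.abs(a.newScore - a.oldScore));
}
```

The riskIncreased and riskDecreased counts in DnaDiffSummary would then correspond to counting each `direction` value in the result.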

SectionBudget

Token allocation for each DNA output section.

interface SectionBudget {
  architecture: number;
  moduleMap: number;
  dependencies: number;
  conventions: number;
  hotFiles: number;
  riskSurface: number;
  apiSurface: number;
}

All values are positive integers representing allocated token counts. Produced by allocateTokenBudget() and consumed by formatMarkdown().
