This repository follows a Layered Canonical Dataset (LCD) architecture: a directory-driven, inheritance-based data model in which vehicle specifications are authored as layered JSON fragments and compiled into canonical, fully-expanded vehicle records.
Why LCD
- Eliminates repetition: shared attributes live in base.json and are inherited by year and variant files.
- Deterministic builds: compilation follows a strict precedence order, producing the same output for the same input.
- Low-friction contributions: contributors edit small, localized JSON files instead of a monolithic dataset.
- Supports variants cleanly: a single base vehicle can generate multiple canonical vehicles (e.g., a higher-range variant).
- Global correctness: a strict contract (schema + units + validation rules) enforces consistency across markets.
The dataset is authored under src/ with this logical structure:
src/
  <make_slug>/
    <model_slug>/
      base.json
      <year>/
        <vehicle_slug>.json
        <vehicle_slug>_<variant_slug>.json
        <vehicle_slug>_<variant_slug>.json
        ...
Example
src/
  bmw/
    ix1/
      base.json
      2024/
        ix1.json
        ix1_350_autonomy.json
      2025/
        ...
The build process produces canonical vehicle records by applying a merge pipeline:
Merge Precedence (lowest → highest)
1. src/<make>/<model>/base.json (model-level defaults)
2. src/<make>/<model>/<year>/<vehicle_slug>.json (year base vehicle)
3. src/<make>/<model>/<year>/<vehicle_slug>_<variant_slug>.json (variant override, optional)
Output cardinality
- Each <vehicle_slug>.json produces one canonical vehicle (the year base).
- Each matching <vehicle_slug>_<variant_slug>.json produces one additional canonical vehicle.
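The precedence and cardinality rules above can be sketched as a small planning step. This is only an illustration: the function name `plan_outputs` is hypothetical, and matching a variant to its year base by file-name prefix is an assumption, since the source does not say how the build pairs them.

```python
from typing import Dict, List


def plan_outputs(year_files: List[str]) -> Dict[str, List[str]]:
    """Map each output stem to its layers, lowest precedence first.

    `year_files` holds the JSON file names found in one <year>/ folder.
    "base.json" stands for the model-level base one folder up.
    """
    stems = sorted(n[: -len(".json")] for n in year_files if n.endswith(".json"))

    def parents(stem: str) -> List[str]:
        # Other stems that this stem extends with "_<something>".
        return [s for s in stems if s != stem and stem.startswith(s + "_")]

    # Assumption: a stem that extends no other stem is a year base vehicle.
    base_stems = [s for s in stems if not parents(s)]

    plan: Dict[str, List[str]] = {}
    for stem in stems:
        owners = [b for b in base_stems if stem.startswith(b + "_")]
        layers = ["base.json"]
        if owners:
            # Variant: the year base vehicle sits between model base and variant.
            layers.append(max(owners, key=len) + ".json")
        layers.append(stem + ".json")
        plan[stem] = layers
    return plan


plan = plan_outputs(["ix1.json", "ix1_350_autonomy.json"])
```

Each key in `plan` corresponds to one canonical vehicle, so the two files above yield two records.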
To guarantee predictable compilation:
- Objects: deep-merged by key.
- Scalars (string/number/bool): overridden by the higher-precedence layer.
- Arrays: replaced entirely by the higher-precedence layer (no implicit concatenation).
- Nulls: not allowed as a "delete" mechanism. If a value must be removed, the schema must represent that state explicitly.
- Unknown keys: forbidden (schema validation fails).
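A minimal sketch of these merge rules, assuming each layer has already been parsed into a Python dict. The name `merge_layers` and the sample values are illustrative, not part of the repository's tooling. Unknown-key rejection is left to schema validation, and nulls nested inside arrays are not checked here.

```python
from typing import Any, Dict


def merge_layers(low: Dict[str, Any], high: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `high` over `low` without mutating either input."""
    merged = dict(low)  # shallow copy; untouched subtrees are shared with `low`
    for key, value in high.items():
        if value is None:
            # Nulls are not a delete mechanism.
            raise ValueError(f"null is not allowed for key {key!r}")
        if isinstance(value, dict):
            # Objects: deep-merged by key.
            lower = merged.get(key)
            merged[key] = merge_layers(lower if isinstance(lower, dict) else {}, value)
        else:
            # Scalars override; arrays are replaced entirely.
            merged[key] = value
    return merged


# Illustrative layers (the DC value in `variant` is made up for the example).
base = {
    "charging": {"ac": {"max_power_kw": 11.0}, "dc": {"max_power_kw": 130.0}},
    "range": {"rated": [{"cycle": "wltp", "range_km": 438}]},
}
variant = {
    "charging": {"dc": {"max_power_kw": 150.0}},
    "range": {"rated": [{"cycle": "wltp", "range_km": 350}]},
}
compiled = merge_layers(base, variant)
```

Here `compiled["charging"]["ac"]` survives from the lower layer, `compiled["charging"]["dc"]["max_power_kw"]` is overridden, and `compiled["range"]["rated"]` is the variant's array with no concatenation.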
Folder slugs (<make_slug>, <model_slug>) and file slugs (<vehicle_slug>, <variant_slug>) must:
- be lowercase ASCII
- use a-z, 0-9, and underscore _ only
- not start or end with _
- be stable over time (renames are breaking changes)
The <variant_slug> part of a variant file name must match the file's internal variant.slug.
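The slug rules can be checked mechanically. The sketch below is one possible encoding; the helper names are hypothetical, and it assumes consecutive underscores are allowed because the rules above do not forbid them. Stability over time cannot be checked from a single snapshot and is not covered.

```python
import re

# Lowercase ASCII letters, digits, and underscores; no leading or trailing underscore.
SLUG_RE = re.compile(r"[a-z0-9](?:[a-z0-9_]*[a-z0-9])?")


def is_valid_slug(slug: str) -> bool:
    """Return True if `slug` satisfies the character and boundary rules."""
    return SLUG_RE.fullmatch(slug) is not None


def variant_filename_matches(filename: str, vehicle_slug: str, variant_slug: str) -> bool:
    """Check that a variant file is named <vehicle_slug>_<variant_slug>.json."""
    return filename == f"{vehicle_slug}_{variant_slug}.json"
```

For example, `is_valid_slug("350_autonomy")` holds, while `"_ix1"`, `"ix1_"`, `"iX1"`, and `"ix-1"` are all rejected.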
This section defines the contract for fully-expanded JSON vehicle records produced by compilation.
For complete documentation of all fields, data types, and validation rules, see SCHEMA.md.
Key style
- JSON keys are snake_case.
- All semantic identifiers (make/model/trim/variant) use slugs in addition to human-readable names.
Units
All measures must be stored in SI or explicitly specified units. See SCHEMA.md for the complete units convention.
Numbers
- Store numeric values as JSON numbers (not strings).
- Use . as decimal separator.
Sources
- Every canonical record must include at least one verifiable source reference in sources.
- If a variant changes a spec, the variant must provide a source covering that change.
- If values depend on conditions (temperature, wheel size, market), sources must reflect those conditions.
Each canonical vehicle record must have a stable identifier:
- unique_code: an optional internal unique code or database key.
- Recommended format for external usage (normative for canonical output):
  - Base vehicle: oed:<make_slug>:<model_slug>:<year>:<trim_slug>
  - Variant vehicle: oed:<make_slug>:<model_slug>:<year>:<trim_slug>:<variant_slug>
Where:
- trim_slug is a stable slug for the trim/grade.
- variant_slug is a stable slug for the variant.
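As an illustration, the identifier can be derived from a compiled record. The function name `canonical_id` is hypothetical, and the sketch assumes the record carries make, model, and trim objects with a slug, plus an optional variant object, as described elsewhere in this document.

```python
from typing import Any, Dict


def canonical_id(record: Dict[str, Any]) -> str:
    """Build oed:<make>:<model>:<year>:<trim>[:<variant>] from a compiled record."""
    parts = [
        "oed",
        record["make"]["slug"],
        record["model"]["slug"],
        str(record["year"]),
        record["trim"]["slug"],
    ]
    variant = record.get("variant")
    if variant:
        # Only variant vehicles carry the trailing variant slug.
        parts.append(variant["slug"])
    return ":".join(parts)


base_record = {
    "make": {"slug": "bmw", "name": "BMW"},
    "model": {"slug": "ix1", "name": "iX1"},
    "year": 2024,
    "trim": {"slug": "base", "name": "Base"},
}
variant_record = {**base_record, "variant": {"slug": "350_autonomy", "name": "350 Autonomy"}}
```

With these inputs, `canonical_id(base_record)` gives `oed:bmw:ix1:2024:base` and `canonical_id(variant_record)` gives `oed:bmw:ix1:2024:base:350_autonomy`.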
The following fields are required at the root level of every canonical vehicle record:
- schema_version: "1.0.0"
- make: object with slug and name
- model: object with slug and name
- year: integer (model year)
- trim: object with slug and name
- vehicle_type: classification string
- powertrain: object with drivetrain (and optionally motors, power, etc.)
- battery: object (must include at least one of pack_capacity_kwh_gross or pack_capacity_kwh_net)
- charge_ports: array with at least one port
- charging: object (should include ac and/or dc specifications)
- range: object with rated array (at least one rated range entry)
- sources: array with at least one verifiable source
For detailed specifications of all fields (required and optional), see SCHEMA.md.
This repository is authored in layers, but compiled into canonical full records.
Purpose
- Store attributes that are stable across years and trims (or that rarely change), minimizing duplication.
Allowed content (recommended)
- make, model
- vehicle_type, body (if stable)
- dimensions (if stable)
- charge-port connector defaults (only if stable)
- high-level links and references
Not recommended in model base
- year
- range, pricing, availability (usually year/market specific)
- performance figures (often change)
- charge curve / charge times (often change by year, firmware, battery supplier, market)
- anything that is demonstrably year-dependent
Purpose
- Defines the canonical base vehicle for a given year (and typically a trim/grade).
Rules
- Must include: year, trim, battery, charge_ports, charging, range, and sources.
- Must compile into a valid canonical vehicle record after merging with model base.
- File name <vehicle_slug>.json is the variant root for that year.
A variant represents a distinct, consumer-relevant configuration that should become a separate canonical record (e.g., higher-range battery option, different charge port, higher DC peak, different protocol support).
Mandatory rules
- The filename must be: <vehicle_slug>_<variant_slug>.json
- The JSON must include:
  - variant.slug == "<variant_slug>"
  - variant.name (human label)
- The file must be a delta:
  - Include only fields that differ from the year base vehicle (plus variant and any required sources).
- The variant must include sources covering the changed claims.
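The delta rule lends itself to an automated lint. The following sketch reports values in a variant that merely repeat the year base vehicle. The function name and the exempt key set are assumptions: `schema_version`, `variant`, and `sources` are skipped at the root because variant files are expected to carry them even when unchanged.

```python
from typing import Any, Dict, List

# Assumption: these root keys may appear in a variant even when unchanged.
EXEMPT_ROOT_KEYS = {"schema_version", "variant", "sources"}


def redundant_paths(base: Dict[str, Any], delta: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths in `delta` whose values already equal those in `base`."""
    found: List[str] = []
    for key, value in delta.items():
        if not prefix and key in EXEMPT_ROOT_KEYS:
            continue
        if key not in base:
            continue  # new key, so it is a genuine difference
        path = f"{prefix}{key}"
        if isinstance(value, dict) and isinstance(base[key], dict):
            # Objects are compared key by key, mirroring the deep merge.
            found.extend(redundant_paths(base[key], value, path + "."))
        elif value == base[key]:
            # Scalars and arrays are compared as whole values.
            found.append(path)
    return found


year_base = {
    "schema_version": "1.0.0",
    "battery": {"pack_capacity_kwh_net": 64.7, "thermal_management": "liquid"},
    "range": {"rated": [{"cycle": "wltp", "range_km": 438}]},
}
variant_delta = {
    "schema_version": "1.0.0",
    "variant": {"slug": "350_autonomy", "name": "350 Autonomy"},
    "battery": {"thermal_management": "liquid"},  # repeats the base value
    "range": {"rated": [{"cycle": "wltp", "range_km": 350}]},
}
issues = redundant_paths(year_base, variant_delta)
```

In this example only `battery.thermal_management` is flagged, since the rated range genuinely differs from the base.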
Variant intent (recommended)
- Provide variant.kind to classify the nature of the change, such as: range_upgrade, battery_upgrade, charging_upgrade, market_specific, software_unlock, wheel_package, v2x_enabled
Example year base file (ix1.json):
{
  "schema_version": "1.0.0",
  "year": 2024,
  "trim": { "slug": "base", "name": "Base" },
  "charge_ports": [
    {
      "kind": "combo",
      "connector": "ccs2",
      "location": { "side": "right", "position": "rear" }
    }
  ],
  "powertrain": {
    "drivetrain": "awd",
    "motors": [
      { "position": "front", "power_kw": 100 },
      { "position": "rear", "power_kw": 140 }
    ],
    "system_power_kw": 230
  },
  "battery": { "pack_capacity_kwh_net": 64.7, "thermal_management": "liquid" },
  "charging": {
    "ac": { "max_power_kw": 11.0, "phases": 3 },
    "dc": { "max_power_kw": 130.0, "architecture_voltage_class": "400v" },
    "protocols": { "dc": ["din_70121", "iso_15118_2"], "plug_and_charge": true }
  },
  "range": {
    "rated": [{ "cycle": "wltp", "range_km": 438 }]
  },
  "sources": [
    {
      "type": "oem",
      "title": "Official Specifications",
      "url": "https://example.com/specs",
      "accessed_at": "2025-12-24T00:00:00Z"
    }
  ]
}
Example variant file (ix1_350_autonomy.json):
{
  "schema_version": "1.0.0",
  "variant": {
    "slug": "350_autonomy",
    "name": "350 Autonomy",
    "kind": "range_upgrade",
    "notes": "Higher autonomy configuration for this model year."
  },
  "range": {
    "rated": [{ "cycle": "wltp", "range_km": 350 }]
  },
  "sources": [
    {
      "type": "oem",
      "title": "Variant Range Statement",
      "url": "https://example.com/variant-range",
      "accessed_at": "2025-12-24T00:00:00Z"
    }
  ]
}
Compilation outcome
- Produces:
  - Base canonical vehicle (merged base.json + ix1.json)
  - Variant canonical vehicle (merged base.json + ix1.json + ix1_350_autonomy.json)
A change should be modeled as a separate canonical vehicle record (variant) when it impacts at least one of:
- range rating (any test cycle)
- battery capacity (gross/net), voltage class, or usable SOC window
- drivetrain or motor configuration
- charging capability (AC/DC power limits, connector, voltage/current limits)
- charging protocols / Plug&Charge support (interoperability changes)
- DC charge curve or charge-time table (material changes)
- V2X capability (V2L/V2H/V2G support, max power, protocol/connector)
- official performance metrics (0–100, top speed)
- regulatory classification impacting consumer comparison
Minor changes that do not affect key specs (e.g., infotainment-only changes) should be stored as notes and not as variants.
- No unverifiable claims: every meaningful spec must be backed by at least one source.
- Market ambiguity must be explicit: if a spec is market-limited, specify markets.
- Avoid marketing ambiguity: prefer numeric, test-cycle-qualified values over promotional claims.
- Prefer net capacity when available: if both exist, store both gross and net.
- Charging data must declare context: charge times and curves must disclose SOC window and conditions when possible.
- Keep variants minimal: variants are deltas, not full copies.
The repository will provide tooling to:
- Discover all vehicle files in src/
- Merge layers according to precedence rules
- Validate against schema.json
- Generate canonical output files
- Export to various formats (JSON, CSV, SQL, etc.)
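The discovery step could look roughly like the sketch below. It is not the repository's tooling: the function name `discover` is hypothetical, and it assumes year folders are named with digits only and sit exactly three levels below src/.

```python
from pathlib import Path
from typing import Iterator, Tuple


def discover(src_root: Path) -> Iterator[Tuple[str, str, int, Path]]:
    """Yield (make_slug, model_slug, year, path) for every year-level JSON file.

    The model-level base.json sits one level higher and is not matched by the
    pattern; it would be looked up separately when merging.
    """
    for path in sorted(src_root.glob("*/*/*/*.json")):
        make, model, year = path.relative_to(src_root).parts[:3]
        if year.isdigit():  # skip folders that are not model years
            yield make, model, int(year), path
```

Sorting the matches keeps the traversal order stable across runs, which fits the deterministic-build goal.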
All canonical vehicle records must:
- Pass JSON Schema validation against schema.json
- Have at least one source
- Have at least one rated range entry
- Have at least one charge port
- Have battery capacity (gross or net)
- Follow naming conventions for slugs
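Several of these checks are simple enough to sketch directly. The function below is a hypothetical pre-check that complements, and does not replace, validation against schema.json. It covers only the "at least one" rules and the battery capacity rule, using the field names listed in this document.

```python
from typing import Any, Dict, List


def check_minimums(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    if not record.get("sources"):
        problems.append("sources: at least one source is required")
    if not (record.get("range") or {}).get("rated"):
        problems.append("range.rated: at least one rated range entry is required")
    if not record.get("charge_ports"):
        problems.append("charge_ports: at least one charge port is required")
    battery = record.get("battery") or {}
    if "pack_capacity_kwh_gross" not in battery and "pack_capacity_kwh_net" not in battery:
        problems.append("battery: gross or net pack capacity is required")
    return problems


ok_record = {
    "sources": [{"type": "oem", "url": "https://example.com/specs"}],
    "range": {"rated": [{"cycle": "wltp", "range_km": 438}]},
    "charge_ports": [{"kind": "combo", "connector": "ccs2"}],
    "battery": {"pack_capacity_kwh_net": 64.7},
}
```

Calling `check_minimums(ok_record)` returns an empty list, while an empty record reports all four problems.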
- Author data in layers: model base → year base → year variants.
- Compile deterministically using strict merge rules (objects merge, scalars override, arrays replace).
- Canonical output must match the full JSON contract defined in schema.json and documented in SCHEMA.md.
- Variants must be explicit, minimal, source-backed deltas that produce separate vehicles.
- Charging is first-class: ports, AC/DC limits, protocols, curves, times, and V2X can be represented for global compatibility.
- Data quality is paramount: every spec must be verifiable, market-specific data must be explicit, and ambiguity must be minimized.
For complete field-by-field documentation of the schema, see SCHEMA.md.