Spec ID: OVOS-INTENT-2 · Version: 2 · Status: Draft
This document defines the locale folder layout and the plain-text resource file formats a skill ships so a voice assistant can recognize what the user says and produce what the assistant speaks. It is implementation-agnostic: any assistant, in any language, can adopt this layout and these formats.
Every resource file except the prompt (§4.4) is a list of sentence templates
as defined by the companion Sentence Template Grammar Specification
(OVOS-INTENT-1). That grammar has two facets: expansion ((a|b) / [x] / <name>)
and named slots ({name}).
The key words MUST, MUST NOT, SHOULD, and MAY are used as in RFC 2119.
A skill is a self-contained unit of assistant functionality. Beyond its code, a skill ships resource files, grouped by language.
There are two file formats, distinguished by whether named slots are permitted:
- slot-bearing — expansion and named slots;
- slot-free — expansion only.
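The two template formats are realized as five resource roles; a sixth role, the prompt, is not a template file and is read whole and verbatim (§4.4). Each role is identified by file extension. The role tells a consumer how to use the file; the format tells a loader how to parse it.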
| Role | Extension | Format | Purpose |
|---|---|---|---|
| Intent | .intent | slot-bearing | Templates for utterances that trigger a skill action |
| Dialog | .dialog | slot-bearing | Phrases the assistant speaks in response |
| Entity | .entity | slot-free | Example values that can fill a named slot |
| Vocabulary | .voc | slot-free | A named set of localized phrasings |
| Blacklist | .blacklist | slot-free | Words that suppress an intent |
| Prompt | .prompt | whole-file verbatim | A localized language-model prompt (§4.4) |
The slot-bearing roles map onto the data path of a voice interaction:
- .intent (ASR input): templates matched against the user's speech; their named slots are filled by the engine at match time.
- .dialog (TTS output): templates for what the assistant says back; their named slots are filled by the caller before the text is spoken.
.entity, .voc, and .blacklist share the slot-free format; they are how a
developer encodes a set of natural-language phrasings for the assistant to use, and
differ only in which component consumes them (§4.3).
This specification covers the folder layout, the common parsing rules, and the file formats of the six roles. It does not cover intent scoring, matching, or skill runtime behaviour.
All localized resources live under a single locale/ directory at the skill
root, with one subdirectory per language:
my-skill/
└── locale/
    ├── en-US/
    │   ├── turn_on.intent
    │   ├── device.entity
    │   ├── turn_on.blacklist
    │   ├── thing.voc
    │   ├── confirm.dialog
    │   └── dialogs/
    │       └── greeting.dialog
    ├── pt-BR/
    │   └── …
    └── de-DE/
        └── …
A language directory MAY contain subdirectories; a skill author MAY use them to organize resources in any way they choose. A loader resolves a resource by searching the language directory and all its subdirectories, recursively. Subdirectory names carry no meaning to a loader — they are an authoring convenience only.
A resource is identified by the pair (role, base name) — its file extension
and its base name together. Two files MAY share a base name when their roles
differ: confirm.intent and confirm.dialog are distinct resources. Two files
with the same extension MUST NOT share a base name anywhere within one
language directory tree, since subdirectories do not distinguish resources. A
loader that nonetheless encounters two such files MUST treat the skill as
malformed.
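The identity rule above can be sketched as a small index over the files of one language directory tree. This is an illustrative sketch only: index_resources and MalformedSkillError are hypothetical names, and the function takes paths relative to the language directory rather than walking the disk, so it stays independent of any particular loader.

```python
from pathlib import PurePosixPath

# The six role extensions named in the role table.
ROLES = {".intent", ".dialog", ".entity", ".voc", ".blacklist", ".prompt"}


class MalformedSkillError(Exception):
    """Hypothetical error type: the language directory breaks a layout rule."""


def index_resources(relative_paths):
    """Map (role, base name) -> path for one language directory tree.

    Subdirectories are ignored for identity, so two files with the same
    extension and base name collide wherever they sit in the tree.
    """
    index = {}
    for raw in relative_paths:
        path = PurePosixPath(raw)
        if path.suffix not in ROLES:
            continue  # not a resource file under this layout
        key = (path.suffix, path.stem)
        if key in index:
            raise MalformedSkillError(
                f"duplicate resource {key}: {index[key]} and {path}"
            )
        index[key] = path
    return index
```

With this shape, confirm.intent and confirm.dialog index as two distinct keys, while confirm.dialog and dialogs/confirm.dialog raise the error.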
A resource base name MUST consist only of lowercase ASCII letters, digits, and
underscores, and MUST NOT contain whitespace; file extensions are likewise
lowercase. Where a base name names a slot —
an .entity file naming the {slot} it supplies — it additionally obeys the
slot-name rule of OVOS-INTENT-1 §3.4 (lowercase letters, digits, and
underscores; not beginning with a digit).
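As a rough illustration, the two naming rules reduce to two regular expressions. The helper name is_valid_base_name and its names_slot flag are assumptions made for this sketch, not part of either specification.

```python
import re

# Lowercase ASCII letters, digits, and underscores only.
BASE_NAME = re.compile(r"[a-z0-9_]+")
# Same alphabet, but the first character must not be a digit.
SLOT_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_base_name(name, names_slot=False):
    """Check a resource base name; pass names_slot=True for .entity files."""
    pattern = SLOT_NAME if names_slot else BASE_NAME
    return pattern.fullmatch(name) is not None
```

Under these patterns 2fast is acceptable as an ordinary base name but not as the base name of an .entity file.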
Language directories are named with BCP-47 language tags (en-US, pt-BR,
zh-Hans). Tag comparison is case-insensitive: en-us and en-US denote
the same language.
A resource may be provided from three places. When the same resource — the same
(role, base name) pair — exists in more than one, a loader resolves it in this
precedence order (first match wins):
1. User overrides: files under a per-skill directory in the platform user
   data path, laid out as …/<skill_id>/locale/<lang>/, where <skill_id> is the
   skill's unique identifier.
2. Skill resources: files bundled in the skill's own locale/ directory.
3. Core resources: fallback files shipped by the assistant framework. The
   root directory holding them is assistant-defined; only the locale/<lang>/
   layout beneath that root is normative.
All three use the layout of §2. Overrides apply at whole-file granularity: an override file replaces the corresponding lower-precedence file entirely.
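A minimal sketch of first-match-wins resolution, assuming each source has already been indexed into a mapping from (role, base name) to a file path. The function name resolve and the mapping shape are hypothetical.

```python
def resolve(role, base_name, user_index, skill_index, core_index):
    """Return the path of the highest-precedence file, or None if absent.

    Each index maps (role, base name) -> path for one source. The whole
    file from the first source that has the resource is used; nothing is
    merged with lower-precedence copies.
    """
    key = (role, base_name)
    for index in (user_index, skill_index, core_index):
        if key in index:
            return index[key]
    return None
```

Because the lookup returns a single path, whole-file override granularity follows directly: a user copy of confirm.dialog hides the skill's copy entirely.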
This specification does not mandate behaviour when the requested language has no
directory. A loader SHOULD prefer an exact match. As a suggestion, a
loader MAY fall back to the nearest available language; the langcodes library
provides a tag_distance() function for this, and treats a distance below 10
as a usable regional match. Any such fallback is an implementation choice,
not a requirement of this specification, because cross-region substitution can
produce wording a user would not expect.
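One possible shape for such a policy is sketched below. The name pick_language is hypothetical. The distance callable is injected rather than imported so the sketch carries no dependency; the tag_distance() function mentioned above is one candidate, on the assumption that it takes the desired tag and an available tag and returns an integer.

```python
def pick_language(requested, available, distance=None, max_distance=10):
    """Choose a language directory name for a requested BCP-47 tag.

    An exact match (compared case-insensitively) always wins. Fallback is
    attempted only when the caller supplies a distance function.
    """
    wanted = requested.casefold()
    for name in available:
        if name.casefold() == wanted:
            return name
    if distance is None:
        return None  # no fallback policy configured
    scored = [(distance(requested, name), name) for name in available]
    usable = [pair for pair in scored if pair[0] < max_distance]
    # Lowest distance wins; ties fall back to the name ordering.
    return min(usable)[1] if usable else None
```

Leaving distance unset gives the strict behaviour; supplying it opts in to the regional fallback, which keeps the choice visible at the call site.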
All template resource files (every role except .prompt, which is read whole per §4.4) are line-oriented and share one reader behaviour:
- the file is UTF-8; it SHOULD NOT begin with a byte-order mark, and a reader that encounters one MUST discard it;
- lines are terminated by LF or CRLF; a reader MUST accept both;
- lines are read in order, each stripped of leading and trailing whitespace;
- a blank line is skipped;
- a line whose first character is # is a comment and is skipped; there are no inline (end-of-line) comments.
Each surviving line is one template (OVOS-INTENT-1). Format-specific rules below apply after this filtering.
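The reader rules above can be sketched in a few lines. The function name read_templates is hypothetical, and taking raw bytes (rather than a path) is a choice made only to keep the example self-contained.

```python
def read_templates(data: bytes) -> list[str]:
    """Apply the common reader to the raw bytes of a template file."""
    # "utf-8-sig" decodes UTF-8 and discards a leading byte-order mark.
    text = data.decode("utf-8-sig")
    templates = []
    # Split on LF only; strip() below removes the CR of a CRLF ending.
    for raw in text.split("\n"):
        line = raw.strip()
        if not line:
            continue  # blank line
        if line.startswith("#"):
            continue  # comment line
        templates.append(line)
    return templates
```

A # that is not the first character after stripping is kept as part of the template, since there are no inline comments.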
A file's format is determined by its role (§1): .intent and .dialog are
slot-bearing; .entity, .voc, and .blacklist are slot-free. Every
file is a list of templates, one per line, parsed per §3; a resource is
identified by its (role, base name) pair (§2).
Format. Slot-bearing: each line is a template using the full OVOS-INTENT-1
grammar — expansion (a|b) / [x] and named slots {name}.
Role. Defines an intent: the templates whose expanded samples train the
engine to recognize one skill action. Matched against ASR input; named slots
are filled by the engine at match time (OVOS-INTENT-1 §5.1). The file base name
is the intent name. Lines in the file MAY declare different sets of named
slots (OVOS-INTENT-1 §5.5); the intent's slot set is the union of the
slots declared across its templates, and the engine extracts only the slots of
the template that matched (OVOS-INTENT-1 §5.5, OVOS-INTENT-3 §5.1). A phrasing
that needs different slots therefore MAY live in the same .intent file.
Loads as. The union of the sample sets of all lines (OVOS-INTENT-1 §4) — training data for the intent, with named slots intact. The engine generalizes beyond these samples; see OVOS-INTENT-1 §4.
# play.intent
(play|put on) {query}
(play|put on) {query} (on|using) {engine}
i want to listen to {query}
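To make the "union of sample sets" concrete, here is a toy expander for the (a|b) and [x] constructs only. It is a sketch under stated assumptions, not an OVOS-INTENT-1-conformant expander: it ignores <name> references, treats {name} as opaque text, and collapses runs of whitespace, none of which is confirmed against the grammar specification. The names expand and load_intent_samples are hypothetical.

```python
def _alternatives(text, pos):
    """Parse sequences separated by '|' until ')', ']' or end of text."""
    results = []
    while True:
        branch, pos = _sequence(text, pos)
        results.extend(branch)
        if pos < len(text) and text[pos] == "|":
            pos += 1  # consume '|' and parse the next alternative
            continue
        return results, pos


def _sequence(text, pos):
    """Parse literal text and bracketed groups; return all combinations."""
    results = [""]
    while pos < len(text) and text[pos] not in "|)]":
        ch = text[pos]
        if ch in "([":
            close = ")" if ch == "(" else "]"
            inner, pos = _alternatives(text, pos + 1)
            if pos >= len(text) or text[pos] != close:
                raise ValueError(f"missing {close!r} in {text!r}")
            pos += 1
            if ch == "[":
                inner = inner + [""]  # an optional group may be absent
            results = [r + i for r in results for i in inner]
        else:
            results = [r + ch for r in results]
            pos += 1
    return results, pos


def expand(template):
    """Return the sorted, de-duplicated samples of one template."""
    samples, pos = _alternatives(template, 0)
    if pos != len(template):
        raise ValueError(f"unbalanced bracket in {template!r}")
    # Assumption: whitespace left by an absent optional is collapsed.
    return sorted({" ".join(s.split()) for s in samples})


def load_intent_samples(lines):
    """Union of the sample sets of all template lines, slots intact."""
    samples = set()
    for line in lines:
        samples.update(expand(line))
    return samples
```

Applied to the three lines of play.intent above, the union holds seven samples: two from the first line, four from the second, and one from the third.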
Format. Slot-bearing: each line is a template using expansion (a|b) /
[x] for wording variety, and named slots {name}.
Role. Holds the phrases the assistant may speak for one response — TTS output. The file base name is the dialog name. A phrase is output text, not ASR input, so it MAY contain mixed case and punctuation.
Slots. Named slots in a dialog are filled by the caller — the skill —
before the phrase is rendered to TTS (caller-supplied fill, OVOS-INTENT-1
§5.1). The caller MUST fill every slot in the chosen phrase; a phrase with
an unfilled slot MUST NOT be sent to TTS. Every phrase in the file MUST declare
the same set of named slots (OVOS-INTENT-1 §5.5), so the caller supplies the
same values whichever phrase is chosen. This is the normal way to inject
dynamic values into a spoken response — a weather reading, the current time,
a computed result. An .entity value set is normally not involved in filling
a dialog slot, though an implementation MAY consult one.
Limitation. A .dialog phrase uses the metacharacters ( ) [ ] { } |
structurally and therefore cannot contain any of them as literal spoken text.
This is an accepted constraint; spoken responses rarely require them. A
.dialog file recognizes both named-slot forms defined by OVOS-INTENT-1
§3.4 — {name} and the equivalent double-brace {{name}} — and treats them
identically, since the two forms denote the same slot.
Rendering. To render a dialog, an implementation selects one phrase, fills
its named slots with caller-supplied values, and expands its (a|b) / [x]
variety per OVOS-INTENT-1 §4 to pick one variant. How a phrase is selected
(random, round-robin, repetition avoidance) is implementation behaviour and is
out of scope here.
Loads as. The list of phrase strings. Unlike the other formats, a .dialog
file is not expanded at load time — expansion happens per-render, on the one
phrase chosen.
# weather_today.dialog
It is currently {temperature} degrees and {condition}.
Right now it's {condition}, {temperature} degrees.
(Currently|At the moment) it is {temperature} degrees and {condition}.
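The slot rules for dialogs can be sketched as follows. This covers only the caller-supplied fill step and the same-slot-set check; phrase selection and per-render expansion are left out. All function names are hypothetical, and the slot-name pattern is an assumption based on the slot-name rule quoted in §2.

```python
import re

# Matches both named-slot forms: {{name}} first, then {name}.
SLOT = re.compile(r"\{\{([a-z_][a-z0-9_]*)\}\}|\{([a-z_][a-z0-9_]*)\}")


def slot_names(phrase):
    """Return the set of slot names a phrase declares."""
    return {double or single for double, single in SLOT.findall(phrase)}


def check_same_slots(phrases):
    """Raise ValueError unless every phrase declares the same slot set."""
    distinct = {frozenset(slot_names(p)) for p in phrases}
    if len(distinct) > 1:
        raise ValueError("dialog phrases declare different slot sets")


def fill_slots(phrase, values):
    """Fill every slot from caller-supplied values; refuse partial fills."""
    missing = slot_names(phrase) - set(values)
    if missing:
        raise KeyError(f"unfilled slots: {sorted(missing)}")
    return SLOT.sub(lambda m: str(values[m.group(1) or m.group(2)]), phrase)
```

For instance, filling "It is currently {temperature} degrees and {condition}." with temperature 18 and condition "sunny" yields "It is currently 18 degrees and sunny.", while omitting either value raises instead of producing text for TTS.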
.entity, .voc, and .blacklist share the slot-free format: a list of
templates using expansion (a|b) / [x] only, no named slots. They are
syntactically and semantically identical, and a loader parses all three the same
way — each loads as the union of the sample sets of all its lines
(OVOS-INTENT-1 §4). They are how a developer encodes a set of natural-language
phrasings for the assistant to consume.
They differ only in role — which component reads the file and what it does with the expanded phrase set:
| Extension | Role | Consumed by | Pairing |
|---|---|---|---|
| .entity | Example values that can fill a {slot} | Intent engine, as a slot value set (OVOS-INTENT-1 §5.4) | Base name = the {slot} name it supplies |
| .voc | A named set of localized phrasings | Keyword intent engines; skill runtime helpers; inline <name> references in templates (OVOS-INTENT-1 §3.7) | Base name = the vocabulary name |
| .blacklist | Words whose presence suppresses an intent | Intent engine, as match suppression | Base name = the .intent it suppresses |
How an .entity or .voc phrase set is used — slot constraint, keyword
test — is engine or skill policy, consistent with matching behaviour being out
of scope for these specifications.
A .voc may additionally be referenced inline from a template by the <name>
token (OVOS-INTENT-1 §3.7), which expands it in place; a .voc may itself
contain such references.
For the .blacklist role the suppression contract is defined: a .blacklist
file is paired by base name with exactly one .intent, and its expanded phrase
set scopes to that intent alone. A blacklist phrase occurs in an utterance
when its words appear there as a contiguous sequence of whole words — a
token subsequence, not a raw substring (the phrase art does not occur within
the word start). If any phrase from the set occurs in the user's utterance,
that intent is suppressed — a hard, score-independent rejection, not a
confidence penalty. A .blacklist does not affect any other intent.
# weekday.entity — values for the {weekday} slot
monday
(wednes|thurs|fri)day
# yes.voc — a localized keyword vocabulary
yes
yeah
(sure|of course|absolutely)
# play_music.blacklist — suppresses play_music.intent
(music|movie) (trailer|video)
trailer
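The blacklist occurrence test (contiguous whole words, not a raw substring) can be sketched as below. Tokenizing by lowercasing and splitting on whitespace is an assumption of this sketch; the specification does not define tokenization here. The function names are hypothetical, and the phrases passed in are assumed to be already expanded.

```python
def phrase_occurs(phrase, utterance):
    """True if the phrase's words appear as a contiguous run of whole words."""
    needle = phrase.lower().split()
    haystack = utterance.lower().split()
    size = len(needle)
    if size == 0:
        return False
    return any(
        haystack[start:start + size] == needle
        for start in range(len(haystack) - size + 1)
    )


def is_suppressed(utterance, blacklist_phrases):
    """Hard rejection: any single occurring phrase suppresses the intent."""
    return any(phrase_occurs(p, utterance) for p in blacklist_phrases)
```

With the expanded phrases of play_music.blacklist, "play the movie trailer" is suppressed, while "start the music" is not, because art matches no whole word.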
Format. Prompt: the whole file, verbatim, is one prompt. It is plain
text — not a template in the OVOS-INTENT-1 grammar — so (, ), [,
], <, >, |, every newline, a single { or }, and all other
characters are literal. There is no expansion and no line filtering: a
.prompt is read whole (§3), # lines and blank lines included, because every
character is part of the prompt. The only special handling a .prompt
receives is the {{name}} substitution described below; nothing else in
the file is interpreted. In particular there is no comment handling: an
HTML-style <!-- … --> sequence is ordinary literal text and reaches the
language model unchanged.
Role. The localized prompt a skill feeds to a language model. Like every
other resource it is shipped per language under locale/<lang>/ and resolved
through the override precedence of §2.1, so a prompt can be translated,
adjusted per region, or overridden by a user.
Substitution. The one special construct is the {{name}} substitution
point — the double-brace form only. A slot name consists of lowercase
ASCII letters, digits, and underscores, and MUST NOT begin with a digit. At
render time a caller supplies values keyed by name; an occurrence of
{{name}} is replaced by the caller-supplied value only when both hold:
- it forms a complete, well-formed {{name}}, that is, a double-brace pair enclosing a valid name;
- the caller supplied a value for that name.
A single {name}, a lone { or }, and literal JSON or markup such as
{}, { }, or {"key": 1} are never substitution points — they pass
through unchanged. This single-brace pass-through is precisely why a
.prompt requires the double-brace form: prompts routinely contain literal
single braces (JSON examples, code, set notation), and reserving substitution
to {{name}} lets that literal text survive untouched. (This is the opposite
convention to .intent / .dialog, where {name} and {{name}} are
equivalent per OVOS-INTENT-1 §3.4, because those templates cannot contain a
literal brace at all.)
Slots are optional. A {{name}} for which the caller supplied no value is
left as literal text — an unfilled slot is not an error. This is the
deliberate opposite of .dialog (§4.2), where the caller MUST fill every slot
and an unfilled one MUST NOT be rendered. A prompt is free-form text that may
legitimately contain brace sequences the author never intended as slots, so
substitution is conservative: it touches only the {{name}} occurrences whose
names the caller explicitly provides.
Loads as. The single whole-file string, with substitution applied per the rules above.
The full content of a file weather_report.prompt, rendered with the caller
value {"query": "weather in Lisbon"} (a # line and an <!-- … -->
sequence are both ordinary prompt text here — neither is stripped):
# Weather assistant
<!-- author note: keep this terse -->
You are a concise weather assistant. Answer the user's question.
User asked: {{query}}
Reply as JSON shaped like {"summary": "...", "temp_c": 0}. The {response}
single-brace placeholder below is illustrative only:
```
{"summary": "{response}", "temp_c": 18}
```
{{query}} is substituted; the # Weather assistant heading and the
<!-- … --> line are kept verbatim, the literal single-brace JSON is left
untouched, and the single-brace {response} is not a substitution point and so
is unchanged (whether or not it sits in a code block). A {{tone}} slot the
caller passed no value for would likewise stay literal.
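The substitution rule lends itself to a single regular expression. The sketch below assumes the name pattern stated above (lowercase ASCII letters, digits, and underscores, not starting with a digit); render_prompt is a hypothetical name.

```python
import re

# Only a complete double-brace pair around a valid name is a candidate.
SUBSTITUTION = re.compile(r"\{\{([a-z_][a-z0-9_]*)\}\}")


def render_prompt(text, values):
    """Replace {{name}} where the caller supplied a value; keep all else."""

    def replace(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        return match.group(0)  # unfilled slot stays literal

    return SUBSTITUTION.sub(replace, text)
```

Single-brace text such as {response} or {"key": 1} never matches the pattern, so it passes through even when the caller happens to supply a value under the same name.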
A loader for these resources, in any language, MUST:
1. Discover languages: list the language directories under locale/.
2. Locate a file: within the resolved language directory, searching its subdirectories recursively, find a file by its base name and extension, honouring the override precedence of §2.1.
3. Apply the common reader: UTF-8, accept LF/CRLF, strip lines, skip blanks and #-comments (§3).
4. Apply the per-format rule:
   - .intent and the slot-free roles (.entity, .voc, .blacklist): expand each line to its sample set at load time via an OVOS-INTENT-1-conformant expander, leaving any named slots intact;
   - .dialog: retain each line as a phrase string; expand per-render (§4.2).
5. Reject an empty file: a resource file of any role that yields no templates after step 3 MUST be treated as malformed: every file MUST contribute at least one template. (Each template must in turn expand to at least one non-empty sample, or it is itself malformed; see OVOS-INTENT-1 §3.6.)
A loader MAY cache parsed results and MAY implement a language-fallback policy per §2.2, but MUST NOT change the meaning of the formats defined here, and MUST NOT introduce additional resource file roles under this specification.
- Sentence Template Grammar Specification (OVOS-INTENT-1): the template
  grammar, its two facets (expansion and named slots), expansion semantics, and
  the slot fill modes. Five resource roles (.intent, .dialog, .entity, .voc,
  .blacklist) are lists of templates written in this grammar. The .prompt role
  (§4.4) is not a template file; it is verbatim text with optional {{name}}
  double-brace substitution only.