The Glossarist gem implements the Glossarist model in Ruby. All entities in the model are available as classes, and all their attributes are available as methods of those classes. The gem also lets you read and write a concept dataset, or build your own collection and save it as a Glossarist model V2 dataset.
The YAML schemas for concept and localized_concept are available at Concept model/yaml_schemas.
Add this line to your application’s Gemfile:
gem 'glossarist'
And then execute:
bundle install
Or install it yourself as:
gem install glossarist
Glossarist model V2 dataset is a collection of concepts and their localized concepts in the form of YAML files.
The storage structure of the dataset has 2 forms:
-
Each concept is stored in a concept YAML file and its localized concepts are stored in separate YAML files. The concept files are stored in the concept folder and its localized concepts are stored in the localized_concept folder.
-
Each concept and its related localized concepts are stored in a single YAML file. These concept files are stored directly in the specified path.
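The two storage forms can be told apart by looking at the directory itself. The following is a minimal sketch, not part of the gem: `dataset_layout` is a hypothetical helper, and it assumes the sub-folders are named `concept` and `localized_concept` as described above.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not a Glossarist API): guess which storage form a
# dataset directory uses by checking for the two sub-folders.
def dataset_layout(path)
  separate = Dir.exist?(File.join(path, "concept")) &&
             Dir.exist?(File.join(path, "localized_concept"))
  separate ? :separate_files : :grouped_files
end

# Demonstration on a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "concept"))
  FileUtils.mkdir_p(File.join(dir, "localized_concept"))
  puts dataset_layout(dir) # separate_files
end
```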
To load the glossarist model V2 dataset:
collection = Glossarist::ManagedConceptCollection.new
collection.load_from_files("path/to/glossarist-v2-dataset")
To write the glossarist model V2 dataset to files:
# load the collection from files
collection = Glossarist::ManagedConceptCollection.new
collection.load_from_files("path/to/glossarist-v2-dataset")
# ... Update the collection ...
collection.save_to_files("path/to/glossarist-v2-dataset")
To write the glossarist model V2 dataset with concepts and their localized concepts grouped into single files:
# load the collection from files
collection = Glossarist::ManagedConceptCollection.new
collection.load_from_files("path/to/glossarist-v2-dataset")
# ... Update the collection ...
collection.save_grouped_concepts_to_files("path/to/glossarist-v2-dataset")
ManagedConceptCollection is a collection of managed concepts. It includes Ruby's Enumerable module.
collection = Glossarist::ManagedConceptCollection.new
The following fields are available for ManagedConcept:
- id
-
String identifier for the concept
- uuid
-
UUID for the concept
- related
-
Array of RelatedConcept
- status
-
Enum for the normative status of the term.
- dates
-
Array of ConceptDate
- localized_concepts
-
Hash of all localizations where keys are language codes and values are the UUIDs of the localized concepts.
- groups
-
Array of groups in string format
- localizations
-
Hash of all localizations for this concept where keys are language codes and values are instances of LocalizedConcept.
There are two ways to initialize and populate a managed concept
-
Setting the fields by using a hash while initializing
concept = Glossarist::ManagedConcept.new({
  "data" => {
    "id" => "123",
    "localized_concepts" => {
      "ara" => "<uuid>",
      "eng" => "<uuid>"
    },
    "localizations" => <Array of localized concepts or localized concept hashes>,
    "groups" => [
      "foo",
      "bar",
    ],
  },
})
-
Setting the fields after creating an object
concept = Glossarist::ManagedConcept.new
concept.id = "123"
concept.groups = ["foo", "bar"]
concept.localizations = <Array of localized concepts or localized concept hashes>
A localized concept is the localization of the term to a particular language.
A localized concept has the following fields:
- id
-
An optional identifier for the term, to be used in cross-references.
- uuid
-
UUID for the concept
- designations
-
Array of Designations under which the term being defined is known. This method will also accept an array of hashes for designation and will convert them to their respective classes.
- domain
-
An optional semantic domain for the term being defined, in case the term is ambiguous between several semantic domains.
- subject
-
Subject of the term.
- definition
-
Array of Detailed Definitions of the term.
- non_verb_rep
-
Array of non-verbal representations used to help define the term.
- notes
-
Zero or more notes about the term. A note is in Detailed Definition format.
- examples
-
Zero or more examples of how the term is to be used in Detailed Definition format.
- language_code
-
The language of the localization, as an ISO-639 3-letter code.
- entry_status
-
Entry status of the concept. Must be one of the following: notValid, valid, superseded, retired.
- classification
-
Classification of the concept. Must be one of the following: preferred, admitted, deprecated.
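The two enumerated fields above lend themselves to a simple membership check. This is a sketch only: `enum_problems` is a hypothetical helper rather than a Glossarist method, and it assumes a localized concept represented as a plain hash with string keys.

```ruby
# Allowed values copied from the field descriptions above.
ENTRY_STATUSES = %w[notValid valid superseded retired].freeze
CLASSIFICATIONS = %w[preferred admitted deprecated].freeze

# Hypothetical helper: list the enum violations found in a localized concept hash.
# Missing fields are not reported; only present-but-unknown values are.
def enum_problems(localized)
  problems = []
  status = localized["entry_status"]
  classification = localized["classification"]
  if status && !ENTRY_STATUSES.include?(status)
    problems << "invalid entry_status: #{status}"
  end
  if classification && !CLASSIFICATIONS.include?(classification)
    problems << "invalid classification: #{classification}"
  end
  problems
end

puts enum_problems({ "entry_status" => "valid", "classification" => "preferred" }).empty? # true
puts enum_problems({ "entry_status" => "draft" }).first # invalid entry_status: draft
```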
A name under which a managed term is known.
- Methods
-
from_h(options)
-
Creates a new designation instance based on the specified type.
- Parameters
-
options (Hash) - The options for creating the designation.
-
"type" (String) - The type of designation (expression, symbol, abbreviation, graphical_symbol, letter_symbol). Note: the "type" key must be a string and not a symbol, so { type: "expression" } will not work.
-
Additional options depend on the specific designation type.
- Returns
-
Designation::{type}
-
A new instance of the specified type, e.g. Glossarist::Designation::Base.from_h("type" => "expression") will return a Glossarist::Designation::Expression.
Example
# Example usage of Designation::Base class
attributes_for_expression = { designation: "foobar", geographical_area: "abc", normative_status: "status" }
designation_expression = Glossarist::Designation::Base.from_h({ "type" => "expression" }.merge(attributes_for_expression))
attributes_for_abbreviation = { designation: "foobar", geographical_area: "abc", normative_status: "status", international: true }
designation_abbreviation = Glossarist::Designation::Base.from_h({ "type" => "abbreviation" }.merge(attributes_for_abbreviation))
A term related to the current term.
Following fields are available for the Related Concept
- type
-
An enum to denote the relation of the term to the current term.
- content
-
The designation of the related term.
- ref
-
A citation of the related term, in a Termbase.
There are two ways to initialize and populate a related concept
-
Setting the fields by using a hash while initializing
related_concept = Glossarist::RelatedConcept.new({
  content: "Test content",
  type: :supersedes,
  ref: <concept citation>
})
-
Setting the fields after creating an object
related_concept = Glossarist::RelatedConcept.new
related_concept.type = "supersedes"
related_concept.content = "designation of the related concept"
related_concept.ref = <Citation object>
A date relevant to the lifecycle of the managed term.
Following fields are available for the Concept Date
-
date: The date associated with the managed term in Iso8601Date format.
-
type: An enum denoting the event that occurred on the given date in the lifecycle of the managed term.
There are two ways to initialize and populate a concept date
-
Setting the fields by using a hash while initializing
concept_date = Glossarist::ConceptDate.new({
  date: "2010-11-01T00:00:00+00:00",
  type: :accepted,
})
-
Setting the fields after creating an object
concept_date = Glossarist::ConceptDate.new
concept_date.type = :accepted
concept_date.date = "2010-11-01T00:00:00+00:00"
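Date strings like the one above can be checked with Ruby's standard library before they are assigned. This is only a sketch: `iso8601_date?` is a hypothetical helper, and how the gem itself parses dates is not shown here.

```ruby
require "time"

# Hypothetical helper: true when the string parses as an ISO 8601 timestamp.
def iso8601_date?(value)
  Time.iso8601(value)
  true
rescue ArgumentError
  false
end

parsed = Time.iso8601("2010-11-01T00:00:00+00:00")
puts parsed.year                  # 2010
puts parsed.utc_offset            # 0
puts iso8601_date?("2010/11/01")  # false
```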
A definition of the managed term.
It has the following attributes:
- content
-
The text of the definition of the managed term.
- sources
-
List of bibliographic references (Citation) for this particular definition of the managed term.
There are two ways to initialize and populate a detailed definition
-
Setting the fields by using a hash while initializing
detailed_definition = Glossarist::DetailedDefinition.new({
  content: "plain text reference",
  sources: [<list of citations>],
})
-
Setting the fields after creating an object
detailed_definition = Glossarist::DetailedDefinition.new
detailed_definition.content = "plain text reference"
detailed_definition.sources = [<list of citations>]
A citation can be either structured or unstructured. It is structured if its reference contains some or all of the keys shown in { id: "id", source: "source", version: "version" }, and unstructured if its reference is plain text. Two methods, structured? and plain?, report which kind a citation is.
Citation has the following attributes.
- ref
-
A hash or string based on type of citation. Hash if citation is structured or string if citation is plain.
- clause
-
Referred clause of the document.
- link
-
Link to document.
There are two ways to initialize and populate a Citation
-
Setting the fields by using a hash while initializing
# Unstructured Citation
citation = Glossarist::Citation.new({
  ref: "plain text reference",
  clause: "clause",
  link: "link",
})
# Structured Citation
citation = Glossarist::Citation.new({
  ref: { id: "123", source: "source", version: "1.1" },
  clause: "clause",
  link: "link",
})
-
Setting the fields after creating an object
citation = Glossarist::Citation.new
citation.ref = <plain or structured ref>
citation.clause = "some clause"
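The structured versus plain distinction can be pictured with a small stand-in class. This sketch is for illustration only: `SketchCitation` is hypothetical and is not `Glossarist::Citation`, and it assumes the distinction rests purely on whether `ref` is a hash or a string.

```ruby
# Hypothetical stand-in, not the gem's Citation class.
SketchCitation = Struct.new(:ref, :clause, :link, keyword_init: true) do
  # Structured when the reference is a hash such as { id:, source:, version: }.
  def structured?
    ref.is_a?(Hash)
  end

  # Plain when the reference is free text.
  def plain?
    ref.is_a?(String)
  end
end

plain = SketchCitation.new(ref: "plain text reference", clause: "clause")
structured = SketchCitation.new(ref: { id: "123", source: "source", version: "1.1" })

puts plain.plain?            # true
puts structured.structured?  # true
```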
Non-verbal Representation has the following fields
- image
-
An image used to help define a term.
- table
-
A table used to help define a term.
- formula
-
A formula used to help define a term.
- sources
-
Bibliographic concept source for the non-verbal representation of the term.
Concept Source has the following fields
- status
-
The status of the managed term in the present context, relative to the term as found in the bibliographic source.
- type
-
The type of the managed term in the present context.
- origin
-
The bibliographic citation for the managed term. This is also aliased as ref.
- modification
-
A description of the modification to the cited definition of the term, if any, as it is to be applied in the present context.
Convert concepts to LaTeX format.
glossarist generate_latex -p PATH_TO_CONCEPTS
Options:
| Option | Description |
|---|---|
| -p, --concepts-path | Path to the YAML concepts directory |
| -l, --latex-concepts | Path to a file listing the concepts that should be converted to LaTeX format |
| -o, --output-file | Output file path |
| -e, --extra-attributes | List of extra attributes that are not in the standard Glossarist Concept model |
Create a .gcr ZIP archive from a concept dataset.
glossarist package DIR -o output.gcr --shortname mydataset --version 1.0.0 --uri-prefix urn:iso:std:iso:19111
Options:
| Option | Description |
|---|---|
| -o, --output (required) | Output |
| --shortname (required) | Machine-readable dataset shortname (e.g. …) |
| --version (required) | Semantic version (e.g. …) |
| --title | Human-readable dataset title |
| --description | Dataset description |
| --owner | Dataset owner |
| --register-yaml | Path to register.yaml to include in package |
| --uri-prefix | URI namespace this dataset provides (e.g. …) |
| --tags | Tags for the dataset |
Ruby API:
GcrPackage.create_from_directory(
"path/to/dataset",
output: "output.gcr",
shortname: "mydataset",
version: "1.0.0",
uri_prefix: "urn:iso:std:iso:19111",
)
Validate a dataset directory or .gcr file for schema compliance.
glossarist validate PATH
glossarist validate PATH --reference-path path/to/gcrs/
Options:
| Option | Description |
|---|---|
| --strict | Treat warnings as errors |
| --format | Output format: … |
| --reference-path | Path to directory of … |
Ruby API:
result = DatasetValidator.new.validate("path/to/dataset")
result = DatasetValidator.new.validate("path/to/dataset", reference_path: "gcrs/")
result.valid? # => true/false
result.errors # => [...]
result.warnings # => [...]
A GCR (Glossarist Concept Repository) is a distributable, versioned ZIP archive containing glossary concepts and metadata. GCR packages are created from v2 datasets.
A .gcr file is a ZIP archive with the following structure:
metadata.yaml       # Package metadata
register.yaml       # Optional register information
concepts/           # Concept YAML files
  102-01-01.yaml
  200.yaml
CLI:
glossarist package path/to/v2-dataset -o mydataset-1.0.0.gcr \
  --shortname mydataset --version 1.0.0 --uri-prefix urn:iso:std:iso:19111
Ruby API:
GcrPackage.create_from_directory(
"path/to/v2-dataset",
output: "mydataset-1.0.0.gcr",
shortname: "mydataset",
version: "1.0.0",
uri_prefix: "urn:iso:std:iso:19111",
title: "My Dataset",
description: "A terminology dataset",
)
pkg = GcrPackage.load("mydataset-1.0.0.gcr")
pkg.metadata # => Hash with metadata fields
pkg.concepts # => Array of concept hashes
Metadata fields in metadata.yaml:
| Field | Description |
|---|---|
| shortname | Machine-readable dataset identifier (e.g. …) |
| version | Semantic version (e.g. …) |
| title | Human-readable title |
| description | Dataset description |
| owner | Dataset owner |
| tags | Array of tags |
| concept_count | Number of concepts in the package |
| languages | Array of language codes present |
| created_at | ISO 8601 timestamp of package creation |
| glossarist_version | Version of the Glossarist gem used |
| schema_version | Schema version of the package format |
| uri_prefix | URI namespace this dataset provides (e.g. …) |
| external_references | Array of … |
Concepts can reference other concepts within the same dataset (intra-set) or in different datasets (inter-set) using inline mention syntax. All mentions use double braces {{…}}.
The concept mention syntax mirrors HTML <a href="id">display_text</a> — the display text is independent of the target concept’s canonical designation.
| Form | Syntax | Example | Resolution |
|---|---|---|---|
| ID only | … | … | Intra-set: concept 200, auto-display |
| ID + display | … | … | Intra-set: concept 200, custom display |
| Designation | … | … | Intra-set: find by designation |
| URN + display | … | … | Inter-set: resolve by URN |
| URN only | … | … | Inter-set: resolve URN, auto-display |
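Picking the double-brace mentions out of a text can be approximated with a regular expression. The sketch below is not the gem's extractor: `scan_mentions` is a hypothetical helper, and it assumes a mention holds either a single target, or display text followed by a comma and a target, with any target starting with `urn:` treated as inter-set.

```ruby
# Matches the shortest run of text between {{ and }}.
MENTION = /\{\{(.+?)\}\}/

# Hypothetical helper: split every mention into display text and target.
def scan_mentions(text)
  text.scan(MENTION).flatten.map do |body|
    parts = body.split(",", 2).map(&:strip)
    target = parts.last
    {
      display: parts.length == 2 ? parts.first : nil,
      target: target,
      ref_type: target.start_with?("urn:") ? "urn" : "local",
    }
  end
end

mentions = scan_mentions("See {{equality, urn:iec:std:iec:60050-102-01-01}} and {{lat, 200}}")
mentions.each { |m| puts "#{m[:display]} -> #{m[:target]} (#{m[:ref_type]})" }
```

A single-part mention cannot be classified as an ID or a designation from its syntax alone, so this sketch leaves that decision to whatever resolves the target.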
- IEC URN (IEV)
-
urn:iec:std:iec:60050-{code} (source is urn:iec:std:iec:60050, concept_id is the IEV code)
- ISO URN (RFC 5141)
-
urn:iso:std:iso:{std}:…:term:{id} (source is urn:iso:std:iso:{std}, concept_id is the term ID)
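The two URN shapes above can be split into source and concept ID with a pair of regular expressions. This is a sketch under the stated patterns only: `split_urn` is a hypothetical helper, not a Glossarist method, and real URNs may have variants it does not cover.

```ruby
# IEC: everything after "60050-" is the IEV code.
IEC_URN = /\A(urn:iec:std:iec:60050)-(.+)\z/
# ISO: the standard number follows "urn:iso:std:iso:", the ID follows "term:".
ISO_URN = /\A(urn:iso:std:iso:[^:]+):(?:.*:)?term:(.+)\z/

# Hypothetical helper: returns { source:, concept_id: } or nil when no pattern matches.
def split_urn(urn)
  match = IEC_URN.match(urn) || ISO_URN.match(urn)
  return nil unless match

  { source: match[1], concept_id: match[2] }
end

iec = split_urn("urn:iec:std:iec:60050-102-01-01")
iso = split_urn("urn:iso:std:iso:19111:ed-3:v1:en:term:3.1.32")
puts iec[:source]      # urn:iec:std:iec:60050
puts iec[:concept_id]  # 102-01-01
puts iso[:source]      # urn:iso:std:iso:19111
puts iso[:concept_id]  # 3.1.32
```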
extractor = ReferenceExtractor.new
# From a text string
refs = extractor.extract_from_text("See {{equality, urn:iec:std:iec:60050-102-01-01}} and {{lat, 200}}")
# => [ConceptReference(term: "equality", concept_id: "102-01-01",
# source: "urn:iec:std:iec:60050", ref_type: "urn"),
# ConceptReference(term: "lat", concept_id: "200",
# source: nil, ref_type: "local")]
# From all text fields in a localized concept
refs = extractor.extract_from_localized(lc_hash)
# From all language blocks in a concept
refs = extractor.extract_from_concept_hash(concept_hash)
Resolution uses an adapter chain: route overrides → local → package → remote.
resolver = ReferenceResolver.new
# Register the current dataset for intra-set resolution
resolver.register_self(concepts)
# Register co-loaded GCRs with their URI prefixes
resolver.register_package(iev_concepts, uri_prefix: "urn:iec:std:iec:60050")
resolver.register_package(iso_concepts, uri_prefix: "urn:iso:std:iso:19111")
# Add URI route overrides (e.g. author used wrong URI)
resolver.add_route(from: "urn:iso:std:iso:19115", to: "urn:iso:std:iso:19111")
# Resolve a single reference
ref = ConceptReference.new(term: "equality", concept_id: "102-01-01",
source: "urn:iec:std:iec:60050", ref_type: "urn")
resolver.resolve(ref) # => concept hash
# Validate all references in a package
result = resolver.validate_all(concepts)
result.errors # => structural errors
result.warnings # => unresolvable references
When multiple GCRs are placed together in a directory, a collection.yaml configures resolution:
# collection.yaml
packages:
  - file: iev-2.0.0.gcr
  - file: iso19111-1.0.0.gcr
routes:
  - from: "urn:iso:std:iso:19115"
    to: "urn:iso:std:iso:19111"
remote:
  - uri_prefix: "urn:iec:std:iec:60050"
    endpoint: "https://vocabulary.example.org/api/concepts"
resolver = ReferenceResolver.new
resolver.load_collection("path/to/gcr_collection/")
# Packages auto-registered with their uri_prefix from metadata
# Route overrides applied
# Remote endpoints registered
The resolution framework uses a chain of adapters, each implementing resolve(reference) → concept_hash | nil:
- LocalAdapter
-
Resolves intra-set references by concept ID or designation lookup
- PackageAdapter
-
Resolves inter-set references by matching source URI to a GCR’s uri_prefix
- RouteAdapter
-
Remaps incorrect source URIs before delegation
- RemoteAdapter
-
Resolves via HTTP to an online GCR endpoint
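The chain idea, where each adapter either returns a concept hash or nil and the first hit wins, can be sketched in plain Ruby. Every class name below is a hypothetical stand-in rather than one of the gem's adapters, concepts are assumed to be hashes keyed by concept ID, and the remote adapter is left out to keep the example offline.

```ruby
# Hypothetical reference value object for this sketch.
Ref = Struct.new(:concept_id, :source, keyword_init: true)

# Answers only references without a source (intra-set).
class SketchLocalAdapter
  def initialize(concepts)
    @concepts = concepts
  end

  def resolve(ref)
    ref.source.nil? ? @concepts[ref.concept_id] : nil
  end
end

# Answers only references whose source equals this package's URI prefix.
class SketchPackageAdapter
  def initialize(uri_prefix, concepts)
    @uri_prefix = uri_prefix
    @concepts = concepts
  end

  def resolve(ref)
    ref.source == @uri_prefix ? @concepts[ref.concept_id] : nil
  end
end

# Applies route overrides, then asks each adapter in order.
class SketchChain
  def initialize(adapters, routes = {})
    @adapters = adapters
    @routes = routes
  end

  def resolve(ref)
    source = @routes.fetch(ref.source, ref.source)
    rerouted = Ref.new(concept_id: ref.concept_id, source: source)
    @adapters.each do |adapter|
      found = adapter.resolve(rerouted)
      return found if found
    end
    nil
  end
end

chain = SketchChain.new(
  [
    SketchLocalAdapter.new({ "200" => { "id" => "200" } }),
    SketchPackageAdapter.new("urn:iso:std:iso:19111", { "3.1.32" => { "id" => "3.1.32" } }),
  ],
  { "urn:iso:std:iso:19115" => "urn:iso:std:iso:19111" }
)

puts chain.resolve(Ref.new(concept_id: "200"))["id"] # 200
puts chain.resolve(Ref.new(concept_id: "3.1.32", source: "urn:iso:std:iso:19115"))["id"] # 3.1.32
```

Returning nil instead of raising keeps each adapter independent: an adapter that does not recognise a reference simply passes, and the chain decides what an overall miss means.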
Concept mentions rendered as hyperlinks need HTTP URLs. The UrnResolver converts URNs to their canonical web locations:
# Class-level convenience
url = UrnResolver.resolve("urn:iec:std:iec:60050-102-01-01")
# => "https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=102-01-01"
url = UrnResolver.resolve("urn:iso:std:iso:19111:ed-3:v1:en:term:3.1.32")
# => "https://www.iso.org/obp/ui/#iso:std:iso:19111:ed-3:v1:en:term:3.1.32"
# Also accepts ConceptReference objects
ref = ConceptReference.new(term: "equality", concept_id: "102-01-01",
source: "urn:iec:std:iec:60050", ref_type: "urn")
url = UrnResolver.resolve(ref)
# => "https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=102-01-01"
Built-in mappings:
| URN Prefix | Target | Example URL |
|---|---|---|
| … | IEC Electropedia | … |
| … | ISO Online Browsing Platform | … |
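A prefix-to-URL mapping of this kind can be modelled as a registry of blocks keyed by URN prefix. The sketch below is not the gem's UrnResolver: `SketchUrnResolver` is a hypothetical class, the prefixes registered are assumptions, and the URL shapes are copied from the example outputs shown earlier.

```ruby
# Hypothetical registry: the longest registered prefix that matches wins.
class SketchUrnResolver
  def initialize
    @schemes = {}
  end

  def register_scheme(prefix, &block)
    @schemes[prefix] = block
  end

  # Returns a URL string, or nil when no registered prefix matches.
  def resolve(urn)
    prefix = @schemes.keys.select { |p| urn.start_with?(p) }.max_by(&:length)
    prefix ? @schemes[prefix].call(urn) : nil
  end
end

urn_resolver = SketchUrnResolver.new
urn_resolver.register_scheme("urn:iec:std:iec:60050-") do |urn|
  code = urn.delete_prefix("urn:iec:std:iec:60050-")
  "https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=" + code
end
urn_resolver.register_scheme("urn:iso:std:") do |urn|
  "https://www.iso.org/obp/ui/#" + urn.delete_prefix("urn:")
end

puts urn_resolver.resolve("urn:iec:std:iec:60050-102-01-01")
puts urn_resolver.resolve("urn:iso:std:iso:19111:ed-3:v1:en:term:3.1.32")
```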
Register custom schemes:
resolver = UrnResolver.new
resolver.register_scheme("urn:example:") do |urn|
"https://example.org/concepts/#{urn.sub('urn:example:', '')}"
end
This gem is developed, maintained and funded by Ribose Inc.
The gem is available as open source under the terms of the 2-Clause BSD License.