This repository contains Python scripts for converting SFS legislation (Swedish Code of Statutes / Svensk författningssamling) from JSON format to Markdown with temporal tags, HTML, Git, and other formats.
Note: This is part of SE-Lex. Read more about the project here.
SFS legislation is exported to https://github.com/se-lex/sfs and also published as HTML at https://selex.se with support for the EU's European Legislation Identifier (ELI) standard.
- Ensure you have Python 3.11 or later installed
- Install required dependencies:
pip install -r requirements.txt

Convert JSON files containing legislation to Markdown:
python sfs_processor.py --input sfs_json --output output/md --formats md-markers

The tool can generate legislation in several formats, depending on the use case:
- `md-markers` (default): Markdown with semantic `<section>` tags and selex attributes for legal status and temporal handling
- `md`: Clean Markdown files with normalized heading levels, suitable for display and reading. Uses a target date (default: today's date) to show how the law appears at that point in time
- `git`: Exports legislation as Git commits with historical dates, creating a version history of the legislation
- `html`: Generates HTML files in ELI structure (/eli/sfs/{year}/{number}/index.html) for web publishing
- `htmldiff`: Like `html`, but also includes separate versions for each amending law
- `vector`: Converts legislation to vector embeddings for semantic search and RAG applications. Uses OpenAI's text-embedding-3-large model (3072 dimensions) and supports storage in PostgreSQL (pgvector), Elasticsearch, or a JSON file.

HTML files can be published via:

- Cloudflare R2: using `html-export-workflow.yml` (requires R2 credentials)
- GitHub Pages: using `github-pages-workflow.yml` (simpler setup; requires GitHub Pages to be enabled)
Example of combining multiple formats:
python sfs_processor.py --input sfs_json --output output --formats md,html,git

To convert legislation, you first need to download JSON data:
python downloaders/download_sfs_docs.py --ids all --source rkrattsbaser
python downloaders/download_sfs_docs.py --ids "2024:675,2024:700" --source rkrattsbaser

The first command downloads all documents; the second downloads only the listed references. Downloaded files are saved by default in the sfs_docs directory. You can specify a different directory with the --out parameter.
Convert all JSON files in a directory to Markdown:
python sfs_processor.py --input sfs_json --output output/md --formats md-markers

Depending on which format you choose, you get different structures:
Markdown files with preserved semantic structure through `<article>` and `<section>` tags:

- `<article>`: Wraps the entire legislation and may contain temporal attributes (ikraft_datum, upphor_datum, etc.)
- `<section class="avdelning">`: Wraps divisions (avdelning), a higher-level structural unit
- `<section class="kapitel">`: Wraps chapters (kapitel) and the paragraphs they contain
- `<section class="paragraf">`: Wraps each paragraph (§) as a delimited legal provision
<article selex:status="ikraft" selex:ikraft_datum="2025-01-01">
# Lag (2024:123) om exempel
<section class="avdelning" id="avd1">
## AVDELNING I. ALLMÄNNA BESTÄMMELSER
<section class="kapitel" id="inledande-bestammelser">
### Inledande bestämmelser
<section class="paragraf" id="inledande-bestammelser.1">
#### 1 §
Content of the paragraph...
</section>
</section>
</section>
</article>

These tags preserve the document's logical structure and enable automatic processing, analysis, and navigation of the legislation text. ID attributes make it possible to link directly to specific headings and paragraphs (e.g., #inledande-bestammelser.1). The tags can also be used for CSS styling and JavaScript functionality.
Note: Despite the HTML tags, the files are still fully readable as Markdown :)
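To illustrate how the ID attributes can be consumed, here is a minimal sketch that builds an index of `<section>` tags from md-markers output using Python's standard `html.parser`. The class and function names (`SectionIndex`, `index_sections`) are hypothetical and not part of this repository; the sample input is a trimmed version of the example above.

```python
from html.parser import HTMLParser


class SectionIndex(HTMLParser):
    """Collect the attributes of every <section> tag in md-markers output."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return  # <article> and any other tags are ignored
        # attrs is a list of (name, value) pairs; names such as "selex:status" keep their colon
        self.sections.append({name: value or "" for name, value in attrs})


def index_sections(markdown: str) -> list[dict[str, str]]:
    parser = SectionIndex()
    parser.feed(markdown)
    parser.close()
    return parser.sections


sample = """<article selex:status="ikraft" selex:ikraft_datum="2025-01-01">
# Lag (2024:123) om exempel
<section class="kapitel" id="inledande-bestammelser">
### Inledande bestämmelser
<section class="paragraf" id="inledande-bestammelser.1">
#### 1 §
Content of the paragraph...
</section>
</section>
</article>"""

for section in index_sections(sample):
    print(section["class"], "#" + section["id"])
# kapitel #inledande-bestammelser
# paragraf #inledande-bestammelser.1
```

Because `HTMLParser` keeps attribute names like `selex:status` intact, the same index also exposes any selex attributes on a section.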
Clean Markdown files with normalized heading levels, without section tags:
# Lag (2024:123) om exempel
## Inledande bestämmelser
### 1 §
Content of the paragraph...
### 2 §
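More content...

This format is suitable for simple display and reading. It contains no section tags or selex metadata; temporal filtering has already been applied, so only the provisions in force on the target date remain.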
In addition to CSS classes, `<section>` tags use `selex:` attributes to handle legal status and dates. These attributes enable filtering of content based on entry-into-force and expiration dates:
- `selex:status`: Indicates the section's legal status
  - `ikraft`: The section contains entry-into-force rules (converted from e.g. "/Träder i kraft I:2025-01-01")
  - `upphavd`: The section has been repealed (converted if the heading contains "upphävd" or "/Upphör att gälla")
- `selex:ikraft_datum`: Date when the section enters into force (format: YYYY-MM-DD)
- `selex:upphor_datum`: Date when the section ceases to apply (format: YYYY-MM-DD)
- `selex:ikraft_villkor`: Condition for entry into force (when no specific date is given)
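The conversion from heading markers to selex attributes could look roughly like the sketch below. It is an assumption based only on the marker strings quoted above ("/Träder i kraft I:YYYY-MM-DD", "upphävd", "/Upphör att gälla U:YYYY-MM-DD"); the regular expressions and the `selex_attributes` helper are hypothetical, and the real converter may recognize more variants (for example conditional entry into force).

```python
import re

# Hypothetical patterns derived from the marker formats quoted in this README
IKRAFT_RE = re.compile(r"/Träder i kraft I:(\d{4}-\d{2}-\d{2})")
UPPHOR_RE = re.compile(r"/Upphör att gälla U:(\d{4}-\d{2}-\d{2})")


def selex_attributes(heading: str) -> dict[str, str]:
    """Derive selex attributes from a heading line (illustrative only)."""
    attrs: dict[str, str] = {}
    if match := IKRAFT_RE.search(heading):
        attrs["selex:status"] = "ikraft"
        attrs["selex:ikraft_datum"] = match.group(1)
    # Repealed if the heading says "upphävd" or carries an expiration marker
    if "upphävd" in heading.lower() or "/Upphör att gälla" in heading:
        attrs["selex:status"] = "upphavd"
    if match := UPPHOR_RE.search(heading):
        attrs["selex:upphor_datum"] = match.group(1)
    return attrs


print(selex_attributes("1 § /Träder i kraft I:2025-01-01"))
# {'selex:status': 'ikraft', 'selex:ikraft_datum': '2025-01-01'}
```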
Example of selex attributes:
<section class="paragraf" selex:status="ikraft" selex:ikraft_datum="2025-01-01">
#### 1 § A paragraph
...
</section>
<section class="paragraf" selex:status="upphavd" selex:upphor_datum="2023-12-31">
#### 2 § A paragraph
...
</section>
<section class="paragraf" selex:status="ikraft" selex:ikraft_villkor="den dag regeringen bestämmer">
#### 3 § Heading for conditional entry into force
...
</section>

These attributes are used automatically by the system's date filtering to create versions of the legislation that are valid at specific points in time. Sections whose selex:upphor_datum has passed are removed, as are sections whose selex:ikraft_datum has not yet arrived.
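A minimal sketch of the keep-or-remove decision for a single section, assuming the attributes have already been parsed into a dict. The function name is hypothetical. Two boundary choices are assumptions rather than documented behavior: a section is kept on its `selex:upphor_datum` itself (it is removed only once that date has passed), and a section with only `selex:ikraft_villkor` and no date is kept.

```python
from datetime import date


def is_in_effect(attrs: dict[str, str], target: date) -> bool:
    """Return True if a section with these selex attributes should be kept at `target`."""
    ikraft = attrs.get("selex:ikraft_datum")
    upphor = attrs.get("selex:upphor_datum")
    if ikraft and date.fromisoformat(ikraft) > target:
        return False  # entry into force has not yet come
    if upphor and date.fromisoformat(upphor) < target:
        return False  # expiration date has passed
    # Sections without dates (including those with only selex:ikraft_villkor) are kept
    return True


print(is_in_effect({"selex:ikraft_datum": "2025-01-01"}, date(2024, 6, 1)))  # False
print(is_in_effect({"selex:upphor_datum": "2023-12-31"}, date(2024, 6, 1)))  # False
print(is_in_effect({"selex:ikraft_datum": "2020-01-01"}, date(2024, 6, 1)))  # True
```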
The system handles temporal processing (time-based filtering) differently depending on which format is used:
- `md-markers` (default): Preserves selex tags and skips temporal processing, so all temporal attributes are retained for later processing. Recommended for preserving all legal metadata.
- `md`: Applies temporal processing with today's date (or `--target-date`, if given) as the target date. This is important to understand:
  - Repealed provisions (with `selex:upphor_datum` before the target date) are removed
  - Provisions not yet in force (with `selex:ikraft_datum` after the target date) are removed
  - Selex tags are removed after filtering
  - The result is a "clean" Markdown view of how the law appears on the target date
  - Note: Because temporal filtering is applied automatically, content may disappear if it has been repealed or is not yet in force
- `git`: Skips temporal processing in main processing. Temporal handling is done separately in the git workflow to create historical commits.
- `html` and `htmldiff`: Apply temporal processing with today's date (or `--target-date`) before HTML generation, similar to the `md` format.
- `vector`: Applies temporal processing with today's date (or the specified `--target-date`) before vector generation. This ensures that only current regulations are included in the vector database.
To see how a law appeared at a specific date:
# See how the law appeared on 2023-01-01
python sfs_processor.py --input sfs_json --output output/md --formats md --target-date 2023-01-01

This is useful for creating historical versions or for understanding how the law appeared at a certain point in time.
python sfs_processor.py [--input INPUT] [--output OUTPUT] [--formats FORMATS] [--filter FILTER] [--target-date DATE] [--no-year-folder] [--verbose]

- `--input`: Input directory with JSON files (default: "sfs_json")
- `--output`: Output directory for converted files (default: "SFS")
- `--formats`: Output formats to generate, comma-separated (default: "md-markers"). Supported values:
  - `md-markers`: Generate Markdown files with section tags preserved
  - `md`: Generate clean Markdown files without section tags
  - `git`: Enable Git commits with historical dates
  - `html`: Generate HTML files in ELI structure (basic documents only)
  - `htmldiff`: Generate HTML files in ELI structure with amendment versions
  - `vector`: Generate vector embeddings for semantic search
- `--filter`: Filter files by year (YYYY) or specific reference (YYYY:NNN). Accepts a comma-separated list.
- `--target-date`: Date (YYYY-MM-DD) for temporal filtering based on selex tags. Used with the `md`, `html`, `htmldiff` and `vector` formats to filter content by validity dates. If not specified, today's date is used. Example: `--target-date 2023-01-01`
- `--no-year-folder`: Don't create year-based subfolders for documents
- `--verbose`: Display detailed information about processing
- `--vector-backend`: Backend for vector storage (default: "json")
  - `json`: Save to a JSON file (for testing/development)
  - `postgresql`: PostgreSQL with the pgvector extension
  - `elasticsearch`: Elasticsearch with dense_vector
- `--vector-chunking`: Strategy for document chunking (default: "paragraph")
  - `paragraph`: Split by paragraph (§); preserves legal structure
  - `chapter`: Split by chapter; larger context
  - `section`: Split by selex section
  - `semantic`: Semantic boundaries with overlap
  - `fixed_size`: Fixed token count with overlap
- `--embedding-model`: Embedding model (default: "text-embedding-3-large")
- `--vector-mock`: Use mock embeddings for testing without an OpenAI API key
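As an example of the `--filter` semantics described above (a year or a YYYY:NNN reference, optionally comma-separated), here is a hypothetical matcher. How the tool actually parses the argument is not documented here, so treat this as a sketch.

```python
import re

YEAR_RE = re.compile(r"\d{4}")
REF_RE = re.compile(r"\d{4}:\d+")


def matches_filter(beteckning: str, filter_arg: str) -> bool:
    """Return True if a reference such as "2024:100" is selected by a --filter value."""
    year = beteckning.split(":", 1)[0]
    for item in (part.strip() for part in filter_arg.split(",")):
        if YEAR_RE.fullmatch(item) and item == year:
            return True  # whole-year filter, e.g. "2024"
        if REF_RE.fullmatch(item) and item == beteckning:
            return True  # single-document filter, e.g. "2024:100"
    return False


print(matches_filter("2024:100", "2023,2024:100"))  # True
print(matches_filter("2024:100", "2024:1001"))      # False
```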
The vector format (--formats vector) converts legislation to vector embeddings that can be used for semantic search, RAG applications (Retrieval-Augmented Generation), and AI assistants.
- Temporal filtering: Only current regulations are included (same as `md`/`html` mode)
- Intelligent chunking: Documents are split in a way that preserves legal structure
- Embedding generation: Text is converted to vectors using OpenAI text-embedding-3-large
- Storage: Vectors are saved to the selected backend with complete metadata
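A minimal sketch of paragraph-level chunking on clean Markdown (the `md` format), assuming each paragraph starts with a heading such as "### 1 §" or "#### 3 a §". It is not the repository's chunker: it ignores chapter context and overlap, and any text before the first paragraph heading (the title, for example) is dropped.

```python
import re

# A heading line such as "### 1 §" or "#### 3 a §" starts a new paragraph chunk
PARAGRAPH_HEADING = re.compile(r"^#{1,6}\s+(\d+\s*[a-z]?\s*§)", re.MULTILINE)


def chunk_by_paragraph(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (paragraph reference, chunk text) pairs."""
    matches = list(PARAGRAPH_HEADING.finditer(markdown))
    chunks = []
    for i, match in enumerate(matches):
        # Each chunk runs until the next paragraph heading or the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        chunks.append((match.group(1), markdown[match.start():end].strip()))
    return chunks


doc = (
    "# Lag (2024:123) om exempel\n## Inledande bestämmelser\n"
    "### 1 §\nContent of the paragraph...\n### 2 §\nMore content..."
)
print([ref for ref, _ in chunk_by_paragraph(doc)])  # ['1 §', '2 §']
```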
# Test with mock embeddings (without API key)
python sfs_processor.py --formats vector --vector-mock --filter 2024:100
# Production with OpenAI (requires OPENAI_API_KEY environment variable)
python sfs_processor.py --formats vector --filter 2024
# With PostgreSQL/pgvector backend
python sfs_processor.py --formats vector --vector-backend postgresql
# With chapter chunking for larger context
python sfs_processor.py --formats vector --vector-chunking chapter

| Backend | Use Case | Requirements |
|---|---|---|
| json | Testing/development | None |
| postgresql | Production | PostgreSQL 12+ with pgvector |
| elasticsearch | Production | Elasticsearch 8.0+ |
Each vector chunk includes:
- `document_id`: Reference number (e.g., "2024:100")
- `chapter`: Chapter reference (e.g., "1 kap.")
- `paragraph`: Paragraph reference (e.g., "1 §")
- `departement`: Responsible ministry
- `effective_date`: Entry-into-force date
This section explains Swedish legal terms used throughout this tool and in the data. Since the source data comes from Swedish authorities, many field names and concepts remain in Swedish.
English: Swedish Code of Statutes
Description: The official compilation of all Swedish laws and regulations. Each law is identified by a unique reference number.
English: Reference number / Designation
Format: YYYY:NNN (e.g., "2024:1274")
Description: Unique identifier for each law document, where YYYY is the year of publication and NNN is a sequential number.
English: Sequential number / Running number
Description: The numeric portion of the beteckning (the NNN part). Used in file organization and ELI structure paths.
English: Title / Heading
Description: The official title of a law document (e.g., "Förordning (2024:1274) om statsbidrag...").
English: Content / Content text / Legislation text
Description: The main body text of the law document.
English: Entry into force / Coming into effect
Description: The date when a law or provision becomes legally effective.
JSON field: ikraft_datum (format: YYYY-MM-DD)
Selex attribute: selex:ikraft_datum
English: Cease / Expiration / When the law expires
Description: Date when a law ceases to apply or expires.
JSON field: upphor_datum (format: YYYY-MM-DD)
Selex attribute: selex:upphor_datum
English: Repealed / Revoked / Abolished
Description: Status indicating a law or provision has been officially repealed.
Selex attribute: selex:status="upphavd"
English: Time-limited / Temporally limited
Description: Used for laws with explicit expiration dates (as opposed to being repealed).
JSON field: tidsbegransadDateTime
English: Expires / Ceases to apply
Description: Temporal expiration (similar to tidsbegränsad).
JSON field: utgar_datum
English: Amendment laws / Amending legislation
Description: Laws that modify other laws. Stored as a list of amendments.
JSON structure: Each amendment includes: beteckning, rubrik, ikraft_datum, and anteckningar.
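Because each amendment carries its own ikraft_datum, it is straightforward to ask which amendments had entered into force by a given date. The sketch below assumes a document dict shaped like the JSON example later in this section (key andringsforfattningar); the function name is hypothetical.

```python
from datetime import date


def amendments_in_force(document: dict, target: date) -> list[str]:
    """Return the beteckning of each amendment whose ikraft_datum is on or before `target`."""
    in_force = []
    for amendment in document.get("andringsforfattningar") or []:
        ikraft = amendment.get("ikraft_datum")
        if ikraft and date.fromisoformat(ikraft) <= target:
            in_force.append(amendment["beteckning"])
    return in_force


doc = {
    "beteckning": "2024:1274",
    "andringsforfattningar": [
        {"beteckning": "2025:123", "ikraft_datum": "2025-07-01", "anteckningar": "Ändr. 5 §"},
    ],
}
print(amendments_in_force(doc, date(2025, 1, 1)))  # []
print(amendments_in_force(doc, date(2025, 7, 1)))  # ['2025:123']
```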
English: Notes / Remarks
Description: Additional information or comments about an amendment.
English: Transitional provisions / Interim rules
Description: Special rules that apply during the transition period when a law takes effect. They handle cases where implementation requires time or phased application.
English: Repealed by
Description: Reference to the law that repealed this one.
English: Ordinance / Regulation
Description: Type of legal document, usually lower in the hierarchy than "lag" (law).
English: Law / Act
Description: Primary type of legislation, higher in the hierarchy than "förordning".
English: Legislative type / Document type
Description: Classification of the document (e.g., "Förordning", "Lag").
English: Division / Part
Description: Major structural division in a law (e.g., "AVDELNING I").
English: Chapter
Description: Sub-section of an avdelning (e.g., "1 kap.", "2 a kap.").
English: Paragraph / Section
Description: Individual legal provision (e.g., "1 §", "3 a §").
English: Attachment / Appendix
Description: Supplementary material attached to a law.
These XML/HTML attributes mark legal status and dates in md-markers output:

| Attribute | Swedish Term | Description |
|---|---|---|
| selex:status | Status | Legal status: ikraft (in force) or upphavd (repealed) |
| selex:ikraft_datum | Ikraftträdandedatum | Entry-into-force date (YYYY-MM-DD) |
| selex:upphor_datum | Upphörandedatum | Date when the provision ceases to apply (YYYY-MM-DD) |
| selex:ikraft_villkor | Ikraftträdandevillkor | Entry-into-force condition (e.g., "den dag regeringen bestämmer") |
| selex:upphor_villkor | Upphörandevillkor | Expiration condition |
| selex:utfardad_datum | Utfärdandedatum | Date issued/enacted |
These fields appear in JSON metadata and document frontmatter:
| Field Name (Swedish) | English Translation | Description |
|---|---|---|
| departement | Ministry/Department | Government department responsible (e.g., "Socialdepartementet") |
| organisation | Organization/Agency | Entity issuing the regulation |
| publicerad_datum | Date published | When the document was publicly published |
| utfardad_datum | Date issued/enacted | When the document was formally signed |
| forarbeten | Preparatory materials | Legislative preparatory work/parliamentary reports |
| celex / celex_nummer | CELEX number | EU legislative reference number |
| eu_direktiv / eUdirektiv | EU Directive | Boolean flag for EU directive implementation |
These phrases frequently appear in the actual legislation text:
| Swedish Phrase | English Translation | Meaning |
|---|---|---|
| "Träder i kraft I:YYYY-MM-DD" | "Enters into force on [date]" | Entry-into-force marker |
| "Upphör att gälla U:YYYY-MM-DD" | "Ceases to apply on [date]" | Expiration marker |
| "Den dag regeringen bestämmer" | "The day the government decides" | Conditional effective date |
| "Denna lag" | "This law" | Standard opening phrase |
- `sfs-{YYYY}-{NNN}.md` or `sfs-{YYYY}-{NNN}-markers.md`: Standard file naming convention (e.g., "sfs-2024-1274.md")
- `sfs-jsondata`: Default input directory name for JSON data
- `sfs-export-{format}`: Default output directory names (e.g., "sfs-export-md", "sfs-export-html")
Temporal processing
Description: The process of showing how a law appeared at specific dates in history. Removes sections that had not yet taken effect or had already expired at the target date.
Target date
Description: The reference date for temporal processing (set with --target-date). Shows how the law appeared on that specific date.
md-markers format
Description: Markdown with section tags preserved (`<article>`, `<section>`, etc.). Includes all selex attributes for temporal and status information.
md format
Description: Clean Markdown without section tags. Applies temporal processing (removes future/expired content). By default, today's date is used as the target date.
git format
Description: Exports as Git commits with historical dates. Creates a version history showing how the law evolves over time.
ELI structure
Description: European Legislation Identifier directory structure. Format: /eli/sfs/{YEAR}/{lopnummer}/index.html
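A sketch of how a beteckning maps onto the ELI path and the Markdown file names described above. The helper names are hypothetical, and it assumes the löpnummer is used exactly as written in the reference (no zero-padding).

```python
import re

BETECKNING_RE = re.compile(r"(\d{4}):(\d+)")


def split_beteckning(beteckning: str) -> tuple[str, str]:
    """Split "YYYY:NNN" into (year, löpnummer), rejecting anything else."""
    match = BETECKNING_RE.fullmatch(beteckning)
    if match is None:
        raise ValueError(f"not a YYYY:NNN reference: {beteckning!r}")
    return match.group(1), match.group(2)


def eli_path(beteckning: str) -> str:
    year, lopnummer = split_beteckning(beteckning)
    return f"/eli/sfs/{year}/{lopnummer}/index.html"


def markdown_filename(beteckning: str, markers: bool = False) -> str:
    year, lopnummer = split_beteckning(beteckning)
    suffix = "-markers" if markers else ""
    return f"sfs-{year}-{lopnummer}{suffix}.md"


print(eli_path("2024:1274"))           # /eli/sfs/2024/1274/index.html
print(markdown_filename("2024:1274"))  # sfs-2024-1274.md
```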
Here's an example of typical JSON data with inline English explanations:
{
"beteckning": "2024:1274", // Reference number
"rubrik": "Förordning om...", // Official title
"ikraft_datum": "2025-01-01", // Entry-into-force date
"upphor_datum": null, // Expiration date (null = still in force)
"departement": "Socialdepartementet", // Ministry (Health & Social Affairs)
"utfardad_datum": "2024-12-19", // Date issued
"publicerad_datum": "2024-12-20", // Date published
"andringsforfattningar": [ // Amending legislation
{
"beteckning": "2025:123",
"rubrik": "Förordning om ändring...",
"ikraft_datum": "2025-07-01",
"anteckningar": "Ändr. 5 §" // Notes: "Amends § 5"
}
]
}

We welcome contributions from the community! 🙌
- Read CONTRIBUTING.md for guidelines on how to contribute
- See DEVELOPMENT.md for developer documentation and architecture overview
- Contact: Martin Rimskog via email or LinkedIn