diff --git a/README.md b/README.md
index 64ec2c3..9304fee 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,9 @@
-# ISCC Codes Documentation Site
+# ISCC - Documentation Site
-[](https://zenodo.org/badge/latestdoi/96668860)
+[](https://www.iso.org/standard/77899.html)
+[](https://iscc.codes)
+[](https://creativecommons.org/licenses/by/4.0/)
+[](https://deepwiki.com/iscc/iscc-codes)
| WARNING: This repository is the source for the `iscc.codes` documentation site and preserves historical ISCC Version 1.1 material for continuity. The old Python proof-of-concept code has been retired from the repository root and is not the current ISCC implementation. |
| --- |
@@ -17,8 +20,8 @@ This repository now serves three purposes:
New Python integrations should not use the old `iscc` proof-of-concept package.
-- [`iscc-sdk`](https://github.com/iscc/iscc-sdk) — high-level toolkit for generating ISCCs from media files. Install with `pip install iscc-sdk`.
-- [`iscc-core`](https://github.com/iscc/iscc-core) — lower-level implementation of the ISCC core algorithms used by the SDK. Install with `pip install iscc-core` when you need direct algorithm access.
+- [`iscc-sdk`](https://github.com/iscc/iscc-sdk) - high-level toolkit for generating ISCCs from media files. Install with `pip install iscc-sdk`.
+- [`iscc-core`](https://github.com/iscc/iscc-core) - lower-level implementation of the ISCC core algorithms used by the SDK. Install with `pip install iscc-core` when you need direct algorithm access.
For most application developers, start with `iscc-sdk`.
diff --git a/docs/capabilities.md b/docs/capabilities.md
index c74efdb..162cae6 100644
--- a/docs/capabilities.md
+++ b/docs/capabilities.md
@@ -18,8 +18,7 @@ follow from that design.
{ .left }
-An **ISCC-CODE** is a composite of several **ISCC-UNITs**, each produced by a distinct algorithm and
-each capturing a different layer of identity or similarity. A composite ISCC-CODE contains at minimum a Data-Code and an Instance-Code. The units are
+An **ISCC-CODE** is made of several **ISCC-UNITs**, each using a distinct algorithm and capturing a different layer of identity. An ISCC-CODE contains at minimum a Data-Code and an Instance-Code. The units are
self-describing and can also be used in isolation.
diff --git a/docs/concept.md b/docs/concept.md
index 628df5c..abab17e 100644
--- a/docs/concept.md
+++ b/docs/concept.md
@@ -1,181 +1,127 @@
----
-title: ISCC - Concept
-icon: lucide/lightbulb
----
-
-# ISCC - Concept
-
-!!! note "Historical context"
- This page preserves early design rationale for the ISCC. Some proof-of-concept and blockchain-specific statements predate the ISO 24138:2024 standardization work and the current `iscc-core` / `iscc-sdk` implementations.
-
-*The internet is shifting towards a network of decentralized peer-to-peer transactions. If we want our transactions on the emerging blockchain networks to be about content we need standardized ways to address content. Our transactions might be payments, attributions, reputation, certification, licenses or entirely new kinds of value transfer. All this will happen much faster and easier if we, as a community, can agree on how to identify content in a decentralized environment.*
-
-This page started as the higher level concept of an open proposal to the wider content community for a common content identification code. We would like to share our ideas and spark a conversation with journalists, news agencies, content creators, publishers, distributors, libraries, musicians, scientists, developers, lawyers, rights organizations and all the other participants of the content ecosystem.
-
-## Introduction
-
-The **structure and management** of **global identifiers** strongly correlates with the grade of achievable **automation** and the potential for **innovation** within and across different sectors of the media industries.
-
-There are many [existing standards](https://xkcd.com/927/) for media identifiers serving a wide array of use cases. Book publishing uses the [**ISBN**](https://www.isbn-international.org/), magazines and journals have the [**ISSN**](https://www.issn.org/), music industry has [**ISRC**](https://isrc.ifpi.org/) and [**ISWC**](http://www.iswc.org/) and film has [**ISAN**](http://www.isan.org/) and [**EIDR**](https://eidr.org/) – each of them serving a set of specific purposes. On the other side of the spectrum there are also generic identifiers standards such as the [**DOI**](https://www.doi.org/), [**ITU HANDLE**](http://www.itu.int/osg/csd/emerging_trends/handle_system/index.html), [**URN**](https://tools.ietf.org/html/rfc8141), [**ARK**](https://tools.ietf.org/html/draft-kunze-ark-18). The DOI, for example, can be used to identify any digital, physical or abstract *object*. All these identifiers have important and distinct roles across different industries and use cases.
-
-The most substantial differentiator of the **ISCC** is the fact that it is **algorithmically bound to the digital content** it identifies. Other standards require human intervention to assign and track the mapping between identifier and object (binding). Many of those standards focus on how to resolve a code to some network location where metadata or the object itself can be found. The **ISCC inverts this principle**. It gives an answer to the question: "Given some digital content, how can I find its code to reference the content in a transaction?". This means that the **ISCC** for any digital content can be *found* (generated) from the content itself, without the need to involve any third-party.
-
-As such the **ISCC** fulfills a distinct role and is **not a replacement for established identifiers**. Rather it is designed as an umbrella standard to augment established identifiers with enhanced algorithmic features. It can be used in the metadata of existing standards or support discoverability (reverse lookup).
-
-Many of the established systems are based on centralized or hierarchical registries that involve manual and costly management processes. To sustain such systems the costs have to be recouped by fees for code assignment, metadata storage or paid access to metadata which inhibits accessibility and discoverability. The overhead, cost and general properties of these systems make them prohibitive for many innovative use cases that require a more informal and generic code assignment (eg. granular content). Communities with short lived or user generated content, don't have any agreed-upon global identifiers for their content.
-
-The fast paced development of the digital media economy has led to an increasing fragmentation of identifiers and new barriers in interoperability. For example major e-book retailers do not require an **ISBN** and instead established their own proprietary identifiers. Amazon has the **ASIN**, Apple has **Apple-ID** and Google has **GKEY**. For many tasks current systems need to track and match all the different vendor specific IDs, which is an inefficient and error prone process.
-
-Resolving an **ISCC** to a network location, metadata or the content itself can be accomplished with neutral and decentralized blockchain-based registries that don't require a centralized or hierarchical system to manage, track and store unique codes, ownership assignments, associated metadata and other information.
-
-Advances in data structures, algorithms, machine learning and the emergence of crypto economics allows us to invent **new** kinds of **media identifiers** and **re-imagine existing identifiers** with innovative use cases in mind. Blockchains and Smart Contracts offer great opportunities in solving many of the challenges of identifier registration, like centralized management, data duplication and disambiguation, vendor lock-in and long term data retention.
-
-This is an open proposal to the digital media community and explores the possibilities of a **decentralized **content identification system. We’d like to establish an open standard for persistent, unique, vendor independent and content-derived cross-media codes that can be stored and managed on global, public and decentralized blockchains. We envision a self-governing ecosystem with a low barrier of entry where **commercial and non-commercial** initiatives can both innovate and thrive next to each other.
-
-## Media Identification Codes for Blockchains
-
-Media cataloging systems tend to get out of hand and become complex and often unmanageable. Our design proposal is focused on keeping the ISCC system as simple and more importantly as **automatable** as possible, while maximizing practical value for the most important use cases — meaning you should get out more than you have to put in. With this in mind we come to the following basic design decisions:
-
-### A “Meaningful” Code
-
-In traditional database systems it is recommended practice to work with **surrogate keys** as identifiers. A surrogate key is a dumb number and has no business meaning and is completely decoupled from the data it identifies. Uniqueness of such identifiers is guaranteed either via centralized incremental assignment by the database system or via random UUIDs which have a very low probability of collisions. While random UUIDs could be generated in a decentralized way, both approaches require some external authority that establishes or certifies the linkage between the identifier and the associated metadata and content. This is why we decided to go with a “meaningful” **content and metadata derived code (CMDC)**. Anyone will be able to verify that a specific code indeed belongs to a given digital content. Even better, anyone can “find” the code for a given content without the need to consult external data sources. This approach also captures essential information about the media in the code itself, which is very useful in scenarios of machine learning and data analytics.
-
-### A Decentralized Code
-
-The **ISCC** is designed to be registry agnostic. This means that content identification codes can be self-issued in a decentralized and parallel fashion without the need for governance by a centralized registration agency. Without registration an **ISCC** is owned by the content and not by a person or organization. An *unregistered* **ISCC** is useful in cases where multiple independent parties exchange information about content. The **CMDC** approach is helpful with common issues like data integrity, validation, de-duplication and disambiguation. Systems that process digital content can integrate ISCC support and benefit immediately. The integrator does not depend on all third-parties having to assign, track and deliver ISCC codes, because those can be generated from the content itself.
-
-ISCC registration becomes **necessary** when an ISCC code needs to be **globally unique, publicly discoverable, resolvable, owned** or **authenticated**. While these features inevitably require some kind of registry, not all of them require a centralized institutional registry.
-
-In a centralized system the central authority is in control of the issuance of codes and safeguards various requirements like code uniqueness or ownership. In a decentralized system where everybody can register a code we need a different approach.
-
-The **ISCC** will specify the necessary protocols to implement the aforementioned features in a decentralized, federated environment and across multiple public blockchains. **Given a registered ISCC code, an application can unambiguously determine on what blockchain (if any), by which account, and at what time an ISCC has been registered. **
-
-Registered ISCC codes have to indicate an authoritative public blockchain network. This indicator is part of the ISCC Code itself, such that codes registered on different networks cannot collide. This guarantees uniqueness of ISCC codes across multiple blockchains.
-
-**Ownership** of ISCC codes (not the identified content) is granted to the signatory of the first transaction for a given ISCC code on the corresponding blockchain.
-
-**Global uniqueness** of ISCC codes is accomplished by the blockchain indicator in combination with a client side counter. Registration clients first check for a prior registration of a given ISCC code on a given blockchain. If the ISCC code is already registered by another account the client may simply increments a suffix of the code before registration.
-
-Applications are instructed to ignore duplicate registrations of identical codes that occur on a blockchain after an initial registration.
-
-This approach retains global clustering and de-duplication features while at the same time offering **owned**, **authenticated** and **globally unique** ISCC codes. The model also allows for verifiable transfers of ISCC ownership. Given an appropriate protocol it is even possible to switch the authoritative blockchain for an ISCC after initial registration without changing the ISCC code itself.
-
-### Registration Services
-
-Registration services offer a plethora of valuable and indispensable benefits. Every industry has its special requirements. Ultimately the stakeholders from those industries will have to set the rules for data curation, metadata management and administrative control. A Blockchain is a low level backend infrastructure. And while blockchains might make access to codes and metadata more accessible, there is still cost involved with storing data, running the infrastructure and providing middleware and frontends. Blockchains work as incentive based economic systems. Registrars can offer **commercially viable** value added services on top of the lower level blockchain networks. For example:
-
-- Identity verification of registrants
-- Certification/attestation of registry entries
-- Data curation and indexing services
-- Blockchain key-management services
-- Custodial blockchain account management
-- Middleware and front-end applications
-- Infrastructure operations
-- Participation in blockchain network governance
-
-### Storage Considerations
-
-On a typical public blockchain all data is **fully replicated** among participants. This allows for independent and autonomous validation of transactions. All blockchain data is highly available, immutable, tamper-proof, timestamped and in most cases openly accessible. However, under high load the limited transaction capacity (storage space per unit of time) creates a transaction fee market for on-chain data. This leads to **growing transaction costs** and makes storage a scarce and increasingly precious resource on public decentralized blockchains. For example storing a 46 character code on the Ethereum blockchain in July 2019 cost ~ $0.50. So it is mandatory for our code and its eventual metadata schema to be very **space efficient **to maximize benefit at minimal cost. The basic metadata that will be required to generate and register codes must be:
-
-- minimal in scope
-- clearly specified
-- robust against human error
-- enforced on technical level
-- adequate for public use (no legal or privacy issues)
-
-## Layers of Digital Media Identification
-
-While we examined existing identifiers we discovered that there is often much confusion about the extent or coverage of what exactly is being identified by a given system. With our idea for a generic cross-media code we want to put special weight on being precise with our definitions and found it helpful to distinguish between “different layers of digital media identification". We found that these layers exist naturally on a scale from abstract to concrete. Our analysis also showed that existing standard identifiers operate on one or at most two of such layers. The ISCC is designed as a **composite content code** that takes the different layers of media identification into consideration:
-
-### Layer 1 – Abstract Creation
-
-In the first and most abstract layer we are concerned with distinguishing between different works or creations in the **broadest possible sense**. The scope of identification is completely independent of any manifestations of the work, be it physical or digital in nature. It is also agnostic to creators, rights holders or any specific interpretations, expressions or language versions of a work. It only relates to the intangible creation - the idea itself.
-
-### Layer 2 – Semantic Field
-
-This layer relates to the meaning or essence of a work. It is an amorphous collection or combination of facts, concepts, categories, subjects, topics, themes, assumptions, observations, conclusions, beliefs and other intangible things that the content conveys. The scope of identification is a set of coordinates within a finite and multidimensional semantic space.
-
-### Layer 3 – Generic Manifestation
-
-In this layer we are concerned with the literal structure of a media type specific and normalized manifestation. Namely the basic text, image, audio or video content independent of its semantic meaning or media file encoding and with a tolerance to variation. This "tolerance to variation" bundles a set of different versions with corrections, revisions, edits, updates, personalization, different format encodings or data compression of the same content under one grouping code. A generic manifestation is independent of a final digital media product and is specific to an expression, version or interpretation of a work.
-
-Unfortunately it is not obvious where generic manifestation of a work ends and another one starts. It depends on human interpretation and context. How much editing do we allow before we call it a “different” manifestation and give it a different code. A practical but only partial solution to this problem is to create an algorithmically defined and testable spectrum of tolerance to variation per media type. This can provide a stable and repeatable process to distinguish between generic content manifestations. But it is important to understand that such a process is not expected to yield results that are always intuitive to human expectations as to where exactly boundaries should be.
-
-### Layer 4 – Media Specific Manifestation
-
-This layer relates to a **manifestation with a specific encoding**. It identifies a **data-file** encoded and offered in a specific **media format **including a tolerance to variation to account for minor edits and updates within a format without creating a new code. For example, one could distinguish between the PDF, DOCX or WEBSITE versions of the same content as generated from a single source publishing system. This layer does only distinguish between products or "artifacts" with a given packaging or encoding.
-
-### Layer 5 – Exact Representation
-
-In this layer we identify a data-file by its exact binary representation without any interpretation of meaning and without any ambiguity. Even a minimal change in data that might not change the interpretation of content would create a different code. Like the first four layers, this layer does **not **express any information related to **content location** or **ownership**.
-
-### Layer 6 – Individual Copy
-
-In the physical world we would call a specific book (one that you can take out of your shelve) an **individual copy**. This implies a notion of **locality **and **ownership**. In the digital world the semantics of an individual copy are very different. An individual copy might be distinguished by a license you own or by a personalized watermark applied by the retailer at time of sale or some digital annotations you have added to your digital media file. While there can only ever be **one exact** individual copy of a **physical object**, there always can be **endless replicas** of an "individual copy" of a **digital object**. It is very important to keep this difference in mind. Ignoring this fact has caused countless misunderstandings and is the source of confusion throughout the media industry – especially in the realm of copyright and license discussions.
-
-We could try to define an **individual digital copy** by its location and exact content on a specific physical storage medium (like a DVD, SSD ...). But this does not account for the fact that it is nearly impossible to stop someone from creating an exact replica of that data or at least a snapshot or recording of the presentation of that data on another storage location.
-
-And most importantly such a replica does not affect the original data and even less can make it magically disappear. In contrast, if you give your individual copy of your book to someone else, you won't **"have it"** anymore. It is clear, that with digital media this **cannot reliably be the case**. The only way would be to build a [tamper-proof physical device](https://opendime.com/) (secure element) that does not reveal the data itself, which would defeat the purpose by making the content itself unavailable. But there are ways to partially simulate such inherently physical properties in the digital world. Most notably with the emergence of blockchain technology it is now possible to have a **cryptographically secured** and publicly notarized tamper-proof **certificate of ownership. ** This can serve as a record of agreement about ownership of an “individual copy”. But is does not by itself enforce location or accessibility of the content, nor does it prove the authorization of the certifying party itself or the legal validity of the agreement.
-
-## Design Principles
-
-As a generic content code the **ISCC Standard** is a an initiative with a broad scope. These are the principles that should guide its design and adoption:
-
-- Target existing, unsolved, real-world problems
-- Provide a technological and automatable solution
-- Be generic and useful to a broad audience
-- Keep the standard pragmatic and simple to implement
-- Keep it extendable and forward compatible
-- Provide marketable user-facing sample applications
-- Provide machine readable test data for implementers
-- Provide developer tools in different programming languages
-- Promote implementations in different sectors
-- The specification should be open and public
-- Engage with other standards and interested parties
-
-## Algorithmic Tools
-
-While many details about the ISCC are still up for discussion we are quite confident about some of the general algorithmic families that will make it into the final specification for the code. These will play an important role in how we generate the different components of the code:
-
-- Similarity preserving hash functions (Simhash, Minhash ...)
-- Perceptual hashing (pHash, Blockhash, Chromaprint …)
-- Content defined chunking (Rabin-Karp, FastCDC ...)
-- Merkle trees
-
-## ISCC Proof-of-Concept
-
-Before we settle on the details of the proposed ISCC code, we built a simple and reduced proof-of-concept implementation of our ideas. It enables us and other developers to test with real world data and systems and find out early what works and what doesn't.
-
-
-
-!!! Update
-
- An interactive demo of the concept is available at https://isccdemo.content-blockchain.org/
-
-The minimal viable, first iteration ISCC will be a byte structure built from the following components:
-
-### Meta-Code
-
-The Meta-Code will be generated as a similarity preserving hash from minimal generic metadata like *title *and *creators*. It operates on **Layer 1 ** and identifies an intangible creation. It is the first and most generic grouping element of the code. We will be experimenting with different n-gram sizes and bit-length to find the practical limits of precision and recall for generic metadata. We will also specify a process to disambiguate unintended collisions by adding optional metadata.
-
-### Partial Content Flag
-
-The Partial Content Flag is a 1-bit flag that indicates whether the remaining elements relate to the complete work or only to a subset of it.
-
-### Media Type Flag
-
-The Media Type Flag is a 3 bit flag that allows us to distinguish between up to 8 generic media types **(GMTs)** to which our Content-Code component applies. We define a generic media type as *basic content types* such as plain text or raw pixel data that is specified exactly and extracted from more complex file formats or encodings. We start with generic text and image types and add audio, video and mixed types later.
-
-### Content-Code
-
-The Content-Code operates on **Layer 3** and will be a GMT-specific similarity preserving hash generated from extracted content. It identifies the normalized content of a specific GMT, independent of file format or encoding. It relates to the structural essence of the content and groups similar GMT-specific manifestations of the abstract creation or parts of it (as indicated by the Partial Content Flag). For practical reasons we intentionally skip a **Layer 2** component at this time. It would add unnecessary complexity for a basic proof-of-concept implementation.
-
-### Data-Code
-
-The Data-Code operates on **Layer 4 **and will be a similarity preserving hash generated from shift-resistant content-defined chunks from the raw data of the encoded media blob. It groups complete encoded files with similar content and encoding. This component does not distinguish between GMTs as the files may include multiple different generic media types.
-
-### Instance-Code
-
-The Instance-Code operates on **Layer 5 **and will be the top hash of a Merkle tree generated from (potentially content-defined) chunks of raw data of an encoded media blob. It identifies a concrete manifestation and proves the integrity of the full content. We use the Merkle tree structure because it also allows as to verify integrity of partial chunks without having to have the full data available. This will be very useful in any scenarios of distributed data storage.
-
-We intentionally skip **Layer 6** at this stage as content ownership and location will be handled on the blockchain layer of the stack and not by the ISCC code itself.
+---
+title: ISCC - Concept
+description: The idea behind the ISCC - identifying digital content by what it is, and describing sameness in layers.
+authors: Titusz Pan
+icon: lucide/lightbulb
+---
+
+# ISCC - Concept
+
+This page explains the idea behind the **ISCC** - how it thinks about digital
+content and about what it means for two files to be "the same." The
+[Capabilities](capabilities.md) page describes what the ISCC can do; this page
+explains *why it is shaped the way it is*.
+
+## "The same" is not one thing
+
+Digital content never stops moving. As a file travels between systems it is
+re-encoded, resized, recompressed, and copied. Each step rewrites the underlying
+bytes, yet to a person the content is unchanged: a photo exported at three
+resolutions, a manuscript saved as PDF, EPUB, and Word, or a song offered in
+several audio formats are all "the same thing."
+
+The catch is that *sameness* has more than one meaning:
+
+- A resized photo is the **same picture**, but not the **same file**.
+- A translated article carries the **same meaning**, but not the **same words**.
+- A re-saved document may be the **same bytes**, or differ by a single character.
+
+So "are these the same?" has several valid answers at once, and a single label
+attached to a work cannot capture them all. The ISCC is built around this
+observation: it describes content at several levels and keeps them separate, so a
+system can tell not just *whether* two assets are related, but *how*.
+
+!!! abstract "In plain terms"
+    The ISCC is a digital fingerprint calculated from a file's own content.
+    Identical files share the same code, and similar files get similar codes.
+    Anyone can compute it with open software and get the same result, with no
+    central registry involved.
+
+## A code read from the content
+
+Most identifiers are *assigned*: an authority issues an ISBN or DOI and attaches
+it to a work. The ISCC inverts this. It is **derived from the content itself** by
+running the open algorithms defined in
+[ISO 24138:2024](https://www.iso.org/standard/77899.html). The code is a function
+of the data, so unrelated parties can independently compute the *same* ISCC for
+the *same* content, and the code references that content without implying anything
+about ownership.
+
+Reading the code from the content is also what makes it *similarity-preserving*:
+when the content changes a little, the code changes a little. A re-compressed
+image or a transcoded audio file produces a code that stays recognizably close to
+the original, even though the raw bytes differ. That is the bridge across the gap
+that re-encoding and resizing create.
+
+## Identification in layers
+
+The ISCC describes a piece of content on a spectrum, from the **abstract** idea
+at the top down to the **concrete** bytes at the bottom. Each level is captured by
+its own **ISCC-UNIT**, and the units combine into a single composite
+**ISCC-CODE**.
+
+| Layer | What it identifies | ISCC-UNIT |
+| ------------ | --------------------------------------------------- | -------------------------- |
+| **Creation** | the work as an idea, via its title and metadata | Meta-Code |
+| **Meaning** | the concepts it conveys, across wording and language | Semantic-Code *(reserved)* |
+| **Content** | what you read, see, or hear - independent of format | Content-Code |
+| **Data** | the encoded file as a stream of bytes | Data-Code |
+| **Instance** | this one exact file, down to the last bit | Instance-Code |
+
+The upper layers describe the *content* - what it is and what it means; the lower
+layers describe the *data* - how it happens to be stored. The **Content-Code**
+applies a dedicated algorithm per media type (Text, Image, Audio, Video, and
+Mixed). A complete ISCC-CODE always includes the Data-Code and Instance-Code,
+with the other units added when they are available.
+
+!!! note "Semantic-Code"
+    ISO 24138:2024 reserves the **Semantic-Code** layer (sameness of *meaning*)
+    but does not yet define its algorithm. Experimental implementations exist for
+    text ([iscc-sct](https://github.com/iscc/iscc-sct)) and images
+    ([iscc-sci](https://github.com/iscc/iscc-sci)).
+
+## Kinds of sameness
+
+The layers correspond to a few independent kinds of similarity. Two files can be
+alike on one and differ on another, and the ISCC keeps each kind separate:
+
+- **Data similarity** - nearly the same bytes. *"Almost the same file."*
+- **Content similarity** - the same once decoded and perceived, regardless of
+ format, compression, or minor edits. *"Looks, reads, or sounds the same."*
+- **Semantic similarity** - the same meaning, including paraphrase and across
+ languages. *"Means the same thing."* (reserved; not yet standardized)
+
+A separate axis, **metadata similarity**, compares how content is *described* -
+its title and description - rather than the content itself.
+
+Set apart from all of these is **data identity**: the Instance-Code is not a
+similarity measure but an exact, bit-for-bit checksum. It does not ask "how
+similar?" - it answers "is this the very same file?" with yes or no.
+
+## Whole works and their parts
+
+Sameness can be judged for an entire work or for parts of it. Two articles may
+share a single quoted paragraph; two recordings may share one sampled passage.
+A *global* comparison asks whether two works match overall, while a *granular*
+comparison finds matching segments within them. The ISCC supports both, so
+partial overlap, quotation, and reuse can be detected - not only whole-file
+duplicates.
+
+## Design principles
+
+The ISCC is deliberately kept simple and broadly useful. A few principles guide
+its design:
+
+- Target real, unsolved content-identification problems.
+- Derive codes algorithmically, with no central authority required.
+- Stay generic across media types, sectors, and use cases.
+- Keep the standard pragmatic and simple to implement.
+- Remain extendable and forward-compatible.
+- Build on open specifications and open-source software.
+- Complement existing identifiers rather than replace them.
+
+## Where to go next
+
+- [Capabilities](capabilities.md) - what the ISCC can do in practice.
+- [Specification](specification.md) - ISO 24138:2024 and the community IEPs.
+- [Resources](resources.md) - software, demos, and tools for generating ISCCs.
diff --git a/docs/history.md b/docs/history.md
index 8dcdc5f..d23f304 100644
--- a/docs/history.md
+++ b/docs/history.md
@@ -173,12 +173,12 @@ The ISCC is the work of a community, but a few roles are central to its history:
implementation, **Principal Editor of ISO 24138:2024**, and **Chairman of the
[ISCC Foundation](https://iscc.io)**
- **Kira Lemke** - director of the ISCC Foundation and convenor of ISO/TC 46/SC
- 9/WG 18 since May 2025
+ 9/WG 18 since 05/2025
- **Martin Etzrodt** - director of the ISCC Foundation (since 2025) and a driving force
behind the BioCodes project, which brings the ISCC to scientific and bioimaging data
- **Sebastian Posth** - evangelist and early adopter of the ISCC, convenor of
- ISO/TC 46/SC 9/WG 18 until May 2025
-- **Sabine Rüsch** - first convenor of ISO/TC 46/SC 9/WG 18 (2019)
+ ISO/TC 46/SC 9/WG 18 (05/2022-05/2025)
+- **Sabine Rüsch** - first convenor of ISO/TC 46/SC 9/WG 18 (05/2019-05/2022)
- **Gregor Roschkowski** - project manager at DIN who managed the ISCC standardization
process from the German side, coordinating the national mirror committee for ISO/TC 46/SC
9/WG 18
diff --git a/docs/images/favicon.png b/docs/images/favicon.png
index b7827f9..4cb75d3 100644
Binary files a/docs/images/favicon.png and b/docs/images/favicon.png differ
diff --git a/docs/index.md b/docs/index.md
index be37065..9fea978 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,15 +1,15 @@
---
title: iscc-codes
-description: Open, decentralized content identification — derived from the content itself.
+description: Open, decentralized content identification - derived from the content itself.
authors: Titusz Pan
icon: lucide/house
hide:
- toc
---
-# ISCC — International Standard Content Code
+# ISCC - International Standard Content Code
-## Open, decentralized content identification — derived from the content itself
+## Open, decentralized content identification - derived from the content itself
**ISCC** ([ISO 24138:2024](https://www.iso.org/standard/77899.html)) is an open standard for content
identification that works directly from the digital file. Generate a compact, similarity-preserving
@@ -74,7 +74,7 @@ An **ISCC-CODE** is a composite, hierarchically structured fingerprint. It combi
content-derived **ISCC-UNITs** covering embedded metadata, normalized content, and the raw bytes.
Each unit is a compact, similarity-preserving hash.
-[](images/iscc-algo-design.svg)
+[](images/iscc-algo-design3.svg)
[See the full specification →](specification.md)
@@ -85,12 +85,12 @@ Each unit is a compact, similarity-preserving hash.
## Developer entry points
-- [**iscc-core**](https://github.com/iscc/iscc-core) — Python reference implementation of the ISO 24138 core algorithms
-- [**iscc-sdk**](https://github.com/iscc/iscc-sdk) — high-level Python toolkit for generating ISCCs from media files
+- [**iscc-core**](https://github.com/iscc/iscc-core) - Python reference implementation of the ISO 24138 core algorithms
+- [**iscc-sdk**](https://github.com/iscc/iscc-sdk) - high-level Python toolkit for generating ISCCs from media files
See the [Resources](resources.md) page for the wider ISCC ecosystem of tools and services.
-!!! note "ISCC Standard — ISO 24138:2024"
+!!! note "ISCC Standard - ISO 24138:2024"
The ISCC is published as [ISO 24138:2024](https://www.iso.org/standard/77899.html) by
ISO/TC 46/SC 9. Current implementation guidance lives at
[core.iscc.codes](https://core.iscc.codes) and [sdk.iscc.codes](https://sdk.iscc.codes).
diff --git a/docs/javascripts/copilot.js b/docs/javascripts/copilot.js
new file mode 100644
index 0000000..26c52b1
--- /dev/null
+++ b/docs/javascripts/copilot.js
@@ -0,0 +1,26 @@
+/**
+ * Mount the ISCC-AI copilot chat widget with anonymous authentication.
+ *
+ * Fetches a JWT token from the copilot-token endpoint, then mounts the
+ * Chainlit copilot widget with the token for cross-origin authentication.
+ */
+window.addEventListener("load", async function () {
+ if (typeof window.mountChainlitWidget !== "function") return;
+
+ var server = "https://iscc.ai";
+ var tokenUrl = server + "/api/copilot-token";
+
+ try {
+ var response = await fetch(tokenUrl);
+ if (!response.ok) throw new Error("HTTP " + response.status);
+ var data = await response.json();
+ window.mountChainlitWidget({
+ chainlitServer: server,
+ theme: "light",
+ accessToken: data.accessToken,
+ customCssUrl: window.location.origin + "/stylesheets/copilot.css?v=1",
+ });
+ } catch (e) {
+ console.warn("ISCC-AI copilot: failed to fetch token or mount widget", e);
+ }
+});
diff --git a/docs/license.md b/docs/license.md
index 84a7c75..76d5001 100644
--- a/docs/license.md
+++ b/docs/license.md
@@ -7,7 +7,7 @@ icon: lucide/scale
-**Documentation source ISCC-SUM**: `ISCC:K4AEK2KH4GCOSSYMKR3ORSLHNCRTLTZR6NDLGJBX6N2UZRRCGHYOOHA`
+**Documentation source ISCC-SUM**: `ISCC:K4AEK2KH4GCOSSYMKR3ORSLHNCRTCBBWM7JO7NF3NLKD2I66D27A2MA`
This wide ISCC-SUM identifies the documentation source tree generated with
`iscc-sum --tree docs`. The license page itself is excluded from the tree via
diff --git a/docs/resources.md b/docs/resources.md
index d8d4942..cce9066 100644
--- a/docs/resources.md
+++ b/docs/resources.md
@@ -1,85 +1,212 @@
---
title: ISCC - Resources
-description: ISCC software, demos, tools, developer libs, integrations, presentations, articles and other resources
+description: The open-source software, live demos, publications, and organizations that make up the ISCC ecosystem.
authors: Titusz Pan
icon: lucide/library
---
# ISCC - Resources
-If you find something that is missing from this collection of resources for the ISCC, [please add it](https://github.com/iscc/iscc-codes/edit/main/docs/resources.md).
-
-## ISCC - Official Software & Tools
-
-### [iscc-core](https://github.com/iscc/iscc-core)
-
-Current Python reference implementation of the ISCC core algorithms defined by [ISO 24138:2024](https://www.iso.org/standard/77899.html). Install from PyPI with `pip install iscc-core`. Documentation: