diff --git a/README.md b/README.md index 64ec2c3..9304fee 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,9 @@ -# ISCC Codes Documentation Site +# ISCC - Documentation Site -[![DOI](https://zenodo.org/badge/96668860.svg)](https://zenodo.org/badge/latestdoi/96668860) +[![ISO 24138:2024](https://img.shields.io/badge/ISO-24138%3A2024-EC1C24)](https://www.iso.org/standard/77899.html) +[![Documentation](https://img.shields.io/badge/docs-iscc.codes-1a73e8)](https://iscc.codes) +[![License: CC BY 4.0](https://img.shields.io/badge/license-CC%20BY%204.0-lightgrey)](https://creativecommons.org/licenses/by/4.0/) +[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/iscc/iscc-codes) | WARNING: This repository is the source for the `iscc.codes` documentation site and preserves historical ISCC Version 1.1 material for continuity. The old Python proof-of-concept code has been retired from the repository root and is not the current ISCC implementation. | | --- | @@ -17,8 +20,8 @@ This repository now serves three purposes: New Python integrations should not use the old `iscc` proof-of-concept package. -- [`iscc-sdk`](https://github.com/iscc/iscc-sdk) — high-level toolkit for generating ISCCs from media files. Install with `pip install iscc-sdk`. -- [`iscc-core`](https://github.com/iscc/iscc-core) — lower-level implementation of the ISCC core algorithms used by the SDK. Install with `pip install iscc-core` when you need direct algorithm access. +- [`iscc-sdk`](https://github.com/iscc/iscc-sdk) - high-level toolkit for generating ISCCs from media files. Install with `pip install iscc-sdk`. +- [`iscc-core`](https://github.com/iscc/iscc-core) - lower-level implementation of the ISCC core algorithms used by the SDK. Install with `pip install iscc-core` when you need direct algorithm access. For most application developers, start with `iscc-sdk`. 
diff --git a/docs/capabilities.md b/docs/capabilities.md index c74efdb..162cae6 100644 --- a/docs/capabilities.md +++ b/docs/capabilities.md @@ -18,8 +18,7 @@ follow from that design. ![ISCC algorithmic design](images/iscc-algo-design3.svg){ .left } -An **ISCC-CODE** is a composite of several **ISCC-UNITs**, each produced by a distinct algorithm and -each capturing a different layer of identity or similarity. A composite ISCC-CODE contains at minimum a Data-Code and an Instance-Code. The units are +An **ISCC-CODE** is made of several **ISCC-UNITs**, each using a distinct algorithm and capturing a different layer of identity. An ISCC-CODE contains at minimum a Data-Code and an Instance-Code. The units are self-describing and can also be used in isolation. diff --git a/docs/concept.md b/docs/concept.md index 628df5c..abab17e 100644 --- a/docs/concept.md +++ b/docs/concept.md @@ -1,181 +1,127 @@ ---- -title: ISCC - Concept -icon: lucide/lightbulb ---- - -# ISCC - Concept - -!!! note "Historical context" - This page preserves early design rationale for the ISCC. Some proof-of-concept and blockchain-specific statements predate the ISO 24138:2024 standardization work and the current `iscc-core` / `iscc-sdk` implementations. - -*The internet is shifting towards a network of decentralized peer-to-peer transactions. If we want our transactions on the emerging blockchain networks to be about content we need standardized ways to address content. Our transactions might be payments, attributions, reputation, certification, licenses or entirely new kinds of value transfer. All this will happen much faster and easier if we, as a community, can agree on how to identify content in a decentralized environment.* - -This page started as the higher level concept of an open proposal to the wider content community for a common content identification code. 
We would like to share our ideas and spark a conversation with journalists, news agencies, content creators, publishers, distributors, libraries, musicians, scientists, developers, lawyers, rights organizations and all the other participants of the content ecosystem. - -## Introduction - -The **structure and management** of **global identifiers** strongly correlates with the grade of achievable **automation** and the potential for **innovation** within and across different sectors of the media industries. - -There are many [existing standards](https://xkcd.com/927/) for media identifiers serving a wide array of use cases. Book publishing uses the [**ISBN**](https://www.isbn-international.org/), magazines and journals have the [**ISSN**](https://www.issn.org/), music industry has [**ISRC**](https://isrc.ifpi.org/) and [**ISWC**](http://www.iswc.org/) and film has [**ISAN**](http://www.isan.org/) and [**EIDR**](https://eidr.org/) – each of them serving a set of specific purposes. On the other side of the spectrum there are also generic identifiers standards such as the [**DOI**](https://www.doi.org/), [**ITU HANDLE**](http://www.itu.int/osg/csd/emerging_trends/handle_system/index.html), [**URN**](https://tools.ietf.org/html/rfc8141), [**ARK**](https://tools.ietf.org/html/draft-kunze-ark-18). The DOI, for example, can be used to identify any digital, physical or abstract *object*. All these identifiers have important and distinct roles across different industries and use cases. - -The most substantial differentiator of the **ISCC** is the fact that it is **algorithmically bound to the digital content** it identifies. Other standards require human intervention to assign and track the mapping between identifier and object (binding). Many of those standards focus on how to resolve a code to some network location where metadata or the object itself can be found. The **ISCC inverts this principle**. 
It gives an answer to the question: "Given some digital content, how can I find its code to reference the content in a transaction?". This means that the **ISCC** for any digital content can be *found* (generated) from the content itself, without the need to involve any third-party. - -As such the **ISCC** fulfills a distinct role and is **not a replacement for established identifiers**. Rather it is designed as an umbrella standard to augment established identifiers with enhanced algorithmic features. It can be used in the metadata of existing standards or support discoverability (reverse lookup). - -Many of the established systems are based on centralized or hierarchical registries that involve manual and costly management processes. To sustain such systems the costs have to be recouped by fees for code assignment, metadata storage or paid access to metadata which inhibits accessibility and discoverability. The overhead, cost and general properties of these systems make them prohibitive for many innovative use cases that require a more informal and generic code assignment (eg. granular content). Communities with short lived or user generated content, don't have any agreed-upon global identifiers for their content. - -The fast paced development of the digital media economy has led to an increasing fragmentation of identifiers and new barriers in interoperability. For example major e-book retailers do not require an **ISBN** and instead established their own proprietary identifiers. Amazon has the **ASIN**, Apple has **Apple-ID** and Google has **GKEY**. For many tasks current systems need to track and match all the different vendor specific IDs, which is an inefficient and error prone process. 
- -Resolving an **ISCC** to a network location, metadata or the content itself can be accomplished with neutral and decentralized blockchain-based registries that don't require a centralized or hierarchical system to manage, track and store unique codes, ownership assignments, associated metadata and other information. - -Advances in data structures, algorithms, machine learning and the emergence of crypto economics allows us to invent **new** kinds of **media identifiers** and **re-imagine existing identifiers** with innovative use cases in mind. Blockchains and Smart Contracts offer great opportunities in solving many of the challenges of identifier registration, like centralized management, data duplication and disambiguation, vendor lock-in and long term data retention. - -This is an open proposal to the digital media community and explores the possibilities of a **decentralized **content identification system. We’d like to establish an open standard for persistent, unique, vendor independent and content-derived cross-media codes that can be stored and managed on global, public and decentralized blockchains. We envision a self-governing ecosystem with a low barrier of entry where **commercial and non-commercial** initiatives can both innovate and thrive next to each other. - -## Media Identification Codes for Blockchains - -Media cataloging systems tend to get out of hand and become complex and often unmanageable. Our design proposal is focused on keeping the ISCC system as simple and more importantly as **automatable** as possible, while maximizing practical value for the most important use cases — meaning you should get out more than you have to put in. With this in mind we come to the following basic design decisions: - -### A “Meaningful” Code - -In traditional database systems it is recommended practice to work with **surrogate keys** as identifiers. 
A surrogate key is a dumb number and has no business meaning and is completely decoupled from the data it identifies. Uniqueness of such identifiers is guaranteed either via centralized incremental assignment by the database system or via random UUIDs which have a very low probability of collisions. While random UUIDs could be generated in a decentralized way, both approaches require some external authority that establishes or certifies the linkage between the identifier and the associated metadata and content. This is why we decided to go with a “meaningful” **content and metadata derived code (CMDC)**. Anyone will be able to verify that a specific code indeed belongs to a given digital content. Even better, anyone can “find” the code for a given content without the need to consult external data sources. This approach also captures essential information about the media in the code itself, which is very useful in scenarios of machine learning and data analytics. - -### A Decentralized Code - -The **ISCC** is designed to be registry agnostic. This means that content identification codes can be self-issued in a decentralized and parallel fashion without the need for governance by a centralized registration agency. Without registration an **ISCC** is owned by the content and not by a person or organization. An *unregistered* **ISCC** is useful in cases where multiple independent parties exchange information about content. The **CMDC** approach is helpful with common issues like data integrity, validation, de-duplication and disambiguation. Systems that process digital content can integrate ISCC support and benefit immediately. The integrator does not depend on all third-parties having to assign, track and deliver ISCC codes, because those can be generated from the content itself. - -ISCC registration becomes **necessary** when an ISCC code needs to be **globally unique, publicly discoverable, resolvable, owned** or **authenticated**. 
While these features inevitably require some kind of registry, not all of them require a centralized institutional registry. - -In a centralized system the central authority is in control of the issuance of codes and safeguards various requirements like code uniqueness or ownership. In a decentralized system where everybody can register a code we need a different approach. - -The **ISCC** will specify the necessary protocols to implement the aforementioned features in a decentralized, federated environment and across multiple public blockchains. **Given a registered ISCC code, an application can unambiguously determine on what blockchain (if any), by which account, and at what time an ISCC has been registered. ** - -Registered ISCC codes have to indicate an authoritative public blockchain network. This indicator is part of the ISCC Code itself, such that codes registered on different networks cannot collide. This guarantees uniqueness of ISCC codes across multiple blockchains. - -**Ownership** of ISCC codes (not the identified content) is granted to the signatory of the first transaction for a given ISCC code on the corresponding blockchain. - -**Global uniqueness** of ISCC codes is accomplished by the blockchain indicator in combination with a client side counter. Registration clients first check for a prior registration of a given ISCC code on a given blockchain. If the ISCC code is already registered by another account the client may simply increments a suffix of the code before registration. - -Applications are instructed to ignore duplicate registrations of identical codes that occur on a blockchain after an initial registration. - -This approach retains global clustering and de-duplication features while at the same time offering **owned**, **authenticated** and **globally unique** ISCC codes. The model also allows for verifiable transfers of ISCC ownership. 
Given an appropriate protocol it is even possible to switch the authoritative blockchain for an ISCC after initial registration without changing the ISCC code itself. - -### Registration Services - -Registration services offer a plethora of valuable and indispensable benefits. Every industry has its special requirements. Ultimately the stakeholders from those industries will have to set the rules for data curation, metadata management and administrative control. A Blockchain is a low level backend infrastructure. And while blockchains might make access to codes and metadata more accessible, there is still cost involved with storing data, running the infrastructure and providing middleware and frontends. Blockchains work as incentive based economic systems. Registrars can offer **commercially viable** value added services on top of the lower level blockchain networks. For example: - -- Identity verification of registrants -- Certification/attestation of registry entries -- Data curation and indexing services -- Blockchain key-management services -- Custodial blockchain account management -- Middleware and front-end applications -- Infrastructure operations -- Participation in blockchain network governance - -### Storage Considerations - -On a typical public blockchain all data is **fully replicated** among participants. This allows for independent and autonomous validation of transactions. All blockchain data is highly available, immutable, tamper-proof, timestamped and in most cases openly accessible. However, under high load the limited transaction capacity (storage space per unit of time) creates a transaction fee market for on-chain data. This leads to **growing transaction costs** and makes storage a scarce and increasingly precious resource on public decentralized blockchains. For example storing a 46 character code on the Ethereum blockchain in July 2019 cost ~ $0.50. 
So it is mandatory for our code and its eventual metadata schema to be very **space efficient **to maximize benefit at minimal cost. The basic metadata that will be required to generate and register codes must be: - -- minimal in scope -- clearly specified -- robust against human error -- enforced on technical level -- adequate for public use (no legal or privacy issues) - -## Layers of Digital Media Identification - -While we examined existing identifiers we discovered that there is often much confusion about the extent or coverage of what exactly is being identified by a given system. With our idea for a generic cross-media code we want to put special weight on being precise with our definitions and found it helpful to distinguish between “different layers of digital media identification". We found that these layers exist naturally on a scale from abstract to concrete. Our analysis also showed that existing standard identifiers operate on one or at most two of such layers. The ISCC is designed as a **composite content code** that takes the different layers of media identification into consideration: - -### Layer 1 – Abstract Creation - -In the first and most abstract layer we are concerned with distinguishing between different works or creations in the **broadest possible sense**. The scope of identification is completely independent of any manifestations of the work, be it physical or digital in nature. It is also agnostic to creators, rights holders or any specific interpretations, expressions or language versions of a work. It only relates to the intangible creation - the idea itself. - -### Layer 2 – Semantic Field - -This layer relates to the meaning or essence of a work. It is an amorphous collection or combination of facts, concepts, categories, subjects, topics, themes, assumptions, observations, conclusions, beliefs and other intangible things that the content conveys. 
The scope of identification is a set of coordinates within a finite and multidimensional semantic space. - -### Layer 3 – Generic Manifestation - -In this layer we are concerned with the literal structure of a media type specific and normalized manifestation. Namely the basic text, image, audio or video content independent of its semantic meaning or media file encoding and with a tolerance to variation. This "tolerance to variation" bundles a set of different versions with corrections, revisions, edits, updates, personalization, different format encodings or data compression of the same content under one grouping code. A generic manifestation is independent of a final digital media product and is specific to an expression, version or interpretation of a work. - -Unfortunately it is not obvious where generic manifestation of a work ends and another one starts. It depends on human interpretation and context. How much editing do we allow before we call it a “different” manifestation and give it a different code. A practical but only partial solution to this problem is to create an algorithmically defined and testable spectrum of tolerance to variation per media type. This can provide a stable and repeatable process to distinguish between generic content manifestations. But it is important to understand that such a process is not expected to yield results that are always intuitive to human expectations as to where exactly boundaries should be. - -### Layer 4 – Media Specific Manifestation - -This layer relates to a **manifestation with a specific encoding**. It identifies a **data-file** encoded and offered in a specific **media format **including a tolerance to variation to account for minor edits and updates within a format without creating a new code. For example, one could distinguish between the PDF, DOCX or WEBSITE versions of the same content as generated from a single source publishing system. 
This layer does only distinguish between products or "artifacts" with a given packaging or encoding. - -### Layer 5 – Exact Representation - -In this layer we identify a data-file by its exact binary representation without any interpretation of meaning and without any ambiguity. Even a minimal change in data that might not change the interpretation of content would create a different code. Like the first four layers, this layer does **not **express any information related to **content location** or **ownership**. - -### Layer 6 – Individual Copy - -In the physical world we would call a specific book (one that you can take out of your shelve) an **individual copy**. This implies a notion of **locality **and **ownership**. In the digital world the semantics of an individual copy are very different. An individual copy might be distinguished by a license you own or by a personalized watermark applied by the retailer at time of sale or some digital annotations you have added to your digital media file. While there can only ever be **one exact** individual copy of a **physical object**, there always can be **endless replicas** of an "individual copy" of a **digital object**. It is very important to keep this difference in mind. Ignoring this fact has caused countless misunderstandings and is the source of confusion throughout the media industry – especially in the realm of copyright and license discussions. - -We could try to define an **individual digital copy** by its location and exact content on a specific physical storage medium (like a DVD, SSD ...). But this does not account for the fact that it is nearly impossible to stop someone from creating an exact replica of that data or at least a snapshot or recording of the presentation of that data on another storage location. - -And most importantly such a replica does not affect the original data and even less can make it magically disappear. 
In contrast, if you give your individual copy of your book to someone else, you won't **"have it"** anymore. It is clear, that with digital media this **cannot reliably be the case**. The only way would be to build a [tamper-proof physical device](https://opendime.com/) (secure element) that does not reveal the data itself, which would defeat the purpose by making the content itself unavailable. But there are ways to partially simulate such inherently physical properties in the digital world. Most notably with the emergence of blockchain technology it is now possible to have a **cryptographically secured** and publicly notarized tamper-proof **certificate of ownership. ** This can serve as a record of agreement about ownership of an “individual copy”. But is does not by itself enforce location or accessibility of the content, nor does it prove the authorization of the certifying party itself or the legal validity of the agreement. - -## Design Principles - -As a generic content code the **ISCC Standard** is a an initiative with a broad scope. These are the principles that should guide its design and adoption: - -- Target existing, unsolved, real-world problems -- Provide a technological and automatable solution -- Be generic and useful to a broad audience -- Keep the standard pragmatic and simple to implement -- Keep it extendable and forward compatible -- Provide marketable user-facing sample applications -- Provide machine readable test data for implementers -- Provide developer tools in different programming languages -- Promote implementations in different sectors -- The specification should be open and public -- Engage with other standards and interested parties - -## Algorithmic Tools - -While many details about the ISCC are still up for discussion we are quite confident about some of the general algorithmic families that will make it into the final specification for the code. 
These will play an important role in how we generate the different components of the code: - -- Similarity preserving hash functions (Simhash, Minhash ...) -- Perceptual hashing (pHash, Blockhash, Chromaprint …) -- Content defined chunking (Rabin-Karp, FastCDC ...) -- Merkle trees - -## ISCC Proof-of-Concept - -Before we settle on the details of the proposed ISCC code, we built a simple and reduced proof-of-concept implementation of our ideas. It enables us and other developers to test with real world data and systems and find out early what works and what doesn't. - -![img](images/iscc-web-demo.svg) - -!!! Update - - An interactive demo of the concept is available at https://isccdemo.content-blockchain.org/ - -The minimal viable, first iteration ISCC will be a byte structure built from the following components: - -### Meta-Code - -The Meta-Code will be generated as a similarity preserving hash from minimal generic metadata like *title *and *creators*. It operates on **Layer 1 ** and identifies an intangible creation. It is the first and most generic grouping element of the code. We will be experimenting with different n-gram sizes and bit-length to find the practical limits of precision and recall for generic metadata. We will also specify a process to disambiguate unintended collisions by adding optional metadata. - -### Partial Content Flag - -The Partial Content Flag is a 1-bit flag that indicates whether the remaining elements relate to the complete work or only to a subset of it. - -### Media Type Flag - -The Media Type Flag is a 3 bit flag that allows us to distinguish between up to 8 generic media types **(GMTs)** to which our Content-Code component applies. We define a generic media type as *basic content types* such as plain text or raw pixel data that is specified exactly and extracted from more complex file formats or encodings. We start with generic text and image types and add audio, video and mixed types later. 
- -### Content-Code - -The Content-Code operates on **Layer 3** and will be a GMT-specific similarity preserving hash generated from extracted content. It identifies the normalized content of a specific GMT, independent of file format or encoding. It relates to the structural essence of the content and groups similar GMT-specific manifestations of the abstract creation or parts of it (as indicated by the Partial Content Flag). For practical reasons we intentionally skip a **Layer 2** component at this time. It would add unnecessary complexity for a basic proof-of-concept implementation. - -### Data-Code - -The Data-Code operates on **Layer 4 **and will be a similarity preserving hash generated from shift-resistant content-defined chunks from the raw data of the encoded media blob. It groups complete encoded files with similar content and encoding. This component does not distinguish between GMTs as the files may include multiple different generic media types. - -### Instance-Code - -The Instance-Code operates on **Layer 5 **and will be the top hash of a Merkle tree generated from (potentially content-defined) chunks of raw data of an encoded media blob. It identifies a concrete manifestation and proves the integrity of the full content. We use the Merkle tree structure because it also allows as to verify integrity of partial chunks without having to have the full data available. This will be very useful in any scenarios of distributed data storage. - -We intentionally skip **Layer 6** at this stage as content ownership and location will be handled on the blockchain layer of the stack and not by the ISCC code itself. +--- +title: ISCC - Concept +description: The idea behind the ISCC - identifying digital content by what it is, and describing sameness in layers. 
+authors: Titusz Pan +icon: lucide/lightbulb +--- + +# ISCC - Concept + +This page explains the idea behind the **ISCC** - how it thinks about digital +content and about what it means for two files to be "the same." The +[Capabilities](capabilities.md) page describes what the ISCC can do; this page +explains *why it is shaped the way it is*. + +## "The same" is not one thing + +Digital content never stops moving. As a file travels between systems it is +re-encoded, resized, recompressed, and copied. Each step rewrites the underlying +bytes, yet to a person the content is unchanged: a photo exported at three +resolutions, a manuscript saved as PDF, EPUB, and Word, or a song offered in +several audio formats are all "the same thing." + +The catch is that *sameness* has more than one meaning: + +- A resized photo is the **same picture**, but not the **same file**. +- A translated article carries the **same meaning**, but not the **same words**. +- A re-saved document may be the **same bytes**, or differ by a single character. + +So "are these the same?" has several valid answers at once, and a single label +attached to a work cannot capture them all. The ISCC is built around this +observation: it describes content at several levels and keeps them separate, so a +system can tell not just *whether* two assets are related, but *how*. + +!!! abstract "In plain terms" + The ISCC is a digital fingerprint calculated from a file's own content. + Identical files share the same code, and similar files get similar codes. + Anyone can compute it with open software and get the same result, with no + central registry involved. + +## A code read from the content + +Most identifiers are *assigned*: an authority issues an ISBN or DOI and attaches +it to a work. The ISCC inverts this. It is **derived from the content itself** by +running the open algorithms defined in +[ISO 24138:2024](https://www.iso.org/standard/77899.html). 
The code is a function +of the data, so unrelated parties can independently compute the *same* ISCC for +the *same* content, and the code references that content without implying anything +about ownership. + +Reading the code from the content is also what makes it *similarity-preserving*: +when the content changes a little, the code changes a little. A re-compressed +image or a transcoded audio file produces a code that stays recognizably close to +the original, even though the raw bytes differ. That is the bridge across the gap +that re-encoding and resizing create. + +## Identification in layers + +The ISCC describes a piece of content on a spectrum, from the **abstract** idea +at the top down to the **concrete** bytes at the bottom. Each level is captured by +its own **ISCC-UNIT**, and the units combine into a single composite +**ISCC-CODE**. + +| Layer | What it identifies | ISCC-UNIT | +| ------------ | --------------------------------------------------- | -------------------------- | +| **Creation** | the work as an idea, via its title and metadata | Meta-Code | +| **Meaning** | the concepts it conveys, across wording and language | Semantic-Code *(reserved)* | +| **Content** | what you read, see, or hear - independent of format | Content-Code | +| **Data** | the encoded file as a stream of bytes | Data-Code | +| **Instance** | this one exact file, down to the last bit | Instance-Code | + +The upper layers describe the *content* - what it is and what it means; the lower +layers describe the *data* - how it happens to be stored. The **Content-Code** +applies a dedicated algorithm per media type (Text, Image, Audio, Video, and +Mixed). A complete ISCC-CODE always includes the Data-Code and Instance-Code, +with the other units added when they are available. + +!!! note "Semantic-Code" + ISO 24138:2024 reserves the **Semantic-Code** layer (sameness of *meaning*) + but does not yet define its algorithm. 
Experimental implementations exist for + text ([iscc-sct](https://github.com/iscc/iscc-sct)) and images + ([iscc-sci](https://github.com/iscc/iscc-sci)). + +## Kinds of sameness + +The layers correspond to a few independent kinds of similarity. Two files can be +alike on one and differ on another, and the ISCC keeps each kind separate: + +- **Data similarity** - nearly the same bytes. *"Almost the same file."* +- **Content similarity** - the same once decoded and perceived, regardless of + format, compression, or minor edits. *"Looks, reads, or sounds the same."* +- **Semantic similarity** - the same meaning, including paraphrase and across + languages. *"Means the same thing."* (reserved; not yet standardized) + +A separate axis, **metadata similarity**, compares how content is *described* - +its title and description - rather than the content itself. + +Set apart from all of these is **data identity**: the Instance-Code is not a +similarity measure but an exact, bit-for-bit checksum. It does not ask "how +similar?" - it answers "is this the very same file?" with yes or no. + +## Whole works and their parts + +Sameness can be judged for an entire work or for parts of it. Two articles may +share a single quoted paragraph; two recordings may share one sampled passage. +A *global* comparison asks whether two works match overall, while a *granular* +comparison finds matching segments within them. The ISCC supports both, so +partial overlap, quotation, and reuse can be detected - not only whole-file +duplicates. + +## Design principles + +The ISCC is deliberately kept simple and broadly useful. A few principles guide +its design: + +- Target real, unsolved content-identification problems. +- Derive codes algorithmically, with no central authority required. +- Stay generic across media types, sectors, and use cases. +- Keep the standard pragmatic and simple to implement. +- Remain extendable and forward-compatible. 
+- Build on open specifications and open-source software. +- Complement existing identifiers rather than replace them. + +## Where to go next + +- [Capabilities](capabilities.md) - what the ISCC can do in practice. +- [Specification](specification.md) - ISO 24138:2024 and the community IEPs. +- [Resources](resources.md) - software, demos, and tools for generating ISCCs. diff --git a/docs/history.md b/docs/history.md index 8dcdc5f..d23f304 100644 --- a/docs/history.md +++ b/docs/history.md @@ -173,12 +173,12 @@ The ISCC is the work of a community, but a few roles are central to its history: implementation, **Principal Editor of ISO 24138:2024**, and **Chairman of the [ISCC Foundation](https://iscc.io)** - **Kira Lemke** - director of the ISCC Foundation and convenor of ISO/TC 46/SC - 9/WG 18 since May 2025 + 9/WG 18 since 05/2025 - **Martin Etzrodt** - director of the ISCC Foundation (since 2025) and a driving force behind the BioCodes project, which brings the ISCC to scientific and bioimaging data - **Sebastian Posth** - evangelist and early adopter of the ISCC, convenor of - ISO/TC 46/SC 9/WG 18 until May 2025 -- **Sabine Rüsch** - first convenor of ISO/TC 46/SC 9/WG 18 (2019) + ISO/TC 46/SC 9/WG 18 (05/2022-05/2025) +- **Sabine Rüsch** - first convenor of ISO/TC 46/SC 9/WG 18 (05/2019-05/2022) - **Gregor Roschkowski** - project manager at DIN who managed the ISCC standardization process from the German side, coordinating the national mirror committee for ISO/TC 46/SC 9/WG 18 diff --git a/docs/images/favicon.png b/docs/images/favicon.png index b7827f9..4cb75d3 100644 Binary files a/docs/images/favicon.png and b/docs/images/favicon.png differ diff --git a/docs/index.md b/docs/index.md index be37065..9fea978 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,15 +1,15 @@ --- title: iscc-codes -description: Open, decentralized content identification — derived from the content itself.
+description: Open, decentralized content identification - derived from the content itself. authors: Titusz Pan icon: lucide/house hide: - toc --- -# ISCC — International Standard Content Code +# ISCC - International Standard Content Code -## Open, decentralized content identification — derived from the content itself +## Open, decentralized content identification - derived from the content itself **ISCC** ([ISO 24138:2024](https://www.iso.org/standard/77899.html)) is an open standard for content identification that works directly from the digital file. Generate a compact, similarity-preserving @@ -74,7 +74,7 @@ An **ISCC-CODE** is a composite, hierarchically structured fingerprint. It combi content-derived **ISCC-UNITs** covering embedded metadata, normalized content, and the raw bytes. Each unit is a compact, similarity-preserving hash. -[![ISCC algorithmic design](images/iscc-algo-design.svg)](images/iscc-algo-design.svg) +[![ISCC algorithmic design](images/iscc-algo-design3.svg)](images/iscc-algo-design3.svg) [See the full specification →](specification.md) @@ -85,12 +85,12 @@ Each unit is a compact, similarity-preserving hash. ## Developer entry points -- [**iscc-core**](https://github.com/iscc/iscc-core) — Python reference implementation of the ISO 24138 core algorithms -- [**iscc-sdk**](https://github.com/iscc/iscc-sdk) — high-level Python toolkit for generating ISCCs from media files +- [**iscc-core**](https://github.com/iscc/iscc-core) - Python reference implementation of the ISO 24138 core algorithms +- [**iscc-sdk**](https://github.com/iscc/iscc-sdk) - high-level Python toolkit for generating ISCCs from media files See the [Resources](resources.md) page for the wider ISCC ecosystem of tools and services. -!!! note "ISCC Standard — ISO 24138:2024" +!!! note "ISCC Standard - ISO 24138:2024" The ISCC is published as [ISO 24138:2024](https://www.iso.org/standard/77899.html) by ISO/TC 46/SC 9. 
Current implementation guidance lives at [core.iscc.codes](https://core.iscc.codes) and [sdk.iscc.codes](https://sdk.iscc.codes). diff --git a/docs/javascripts/copilot.js b/docs/javascripts/copilot.js new file mode 100644 index 0000000..26c52b1 --- /dev/null +++ b/docs/javascripts/copilot.js @@ -0,0 +1,25 @@ +/** + * Mount the ISCC-AI copilot chat widget with anonymous authentication. + * + * Fetches a JWT token from the copilot-token endpoint, then mounts the + * Chainlit copilot widget with the token for cross-origin authentication. + */ +window.addEventListener("load", async function () { + if (typeof window.mountChainlitWidget !== "function") return; + + var server = "https://iscc.ai"; + var tokenUrl = server + "/api/copilot-token"; + + try { + var response = await fetch(tokenUrl); + var data = await response.json(); + window.mountChainlitWidget({ + chainlitServer: server, + theme: "light", + accessToken: data.accessToken, + customCssUrl: window.location.origin + "/stylesheets/copilot.css?v=1", + }); + } catch (e) { + console.warn("ISCC-AI copilot: failed to fetch token", e); + } +}); diff --git a/docs/license.md b/docs/license.md index 84a7c75..76d5001 100644 --- a/docs/license.md +++ b/docs/license.md @@ -7,7 +7,7 @@ icon: lucide/scale -**Documentation source ISCC-SUM**: `ISCC:K4AEK2KH4GCOSSYMKR3ORSLHNCRTLTZR6NDLGJBX6N2UZRRCGHYOOHA` +**Documentation source ISCC-SUM**: `ISCC:K4AEK2KH4GCOSSYMKR3ORSLHNCRTCBBWM7JO7NF3NLKD2I66D27A2MA` This wide ISCC-SUM identifies the documentation source tree generated with `iscc-sum --tree docs`. 
The license page itself is excluded from the tree via diff --git a/docs/resources.md b/docs/resources.md index d8d4942..cce9066 100644 --- a/docs/resources.md +++ b/docs/resources.md @@ -1,85 +1,212 @@ --- title: ISCC - Resources -description: ISCC software, demos, tools, developer libs, integrations, presentations, articles and other resources +description: The open-source software, live demos, publications, and organizations that make up the ISCC ecosystem. authors: Titusz Pan icon: lucide/library --- # ISCC - Resources -If you find something that is missing from this collection of resources for the ISCC, [please add it](https://github.com/iscc/iscc-codes/edit/main/docs/resources.md). - -## ISCC - Official Software & Tools - -### [iscc-core](https://github.com/iscc/iscc-core) - -Current Python reference implementation of the ISCC core algorithms defined by [ISO 24138:2024](https://www.iso.org/standard/77899.html). Install from PyPI with `pip install iscc-core`. Documentation: . - -### [iscc-sdk](https://github.com/iscc/iscc-sdk) - -High-level Python toolkit for creating and managing ISCCs from media files. It builds on `iscc-core` and adds media type detection, metadata handling, content extraction, and a command-line interface. Install from PyPI with `pip install iscc-sdk`. Documentation: . +The open-source software, live demos, publications, and organizations that make up the ISCC +ecosystem. Everything below builds on [ISO 24138:2024](https://www.iso.org/standard/77899.html), +the International Standard Content Code. + + + +## Open-source ecosystem + +Every repository below is developed in the open under the **Apache-2.0** license. + +### Core components + + + +### Discovery infrastructure + + + +### ISCC Semantic Codes + + + +### Specifications & quality assurance + + + +
+Stable Production-ready, stable API, ISO-aligned where applicable +Beta Feature-complete, API may change before v1.0, suitable for pilots +Draft Proposal documents under discussion +
+ +## Try it live + +### [Cover Matching Demo](https://covers.iscc.io/) + +Three million book covers from the Amazon Reviews'23 dataset, indexed and searchable through ISCC +similarity matching. ### [ISCC Web Demo](https://demo.iscc.io/) -Minimal web application for generating ISCCs in the browser. Source code: [iscc-web](https://github.com/iscc/iscc-web). - -### [ISCC Codes](https://github.com/iscc/iscc-codes) +Minimal web application for generating ISCCs directly in the browser. Live instance of +[iscc-web](https://github.com/iscc/iscc-web). -Source repository for this `iscc.codes` documentation site and historical ISCC Version 1.1 specification material. The legacy Python package [`iscc`](https://pypi.org/project/iscc/) is an outdated proof-of-concept retained for compatibility; new integrations should use `iscc-sdk`. Use `iscc-core` only when lower-level algorithm access is needed. +### [ISCC Playground](https://huggingface.co/spaces/iscc/iscc-playground) -## ISCC - Third-Party Implementations +Interactive Hugging Face Space for exploring every ISCC-UNIT and inspecting how codes change as +content changes. -### [ISCC-RS](https://github.com/iscc/iscc-rs) +### [BioCodes Imagewalk Demo](https://bio-codes.io/viz/imagewalk/) -Rust implementation of the [ISCC specification](https://iscc.codes/specification). +Interactive visualization of the proposed **Imagewalk** algorithm for robust bio-image traversal and ISCC-CODE generation. -### [ISCC-RS-CLI](https://github.com/iscc/iscc-rs-cli) -Command-line tool based on the [iscc-rs](https://github.com/iscc/iscc-rs) library. -### [ISCC-GOLANG](https://github.com/coblo/iscc-golang) +## Selected publications & coverage -Golang implementation of the ISCC protocol. +A curated selection from the wider body of work that references the ISCC. Dates are given as +published. -### [ISCC-DOTNET](https://github.com/iscc/iscc-dotnet) +### Standards -C# .Net Core implementation of the ISCC protocol. 
+- **ISO 24138:2024 - International Standard Content Code** - ISO/TC 46/SC 9, 2024. The normative + definition of the ISCC. [iso.org](https://www.iso.org/standard/77899.html) +- **C2PA Technical Specification** - Coalition for Content Provenance and Authenticity, 2025. Adopts + the ISCC as a soft-binding mechanism that links provenance data to content even after metadata is + stripped. [spec.c2pa.org](https://spec.c2pa.org) -## ISCC - Technical Demos & Integrations +### Reviews & standards landscape -### [Web Demo](https://iscc.coblo.net/) +- **Technical Report on AI and Multimedia Authenticity Standards** - IEC, ISO & ITU (AMAS / World + Standards Cooperation), 2025. Maps more than 30 standards for AI-era media and dedicates a section + to the ISCC as an asset identifier. + [PDF](https://www.worldstandardscooperation.org/wp-content/uploads/2025/07/IEC-ISO-ITU-Technical_Report_on_AI_and_Multimedia_Authenticity_Standards.pdf) +- **Mapping of EU Databases and Metadata Standards for Copyright-Protected Works** - EUIPO, 2026. + Highlights the ISCC and the ISCC Discovery Protocol as a reference architecture for federated + content discovery. [euipo.europa.eu](https://www.euipo.europa.eu/en/publications/mapping-of-eu-databases-and-metadata-standards-providing-information-on-copyright-protected-works) +- **Introducing the Newest ISO Identifier Standard** - Todd A. Carpenter, NISO, 2024. Introduces + ISO 24138 to the information-standards community as a shift toward content-derived identifiers. + [niso.org](https://www.niso.org/niso-io/2024/06/introducing-newest-iso-identifier-standard) +- **A Successful Start to a New Festival of Identifiers: PIDfest 2024** - Meadows, Jones & + Carpenter, The Scholarly Kitchen, 2024. Positions the ISCC as a flagship intrinsic identifier + alongside ISBN, DOI, and ISSN. 
+ [scholarlykitchen.sspnet.org](https://scholarlykitchen.sspnet.org/2024/07/18/a-successful-start-to-a-new-festival-of-identifiers-pidfest-2024/) -A demo web application that can generate and lookup ISCC codes from files or URLs and visualizes differences between ISCC Codes. The [source code](https://github.com/coblo/iscc-demo) is also available. +### Research & academic -### [Data Streams](https://explorer.coblo.net/streams/) +- **EU AI-Act: Tagging GenAI Content** - Heeger, Berchtold, Bugert & Steinebach (Fraunhofer SIT / + ATHENE), Electronic Imaging, 2025. Selects the ISCC as the robust hashing primitive for an EU AI + Act compliance infrastructure. [DOI](https://doi.org/10.2352/EI.2025.37.4.MWSF-301) +- **ISCC: Neue Perspektiven für die KI-gesteuerte Identifikation von Inhalten** - Titusz Pan, + Information - Wissenschaft & Praxis (iwp), 2024. Academic exposition of the ISCC and its semantic + text-code extension. [DOI](https://doi.org/10.1515/iwp-2024-2032) +- **Why Libraries, Archives and Museums Should Use the ISCC** - Heller & Gragert (TIB / SBB), 2024. + An advocacy piece for adoption across the GLAM sector. + [blog.tib.eu](https://blog.tib.eu/2024/07/05/the-international-standard-content-code-iscc-why-libraries-archives-and-museums-should-use-it/) -The Content Blockchain Testnet is running a public data-stream of ISCC codes for testing and demonstration purposes. The web demo uses the [ISCC data-stream](https://explorer.coblo.net/stream/iscc) for lookups. +### Adoption & press -### [Clink.ID](https://clink.id/) +- **CommonsDB surpasses one million declarations** - Doug McCarthy, Open Future, 2026. Each + declaration binds to its file through a content-derived ISCC. + [commonsdb.org](https://www.commonsdb.org/blog/commonsdb-surpasses-1-million-declarations/) +- **Frankfurt Book Fair 2025: Identity Stamps** - Ed Nawotka, Publishers Weekly, 2025. Describes the + ISCC as an ISO-certified digital fingerprint powering content registries for AI licensing. 
+ [publishersweekly.com](https://www.publishersweekly.com/pw/by-topic/international/Frankfurt-Book-Fair/article/98859-frankfurt-book-fair-2025-identity-stamps.html) +- **Bookwire Offers 'Protection' From Wrongful AI Usage** - Porter Anderson, Publishing + Perspectives, 2024. Reports ISCC codes generated for every product in Bookwire OS, including TDM + opt-out notices. + [publishingperspectives.com](https://publishingperspectives.com/2024/10/frankfurt-countdown-bookwire-offers-protection-from-wrongful-ai-usage/) -[CLink.ID](https://clink.id/) is an interoperable registry, architected to recognize identifiers and meta-data regardless of whether they are Handle- or content-based and/or block-chain inspired. CLink.ID is operated by [CLink Media , Inc.](https://clink.media/) and has integrated [ISCC in its registry](https://clink.id/#objects/20.500.12200.100/5d8e3c3f9d6c6a759261). +## Talks & presentations -### [Smart License Demo](https://smartlicense.coblo.net/) +- **The ISCC Discovery Protocol** - Titusz Pan, F1000Research slides, 2025. Decentralized signing, + timestamping, and discovery for the ISCC. [DOI](https://doi.org/10.7490/f1000research.1120329.1) +- **International Standard Content Code (ISCC) ISO 24138:2024** - Titusz Pan, EDItEUR Supply Chain + Conference, 2024. The standard for the publishing supply chain. + [PDF](https://www.editeur.org/files/Events%20pdfs/Supply%20chain%202024/20241015%20Titusz%20Pan.pdf) +- **Similarity hashing for digital content identification in decentralized environments** - + Blockchain for Science Conference, Berlin, 2019. A 30-minute talk. + [Recording](https://www.youtube.com/watch?v=4OCvPrDhGuQ) -Prototype demo of a smart licensing framework that uses ISCC codes for content identification. [Source code](https://github.com/coblo/smartlicense) is also available. - -### [Blockchain Wallet Demo](https://github.com/coblo/gui-demo) -An early prototype demo of a blockchain wallet that uses ISCC codes for license tokenization. 
- -## ISCC - Presentations & Articles - -### [Blockchain for Science Conference (Berlin, 2019)](https://www.youtube.com/watch?v=4OCvPrDhGuQ) - -ISCC - Similarity hashing for digital content identification in decentralized environments. [Recording](https://www.youtube.com/watch?v=4OCvPrDhGuQ) of the 30-minute talk. - -## Organizations and Initiatives +## Organizations & standards ### [ISCC Foundation](https://iscc.io/) -The **ISCC Foundation** is an independent international **nonprofit organization** that promotes information technologies for the common good. - -In particular, the foundation supports the **ISCC** and promotes the development and adoption of open standards and open source technologies as well as tools and services that enable individuals and organizations to better **create, manage, discover, access, share, and monetize digital content, knowledge, and ideas**. +The **ISCC Foundation** is an independent international **nonprofit organization** that promotes +information technologies for the common good. It supports the **ISCC** and the development and +adoption of open standards and open-source technologies that help individuals and organizations +**create, manage, discover, access, share, and monetize digital content, knowledge, and ideas**. ### [ISO - International Organization for Standardization](https://www.iso.org/committee/48836.html) -**ISO/TC 46/SC 9** (Identification and description) standardized the **International Standard Content Code** as [ISO 24138:2024](https://www.iso.org/standard/77899.html). +**ISO/TC 46/SC 9** (Identification and description) standardized the **International Standard +Content Code** as [ISO 24138:2024](https://www.iso.org/standard/77899.html). 
diff --git a/docs/specification.md b/docs/specification.md index eb55039..5ee9c4e 100644 --- a/docs/specification.md +++ b/docs/specification.md @@ -1,133 +1,133 @@ ---- -title: ISCC - Specification -description: The ISCC specification — ISO 24138:2024 and the community ISCC Enhancement Proposals (IEPs). -authors: Titusz Pan -icon: lucide/book-open ---- - -# ISCC - Specification - -The ISCC is specified at two complementary levels. **ISO 24138:2024** is the authoritative -International Standard that fixes the normative core. The **ISCC Enhancement Proposals -(IEPs)** are the open community specification process that documents the standard in detail -and develops new functionality. The open-source reference implementation connects the two. - -!!! note "ISCC Standard - ISO 24138:2024" - [**ISO 24138:2024 - International Standard Content Code (ISCC)**](https://www.iso.org/standard/77899.html) - is the authoritative specification of the ISCC. - -## ISO 24138:2024 — the International Standard - -ISO 24138:2024 was developed under **ISO/TC 46/SC 9/WG 18** and published on 15 May 2024 as -the first-edition International Standard for the ISCC. It is the authoritative, normative -specification. - -The standard defines the ISCC as a multi-component, similarity-preserving code for -digital content and specifies: - -- the **ISCC-CODE** structure — a self-describing header (MainType, SubType, Version, and - Length) and a composite body assembled from one or more **ISCC-UNITs**; -- the **Meta-Code** — similarity of the content's metadata; -- the **Content-Code** — perceptual and structural content similarity, with per-modality - algorithms for Text, Image, Audio, Video, and Mixed content; -- the **Data-Code** — similarity of the raw bitstream; -- the **Instance-Code** — exact data identity via a cryptographic checksum. - -ISO 24138:2024 also carries a normative annex, **Annex D, "Reference implementation"**. 
The -reference software is published as a freely available **electronic insert** to the standard at -. It is *normative* in a precise sense: any -conforming implementation, given the same conformant input, must produce the same output as -the reference software. Implementations need not reuse its algorithms or programming -techniques, and the reference software cannot add anything to the standard's textual technical -description. The code of this electronic insert is maintained as the open-source -[`iscc-core`](https://github.com/iscc/iscc-core) library. - -- [ISO 24138:2024 at ISO](https://www.iso.org/standard/77899.html) — the full standard text -- [Reference software — electronic insert](https://standards.iso.org/iso/24138/ed-1/en/) — freely available - -## Enhancing the standard — the ISCC Enhancement Proposals - -ISO 24138:2024 is a complete working specification, and future editions are expected to add -functionality, track the state of the art, address security concerns, and retire deprecated -features. Annex C of the standard names the channel for this work: - -> A process to structure and substantiate proposals to enhance ISCC is maintained by the -> ISCC community. \[…\] The ISCC community is gathering ISCC enhancement proposals at -> . - -The **ISCC Enhancement Proposals (IEPs)** are design documents that describe a feature of the -ISCC system or provide information about its processes and environment. They are the detailed, -openly developed working specifications of the ISCC community, and the route through which new -work is substantiated before it can feed into a future edition of the standard. - -- **Rendered specifications:** -- **Source repository:** - -### How the IEP process works - -The process is itself defined in an IEP (IEP-0000). In outline: - -1. **Idea and discussion** — a champion develops the idea and fosters public discussion. -2. 
**Draft submission** — the proposal is submitted as a pull request to - [`iscc/iscc-ieps`](https://github.com/iscc/iscc-ieps). -3. **Editor review and numbering** — an editor assigns a number and category and checks that - the proposal is sound, complete, and motivated. -4. **Iteration** — the author refines the draft through further pull requests. - -IEPs are categorized as **Core** (changes affecting most implementations; require both a -design document and a reference implementation), **Informational**, or **Process**. Each IEP -moves through a status workflow of `Draft → Proposed → Stable → Obsolete`, with side paths for -`Deferred`, `Withdrawn`, and `Rejected` proposals. - -### IEPs and the scope of ISO 24138 - -Several IEPs were contributed as input to ISO/TC 46/SC 9/WG 18, and the corresponding -normative material was published in the standard. Others document ongoing community work that -extends beyond the current edition. - -- **Within the normative scope of ISO 24138:2024** — the ISCC structure, the standardized - ISCC-UNITs (Meta-, Content-, Data-, and Instance-Code), the composite ISCC-CODE, and ISCC - metadata. -- **Beyond the current standard** — work such as the ISCC-ID, decentralized content - registries, the ISCC DID method, and experimental Semantic-Codes. These may inform a future - edition but are **not** part of ISO 24138:2024 today. - -When precision matters, treat "contributed as input to ISO" and "published in the standard" as -distinct claims, and consult the published standard for the definitive normative scope. - -## Open-source software - -The specifications are made executable and verifiable through open-source software, released -under permissive licenses: - -- [**`iscc-core`**](https://core.iscc.codes) — the reference implementation of the - standardized codec and fingerprinting algorithms. Its code is the reference software - published as the normative electronic insert of ISO 24138:2024, and the foundation for all - ISCC generation. 
-- [**`iscc-sdk`**](https://sdk.iscc.codes) — the high-level Python toolkit and primary - integration entry point. It builds on `iscc-core` and adds content-type detection, metadata - extraction and embedding, and content extraction so applications can produce a full - ISCC-CODE directly from a media file. - -```python -import iscc_sdk as idk - -# Generate a full ISCC-CODE from a media file -iscc_meta = idk.code_iscc("example.jpg") -print(iscc_meta.iscc) -``` - -## Historical specification drafts - -Before standardization, the ISCC was developed as a public working specification by Titusz Pan -within the [Content Blockchain Project](https://content-blockchain.org). These -early drafts predate ISO 24138:2024 and current implementations. They are **superseded** and -retained for historical reference and URL continuity only: - -- [ISCC Specification v1.x](legacy/specification.md) — the archived working draft (formerly - published at this page). -- [ISCC Specification v1.0](https://github.com/iscc/iscc-codes/blob/version-1.0/docs/specification.md) - — the 2018 draft, on the `version-1.0` branch. - -For the history of how these drafts became an International Standard, see the -[History](history.md) page. +--- +title: ISCC - Specification +description: The ISCC specification - ISO 24138:2024 and the community ISCC Enhancement Proposals (IEPs). +authors: Titusz Pan +icon: lucide/book-open +--- + +# ISCC - Specification + +The ISCC is specified at two complementary levels. **ISO 24138:2024** is the authoritative +International Standard that fixes the normative core. The **ISCC Enhancement Proposals +(IEPs)** are the open community specification process that documents the standard in detail +and develops new functionality. The open-source reference implementation connects the two. + +!!! 
note "ISCC Standard - ISO 24138:2024" + [**ISO 24138:2024 - International Standard Content Code (ISCC)**](https://www.iso.org/standard/77899.html) + is the authoritative specification of the ISCC. + +## ISO 24138:2024 - the International Standard + +ISO 24138:2024 was developed under **ISO/TC 46/SC 9/WG 18** and published on 15 May 2024 as +the first-edition International Standard for the ISCC. It is the authoritative, normative +specification. + +The standard defines the ISCC as a multi-component, similarity-preserving code for +digital content and specifies: + +- the **ISCC-CODE** structure - a self-describing header (MainType, SubType, Version, and + Length) and a composite body assembled from one or more **ISCC-UNITs**; +- the **Meta-Code** - similarity of the content's metadata; +- the **Content-Code** - perceptual and structural content similarity, with per-modality + algorithms for Text, Image, Audio, Video, and Mixed content; +- the **Data-Code** - similarity of the raw bitstream; +- the **Instance-Code** - exact data identity via a cryptographic checksum. + +ISO 24138:2024 also carries a normative annex, **Annex D, "Reference implementation"**. The +reference software is published as a freely available **electronic insert** to the standard at +. It is *normative* in a precise sense: any +conforming implementation, given the same conformant input, must produce the same output as +the reference software. Implementations need not reuse its algorithms or programming +techniques, and the reference software cannot add anything to the standard's textual technical +description. The code of this electronic insert is maintained as the open-source +[`iscc-core`](https://github.com/iscc/iscc-core) library. 
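
The conformance rule in the paragraph above can be sketched as a small check: an implementation conforms if it reproduces the reference output for every input, whatever techniques it uses internally. This is a minimal illustration under stated assumptions, not part of the standard or of `iscc-core`. The names `reference_stand_in`, `candidate`, and `conforms` are hypothetical, and SHA-256 merely stands in for the reference software so the example runs with the standard library alone.

```python
import hashlib
from typing import Callable, Iterable, Tuple

# Hypothetical test vector: (input bytes, output the reference software produces).
Vector = Tuple[bytes, str]


def reference_stand_in(data: bytes) -> str:
    """Stand-in for the reference software (an assumption, NOT an ISCC algorithm)."""
    return hashlib.sha256(data).hexdigest()


def candidate(data: bytes) -> str:
    """An independent implementation that uses a different internal technique."""
    h = hashlib.new("sha256")
    for i in range(0, len(data), 4):  # feed the input in small chunks
        h.update(data[i:i + 4])
    return h.hexdigest()


def conforms(impl: Callable[[bytes], str], vectors: Iterable[Vector]) -> bool:
    """Return True if impl reproduces the expected output for every vector."""
    return all(impl(data) == expected for data, expected in vectors)


inputs = [b"", b"hello", b"ISCC" * 100]
vectors = [(data, reference_stand_in(data)) for data in inputs]
print(conforms(candidate, vectors))
```

The check compares outputs only, which mirrors the text: implementations need not reuse the reference algorithms or programming techniques, but they must agree on the result for the same conformant input.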
+ +- [ISO 24138:2024 at ISO](https://www.iso.org/standard/77899.html) - the full standard text +- [Reference software - electronic insert](https://standards.iso.org/iso/24138/ed-1/en/) - freely available + +## Enhancing the standard - the ISCC Enhancement Proposals + +ISO 24138:2024 is a complete working specification, and future editions are expected to add +functionality, track the state of the art, address security concerns, and retire deprecated +features. Annex C of the standard names the channel for this work: + +> A process to structure and substantiate proposals to enhance ISCC is maintained by the +> ISCC community. \[…\] The ISCC community is gathering ISCC enhancement proposals at +> . + +The **ISCC Enhancement Proposals (IEPs)** are design documents that describe a feature of the +ISCC system or provide information about its processes and environment. They are the detailed, +openly developed working specifications of the ISCC community, and the route through which new +work is substantiated before it can feed into a future edition of the standard. + +- **Rendered specifications:** +- **Source repository:** + +### How the IEP process works + +The process is itself defined in an IEP (IEP-0000). In outline: + +1. **Idea and discussion** - a champion develops the idea and fosters public discussion. +2. **Draft submission** - the proposal is submitted as a pull request to + [`iscc/iscc-ieps`](https://github.com/iscc/iscc-ieps). +3. **Editor review and numbering** - an editor assigns a number and category and checks that + the proposal is sound, complete, and motivated. +4. **Iteration** - the author refines the draft through further pull requests. + +IEPs are categorized as **Core** (changes affecting most implementations; require both a +design document and a reference implementation), **Informational**, or **Process**. 
Each IEP +moves through a status workflow of `Draft → Proposed → Stable → Obsolete`, with side paths for +`Deferred`, `Withdrawn`, and `Rejected` proposals. + +### IEPs and the scope of ISO 24138 + +Several IEPs were contributed as input to ISO/TC 46/SC 9/WG 18, and the corresponding +normative material was published in the standard. Others document ongoing community work that +extends beyond the current edition. + +- **Within the normative scope of ISO 24138:2024** - the ISCC structure, the standardized + ISCC-UNITs (Meta-, Content-, Data-, and Instance-Code), the composite ISCC-CODE, and ISCC + metadata. +- **Beyond the current standard** - work such as the ISCC-ID, decentralized content + registries, the ISCC DID method, and experimental Semantic-Codes. These may inform a future + edition but are **not** part of ISO 24138:2024 today. + +When precision matters, treat "contributed as input to ISO" and "published in the standard" as +distinct claims, and consult the published standard for the definitive normative scope. + +## Open-source software + +The specifications are made executable and verifiable through open-source software, released +under permissive licenses: + +- [**`iscc-core`**](https://core.iscc.codes) - the reference implementation of the + standardized codec and fingerprinting algorithms. Its code is the reference software + published as the normative electronic insert of ISO 24138:2024, and the foundation for all + ISCC generation. +- [**`iscc-sdk`**](https://sdk.iscc.codes) - the high-level Python toolkit and primary + integration entry point. It builds on `iscc-core` and adds content-type detection, metadata + extraction and embedding, and content extraction so applications can produce a full + ISCC-CODE directly from a media file. 
+
+```python
+import iscc_sdk as idk
+
+# Generate a full ISCC-CODE from a media file
+iscc_meta = idk.code_iscc("example.jpg")
+print(iscc_meta.iscc)
+```
+
+## Historical specification drafts
+
+Before standardization, the ISCC was developed as a public working specification by Titusz Pan
+within the [Content Blockchain Project](https://content-blockchain.org). These
+early drafts predate ISO 24138:2024 and current implementations. They are **superseded** and
+retained for historical reference and URL continuity only:
+
+- [ISCC Specification v1.x](legacy/specification.md) - the archived working draft (formerly
+  published at this page).
+- [ISCC Specification v1.0](https://github.com/iscc/iscc-codes/blob/version-1.0/docs/specification.md):
+  the 2018 draft, on the `version-1.0` branch.
+
+For the history of how these drafts became an International Standard, see the
+[History](history.md) page.
diff --git a/docs/stylesheets/copilot.css b/docs/stylesheets/copilot.css
new file mode 100644
index 0000000..ccb1f27
--- /dev/null
+++ b/docs/stylesheets/copilot.css
@@ -0,0 +1,117 @@
+/* ISCC Copilot Widget Theme for Documentation Sites
+ *
+ * Styles the Chainlit copilot chat widget to match the ISCC brand.
+ * Injected into the widget Shadow DOM via customCssUrl.
+ *
+ * Zensical sets html { font-size: 125% } (20px) which inflates all rem-based
+ * sizes inside the Chainlit widget by 25%. Instead of blanket zoom (which
+ * over-corrects body text), we pin Tailwind text utilities to their intended
+ * pixel values and zoom only the self-contained toggle button.
+ */
+
+/* Toggle button: self-contained element, zoom works cleanly here */
+#chainlit-copilot-button {
+  zoom: 0.8;
+}
+
+/* --- Fix Tailwind text utilities inflated by 125% root font-size ---
+ * Tailwind rem values resolve against html 20px instead of 16px.
+ * Pin each utility to its intended pixel size. */
+.text-xs { font-size: 12px !important; line-height: 16px !important; }
+.text-sm { font-size: 14px !important; line-height: 20px !important; }
+.text-base { font-size: 16px !important; line-height: 24px !important; }
+.text-lg { font-size: 18px !important; line-height: 28px !important; }
+.text-xl { font-size: 20px !important; line-height: 28px !important; }
+.text-2xl { font-size: 24px !important; line-height: 32px !important; }
+
+/* --- Widen the chat panel --- */
+.copilot-container-collapsed {
+  min-width: 420px !important;
+}
+
+/* Fix inflated rem-based spacing inside the chat panel.
+ * Scale factor: 0.8rem per intended 1rem (16/20 = 0.8). */
+[data-radix-popper-content-wrapper] {
+  font-size: 16px;
+}
+
+/* Ensure chat message body text renders at readable size */
+#chainlit-copilot-chat p,
+#chainlit-copilot-chat li,
+#chainlit-copilot-chat span:not([class]),
+#chainlit-copilot-chat div:not([class]) > span {
+  font-size: 15px !important;
+  line-height: 1.2 !important;
+}
+
+/* Force light theme variables - ISCC brand on white background.
+ * Target both .light and .dark to override regardless of server theme. */
+:root,
+:host,
+.light,
+.dark {
+  --background: 0 0% 100% !important;
+  --foreground: 213 69% 23% !important;
+  --primary: 356 89% 67% !important;
+  --primary-foreground: 0 0% 100% !important;
+  --secondary: 210 30% 95% !important;
+  --secondary-foreground: 213 69% 23% !important;
+  --accent: 210 30% 95% !important;
+  --accent-foreground: 213 69% 23% !important;
+  --muted: 210 20% 96% !important;
+  --muted-foreground: 213 40% 40% !important;
+  --card: 0 0% 100% !important;
+  --card-foreground: 213 69% 23% !important;
+  --popover: 0 0% 100% !important;
+  --popover-foreground: 213 69% 23% !important;
+  --border: 210 20% 85% !important;
+  --input: 0 0% 100% !important;
+  --ring: 211 100% 35% !important;
+}
+
+/* Floating toggle button - ISCC Coral */
+button.bg-primary.rounded-full,
+button[class*="bg-primary"][class*="rounded-full"] {
+  background-color: #f56169 !important;
+  background: #f56169 !important;
+}
+
+button.bg-primary.rounded-full:hover,
+button[class*="bg-primary"][class*="rounded-full"]:hover {
+  background-color: #e04550 !important;
+  background: #e04550 !important;
+}
+
+/* Send button and primary actions - ISCC Blue */
+button.bg-primary:not(.rounded-full),
+button[class*="bg-primary"]:not([class*="rounded-full"]),
+.bg-primary:not(.rounded-full) {
+  background-color: #0054b2 !important;
+  background: #0054b2 !important;
+}
+
+button.bg-primary:not(.rounded-full):hover,
+button[class*="bg-primary"]:not([class*="rounded-full"]):hover,
+.bg-primary:not(.rounded-full):hover {
+  background-color: #123663 !important;
+  background: #123663 !important;
+}
+
+/* Links - ISCC Blue */
+a {
+  color: #0054b2 !important;
+}
+
+/* Code blocks - ISCC navy */
+pre,
+pre code,
+.hljs {
+  background-color: #0d2847 !important;
+  color: #e6edf5 !important;
+}
+
+/* Inline code */
+code:not(pre code):not(.hljs) {
+  background-color: #e8ebef !important;
+  color: inherit !important;
+}
diff --git a/docs/stylesheets/custom.css b/docs/stylesheets/custom.css
index 73d0d4c..077fb91 100644
--- a/docs/stylesheets/custom.css
+++ b/docs/stylesheets/custom.css
@@ -133,9 +133,12 @@ html [data-md-color-scheme="default"] .md-footer-meta.md-typeset a {
   color: rgba(255, 255, 255, 0.7);
 }
 
-/* Existing layout/readability tweaks */
-.md-grid {
-  max-width: 1550px;
+/* Content width: cap at 1385px on large screens. Default Material caps at
+ * ~61rem; this widens it on large screens only, capped to 90vw to keep margins. */
+@media screen and (min-width: 76.25em) {
+  .md-grid {
+    max-width: min(1385px, 90vw);
+  }
 }
 
 .md-typeset {
@@ -231,14 +234,20 @@ a code {
   border-color: var(--md-primary-fg-color--dark);
 }
 
-/* Audience cards */
+/* Audience cards: stacked on narrow screens, three across once there is room */
 .iscc-cards {
   display: grid;
-  grid-template-columns: repeat(auto-fit, minmax(16rem, 1fr));
+  grid-template-columns: 1fr;
   gap: 1.2rem;
   margin: 1.5em 0;
 }
 
+@media screen and (min-width: 60em) {
+  .iscc-cards {
+    grid-template-columns: repeat(3, 1fr);
+  }
+}
+
 .md-typeset .iscc-card {
   position: relative;
   display: flex;
@@ -289,3 +298,180 @@ a code {
 .md-typeset .iscc-card p:last-child a {
   font-weight: 600;
 }
+
+/* ---------- Resources: repository cards ---------- */
+
+/* Responsive grid of repository cards: one column on phones, two on tablets,
+ * three on wide screens. Mirrors the audience-card breakpoints. */
+.iscc-repos {
+  display: grid;
+  grid-template-columns: 1fr;
+  gap: 1.1rem;
+  margin: 1.4em 0 2.2em;
+}
+
+@media screen and (min-width: 48em) {
+  .iscc-repos {
+    grid-template-columns: repeat(2, 1fr);
+  }
+}
+
+/* Three across only on large monitors, where the capped content column is
+ * wide enough for the repository name and status badge to share a line. */
+@media screen and (min-width: 100em) {
+  .iscc-repos {
+    grid-template-columns: repeat(3, 1fr);
+  }
+}
+
+/* The whole card is a single link to the GitHub repository. */
+.md-typeset .iscc-repo {
+  position: relative;
+  display: flex;
+  flex-direction: column;
+  padding: 1.3rem 1.3rem 1.1rem;
+  border: 1px solid var(--md-default-fg-color--lightest);
+  border-radius: 0.5rem;
+  background-color: var(--md-default-bg-color);
+  box-shadow: 0 1px 3px rgba(0, 0, 0, 0.06);
+  color: var(--md-default-fg-color);
+  text-decoration: none;
+  overflow: hidden;
+  transition: transform 150ms ease, box-shadow 150ms ease, border-color 150ms ease;
+}
+
+/* ISCC-blue gradient accent bar across the top of each card */
+.md-typeset .iscc-repo::before {
+  content: "";
+  position: absolute;
+  top: 0;
+  left: 0;
+  right: 0;
+  height: 4px;
+  background: linear-gradient(
+    90deg,
+    var(--md-primary-fg-color),
+    var(--md-primary-fg-color--light)
+  );
+}
+
+.md-typeset .iscc-repo:hover {
+  transform: translateY(-4px);
+  border-color: var(--md-primary-fg-color--light);
+  box-shadow: 0 0.5rem 1.25rem rgba(0, 84, 178, 0.15);
+}
+
+/* Repository name as the card title */
+.md-typeset .iscc-repo__name {
+  font-family: var(--md-code-font), monospace;
+  font-weight: 700;
+  font-size: 0.95em;
+  color: var(--md-primary-fg-color);
+  word-break: break-word;
+}
+
+/* Status badge on its own line under the name, sized to its content so the
+ * name keeps full width and never truncates regardless of card size. */
+.md-typeset .iscc-repo > .iscc-badge {
+  align-self: flex-start;
+  margin: 0.5rem 0 0.9rem;
+}
+
+.iscc-repo__desc {
+  font-size: 0.82em;
+  line-height: 1.5;
+  color: var(--md-default-fg-color--light);
+  margin: 0 0 1rem;
+}
+
+/* CTA pins to the bottom so it aligns across every card in a row */
+.md-typeset .iscc-repo__cta {
+  display: flex;
+  align-items: center;
+  gap: 0.4rem;
+  margin-top: auto;
+  font-size: 0.78em;
+  font-weight: 600;
+  color: var(--md-primary-fg-color);
+  white-space: nowrap;
+}
+
+.iscc-repo__icon {
+  flex: 0 0 auto;
+  width: 1.1em;
+  height: 1.1em;
+  color: var(--md-default-fg-color--light);
+}
+
+/* The GitHub mark tints to the link color when the card is hovered */
+.md-typeset .iscc-repo:hover .iscc-repo__icon {
+  color: var(--md-primary-fg-color);
+}
+
+/* ---------- Status badges ---------- */
+
+.md-typeset .iscc-badge {
+  display: inline-flex;
+  align-items: center;
+  font-family: var(--md-code-font), monospace;
+  font-size: 0.62em;
+  font-weight: 700;
+  letter-spacing: 0.04em;
+  text-transform: uppercase;
+  padding: 0.25em 0.7em;
+  border-radius: 999px;
+  line-height: 1.5;
+  white-space: nowrap;
+}
+
+.md-typeset .iscc-badge--stable {
+  background-color: #a6db50;
+  color: #1d3800;
+}
+
+.md-typeset .iscc-badge--beta {
+  background-color: #4596f5;
+  color: #ffffff;
+}
+
+.md-typeset .iscc-badge--dev {
+  background-color: #ffc300;
+  color: #3a2d00;
+}
+
+.md-typeset .iscc-badge--exp {
+  background-color: #f56169;
+  color: #ffffff;
+}
+
+.md-typeset .iscc-badge--draft {
+  background-color: #6c757d;
+  color: #ffffff;
+}
+
+/* ---------- Status legend ---------- */
+
+.iscc-legend {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.6rem 1.4rem;
+  margin: 0.5em 0 2em;
+  padding: 1rem 1.2rem;
+  border: 1px solid var(--md-default-fg-color--lightest);
+  border-radius: 0.5rem;
+  background-color: var(--md-code-bg-color);
+}
+
+.iscc-legend__item {
+  display: flex;
+  align-items: center;
+  gap: 0.5rem;
+  font-size: 0.78em;
+  color: var(--md-default-fg-color--light);
+}
+
+/* Draft marks proposal documents rather than software maturity, so it breaks
+ * onto its own line below the Stable and Beta badges. */
+.iscc-legend__item--break {
+  flex-basis: 100%;
+}
diff --git a/zensical.toml b/zensical.toml
index c412fa3..16d11af 100644
--- a/zensical.toml
+++ b/zensical.toml
@@ -8,14 +8,18 @@ repo_name = "iscc/iscc-codes"
 edit_uri = "edit/main/docs/"
 copyright = "Documentation licensed under CC BY 4.0 | Copyright © 2016-2026 The Authors | Privacy Policy | Cookie Policy | Imprint | Disclaimer"
 extra_css = ["stylesheets/custom.css"]
+extra_javascript = [
+  "https://iscc.ai/copilot/index.js",
+  "javascripts/copilot.js",
+]
 
 nav = [
-  { "Overview" = "index.md" },
+  { "iscc-codes" = "index.md" },
   { "Capabilities" = "capabilities.md" },
   { "Concept" = "concept.md" },
-  { "History" = "history.md" },
-  { "Specification" = "specification.md" },
   { "Resources" = "resources.md" },
+  { "Specification" = "specification.md" },
+  { "History" = "history.md" },
   { "License" = "license.md" },
 ]
 