Nested Documents UI — Phase 0: EmailArchive Metadata Schema (XSD + XSLT) #3660

@luis100

Overview

Define the XML schema (XSD) and the ingest XSLT crosswalk for the emailarchive metadata type. This is the foundation phase and requires no Java code. Once it is complete, ingesting an AIP with emailarchive descriptive metadata will index its emails into Solr as nested child documents automatically, through the existing SolrXMLLoader + indexDescriptiveMetadataFields pipeline.

Note

This phase follows the exact pattern established by rakenskapsinfo.xslt. The SolrXMLLoader already handles <field name="X"><doc>...</doc></field> blocks by producing Collection<SolrInputDocument> field values, which propagate to the parent AIP document through indexDescriptiveMetadataFields. No changes to Java indexing code are needed.
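
The nested-field shape described in this note can be sketched as follows. This is only an illustration of the structure: the field names `children` and `title_txt` and all values are placeholders, not names used by RODA.

```xml
<!-- Sketch only: "children", "title_txt" and the values are placeholders -->
<doc>
  <field name="title_txt">Parent document</field>
  <!-- A field whose content is <doc> elements becomes a collection of child documents -->
  <field name="children">
    <doc>
      <field name="title_txt">First child</field>
    </doc>
    <doc>
      <field name="title_txt">Second child</field>
    </doc>
  </field>
</doc>
```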

Warning

The email archive use case is the reference implementation for the generic nested documents UI feature. No hardcoded email-specific logic should exist anywhere outside this phase's config files. All UI phases must work for any content_type, not just emails.

Part of: #3382


XML Schema

File: roda-core/roda-core/src/main/resources/config/schemas/emailarchive.xsd

Each emailarchive document describes a single mailbox (parent) containing zero or more email records (children).

XSD content
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="https://roda-community.org/schemas/emailarchive/v1"
           xmlns:ea="https://roda-community.org/schemas/emailarchive/v1">

  <xs:element name="emailArchive">
    <xs:complexType>
      <xs:sequence>
        <!-- Mailbox-level (parent AIP) fields -->
        <xs:element name="custodian"       type="xs:string"  minOccurs="1"/>
        <xs:element name="emailAddress"    type="xs:string"  minOccurs="1"/>
        <xs:element name="dateStart"       type="xs:date"    minOccurs="0"/>
        <xs:element name="dateEnd"         type="xs:date"    minOccurs="0"/>
        <xs:element name="totalMessages"   type="xs:integer" minOccurs="0"/>
        <xs:element name="originalFormat"  type="xs:string"  minOccurs="0"/>
        <xs:element name="archivingMotive" type="xs:string"  minOccurs="0"/>
        <!-- One element per archived email -->
        <xs:element name="email" type="ea:emailType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="emailType">
    <xs:sequence>
      <xs:element name="messageId"      type="xs:string"   minOccurs="1"/>
      <xs:element name="subject"        type="xs:string"   minOccurs="0"/>
      <xs:element name="sender"         type="xs:string"   minOccurs="0"/>
      <xs:element name="recipients"     type="xs:string"   minOccurs="0"/>
      <xs:element name="sentDate"       type="xs:dateTime" minOccurs="0"/>
      <xs:element name="folderPath"     type="xs:string"   minOccurs="0"/>
      <xs:element name="hasAttachments" type="xs:boolean"  minOccurs="0"/>
      <xs:element name="filePath"       type="xs:string"   minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

Field descriptions

Mailbox-level (parent — visible in AIP Details view):

| Field | Type | Description |
| --- | --- | --- |
| custodian | string | Owner of the mailbox (e.g. "João Silva") |
| emailAddress | string | Primary email address |
| dateStart | date | Earliest message date in the archive |
| dateEnd | date | Latest message date in the archive |
| totalMessages | integer | Total number of archived emails |
| originalFormat | string | Source format (e.g. "PST", "MBOX", "Exchange API") |
| archivingMotive | string | Reason for archiving (e.g. "Offboarding", "Legal Hold") |

Per-email (child — nested document):

| Field | Type | Solr dynamic field | Description |
| --- | --- | --- | --- |
| messageId | string | messageId_s | RFC 5322 Message-ID header; key for deduplication |
| subject | string | subject_txt | Email subject line |
| sender | string | sender_s | From address |
| recipients | string | recipients_txt | To, CC, BCC addresses |
| sentDate | dateTime | sentDate_dt | Date and time sent |
| folderPath | string | folderPath_s | Original folder (e.g. "Inbox/Projects") |
| hasAttachments | boolean | hasAttachments_b | Attachment indicator |
| filePath | string | filePath_s | Relative path to the .eml file within the representation |
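
For reference, a minimal instance that should validate against the schema above might look like the sketch below. All values (names, addresses, message IDs, paths) are invented for illustration. The second email carries only the required messageId plus a subject, to show that the remaining per-email fields are optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative instance; every value is made up -->
<emailArchive xmlns="https://roda-community.org/schemas/emailarchive/v1">
  <!-- Mailbox-level fields, in the order required by the xs:sequence -->
  <custodian>João Silva</custodian>
  <emailAddress>joao.silva@example.org</emailAddress>
  <dateStart>2019-03-01</dateStart>
  <dateEnd>2023-11-30</dateEnd>
  <totalMessages>2</totalMessages>
  <originalFormat>MBOX</originalFormat>
  <archivingMotive>Offboarding</archivingMotive>
  <!-- One element per archived email -->
  <email>
    <messageId>&lt;0001@example.org&gt;</messageId>
    <subject>Project kickoff</subject>
    <sender>joao.silva@example.org</sender>
    <recipients>team@example.org</recipients>
    <sentDate>2019-03-01T09:15:00Z</sentDate>
    <folderPath>Inbox/Projects</folderPath>
    <hasAttachments>true</hasAttachments>
    <filePath>emails/0001.eml</filePath>
  </email>
  <email>
    <messageId>&lt;0002@example.org&gt;</messageId>
    <subject>Re: Project kickoff</subject>
  </email>
</emailArchive>
```

Because the schema sets elementFormDefault="qualified", the default namespace declaration on the root element also applies to every child element.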

XSLT Crosswalk

File: roda-core/roda-core/src/main/resources/config/crosswalks/ingest/emailarchive.xslt

Uses Solr dynamic field suffixes (_txt, _s, _dt, _b, _i) — no managed-schema.xml changes required.
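
As background on what these suffixes mean, Solr's stock `_default` configset declares dynamic fields roughly as sketched below. This is recalled from upstream Solr and is an assumption here, not a copy of RODA's schema; the definitions actually used by RODA are the authority.

```xml
<!-- Assumption: mirrors Solr's stock _default configset, not RODA's own schema -->
<dynamicField name="*_s"   type="string"       indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_i"   type="pint"         indexed="true" stored="true"/>
<dynamicField name="*_b"   type="boolean"      indexed="true" stored="true"/>
<dynamicField name="*_dt"  type="pdate"        indexed="true" stored="true"/>
```

If the `_dt` suffix maps to a Solr date type as above, values must be full UTC timestamps such as 2019-03-01T00:00:00Z, and empty values for the `_dt`, `_i`, and `_b` fields are likely to be rejected.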

XSLT content
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ea="https://roda-community.org/schemas/emailarchive/v1"
    exclude-result-prefixes="ea">

  <xsl:output method="xml" indent="yes" encoding="UTF-8" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <doc>
      <xsl:apply-templates/>
    </doc>
  </xsl:template>

  <xsl:template match="*:emailArchive">
    <!-- Parent-level Solr fields -->
    <field name="custodian_txt"><xsl:value-of select="*:custodian/text()"/></field>
    <field name="emailAddress_s"><xsl:value-of select="*:emailAddress/text()"/></field>
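    <!-- Solr date fields expect full UTC timestamps, so xs:date values are padded to midnight.
         Optional typed fields are skipped when absent to avoid indexing empty values. -->
    <xsl:if test="*:dateStart">
      <field name="dateStart_dt"><xsl:value-of select="concat(substring(normalize-space(*:dateStart), 1, 10), 'T00:00:00Z')"/></field>
    </xsl:if>
    <xsl:if test="*:dateEnd">
      <field name="dateEnd_dt"><xsl:value-of select="concat(substring(normalize-space(*:dateEnd), 1, 10), 'T00:00:00Z')"/></field>
    </xsl:if>
    <xsl:if test="*:totalMessages">
      <field name="totalMessages_i"><xsl:value-of select="*:totalMessages/text()"/></field>
    </xsl:if>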
    <field name="originalFormat_s"><xsl:value-of select="*:originalFormat/text()"/></field>
    <field name="archivingMotive_txt"><xsl:value-of select="*:archivingMotive/text()"/></field>
    <field name="content_type">emailarchive</field>

    <!-- NESTED DOCUMENTS — one Solr child document per email -->
    <field name="emails">
      <xsl:for-each select="*:email">
        <doc>
          <field name="content_type">email</field>
          <field name="messageId_s"><xsl:value-of select="*:messageId/text()"/></field>
          <field name="subject_txt"><xsl:value-of select="*:subject/text()"/></field>
          <field name="sender_s"><xsl:value-of select="*:sender/text()"/></field>
          <field name="recipients_txt"><xsl:value-of select="*:recipients/text()"/></field>
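          <!-- Optional typed fields are emitted only when present. sentDate is passed through
               unchanged, so it must already be a UTC timestamp ending in "Z" for Solr. -->
          <xsl:if test="*:sentDate">
            <field name="sentDate_dt"><xsl:value-of select="*:sentDate/text()"/></field>
          </xsl:if>
          <field name="folderPath_s"><xsl:value-of select="*:folderPath/text()"/></field>
          <xsl:if test="*:hasAttachments">
            <field name="hasAttachments_b"><xsl:value-of select="*:hasAttachments/text()"/></field>
          </xsl:if>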
          <field name="filePath_s"><xsl:value-of select="*:filePath/text()"/></field>
        </doc>
      </xsl:for-each>
    </field>

    <!-- No trailing xsl:apply-templates here: the built-in template rules would copy the
         text of every child element into <doc> as stray text nodes. -->
  </xsl:template>

</xsl:stylesheet>
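
As a sketch of the target output, the Solr input document this crosswalk is meant to produce for a mailbox with a single email would look roughly like the following. Every value is invented, whitespace is not significant, and the date values are shown in the UTC form that Solr date fields expect.

```xml
<!-- Illustrative target output; every value is invented -->
<doc>
  <!-- Parent-level (mailbox) fields -->
  <field name="custodian_txt">João Silva</field>
  <field name="emailAddress_s">joao.silva@example.org</field>
  <field name="dateStart_dt">2019-03-01T00:00:00Z</field>
  <field name="dateEnd_dt">2023-11-30T00:00:00Z</field>
  <field name="totalMessages_i">1</field>
  <field name="originalFormat_s">MBOX</field>
  <field name="archivingMotive_txt">Offboarding</field>
  <field name="content_type">emailarchive</field>
  <!-- One nested <doc> per email -->
  <field name="emails">
    <doc>
      <field name="content_type">email</field>
      <field name="messageId_s">&lt;0001@example.org&gt;</field>
      <field name="subject_txt">Project kickoff</field>
      <field name="sender_s">joao.silva@example.org</field>
      <field name="recipients_txt">team@example.org</field>
      <field name="sentDate_dt">2019-03-01T09:15:00Z</field>
      <field name="folderPath_s">Inbox/Projects</field>
      <field name="hasAttachments_b">true</field>
      <field name="filePath_s">emails/0001.eml</field>
    </doc>
  </field>
</doc>
```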

Configuration Registration

File: roda-ui/roda-wui/src/main/resources/config/roda-wui.properties

ui.browser.metadata.descriptive.types = emailarchive

i18n Keys Required

File: roda-ui/roda-wui/src/main/resources/config/i18n/ServerMessages.properties (and all locale files)

ui.browse.metadata.descriptive.type.emailarchive = Email Archive

Files to Create / Change

| File | Action |
| --- | --- |
| roda-core/roda-core/src/main/resources/config/schemas/emailarchive.xsd | Create |
| roda-core/roda-core/src/main/resources/config/crosswalks/ingest/emailarchive.xslt | Create |
| roda-ui/roda-wui/src/main/resources/config/roda-wui.properties | Edit: add emailarchive to ui.browser.metadata.descriptive.types |
| roda-ui/roda-wui/src/main/resources/config/i18n/ServerMessages.properties | Edit: add type label |
| All other locale .properties files | Edit: add translations |
