Overview
Define the XML schema (XSD) and ingest XSLT crosswalk for the emailarchive metadata type. This is the foundation phase — no Java code required. Once complete, an AIP with emailarchive descriptive metadata will index nested email child documents into Solr automatically via the existing SolrXMLLoader + indexDescriptiveMetadataFields pipeline.
Note
This phase follows the exact pattern established by rakenskapsinfo.xslt. The SolrXMLLoader already handles <field name="X"><doc>...</doc></field> blocks by producing Collection<SolrInputDocument> field values, which propagate to the parent AIP document through indexDescriptiveMetadataFields. No changes to Java indexing code are needed.
Warning
The email archive use case is the reference implementation for the generic nested documents UI feature. No hardcoded email-specific logic should exist anywhere outside this phase's config files. All UI phases must work for any content_type, not just emails.
Part of: #3382
XML Schema
File: roda-core/roda-core/src/main/resources/config/schemas/emailarchive.xsd
The XML describes a single mailbox (parent) containing N email records (children).
XSD content
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="https://roda-community.org/schemas/emailarchive/v1"
           xmlns:ea="https://roda-community.org/schemas/emailarchive/v1">

  <xs:element name="emailArchive">
    <xs:complexType>
      <xs:sequence>
        <!-- Mailbox-level (parent AIP) fields -->
        <xs:element name="custodian" type="xs:string" minOccurs="1"/>
        <xs:element name="emailAddress" type="xs:string" minOccurs="1"/>
        <xs:element name="dateStart" type="xs:date" minOccurs="0"/>
        <xs:element name="dateEnd" type="xs:date" minOccurs="0"/>
        <xs:element name="totalMessages" type="xs:integer" minOccurs="0"/>
        <xs:element name="originalFormat" type="xs:string" minOccurs="0"/>
        <xs:element name="archivingMotive" type="xs:string" minOccurs="0"/>
        <!-- One element per archived email -->
        <xs:element name="email" type="ea:emailType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="emailType">
    <xs:sequence>
      <xs:element name="messageId" type="xs:string" minOccurs="1"/>
      <xs:element name="subject" type="xs:string" minOccurs="0"/>
      <xs:element name="sender" type="xs:string" minOccurs="0"/>
      <xs:element name="recipients" type="xs:string" minOccurs="0"/>
      <xs:element name="sentDate" type="xs:dateTime" minOccurs="0"/>
      <xs:element name="folderPath" type="xs:string" minOccurs="0"/>
      <xs:element name="hasAttachments" type="xs:boolean" minOccurs="0"/>
      <xs:element name="filePath" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
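As an illustration, a descriptive metadata file conforming to this schema might look like the sketch below. All values (names, addresses, Message-IDs, file paths) are hypothetical and chosen only to exercise the element order and types; the second email omits several optional elements to show that only messageId is mandatory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample instance; every value is illustrative only -->
<ea:emailArchive xmlns:ea="https://roda-community.org/schemas/emailarchive/v1">
  <!-- Mailbox-level (parent AIP) fields, in schema sequence order -->
  <ea:custodian>João Silva</ea:custodian>
  <ea:emailAddress>joao.silva@example.org</ea:emailAddress>
  <ea:dateStart>2019-01-07</ea:dateStart>
  <ea:dateEnd>2019-01-08</ea:dateEnd>
  <ea:totalMessages>2</ea:totalMessages>
  <ea:originalFormat>MBOX</ea:originalFormat>
  <ea:archivingMotive>Offboarding</ea:archivingMotive>
  <!-- Email with every optional element present -->
  <ea:email>
    <ea:messageId>&lt;0001@example.org&gt;</ea:messageId>
    <ea:subject>Project kickoff</ea:subject>
    <ea:sender>joao.silva@example.org</ea:sender>
    <ea:recipients>maria@example.org, pedro@example.org</ea:recipients>
    <ea:sentDate>2019-01-07T09:15:00Z</ea:sentDate>
    <ea:folderPath>Inbox/Projects</ea:folderPath>
    <ea:hasAttachments>true</ea:hasAttachments>
    <ea:filePath>emails/0001.eml</ea:filePath>
  </ea:email>
  <!-- Email with only some optional elements -->
  <ea:email>
    <ea:messageId>&lt;0002@example.org&gt;</ea:messageId>
    <ea:subject>Re: Project kickoff</ea:subject>
    <ea:sender>maria@example.org</ea:sender>
    <ea:sentDate>2019-01-08T14:02:00Z</ea:sentDate>
  </ea:email>
</ea:emailArchive>
```

Because the schema sets elementFormDefault to qualified, every child element has to be in the emailarchive namespace, which is why the ea prefix appears on all of them.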
Field descriptions
Mailbox-level (parent — visible in AIP Details view):
| Field | Type | Description |
|---|---|---|
| custodian | string | Owner of the mailbox (e.g. "João Silva") |
| emailAddress | string | Primary email address |
| dateStart | date | Earliest message date in the archive |
| dateEnd | date | Latest message date in the archive |
| totalMessages | integer | Total number of archived emails |
| originalFormat | string | Source format (e.g. "PST", "MBOX", "Exchange API") |
| archivingMotive | string | Reason for archiving (e.g. "Offboarding", "Legal Hold") |
Per-email (child — nested document):
| Field | Type | Solr dynamic field | Description |
|---|---|---|---|
| messageId | string | messageId_s | RFC 5322 Message-ID header; key for deduplication |
| subject | string | subject_txt | Email subject line |
| sender | string | sender_s | From address |
| recipients | string | recipients_txt | To, CC, BCC addresses |
| sentDate | dateTime | sentDate_dt | Date and time sent |
| folderPath | string | folderPath_s | Original folder (e.g. "Inbox/Projects") |
| hasAttachments | boolean | hasAttachments_b | Attachment indicator |
| filePath | string | filePath_s | Relative path to the .eml file within the representation |
XSLT Crosswalk
File: roda-core/roda-core/src/main/resources/config/crosswalks/ingest/emailarchive.xslt
Uses Solr dynamic field suffixes (_txt, _s, _dt, _b, _i), so no managed-schema.xml changes are required.
XSLT content
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ea="https://roda-community.org/schemas/emailarchive/v1"
                exclude-result-prefixes="ea">

  <xsl:output method="xml" indent="yes" encoding="UTF-8" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <doc>
      <xsl:apply-templates/>
    </doc>
  </xsl:template>

  <xsl:template match="*:emailArchive">
    <!-- Parent-level Solr fields -->
    <field name="custodian_txt"><xsl:value-of select="*:custodian/text()"/></field>
    <field name="emailAddress_s"><xsl:value-of select="*:emailAddress/text()"/></field>
    <field name="dateStart_dt"><xsl:value-of select="*:dateStart/text()"/></field>
    <field name="dateEnd_dt"><xsl:value-of select="*:dateEnd/text()"/></field>
    <field name="totalMessages_i"><xsl:value-of select="*:totalMessages/text()"/></field>
    <field name="originalFormat_s"><xsl:value-of select="*:originalFormat/text()"/></field>
    <field name="archivingMotive_txt"><xsl:value-of select="*:archivingMotive/text()"/></field>
    <field name="content_type">emailarchive</field>

    <!-- NESTED DOCUMENTS: one Solr child document per email -->
    <field name="emails">
      <xsl:for-each select="*:email">
        <doc>
          <field name="content_type">email</field>
          <field name="messageId_s"><xsl:value-of select="*:messageId/text()"/></field>
          <field name="subject_txt"><xsl:value-of select="*:subject/text()"/></field>
          <field name="sender_s"><xsl:value-of select="*:sender/text()"/></field>
          <field name="recipients_txt"><xsl:value-of select="*:recipients/text()"/></field>
          <field name="sentDate_dt"><xsl:value-of select="*:sentDate/text()"/></field>
          <field name="folderPath_s"><xsl:value-of select="*:folderPath/text()"/></field>
          <field name="hasAttachments_b"><xsl:value-of select="*:hasAttachments/text()"/></field>
          <field name="filePath_s"><xsl:value-of select="*:filePath/text()"/></field>
        </doc>
      </xsl:for-each>
    </field>

    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress email child nodes from top-level field processing -->
  <xsl:template match="*:email"/>

</xsl:stylesheet>
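To make the target shape concrete, the sketch below shows the field elements this crosswalk is intended to emit for a hypothetical mailbox owned by "João Silva" that holds a single email. All values are illustrative. Whitespace and any text passed through by the final xsl:apply-templates are not shown.

```xml
<!-- Intended crosswalk output for a hypothetical one-email mailbox -->
<doc>
  <!-- Parent (mailbox) fields -->
  <field name="custodian_txt">João Silva</field>
  <field name="emailAddress_s">joao.silva@example.org</field>
  <field name="dateStart_dt">2019-01-07</field>
  <field name="dateEnd_dt">2019-01-07</field>
  <field name="totalMessages_i">1</field>
  <field name="originalFormat_s">MBOX</field>
  <field name="archivingMotive_txt">Offboarding</field>
  <field name="content_type">emailarchive</field>
  <!-- Nested child documents: one inner doc per email element -->
  <field name="emails">
    <doc>
      <field name="content_type">email</field>
      <field name="messageId_s">&lt;0001@example.org&gt;</field>
      <field name="subject_txt">Project kickoff</field>
      <field name="sender_s">joao.silva@example.org</field>
      <field name="recipients_txt">maria@example.org, pedro@example.org</field>
      <field name="sentDate_dt">2019-01-07T09:15:00Z</field>
      <field name="folderPath_s">Inbox/Projects</field>
      <field name="hasAttachments_b">true</field>
      <field name="filePath_s">emails/0001.eml</field>
    </doc>
  </field>
</doc>
```

Two behaviours are worth checking against the real pipeline. First, values are copied verbatim, so an xs:date such as 2019-01-07 reaches dateStart_dt unchanged, whereas Solr date fields normally expect a full timestamp such as 2019-01-07T00:00:00Z. Second, an optional element that is absent from the source still produces an empty field element, since the stylesheet emits every field unconditionally.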
Configuration Registration
File: roda-ui/roda-wui/src/main/resources/config/roda-wui.properties
ui.browser.metadata.descriptive.types = emailarchive
i18n Keys Required
File: roda-ui/roda-wui/src/main/resources/config/i18n/ServerMessages.properties (and all locale files)
ui.browse.metadata.descriptive.type.emailarchive = Email Archive
Files to Create / Change
| File | Action |
|---|---|
| roda-core/roda-core/src/main/resources/config/schemas/emailarchive.xsd | Create |
| roda-core/roda-core/src/main/resources/config/crosswalks/ingest/emailarchive.xslt | Create |
| roda-ui/roda-wui/src/main/resources/config/roda-wui.properties | Edit: add emailarchive to ui.browser.metadata.descriptive.types |
| roda-ui/roda-wui/src/main/resources/config/i18n/ServerMessages.properties | Edit: add type label |
| All other locale .properties files | Edit: add translations |