
@ghukill commented Dec 8, 2025

Purpose and background context

This PR adds a new field, fulltext, to the TimdexRecord model class.

This field is not yet used by any source. The first planned user is mitlibwebsite, which will store the full text of library websites.
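For illustration, the change amounts to one more optional attribute on the record model. The sketch below is a hypothetical, trimmed-down stand-in built with the standard library's dataclasses. The real TimdexRecord has many more fields and its own validation, so the other field names, the None default, and the type check here are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimdexRecord:
    """Hypothetical, trimmed-down stand-in for Transmogrifier's TimdexRecord."""

    timdex_record_id: str  # assumed field, shown only for context
    title: str  # assumed field, shown only for context
    # New simple, root-level field. Assumed optional (default None)
    # because no source populates it yet.
    fulltext: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed validation: when present, fulltext is a single string,
        # not a list of full texts.
        if self.fulltext is not None and not isinstance(self.fulltext, str):
            raise TypeError("fulltext must be a str or None")


# Existing sources keep building records without fulltext...
record = TimdexRecord(timdex_record_id="example:123", title="Example record")

# ...while a future source (e.g. mitlibwebsite) could pass extracted page text.
website_record = TimdexRecord(
    timdex_record_id="example:456",
    title="Example web page",
    fulltext="Full text extracted from the web page.",
)
```

Keeping fulltext as a single optional string matches the "single, root level, simple field" requirement; supporting multiple full texts per record would be a separate, later change.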

This work falls under epic USE-260, which has a few tightly coupled tickets; this PR is the first piece of work performed. Next steps include:

  • updating browsertrix-harvester to return the full HTML in each record
  • back here in Transmog, parsing that HTML and beginning to add content to the new fulltext field
  • adding the fulltext field to TIM's OpenSearch mapping, which could then be tested with mitlibwebsite records that have fulltext values
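The HTML parsing step listed above is not designed in this PR. As a rough sketch only, the standard library's html.parser could reduce a harvested page to a single string suitable for fulltext. The helper name html_to_fulltext, skipping script and style content, and collapsing whitespace are all assumptions; the eventual implementation may use a different parser entirely.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_fulltext(html: str) -> str:
    """Hypothetical helper: reduce raw HTML to one whitespace-normalized string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse internal runs of whitespace (e.g. newlines inside a paragraph).
    return " ".join(parser.text().split())


page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Hours &amp; locations</h1><p>Open\n  daily.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_fulltext(page))  # Hours & locations Open daily.
```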

How can a reviewer manually see the effects of these changes?

Nothing really to see! A field has been added, but no sources use it yet.
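Because no source sets fulltext yet, existing output should be unaffected. The sketch below illustrates this under one stated assumption: fields whose value is None are omitted when a record is serialized to JSON. The two-field TimdexRecord and the to_json helper are hypothetical stand-ins, not Transmogrifier's actual code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TimdexRecord:
    """Hypothetical two-field stand-in for the real TimdexRecord."""

    title: str
    fulltext: Optional[str] = None


def to_json(record: TimdexRecord) -> str:
    # Assumption: None-valued fields are dropped before serializing, so
    # records from sources that never set fulltext come out unchanged.
    data = {key: value for key, value in asdict(record).items() if value is not None}
    return json.dumps(data)


print(to_json(TimdexRecord(title="Example")))
# {"title": "Example"}
print(to_json(TimdexRecord(title="Example", fulltext="Page text")))
# {"title": "Example", "fulltext": "Page text"}
```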

Please see this section of a Confluence write-up, which includes spike examples: https://mitlibraries.atlassian.net/wiki/spaces/D/pages/4989517825/2025-12-03+-+Full-text+search+spike+for+mitlibwebsite+source#Changes-in-Transmogrifier.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

  • https://mitlibraries.atlassian.net/browse/USE-256

@ghukill requested a review from a team December 8, 2025 18:55
Why these changes are being introduced:

Decisions have been made to move incrementally toward supporting
full-text search in TIMDEX.  The first known use case is the
mitlibwebsite source, where we'll store the full text of websites for
searching.

While a single record may eventually have multiple full texts
associated with it, our initial requirements are only for a single,
simple, root-level field.

How this addresses that need:

Adds a new simple, root-level field called 'fulltext' to the
TimdexRecord model class.  This field is not yet used by any sources.

Side effects of this change:
* Eventually, sources will begin to populate this field.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-256
@ghukill force-pushed the USE-256-new-fulltext-field branch from 07b25e2 to dbb6f5c December 8, 2025 20:58
@ghukill merged commit 2b34059 into main Dec 8, 2025
3 checks passed
