linguistics: reorganized + added new profile info and lucene linguistics #4468

radu-gheorghe · 2026-01-29T14:43:24Z

I've done a bunch of things here:

Added info about the recent linguistics features for profiles.
Reorganized the linguistics docs into a "common" page and "implementation-specific" ones. It's not a perfect split, because things are intertwined, but it's a start :)
Fixed an issue that had annoyed me for a while: when you followed a link to a docs page, the sidebar tree wasn't expanded to show where you are. Now it expands to the current page (e.g., when you arrive at a documentation page from outside).
Fixed links broken by the reorganization. Unfortunately, this highlights some of the unclear boundaries between common and implementation-specific options (especially when we take defaults into account). But again, it looks like a decent start.

If you think I should make additional changes before merging, please feel free to make suggestions.

bratseth

Great! I think "stemming" etc. are general concepts that should be documented in linguistics, not under open-nlp?

In any case, feel free to merge!

bratseth · 2026-01-29T14:51:03Z

en/linguistics/linguistics.html

+  in <code>services.xml</code>:
+</p>
+<pre>{% highlight xml %}
+  <item key="profile=whitespaceLowercase;language=en">


This is Lucene-linguistics-specific config, so it seems a bit strange to put it here?

radu-gheorghe

I thought it was a representative example. I think this could be used with other implementations in theory, but I don't know how (or if people will want that).

Do you think we should use another implementation as an example? I thought an example is important to illustrate the concept, which is otherwise hard to explain (given that we also have language to keep in mind).

radu-gheorghe · 2026-01-29T15:38:41Z

> I think "stemming" etc. are general concepts that should be documented in linguistics, not under open-nlp?

Yes, I think it's worth discussing them there, but the current descriptions are OpenNLP-specific.

I was thinking that maybe we can add some generic stuff about these concepts and have inbound links point to those (as opposed to OpenNLP), but then I got stuck because inbound links often assume OpenNLP in some fashion. So I just gave up and pointed to previous content 🙈

What do you think? Is the offer to merge still up? 😅 Or should I make some changes beforehand?

linguistics: reorganized + added new profile info and lucene linguistics

62b77f6

radu-gheorghe requested review from arnej27959, bratseth and kkraune January 29, 2026 14:43

bratseth reviewed Jan 29, 2026

3 participants
