@radu-gheorghe
Contributor

I've done a bunch of things here:

  1. Added info about the recently added linguistics features for profiles.
  2. Reorganized the linguistics docs into the "common" page and the "implementation-specific" ones. It's not a perfect delimitation, because things are intertwined, but it's a start :)
  3. Fixed an issue that had annoyed me for a while: when you followed a link to a docs page, the sidebar tree didn't expand to show where you were. Now it expands to the current page (e.g., if you arrive at a documentation page from outside).
  4. Fixed links that were broken by the reorganization. Unfortunately, this highlights some of the unclear boundaries between common and implementation-specific options (especially when we take defaults into account). But again, it looks like a decent start.

If you think I should make additional changes before merging, please feel free to make suggestions.

Member

@bratseth bratseth left a comment

Great! I think "stemming" etc. are general concepts that should be documented in linguistics, not under open-nlp?

In any case, feel free to merge!

in <code>services.xml</code>:
</p>
<pre>{% highlight xml %}
<item key="profile=whitespaceLowercase;language=en">
Member

This is Lucene-linguistics-specific config, so it seems a bit strange to put it here?

Contributor Author

I thought it was a representative example. I think this could be used with other implementations in theory, but I don't know how (or if people will want that).

Do you think we should use another implementation as an example? I thought an example is important to illustrate the concept, which is otherwise hard to explain (given that we also have language to keep in mind).

@radu-gheorghe
Contributor Author

radu-gheorghe commented Jan 29, 2026

> I think "stemming" etc. are general concepts that should be documented in linguistics, not under open-nlp?

Yes, I think it's worth discussing them there, but the current descriptions are OpenNLP-specific.

I was thinking that maybe we can add some generic stuff about these concepts and have inbound links point to those (as opposed to OpenNLP), but then I got stuck because inbound links often assume OpenNLP in some fashion. So I just gave up and pointed to previous content 🙈

What do you think? Is the offer to merge still up? 😅 Or should I make some changes beforehand?
