Skip to content

Conversation

@ritinae
Copy link

@ritinae ritinae commented Aug 1, 2025

This PR adjusts the lang file parsing, to allow nesting the keys. This added structure helps working with larger lang files, and allows utilizing e.g. code folding in editors.

The translation keys are generated as paths, based on the structure of the JSON. For example, I would like to be able to write JSON like this in the lang files:

{
  "foo": {
    "bar": "baz"
  },
  "yay": "wohoo!"
}

The resulting translation keys should then be "foo-bar" and "yay". This can almost already be achieved with the current parser, by writing the paths as parts of the keys:

{
  "foo-bar": "baz",
  "yay": "wohoo!",
}

However, this lacks structure, making it harder to read when the desired structure has multiple levels of nesting and/or when there are more keys around. The structured version has the added benefit of supporting code folding in editors, helping when working with larger sets of translation keys.

The example JSONs for the existing parser and for the updated parser both generate identical translation keys:

  1. domain:foo-bar which maps to "baz"
  2. and domain:yay which maps to "wohoo!"

The changes are opt-in. No other logic outside the parser is changed. Translation keys with special characters (e.g. here '-') are already perfectly valid with how the TranslationService currently behaves, meaning these changes provide more flexibility, while still allowing using the old "flat" lang files just as fine.

Implementation

The implementation itself is quite straightforward:

  1. Instead of directly deserializing into Dictionary<string, string>, parse the json as a raw JToken tree when calling LoadEntries
  2. Then, consume the tree:
    1. If the current token is a JSON Object, for each JSON property on the object:
      1. append property name to the key, with - as the delimiter
      2. recursively call LoadEntries, with the appended key
    2. If the current token is a string, load the string as a translation entry, with path to it (from the root of the JSON) as the key.
    3. All other JSON tokens are treated as errors

The LoadEntry, and everything from there on functions exactly as before. Only the parsing step has changed.

Real-world example

For a bit more real-world example, I have a en.json file, which currently looks something like this:

{
    "settings-startup-title": "Startup settings",
    "settings-startup-mergestacksongroundenabled-label": "Enable Module: Merge Stacks on Ground",
    "settings-startup-mergestacksongroundenabled-comment": "Reduces lag by merging stacks on ground to larger clumps. Requires restart.",
    "settings-startup-anothermoduleenabled-label": "Enable Module: Another",
    "settings-startup-anothermoduleenabled-comment": "Does something else",
    "settings-mergestacksonground-title": "Merge Stacks on Ground - Settings",
    "settings-mergestacksonground-enablerendertweaks-label": "Custom rendering for merged stacks",
    "settings-mergestacksonground-enablerendertweaks-comment": "Renders the merged stacks as larger groups. Requires restart.",
    "settings-mergestacksonground-maxrenderedstacks-label": "Max rendered stacks",
    "settings-mergestacksonground-maxrenderedstacks-comment": "Maximum number of stacks rendered when stacks are merged.",
    "settings-anothermodule-title": "Another Module - Settings",
    "settings-anothermodule-dostuff-label": "Yes/No",
    "settings-anothermodule-dostuff-comment": "Maybe?"
}

This is still manageable, but one can see that it is quickly becoming quite a word salad. The proposed changes would allow me to write this as:

{
    "flat-keys-without-nesting-are-still": "Perfectly Valid",
    "the-nesting-structure-is": "Fully Opt-in",
    "settings": {
        "startup": {
            "title": "Startup settings",
            "mergestacksongroundenabled": {
                "label": "Enable Module: Merge Stacks on Ground",
                "comment": "Reduces lag by merging stacks on ground to larger clumps. Requires restart."
            },
            "anothermoduleenabled": {
                "label": "Enable Module: Another",
                "comment": "Does something else"
            }
        },
        "mergestacksonground": {
            "title": "Merge Stacks on Ground - Settings",
            "enablerendertweaks": {
                "label": "Custom rendering for merged stacks",
                "comment": "Renders the merged stacks as . Requires restart."
            },
            "maxrenderedstacks": {
                "label": "Max rendered stacks",
                "comment": "Maximum number of stacks rendered when stacks are merged."
            }
        },
        "anothermodule": {
            "title": "Another Module - Settings",
            "dostuff": {
                "label": "Yes/No",
                "comment": "Maybe?"
            }
        }
    },
}

@pizza2004
Copy link

pizza2004 commented Aug 1, 2025

Would it be possible to make it use - instead of / for this? I feel this would better match with what we currently do if we were going to change this in vanilla.

Not that I am certain this change would be accepted. I think it might cause an issue with the Crowdin translation. I like it in concept, though.

@pizza2004 pizza2004 requested a review from SaculRennorb August 1, 2025 19:29
@ritinae
Copy link
Author

ritinae commented Aug 2, 2025

Would it be possible to make it use - instead of / for this?

Good point! I was thinking about the translation keys as paths, so resorted to / as a path-like separator, but looking at vanilla, the - is much more consistent.

@ritinae
Copy link
Author

ritinae commented Aug 2, 2025

Did two quick fixes:

  1. The naive string replacement fails if translation keys contain . characters, so we need to construct the key path/prefix manually to avoid this. The implementation is already traversing the tree in correct order, so this wasn't too hard to implement.
  2. Fixed a bug which allowed parsing a JSON file with a singular string as a valid translation value with an empty key

Updated the PR description to reflect the separator character change (/ -> -), too!

@ritinae ritinae force-pushed the feat/structured-lang-json branch from 6bbb914 to 80da595 Compare August 12, 2025 16:27
Copy link

@SaculRennorb SaculRennorb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this PR seems reasonable and should allow for this new structured style of declaring translations, while not causing issues when parsing the old style.

Without further adjustments, using a StringBuilder as i've suggested might incur a small performance penalty for the old style, since it introduces an additional copy in that case.
This can likely be mitigated by passing it directly into the LoadEntry function, seeing as that function also does additional string manipulation.
That way it should be a net zero for the old style.

If you don't want to do that last part please leave a brief note to investigate that option in the future.

Opinions:

It is debatable if there is even much value in adding that new style from the loader side of things, since the gain of potentially more compact files is likely impacted by the loader having to reconstruct the actual keys. In general the number one priority for these files should be parsing speed, and if the format required to do so incurs usability issues, then these should be addressed by external tooling, not by a slower, more complicated loader.

I also personally find the nested style more difficult to read than the flat list of kvps, but i admit this is absolutely up to taste.

I think it is still reasonable to at least allow this style, as this should not introduce any noticeable maintenance overhead.

@ritinae ritinae force-pushed the feat/structured-lang-json branch from 9a88d8c to 45c4590 Compare August 15, 2025 13:07
@ritinae
Copy link
Author

ritinae commented Aug 15, 2025

Without further adjustments, using a StringBuilder as i've suggested might incur a small performance penalty for the old style, since it introduces an additional copy in that case.
This can likely be mitigated by passing it directly into the LoadEntry function, seeing as that function also does additional string manipulation.

After a bit of head scratching on what LoadEntry actually does, I realized all keys processed during a single LoadEntries batch always share the same domain prefix. Therefore, now that everything is using a string builder to share the common path prefix, it was quite trivial to move the domain-prefix to that same shared string builder. That is, just append the domain to the start of the key before stepping into the recursive function.

The only complication was that the path separator prefixing (adding - between path parts) no longer could rely on path being empty for the first part of a key. To resolve that, a boolean flag as a parameter to the recursive function was a good enough solution. I could have checked the last character of the builder, but var isFirstPart = <is the last char of the builder a ':'> looked way messier than a simple boolean parameter, so I preferred the parameter over that.

@SaculRennorb
Copy link

SaculRennorb commented Aug 15, 2025

This seems like a good idea, however the previous implementation using KeyWithDomain

private static string KeyWithDomain(string key, string domain = GlobalConstants.DefaultDomain)
{
if (key.Contains(AssetLocation.LocationSeparator)) return key;
return new StringBuilder(domain)
.Append(AssetLocation.LocationSeparator)
.Append(key)
.ToString();
}
checks if the domain is already present before prepending it.

Is there a situation where this could be causing issues? Namely could the domain already be present in old style translations?

@ritinae
Copy link
Author

ritinae commented Aug 15, 2025

Ah, right, that's an oversight. I think it would make sense to check if the domain exists already before appending it again. I'll add that check back, just in case 🤦‍♀️

EDIT: Right. It's not that simple if the key itself contains a domain. I'll think about a solution for a bit.

@SaculRennorb
Copy link

SaculRennorb commented Aug 15, 2025

One option could be to always prepend the domain as you do right now, but before finalizing the key, count the domain separators. If there is more than one turn the slice starting from after the first one into the actual key, instead of using the whole buffer.

something along the lines of

string GetTrueKey(StringBuilder sb, string domain)
{
	var key = sb.ToString();
	if(key.IndexOf(AssetLocation.LocationSeparator, domain.Length + 1) >= 0) {
		// there is a second domain in here, from the key itself.
		return key.SubString(domain.Length + 1);
	}
	return key;
}

You could also do the actual counting, but passing the domain from the outer function might or might not be easier.

You could also start checking for the separator on each recursion level, since one separator will pollute all further levels, so in theory those deeper parts then don't need checks anymore - but this is getting into the weeds too much. Further optimization above the old complexity, and feature parity ofcourse, can be something for the team.

@ritinae
Copy link
Author

ritinae commented Aug 15, 2025

Always appending the prefix feels quite hacky, but I could not come up with any alternative approaches, which would be as simple as it, so I think that's the way to go.

@SaculRennorb SaculRennorb self-requested a review August 15, 2025 16:04
@pizza2004
Copy link

Due to the way in which we merge these things, I kind of need you to squash all your changes into one commit, please. Then I can look it over and patch it into the internal repository for Tyron to decide whether he thinks it's a good change. Just FYI, we're close enough to stable that this wouldn't make it into 1.21 either way, so it'll likely be months before you see this in vanilla even if we do merge it.

@ritinae ritinae force-pushed the feat/structured-lang-json branch from 326970b to ae7e6ae Compare August 17, 2025 00:20
@ritinae
Copy link
Author

ritinae commented Aug 17, 2025

Squashed the commits. I can get the parser changes in by monkey patching, so I'm not holding my breath waiting for this to land 😄 I do acknowledge that these changes and any potential gains from them, are a heavily opinionated, so I fully understand if the changes eventually get rejected 👍

Adjust the lang file parsing to allow nesting of keys. Generate
translation keys as paths based on the structure of the json. For
example, given this JSON:
  {
    "foo": {
      "bar": "baz"
    },
    "yay": "wohoo!"
  }

This can *kind of* already be achieved with the current parser as:
  {
    "foo-bar": "baz",
    "yay": "wohoo!",
  }

However, this lacks structure and is harder to read, especially if
desired structure has multiple levels of nesting. The structured version
has the added benefit of supporting code folding in editors, helping
when working with larger sets of translation keys.

The example JSONs for the existing parser and for the updated parser
both generate identical translation keys:
  domain:foo-bar => "baz"
  domain:yay => "wohoo!"

Therefore, the changes are mostly opt-in, giving more flexibility, while
still allowing using the old "flat" lang files as-is.
@ritinae ritinae force-pushed the feat/structured-lang-json branch from ae7e6ae to 5cf5fdf Compare August 17, 2025 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants