Performance Improvement: Avoid Second Pass for name_len when parsing an element#942
Open
flxbe wants to merge 3 commits into tafia:master from
Collaborator
Actually, any change of less than 10% is just noise; you will get such differences simply by running the benchmarks repeatedly. It looks like the XML files in our benchmarks are too small. You need at least a few megabytes. In my experiments, there was no significant difference in speed between a 10 MB file and a 1 GB file, so benchmarking 10 MB files should give more adequate and, I think, more stable results.
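One way to get an input of that size is to generate it in the benchmark setup. The following is only a rough sketch: the function name `generate_xml` and the document shape (a flat list of `item` elements) are made up for illustration and are not part of the existing benchmark suite.

```rust
/// Builds a synthetic XML document of at least `min_bytes` bytes.
/// Hypothetical helper for benchmarking; the element names are arbitrary.
pub fn generate_xml(min_bytes: usize) -> String {
    let mut xml = String::with_capacity(min_bytes + 64);
    xml.push_str("<root>");
    let mut i = 0usize;
    // Keep appending small elements until the requested size is reached.
    while xml.len() < min_bytes {
        xml.push_str(&format!("<item id=\"{i}\">value {i}</item>"));
        i += 1;
    }
    xml.push_str("</root>");
    xml
}
```

Calling `generate_xml(10 * 1024 * 1024)` once before the measured loop would give a document of roughly 10 MB, so the generation cost stays out of the timings.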
Hi,
Currently, after reading a start element, there’s a second pass over the data to determine the name_len. I’ve implemented a new StartElementParser that calculates the length during the initial parsing, eliminating the need for the second pass.

The new parser could also be used to parse the end element, but this slightly changes the current behaviour, as the new parser always trims trailing whitespace. This would therefore effectively remove the option trim_markup_names_in_closing_tags.

I’d love to hear your thoughts on these changes. I am of course happy to change the code in any way necessary to make it more suitable for your code base.
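To illustrate the general idea, here is a minimal, simplified sketch of a single-pass scan. It is not the code from this PR: `StartTagScan` and `scan_start_tag` are hypothetical names, and the sketch assumes the buffer starts right after the opening `<` and treats whitespace or `/` as the end of the name.

```rust
/// Result of scanning one start tag (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct StartTagScan {
    /// Number of bytes before the closing `>`.
    pub content_len: usize,
    /// Number of leading bytes that form the element name.
    pub name_len: usize,
}

/// Scans `buf` (the bytes right after `<`) for the closing `>` and records
/// the name length in the same loop. Returns `None` if no unquoted `>` is found.
pub fn scan_start_tag(buf: &[u8]) -> Option<StartTagScan> {
    let mut name_len: Option<usize> = None;
    let mut quote: Option<u8> = None;
    for (i, &b) in buf.iter().enumerate() {
        match quote {
            // Inside an attribute value: only the matching quote ends it.
            Some(q) => {
                if b == q {
                    quote = None;
                }
            }
            None => match b {
                b'>' => {
                    return Some(StartTagScan {
                        content_len: i,
                        // No whitespace or `/` seen: the whole content is the name.
                        name_len: name_len.unwrap_or(i),
                    });
                }
                // The first whitespace or `/` marks the end of the name.
                b' ' | b'\t' | b'\r' | b'\n' | b'/' if name_len.is_none() => {
                    name_len = Some(i);
                }
                b'"' | b'\'' => quote = Some(b),
                _ => {}
            },
        }
    }
    None
}
```

For example, scanning `a href="x>y">text` would stop at the second `>` (the first one is inside a quoted attribute value) and report a name length of 1, without going over the bytes again afterwards.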
Benchmark Results
The changes are mostly neutral or positive, with improvements up to 5% in some cases. There’s a notable negative outlier (~3%) in parse_document_nocopy/test_writer_ident.xml, which I’m happy to investigate further if needed.