locodedb: allocate less, don't leave references to csv data by roman-khimov · Pull Request #60 · nspcc-dev/locode-db

roman-khimov · 2025-10-21T10:10:42Z

1. We keep references to data allocated by the csv reader code, which means we're wasting memory and have a lot of useless live objects (clearly visible in `inuse_objects` of a NeoFS node).
2. A string takes about 16 bytes, while we can avoid storing it at all by writing 3 more bytes into the data blob.
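
The second point can be sketched as follows. This is a minimal illustration, not the code from this PR: the record layout, the 1-byte length prefix and the names `appendRecord`, `codeMatches` and `payloadAt` are all hypothetical. Only the idea of putting a 3-byte key into the blob instead of keeping a separate string comes from the description above.

```go
package main

import "fmt"

// codeLen is the fixed width of the key stored in front of every record.
// The 3-byte width comes from the PR description; the rest of the layout
// is a hypothetical one chosen for this sketch.
const codeLen = 3

// appendRecord copies code and payload into blob and returns the grown blob
// together with the offset of the new record. Because the bytes are copied,
// the blob never points into memory owned by the CSV reader.
func appendRecord(blob []byte, code string, payload []byte) ([]byte, uint32) {
	if len(code) != codeLen || len(payload) > 255 {
		panic("unsupported record")
	}
	off := uint32(len(blob))
	blob = append(blob, code...)
	blob = append(blob, byte(len(payload))) // hypothetical 1-byte length prefix
	blob = append(blob, payload...)
	return blob, off
}

// codeMatches compares the key stored at off with code. The conversion is
// used only inside the comparison, so the compiler does not allocate for it.
func codeMatches(blob []byte, off uint32, code string) bool {
	return string(blob[off:off+codeLen]) == code
}

// payloadAt returns the payload bytes of the record stored at off.
func payloadAt(blob []byte, off uint32) []byte {
	n := uint32(blob[off+codeLen])
	start := off + codeLen + 1
	return blob[start : start+n]
}

func main() {
	var blob []byte
	blob, a := appendRecord(blob, "AAA", []byte("first"))
	blob, b := appendRecord(blob, "BBB", []byte("second"))
	fmt.Println(codeMatches(blob, a, "AAA"), string(payloadAt(blob, b)))
}
```

With a layout like this an index only has to hold small integer offsets, so there is no per-entry string header and nothing keeps the CSV reader's buffers alive.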

benchstat:

goos: linux
goarch: amd64
pkg: github.com/nspcc-dev/locode-db/pkg/locodedb
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
          │  code.old   │              code.new              │
          │   sec/op    │   sec/op     vs base               │
Unpack-16   132.5m ± 2%   130.4m ± 3%       ~ (p=0.052 n=10)
Get-16      351.4n ± 2%   385.9n ± 1%  +9.83% (p=0.000 n=10)
geomean     215.8µ        224.3µ       +3.96%

          │   code.old   │                code.new                │
          │     B/op     │     B/op      vs base                  │
Unpack-16   35.63Mi ± 0%   30.24Mi ± 0%  -15.13% (p=0.000 n=10)
Get-16        5.000 ± 0%     5.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean     13.35Ki        12.30Ki        -7.88%
¹ all samples are equal

          │  code.old   │               code.new               │
          │  allocs/op  │  allocs/op   vs base                 │
Unpack-16   191.4k ± 0%   191.4k ± 0%  +0.02% (p=0.000 n=10)
Get-16       1.000 ± 0%    1.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean      437.5         437.5       +0.01%
¹ all samples are equal

Get clearly costs a bit more, but 15% less memory is much more valuable here because this data is not accessed frequently.

No traces of csv-allocated strings left, before:

File: locodedb.test
Build ID: 3b6646cf8a2a939e7bb4055621d8b1b1d4ae096f
Type: inuse_objects
Time: 2025-10-21 12:52:13 MSK
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 86170, 100% of 86171 total
Dropped 1 node (cum <= 430)
Showing top 10 nodes out of 32
      flat  flat%   sum%        cum   cum%
     47515 55.14% 55.14%      47515 55.14%  encoding/csv.(*Reader).readRecord
     32768 38.03% 93.17%      32768 38.03%  runtime.(*timers).addHeap
      3277  3.80% 96.97%       3277  3.80%  strings.NewReplacer
      2308  2.68% 99.65%       2308  2.68%  runtime.allocm
       302  0.35%   100%      47818 55.49%  github.com/nspcc-dev/locode-db/pkg/locodedb.unpackLocodesData
         0     0%   100%      47515 55.14%  encoding/csv.(*Reader).Read
         0     0%   100%      47818 55.49%  github.com/nspcc-dev/locode-db/pkg/locodedb.Get
         0     0%   100%      47818 55.49%  github.com/nspcc-dev/locode-db/pkg/locodedb.initLocodeData
         0     0%   100%      47818 55.49%  github.com/nspcc-dev/locode-db/pkg/locodedb.initLocodeData.func1
         0     0%   100%      47818 55.49%  github.com/nspcc-dev/locode-db/pkg/locodedb_test.TestGet.func1

After:

File: locodedb.test
Build ID: 7c43b87dd117ccb052a1a569160cbe68e44dbe25
Type: inuse_objects
Time: 2025-10-21 12:52:02 MSK
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 69083, 100% of 69094 total
Dropped 12 nodes (cum <= 345)
Showing top 10 nodes out of 18
      flat  flat%   sum%        cum   cum%
     65537 94.85% 94.85%      65537 94.85%  runtime.(*timers).addHeap
      2521  3.65% 98.50%       2521  3.65%  net/http.init
      1025  1.48%   100%       1025  1.48%  runtime.allocm
         0     0%   100%      65537 94.85%  runtime.(*scavengerState).sleep
         0     0%   100%      65537 94.85%  runtime.(*timer).maybeAdd
         0     0%   100%      65537 94.85%  runtime.(*timer).modify
         0     0%   100%      65537 94.85%  runtime.(*timer).reset (inline)
         0     0%   100%      65537 94.85%  runtime.bgscavenge
         0     0%   100%       2521  3.65%  runtime.doInit
         0     0%   100%       2521  3.65%  runtime.doInit1


codecov · 2025-10-21T10:11:36Z

Codecov Report

❌ Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 26.04%. Comparing base (d6549f7) to head (c20dea5).
⚠️ Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
pkg/locodedb/calls.go	87.50%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #60      +/-   ##
==========================================
+ Coverage   25.88%   26.04%   +0.15%     
==========================================
  Files          20       20              
  Lines         935      937       +2     
==========================================
+ Hits          242      244       +2     
  Misses        672      672              
  Partials       21       21


It returns a boolean flag for a reason, so 9bb4b26 could do a bit better here and avoid useless comparisons.

goos: linux
goarch: amd64
pkg: github.com/nspcc-dev/locode-db/pkg/locodedb
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
       │ search.old  │             search.new             │
       │   sec/op    │   sec/op     vs base               │
Get-16   383.6n ± 1%   375.6n ± 2%  -2.07% (p=0.012 n=10)

       │ search.old │           search.new           │
       │    B/op    │    B/op     vs base            │
Get-16   5.000 ± 0%   5.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

       │ search.old │           search.new           │
       │ allocs/op  │ allocs/op   vs base            │
Get-16   1.000 ± 0%   1.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

Signed-off-by: Roman Khimov <roman@nspcc.ru>
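
The commit message does not name the function it talks about. Assuming it is a standard-library binary search that returns `(index, found)`, such as `slices.BinarySearchFunc`, the idea could look roughly like this. The `record` type and `find` helper are hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// record is a hypothetical entry type used only for this sketch.
type record struct {
	code string
	name string
}

// find looks up code in recs, which must be sorted by code. It relies on the
// found flag returned by the search instead of comparing recs[i].code with
// code again after the search has finished.
func find(recs []record, code string) (record, bool) {
	i, found := slices.BinarySearchFunc(recs, code, func(r record, target string) int {
		return strings.Compare(r.code, target)
	})
	if !found {
		return record{}, false
	}
	return recs[i], true
}

func main() {
	recs := []record{
		{code: "AAA", name: "first"},
		{code: "BBB", name: "second"},
		{code: "CCC", name: "third"},
	}
	r, ok := find(recs, "BBB")
	fmt.Println(r.name, ok)
}
```

Ignoring the flag would force an extra bounds check and key comparison on every lookup, which is the kind of useless comparison the message refers to.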

roman-khimov · 2025-10-21T11:42:24Z

Pictures from M3 and M4 for reference (inuse_objects):

roman-khimov requested review from End-rey, carpawell and cthulhu-rider as code owners October 21, 2025 10:10

End-rey approved these changes Oct 21, 2025

carpawell approved these changes Oct 21, 2025

roman-khimov merged commit e44a9fc into master Oct 21, 2025
10 checks passed

roman-khimov deleted the allocation-optimization branch October 21, 2025 19:49

roman-khimov merged 2 commits into master from allocation-optimization
