ROX-30064: Script to generate catalog-template.yaml #137
msugakov
left a comment
Scratched the surface a bit. Got enough comments for the first round of review.
Please expect more review rounds and more comments.
msugakov
left a comment
I think I have completed the full pass now.
image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:f61189397263f05214c2d36b4dc0a71a924c2481a1e365b7fb3c71d8dfce6b27
- schema: olm.bundle
image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:b0590a2248d948f82e8a116e37a2be42f49a3edeb4a92d41416420ea604d5b34
- schema: olm.bundle
N.b.: once we finish editing the code, let's "smart"-diff the contents of master vs. the new catalog-template.yaml. The diff is quite big, so we should make sure the new file won't break customers.
Agree, and also I should make a test deployment to see how OLM deals with it
Ok. Now it's time for this.
Tom and I have figured out that yq 'explode(.)' <filename> inlines all aliases/anchors, so we can run it on the catalog-template.yaml from master. Then feed the result, together with this PR's catalog-template.yaml, to some YAML differ, and we should see what has changed.
I'll do this myself, but I also recommend that you try it too.
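To illustrate what an order-insensitive ("smart") comparison could look like once both templates have been exploded and parsed, here is a minimal Python sketch. It assumes each template has already been loaded into a list of entry dicts carrying `schema` and `image` keys; the function name `diff_bundles` and that input shape are assumptions for illustration, not something taken from this PR.

```python
def diff_bundles(old_entries, new_entries):
    """Compare the olm.bundle images of two entry lists, ignoring order.

    Both arguments are lists of dicts as they might come out of a YAML parser
    (assumed shape: {"schema": "olm.bundle", "image": "..."}).
    """
    old_images = [e["image"] for e in old_entries if e.get("schema") == "olm.bundle"]
    new_images = [e["image"] for e in new_entries if e.get("schema") == "olm.bundle"]
    return {
        # Images present only in the new template.
        "added": sorted(set(new_images) - set(old_images)),
        # Images present only in the old template.
        "removed": sorted(set(old_images) - set(new_images)),
        # Same images on both sides, but listed in a different order.
        "reordered": sorted(old_images) == sorted(new_images) and old_images != new_images,
    }
```

A result with empty `added` and `removed` lists and `reordered` set to True would correspond to the "entries reordered but nothing added or removed" outcome.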
olm.deprecations look good.
olm.bundle-s look good too (some entries in the middle reordered but nothing added or removed compared to master).
olm.package no diff, hence good.
olm.channel entries were the hardest; I looked through them by eye and they seem good.
When looking at olm.package-s, I realized it's O(N^2) problem: if N is the total number of patches in the stable lineage, then the number of channels is O(N), i.e. linear in N. Therefore, the file size grows as O(N) times O(N) -> O(N^2).
Each channel still holds O(N) versions, and that's a good thing for OLM because it can efficiently figure out upgrades once a channel is selected, but the overall file grows quickly in size. I don't know whether this will become a problem, e.g. due to long load times, since these are file-based catalogs (not sqlite-based like the former ones).
O(N^2) grows quickly and suddenly, so I don't know how much time we have until this becomes a problem.
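To make the growth argument concrete, here is a toy model in Python. It is only a sketch under a simplifying assumption that is not taken from the real catalog: every minor release gets its own channel, and each channel lists all patches released up to and including its minor. The function name and parameters are hypothetical.

```python
def total_channel_entries(minors, patches_per_minor):
    """Count channel entries when channel i lists the patches of minors 1..i.

    Toy model: the i-th channel holds i * patches_per_minor entries, so the
    total is patches_per_minor * minors * (minors + 1) / 2, i.e. quadratic
    in the number of minors.
    """
    total = 0
    for i in range(1, minors + 1):
        total += i * patches_per_minor
    return total
```

In this model, doubling the number of minors from 4 to 8 with 3 patches each raises the total from 30 to 108 entries, which is the faster-than-linear growth the comment describes.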
Today catalog-template.yaml is just 152 KB and the rendered catalogs are:
$ du -h ./catalog-csv-metadata/rhacs-operator/catalog.json ./catalog-bundle-object/rhacs-operator/catalog.json
12M ./catalog-csv-metadata/rhacs-operator/catalog.json
24M ./catalog-bundle-object/rhacs-operator/catalog.json
Not terribly bad, but actually larger than I would expect them to be.
What's actionable for us here? Well, the "unpublishing" check in the FBC pipeline/Conforma forced us into this O(N^2) thing, which we did not do before. Previously, we could get away with unpublishing previous patches from the latest and stable channels, and that allowed us to stay at O(N). Maybe we should find ways to allow unpublishing deprecated versions from channels?
Let's cut a ticket to address it. Though, something I don't want to happen is that we cut a ticket and it dies in the backlog because nobody understands the problem since nobody was involved in the problem discussion.
Let me additionally ping @porridge here so that you both have a chance to share your thoughts on this.
Following up on Slack: https://redhat-internal.slack.com/archives/C05TS9N0S7L/p1764973355627099
I think it's more precise to call the space complexity K*N, where K is the number of supported versions, because we don't add a new entry to every version for each patch, only to the supported versions.
I wonder what catalog size has some big bundles? Just to have a picture what file size is still ok-ish for the OLM.
Note: I did a manual test with the bundle from this PR, and the bundle looks good in the OLM UI. I've updated the gif attached to this PR. Since manual testing is fine, I am going to merge this PR.
"When looking at olm.package-s, I realized it's O(N^2) problem"
Did you mean "When looking at olm.channel-s, I realized it's O(N^2) problem"?
@kurlov I believe @msugakov is correct, total size of olm.channel objects is O(N²/2) which is the same as O(N²).
I used this guide and the following shell snippet to compute the polynomial regression of the combined size of the olm.channel rhacs-X.Y objects and estimate the size in ~10 years (assuming ~4 minor releases per year). The computer said ~200KB, which is not too bad I think.
minors=$(jq -r 'select(.schema=="olm.channel" and (.name | startswith("rhacs-"))) | .name' catalog-csv-metadata/rhacs-operator/catalog.json)
for m in $minors; do echo -n "$m "; jq --arg m "$m" 'select(.schema=="olm.channel" and .name == $m)' catalog-csv-metadata/rhacs-operator/catalog.json | wc -c; done
FTR, I ignored 4.7-4.9 since the growth there was sub-linear, I guess because they have not lived as long as the older channels yet.
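For readers who want to reproduce this kind of estimate without the guide, the sketch below fits a quadratic by least squares in plain Python and extrapolates it. It is not the method from the guide mentioned above, only an illustration. The assumption is that x is the ordinal of a minor channel and y its size in bytes, as printed by the loop above; `fit_quadratic` and `predict` are hypothetical helper names.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c).

    Needs at least three distinct x values.
    """
    # Power sums for the normal equations (X^T X) beta = X^T y,
    # where each design-matrix row is [x^2, x, 1].
    s = [sum(x ** k for x in xs) for k in range(5)]               # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of y*x^k
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    a, b, c = coeffs
    return a * x * x + b * x + c
```

With roughly 4 minor releases per year, a 10-year extrapolation would evaluate `predict` about 40 ordinals past the newest channel. Extrapolating a polynomial that far is sensitive to which channels are included in the fit, which is presumably why the newest channels were left out.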
"I wonder what catalog size has some big bundles? Just to have a picture what file size is still ok-ish for the OLM."
I thought we would be able to find those in the ultimate index images like registry.redhat.io/openshift4/ose-operator-registry-rhel9:v4.19, but I'm not finding catalogs there.
[operator-index]$ opm migrate registry.redhat.io/redhat/redhat-operator-index:v4.21 ./catalog-migrate
INFO[0000] rendering index "registry.redhat.io/redhat/redhat-operator-index:v4.21" as file-based catalog
INFO[0050] wrote rendered file-based catalog to "./catalog-migrate"
[operator-index]$ ncdu catalog-migrate
24.6 MiB [#################################] /ansible-automation-platform-operator
16.1 MiB [##################### ] /amq-broker-rhel8
8.8 MiB [########### ] /rhacs-operator
4.3 MiB [##### ] /cryostat-operator
3.3 MiB [#### ] /amq-streams
3.1 MiB [#### ] /amq-broker-rhel9
2.2 MiB [## ] /openshift-gitops-operator
1.9 MiB [## ] /tempo-product
1.7 MiB [## ] /quay-operator
1.5 MiB [# ] /devspaces
1.4 MiB [# ] /quay-bridge-operator
1.3 MiB [# ] /datagrid
1.3 MiB [# ] /service-registry-operator
1.1 MiB [# ] /kubevirt-hyperconverged
1.1 MiB [# ] /businessautomation-operator
[...]
Oh, thanks! opm migrate did not occur to me.
It's interesting that we are the third largest. Clearly, most of the space isn't occupied by the olm.channel entries.
Here's a little study of this:
$ opm migrate registry.redhat.io/redhat/redhat-operator-index:v4.19 ./
$ cd rhacs-operator
$ cat catalog.json | python3 -c 'import sys; size=len(sys.stdin.buffer.read()); print(f"{size:,}")'
11,665,037
$ jq '.' catalog.json | python3 -c 'import sys; size=len(sys.stdin.buffer.read()); print(f"{size:,}")'
9,129,759
# 22% less after jq, but that's ok
# Packages:
$ jq 'select(.schema == "olm.package")' catalog.json | python3 -c 'import sys; size=len(sys.stdin.buffer.read()); print(f"{size:,}")'
11,520
# Channels:
$ jq 'select(.schema == "olm.channel")' catalog.json | python3 -c 'import sys; size=len(sys.stdin.buffer.read()); print(f"{size:,}")'
94,486
# Bundles:
$ jq 'select(.schema == "olm.bundle")' catalog.json | python3 -c 'import sys; size=len(sys.stdin.buffer.read()); print(f"{size:,}")'
9,023,753
# Here are counts:
$ jq '.schema' catalog.json | uniq -c
1 "olm.package"
24 "olm.channel"
122 "olm.bundle"
Comparing that to the biggest, ansible-automation-platform-operator:
$ jq '.schema' catalog.json | uniq -c
1 "olm.package"
4 "olm.channel"
62 "olm.bundle"
# Byte sizes:
# olm.package - 12,666
# olm.channel - 10,458
# olm.bundle - 18,442,021
amq-broker-rhel8, the second biggest:
$ jq '.schema' catalog.json | uniq -c
1 "olm.package"
3 "olm.channel"
69 "olm.bundle"
# Byte sizes:
# olm.package - 11,974
# olm.channel - 8,960
# olm.bundle - 12,703,918
We can also compare that to our FBC file:
$ du -h ./catalog-bundle-object/rhacs-operator/catalog.json
24M ./catalog-bundle-object/rhacs-operator/catalog.json
$ jq '.schema' ./catalog-bundle-object/rhacs-operator/catalog.json | uniq -c
1 "olm.package"
24 "olm.channel"
122 "olm.bundle"
1 "olm.deprecations"
# Byte sizes:
# olm.package - 11,520
# olm.channel - 94,486
# olm.bundle - 24,838,426
# olm.deprecations - 5,092
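The per-schema counts and byte sizes in this study can also be gathered in a single pass. The Python sketch below assumes that catalog.json is a stream of concatenated JSON objects, each with a top-level `schema` field, which matches how the jq filters above treat it. The function name `schema_stats` is hypothetical.

```python
import json
from collections import Counter


def schema_stats(text):
    """Return (counts, sizes) per schema for a stream of concatenated JSON objects.

    Sizes are the UTF-8 byte lengths of each object exactly as it appears in
    the input, so they differ from sizes measured on jq's re-formatted output.
    """
    decoder = json.JSONDecoder()
    counts, sizes = Counter(), Counter()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, end = decoder.raw_decode(text, pos)
        schema = obj.get("schema", "<none>")
        counts[schema] += 1
        sizes[schema] += len(text[pos:end].encode("utf-8"))
        pos = end
    return counts, sizes
```

Reading the file with `open(path, encoding="utf-8").read()` and passing the result to `schema_stats` would give one count and one byte total per schema.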
It isn't the deprecations that dominate the size. Channels are also small (despite O(N^2)).
Most of the space is taken by bundles. I think I see why it happens.
One way or another, we're among the biggest ones, and it's clear that ACS catalogs will only get bigger over time.
We need to do something so that we don't forget about this and let things silently grow until the size is too big, everything is broken, and we're in an emergency. Certainly we can't stop releasing, nor should we.
We don't know how big is too big. Maybe nobody knows. We can start finding this out now, or we can postpone. If the former, the question is who "we" is: who would own this?
If we are to postpone, I suggest creating a ticket with this context (for later) and adding a simple check to the checks pipeline (now) that would fail as soon as either of the ./catalog-*/rhacs-operator/catalog.json files grows bigger than 40MB.
When this check fails, the affected engineer bumps the size limit a bit and flags that the Install team has to prioritize working on the mentioned ticket.
How does that sound?
If you agree, let us please do it as a separate PR (again, it needs an owner).
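A minimal sketch of what such a size check might look like in Python follows. It is not the check that was eventually implemented. The 40MB threshold and the glob come from the proposal above, while the function names and the reading of "MB" as decimal megabytes are assumptions.

```python
import glob
import os

# 40MB is the threshold proposed in the thread; decimal megabytes are an assumption.
LIMIT_BYTES = 40 * 1000 * 1000
CATALOG_GLOB = "./catalog-*/rhacs-operator/catalog.json"


def oversized(paths, limit=LIMIT_BYTES):
    """Return (path, size) pairs for files strictly larger than limit bytes."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            result.append((path, size))
    return result


def check(pattern=CATALOG_GLOB, limit=LIMIT_BYTES):
    """Print offenders and return a process exit code (0 = ok, 1 = too big)."""
    offenders = oversized(sorted(glob.glob(pattern)), limit)
    for path, size in offenders:
        print(f"{path}: {size:,} bytes exceeds the limit of {limit:,} bytes")
    return 1 if offenders else 0
```

A CI step could call `sys.exit(check())`. Bumping the limit then becomes a visible one-line change that can reference the follow-up ticket.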
I agree that we need such check. Created a ticket: https://issues.redhat.com/browse/ROX-32232
Co-authored-by: Misha Sugakov <537715+msugakov@users.noreply.github.com>
msugakov
left a comment
This time, only thoughts on interdiff. We're converging.
msugakov
left a comment
Time has come to diff the yaml and do some final tests.
msugakov
left a comment
From my side, this PR is good to be merged at any point.
catalog-template.yaml from a new bundles.yaml file.
bundles.yaml lists operator bundle images with versions, the oldest_supported_version and a list of broken_versions.
oldest_supported_version specifies the lowest supported version. Any version or channel before oldest_supported_version will be marked as deprecated.
broken_versions is a list of versions which should be skipped. For each broken version X.Y.Z the script adds "skips" for all versions > X.Y.Z and < X.Y+2.0
checks which validates that catalog-template.yaml is up-to-date with bundles.yaml
cmd folder is changed
catalog-template.yaml changes:
schema: olm.bundle
deprecation references for all versions < oldest_supported_version. So not only channels are deprecated.
rhacs-3.63 channel skips
latest channel has all 3.X.X versions
stable channel has all 4.X.X versions
Testing this PR catalogs on OCP 4.19:
image: quay.io/rhacs-eng/stackrox-operator-index@sha256:9345ef4e16b463205ab9dcf4446c5a34babc8c497ad9cbbeae327fb44b2f356d
commit: ca7b4bd
Now not only channels but versions are also deprecated.
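The broken_versions and oldest_supported_version rules from the PR description can be sketched in a few lines of Python. This is an illustration of the rules as stated there, not the script from this PR. The function names are hypothetical, and plain X.Y.Z version strings are assumed.

```python
def parse(version):
    """Turn "X.Y.Z" into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def versions_skipping(broken, all_versions):
    """Versions that get a "skips" entry for the broken version X.Y.Z.

    Rule from the description: all versions > X.Y.Z and < X.(Y+2).0.
    """
    x, y, _ = parse(broken)
    upper = (x, y + 2, 0)
    return [v for v in all_versions if parse(broken) < parse(v) < upper]


def deprecated(all_versions, oldest_supported):
    """Versions strictly below oldest_supported_version are marked deprecated."""
    return [v for v in all_versions if parse(v) < parse(oldest_supported)]
```

For example, with a hypothetical broken version 4.1.1, the versions 4.1.2 and 4.2.0 would carry a skip, while 4.3.0 would not because it reaches the X.Y+2.0 bound.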
Test locally: