feat: set vector as default data pipeline #1199
Ian2012 wants to merge 13 commits into openedx:bmtcril/vector_bump
Previously, on a clean Aspects install with Vector set as the xAPI pipeline, migrations would fail because ASPECTS_XAPI_DATABASE was not the Ralph database. This upgrade fixes the migrations by adding an explicit Ralph database variable, which allows both databases to be created independently, as designed.
Alembic state was also stored in ASPECTS_XAPI_DATABASE, which can change when switching between the Ralph and Vector pipelines. That caused Alembic to lose its state and try to re-run all migrations. The location of the Alembic state is now explicit. This change also makes sure Ralph uses the RALPH_DATABASE, simplifies and reorganizes the ClickHouse init script, and makes sure the Vector user can access the databases needed for inserting into downstream materialized views (MVs).
  type = "filter"
  inputs = ["docker_logs"]
- condition = 'includes(["lms", "cms", "lms-job", "cms-job"], .label."com.docker.compose.service")'
+ condition = 'includes(["lms", "cms", "lms-worker", "cms-worker", "lms-job", "cms-job"], .label."com.docker.compose.service")'
I think this won't do anything due to overhangio/tutor#1263
But I support adding it anyway so we don't have to do it later
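For anyone skimming the thread, here is a rough Python sketch of what the changed filter condition amounts to: a membership check on the Compose service label. The function name and the event shape are made up for illustration; only the two service lists come from the diff.

```python
# Service lists copied from the old and new filter conditions in the diff.
OLD_SERVICES = ["lms", "cms", "lms-job", "cms-job"]
NEW_SERVICES = ["lms", "cms", "lms-worker", "cms-worker", "lms-job", "cms-job"]


def passes_filter(event: dict, services: list[str]) -> bool:
    """Hypothetical stand-in for the VRL `includes(...)` condition.

    Assumes the event is a dict whose "label" mapping holds the
    "com.docker.compose.service" key, mirroring the path in the condition.
    """
    service = event.get("label", {}).get("com.docker.compose.service")
    return service in services


worker_event = {"label": {"com.docker.compose.service": "lms-worker"}}
print(passes_filter(worker_event, OLD_SERVICES))  # dropped by the old condition
print(passes_filter(worker_event, NEW_SERVICES))  # kept by the new condition
```

Whether worker logs actually reach this filter is a separate question, per the tutor issue mentioned above.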
- batch_size: 100
  log_dir: logs
  num_xapi_batches: 10
+ batch_size: 100000
I don't think we need to do this many events, it's just a smoke test to make sure inserts work. I don't think this helps as much as the Celery version since I don't think we'll see errors here if inserts fail like we do there. Is there a good way to check that the right number of rows have landed in CH and downstream tables? I think the row counts from the performance test script can be flaky based on the course that gets chosen, but maybe we can just limit things to 1 course for this test.
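One possible shape for that row-count check, as a sketch only: compare expected counts per table against `SELECT count()` results. The `run_query` callable, the table names in the usage example, and the expected numbers are all assumptions for illustration. In a real test `run_query` would wrap whatever ClickHouse client the test already uses, and the expected counts would need to account for how each downstream table filters or aggregates events.

```python
from typing import Callable, Dict


def find_row_count_mismatches(
    run_query: Callable[[str], int],
    expected_counts: Dict[str, int],
) -> Dict[str, int]:
    """Return {table: actual_count} for each table whose count is off.

    `run_query` is a hypothetical helper that executes a SQL string and
    returns a single integer result.
    """
    mismatches = {}
    for table, expected in expected_counts.items():
        actual = run_query(f"SELECT count() FROM {table}")
        if actual != expected:
            mismatches[table] = actual
    return mismatches


# Usage with a fake query runner; table names and counts are placeholders.
fake_counts = {"xapi.raw_events": 1000, "reporting.some_downstream_table": 990}


def fake_run_query(sql: str) -> int:
    table = sql.rsplit(" ", 1)[-1]
    return fake_counts[table]


expected = {"xapi.raw_events": 1000, "reporting.some_downstream_table": 1000}
print(find_row_count_mismatches(fake_run_query, expected))
```

Limiting the run to a single course, as suggested above, would make the expected counts easier to pin down.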
Were there additional fixes needed to get Vector working at all beyond #1132? I'd like to separate that PR from this one so we can release a bug-fix version before making the big breaking change.
This PR sets Vector as the default data pipeline. It also includes a couple of improvements:
tutor mounts add ./aspects-dbt/ or the directory where aspects-dbt is stored and run your local copy.
Caution
This is a breaking change. Users who install this version will disable their Ralph workloads if they do not update their configuration.
Depends on: openedx/aspects-dbt#164
Fixes: #1126 #1096