feat: set vector as default data pipeline #339
Thanks for the pull request, @Ian2012! This repository is currently maintained by [...]. Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval
If you haven't already, check this list to see if your contribution needs to go through the product review process.

🔘 Provide context
To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

🔘 Get a green build
If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details

Where can I find more information?
If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?
Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.
These changes look good, but several other places that reference Vector vs. Celery need to be updated too. Basically, any place that references Vector could use a look, especially the "Aspects Production Configuration" doc.
5ce75f9 to 4c21316
bmtcril left a comment
Thanks, just a few more comments!
Vector Pipeline
###############

The Vector pipeline is the default pipeline. It works by capturing the standard output from
We should update this to include something like "as of version 5.0... previous versions default to a Celery / Ralph pipeline".
The Vector pipeline is the default pipeline. It works by capturing the standard output from
the LMS logs and sending them directly to configured "sinks" or data destinations.
It implements two similar pipelines, one for xAPI data and one for tracking logs.
We should note that xAPI is on by default and tracking logs are off.
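To make the default behaviour under discussion concrete, here is a minimal Python sketch of the routing idea: captured stdout lines are split into an xAPI batch and a tracking-log batch, with xAPI on and tracking logs off unless enabled. This is not Vector's actual configuration. The log markers, the function name, and both flags are hypothetical and only illustrate the shape of the two pipelines.

```python
import json

# Hypothetical markers; the real LMS log format and Vector transforms may differ.
XAPI_MARKER = "[xapi_tracking]"
TRACKING_MARKER = "[tracking]"


def route_lines(lines, xapi_enabled=True, tracking_enabled=False):
    """Split captured stdout lines into per-sink batches of parsed events."""
    sinks = {"xapi": [], "tracking": []}
    routes = (
        ("xapi", XAPI_MARKER, xapi_enabled),
        ("tracking", TRACKING_MARKER, tracking_enabled),
    )
    for line in lines:
        for name, marker, enabled in routes:
            if not enabled or marker not in line:
                continue
            # Everything after the marker is assumed to be a JSON payload.
            payload = line.split(marker, 1)[1].strip()
            try:
                sinks[name].append(json.loads(payload))
            except json.JSONDecodeError:
                pass  # skip lines whose payload is not valid JSON
    return sinks
```

With the defaults, a tracking-log line is ignored; passing `tracking_enabled=True` sends it to the second batch.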
##############

-The Ralph pipeline is the default pipeline, and is the most robust. It will retry the
+The Ralph pipeline is an alternative pipeline, and is the most robust. It will retry the
Same note about adding the major version of the change here
Vector is a lightweight and ultra-fast tool for building observability pipelines.
-In the Aspects project, Vector can optionally be used as a replacement for Ralph to
+In the Aspects project, Vector is the default tool used to
Cons:

- It is a new service for most operators
- Events are not de-duplicated before insert, which can result in some (mostly temporary) incorrect data in a disaster recovery
I'd say "which can result in some temporary duplicate / incorrect data in a disaster recovery or log replay situation", though I think we should actually see how bad this is by replaying the same log file a few times and looking at the table results.
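The replay experiment suggested above can be modelled roughly as follows. This is a sketch under stated assumptions, not a test against ClickHouse: the table is a plain Python list that appends every row without de-duplication, and the `event_id` key, `replay`, and `duplicate_report` are hypothetical names chosen for illustration.

```python
from collections import Counter


def replay(events, times):
    """Append the same batch of events to an append-only table `times` times."""
    table = []
    for _ in range(times):
        table.extend(events)  # no de-duplication before insert
    return table


def duplicate_report(table):
    """Map each event id that appears more than once to its row count."""
    counts = Counter(row["event_id"] for row in table)
    return {event_id: n for event_id, n in counts.items() if n > 1}
```

Replaying a two-event file three times leaves six rows, and the report shows each id three times. A real check would run the equivalent grouping query against the events table after each replay.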
- It is a new service for most operators
- Events are not de-duplicated before insert, which can result in some (mostly temporary) incorrect data in a disaster recovery
- Disaster recovery hasn't been tested with Aspects yet
Do you mean log replay here? I'm trying to figure out how we can test this.
Installation instructions for Aspects are available on the plugin site: https://github.com/openedx/tutor-contrib-aspects

-Ralph is the default option to send xAPI events to Clickhouse. To run it make sure to enable the `RUN_RALPH` option in the `config.yml` file.
+Ralph is an alternative option to send xAPI events to Clickhouse, providing full LRS support and deduplication. To use Ralph as your xAPI pipeline, you need to enable it and set it as the source in your `config.yml` file.
Again with version, unfortunately. Should probably change "full LRS support" to "full xAPI learning record store support for statements" since it doesn't actually handle some of the LRS spec for other things.
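The two-step requirement in the new wording (enable Ralph, and set it as the source) can be expressed as a small consistency check on a parsed `config.yml` mapping. This is a sketch only. `RUN_RALPH` comes from the text above; the `ASPECTS_XAPI_SOURCE` key, its `"ralph"` and `"vector"` values, and the assumed defaults are assumptions that should be verified against the plugin before relying on them.

```python
def check_xapi_config(config):
    """Return a list of problems found in a parsed config.yml mapping."""
    problems = []
    # Assumed key name and default; this PR makes Vector the default pipeline.
    source = config.get("ASPECTS_XAPI_SOURCE", "vector")
    # Assumed default of False when the option is absent.
    run_ralph = bool(config.get("RUN_RALPH", False))

    if source == "ralph" and not run_ralph:
        problems.append("Ralph is the xAPI source but RUN_RALPH is not enabled")
    if source != "ralph" and run_ralph:
        problems.append("RUN_RALPH is enabled but Ralph is not the xAPI source")
    return problems
```

An empty list means the two settings agree; either half-configured state yields one message.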
18a49d7 to b6530aa
b6530aa to dc883bf
Depends on: openedx/tutor-contrib-aspects#1199