feat: set vector as default data pipeline #339

Open
Ian2012 wants to merge 1 commit into openedx:main from Ian2012:cag/vector-default

Ian2012 (Contributor) commented Mar 10, 2026

openedx-webhooks commented Mar 10, 2026

Thanks for the pull request, @Ian2012!

This repository is currently maintained by @bmtcril.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks added the open-source-contribution label (PR author is not from Axim or 2U) Mar 10, 2026
@github-project-automation bot moved this to Needs Triage in Contributions Mar 10, 2026
@mphilbrick211 moved this from Needs Triage to Needs Tests Run or CLA Signed in Contributions Mar 11, 2026
@Ian2012 closed this Mar 16, 2026
@github-project-automation bot moved this from Needs Tests Run or CLA Signed to Done in Contributions Mar 16, 2026
@Ian2012 reopened this Mar 16, 2026

bmtcril (Contributor) commented Mar 17, 2026

These changes look good, but there are several other places that reference vector vs. celery that need to be updated too. Basically any place that references Vector could use a look, and especially the "Aspects Production Configuration" doc.

@Ian2012 force-pushed the cag/vector-default branch from 5ce75f9 to 4c21316 on March 18, 2026 14:02

@bmtcril bmtcril left a comment

Thanks, just a few more comments!

Vector Pipeline
###############

The Vector pipeline is the default pipeline. It works by capturing the standard output from

We should update this to include something like "as of version 5.0... previous versions default to a Celery / Ralph pipeline".


The Vector pipeline is the default pipeline. It works by capturing the standard output from
the LMS logs and sending them directly to configured "sinks" or data destinations.
It implements two similar pipelines, one for xAPI data and one for tracking logs.

We should note that xAPI is on by default and tracking logs are off.

##############

-The Ralph pipeline is the default pipeline, and is the most robust. It will retry the
+The Ralph pipeline is an alternative pipeline, and is the most robust. It will retry the

Same note about adding the major version of the change here


Vector is a lightweight and ultra-fast tool for building observability pipelines.
-In the Aspects project, Vector can optionally be used as a replacement for Ralph to
+In the Aspects project, Vector is the default tool used to

Same note about version

Cons:

- It is a new service for most operators
- Events are not de-duplicated before insert, which can result in some (mostly temporary) incorrect data in a disaster recovery

I'd say "which can result in some temporary duplicate / incorrect data in a disaster recovery or log replay situation", though I think we should actually see how bad this is by replaying the same log file a few times and looking at the table results.
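
One possible shape for that replay check, as a minimal sketch only: it assumes each event carries a unique ID, and uses a plain Python list as a stand-in for the ClickHouse table. The names `replay` and `duplicate_report` and the `evt-N` IDs are hypothetical and not part of Aspects; a real test would replay an actual log file and query the actual table.

```python
from collections import Counter


def replay(log_lines, times):
    """Simulate inserting the same log file `times` times with no de-duplication.

    Hypothetical stand-in for replaying a log through the pipeline.
    """
    table = []
    for _ in range(times):
        table.extend(log_lines)
    return table


def duplicate_report(event_ids):
    """Summarize how many rows are extra copies of an already-seen event ID."""
    counts = Counter(event_ids)
    total = sum(counts.values())
    unique = len(counts)
    return {
        "total_rows": total,
        "unique_events": unique,
        "extra_rows": total - unique,
    }


# Hypothetical event IDs standing in for one log file.
log_file = ["evt-1", "evt-2", "evt-3"]
table = replay(log_file, times=3)
report = duplicate_report(table)
print(report)  # {'total_rows': 9, 'unique_events': 3, 'extra_rows': 6}
```

Comparing `extra_rows` right after the replay and again later would show whether the duplicates really are temporary.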


- It is a new service for most operators
- Events are not de-duplicated before insert, which can result in some (mostly temporary) incorrect data in a disaster recovery
- Disaster recovery hasn't been tested with Aspects yet

Do you mean log replay here? I'm trying to figure out how we can test this.

Installation instructions for Aspects are available on the plugin site: https://github.com/openedx/tutor-contrib-aspects

-Ralph is the default option to send xAPI events to Clickhouse. To run it make sure to enable the `RUN_RALPH` option in the `config.yml` file.
+Ralph is an alternative option to send xAPI events to Clickhouse, providing full LRS support and deduplication. To use Ralph as your xAPI pipeline, you need to enable it and set it as the source in your `config.yml` file.

Again with version, unfortunately. Should probably change "full LRS support" to "full xAPI learning record store support for statements" since it doesn't actually handle some of the LRS spec for other things.

@Ian2012 force-pushed the cag/vector-default branch 2 times, most recently from 18a49d7 to b6530aa, on March 18, 2026 17:30
@Ian2012 force-pushed the cag/vector-default branch from b6530aa to dc883bf on March 18, 2026 17:30
Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Status: Done

4 participants