Add OSS-Fuzz Atheris fuzzers for core serialization #60148

Closed
skypher wants to merge 3 commits into apache:main from rmc-infosec:ossfuzz-fuzzers-v2

skypher commented Jan 6, 2026

Summary

Adds an upstream-owned OSS-Fuzz fuzzer suite under ossfuzz/.

Fuzz targets (Atheris):

  • DAG serialization/deserialization (serialized_dag_fuzz)
  • Connection URI parsing (connection_uri_fuzz)

Each fuzzer includes:

  • .options files with tuned input size limits
  • .dict files for structured input fuzzing
  • Small seed corpora under ossfuzz/seed_corpus/
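
As a rough illustration of the shape such a target usually takes, here is a minimal sketch of an Atheris harness. It is not the code from this PR: `parse_uri` is a hypothetical stand-in built on `urllib.parse` so the example runs without Airflow, and the real `connection_uri_fuzz` target may be structured differently. The `atheris` import is deferred to `main()` so that `TestOneInput` can be exercised without Atheris installed.

```python
import sys
from urllib.parse import urlsplit


def parse_uri(uri: str) -> dict:
    """Hypothetical stand-in for the parser under test (not Airflow code)."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # raises ValueError for invalid ports
        "login": parts.username,
    }


def TestOneInput(data: bytes) -> None:
    """Entry point the fuzzing engine calls once per generated input."""
    try:
        uri = data.decode("utf-8")
    except UnicodeDecodeError:
        return  # undecodable bytes are uninteresting for a text parser
    try:
        parse_uri(uri)
    except ValueError:
        # Expected rejection of malformed input. Any other exception
        # propagates and is reported by the fuzzer as a crash.
        pass


def main() -> None:
    import atheris  # deferred: only needed when actually fuzzing

    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Calling `main()` from an `if __name__ == "__main__":` guard starts the fuzzing loop. libFuzzer flags such as the `-max_total_time=10` mentioned in the test plan are read from `sys.argv` and bound the run.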

Security Model Alignment

These fuzzers target code paths with clear security boundaries per Airflow's security model, avoiding the "DAG author trust zone" where DAG authors are expected to run arbitrary code.

Test plan

  • Tested locally with atheris (-max_total_time=10)
  • OSS-Fuzz integration build validation

skypher force-pushed the ossfuzz-fuzzers-v2 branch 2 times, most recently from e6f5a77 to f8ddf18 (January 6, 2026 02:10)

jscheffl (Contributor) commented Jan 6, 2026

As we have a large repo, we should maybe put this in an existing subfolder rather than at top level. I would propose moving everything under the ci/ folder.

potiuk (Member) commented Jan 6, 2026

Agreed: but we have scripts/ci folder :) - so maybe scripts/ossfuzz ?

Comment thread ossfuzz/README.md Outdated
skypher force-pushed the ossfuzz-fuzzers-v2 branch 2 times, most recently from 67bbb5b to 5926f49 (January 6, 2026 13:41)

potiuk (Member) commented Jan 6, 2026

Nice! Now just rebase and resolve the conflict :)

skypher force-pushed the ossfuzz-fuzzers-v2 branch from 5926f49 to fd8a690 (January 7, 2026 02:10)

skypher (Author) commented Jan 7, 2026

> Nice! Now just rebase and resolve the conflict :)

Awesome, I think it looks good now!

potiuk (Member) commented Jan 7, 2026

Hmm.. the problem we see now - is that atheris does not seem to be well prepared for our environment:

Attempting to install atheris in our image - which is our "gold standard" repeatable "works for me" environment that we use in our CI and when we want to reproduce what happens there locally - fails:

  1. Atheris seems to be a mostly C-based library with a thin Python wrapper around it, and it does not have pre-compiled wheels for Python 3.10, only for 3.11 - 3.13. The build from source fails because the clang compiler is missing (and fixing that would likely need some configuration of the image environment). We could likely overcome this by simply limiting the OSS-Fuzz fuzzers to 3.11 - 3.13, though.

  2. But more importantly, there are no ARM wheels, which means that most of our developers will not be able to run it locally on their M1 Macs. This is a bigger issue: anyone who wants to reproduce a failure locally on a Mac either won't be able to, or will find it much more brittle. If they do not use the image, it is even more likely to fail, because their environment will not be properly configured for compiling Atheris.

I believe you are somewhat connected to Atheris @skypher -> maybe they can simply update their build and release process and produce binary wheels for all the common platforms, including ARM and macOS? Or at the very least make sure that manylinux ARM wheels are available.

See https://pypi.org/project/atheris/3.0.0/#files - there are just three binary wheels, only for AMD
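
The constraint described here can be expressed as a small pre-flight check. This is a sketch based only on the observations in this comment (pre-built wheels for CPython 3.11 to 3.13 on x86_64). The function name and the exact set of supported combinations are assumptions, the operating system is ignored for brevity, and the PyPI file listing remains the authoritative source.

```python
import platform
import sys

# Assumptions taken from the comment above, not from Atheris documentation.
WHEEL_PYTHONS = {(3, 11), (3, 12), (3, 13)}
WHEEL_MACHINES = {"x86_64", "amd64"}


def prebuilt_wheel_expected(version: tuple, machine: str) -> bool:
    """Return True if a binary wheel is expected, False if pip would build from source."""
    return tuple(version[:2]) in WHEEL_PYTHONS and machine.lower() in WHEEL_MACHINES


# Check the interpreter and CPU architecture this script is running on.
here = prebuilt_wheel_expected(sys.version_info, platform.machine())
print("binary wheel expected" if here else "source build (needs clang) likely")
```

On an Apple Silicon Mac, `platform.machine()` reports `arm64`, so the check returns `False` regardless of the Python version.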

skypher (Author) commented Jan 9, 2026

Atheris PR: google/atheris#99

skypher force-pushed the ossfuzz-fuzzers-v2 branch from fd8a690 to 7378248 (January 9, 2026 09:46)

AidenRHall commented

Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because, now that the bytecode is changing too much between Python versions, it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap and we are hoping to get that up and running this year.

I am curious what problem this PR is trying to solve? Having some CI tests sounds useful but why are we moving OSSFuzz tests into the fuzzing framework itself? There are a ton of atheris fuzzers in OSSFuzz, why is this one being moved specifically?

skypher (Author) commented Jan 13, 2026

> Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because, now that the bytecode is changing too much between Python versions, it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap and we are hoping to get that up and running this year.
>
> I am curious what problem this PR is trying to solve? Having some CI tests sounds useful but why are we moving OSSFuzz tests into the fuzzing framework itself? There are a ton of atheris fuzzers in OSSFuzz, why is this one being moved specifically?

Hi @AidenRHall, thanks for the comment!

I think there may be a small misunderstanding - we're not moving anything out of OSS-Fuzz. Airflow doesn't currently have OSS-Fuzz integration at all.

This PR adds new fuzzing harnesses to the Airflow repository itself, following the standard pattern where projects maintain their fuzz targets in-tree. These aren't CI tests - they're fuzz targets intended for continuous fuzzing via OSS-Fuzz. The eventual goal would be to set up a projects/airflow/ configuration in OSS-Fuzz that points to these harnesses.

Does that clarify things?

potiuk (Member) commented Jan 16, 2026

And just to add, @AidenRHall -> the idea here is that we would also like to experiment with more fuzzing ourselves in Airflow. Our general approach is that we do not add anything to our repo, even if it is going to be run externally by OSS-Fuzz, unless we can easily reproduce it locally.

We are starting small and we want to add more fuzzing in Airflow, and @skypher was kind enough to propose a PR that might be usable by OSS-Fuzz. However, if we are to make good use of fuzzing and add it to various parts of Airflow, our contributors need an easy way of iterating on it (adding new fuzzers, modifying existing ones), and this should all be runnable locally. Many of our contributors develop Airflow on ARM Macs, including most PMC members and committers. So if we are serious about fuzzing and about getting people involved in making good use of it, we need to make it easy for them to contribute to our fuzzing.

This is the main reason why we also try to use our CI to test it. While the Python version is not a blocker (we can easily run it in CI only for Python 3.11+), the lack of native ARM wheels pretty much is, given the time it takes to build Atheris and the environment needed for the build to succeed.

I hope that clarifies why ARM support is so important for us.

potiuk (Member) commented Jan 16, 2026

BTW. @skypher -> you can make the Python 3.10 failure go away by adding ; python_version >= "3.11" to the dependencies in the added pyproject.toml
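
For context, `python_version` in an environment marker is the interpreter's `major.minor` string, and the comparison is done on versions rather than on raw strings. The sketch below mimics that check with the standard library only. `marker_python_at_least` is a hypothetical helper for illustration, not part of pip or `packaging`, and it handles only plain `major.minor` values.

```python
import platform
from typing import Optional


def marker_python_at_least(minimum: str, current: Optional[str] = None) -> bool:
    """Mimic the marker `python_version >= "<minimum>"` for major.minor values."""
    if current is None:
        # PEP 508 defines python_version as the first two version components.
        current = ".".join(platform.python_version_tuple()[:2])

    def as_tuple(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))

    return as_tuple(current) >= as_tuple(minimum)


print(marker_python_at_least("3.11", "3.10"))  # False: the dependency is skipped
print(marker_python_at_least("3.11", "3.12"))  # True: the dependency is installed
print("3.9" >= "3.11")  # True as plain strings, which is why versions are parsed
```

With the marker in place, an installer running on Python 3.10 simply omits the dependency instead of trying to build it from source.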

skypher force-pushed the ossfuzz-fuzzers-v2 branch from a6df5aa to d3b6ca3 (January 16, 2026 03:13)

skypher (Author) commented Jan 16, 2026

> BTW. @skypher -> you can make the Python 3.10 failure go away by adding ; python_version >= "3.11" to the dependencies in the added pyproject.toml

Updated, thanks a lot!

skypher (Author) commented Jan 21, 2026

> Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because, now that the bytecode is changing too much between Python versions, it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap and we are hoping to get that up and running this year.

Hey again Aiden, just wondering if there's anything we can do to unblock our Atheris PR for these binaries. For your convenience, here's a copy of the link: google/atheris#99

Let us know please :-)

skypher (Author) commented Feb 11, 2026

@AidenRHall any update on this? Thanks!

Adds base infrastructure for OSS-Fuzz fuzzing under `scripts/ossfuzz/`.

Includes:
- pyproject.toml with proper Python packaging (Private :: Do Not Upload)
- Dependencies on apache-airflow-core, apache-airflow-providers-standard, atheris
- Entry points for uv run support
- README documenting security model alignment and local testing

Fuzzer for DAG serialization/deserialization targeting
`DagSerialization.from_dict()`.

Used by Scheduler and API Server with schema validation.
Input comes from DAG parsing and caching.

Includes:
- `.options` with max_len tuning
- `.dict` for structured input fuzzing
- Seed corpus with minimal DAG JSON
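
A harness for a JSON-shaped target typically decodes the raw bytes first and only hands well-formed dictionaries to the deserializer. The sketch below shows that pattern. `from_dict` here is a hypothetical stand-in with made-up validation rules, not Airflow's `DagSerialization.from_dict()`, and both the seed shape and the set of expected exceptions are assumptions.

```python
import json


class SchemaError(ValueError):
    """Hypothetical error type for input rejected by validation."""


def from_dict(payload: dict) -> dict:
    """Stand-in deserializer (not Airflow code): requires a string dag_id."""
    dag = payload.get("dag")
    if not isinstance(dag, dict) or not isinstance(dag.get("dag_id"), str):
        raise SchemaError("missing or invalid dag.dag_id")
    return {"dag_id": dag["dag_id"], "tasks": dag.get("tasks", [])}


def TestOneInput(data: bytes) -> None:
    """Entry point the fuzzing engine calls once per generated input."""
    try:
        payload = json.loads(data)
    except (ValueError, RecursionError):
        return  # not valid JSON: nothing to deserialize
    if not isinstance(payload, dict):
        return  # the target only accepts JSON objects
    try:
        from_dict(payload)
    except SchemaError:
        pass  # expected rejection; other exceptions surface as findings


# A seed in the spirit of "minimal DAG JSON" (illustrative shape only).
TestOneInput(b'{"dag": {"dag_id": "example", "tasks": []}}')
```

Filtering out non-JSON input early keeps the fuzzer's time focused on the deserializer rather than on the JSON decoder, which is also what a `.dict` file of JSON tokens and keys is meant to help with.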

Adds fuzzer for Connection URI parsing which is a security boundary
(API input validation).

Target: Connection._parse_from_uri() and sanitize_conn_id()

Includes dictionary, options file, and seed corpus.

potiuk force-pushed the ossfuzz-fuzzers-v2 branch from d3b6ca3 to bb46319 (February 15, 2026 19:03)

potiuk (Member) commented Feb 15, 2026

Would be great to get it in and try it :D

skypher (Author) commented Feb 16, 2026

> Would be great to get it in and try it :D

Glad to see your ping! What's needed to get it merged? Are we blocked on the Atheris issue or do you think we can proceed as-is? Doesn't seem like they're willing to get this in anytime soon.

potiuk (Member) commented Feb 17, 2026

> > Would be great to get it in and try it :D
>
> Glad to see your ping! What's needed to get it merged? Are we blocked on the Atheris issue or do you think we can proceed as-is? Doesn't seem like they're willing to get this in anytime soon.

I guess if ARM is not supported, we can try it without. That will limit local testing, but well, tough.

jscheffl marked this pull request as draft (March 15, 2026 21:56)

jscheffl (Contributor) commented

@skypher This PR has been converted to draft because it does not yet meet our Pull Request quality criteria.

Issues found:

  • Pre-commit / static checks: Failing: CI image checks / Static checks. Run prek run --from-ref main locally to find and fix issues. See Pre-commit / static checks docs.
  • mypy (type checking): Failing: CI image checks / MyPy checks (mypy-dev). Run prek --stage manual mypy-dev --all-files locally to reproduce. You need breeze ci-image build --python 3.10 for Docker-based mypy. See mypy (type checking) docs.

Note: Your branch is 846 commits behind main. Some check failures may be caused by changes in the base branch rather than by your PR. Please rebase your branch and push again to get up-to-date CI results.

What to do next:

  • The comment informs you what you need to do.
  • Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
  • Maintainers will then proceed with a normal review.

Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack.

potiuk closed this Apr 6, 2026
potiuk added the "closed because of multiple quality violations" label Apr 6, 2026

potiuk (Member) commented Apr 6, 2026

This draft pull request has had no activity from the author for over 3 weeks.

@skypher, you are welcome to reopen this PR when you are ready to continue working on it. Thank you for your contribution!
