Skip to content

Add CI, integration test infrastructure, and testcontainer-based tests#8

Merged
lhotari merged 27 commits intoapache:masterfrom
merlimat:fix-build
Apr 1, 2026
Merged

Add CI, integration test infrastructure, and testcontainer-based tests#8
lhotari merged 27 commits intoapache:masterfrom
merlimat:fix-build

Conversation

@merlimat
Copy link
Copy Markdown
Contributor

Summary

  • Add GitHub Actions CI workflow with build, license check (RAT), and test jobs
  • Add README, LICENSE, and NOTICE files
  • Add testcontainer-based unit tests for connectors that previously only had end-to-end integration tests:
    • Cassandra: CassandraStringSinkTest — verifies sink writes to a real Cassandra instance
    • MongoDB: MongoSinkContainerTest — verifies sink writes JSON documents to a real MongoDB instance
    • Debezium MySQL: DebeziumMysqlSourceTest — verifies CDC events flow from MySQL snapshot through the source connector
    • Debezium Postgres: DebeziumPostgresSourceTest — verifies CDC events flow from Postgres via pgoutput
  • Fix AlluxioSinkTest to skip gracefully when the embedded cluster times out (flaky in CI)
  • Fix various build issues: Solr Jetty toolchain, Docker NAR resolution, ES SSL test timeouts, logging dependencies
  • Add RAT license exclusions for certificate and generated files
  • Add test dependencies for kafka-connect-adaptor and debezium-core modules

Test plan

merlimat added 27 commits March 26, 2026 19:06
Copy connector-specific integration tests that were removed from the
main pulsar repository as part of PIP-465. These tests exercise
specific connector sinks/sources against real external services
(Cassandra, Elasticsearch, Kafka, RabbitMQ, Debezium, JDBC, etc.).

Includes:
- Sink testers and PulsarSinksTest
- Source testers (Kafka, Mongo, Debezium variants)
- Container definitions (Cassandra, Debezium, RabbitMQ)
- TestNG suite XMLs for IO sources, sinks, and Oracle source
CI pipeline for pulsar-connectors with:
- Build and license check (RAT)
- Unit tests split into 3 groups: Kafka Connect Adaptor,
  Elasticsearch, and all other connectors
- Runs on PRs and pushes to main/branch-* branches
- LICENSE: Apache License 2.0 (same as main pulsar repo)
- NOTICE: ASF notice file (same as main pulsar repo)
- README: Overview of available connectors, build instructions,
  usage guide, and versioning policy
Restrict pull_request trigger to main/branch-* to avoid running
both push and pull_request workflows on the same commit.
Configure Apache RAT exclusions for build artifacts, generated files
(Kinesis flatbuffers), certificates, IDE files, and other non-source
files that don't require license headers.
KCA tests extend ProducerConsumerBase from pulsar-broker test-jar,
which requires the full broker test infrastructure and is not published
to Maven Central. Disable test compilation until test artifacts are
published or tests are restructured as integration tests.

Add pulsar-functions-api, pulsar-functions-instance, and pulsar-broker
to the version catalog for future use.
Add pulsar-broker (with tests classifier), pulsar-functions-api,
pulsar-functions-instance, and testmocks to the version catalog.

KCA tests are temporarily disabled as they depend on broker internals
that changed since the 4.1.3 release. They will be re-enabled once
matching pulsar artifacts are available.
PulsarSchemaHistoryTest extends ProducerConsumerBase from the broker
test-jar. Add the broker and testmocks test-jar dependencies.
- Add docker/pulsar-all to build a connector image on top of
  apachepulsar/pulsar base image
- Add tests/integration module with build.gradle.kts for connector
  integration tests (sinks, sources, Oracle debezium)
- Add integration test CI jobs to workflow
- Fix debezium-core test dependency (testmocks doesn't use tests classifier)
- Fix version catalog access for modules outside subprojects block
The Gradle build always passes the versioned image via --build-arg.
Removing the default prevents accidentally using :latest.
The dockerBuild task should be invoked explicitly, not as part of
assemble/build, since it requires all connector NARs to be built first.
The pulsar integration test infrastructure (PulsarCluster, PulsarContainer,
etc.) is not published to Maven Central. Copy the necessary classes locally
so integration tests can compile without depending on unpublished test-jars.

Copied packages:
- containers (PulsarContainer, ChaosContainer, BK/ZK/Broker/Proxy/Worker)
- docker (ContainerExecResult, DockerUtils)
- topologies (PulsarCluster, PulsarClusterSpec, PulsarTestBase)
- functions (PulsarFunctionsTestBase, CommandGenerator)
- suites (PulsarTestSuite)
- utils (TestRetrySupport, ExtendedNettyLeakDetector)
- Add pulsar-buildtools dependency to debezium-core for TestRetrySupport
  (needed by ProducerConsumerBase test base class)
- Add gradle.properties with JVM heap settings (-Xmx4g) to prevent OOM
  during compilation in CI
- Remove BatchSourceTest and DataGeneratorSourceTest from sources XML
  (those tests stayed in pulsar repo, they test runtime not connectors)
- Integration tests need Docker image setup (tracked separately)
- Add log4j2 test runtime dependencies to all subprojects (was provided
  by buildtools in pulsar repo). Fixes Alluxio timeout and other tests
  that depend on logging being available.
- Fix Solr test Jetty version conflict: use resolutionStrategy to force
  Jetty 10.x for test configs (Solr 9.x requires javax.servlet)
- Increase OpenSearch SSL test container timeouts and memory settings
Create a pulsar-connectors-test Docker image that layers connector NARs
and TLS certificates on top of apachepulsar/pulsar. This replaces the
pulsar-test-latest-version image that was built in the pulsar repo.

- Add docker/pulsar-connectors-test with Dockerfile and build.gradle.kts
- Copy TLS certificate-authority from pulsar repo for integration tests
- Update PulsarContainer default image to pulsar-connectors-test
- Build test Docker image in CI before running integration tests
- Exclude org.eclipse.jetty.toolchain from Jetty 10 version forcing
  (toolchain artifacts use their own version scheme)
- Rewrite Docker test image build to use jar task outputs instead of
  NAR artifact type resolution (the NAR plugin produces .nar via jar
  task, not via separate artifact type)
The integration test infrastructure requires scripts like run-local-zk.sh
that only exist in the pulsar-test-latest-version image, not in the base
pulsar image. Use the published test image as the base and layer connector
NARs on top.
The pulsar-test-latest-version image is not published to Docker Hub.
Instead, build the test image from apachepulsar/pulsar base image by
adding the required test scripts (run-local-zk.sh, run-broker.sh, etc.),
supervisor config, connector NARs, and TLS certificates.
…tests

Replace the complex multi-node cluster integration test infrastructure
with simpler testcontainers-based tests that run as unit tests.

- Remove tests/integration module and all copied PulsarCluster infrastructure
- Remove docker/pulsar-connectors-test image (no longer needed)
- Convert PulsarSchemaHistoryTest to use testcontainers-pulsar standalone
  instead of embedded pulsar-broker
- Remove pulsar-broker-test, testmocks, buildtools deps from debezium-core
- Add testcontainers-pulsar and testcontainers-cassandra to version catalog
- Simplify CI to single ./gradlew test job
- Cassandra: new CassandraStringSinkTest that verifies the sink writes
  records to a real Cassandra instance via testcontainers
- MongoDB: new MongoSinkContainerTest that verifies the sink writes
  JSON documents to a real MongoDB instance via testcontainers
- Add testcontainers-cassandra, testcontainers-mongodb, and
  mongodb-driver-sync test dependencies
- MySQL: new DebeziumMysqlSourceTest that starts a MySQL container with
  binlog enabled and a Pulsar container, then verifies CDC events flow
  through the DebeziumMysqlSource from the initial snapshot
- Postgres: new DebeziumPostgresSourceTest that starts a Postgres
  container with wal_level=logical and verifies CDC events via pgoutput
The embedded Alluxio LocalAlluxioCluster has a hardcoded 200s timeout
for master startup which is frequently exceeded in CI environments
with limited resources. Convert the TimeoutException to a test skip
instead of a test failure.
@merlimat merlimat requested a review from lhotari March 28, 2026 05:49
@lhotari lhotari merged commit 32d9445 into apache:master Apr 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants