11 changes: 11 additions & 0 deletions build.gradle
@@ -461,3 +461,14 @@ sourcesDistTar.finalizedBy generateDistributionChecksums
sourcesDistZip.finalizedBy generateDistributionChecksums
assembleDist.finalizedBy generateDistributionChecksums
assembleSourcesDist.finalizedBy generateDistributionChecksums

// Wire shadowJar to run as part of the core subproject's assemble (and therefore the
// root aggregate assemble), without creating a root-level finalizedBy on shadowJar.
// The latter chains through distTar/distZip (which `dependsOn` every subproject Jar via
// the `distributions { from subprojects.collect { it.tasks.withType(Jar) } }` block) and
// loops back to :assemble, causing a Gradle 8 circular-dependency failure.
project(':cassandra-analytics-core') {
    afterEvaluate {
        tasks.named('assemble').configure { dependsOn tasks.named('shadowJar') }
    }
}
@@ -25,6 +25,7 @@
import java.util.Set;

import org.apache.cassandra.spark.data.CqlField;
import org.apache.cassandra.spark.data.FileType;
import org.apache.cassandra.spark.data.SSTable;
import org.apache.cassandra.spark.data.SSTablesSupplier;
import org.apache.cassandra.spark.reader.IndexEntry;
@@ -498,4 +499,50 @@ public void indexFileSkipped()
    {

    }

    /**
     * S3 headObject operation performed for existence check.
     *
     * @param timeNanos time taken in nanoseconds for the S3 headObject operation
     */
    public void s3HeadObjectOperation(long timeNanos)
    {

    }

    /**
     * S3 getObject operation performed for data retrieval.
     *
     * @param timeNanos time taken in nanoseconds for the S3 getObject operation
     */
    public void s3GetObjectOperation(long timeNanos)
    {

    }

    /**
     * Mutable SSTable metadata objects (Summary.db, Filter.db, Statistics.db) can be rewritten in place
     * on Cassandra live data directories. Backup manifests can therefore carry a stale size for the
     * same S3 key. This event records when the actual object size differs from the manifest-provided
     * size.
     *
     * @param fileType mutable SSTable component type
     * @param manifestSize size recorded in the autosnap manifest
     * @param actualSize current S3 object size
     */
    public void s3MutableMetadataDriftDetected(FileType fileType, long manifestSize, long actualSize)
    {

    }

    /**
     * Mutable metadata object was large enough that the reader used HEAD + exact ranged GET instead
     * of a single open-ended GET.
     *
     * @param fileType mutable SSTable component type
     */
    public void s3MutableMetadataHeadFallback(FileType fileType)
    {

    }
}
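
The four hooks above are empty by default, so they only become useful once a subclass overrides them. The following is a minimal sketch of such a subclass, not code from this PR: `SketchFileType`, `SketchStats`, and `CountingStats` are hypothetical stand-ins so the example compiles on its own, and summing the absolute size difference is an assumed way of summarising drift.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for org.apache.cassandra.spark.data.FileType, reduced to three components.
enum SketchFileType
{
    SUMMARY, FILTER, STATISTICS
}

// Hypothetical stand-in for the class in the diff: the same four no-op hooks.
class SketchStats
{
    public void s3HeadObjectOperation(long timeNanos)
    {
    }

    public void s3GetObjectOperation(long timeNanos)
    {
    }

    public void s3MutableMetadataDriftDetected(SketchFileType fileType, long manifestSize, long actualSize)
    {
    }

    public void s3MutableMetadataHeadFallback(SketchFileType fileType)
    {
    }
}

// Assumed consumer: counts calls and accumulates latency and size drift.
class CountingStats extends SketchStats
{
    final LongAdder headCalls = new LongAdder();
    final LongAdder headNanos = new LongAdder();
    final LongAdder getCalls = new LongAdder();
    final LongAdder getNanos = new LongAdder();
    final LongAdder driftEvents = new LongAdder();
    final LongAdder driftBytes = new LongAdder();
    final LongAdder headFallbacks = new LongAdder();

    @Override
    public void s3HeadObjectOperation(long timeNanos)
    {
        headCalls.increment();
        headNanos.add(timeNanos);
    }

    @Override
    public void s3GetObjectOperation(long timeNanos)
    {
        getCalls.increment();
        getNanos.add(timeNanos);
    }

    @Override
    public void s3MutableMetadataDriftDetected(SketchFileType fileType, long manifestSize, long actualSize)
    {
        driftEvents.increment();
        // Absolute difference, since the object may have grown or shrunk since the manifest was written.
        driftBytes.add(Math.abs(actualSize - manifestSize));
    }

    @Override
    public void s3MutableMetadataHeadFallback(SketchFileType fileType)
    {
        headFallbacks.increment();
    }
}
```

A caller would time each S3 request with `System.nanoTime()` and pass the elapsed value to the matching hook. `LongAdder` is used here on the assumption that several reader threads may report concurrently.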
@@ -83,4 +83,15 @@ public String getFileSuffix()
    {
        return fileSuffix;
    }

    /**
     * Whether the on-disk size of this component can drift from the value recorded in a backup
     * manifest. Cassandra rewrites Summary/Filter/Statistics in place during compaction, so a
     * stale manifest size for these components can produce truncated ranged-GETs against the
     * backing store. The data layer treats these components specially when issuing reads.
     */
    public boolean isMutableMetadata()
    {
        return this == SUMMARY || this == FILTER || this == STATISTICS;
Contributor

Is this code/comment fully accurate? I'm aware Statistics.db can mutate the repair metadata (repairedAt, pendingRepair) to avoid recompaction, and also the compaction level to avoid a full rewrite.

Summary.db appears to only mutate if there is a change in the index sample size (redistributeSummaries).

For Filter.db, I can't find any place where it mutates. I think it is immutable from the first flush.

Contributor Author

Good catch. I think Filter.db is theoretically immutable, but there is a code path that rewrites it:

https://github.com/apache/cassandra/blob/45fa31b/src/java/org/apache/cassandra/io/sstable/format/SSTableReaderBuilder.java#L450
It should be very rare, and if I read the code correctly, it happens during live re-opens of an SSTable (startup, import, streaming receipt), and only when its Filter.db is missing or it has no ValidationMetadata.

So it is theoretically possible, but Filter.db can most likely be treated as immutable.

    }
}
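
The Javadoc in this PR describes two read paths for mutable metadata: a single open-ended GET, and HEAD followed by an exact ranged GET when the object is large. The reader code itself is not part of this excerpt, so the following is only a sketch of how such a decision could look. `ComponentSketch`, `ReadPlan`, `ReadPlanner`, the `headFallbackThreshold` parameter, and the use of the manifest size as the "large enough" hint are all assumptions made for illustration. Only `isMutableMetadata()` mirrors the diff.

```java
// Hypothetical stand-in for FileType; only isMutableMetadata() mirrors the diff above.
enum ComponentSketch
{
    DATA, INDEX, SUMMARY, FILTER, STATISTICS;

    boolean isMutableMetadata()
    {
        return this == SUMMARY || this == FILTER || this == STATISTICS;
    }
}

// Assumed names for the three ways a component could be fetched.
enum ReadPlan
{
    RANGED_GET_MANIFEST_SIZE, OPEN_ENDED_GET, HEAD_THEN_RANGED_GET
}

final class ReadPlanner
{
    private ReadPlanner()
    {
    }

    // Assumption: the manifest size is trusted for immutable components and used
    // only as a rough size hint for mutable ones.
    static ReadPlan plan(ComponentSketch component, long manifestSize, long headFallbackThreshold)
    {
        if (!component.isMutableMetadata())
        {
            return ReadPlan.RANGED_GET_MANIFEST_SIZE;
        }
        return manifestSize < headFallbackThreshold
               ? ReadPlan.OPEN_ENDED_GET
               : ReadPlan.HEAD_THEN_RANGED_GET;
    }

    // HTTP Range header value covering a whole object of the given size; the end offset is inclusive.
    static String fullRange(long actualSize)
    {
        if (actualSize <= 0)
        {
            throw new IllegalArgumentException("actualSize must be positive");
        }
        return "bytes=0-" + (actualSize - 1);
    }

    // Condition under which a drift event would be reported.
    static boolean hasDrifted(long manifestSize, long actualSize)
    {
        return manifestSize != actualSize;
    }
}
```

Under these assumptions, `ReadPlanner.plan(ComponentSketch.STATISTICS, 500, 100)` selects the HEAD path, and the size returned by HEAD would then be passed to `fullRange` so the ranged GET cannot be truncated by a stale manifest value. If Filter.db is treated as immutable, as the thread above suggests, dropping `FILTER` from `isMutableMetadata()` would route it through the manifest-size path.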