PHOENIX-7677 TTL_DELETE CDC event to use batch mutation #2247
Conversation
Left a couple of comments.
LGTM otherwise, +1
for (int retryCount = 0; retryCount < cdcTtlMutationMaxRetries; retryCount++) {
  try (Table cdcIndexTable =
      env.getConnection().getTable(TableName.valueOf(cdcIndex.getPhysicalName().getBytes()))) {
    cdcIndexTable.put(new ArrayList<>(pendingMutations.values()));
For my understanding, what are the semantics here for partial failures when applying batch mutations? Will we log a failure even if some mutations from the batch were applied after all retries?
Since we are using batchMutate() on a table where all the row keys are expected to go to the same region, either all mutations will succeed or none will.
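The bounded-retry, all-or-nothing pattern in the snippet above can be sketched with plain Java stand-ins. `BatchSink`, `RetryingBatchWriter`, and the `String` mutations are hypothetical placeholders for illustration; the real code uses HBase's `Table.put(List<Put>)` inside the compaction coprocessor.

```java
import java.util.List;

// Hypothetical stand-in for the HBase table write; real code calls Table.put(List<Put>).
interface BatchSink {
    void putAll(List<String> mutations) throws Exception;
}

final class RetryingBatchWriter {
    // Applies the whole batch; on failure, retries the whole batch up to maxRetries
    // times. Because all row keys target the same region, the batch is assumed to
    // either fully apply or fully fail, so no partial state survives a failed attempt.
    static boolean writeWithRetries(BatchSink sink, List<String> batch, int maxRetries) {
        for (int retryCount = 0; retryCount < maxRetries; retryCount++) {
            try {
                sink.putAll(batch);
                return true; // entire batch applied
            } catch (Exception e) {
                // log and fall through to the next retry
            }
        }
        return false; // caller logs a failure after exhausting retries
    }
}
```

A sink that fails transiently succeeds on a later attempt, while a sink that always fails exhausts the retry budget and reports failure for the whole batch.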
 * that all 82 rows have CDC TTL_DELETE events recorded with correct pre-image data.
 */
@Test
public void testCDCBatchMutationsForTTLExpiredRows() throws Exception {
Is there a way to verify the actual batching and retry logic from this IT? Do you think a metric for this operation would be helpful?
A metric would be helpful, but initiating a retry from an IT is not that simple; that should certainly be kept separate.
@palashc I have added a bunch of logs; do you think we still need metrics?
 */
public final class CDCCompactionUtil {

  private static final Logger LOGGER = LoggerFactory.getLogger(CDCCompactionUtil.class);

  // Shared cache for row images across all CompactionScanner instances in the JVM.
  // Entries expire after 1200 seconds (20 minutes) by default.
  // The JVM-level cache helps merge the pre-image for a row with multiple CFs.
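The shared, time-expiring cache described in the comment above can be approximated with a minimal sketch. The class name, the injectable clock, the `String` row images, and lazy eviction are all illustrative assumptions, not the actual Phoenix implementation (which may use a library cache such as Guava's).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Sketch of a JVM-wide cache whose entries expire after a fixed TTL, used to
// merge partial pre-images of a row as each column family is compacted.
final class ExpiringRowImageCache {
    private static final class Entry {
        final String image;
        final long expiresAtMs;
        Entry(String image, long expiresAtMs) {
            this.image = image;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final LongSupplier clock; // injectable for deterministic testing

    ExpiringRowImageCache(long ttlMs, LongSupplier clock) {
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    void put(String rowKey, String preImage) {
        cache.put(rowKey, new Entry(preImage, clock.getAsLong() + ttlMs));
    }

    // Returns null if absent or expired; expired entries are removed lazily on read.
    String get(String rowKey) {
        Entry e = cache.get(rowKey);
        if (e == null) return null;
        if (e.expiresAtMs <= clock.getAsLong()) {
            cache.remove(rowKey);
            return null;
        }
        return e.image;
    }
}
```

With a 1200-second TTL, a pre-image written during the first CF's compaction remains visible while later CFs compact in the same JVM, and silently expires afterwards; as discussed below, this best-effort merging is not guaranteed if the RS crashes between CF compactions.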
Is it guaranteed that for multiple CFs all CompactionScanner instances will run in the same JVM? What if the RS aborts between the compactions of two CFs?
Yes, this is a best-case scenario only. Let me add a note here. Generating a TTL_DELETE event for a multi-CF table is not guaranteed to be accurate due to the edge case with a server crash.
Rather than a comment in the code, let me clarify this in the Jira.
Let me also evaluate doing the batch writes only during CompactionScanner close(). That way, we might be able to guarantee no incorrect values in the CDC event. Let me think about this.
Jira: PHOENIX-7677
Inserting a TTL_DELETE CDC event from the CompactionScanner for one expired row at a time can slightly degrade the write workload on the CDC index table.
To improve the performance of inserting TTL_DELETE events, the CompactionScanner can accumulate 50 mutations by default and then perform batchMutate(). Any mutations not yet inserted into the CDC index should be flushed with batchMutate() when the CompactionScanner closes.
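The proposed accumulate-then-flush behavior can be sketched as a small batching buffer. The class name, the `Consumer`-based sink, and the `String` mutations are illustrative stand-ins, not the Phoenix API; the real flush would call batchMutate() against the CDC index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the proposed batching: accumulate mutations up to a batch size
// (50 by default in the proposal) and flush them as one batch write; any
// remainder is flushed when the scanner closes.
final class CdcMutationBatcher implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<String>> batchMutate; // stand-in for batchMutate()
    private final List<String> pending = new ArrayList<>();

    CdcMutationBatcher(int batchSize, Consumer<List<String>> batchMutate) {
        this.batchSize = batchSize;
        this.batchMutate = batchMutate;
    }

    void add(String mutation) {
        pending.add(mutation);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (!pending.isEmpty()) {
            batchMutate.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }

    @Override
    public void close() {
        flush(); // mutations not yet written go out when the scanner closes
    }
}
```

For example, 7 mutations with a batch size of 3 produce two full batches during scanning and one final batch of 1 at close(), so nothing is lost while per-row write overhead is amortized.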