Skip to content

PHOENIX-7677 TTL_DELETE CDC event to use batch mutation #2247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 30, 2025

Conversation

virajjasani
Copy link
Contributor

Jira: PHOENIX-7677

Inserting TTL_DELETE CDC event from CompactionScanner for single expired row at a time can cause slight degradation to the write workload to the CDC index table.

To improve the performance of inserting TTL_DELETE events, CompactionScanner can accumulate 50 mutations by default and perform the batchMutate(). Any mutations that are not inserted into the CDC index should be inserted with batchMutate() while closing the CompactionScanner.

@virajjasani virajjasani requested review from palashc and tkhurana July 25, 2025 19:50
Copy link
Contributor

@palashc palashc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments.
LGTM otherwise, +1

for (int retryCount = 0; retryCount < cdcTtlMutationMaxRetries; retryCount++) {
try (Table cdcIndexTable =
env.getConnection().getTable(TableName.valueOf(cdcIndex.getPhysicalName().getBytes()))) {
cdcIndexTable.put(new ArrayList<>(pendingMutations.values()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, what are the semantics here for partial failures when trying to apply batch mutations? We will log failure even if some mutations from the batch were applied after all retries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using batchMutate() on the table where all the rowkeys are most likely expected to go to same region, either all mutations will be successful or none

* that all 82 rows have CDC TTL_DELETE events recorded with correct pre-image data.
*/
@Test
public void testCDCBatchMutationsForTTLExpiredRows() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to verify the actual batching and retry logic from this IT? Do you think a metric for this operation would be helpful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metric would be helpful but initiating retry is not that simple, certainly should be kept separately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palashc i have added bunch of logs, do you think we still need metrics?

*/
public final class CDCCompactionUtil {

private static final Logger LOGGER = LoggerFactory.getLogger(CDCCompactionUtil.class);

// Shared cache for row images across all CompactionScanner instances in the JVM.
// Entries expire after 1200 seconds (20 minutes) by default.
// The JVM level cache helps merge the pre-image for the row with multiple CFs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that for multiple CFs all compactionscanner instances will run in the same JVM ? What if the RS aborts between compaction on two CFs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is best case scenario only. Let me add a note here. Generating TTL_DELETE event for multi-CF is not guaranteed to be accurate due to the edge case with server crash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than comment in the code, let me clarify this in Jira

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me also evaluate doing batch writes only during CompactionScanner close(). This way, we might be able to guarantee no incorrect values for the CDC event. Let me think about this.

@virajjasani virajjasani requested a review from tkhurana July 29, 2025 20:58
@virajjasani virajjasani merged commit 26444e0 into apache:master Jul 30, 2025
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants