Skip to content

Conversation

gouhongshen
Copy link
Contributor

@gouhongshen gouhongshen commented Sep 8, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue ##22460

What this PR does / why we need it:

Delete files after merging deletions on objects.


PR Type

Enhancement


Description

  • Refactor garbage collection for transaction objects

  • Add file cleanup after deletion merging operations

  • Extract common GC logic into reusable functions

  • Improve error handling and logging for GC operations


Diagram Walkthrough

flowchart LR
  A["Transaction Operations"] --> B["Merge Deletions"]
  B --> C["gcObjsByStats"]
  A --> D["Rollback/Cleanup"]
  D --> E["gcObjsByIdxRange"]
  C --> F["gcFiles"]
  E --> F
  F --> G["Async File Deletion"]
Loading

File Walkthrough

Relevant files
Enhancement
txn.go
Update GC calls and add deletion cleanup                                 

pkg/vm/engine/disttae/txn.go

  • Rename gcObjs to gcObjsByIdxRange for clarity
  • Add gcObjsByStats calls after deletion merging operations
  • Update rollback operations to use renamed GC function
+6/-3     
types.go
Refactor GC functions and improve error handling                 

pkg/vm/engine/disttae/types.go

  • Extract common file deletion logic into gcFiles function
  • Add new gcObjsByStats method for stats-based cleanup
  • Refactor gcObjsByIdxRange to use extracted gcFiles
  • Improve error handling with deferred logging
+70/-44 

Copy link

qodo-merge-pro bot commented Sep 8, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

22460 - Partially compliant

Compliant requirements:

  • Ensure objects created by CN that are not committed to TN are properly cleaned up.
  • Perform GC after deletion-merge/compaction so temporary objects are removed.
  • Handle GC during rollback and statement rollback paths.
  • Add robust error handling and logging around GC operations.

Non-compliant requirements:

  • Prevent file/object leaks when a transaction performs insert and drop table (and similar patterns) within the same transaction.

Requires further human verification:

  • Validate via integration test or manual reproduction steps that no files leak in the provided scenario (insert + drop table within a single transaction).
  • Confirm GC runs reliably on CN failure scenarios (asynchronously submitted deletions) and no orphaned objects remain.
  • Confirm behavior under concurrent compaction and rollback.
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In gcFiles, the log inside the async closure references indices i and step which belong to the outer loop and can change before execution, potentially logging a wrong slice range and even causing out-of-bounds if used elsewhere later. Use the captured start/end for both deletion and logging.

for i := 0; i < len(names); i += step {
	if i+step > len(names) {
		step = len(names) - i
	}
	start := i
	end := i + step
	if err := e.gcPool.Submit(func() {
		//notice that the closure can't capture the loop variable i, so we need to use start and end.
		if err := e.fs.Delete(context.Background(), names[start:end]...); err != nil {
			logutil.Warnf("failed to delete objects:%v, err:%v", names[i:i+step], err)
		}
	}); err != nil {
		return err
	}
}
Error Handling

The GC functions swallow deletion errors by logging and returning nil from submitted tasks; callers only get Submit errors. Consider propagating or aggregating deletion failures, or at least exposing metrics/callbacks to ensure leaks are detectable.

func gcFiles(e *Engine, names ...string) error {
	//gc the objects asynchronously.
	//TODO:: to handle the failure when CN is down.
	step := GCBatchOfFileCount
	if len(names) > 0 && len(names) < step {
		if err := e.gcPool.Submit(func() {
			if err := e.fs.Delete(context.Background(), names...); err != nil {
				logutil.Warnf("failed to delete objects:%v, err:%v", names, err)
			}
		}); err != nil {
			return err
		}

		return nil
	}

	for i := 0; i < len(names); i += step {
		if i+step > len(names) {
			step = len(names) - i
		}
		start := i
		end := i + step
		if err := e.gcPool.Submit(func() {
			//notice that the closure can't capture the loop variable i, so we need to use start and end.
			if err := e.fs.Delete(context.Background(), names[start:end]...); err != nil {
				logutil.Warnf("failed to delete objects:%v, err:%v", names[i:i+step], err)
			}
		}); err != nil {
			return err
		}
	}

	return nil
GC Trigger Timing

Calling gcObjsByStats immediately after compacting deletions may race with subsequent operations that still reference those objects. Verify lifecycle guarantees and ensure stats passed to GC are safe to delete at these points.

					entry.tableName,
					stats.ObjectName().String(),
					bat,
					entry.tnStore,
				); err != nil {
					bat.Clean(txn.proc.Mp())
					return err
				}

				_ = txn.gcObjsByStats(stats)
				bat.Clean(txn.proc.Mp())

			} else {
				dirtyObject = append(dirtyObject, stats)
			}

			offset += int(stats.BlkCnt())
		}

		txn.writes[i].bat.Clean(txn.proc.GetMPool())
		txn.writes[i].bat = nil
	}

	if txn.compactWorker == nil {
		txn.compactWorker, _ = ants.NewPool(min(runtime.NumCPU(), 4))
		defer func() {
			txn.compactWorker.Release()
			txn.compactWorker = nil
		}()
	}

	for _, stats := range dirtyObject {
		waiter.Add(1)
		txn.compactWorker.Submit(func() {
			compactFunc(stats)
		})
	}

	waiter.Wait()
	_ = txn.gcObjsByStats(dirtyObject...)

	return nil
}

Copy link

qodo-merge-pro bot commented Sep 8, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Fix zap import and closure capture

The new logging in types.go uses zap fields but the file does not import
go.uber.org/zap, which will break the build. In gcFiles, the goroutine logs
names[i:i+step], capturing loop variables i/step that may be invalid at
execution time and cause a panic; use the precomputed start/end (or copy the
slice) and avoid referencing i/step inside the closure.

Examples:

pkg/vm/engine/disttae/types.go [729]
				logutil.Warnf("failed to delete objects:%v, err:%v", names[i:i+step], err)
pkg/vm/engine/disttae/types.go [740-751]
func (txn *Transaction) gcObjsByStats(sl ...objectio.ObjectStats) (err error) {
	var names []string

	defer func() {
		if err != nil {
			logutil.Warn("gc objects by stats list failed",
				zap.String("txn", txn.op.Txn().DebugString()),
				zap.Strings("names", names),
				zap.Error(err),
			)

 ... (clipped 2 lines)

Solution Walkthrough:

Before:

// in pkg/vm/engine/disttae/types.go
// import "go.uber.org/zap" is missing

func gcFiles(e *Engine, names ...string) error {
  // ...
  for i := 0; i < len(names); i += step {
    if i+step > len(names) {
      step = len(names) - i
    }
    start := i
    end := i + step
    e.gcPool.Submit(func() {
      if err := e.fs.Delete(context.Background(), names[start:end]...); err != nil {
        logutil.Warnf("failed to delete objects:%v, err:%v", names[i:i+step], err)
      }
    })
  }
  return nil
}

After:

// in pkg/vm/engine/disttae/types.go
import "go.uber.org/zap" // Add missing import

func gcFiles(e *Engine, names ...string) error {
  // ...
  for i := 0; i < len(names); i += step {
    if i+step > len(names) {
      step = len(names) - i
    }
    start := i
    end := i + step
    e.gcPool.Submit(func() {
      if err := e.fs.Delete(context.Background(), names[start:end]...); err != nil {
        logutil.Warnf("failed to delete objects:%v, err:%v", names[start:end], err)
      }
    })
  }
  return nil
}
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a missing zap import that would break the build and a subtle closure variable capture bug in gcFiles that could lead to a panic.

High
Possible issue
Fix incorrect error logging indices

The error logging uses incorrect slice indices names[i:i+step] instead of the
actual slice being deleted names[start:end]. This could log wrong filenames when
deletion fails.

pkg/vm/engine/disttae/types.go [728-730]

 if err := e.fs.Delete(context.Background(), names[start:end]...); err != nil {
-	logutil.Warnf("failed to delete objects:%v, err:%v", names[i:i+step], err)
+	logutil.Warnf("failed to delete objects:%v, err:%v", names[start:end], err)
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a bug where the logging statement uses incorrect slice indices (names[i:i+step]) which could lead to incorrect error messages in the final iteration of the loop.

Medium
General
Handle empty input case

The function returns early when len(names) == 0, but this case is not handled
explicitly. Add an early return for empty input to avoid unnecessary processing
and potential issues with the batch deletion logic.

pkg/vm/engine/disttae/types.go [704-738]

 func gcFiles(e *Engine, names ...string) error {
+	if len(names) == 0 {
+		return nil
+	}
+	
 	//gc the objects asynchronously.
 	//TODO:: to handle the failure when CN is down.
 	step := GCBatchOfFileCount
-	if len(names) > 0 && len(names) < step {
+	if len(names) < step {
 		if err := e.gcPool.Submit(func() {
 			if err := e.fs.Delete(context.Background(), names...); err != nil {
 				logutil.Warnf("failed to delete objects:%v, err:%v", names, err)
 			}
 		}); err != nil {
 			return err
 		}
 
 		return nil
 	}
 
 	for i := 0; i < len(names); i += step {
 		if i+step > len(names) {
 			step = len(names) - i
 		}
 		start := i
 		end := i + step
 		if err := e.gcPool.Submit(func() {
 			//notice that the closure can't capture the loop variable i, so we need to use start and end.
 			if err := e.fs.Delete(context.Background(), names[start:end]...); err != nil {
-				logutil.Warnf("failed to delete objects:%v, err:%v", names[i:i+step], err)
+				logutil.Warnf("failed to delete objects:%v, err:%v", names[start:end], err)
 			}
 		}); err != nil {
 			return err
 		}
 	}
 
 	return nil
 
 }
  • Apply / Chat
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly points out that adding an explicit check for an empty names slice improves readability, although the current code already handles this case correctly by skipping the loop.

Low
  • Update

@mergify mergify bot merged commit 5d98dc1 into matrixorigin:main Sep 9, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working Review effort 3/5 size/M Denotes a PR that changes [100,499] lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants