Conversation


@PavithranRick PavithranRick commented Nov 13, 2025

Describe the issue this Pull Request addresses

This PR introduces a new comprehensive show_timeline procedure for Hudi Spark SQL that provides detailed timeline information for all table operations. The procedure displays timeline instants (commits, deltacommits, compactions, clustering, cleaning, and rollback operations), supports both active and archived timelines, and covers instants in both completed and pending states.

Summary and Changelog

Comprehensive timeline view:
Shows all timeline instants with detailed metadata including state transitions (REQUESTED, INFLIGHT, COMPLETED)

Time-based filtering:
Support for startTime and endTime parameters to filter results within specific time ranges

Archive timeline support:
showArchived parameter to include archived timeline data for complete historical view

Generic SQL filtering:
filter parameter supporting SQL expressions for flexible result filtering

Rich metadata output:
Includes formatted timestamps, rollback information, and table type details

The procedure replaces multiple fragmented timeline-related procedures with a single unified interface that provides both pending and completed instant information with partition-specific metadata support.

Impact-related changelog details:

  • New procedure: show_timeline with parameters (table, path, limit, showArchived, filter, startTime, endTime)
  • Enhanced schema: 8-column output including instant_time, action, state, requested_time, inflight_time, completed_time, timeline_type, rollback_info
  • Backward compatibility: existing timeline procedures remain functional (deprecated with guidance to use the new procedure)
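For illustration, the 8-column output row described above could be modeled as a plain record. The field names come from the changelog bullet; typing every column as String is an assumption for this sketch, not the procedure's actual Spark schema:

```java
// Hypothetical row model for the 8-column show_timeline output described above.
// Field names are taken from the PR changelog; the all-String typing is an
// illustrative assumption only.
public record TimelineRow(
    String instantTime,
    String action,
    String state,          // REQUESTED, INFLIGHT, or COMPLETED
    String requestedTime,
    String inflightTime,
    String completedTime,
    String timelineType,   // e.g. "ACTIVE" or "ARCHIVED"
    String rollbackInfo) {

  public static void main(String[] args) {
    TimelineRow row = new TimelineRow(
        "20251113101500000", "commit", "COMPLETED",
        "20251113101500000", "20251113101501000", "20251113101502000",
        "ACTIVE", "");
    System.out.println(row.getClass().getRecordComponents().length);
  }
}
```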

Impact

User-facing Features:

  • Unified timeline interface replacing multiple specialized procedures
  • Advanced filtering capabilities: time-based + SQL expression filtering
  • Historical data access through archive timeline support

Performance Impact:

  • Optimized timeline scanning with proper extension filtering
  • Configurable limits (default 20 entries if no start/end time provided)
  • Archive timeline accessed only when explicitly requested
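The limit rule above can be sketched as a small hypothetical helper (not the PR's actual code): when both a start and an end time are supplied, the time range wins and no limit is applied; otherwise the result is capped at `limit` entries (default 20). Hudi instant times are yyyyMMddHHmmssSSS strings, which sort lexicographically in chronological order.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the selection rule described above:
// a fully specified [startTime, endTime] window disables the limit; otherwise
// only the first `limit` entries of the pre-sorted list are kept.
public class TimelineLimiter {
  public static final int DEFAULT_LIMIT = 20;

  public static List<String> select(List<String> sortedInstants,
                                    String startTime, String endTime, int limit) {
    if (!startTime.trim().isEmpty() && !endTime.trim().isEmpty()) {
      // Range query: keep everything inside the window, no limit applied.
      return sortedInstants.stream()
          .filter(t -> t.compareTo(startTime) >= 0 && t.compareTo(endTime) <= 0)
          .collect(Collectors.toList());
    }
    // No range: cap the result at `limit` entries.
    return sortedInstants.stream().limit(limit).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(select(List.of("001", "002", "003"), "", "", 2));
  }
}
```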

Risk Level

Low

Verification performed

  • Comprehensive test coverage: 8 focused test cases covering basic functionality, MoR tables, rollback operations, and timeline state transitions
  • Schema validation: all output fields properly typed and validated
  • Error handling: graceful handling of invalid filters, missing tables, and timeline access failures
  • Timeline consistency: proper handling of both active and archived timelines with correct state mapping

Documentation Update

  • Add show_timeline procedure to Hudi Spark SQL procedures documentation
  • Update timeline management examples to use new procedure
  • Add advanced filtering examples and use cases

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

vamshikrishnakyatham and others added 7 commits August 28, 2025 17:46
…-for-show_timeline-Procedure-with-appropriate-start-and-end-time-for-both-active-and-archive-timelines
…Support-for-show_timeline-Procedure-with-appropriate-start-and-end-time-for-both-active-and-archive-timelines
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Nov 13, 2025

// test show timeline after compaction with archived timeline
val timelineResultAfterCompactionDf = spark.sql(s"call show_timeline(table => '$tableName', showArchived => true)")
timelineResultAfterCompactionDf.show(false)

@PavithranRick PavithranRick Nov 18, 2025


show timeline result


@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Nov 18, 2025
@PavithranRick PavithranRick changed the title [HUDI-9766] Support for show_timeline Procedure with appropriate start and end time for both active and archive timelines feat: [HUDI-9766] Support for show_timeline Procedure with appropriate start and end time for both active and archive timelines Nov 19, 2025

val activeRollbackInfoMap = getRolledBackInstantInfo(metaClient.getActiveTimeline, metaClient)
val archivedRollbackInfoMap = if (showArchived) {
val archivedTimeline = metaClient.getArchivedTimeline.reload()

this line triggers loading of the whole timeline's instants twice: metaClient.getArchivedTimeline once with load mode LoadMode.ACTION, and .reload() another time with load mode LoadMode.ACTION. Note that for V2, only the FULL mode supports reading the plans (rollback, cleaning and compaction).


maybe we can just create an empty timeline there and invoke these APIs.

val limitedInstants = if (startTime.trim.nonEmpty && endTime.trim.nonEmpty) {
sortedInstants
} else {
sortedInstants.take(limit)

if we have a final limit on the timeline instants, do we still need to add limit to the timeline API?

Pavithran Ravichandiran added 2 commits November 21, 2025 13:02
…rchived timeline partial loading (stop after limit / avoid loading entire archived timeline)
@nsivabalan nsivabalan left a comment


sharing some feedback for now. will review the rest later today.


void loadCompletedInstantDetailsInMemory();

void loadCompletedInstantDetailsInMemory(String startTs, String endTs);

do we have UTs for these?

record -> {
Object action = record.get(ACTION_STATE);
return record.get(ACTION_TYPE_KEY).toString().equals(HoodieTimeline.COMPACTION_ACTION)
&& (action == null || org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT.toString().equals(action.toString()));

is the action expected to be null for a completed entry?


lets add a comment
// Older files don't have action state set.

metaClient, null, Option.empty(), LoadMode.PLAN, commitsFilter, loader);
List<HoodieInstant> collectedInstants = loader.getCollectedInstants();
List<HoodieInstant> newInstants = collectedInstants.stream()
.filter(instant -> !getInstants().contains(instant))

can we do getInstants() once outside the loop and use it here?
currently, this will result in N array lists

return instantsInRange.values()
.stream()
.flatMap(Collection::stream)
.sorted()

don't we need sorting based on instant times followed by states?
i.e. REQUESTED followed by INFLIGHT followed by COMPLETED
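The ordering this comment asks for could be sketched as a comparator. This is a hypothetical stand-in for illustration; Hudi's own comparators (such as the RequestedTimeBasedComparator mentioned later in the thread) would be the real choice:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the suggested ordering: instant time first, then
// lifecycle state REQUESTED -> INFLIGHT -> COMPLETED. Instants are modeled
// as {time, state} string pairs for illustration only.
public class InstantOrdering {
  static final List<String> STATE_ORDER = List.of("REQUESTED", "INFLIGHT", "COMPLETED");

  public static final Comparator<String[]> BY_TIME_THEN_STATE =
      Comparator.<String[], String>comparing(i -> i[0])
          .thenComparingInt(i -> STATE_ORDER.indexOf(i[1]));

  public static void main(String[] args) {
    List<String[]> instants = new ArrayList<>(List.of(
        new String[]{"001", "COMPLETED"},
        new String[]{"002", "INFLIGHT"},
        new String[]{"001", "REQUESTED"}));
    instants.sort(BY_TIME_THEN_STATE);
    instants.forEach(i -> System.out.println(i[0] + " " + i[1]));
  }
}
```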

entryList.sort(new ArchiveFileVersionComparator());

for (StoragePathInfo fs : entryList) {
if (stoppable != null && stoppable.shouldStop()) {

lets sync f2f on this. I have some questions/clarifications

try {
return LSMTimeline.getMaxInstantTime(fileName);
} catch (Exception e) {
return "";

why return EMPTY_STRING?

}
}
filteredFiles.parallelStream().forEach(fileName -> {
if (stoppable != null && stoppable.shouldStop()) {

if we are doing parallel stream, we can't guarantee latest N with limit N right?
there could be non-continuous entries


actually, we can't marry limit and parallel stream.
so, whenever we wanted to do limit, lets avoid parallel processing.
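The reviewer's point can be illustrated with a toy sketch (hypothetical names, not Hudi's actual loader): a parallel stream gives no ordering guarantee, so collecting the latest N files must be done sequentially, stopping once the limit is reached.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the suggestion: when a limit is requested, scan the
// archive files sequentially in reverse chronological order and stop early,
// instead of using parallelStream() (which cannot guarantee the latest N).
public class LimitedScan {
  public static List<String> latestN(List<String> fileNames, int limit) {
    List<String> sorted = new ArrayList<>(fileNames);
    sorted.sort(Comparator.reverseOrder()); // newest first, assuming sortable names
    List<String> out = new ArrayList<>();
    for (String f : sorted) {
      if (out.size() >= limit) {
        break; // the "stoppable" condition: we already have enough entries
      }
      out.add(f);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(latestN(List.of("f1", "f3", "f2"), 2));
  }
}
```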

)
val combinedEntries = (activeEntries ++ archivedEntries)
.sortWith((a, b) => {
val timePriorityOrder = a.getString(0).compareTo(b.getString(0))

lets use RequestedTimeBasedComparator and not come up w/ our own comparator.

})

if (startTime.trim.nonEmpty && endTime.trim.nonEmpty) {
combinedEntries

lets try to apply code reuse wherever possible


rollbackInstants.asScala.foreach { rollbackInstant =>
try {
if (rollbackInstant.isInflight) {

lets do

if (!rollbackInstant.isComplete...) {

private def getRollbackInfo(instant: HoodieInstant, timeline: HoodieTimeline, rollbackInfoMap: Map[String, List[String]], metaClient: HoodieTableMetaClient): String = {
try {
if (HoodieTimeline.ROLLBACK_ACTION.equalsIgnoreCase(instant.getAction)) {
if (instant.isInflight) {

same comment as above


import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase

class TestShowTimelineProcedure extends HoodieSparkSqlTestBase {

lets try to cover all cases listed below

both v1 and v2

  1. limit 50.
  2. both active and archived 20. where active contains 40.
  3. both active and archived 20, where active contains 10.
  4. start within active timeline.
  5. end within active timeline.
  6. start and end within active timeline.
  7. start in archived. but arch not explicitly enabled.
  8. "". archived enabled.
  9. start in archived and end in active timeline. archived not enabled.
  10. start in archived and end in active timeline. archived enabled.
  11. start and end in archived. arch not enabled.
  12. start and end in archived. arch enabled.

completed commits.
inflight commits.
completed dc (mor)
inflight dc (mor)
clean commits.
clean inflight commits.
completed compaction.
inflight compaction.
completed replace commits. -> clustering
inflight replace commits. -> clustering
completed replace commits. -> insert overwrite
inflight replace commits. -> insert overwrite
completed rollback.
pending rollback.

In the active timeline, we should have a mix of inflight and completed. In archived, we can only have completed, and that's fine.

// This way, if all archived instants are older than the active timeline's max instant,
// the archived timeline will be empty and won't load anything, avoiding unnecessary loading.
// Instead of getArchivedTimeline() which loads with LoadMode.ACTION, we use the startTs
// constructor which loads with LoadMode.METADATA, and then load specific details (PLAN for compactions).
@danny0405 danny0405 Nov 22, 2025


LoadMode.METADATA does not include the plan; are you saying LoadMode.PLAN or LoadMode.FULL?


Even if we use the constructor ArchivedTimelineV2(HoodieTableMetaClient metaClient, String startTs), the whole timeline would still be loaded with filtering. A better way to avoid the eager loading is to add a new API, TimelineFactory.createArchivedTimeline(HoodieTableMetaClient metaClient, boolean loadingInstants), and in each ArchivedTimeline add a new constructor that accepts the meta client but does not load any instants by default.

Because in the use cases of this PR, we always want a lazy loading of archived timeline and the load is triggered as needed.
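The lazy-loading idea could look roughly like this (hypothetical class names; the real change would live in TimelineFactory and the ArchivedTimeline constructors):

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the suggested lazy archived timeline: construction is
// cheap and loads nothing; instants are loaded on first access only.
public class LazyArchivedTimeline {
  private final Supplier<List<String>> loader;
  private List<String> instants; // null until first access
  private int loadCount = 0;     // for illustration: proves loading is deferred

  public LazyArchivedTimeline(Supplier<List<String>> loader) {
    this.loader = loader; // no instants loaded here
  }

  public List<String> getInstants() {
    if (instants == null) {
      loadCount++;
      instants = loader.get(); // load triggered only when actually needed
    }
    return instants;
  }

  public int loadCount() {
    return loadCount;
  }

  public static void main(String[] args) {
    LazyArchivedTimeline t = new LazyArchivedTimeline(() -> List.of("001", "002"));
    System.out.println(t.loadCount()); // nothing loaded yet
    t.getInstants();
    t.getInstants();
    System.out.println(t.loadCount()); // loaded exactly once
  }
}
```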

}

// Sort files in reverse chronological order if needed (newest first for limit queries)
if (stoppable != null && stoppable.needsReverseOrder()) {
@danny0405 danny0405 Nov 22, 2025


If we do not have a good way to plug in the limit logic simply and cleanly, maybe we just add a separate method ArchivedTimelineLoader.loadInstants with an explicit StoppableRecordConsumer param; the benefits:

  1. get rid of the null check and instance of check;
  2. always sort the files in reverse chronological order;
  3. read the files in single thread instead of in parallel.

Reading with a limit is essentially a range query instead of a full scan; by doing this, we can freely plug in the logic required for limit while keeping the basic scan query efficient and clean. We can always come back to this for better abstraction when it is necessary.

public void loadCompactionDetailsInMemory(int limit) {
loadInstantsWithLimit(limit, true,
record -> {
Object action = record.get(ACTION_STATE);

Here action should be state. We should rename it to be clear.

public void loadCompletedInstantDetailsInMemory(int limit) {
loadInstantsWithLimit(limit, true,
record -> {
Object action = record.get(ACTION_STATE);

Same here.

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build
