Context
The resample refactor (#428, commit d7e2446) removed the _ResampledTSDF class and instead stored resample metadata (resample_freq, resample_func) as mutable attributes on the regular TSDF class. This creates a corruption vector: any TSDF operation that chains after resample() propagates the metadata blindly through __withTransformedDF(), meaning operations like .filter(), .union(), or .withColumn() can silently invalidate the resample context while still carrying the metadata forward to .interpolate().
A proposal for a proper ResampledTSDF intermediate object was written (commit 437e3b1) but never implemented. This issue tracks restoring that pattern.
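The propagation described above can be illustrated without Spark. The following is a toy sketch, not Tempo code: `ToyTSDF`, `_with_transformed`, and the integer `freq` (bucket width in seconds) are hypothetical stand-ins for `TSDF`, `__withTransformedDF()`, and the real frequency strings.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[int, float]  # (timestamp in seconds, value)


@dataclass(frozen=True)
class ToyTSDF:
    """Hypothetical stand-in for TSDF; a list of rows replaces the DataFrame."""

    rows: List[Row]
    resample_freq: Optional[int] = None  # bucket width in seconds (assumption)

    def _with_transformed(self, rows: List[Row]) -> "ToyTSDF":
        # Mirrors the blind propagation: metadata is copied unconditionally.
        return replace(self, rows=rows)

    def resample(self, freq: int) -> "ToyTSDF":
        # Average the values that fall into each bucket of width `freq`.
        buckets: Dict[int, List[float]] = {}
        for ts, value in self.rows:
            buckets.setdefault(ts - ts % freq, []).append(value)
        rows = [(ts, sum(v) / len(v)) for ts, v in sorted(buckets.items())]
        return ToyTSDF(rows, resample_freq=freq)

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyTSDF":
        return self._with_transformed([r for r in self.rows if predicate(r)])


raw = ToyTSDF([(0, 1.0), (30, 3.0), (60, 5.0), (120, 9.0)])
resampled = raw.resample(freq=60)
filtered = resampled.filter(lambda row: row[0] != 60)

# The rows are no longer the resample output, yet the metadata still says 60.
print(filtered.rows)
print(filtered.resample_freq)
```

In this sketch nothing distinguishes `filtered` from a freshly resampled object, which is the situation a downstream `interpolate()` would be trusting.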
The Problem
Current state on v0.2-integration:
# TSDF.__init__() accepts resample metadata (tsdf.py:127-128)
resample_freq: Optional[str] = None,
resample_func: Optional[Union[Callable, str]] = None,
# resample() returns a regular TSDF with metadata attached (resample.py:452-457)
return TSDF(
    enriched_df,
    ...
    resample_freq=freq,
    resample_func=func,
)
# __withTransformedDF() blindly propagates metadata to all derived TSDFs (tsdf.py:163-164)
resample_freq=self.resample_freq,
resample_func=self.resample_func,
This means the following produces silently wrong results:
# Metadata propagates through filter — interpolate trusts stale metadata
tsdf.resample(freq="min", func="mean").filter(...).interpolate(method="linear")
Proposed Solution
Introduce a ResampledTSDF class that acts as a restricted intermediate object, following the same pattern as Apache Spark's GroupedData:
| Spark Pattern | Tempo Pattern |
|---|---|
| df.groupBy("key") → GroupedData | tsdf.resample(freq, func) → ResampledTSDF |
| GroupedData.agg(...) → DataFrame | ResampledTSDF.interpolate(...) → TSDF |
| GroupedData.filter(...) → AttributeError | ResampledTSDF.filter(...) → AttributeError |
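A minimal sketch of the restricted-intermediate pattern in plain Python, under stated assumptions: `ToyTSDF` and `ToyResampledTSDF` are hypothetical stand-ins, `freq` is an integer bucket width in seconds, and `interpolate()` does a simple forward fill rather than any of Tempo's real interpolation methods. The point is only the shape of the API: the intermediate object defines no `filter()`, so the invalid chain fails with `AttributeError`.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, float]  # (timestamp in seconds, value)


class ToyTSDF:
    """Hypothetical stand-in for TSDF; carries no resample metadata."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyTSDF":
        return ToyTSDF([r for r in self.rows if predicate(r)])

    def resample(self, freq: int) -> "ToyResampledTSDF":
        # Average the values that fall into each bucket of width `freq`.
        buckets: Dict[int, List[float]] = {}
        for ts, value in self.rows:
            buckets.setdefault(ts - ts % freq, []).append(value)
        rows = [(ts, sum(v) / len(v)) for ts, v in sorted(buckets.items())]
        return ToyResampledTSDF(ToyTSDF(rows), freq)


class ToyResampledTSDF:
    """Restricted intermediate object: only post-resample operations exist."""

    def __init__(self, tsdf: ToyTSDF, freq: int) -> None:
        self._tsdf = tsdf
        self._freq = freq  # metadata lives here and nowhere else

    def as_tsdf(self) -> ToyTSDF:
        # Explicit opt-out: the caller gets a plain object without metadata.
        return self._tsdf

    def show(self) -> None:
        for row in self._tsdf.rows:
            print(row)

    def interpolate(self) -> ToyTSDF:
        # Forward-fill every missing bucket on the resample grid (toy method).
        known = dict(self._tsdf.rows)
        if not known:
            return ToyTSDF([])
        filled: List[Row] = []
        last = known[min(known)]
        for ts in range(min(known), max(known) + self._freq, self._freq):
            last = known.get(ts, last)
            filled.append((ts, last))
        return ToyTSDF(filled)


raw = ToyTSDF([(0, 1.0), (30, 3.0), (180, 9.0)])
resampled = raw.resample(freq=60)
result = resampled.interpolate()

try:
    resampled.filter(lambda row: True)
    blocked = False
except AttributeError:
    blocked = True
```

Because `filter()` is simply absent rather than overridden to raise, the restriction is also visible to static tooling that inspects the class.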
Key Changes
- Create a ResampledTSDF class: a restricted wrapper exposing only valid post-resample operations (interpolate(), as_tsdf(), show())
- Update TSDF.resample(): return ResampledTSDF instead of TSDF
- Remove resample_freq/resample_func from TSDF.__init__(): metadata lives only on ResampledTSDF, never on TSDF
- Remove metadata propagation from __withTransformedDF(): no more stale state
Valid Usage
# Chain resample → interpolate (primary use case)
result = tsdf.resample(freq="min", func="mean").interpolate(method="linear")
# Get resampled data without interpolation
resampled = tsdf.resample(freq="min", func="mean").as_tsdf()
# Inspect before interpolating
resampled = tsdf.resample(freq="min", func="mean")
resampled.show()
result = resampled.interpolate(method="linear")
Invalid Usage (Now Prevented)
# AttributeError — operations not available on ResampledTSDF
tsdf.resample(freq="min", func="mean").filter(...)
tsdf.resample(freq="min", func="mean").withColumn(...)
# If you need those operations, finalize first (explicit opt-out of safety)
tsdf.resample(freq="min", func="mean").as_tsdf().filter(...)
Why This Matters
- Prevents silent data corruption: invalid operation chains fail loudly instead of producing wrong results
- Type safety: IDE autocompletion only shows valid operations after resample()
- Self-documenting: the class name and restricted API indicate the expected workflow
- Precedent: this is exactly how Spark handles GroupedData and for the same reasons
Git History Reference
| Commit | Description |
|---|---|
| 437e3b1 | Proposal document for ResampledTSDF intermediate object pattern |
| d7e2446 | Resample refactor (#428): removed _ResampledTSDF, added metadata attrs to TSDF |
| ec4fe38 | Original refactor that removed _ResampledTSDF class |
Implementation Checklist
- Create ResampledTSDF class (in tempo/resampled.py or tempo/resample.py)
- Update TSDF.resample() return type to ResampledTSDF
- Remove resample_freq/resample_func from TSDF.__init__() and __withTransformedDF()
- Update TSDF.interpolate() to require explicit freq/func args (called internally by ResampledTSDF.interpolate())
- Add tests for ResampledTSDF (valid chains, invalid chains, as_tsdf() escape hatch)
- Update existing resample/interpolation tests
- Update documentation and migration guide
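The test items in the checklist could take roughly the following shape. This is a sketch against stubs, not against Tempo: `StubTSDF` and `StubResampledTSDF` are hypothetical, the keyword-only `freq`/`func` signature on `interpolate()` is one possible reading of "require explicit freq/func args", and the stub only records the arguments it receives instead of interpolating anything.

```python
import unittest


class StubTSDF:
    """Hypothetical stand-in for TSDF with keyword-only interpolate args."""

    def __init__(self, rows, interpolate_args=None):
        self.rows = rows
        self.interpolate_args = interpolate_args  # recorded for assertions only

    def filter(self, predicate):
        return StubTSDF([r for r in self.rows if predicate(r)])

    def resample(self, freq, func):
        return StubResampledTSDF(StubTSDF(list(self.rows)), freq, func)

    def interpolate(self, *, freq, func, method):
        # A real implementation would fill gaps; the stub records the call.
        args = {"freq": freq, "func": func, "method": method}
        return StubTSDF(list(self.rows), interpolate_args=args)


class StubResampledTSDF:
    """Hypothetical restricted wrapper: interpolate() and as_tsdf() only."""

    def __init__(self, tsdf, freq, func):
        self._tsdf, self._freq, self._func = tsdf, freq, func

    def as_tsdf(self):
        return self._tsdf

    def interpolate(self, method):
        # Delegates with the metadata captured at resample time.
        return self._tsdf.interpolate(
            freq=self._freq, func=self._func, method=method
        )


class ResampledChainTests(unittest.TestCase):
    def setUp(self):
        self.tsdf = StubTSDF([(0, 1.0), (60, 2.0)])

    def test_valid_chain(self):
        result = self.tsdf.resample(freq="min", func="mean").interpolate(
            method="linear"
        )
        self.assertIsInstance(result, StubTSDF)
        self.assertEqual(
            result.interpolate_args,
            {"freq": "min", "func": "mean", "method": "linear"},
        )

    def test_invalid_chain(self):
        with self.assertRaises(AttributeError):
            self.tsdf.resample(freq="min", func="mean").filter(lambda r: True)

    def test_escape_hatch(self):
        plain = self.tsdf.resample(freq="min", func="mean").as_tsdf()
        self.assertEqual(plain.filter(lambda r: r[1] > 1.0).rows, [(60, 2.0)])
        # Without the wrapper, freq and func must be passed explicitly.
        with self.assertRaises(TypeError):
            plain.interpolate(method="linear")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResampledChainTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In the real test suite the stubs would be replaced by `TSDF` and `ResampledTSDF` backed by a Spark session fixture.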
Related
- Proposal doc: commit 437e3b1