Releases · apache/hudi

Release 0.4.0

Highlights

Spark datasource API now supported for Copy-On-Write datasets, across all views
BloomIndex can now prune based on key ranges, cutting index tagging time dramatically for time-prefixed/ordered record keys
Hive sync tool now registers both RO (read-optimized) and RT (real-time) tables
Client application can now specify the partitioner to be used by bulkInsert(), useful for low-level control over initial record placement
Framework for metadata tracking inside IO handles, used to implement Spark accumulator-style counters that are consistent with the timeline
Bug fixes around cleaning, savepoints & upsert's partitioner.
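
The key-range pruning highlight can be illustrated with a small, self-contained sketch. This is not Hudi's implementation: the `FileRange` type and the `candidate_files` function are hypothetical names, and the sketch assumes each file records the minimum and maximum record key it holds, so that a lookup only has to consult the Bloom filters of files whose range covers the key.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileRange:
    """Hypothetical per-file metadata: smallest and largest record key stored."""
    file_id: str
    min_key: str
    max_key: str


def candidate_files(record_key: str, files: List[FileRange]) -> List[str]:
    """Return ids of files whose [min_key, max_key] range could contain the key.

    Only these files would need a Bloom filter check; the rest are pruned.
    """
    return [f.file_id for f in files if f.min_key <= record_key <= f.max_key]


# Time-prefixed keys keep per-file ranges narrow and mostly non-overlapping.
files = [
    FileRange("file-1", "20170101-000", "20170101-999"),
    FileRange("file-2", "20170102-000", "20170102-999"),
    FileRange("file-3", "20170103-000", "20170103-999"),
]

print(candidate_files("20170102-417", files))
```

With random keys such as UUIDs, every file's range tends to span most of the key space, so a range check like this prunes very little. That is why the benefit is called out for time-prefixed or ordered keys.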

Full PR List

@gekath - Writes relative paths to .commit files #184
@kaushikd49 - Correct clean bug that causes exception when partitionPaths are empty #202
@vinothchandar - Refactor HoodieTableFileSystemView using FileGroups & FileSlices #201
@prazanna - Savepoint should not create a hole in the commit timeline #207
@jianxu - Fix TimestampBasedKeyGenerator in HoodieDeltaStreamer when DATE_STRING is used #211
@n3nash - Sync Tool registers 2 tables, RO and RT Tables #210
@n3nash - Using FsUtils instead of Files API to extract file extension #213
@vinothchandar - Edits to documentation #219
@n3nash - Enabled deletes in merge_on_read #218
@n3nash - Use HoodieLogFormat for the commit archived log #205
@n3nash - fix for cleaning log files in master branch (mor) #228
@vinothchandar - Adding range based pruning to bloom index #232
@n3nash - Use CompletedFileSystemView instead of CompactedView considering deltacommits too #229
@n3nash - suppressing logs (under 4MB) for jenkins #240
@jianxu - Add nested fields support for MOR tables #234
@n3nash - adding new config to separate shuffle and write parallelism #230
@n3nash - adding ability to read archived files written in log format #252
@ovj - Removing randomization from UpsertPartitioner #253
@ovj - Replacing SortBy with custom partitioner #245
@esmioley - Update deprecated hash function #259
@vinothchandar - Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex #268
@kaushikd49 - Hoodie Event callbacks #251
@vinothchandar - Spark Data Source (finally) #266

hoodie-0.3.8 (MOR) MVP

Highlights

Merge on Read tested end to end: ingestion, Hive registration, and querying of non-nested fields
Contributions from @kaushikd49 @n3nash @dannyhchen @zqureshi @vinothchandar and @prazanna

New Features

#149 Introduce custom log format (HoodieLogFormat) for the log files
#141 Introduce Compaction Strategies for Merge on Read table and implement UnboundedCompactionStrategy and IOBoundedCompactionStrategy
#42 Implement HoodieRealtimeInputFormat and HoodieRealtimeRecordReader
#150 Rewrite hoodie-hive to incrementally sync partitions based on the last commit that was successfully synced
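
The difference between an unbounded and an I/O-bounded compaction strategy can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind UnboundedCompactionStrategy or IOBoundedCompactionStrategy: `CompactionOp`, `select_unbounded`, `select_io_bounded`, and the `budget_mb` parameter are hypothetical, and the sketch assumes the bounded strategy accepts pending operations in order until an I/O budget would be exceeded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompactionOp:
    """Hypothetical unit of compaction work for one file slice."""
    file_id: str
    io_mb: int  # estimated I/O (read + write) needed to compact, in MB


def select_unbounded(ops: List[CompactionOp]) -> List[CompactionOp]:
    """Unbounded: compact everything that is pending."""
    return list(ops)


def select_io_bounded(ops: List[CompactionOp], budget_mb: int) -> List[CompactionOp]:
    """I/O bounded: take operations in order until the budget would be exceeded."""
    selected: List[CompactionOp] = []
    used = 0
    for op in ops:
        if used + op.io_mb > budget_mb:
            break
        selected.append(op)
        used += op.io_mb
    return selected


pending = [CompactionOp("f1", 400), CompactionOp("f2", 300), CompactionOp("f3", 500)]
print([op.file_id for op in select_io_bounded(pending, budget_mb=800)])
```

Operations left out by the bounded selection simply stay pending for a later compaction run, which trades query-side merge cost for a predictable amount of I/O per run.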

Changes

#168 - Handle skew in time taken to clean
Updated community committership guidelines
Add GCS support
Add S3 support
Support for viewFS

Commits: 21e334...4b26be
