-
Notifications
You must be signed in to change notification settings - Fork 190
Delta Kernel Draft PR #729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelConversionSourceProvider.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelSchemaExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaSchemaExtractor.java
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelConversionSource.java
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelDataFileExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelDataFileExtractor.java
Outdated
Show resolved
Hide resolved
vinishjail97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great progress @vaibhavk1992, added some comments.
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelActionsConverter.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/ITDeltaKernelConversionSource.java
Show resolved
Hide resolved
|
@the-other-tim-brown @vinishjail97 @rahil-c All the comments have been addressed and the build is passing too. |
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelActionsConverter.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/TestDeltaKernelStatsExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/DeltaTableKernel.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/ITDeltaKernelConversionSource.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/TestDeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/TestDeltaKernelSchemaExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
| myScan.getScanFiles(engine, includeColumnStats); | ||
|
|
||
| List<InternalDataFile> dataFiles = new ArrayList<>(); | ||
| while (scanFiles.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is eagerly pulling all the files into memory instead of using the iterator pattern. Can the code be updated to iterate through these values when next is called instead of eagerly materializing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-other-tim-brown
the scanFiles.hasNext() Returns true if the iteration has more elements. (In other words, returns true if next would return an element rather than throwing an exception.). I am not sure what do you mean by materializing here and what we have to do exactly.
I have followed this convention from the suggestion by the doc form kernel community. Please check this.
https://github.com/delta-io/delta/blob/master/kernel/USER_GUIDE.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that the DeltaDataFileIterator is an iterator so we don't need to compute all the values in the constructor. We can instead compute the next value when next is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the code, please check if that make sense. @the-other-tim-brown
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelIncrementalChangesState.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelIncrementalChangesState.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelIncrementalChangesState.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelConversionSource.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
|
@vaibhavk1992 make sure to fill out the PR template |
Important Read
What is the purpose of the pull request
This draft introduces the Delta Kernel to read the tables instead of using spark sessions.
Brief change log
Verify this pull request
This pull request is a trivial rework , added unit tests, integration test to verify all the edge cases
-Unit Tests
-TestDeltaKernelSchemaExtractor
-TestDeltaKernelPartitionExtractor
-TestDeltaKernelStatsExtractor
Tested statistics extraction
Validated range and value statistics
Tested null count handling
Added integration tests for end-to-end.