[spanner-to-sourcedb] DELETEs: Error fetching shard id column #3024
Summary of Changes
Hello @aasthabharill, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical bug related to the handling of null values when marshalling data from Google Cloud Spanner. The core change introduces a safeguard to ensure that null column values are properly processed and reflected in the transformed data, thereby improving the robustness and reliability of the data pipeline. Accompanying these changes are comprehensive unit tests designed to validate the new null-handling logic across a diverse set of data types, ensuring the fix is thorough and prevents regressions.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3024      +/-   ##
============================================
+ Coverage     50.03%   50.25%   +0.22%
- Complexity     4974     5022      +48
============================================
  Files           967      967
  Lines         59363    59267      -96
  Branches       6455     6459       +4
============================================
+ Hits          29700    29787      +87
+ Misses        27554    27376     -178
+ Partials       2109     2104       -5
...ourcedb/src/main/java/com/google/cloud/teleport/v2/templates/transforms/AssignShardIdFn.java
bharadwaj-aditya
left a comment
LGTM
VardhanThigle
left a comment
Good catch! Thanks for the fix.
b/458068271
b/458070941
Problem:
Whenever a row has a column with a NULL value, reverse replication fails with the following error messages:
Error fetching shard Id column: Illegal call to getter of null value
"error_message":"No shard identified for the record"
Fix:
marshalSpannerValues calls getter functions to read the value of each column, but these functions throw a NullPointerException when the value is NULL. The exception is handled incorrectly, which causes the errors above. The fix is to check for a NULL value at the beginning of the function itself, before any getter is called.
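The guard described in the fix can be sketched roughly as follows. This is not the code from the PR: ColumnValue, NullSafeMarshal, marshalFloat64 and marshalRow are hypothetical stand-ins written without the Spanner client library, and the getter simply imitates the failure described above. The sketch only illustrates the ordering, with the NULL check placed before any typed getter is called.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Spanner column value; NOT the real client API.
final class ColumnValue {
    private final Object raw; // a Java null models a Spanner NULL

    ColumnValue(Object raw) {
        this.raw = raw;
    }

    boolean isNull() {
        return raw == null;
    }

    // Imitates a typed getter that refuses to read a NULL value.
    double getFloat64() {
        if (raw == null) {
            throw new NullPointerException("Illegal call to getter of null value");
        }
        return (Double) raw;
    }
}

// Hypothetical marshalling helper; the names are assumptions for illustration.
final class NullSafeMarshal {
    // Checks for NULL first, so the typed getter is never reached for a NULL column.
    static Object marshalFloat64(ColumnValue value) {
        if (value.isNull()) {
            return null; // propagate NULL instead of throwing
        }
        return value.getFloat64();
    }

    // Marshals every column of a row, keeping NULL columns in the output.
    static Map<String, Object> marshalRow(Map<String, ColumnValue> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, ColumnValue> entry : row.entrySet()) {
            out.put(entry.getKey(), marshalFloat64(entry.getValue()));
        }
        return out;
    }
}
```

With a row where skill is NULL and variance is set, as in the double datatype test below, marshalRow keeps both keys and maps skill to null rather than failing.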
Tests:
Dataflow job with a container built from the fixed code.
Template container built in gs://ea-functional-tests/templates/flex/Spanner_to_SourceDb
Ran a Dataflow job with the above container.
1. double datatype
SQL Query: column skill of type double is kept NULL, column variance of type double is non-NULL
2. timestamp datatype
SQL Query:
3. Reserved Keywords
SQL Query: column content of type varbinary is kept NULL
4. INTERLEAVING
5. Regression testing: Custom sharding logic