
@aasthabharill (Member) commented Nov 25, 2025

b/458068271
b/458070941

Problem:

Whenever a row has a column with a NULL value, reverse replication fails with the following errors:

  1. AssignShardId step in dataflow: Error fetching shard Id column: Illegal call to getter of null value
  2. DLQ entry: "error_message":"No shard identified for the record"

Fix:

marshalSpannerValues calls typed getter functions to read each column's value, but these functions throw a NullPointerException when the value is NULL. This exception is handled incorrectly and causes the errors above. The fix is to check for a NULL value at the beginning of the function itself, before any getter is called.
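A minimal sketch of the guard described above, assuming a simplified column model. FakeValue, NullSafeMarshal, marshalValue, and marshalRow are hypothetical names, not the template's actual code; FakeValue only mimics the behavior this PR describes (typed getters throw on NULL). The point is the ordering: test for NULL first and emit a JSON null, so a typed getter is never called on a NULL value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Spanner column value. It models only the
// behavior relevant to this bug: typed getters throw when the value is NULL.
final class FakeValue {
  enum Type { STRING, INT64, FLOAT64, BOOL, TIMESTAMP }

  final Type type;
  final Object raw; // a Java null represents a SQL NULL

  FakeValue(Type type, Object raw) {
    this.type = type;
    this.raw = raw;
  }

  boolean isNull() {
    return raw == null;
  }

  // Every typed getter goes through this check, mirroring a client that
  // rejects getter calls on NULL values.
  private Object checkedRaw() {
    if (raw == null) {
      throw new NullPointerException("Illegal call to getter of null value");
    }
    return raw;
  }

  String getString() { return (String) checkedRaw(); }
  long getInt64() { return (Long) checkedRaw(); }
  double getFloat64() { return (Double) checkedRaw(); }
  boolean getBool() { return (Boolean) checkedRaw(); }
  String getTimestamp() { return (String) checkedRaw(); } // ISO-8601 text
}

// Hypothetical marshaller illustrating the fix.
final class NullSafeMarshal {
  static String marshalValue(FakeValue v) {
    if (v.isNull()) {
      return "null"; // the guard: handle NULL before any typed getter runs
    }
    switch (v.type) {
      case STRING:
        return quote(v.getString());
      case TIMESTAMP:
        return quote(v.getTimestamp());
      case INT64:
        return Long.toString(v.getInt64());
      case FLOAT64:
        return Double.toString(v.getFloat64());
      case BOOL:
        return Boolean.toString(v.getBool());
      default:
        throw new IllegalArgumentException("Unsupported type: " + v.type);
    }
  }

  // Serializes a row (column name -> value) as a JSON object, preserving
  // the map's iteration order.
  static String marshalRow(Map<String, FakeValue> row) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, FakeValue> e : row.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      sb.append(quote(e.getKey())).append(':').append(marshalValue(e.getValue()));
    }
    return sb.append('}').toString();
  }

  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

With a LinkedHashMap row where ddrKey is 3002, skill is NULL, and variance is 1234.56, marshalRow returns {"ddrKey":3002,"skill":null,"variance":1234.56} instead of throwing from getFloat64.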

Tests:

  1. Unit test updated: it failed with the expected error without the fix and passed with it.
  2. Integration test updated to include NULL values in the row.

Dataflow job with a container built from the fixed code:

Template container built at gs://ea-functional-tests/templates/flex/Spanner_to_SourceDb

Ran a Dataflow job with the above container.

1. double datatype

SQL Query: column skill of type double is kept NULL, column variance of type double is non-NULL

INSERT INTO ut_scl_squad (ddrKey, gameSpaceId, ownerPersId, teammateIndex, teammatePersId, `variance`, lastUpdateTime) VALUES (3002, 17613, 172377253, 202, 1269841562, 1234.56, 2720290009);
DELETE FROM ut_scl_squad WHERE ddrKey = 3002 AND gameSpaceId = 17613 AND ownerPersId = 172377253;

2. timestamp datatype

SQL Query:

  • column createdTime of type timestamp is kept non-NULL, column squadName of type varchar is NULL
INSERT INTO ut_showoff (ddrkey, showoffId, userId, createdTime, `count`) VALUES (2002, 2002, -285444577, TIMESTAMP('2025-01-01T12:00:00Z'), 2444035295);
DELETE FROM ut_showoff WHERE ddrkey = 2002 AND showoffId = 2002;
  • column createdTime of type timestamp is kept NULL
INSERT INTO ut_showoff (ddrkey, showoffId, userId) VALUES (202, 202, -28577);
DELETE FROM ut_showoff WHERE ddrkey = 202 AND showoffId = 202;

3. Reserved keywords

SQL Query: column content of type varbinary is kept NULL

INSERT INTO sedges (ddrkey, created, `from`, `to`, `type`, `value`, flags) VALUES (2001, 2001, 'nodeA', 'nodeB', 'edgeType1', 'edgeValue1', -1326272220);
DELETE FROM sedges WHERE ddrkey = 2001 AND created = 2001;

4. Interleaving

  • INTERLEAVE IN PARENT ON DELETE CASCADE: deleting the parent row in Spanner also deleted both the parent and the child rows from SQL.
  • INTERLEAVE IN: deleting the parent row in Spanner also deleted the parent row from SQL.

5. Regression testing: Custom sharding logic

  • The flow where there is no migration_shard_id column and the pipeline uses the ddrkey value to compute the shard is not affected by this change.
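To illustrate why this flow is independent of the change, here is a hypothetical sketch of a custom shard selector that derives the shard only from the ddrkey value. The class name DdrKeyShardSelector and the modulo scheme are assumptions for illustration; the PR does not show the actual custom sharding logic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical custom shard selector. It assumes shards are picked by taking
// the ddrkey value modulo the number of shards; the real custom sharding
// logic used in this test is not shown in the PR.
final class DdrKeyShardSelector {
  private final List<String> shardIds;

  DdrKeyShardSelector(List<String> shardIds) {
    if (shardIds.isEmpty()) {
      throw new IllegalArgumentException("at least one shard is required");
    }
    this.shardIds = shardIds;
  }

  // Reads only the ddrkey column, so NULLs in any other column of the row
  // cannot change which shard is chosen.
  String shardFor(Map<String, Object> row) {
    Object key = row.get("ddrkey");
    if (key == null) {
      throw new IllegalArgumentException("ddrkey is required to pick a shard");
    }
    long k = ((Number) key).longValue();
    // floorMod keeps the index non-negative for negative keys.
    int index = (int) Math.floorMod(k, (long) shardIds.size());
    return shardIds.get(index);
  }
}
```

Under these assumptions, a row with ddrkey 3002 maps to the same shard whether its other columns are NULL or not, which is the property the regression check confirms.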

@gemini-code-assist

Summary of Changes

Hello @aasthabharill, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug related to the handling of null values when marshalling data from Google Cloud Spanner. The core change introduces a safeguard to ensure that null column values are properly processed and reflected in the transformed data, thereby improving the robustness and reliability of the data pipeline. Accompanying these changes are comprehensive unit tests designed to validate the new null-handling logic across a diverse set of data types, ensuring the fix is thorough and prevents regressions.

Highlights

  • Null Value Handling: Implemented an explicit check within the marshalSpannerValues method to correctly handle null values originating from Spanner columns, ensuring they are represented as null in the output JSON and preventing potential NullPointerException issues.
  • Expanded Test Coverage: Significantly enhanced unit test coverage for AssignShardIdFn by introducing new mock scenarios and DDL definitions to thoroughly test the handling of null values across various Spanner data types, including strings, integers, bytes, floats, booleans, and timestamps.

@codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 50.25%. Comparing base (9ad5345) to head (734dd1f).
⚠️ Report is 53 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3024      +/-   ##
============================================
+ Coverage     50.03%   50.25%   +0.22%     
- Complexity     4974     5022      +48     
============================================
  Files           967      967              
  Lines         59363    59267      -96     
  Branches       6455     6459       +4     
============================================
+ Hits          29700    29787      +87     
+ Misses        27554    27376     -178     
+ Partials       2109     2104       -5     
Components Coverage Δ
spanner-templates 70.44% <100.00%> (+0.05%) ⬆️
spanner-import-export 68.99% <ø> (-0.08%) ⬇️
spanner-live-forward-migration 79.69% <ø> (ø)
spanner-live-reverse-replication 77.07% <100.00%> (-0.03%) ⬇️
spanner-bulk-migration 88.33% <ø> (+0.09%) ⬆️
Files with missing lines Coverage Δ
...eport/v2/templates/transforms/AssignShardIdFn.java 79.23% <100.00%> (+0.63%) ⬆️

... and 25 files with indirect coverage changes


@aasthabharill changed the title from "initial changes" to "[spanner-to-sourcedb] DELETEs: Error fetching shard id column" Nov 25, 2025
@bharadwaj-aditya (Contributor) left a comment


LGTM

@VardhanThigle (Contributor) left a comment


Good catch! Thanks for the fix.

@aasthabharill aasthabharill merged commit efb574a into GoogleCloudPlatform:main Nov 28, 2025
23 checks passed