perf: Fetch table columns once in prepare_table instead of per column#3627
perf: Fetch table columns once in prepare_table instead of per column#3627edgarrmondragon wants to merge 4 commits into
Conversation
Resolves #2353. Previously `prepare_table` made 2N round-trips to the database per schema sync — one in `column_exists` and one in `_get_column_type` for each of N columns. Now `get_table_columns` is called once and the result is threaded through `prepare_column` and `_adapt_column_type` via new optional keyword parameters (`existing_columns` and `current_type` respectively), reducing to a single query per `prepare_table` call. Both new parameters default to `None`, preserving full backward compatibility for subclasses that override these methods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer's GuideOptimize SQLConnector schema preparation by fetching table column metadata once in prepare_table and threading it through prepare_column/_adapt_column_type via new optional parameters, reducing redundant database round-trips while preserving backward-compatible call signatures and adding focused tests for the new behavior. Sequence diagram for optimized prepare_table column metadata fetchingsequenceDiagram
participant SQLConnector
participant Database
SQLConnector->>Database: get_table_columns(full_table_name)
Database-->>SQLConnector: existing_columns
loop for each property in schema.properties
SQLConnector->>SQLConnector: prepare_column(full_table_name, column_name, sql_type, existing_columns=existing_columns)
alt existing_columns is provided
SQLConnector->>SQLConnector: existing_column = existing_columns.get(column_name)
alt existing_column is None
SQLConnector->>SQLConnector: _create_empty_column(full_table_name, column_name, sql_type)
else existing_column exists
SQLConnector->>SQLConnector: _adapt_column_type(full_table_name, column_name=column_name, sql_type=sql_type, current_type=existing_column.type)
opt current_type is None
SQLConnector->>Database: _get_column_type(full_table_name, column_name)
Database-->>SQLConnector: current_type
end
end
else existing_columns is None
SQLConnector->>Database: get_table_columns(full_table_name)
Database-->>SQLConnector: existing_columns
end
end
File-Level Changes
Assessment against linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Documentation build overview
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3627 +/- ##
=======================================
Coverage 93.86% 93.86%
=======================================
Files 73 73
Lines 5948 5953 +5
Branches 729 731 +2
=======================================
+ Hits 5583 5588 +5
Misses 271 271
Partials 94 94
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
prepare_column, the control flow aroundexisting_columnsis a bit complex and duplicates the_create_empty_column/_adapt_column_typelogic; consider normalizingexisting_columns(e.g., always using a mapping and a single branch) to keep the method simpler and reduce the chance of future divergence between the two paths. - When using
existing_columns[column_name].typeinprepare_column, it might be safer to useexisting_columns.get(column_name)and handle a missing entry explicitly, to avoid surprisingKeyErrors ifexisting_columnsis stale or uses different column-name casing thanschema['properties'].
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `prepare_column`, the control flow around `existing_columns` is a bit complex and duplicates the `_create_empty_column`/`_adapt_column_type` logic; consider normalizing `existing_columns` (e.g., always using a mapping and a single branch) to keep the method simpler and reduce the chance of future divergence between the two paths.
- When using `existing_columns[column_name].type` in `prepare_column`, it might be safer to use `existing_columns.get(column_name)` and handle a missing entry explicitly, to avoid surprising `KeyError`s if `existing_columns` is stale or uses different column-name casing than `schema['properties']`.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Add two tests for the case where prepare_column is called without a pre-fetched existing_columns dict (the legacy call-site path): - column missing → _create_empty_column is called - column present → _adapt_column_type is called, _create_empty_column is not Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ront Address review feedback: - Eliminate the duplicated _create_empty_column/_adapt_column_type logic by resolving existing_columns to a real dict at the top of the method (fetching from the DB only when None), then using a single unified branch. - Use .get() instead of direct dict access so a missing or differently-cased column name never raises a surprising KeyError. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Adding the new keyword-only parameters
existing_columnsandcurrent_typeand then always passing them fromprepare_table/prepare_columnwill raiseTypeErrorfor any external subclass that overridesprepare_columnor_adapt_column_typewithout those keywords; consider making these kwargs optional via**kwargsat the call site or avoiding keyword-only parameters to preserve true backward compatibility. - For
existing_columnsyou may want to type it as aMapping[str, sa.Column]instead of a concretedict[str, sa.Column]to allow more flexible inputs (e.g., ordered dicts or other mapping-like containers) without narrowing the API.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Adding the new keyword-only parameters `existing_columns` and `current_type` and then always passing them from `prepare_table` / `prepare_column` will raise `TypeError` for any external subclass that overrides `prepare_column` or `_adapt_column_type` without those keywords; consider making these kwargs optional via `**kwargs` at the call site or avoiding keyword-only parameters to preserve true backward compatibility.
- For `existing_columns` you may want to type it as a `Mapping[str, sa.Column]` instead of a concrete `dict[str, sa.Column]` to allow more flexible inputs (e.g., ordered dicts or other mapping-like containers) without narrowing the API.
## Individual Comments
### Comment 1
<location path="singer_sdk/sql/connector.py" line_range="1478-1479" />
<code_context>
full_table_name: str | FullyQualifiedName,
column_name: str,
sql_type: sqlalchemy.types.TypeEngine,
+ *,
+ existing_columns: dict[str, sa.Column] | None = None,
) -> None:
"""Adapt target table to provided schema if possible.
</code_context>
<issue_to_address>
**suggestion:** Use a more general type than `dict` for `existing_columns` to keep the API flexible.
Because `prepare_column` only calls `.get()` on `existing_columns`, it doesn’t need a concrete `dict`. Typing this as `Mapping[str, sa.Column] | None` (with `Mapping` from `collections.abc`) better describes the required interface and allows other mapping implementations.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
…lumn` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #2353.
Problem
SQLConnector.prepare_tablepreviously made 2N database round-trips per schema sync, where N is the number of columns:prepare_column→column_exists→get_table_columns(one query per column to check existence)_adapt_column_type→_get_column_type→get_table_columns(another query per column to read the current type before deciding whether to alter it)For a table with 100 columns this means 200 queries just to prepare the schema. This shows up noticeably on high-latency connections.
Solution
prepare_tablenow callsget_table_columnsonce before the column loop and passes the result through the call chain via two new optional keyword-only parameters:prepare_column(..., *, existing_columns: dict[str, sa.Column] | None = None)— when provided, uses the pre-fetched mapping to decide whether the column exists without an extra query._adapt_column_type(..., *, current_type: sqlalchemy.types.TypeEngine | None = None)— when provided, skips the_get_column_typelookup entirely.Both parameters default to
None, so the old per-query behaviour is preserved when calling these methods directly (full backward compatibility — no changes needed in subclasses that only overrideprepare_columnor_adapt_column_type).Migration guide
No action required for most targets. The optimization is applied automatically by
prepare_table.If your target overrides
prepare_columnand performs its own column-existence check that hits the database, you can opt in to the batch result to avoid the redundant queries:Similarly, if you override
_adapt_column_typeand call_get_column_typeinside, you can accept the optionalcurrent_typeparameter to skip the lookup when the caller already has the type:Test plan
test_adapt_column_type— existing test, unmodified, still passes (old call-site withoutcurrent_type)test_adapt_column_type_skips_lookup_when_current_type_provided— new: asserts_get_column_typeis not called whencurrent_typeis suppliedtest_prepare_table_fetches_columns_once— new: assertsget_table_columnsis called exactly once across a multi-columnprepare_tableinvocationtests/sql/suite — 132 passed, no regressions🤖 Generated with Claude Code
Summary by Sourcery
Reduce database round-trips during SQL table preparation by reusing pre-fetched column metadata across column operations.
Enhancements:
Tests: