perf: Fetch table columns once in prepare_table instead of per column by edgarrmondragon · Pull Request #3627 · meltano/sdk

edgarrmondragon · 2026-05-12T17:18:24Z

Closes #2353.

Problem

SQLConnector.prepare_table previously made 2N database round-trips per schema sync, where N is the number of columns:

prepare_column → column_exists → get_table_columns (one query per column to check existence)
_adapt_column_type → _get_column_type → get_table_columns (another query per column to read the current type before deciding whether to alter it)

For a table with 100 columns this means 200 queries just to prepare the schema. This shows up noticeably on high-latency connections.

Solution

prepare_table now calls get_table_columns once before the column loop and passes the result through the call chain via two new optional keyword-only parameters:

prepare_column(..., *, existing_columns: dict[str, sa.Column] | None = None) — when provided, uses the pre-fetched mapping to decide whether the column exists without an extra query.
_adapt_column_type(..., *, current_type: sqlalchemy.types.TypeEngine | None = None) — when provided, skips the _get_column_type lookup entirely.

Both parameters default to None, so calling these methods directly keeps the old per-column query behaviour. This is meant to preserve backward compatibility: subclasses that only override prepare_column or _adapt_column_type should need no changes.
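To make the round-trip arithmetic concrete, here is a minimal sketch that counts simulated metadata queries in both flows. `ToyConnector` and its method names are hypothetical stand-ins, not the SDK's `SQLConnector`; column types are plain strings rather than SQLAlchemy types.

```python
class ToyConnector:
    """Toy model of the two call chains (hypothetical, not the SDK class)."""

    def __init__(self, columns: dict[str, str]) -> None:
        self._columns = dict(columns)  # column name -> type name
        self.queries = 0  # counts simulated database round-trips

    def get_table_columns(self) -> dict[str, str]:
        self.queries += 1
        return dict(self._columns)

    def _adapt(self, name: str, sql_type: str, current_type: str) -> None:
        # A real connector would decide here whether to ALTER the column.
        if current_type != sql_type:
            self._columns[name] = sql_type

    def prepare_table_per_column(self, schema: dict[str, str]) -> None:
        """Old flow: one lookup for existence, one more for the type."""
        for name, sql_type in schema.items():
            if name not in self.get_table_columns():
                self._columns[name] = sql_type
                continue
            current_type = self.get_table_columns()[name]
            self._adapt(name, sql_type, current_type)

    def prepare_table_batched(self, schema: dict[str, str]) -> None:
        """New flow: fetch once, then reuse the mapping for every column."""
        existing = self.get_table_columns()
        for name, sql_type in schema.items():
            current_type = existing.get(name)
            if current_type is None:
                self._columns[name] = sql_type
                continue
            self._adapt(name, sql_type, current_type)


schema = {f"col_{i}": "VARCHAR" for i in range(100)}

old = ToyConnector(schema)
old.prepare_table_per_column(schema)

new = ToyConnector(schema)
new.prepare_table_batched(schema)

print(old.queries, new.queries)  # 200 1
```

The sketch only models the case where every column already exists, which is the case that costs two lookups per column in the old flow.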

Migration guide

No action required for most targets. The optimization is applied automatically by prepare_table.

If your target overrides prepare_column and performs its own column-existence check that hits the database, you can opt in to the batch result to avoid the redundant queries:

# Before — still works, but makes one DB query per column
class MyConnector(SQLConnector):
    def prepare_column(self, full_table_name, column_name, sql_type):
        if not self.column_exists(full_table_name, column_name):
            self._create_empty_column(full_table_name, column_name, sql_type)
            return
        self._adapt_column_type(full_table_name, column_name=column_name, sql_type=sql_type)

# After — accepts the pre-fetched columns dict to avoid the extra query
class MyConnector(SQLConnector):
    def prepare_column(
        self,
        full_table_name,
        column_name,
        sql_type,
        *,
        existing_columns: dict | None = None,
    ):
        if existing_columns is None:
            existing_columns = self.get_table_columns(full_table_name)
        if column_name not in existing_columns:
            self._create_empty_column(full_table_name, column_name, sql_type)
            return
        self._adapt_column_type(
            full_table_name,
            column_name=column_name,
            sql_type=sql_type,
            current_type=existing_columns[column_name].type,
        )

Similarly, if you override _adapt_column_type and call _get_column_type inside, you can accept the optional current_type parameter to skip the lookup when the caller already has the type:

class MyConnector(SQLConnector):
    def _adapt_column_type(
        self,
        full_table_name,
        column_name,
        sql_type,
        *,
        current_type=None,
    ):
        if current_type is None:
            current_type = self._get_column_type(full_table_name, column_name)
        # ... rest of your logic

Test plan

test_adapt_column_type — existing test, unmodified, still passes (old call-site without current_type)
test_adapt_column_type_skips_lookup_when_current_type_provided — new: asserts _get_column_type is not called when current_type is supplied
test_prepare_table_fetches_columns_once — new: asserts get_table_columns is called exactly once across a multi-column prepare_table invocation
Full tests/sql/ suite — 132 passed, no regressions
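For readers who want to see the shape of the "fetched exactly once" assertion, the following is a rough sketch using `unittest.mock`. It runs against a hypothetical `ToyConnector` rather than the real `SQLConnector`, so the table name, schema and method bodies are assumptions made for illustration; the actual test in tests/sql/ may be structured differently.

```python
from unittest import mock


class ToyConnector:
    """Hypothetical stand-in; the real test exercises SQLConnector."""

    def get_table_columns(self, full_table_name):
        return {"id": "INTEGER", "name": "VARCHAR"}

    def prepare_column(
        self, full_table_name, column_name, sql_type, existing_columns=None
    ):
        if existing_columns is None:
            existing_columns = self.get_table_columns(full_table_name)
        return column_name in existing_columns

    def prepare_table(self, full_table_name, schema):
        existing_columns = self.get_table_columns(full_table_name)
        for column_name, sql_type in schema.items():
            self.prepare_column(
                full_table_name,
                column_name,
                sql_type,
                existing_columns=existing_columns,
            )


def test_prepare_table_fetches_columns_once():
    connector = ToyConnector()
    schema = {"id": "INTEGER", "name": "VARCHAR", "email": "VARCHAR"}
    # Wrap the real method so it still returns data while calls are recorded.
    with mock.patch.object(
        connector, "get_table_columns", wraps=connector.get_table_columns
    ) as spy:
        connector.prepare_table("users", schema)
    spy.assert_called_once_with("users")


test_prepare_table_fetches_columns_once()
```

Using `wraps=` keeps the original return value, so the loop still sees realistic column data while the spy records how often the lookup ran.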

🤖 Generated with Claude Code

Summary by Sourcery

Reduce database round-trips during SQL table preparation by reusing pre-fetched column metadata across column operations.

Enhancements:

Prefetch table columns once in prepare_table and reuse them across prepare_column calls to avoid redundant existence checks.
Extend prepare_column and _adapt_column_type to accept optional existing_columns and current_type parameters while preserving backward-compatible behavior.

Tests:

Add tests to verify _adapt_column_type skips type lookups when current_type is provided and that prepare_table fetches columns only once.
Add tests to ensure prepare_column creates missing columns and adapts existing ones correctly under the new behavior.

Resolves #2353.

Previously `prepare_table` made 2N round-trips to the database per schema sync: one in `column_exists` and one in `_get_column_type` for each of N columns. Now `get_table_columns` is called once and the result is threaded through `prepare_column` and `_adapt_column_type` via new optional keyword parameters (`existing_columns` and `current_type` respectively), reducing to a single query per `prepare_table` call. Both new parameters default to `None`, preserving full backward compatibility for subclasses that override these methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sourcery-ai · 2026-05-12T17:18:32Z

Reviewer's Guide

Optimize SQLConnector schema preparation by fetching table column metadata once in prepare_table and threading it through prepare_column/_adapt_column_type via new optional parameters, reducing redundant database round-trips while preserving backward-compatible call signatures and adding focused tests for the new behavior.

Sequence diagram for optimized prepare_table column metadata fetching

sequenceDiagram
    participant SQLConnector
    participant Database

    SQLConnector->>Database: get_table_columns(full_table_name)
    Database-->>SQLConnector: existing_columns

    loop for each property in schema.properties
        SQLConnector->>SQLConnector: prepare_column(full_table_name, column_name, sql_type, existing_columns=existing_columns)
        alt existing_columns is provided
            SQLConnector->>SQLConnector: existing_column = existing_columns.get(column_name)
            alt existing_column is None
                SQLConnector->>SQLConnector: _create_empty_column(full_table_name, column_name, sql_type)
            else existing_column exists
                SQLConnector->>SQLConnector: _adapt_column_type(full_table_name, column_name=column_name, sql_type=sql_type, current_type=existing_column.type)
                opt current_type is None
                    SQLConnector->>Database: _get_column_type(full_table_name, column_name)
                    Database-->>SQLConnector: current_type
                end
            end
        else existing_columns is None
            SQLConnector->>Database: get_table_columns(full_table_name)
            Database-->>SQLConnector: existing_columns
        end
    end

File-Level Changes

Change	Details	Files
Fetch table column metadata once in prepare_table and reuse it across all column preparations.	Call get_table_columns once at the start of prepare_table before iterating over schema properties. Pass the fetched columns mapping into each prepare_column invocation to avoid per-column metadata lookups.	`singer_sdk/sql/connector.py`
Extend prepare_column to accept an optional existing_columns mapping and use it to decide between column creation and type adaptation without extra queries.	Add keyword-only existing_columns parameter to prepare_column with a default of None to keep the public API backward compatible. When existing_columns is None, lazily fetch table columns via get_table_columns; otherwise reuse the provided mapping. Determine column existence via existing_columns.get and call _create_empty_column for missing columns or _adapt_column_type for existing ones, passing the existing column's type as current_type.	`singer_sdk/sql/connector.py`
Allow _adapt_column_type to accept an optional current_type to skip redundant type lookups.	Add keyword-only current_type parameter to _adapt_column_type with a default of None. If current_type is not provided, fall back to calling _get_column_type as before, preserving behavior for existing callers. Thread current_type from prepare_column when the existing column metadata is available.	`singer_sdk/sql/connector.py`
Add tests covering the new optional parameters and ensuring only a single metadata fetch per prepare_table call.	Add test_adapt_column_type_skips_lookup_when_current_type_provided to assert _get_column_type is not called when current_type is passed. Add test_prepare_table_fetches_columns_once to assert get_table_columns is called exactly once during multi-column prepare_table. Add tests ensuring prepare_column creates missing columns and adapts existing ones, validating behavior with the new code path.	`tests/sql/test_connector.py`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#2353	Optimize SQLConnector.prepare_table so that it fetches table column metadata once and reuses it for all columns, reducing database round-trips from O(n) per schema to O(1).	✅
#2353	Preserve existing behaviour and backwards compatibility of prepare_column and related methods while enabling the batched column existence/type checks for all SQLConnector-based targets.	✅

read-the-docs-community · 2026-05-12T17:19:32Z

Documentation build overview

📚 Meltano SDK | 🛠️ Build #32659943 | 📁 Comparing 59283c6 against latest (8373aec)

🔍 Preview build

1 file changed

± classes/singer_sdk.sql.SQLConnector.html

codecov · 2026-05-12T17:20:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.86%. Comparing base (8373aec) to head (59283c6).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3627   +/-   ##
=======================================
  Coverage   93.86%   93.86%           
=======================================
  Files          73       73           
  Lines        5948     5953    +5     
  Branches      729      731    +2     
=======================================
+ Hits         5583     5588    +5     
  Misses        271      271           
  Partials       94       94

Flag	Coverage Δ
core	`82.36% <100.00%> (-0.01%)`	⬇️
end-to-end	`75.20% <71.42%> (-0.07%)`	⬇️
optional-components	`42.76% <0.00%> (-0.04%)`	⬇️

codspeed-hq · 2026-05-12T17:21:18Z

Merging this PR will not alter performance

✅ 8 untouched benchmarks

Comparing perf/prepare-table-batch-column-fetch (59283c6) with main (fa54c4c) [1]

[1] No successful run was found on main (8373aec) during the generation of this report, so fa54c4c was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

sourcery-ai

Hey - I've left some high level feedback:

In prepare_column, the control flow around existing_columns is a bit complex and duplicates the _create_empty_column/_adapt_column_type logic; consider normalizing existing_columns (e.g., always using a mapping and a single branch) to keep the method simpler and reduce the chance of future divergence between the two paths.
When using existing_columns[column_name].type in prepare_column, it might be safer to use existing_columns.get(column_name) and handle a missing entry explicitly, to avoid surprising KeyErrors if existing_columns is stale or uses different column-name casing than schema['properties'].
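The second point can be illustrated with a plain dictionary. This is only a sketch: the column names, the casing mismatch and the "create"/"adapt" labels are made up for the example, and type names are strings instead of SQLAlchemy objects.

```python
# Column metadata as a hypothetical database might report it.
existing_columns = {"ID": "INTEGER", "Name": "VARCHAR"}
# The same column as spelled in the stream schema.
column_name = "name"

# Direct indexing raises when the key is absent.
try:
    existing_columns[column_name]
except KeyError:
    indexing_failed = True
else:
    indexing_failed = False

# .get() returns None instead, which the caller can handle explicitly.
existing = existing_columns.get(column_name)
action = "create" if existing is None else "adapt"

print(indexing_failed, action)  # True create
```

Whether creating the column is the right reaction to a casing mismatch depends on the database; the sketch only shows that the lookup no longer raises.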

Add two tests for the case where prepare_column is called without a pre-fetched existing_columns dict (the legacy call-site path):
- column missing → _create_empty_column is called
- column present → _adapt_column_type is called, _create_empty_column is not

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ront

Address review feedback:
- Eliminate the duplicated _create_empty_column/_adapt_column_type logic by resolving existing_columns to a real dict at the top of the method (fetching from the DB only when None), then using a single unified branch.
- Use .get() instead of direct dict access so a missing or differently-cased column name never raises a surprising KeyError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

edgarrmondragon · 2026-05-12T17:27:58Z

@sourcery-ai review

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

Adding the new keyword-only parameters existing_columns and current_type and then always passing them from prepare_table / prepare_column will raise TypeError for any external subclass that overrides prepare_column or _adapt_column_type without those keywords; consider making these kwargs optional via **kwargs at the call site or avoiding keyword-only parameters to preserve true backward compatibility.
For existing_columns you may want to type it as a Mapping[str, sa.Column] instead of a concrete dict[str, sa.Column] to allow more flexible inputs (e.g., ordered dicts or other mapping-like containers) without narrowing the API.
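The compatibility concern in the first point can be reproduced without the SDK. In this sketch `Base`, `LegacyTarget` and `TolerantTarget` are hypothetical classes; the `**kwargs` override is one possible way a subclass could insulate itself and is an assumption of this example, not something the PR prescribes.

```python
class Base:
    """Hypothetical stand-in for the SDK base class."""

    def prepare_column(
        self, table, column_name, sql_type, *, existing_columns=None
    ):
        return "base"

    def prepare_table(self, table, schema):
        existing_columns = {}  # pretend this came from one metadata query
        for column_name, sql_type in schema.items():
            # The base class always forwards the new keyword.
            self.prepare_column(
                table, column_name, sql_type, existing_columns=existing_columns
            )


class LegacyTarget(Base):
    # Override written before the new parameter existed.
    def prepare_column(self, table, column_name, sql_type):
        return "legacy"


class TolerantTarget(Base):
    # Accepts and ignores keywords it does not know about.
    def prepare_column(self, table, column_name, sql_type, **kwargs):
        return "tolerant"


error = None
try:
    LegacyTarget().prepare_table("users", {"id": "INTEGER"})
except TypeError as exc:
    error = str(exc)

# This call completes because the override swallows the extra keyword.
TolerantTarget().prepare_table("users", {"id": "INTEGER"})

print(error)
```

The legacy override fails as soon as the base class forwards `existing_columns`, even though the parameter has a default in the base signature.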

## Individual Comments

### Comment 1
<location path="singer_sdk/sql/connector.py" line_range="1478-1479" />
<code_context>
         full_table_name: str | FullyQualifiedName,
         column_name: str,
         sql_type: sqlalchemy.types.TypeEngine,
+        *,
+        existing_columns: dict[str, sa.Column] | None = None,
     ) -> None:
         """Adapt target table to provided schema if possible.
</code_context>
<issue_to_address>
**suggestion:** Use a more general type than `dict` for `existing_columns` to keep the API flexible.

Because `prepare_column` only calls `.get()` on `existing_columns`, it doesn’t need a concrete `dict`. Typing this as `Mapping[str, sa.Column] | None` (with `Mapping` from `collections.abc`) better describes the required interface and allows other mapping implementations.
</issue_to_address>
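As a small illustration of why the broader annotation helps, the sketch below accepts both a plain dict and a read-only `MappingProxyType`. The helper `find_column` is hypothetical, and type names are plain strings standing in for `sa.Column` objects.

```python
from __future__ import annotations

from collections.abc import Mapping
from types import MappingProxyType


def find_column(
    column_name: str, existing_columns: Mapping[str, str]
) -> str | None:
    """Return the stored type name, or None if the column is absent."""
    # Only .get() is needed, which every Mapping provides.
    return existing_columns.get(column_name)


plain = {"id": "INTEGER"}
read_only = MappingProxyType(plain)  # a Mapping that is not a dict

print(find_column("id", plain), find_column("id", read_only))
print(find_column("email", read_only))
```

A static type checker would reject the `MappingProxyType` argument if the parameter were annotated as `dict[str, str]`, although the code would behave the same at runtime.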

edgarrmondragon and others added 2 commits May 12, 2026 11:23

refactor: Use Mapping type and non-keyword-only param for `prepare_column` (59283c6)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

edgarrmondragon wants to merge 4 commits into main from perf/prepare-table-batch-column-fetch
