
perf: Fetch table columns once in prepare_table instead of per column #3627

Open
edgarrmondragon wants to merge 4 commits into main from perf/prepare-table-batch-column-fetch

Conversation

@edgarrmondragon

@edgarrmondragon edgarrmondragon commented May 12, 2026

Collaborator

Closes #2353.

Problem

SQLConnector.prepare_table previously made 2N database round-trips per schema sync, where N is the number of columns:

  1. prepare_column → column_exists → get_table_columns (one query per column to check existence)
  2. _adapt_column_type → _get_column_type → get_table_columns (another query per column to read the current type before deciding whether to alter it)

For a table with 100 columns this means 200 queries just to prepare the schema. This shows up noticeably on high-latency connections.
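
The 2N figure can be reproduced with a toy model. The sketch below is not the SDK's code: `CountingConnector` and `prepare_table_per_column` are hypothetical stand-ins that only mimic the call chain described above and count each simulated metadata query.

```python
class CountingConnector:
    """Toy stand-in for a SQL connector; counts simulated metadata queries."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self.query_count = 0

    def get_table_columns(self, full_table_name):
        # Each call stands for one round-trip to the database.
        self.query_count += 1
        return dict(self._columns)

    def column_exists(self, full_table_name, column_name):
        return column_name in self.get_table_columns(full_table_name)

    def _get_column_type(self, full_table_name, column_name):
        return self.get_table_columns(full_table_name)[column_name]

    def prepare_table_per_column(self, full_table_name, schema):
        # Old shape: an existence check plus a type lookup for every column.
        for column_name in schema:
            if self.column_exists(full_table_name, column_name):
                self._get_column_type(full_table_name, column_name)


columns = {f"col_{i}": "VARCHAR" for i in range(100)}
connector = CountingConnector(columns)
connector.prepare_table_per_column("my_table", columns)
print(connector.query_count)  # 200
```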

Solution

prepare_table now calls get_table_columns once before the column loop and passes the result through the call chain via two new optional keyword-only parameters:

  • prepare_column(..., *, existing_columns: dict[str, sa.Column] | None = None) — when provided, uses the pre-fetched mapping to decide whether the column exists without an extra query.
  • _adapt_column_type(..., *, current_type: sqlalchemy.types.TypeEngine | None = None) — when provided, skips the _get_column_type lookup entirely.

Both parameters default to None, so the old one-query-per-column behaviour is preserved when these methods are called directly. This is intended to be fully backward compatible: subclasses that only override prepare_column or _adapt_column_type should need no changes.
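
As a rough illustration of the threading described above, here is a minimal sketch that assumes a simplified connector. `BatchedConnector` is hypothetical, and its column mapping holds plain type strings rather than `sa.Column` objects, so it shows only the control flow, not the real implementation.

```python
class BatchedConnector:
    """Toy connector: fetch column metadata once, then pass it down the chain."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self.query_count = 0
        self.created = []
        self.adapted = []

    def get_table_columns(self, full_table_name):
        self.query_count += 1  # one simulated round-trip
        return dict(self._columns)

    def _get_column_type(self, full_table_name, column_name):
        return self.get_table_columns(full_table_name)[column_name]

    def _create_empty_column(self, full_table_name, column_name, sql_type):
        self.created.append(column_name)

    def _adapt_column_type(
        self, full_table_name, column_name, sql_type, *, current_type=None
    ):
        if current_type is None:
            # Fallback for callers that do not supply the type.
            current_type = self._get_column_type(full_table_name, column_name)
        self.adapted.append((column_name, current_type, sql_type))

    def prepare_column(
        self, full_table_name, column_name, sql_type, *, existing_columns=None
    ):
        if existing_columns is None:
            # Fallback for direct callers: fetch the mapping here.
            existing_columns = self.get_table_columns(full_table_name)
        current_type = existing_columns.get(column_name)
        if current_type is None:
            self._create_empty_column(full_table_name, column_name, sql_type)
            return
        self._adapt_column_type(
            full_table_name, column_name, sql_type, current_type=current_type
        )

    def prepare_table(self, full_table_name, schema):
        existing_columns = self.get_table_columns(full_table_name)  # single fetch
        for column_name, sql_type in schema.items():
            self.prepare_column(
                full_table_name,
                column_name,
                sql_type,
                existing_columns=existing_columns,
            )


connector = BatchedConnector({"id": "INTEGER", "name": "VARCHAR"})
connector.prepare_table(
    "my_table", {"id": "INTEGER", "name": "VARCHAR", "email": "VARCHAR"}
)
print(connector.query_count)  # 1
print(connector.created)  # ['email']
```

Calling `prepare_column` directly, without `existing_columns`, still works in this sketch and costs one extra simulated query per call.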

Migration guide

No action required for most targets. The optimization is applied automatically by prepare_table.

If your target overrides prepare_column and performs its own column-existence check that hits the database, you can opt in to the batch result to avoid the redundant queries:

# Before — still works, but makes one DB query per column
class MyConnector(SQLConnector):
    def prepare_column(self, full_table_name, column_name, sql_type):
        if not self.column_exists(full_table_name, column_name):
            self._create_empty_column(full_table_name, column_name, sql_type)
            return
        self._adapt_column_type(full_table_name, column_name=column_name, sql_type=sql_type)

# After — accepts the pre-fetched columns dict to avoid the extra query
class MyConnector(SQLConnector):
    def prepare_column(
        self,
        full_table_name,
        column_name,
        sql_type,
        *,
        existing_columns: dict | None = None,
    ):
        if existing_columns is None:
            existing_columns = self.get_table_columns(full_table_name)
        if column_name not in existing_columns:
            self._create_empty_column(full_table_name, column_name, sql_type)
            return
        self._adapt_column_type(
            full_table_name,
            column_name=column_name,
            sql_type=sql_type,
            current_type=existing_columns[column_name].type,
        )

Similarly, if you override _adapt_column_type and call _get_column_type inside, you can accept the optional current_type parameter to skip the lookup when the caller already has the type:

class MyConnector(SQLConnector):
    def _adapt_column_type(
        self,
        full_table_name,
        column_name,
        sql_type,
        *,
        current_type=None,
    ):
        if current_type is None:
            current_type = self._get_column_type(full_table_name, column_name)
        # ... rest of your logic

Test plan

  • test_adapt_column_type — existing test, unmodified, still passes (old call-site without current_type)
  • test_adapt_column_type_skips_lookup_when_current_type_provided — new: asserts _get_column_type is not called when current_type is supplied
  • test_prepare_table_fetches_columns_once — new: asserts get_table_columns is called exactly once across a multi-column prepare_table invocation
  • Full tests/sql/ suite — 132 passed, no regressions
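
For readers unfamiliar with this style of assertion, the "fetches columns once" check might look roughly like the following. This is a sketch against a hypothetical `ToyConnector`, not the test added in this PR; it uses `unittest.mock.patch.object` with `wraps` to spy on the method while keeping its behaviour.

```python
from unittest import mock


class ToyConnector:
    """Hypothetical stand-in with the same call shape as the text describes."""

    def get_table_columns(self, full_table_name):
        return {"id": "INTEGER", "name": "VARCHAR"}

    def prepare_column(
        self, full_table_name, column_name, sql_type, *, existing_columns=None
    ):
        if existing_columns is None:
            existing_columns = self.get_table_columns(full_table_name)
        return column_name in existing_columns

    def prepare_table(self, full_table_name, schema):
        existing_columns = self.get_table_columns(full_table_name)
        for column_name, sql_type in schema.items():
            self.prepare_column(
                full_table_name,
                column_name,
                sql_type,
                existing_columns=existing_columns,
            )


def test_prepare_table_fetches_columns_once():
    connector = ToyConnector()
    # Spy on the method: calls are recorded and forwarded to the original.
    with mock.patch.object(
        connector, "get_table_columns", wraps=connector.get_table_columns
    ) as spy:
        connector.prepare_table(
            "my_table", {"id": "INTEGER", "name": "VARCHAR", "email": "VARCHAR"}
        )
    spy.assert_called_once_with("my_table")


test_prepare_table_fetches_columns_once()
```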

🤖 Generated with Claude Code

Summary by Sourcery

Reduce database round-trips during SQL table preparation by reusing pre-fetched column metadata across column operations.

Enhancements:

  • Prefetch table columns once in prepare_table and reuse them across prepare_column calls to avoid redundant existence checks.
  • Extend prepare_column and _adapt_column_type to accept optional existing_columns and current_type parameters while preserving backward-compatible behavior.

Tests:

  • Add tests to verify _adapt_column_type skips type lookups when current_type is provided and that prepare_table fetches columns only once.
  • Add tests to ensure prepare_column creates missing columns and adapts existing ones correctly under the new behavior.

Resolves #2353. Previously `prepare_table` made 2N round-trips to the
database per schema sync — one in `column_exists` and one in
`_get_column_type` for each of N columns. Now `get_table_columns` is
called once and the result is threaded through `prepare_column` and
`_adapt_column_type` via new optional keyword parameters
(`existing_columns` and `current_type` respectively), reducing to a
single query per `prepare_table` call.

Both new parameters default to `None`, preserving full backward
compatibility for subclasses that override these methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sourcery-ai

sourcery-ai Bot commented May 12, 2026

Contributor

Reviewer's Guide

Optimize SQLConnector schema preparation by fetching table column metadata once in prepare_table and threading it through prepare_column/_adapt_column_type via new optional parameters, reducing redundant database round-trips while preserving backward-compatible call signatures and adding focused tests for the new behavior.

Sequence diagram for optimized prepare_table column metadata fetching

sequenceDiagram
    participant SQLConnector
    participant Database

    SQLConnector->>Database: get_table_columns(full_table_name)
    Database-->>SQLConnector: existing_columns

    loop for each property in schema.properties
        SQLConnector->>SQLConnector: prepare_column(full_table_name, column_name, sql_type, existing_columns=existing_columns)
        alt existing_columns is provided
            SQLConnector->>SQLConnector: existing_column = existing_columns.get(column_name)
            alt existing_column is None
                SQLConnector->>SQLConnector: _create_empty_column(full_table_name, column_name, sql_type)
            else existing_column exists
                SQLConnector->>SQLConnector: _adapt_column_type(full_table_name, column_name=column_name, sql_type=sql_type, current_type=existing_column.type)
                opt current_type is None
                    SQLConnector->>Database: _get_column_type(full_table_name, column_name)
                    Database-->>SQLConnector: current_type
                end
            end
        else existing_columns is None
            SQLConnector->>Database: get_table_columns(full_table_name)
            Database-->>SQLConnector: existing_columns
        end
    end

File-Level Changes

Change: Fetch table column metadata once in prepare_table and reuse it across all column preparations.
Details:
  • Call get_table_columns once at the start of prepare_table before iterating over schema properties.
  • Pass the fetched columns mapping into each prepare_column invocation to avoid per-column metadata lookups.
Files: singer_sdk/sql/connector.py

Change: Extend prepare_column to accept an optional existing_columns mapping and use it to decide between column creation and type adaptation without extra queries.
Details:
  • Add keyword-only existing_columns parameter to prepare_column with a default of None to keep the public API backward compatible.
  • When existing_columns is None, lazily fetch table columns via get_table_columns; otherwise reuse the provided mapping.
  • Determine column existence via existing_columns.get and call _create_empty_column for missing columns or _adapt_column_type for existing ones, passing the existing column's type as current_type.
Files: singer_sdk/sql/connector.py

Change: Allow _adapt_column_type to accept an optional current_type to skip redundant type lookups.
Details:
  • Add keyword-only current_type parameter to _adapt_column_type with a default of None.
  • If current_type is not provided, fall back to calling _get_column_type as before, preserving behavior for existing callers.
  • Thread current_type from prepare_column when the existing column metadata is available.
Files: singer_sdk/sql/connector.py

Change: Add tests covering the new optional parameters and ensuring only a single metadata fetch per prepare_table call.
Details:
  • Add test_adapt_column_type_skips_lookup_when_current_type_provided to assert _get_column_type is not called when current_type is passed.
  • Add test_prepare_table_fetches_columns_once to assert get_table_columns is called exactly once during multi-column prepare_table.
  • Add tests ensuring prepare_column creates missing columns and adapts existing ones, validating behavior with the new code path.
Files: tests/sql/test_connector.py

Assessment against linked issues

Issue Objective Addressed Explanation
#2353 Optimize SQLConnector.prepare_table so that it fetches table column metadata once and reuses it for all columns, reducing database round-trips from O(n) per schema to O(1).
#2353 Preserve existing behaviour and backwards compatibility of prepare_column and related methods while enabling the batched column existence/type checks for all SQLConnector-based targets.

@read-the-docs-community

read-the-docs-community Bot commented May 12, 2026


Documentation build overview

📚 Meltano SDK | 🛠️ Build #32659943 | 📁 Comparing 59283c6 against latest (8373aec)

  🔍 Preview build  

1 file changed
± classes/singer_sdk.sql.SQLConnector.html

@codecov

codecov Bot commented May 12, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.86%. Comparing base (8373aec) to head (59283c6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3627   +/-   ##
=======================================
  Coverage   93.86%   93.86%           
=======================================
  Files          73       73           
  Lines        5948     5953    +5     
  Branches      729      731    +2     
=======================================
+ Hits         5583     5588    +5     
  Misses        271      271           
  Partials       94       94           
Flag Coverage Δ
core 82.36% <100.00%> (-0.01%) ⬇️
end-to-end 75.20% <71.42%> (-0.07%) ⬇️
optional-components 42.76% <0.00%> (-0.04%) ⬇️


@codspeed-hq

codspeed-hq Bot commented May 12, 2026


Merging this PR will not alter performance

✅ 8 untouched benchmarks


Comparing perf/prepare-table-batch-column-fetch (59283c6) with main (fa54c4c) [1]


Footnotes

  1. No successful run was found on main (8373aec) during the generation of this report, so fa54c4c was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@sourcery-ai sourcery-ai Bot left a comment

Contributor


Hey - I've left some high level feedback:

  • In prepare_column, the control flow around existing_columns is a bit complex and duplicates the _create_empty_column/_adapt_column_type logic; consider normalizing existing_columns (e.g., always using a mapping and a single branch) to keep the method simpler and reduce the chance of future divergence between the two paths.
  • When using existing_columns[column_name].type in prepare_column, it might be safer to use existing_columns.get(column_name) and handle a missing entry explicitly, to avoid surprising KeyErrors if existing_columns is stale or uses different column-name casing than schema['properties'].
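
The difference between the two lookups in the second point can be shown with a plain dict. The mapping below is hypothetical; it assumes the database reports a column as "ID" while the schema property is named "id".

```python
# Hypothetical pre-fetched mapping; the database reports "ID" in upper case.
existing_columns = {"ID": "INTEGER", "name": "VARCHAR"}


def current_type_strict(columns, column_name):
    # Direct indexing raises KeyError when the key is absent.
    return columns[column_name]


def current_type_tolerant(columns, column_name):
    # .get() returns None instead, so the caller can branch explicitly.
    return columns.get(column_name)


try:
    current_type_strict(existing_columns, "id")
    strict_failed = False
except KeyError:
    strict_failed = True

print(strict_failed)  # True
print(current_type_tolerant(existing_columns, "id"))  # None
print(current_type_tolerant(existing_columns, "name"))  # VARCHAR
```

Whether a None result should then be treated as "column missing" or handled some other way (for example by normalizing case first) depends on the target; the sketch only shows that `.get()` makes that decision explicit.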

edgarrmondragon and others added 2 commits May 12, 2026 11:23
Add two tests for the case where prepare_column is called without a
pre-fetched existing_columns dict (the legacy call-site path):
- column missing → _create_empty_column is called
- column present → _adapt_column_type is called, _create_empty_column is not

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ront

Address review feedback:
- Eliminate the duplicated _create_empty_column/_adapt_column_type logic
  by resolving existing_columns to a real dict at the top of the method
  (fetching from the DB only when None), then using a single unified branch.
- Use .get() instead of direct dict access so a missing or differently-cased
  column name never raises a surprising KeyError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@edgarrmondragon

Collaborator Author

@sourcery-ai review

@sourcery-ai sourcery-ai Bot left a comment

Contributor


Hey - I've found 1 issue, and left some high level feedback:

  • Adding the new keyword-only parameters existing_columns and current_type and then always passing them from prepare_table / prepare_column will raise TypeError for any external subclass that overrides prepare_column or _adapt_column_type without those keywords; consider making these kwargs optional via **kwargs at the call site or avoiding keyword-only parameters to preserve true backward compatibility.
  • For existing_columns you may want to type it as a Mapping[str, sa.Column] instead of a concrete dict[str, sa.Column] to allow more flexible inputs (e.g., ordered dicts or other mapping-like containers) without narrowing the API.
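
The failure mode in the first point is easy to reproduce in isolation. The sketch below uses hypothetical classes (`ToyBase`, `LegacyOverride`) and also shows one possible mitigation that is not the one the reviewer names: inspecting the override's signature with `inspect.signature` and passing the new keyword only when it is accepted. Nothing here is taken from the PR's code.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs swallows any keyword
        if param.name == name and param.kind in keyword_kinds:
            return True
    return False


class ToyBase:
    def prepare_column(
        self, full_table_name, column_name, sql_type, *, existing_columns=None
    ):
        return "base"

    def prepare_table_strict(self, full_table_name, schema):
        # Always passes the new keyword, as the review comment describes.
        existing_columns = {}
        return [
            self.prepare_column(
                full_table_name, name, sql_type, existing_columns=existing_columns
            )
            for name, sql_type in schema.items()
        ]

    def prepare_table_guarded(self, full_table_name, schema):
        # Passes the new keyword only if the (possibly overridden) method takes it.
        extra = {}
        if accepts_keyword(self.prepare_column, "existing_columns"):
            extra["existing_columns"] = {}
        return [
            self.prepare_column(full_table_name, name, sql_type, **extra)
            for name, sql_type in schema.items()
        ]


class LegacyOverride(ToyBase):
    # Override written before the new keyword existed.
    def prepare_column(self, full_table_name, column_name, sql_type):
        return "legacy"


legacy = LegacyOverride()
try:
    legacy.prepare_table_strict("t", {"id": "INTEGER"})
    strict_error = None
except TypeError as exc:
    strict_error = exc

print(type(strict_error).__name__)  # TypeError
print(legacy.prepare_table_guarded("t", {"id": "INTEGER"}))  # ['legacy']
print(ToyBase().prepare_table_guarded("t", {"id": "INTEGER"}))  # ['base']
```

The guarded variant keeps legacy overrides working at the cost of losing the batching benefit for them, since such an override never receives the pre-fetched mapping.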

## Individual Comments

### Comment 1
<location path="singer_sdk/sql/connector.py" line_range="1478-1479" />
<code_context>
         full_table_name: str | FullyQualifiedName,
         column_name: str,
         sql_type: sqlalchemy.types.TypeEngine,
+        *,
+        existing_columns: dict[str, sa.Column] | None = None,
     ) -> None:
         """Adapt target table to provided schema if possible.
</code_context>
<issue_to_address>
**suggestion:** Use a more general type than `dict` for `existing_columns` to keep the API flexible.

Because `prepare_column` only calls `.get()` on `existing_columns`, it doesn’t need a concrete `dict`. Typing this as `Mapping[str, sa.Column] | None` (with `Mapping` from `collections.abc`) better describes the required interface and allows other mapping implementations.
</issue_to_address>
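
A small sketch of what the suggestion buys, assuming a hypothetical helper `column_is_present` that only needs `.get()`: annotating the parameter as `Mapping` lets a static type checker accept read-only or custom mappings, whereas a `dict` annotation would reject them. The values are plain strings here rather than `sa.Column` objects.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Optional


def column_is_present(
    column_name: str,
    existing_columns: Optional[Mapping[str, object]] = None,
) -> bool:
    # Only the read-only Mapping interface (.get) is used here.
    if existing_columns is None:
        existing_columns = {}
    return existing_columns.get(column_name) is not None


plain_dict = {"id": "INTEGER"}
read_only_view = MappingProxyType(plain_dict)  # a Mapping, but not a dict

print(column_is_present("id", plain_dict))  # True
print(column_is_present("id", read_only_view))  # True
print(column_is_present("email", read_only_view))  # False
print(isinstance(read_only_view, dict))  # False
```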


Comment thread singer_sdk/sql/connector.py Outdated
…lumn`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

feat: prepare_table() makes O(1) calls to prepare_column(), instead of O(n) (n number of columns)

1 participant