Skip to content

feat: Support extracting and linting pre/post_operations blocks#44

Open
hiracky16 wants to merge 1 commit intomainfrom
feat/support-operations-blocks
Open

feat: Support extracting and linting pre/post_operations blocks#44
hiracky16 wants to merge 1 commit intomainfrom
feat/support-operations-blocks

Conversation

@hiracky16
Copy link
Copy Markdown
Owner

@hiracky16 hiracky16 commented Mar 21, 2026

Overview

Instead of removing pre_operations and post_operations blocks, the templater now extracts the SQL inside them so they can be linted. Additionally, it automatically appends a semicolon (;) if one is missing, preventing SQLFluff parsing errors when multiple statements are joined.

Changes

  • Modified DataformTemplater.replace_blocks to keep operation blocks.
  • Added DataformTemplater.extract_operation_blocks to handle SQL extraction and semicolon insertion.
  • Updated DataformTemplater.slice_sqlx_template to correctly slice and map these blocks.
  • Updated and added test cases in test/templater_test.py to verify the new behavior.

Verification

  • All tests in test/templater_test.py passed.
  • Verified that operation blocks are now included in the templated SQL output.

Copilot AI review requested due to automatic review settings March 21, 2026 13:44
@hiracky16 hiracky16 self-assigned this Mar 21, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Dataform SQLX templater so pre_operations/post_operations blocks are no longer discarded; instead, their contents are extracted into the templated SQL (with a trailing semicolon added when missing) so they can be linted alongside the main query.

Changes:

  • Stop removing pre_operations/post_operations in replace_blocks() and introduce extract_operation_blocks() to extract their SQL and ensure a terminating ;.
  • Update slice_sqlx_template() to incorporate operation-block extraction into both the final templated SQL and slice mapping.
  • Update/add tests to validate operation-block extraction and updated slicing behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
sqlfluff_templater_dataform/templater.py Keeps operation blocks, extracts their inner content into the templated SQL with semicolon insertion, and updates slice mapping accordingly.
test/templater_test.py Adjusts existing expectations and adds coverage for operation-block extraction and updated slice_sqlx_template() outputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 136 to 140
Returns:
The input string with all Dataform blocks removed
"""
block_keywords = ['config', 'pre_operations', 'post_operations', 'js']
block_keywords = ['config', 'js']
for keyword in block_keywords:
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace_blocks() docstring/return description says it removes all Dataform blocks (including pre/post_operations), but the implementation now only removes config and js. Please update the docstring (and any related docs/comments) to reflect that operation blocks are handled separately via extract_operation_blocks() and are no longer removed here.

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +168
inner_sql = in_str[start+1:end-1].strip()
if inner_sql and not inner_sql.endswith(";"):
inner_sql += ";"
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_operation_blocks() uses .strip() on the inner block content, which removes leading indentation/newlines from the extracted SQL. This mutates formatting and can make SQLFluff lint results and source mapping less representative of the original SQLX. Consider preserving leading whitespace (e.g., avoid stripping the left side; use rstrip() only for semicolon detection) and only normalize what’s required to safely append the semicolon.

Suggested change
inner_sql = in_str[start+1:end-1].strip()
if inner_sql and not inner_sql.endswith(";"):
inner_sql += ";"
# Preserve leading whitespace/newlines inside the block.
inner_block = in_str[start + 1 : end - 1]
# Use rstrip() only for semicolon detection and insertion.
content_no_trailing = inner_block.rstrip()
# Determine if there is any non-whitespace content.
has_content = bool(content_no_trailing.strip())
if has_content and not content_no_trailing.endswith(";"):
content_no_trailing += ";"
# Reattach the original trailing whitespace so formatting is preserved.
trailing_ws = inner_block[len(content_no_trailing.rstrip()):] if has_content else ""
inner_sql = (content_no_trailing + trailing_ws) if has_content else ""

Copilot uses AI. Check for mistakes.
Comment on lines +557 to +563
if next_match_type == 'templated' and re.match(r'^(pre_operations|post_operations)\s*\{', match_raw):
brace_start = match_raw.find('{')
inner_sql = match_raw[brace_start+1:-1].strip()
if inner_sql and not inner_sql.endswith(";"):
inner_sql += ";"

op_replaced = inner_sql
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semicolon insertion + operation-block extraction logic is duplicated here and in extract_operation_blocks(). This increases the risk that future changes (e.g., handling trailing comments/whitespace when appending ;) diverge between the two paths. Consider factoring the shared logic into a single helper used by both locations so replaced_sql and slice mapping stay consistent.

Copilot uses AI. Check for mistakes.
Comment thread test/templater_test.py
TO "group:allusers@example.com", "user:otheruser@example.com";
} SELECT * FROM my_table"""

expected_sql = '\nCREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)\n RETURNS FLOAT64\n AS ((x + 4) / y);\n \nGRANT `roles/bigquery.dataViewer`\n ON\n TABLE ${self()}\n TO "group:allusers@example.com", "user:otheruser@example.com";\n SELECT * FROM my_table'
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected_sql in this test is a long, heavily escaped single-quoted string, which is difficult to read and maintain. Consider expressing it as a triple-quoted string (optionally with textwrap.dedent) so the expected output structure/whitespace is clear and future changes are less error-prone.

Suggested change
expected_sql = '\nCREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)\n RETURNS FLOAT64\n AS ((x + 4) / y);\n \nGRANT `roles/bigquery.dataViewer`\n ON\n TABLE ${self()}\n TO "group:allusers@example.com", "user:otheruser@example.com";\n SELECT * FROM my_table'
expected_sql = (
'\nCREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)\n'
' RETURNS FLOAT64\n'
' AS ((x + 4) / y);\n'
' \n'
'GRANT `roles/bigquery.dataViewer`\n'
' ON\n'
' TABLE ${self()}\n'
' TO "group:allusers@example.com", "user:otheruser@example.com";\n'
' SELECT * FROM my_table'
)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants