feat: Support extracting and linting pre/post_operations blocks#44
feat: Support extracting and linting pre/post_operations blocks#44
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the Dataform SQLX templater so pre_operations/post_operations blocks are no longer discarded; instead, their contents are extracted into the templated SQL (with a trailing semicolon added when missing) so they can be linted alongside the main query.
Changes:
- Stop removing
pre_operations/post_operationsinreplace_blocks()and introduceextract_operation_blocks()to extract their SQL and ensure a terminating;. - Update
slice_sqlx_template()to incorporate operation-block extraction into both the final templated SQL and slice mapping. - Update/add tests to validate operation-block extraction and updated slicing behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
sqlfluff_templater_dataform/templater.py |
Keeps operation blocks, extracts their inner content into the templated SQL with semicolon insertion, and updates slice mapping accordingly. |
test/templater_test.py |
Adjusts existing expectations and adds coverage for operation-block extraction and updated slice_sqlx_template() outputs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Returns: | ||
| The input string with all Dataform blocks removed | ||
| """ | ||
| block_keywords = ['config', 'pre_operations', 'post_operations', 'js'] | ||
| block_keywords = ['config', 'js'] | ||
| for keyword in block_keywords: |
There was a problem hiding this comment.
replace_blocks() docstring/return description says it removes all Dataform blocks (including pre/post_operations), but the implementation now only removes config and js. Please update the docstring (and any related docs/comments) to reflect that operation blocks are handled separately via extract_operation_blocks() and are no longer removed here.
| inner_sql = in_str[start+1:end-1].strip() | ||
| if inner_sql and not inner_sql.endswith(";"): | ||
| inner_sql += ";" |
There was a problem hiding this comment.
extract_operation_blocks() uses .strip() on the inner block content, which removes leading indentation/newlines from the extracted SQL. This mutates formatting and can make SQLFluff lint results and source mapping less representative of the original SQLX. Consider preserving leading whitespace (e.g., avoid stripping the left side; use rstrip() only for semicolon detection) and only normalize what’s required to safely append the semicolon.
| inner_sql = in_str[start+1:end-1].strip() | |
| if inner_sql and not inner_sql.endswith(";"): | |
| inner_sql += ";" | |
| # Preserve leading whitespace/newlines inside the block. | |
| inner_block = in_str[start + 1 : end - 1] | |
| # Use rstrip() only for semicolon detection and insertion. | |
| content_no_trailing = inner_block.rstrip() | |
| # Determine if there is any non-whitespace content. | |
| has_content = bool(content_no_trailing.strip()) | |
| if has_content and not content_no_trailing.endswith(";"): | |
| content_no_trailing += ";" | |
| # Reattach the original trailing whitespace so formatting is preserved. | |
| trailing_ws = inner_block[len(content_no_trailing.rstrip()):] if has_content else "" | |
| inner_sql = (content_no_trailing + trailing_ws) if has_content else "" |
| if next_match_type == 'templated' and re.match(r'^(pre_operations|post_operations)\s*\{', match_raw): | ||
| brace_start = match_raw.find('{') | ||
| inner_sql = match_raw[brace_start+1:-1].strip() | ||
| if inner_sql and not inner_sql.endswith(";"): | ||
| inner_sql += ";" | ||
|
|
||
| op_replaced = inner_sql |
There was a problem hiding this comment.
Semicolon insertion + operation-block extraction logic is duplicated here and in extract_operation_blocks(). This increases the risk that future changes (e.g., handling trailing comments/whitespace when appending ;) diverge between the two paths. Consider factoring the shared logic into a single helper used by both locations so replaced_sql and slice mapping stay consistent.
| TO "group:allusers@example.com", "user:otheruser@example.com"; | ||
| } SELECT * FROM my_table""" | ||
|
|
||
| expected_sql = '\nCREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)\n RETURNS FLOAT64\n AS ((x + 4) / y);\n \nGRANT `roles/bigquery.dataViewer`\n ON\n TABLE ${self()}\n TO "group:allusers@example.com", "user:otheruser@example.com";\n SELECT * FROM my_table' |
There was a problem hiding this comment.
expected_sql in this test is a long, heavily escaped single-quoted string, which is difficult to read and maintain. Consider expressing it as a triple-quoted string (optionally with textwrap.dedent) so the expected output structure/whitespace is clear and future changes are less error-prone.
| expected_sql = '\nCREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)\n RETURNS FLOAT64\n AS ((x + 4) / y);\n \nGRANT `roles/bigquery.dataViewer`\n ON\n TABLE ${self()}\n TO "group:allusers@example.com", "user:otheruser@example.com";\n SELECT * FROM my_table' | |
| expected_sql = ( | |
| '\nCREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)\n' | |
| ' RETURNS FLOAT64\n' | |
| ' AS ((x + 4) / y);\n' | |
| ' \n' | |
| 'GRANT `roles/bigquery.dataViewer`\n' | |
| ' ON\n' | |
| ' TABLE ${self()}\n' | |
| ' TO "group:allusers@example.com", "user:otheruser@example.com";\n' | |
| ' SELECT * FROM my_table' | |
| ) |
Overview
Instead of removing
pre_operationsandpost_operationsblocks, the templater now extracts the SQL inside them so they can be linted. Additionally, it automatically appends a semicolon (;) if one is missing, preventing SQLFluff parsing errors when multiple statements are joined.Changes
DataformTemplater.replace_blocksto keep operation blocks.DataformTemplater.extract_operation_blocksto handle SQL extraction and semicolon insertion.DataformTemplater.slice_sqlx_templateto correctly slice and map these blocks.test/templater_test.pyto verify the new behavior.Verification
test/templater_test.pypassed.