Skip to content

Apparently wrong position of RawStmt in multi-statements script #287

@lelit

Description

@lelit

Hi,

I am investigating on an issue reported on pglast, where the stmt_location of the RawStmt are not consistent, when feeding a multi-statements SQL to pg_query_parse() function. I could not figure either a reason, or a workaround.

A little detail: pglast uses the location/stmt_location only when it asked to preserve the comments in the statement. In that case, it first collects all the comments using pg_query_scan(), and then it does its best to emit them as soon as it reprints (deparse, in libpg_query parlance) each node, roughly comparing the node position with the one of the collected comments.

Consider the reported sample SQL:

-- First comment
SELECT * FROM first_table;

-- Second comment
SELECT * FROM second_table;

Adding it to examples/scan.c, I get the following:

version: 170000, tokens: 12, size: 114
"-- First comment" = [ 0, 16, SQL_COMMENT, NO_KEYWORD ]
"SELECT" = [ 17, 23, SELECT, RESERVED_KEYWORD ]
"*" = [ 24, 25, ASCII_42, NO_KEYWORD ]
"FROM" = [ 26, 30, FROM, RESERVED_KEYWORD ]
"first_table" = [ 31, 42, IDENT, NO_KEYWORD ]
";" = [ 42, 43, ASCII_59, NO_KEYWORD ]
"-- Second comment" = [ 45, 62, SQL_COMMENT, NO_KEYWORD ]
"SELECT" = [ 63, 69, SELECT, RESERVED_KEYWORD ]
"*" = [ 70, 71, ASCII_42, NO_KEYWORD ]
"FROM" = [ 72, 76, FROM, RESERVED_KEYWORD ]
"second_table" = [ 77, 89, IDENT, NO_KEYWORD ]
";" = [ 89, 90, ASCII_59, NO_KEYWORD ]

Notice the position of the second SELECT: 63.

Using pg_query_parse() I get this (reformatted, for clarity):

{
  "version": 170000,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 24
                  }
                },
                "location": 24
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "first_table",
                "inh": true,
                "relpersistence": "p",
                "location": 31
              }
            }
          ],
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      },
      "stmt_len": 42
    },
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 70
                  }
                },
                "location": 70
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "second_table",
                "inh": true,
                "relpersistence": "p",
                "location": 77
              }
            }
          ],
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      },
      "stmt_location": 43,
      "stmt_len": 46
    }
  ]
}

As you can see, the second RawStmt is reported to start at 43, just after the semicolon ending the first SELECT, that is clearly wrong, while all other sub-nodes location are correct.

Thanks in advance for any clue!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions