Inconsistent internal parser state #53

Open
@pithub

This issue describes a bug in Elm.Kernel.Parser.findSubString.


Note: the following issues describe symptoms of this bug:

In the same way, the following pull request tries to fix the symptoms:


The Elm Parser internally keeps track of the current position in two ways:

  • as a row and a column (like a code editor)
  • as an offset into the source string.

Normally both representations of the position (row and column vs. offset) are in sync with each other.
(For a given source string, you can calculate both row and column from the offset and vice versa.)
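
As a rough illustration of that equivalence, here is a minimal sketch of the two conversions. The helper names `offsetToRowCol` and `rowColToOffset` are hypothetical and not part of elm/parser; the sketch assumes 1-based rows and columns, UTF-16 offsets, and that a surrogate pair counts as one column, which matches how the kernel code quoted in this issue counts.

```javascript
// Hypothetical helpers (not part of elm/parser), for illustration only.

// Advance one character starting at index i and update the position.
// A surrogate pair (two UTF-16 code units) counts as a single column.
function step(source, pos) {
    var code = source.charCodeAt(pos.i++);
    if (code === 0x000A /* \n */) {
        pos.row++;
        pos.col = 1;
    } else {
        pos.col++;
        if ((code & 0xF800) === 0xD800) { pos.i++; }
    }
}

// offset -> { row, col }
function offsetToRowCol(source, offset) {
    var pos = { i: 0, row: 1, col: 1 };
    while (pos.i < offset) { step(source, pos); }
    return { row: pos.row, col: pos.col };
}

// (row, col) -> offset
function rowColToOffset(source, row, col) {
    var pos = { i: 0, row: 1, col: 1 };
    while (pos.i < source.length &&
           (pos.row < row || (pos.row === row && pos.col < col))) {
        step(source, pos);
    }
    return pos.i;
}

console.log(offsetToRowCol("< token >", 2)); // { row: 1, col: 3 }
console.log(rowColToOffset("< token >", 1, 8)); // 7
```

With a consistent parser state, converting the stored offset should yield the stored row and column, and the other way round.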

The bug in Elm.Kernel.Parser.findSubString breaks this synchronization, though.
This affects the following parsers:

  • lineComment
  • multiComment
  • chompUntil
  • chompUntilEndOr

When the (closing) token is found, they set:

  • row and column after the (closing) token
  • the offset before the (closing) token

Here's an example with chompUntil:

import Parser exposing ((|.), (|=), Parser)

testParser : Parser { row : Int, col : Int, offset : Int }
testParser =
    Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
        |. Parser.chompUntil "token"
        |= Parser.getRow
        |= Parser.getCol
        |= Parser.getOffset

Parser.run testParser "< token >"
--> Ok { row = 1, col = 8, offset = 2 }

The state after the test parser is run:

  • row = 1, col = 8 (corresponding to offset = 7) --> after the token
  • offset = 2 (corresponding to row = 1, col = 3) --> before the token

The root cause of this behavior lies in the Elm.Kernel.Parser.findSubString function:

var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString)
{
    var newOffset = bigString.indexOf(smallString, offset);
    var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;

    while (offset < target)
    {
        var code = bigString.charCodeAt(offset++);
        code === 0x000A /* \n */
            ? ( col=1, row++ )
            : ( col++, (code & 0xF800) === 0xD800 && offset++ )
    }

    return __Utils_Tuple3(newOffset, row, col);
});

If smallString is found, the returned newOffset points to the position before smallString (the result of the indexOf call), but the returned row and col point to the position after smallString (the target position).
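
To make the mismatch easy to reproduce outside the Elm runtime, here is a standalone sketch of the same logic, with `F5` and `__Utils_Tuple3` swapped for a plain function and a plain object (an assumption made purely so the code runs on its own). The second function, `findSubStringConsistent`, is hypothetical and not taken from any pull request: it shows one possible way to keep the three values in agreement by stopping the row/col walk at the start of the match. Whether the position should instead land after the token is a design decision this sketch does not settle.

```javascript
// Standalone copy of the kernel logic quoted above (same behavior,
// returning a plain object instead of an Elm tuple).
function findSubString(smallString, offset, row, col, bigString) {
    var newOffset = bigString.indexOf(smallString, offset);
    // Walks row/col up to the END of the match ...
    var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;
    while (offset < target) {
        var code = bigString.charCodeAt(offset++);
        if (code === 0x000A /* \n */) { col = 1; row++; }
        else { col++; if ((code & 0xF800) === 0xD800) { offset++; } }
    }
    // ... but returns the offset of the START of the match.
    return { offset: newOffset, row: row, col: col };
}

// Hypothetical variant (an assumption, not the library's fix): walk row/col
// only up to the start of the match, so all three values describe one position.
function findSubStringConsistent(smallString, offset, row, col, bigString) {
    var newOffset = bigString.indexOf(smallString, offset);
    var target = newOffset < 0 ? bigString.length : newOffset;
    while (offset < target) {
        var code = bigString.charCodeAt(offset++);
        if (code === 0x000A /* \n */) { col = 1; row++; }
        else { col++; if ((code & 0xF800) === 0xD800) { offset++; } }
    }
    return { offset: newOffset, row: row, col: col };
}

console.log(findSubString("token", 0, 1, 1, "< token >"));
// { offset: 2, row: 1, col: 8 }   (offset before, col after the token)
console.log(findSubStringConsistent("token", 0, 1, 1, "< token >"));
// { offset: 2, row: 1, col: 3 }   (all three before the token)
```

The first result matches the chompUntil example above: offset 2 together with col 8.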


Note: the following pull request tries to fix the comment of the Elm.Kernel.Parser.findSubString function
to correctly describe the buggy behavior:
