Inconsistent internal parser state

This issue describes a bug in `Elm.Kernel.Parser.findSubString`.

---

**Note:** the following issues describe _symptoms_ of this bug:

- #2
- #20
- #46

In the same way, the following pull request tries to fix the _symptoms_:

- #21

---

The Elm Parser internally keeps track of the current position in two ways:

- as a row and a column (like a code editor)
- as an offset into the source string.

Normally, both kinds of position information (row and column vs. offset) are in sync with each other.
(For a given source string, you can calculate row and column from the offset and vice versa.)
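
As a minimal sketch of that correspondence, the following plain JavaScript helper derives row and column from an offset. The name `rowColFromOffset` is hypothetical and not part of elm/parser; it assumes 1-based rows and columns and counts a surrogate pair as one column, mirroring the counting loop in the kernel function.

```javascript
// Hypothetical helper, not part of elm/parser: derive the 1-based row and
// column that correspond to `offset` (a UTF-16 code unit index) in `source`.
function rowColFromOffset(source, offset) {
  var row = 1;
  var col = 1;
  var i = 0;
  while (i < offset) {
    var code = source.charCodeAt(i++);
    if (code === 0x000A /* \n */) {
      // A newline starts a new row and resets the column.
      row++;
      col = 1;
    } else {
      col++;
      // A high surrogate is followed by a low surrogate; both code units
      // together count as a single column, so skip the second one.
      if ((code & 0xF800) === 0xD800) {
        i++;
      }
    }
  }
  return { row: row, col: col };
}

console.log(rowColFromOffset("< token >", 2)); // { row: 1, col: 3 }
console.log(rowColFromOffset("< token >", 7)); // { row: 1, col: 8 }
```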

However, the bug in `Elm.Kernel.Parser.findSubString` breaks this synchronization.
This affects the following parsers:

- `lineComment`
- `multiComment`
- `chompUntil`
- `chompUntilEndOr`

They set...

- row and column **after** the (closing) token
- the offset **before** the (closing) token

Here's an example with `chompUntil`:

```elm
import Parser exposing ((|.), (|=), Parser)

testParser : Parser { row : Int, col : Int, offset : Int }
testParser =
    Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
        |. Parser.chompUntil "token"
        |= Parser.getRow
        |= Parser.getCol
        |= Parser.getOffset

Parser.run testParser "< token >"
--> Ok { row = 1, col = 8, offset = 2 }
```

The state after the test parser is run:

- row = 1, col = 8 (corresponding to offset = 7) --> **after** the token
- offset = 2 (corresponding to row = 1, col = 3) --> **before** the token
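
The same mismatch can be reproduced outside of Elm. The snippet below is a standalone port of the kernel function for experimentation only: it assumes the logic of the linked revision, but replaces the `F5` wrapper and `__Utils_Tuple3` with a plain function and a plain object so that it runs in Node.js or a browser console.

```javascript
// Standalone port of the kernel logic (assumption: same behavior as the
// linked revision, minus the Elm runtime wrappers).
function findSubString(smallString, offset, row, col, bigString) {
  var newOffset = bigString.indexOf(smallString, offset);
  var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;

  // Row and column are advanced up to `target`, i.e. past the token.
  while (offset < target) {
    var code = bigString.charCodeAt(offset++);
    if (code === 0x000A /* \n */) {
      col = 1;
      row++;
    } else {
      col++;
      if ((code & 0xF800) === 0xD800) {
        offset++;
      }
    }
  }

  // The returned offset is the result of indexOf, i.e. before the token.
  return { offset: newOffset, row: row, col: col };
}

var state = findSubString("token", 0, 1, 1, "< token >");
console.log(state); // { offset: 2, row: 1, col: 8 }
```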

---

The root cause of this bug lies in the `Elm.Kernel.Parser.findSubString` function:
https://github.com/elm/parser/blob/02839df10e462d8423c91917271f4b6f8d2f284d/src/Elm/Kernel/Parser.js#L120-L134

If the `smallString` is found, the returned `newOffset` is the position **before** the `smallString` (the result of the `indexOf` call), but the returned `row` and `col` are the position **after** the `smallString` (the `target` position).
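
One way to keep the three values consistent is to advance row and column only as far as the offset that is returned. The following is a sketch under that assumption, not the maintainers' fix: the name `findSubStringBeforeToken` is hypothetical, and the opposite choice (returning `target` as the offset, so that all three values point after the token) would be equally consistent. Which of the two is intended is a design decision for the affected parsers.

```javascript
// Hypothetical variant, not the actual elm/parser code: row, column and
// offset all describe the position BEFORE the token when it is found.
function findSubStringBeforeToken(smallString, offset, row, col, bigString) {
  var newOffset = bigString.indexOf(smallString, offset);
  // Stop counting at the start of the token, or at the end of the input
  // when the token is missing.
  var end = newOffset < 0 ? bigString.length : newOffset;

  while (offset < end) {
    var code = bigString.charCodeAt(offset++);
    if (code === 0x000A /* \n */) {
      col = 1;
      row++;
    } else {
      col++;
      if ((code & 0xF800) === 0xD800) {
        offset++;
      }
    }
  }

  // As in the original, -1 still signals "not found" to the caller.
  return { offset: newOffset, row: row, col: col };
}

console.log(findSubStringBeforeToken("token", 0, 1, 1, "< token >"));
// { offset: 2, row: 1, col: 3 }
```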

---

**Note:** the following pull request tries to update the comment on the `Elm.Kernel.Parser.findSubString` function
so that it correctly describes the buggy behavior:

- #37


For reference, `_Parser_findSubString` at the linked revision:

```js
var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString)
{
	var newOffset = bigString.indexOf(smallString, offset);
	var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;

	while (offset < target)
	{
		var code = bigString.charCodeAt(offset++);
		code === 0x000A /* \n */
			? ( col=1, row++ )
			: ( col++, (code & 0xF800) === 0xD800 && offset++ )
	}

	return __Utils_Tuple3(newOffset, row, col);
});
```
