Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

an infinite loop [token == 0 "" L (1, n)-(1, n-1)] #22

Closed
xaionaro opened this issue Feb 26, 2018 · 10 comments
Closed

an infinite loop [token == 0 "" L (1, n)-(1, n-1)] #22

xaionaro opened this issue Feb 26, 2018 · 10 comments

Comments

@xaionaro
Copy link

xaionaro commented Feb 26, 2018

This test-case will cause an infinite loop. Of course the code is incorrect anyway (character ":" would cause an error). But maybe lexmachine shouldn't go to an infinite loop and should report an error.

package main

import (
        "fmt"
        "github.com/timtadh/lexmachine"
        lexmachines "github.com/timtadh/lexmachine/machines"
)


func newLexer() *lexmachine.Lexer {
        tokens := []string{
                "VALUE",
        }
        tokenIds := map[string]int{}
        for i, tok := range tokens {
                tokenIds[tok] = i
        }
        lex := lexmachine.NewLexer()

        skip := func(*lexmachine.Scanner, *lexmachines.Match) (interface{}, error) {
                return nil, nil
        }
        token := func(name string) lexmachine.Action {
                return func(s *lexmachine.Scanner, m *lexmachines.Match) (interface{}, error) {
                        return s.Token(tokenIds[name], string(m.Bytes), m), nil
                }
        }

        lex.Add([]byte(`([a-z]|[A-Z]|[0-9]|_|\-|\.|=)*`), token("VALUE"))
        lex.Add([]byte("[\n \t]"), skip)

        err := lex.Compile()
        if err != nil {
                panic(err)
        }

        return lex
}

func main() {
        lex := newLexer()
        scanner, err := lex.Scanner([]byte("line1:\nline2"))
        if err != nil {
                panic(err)
        }

        for tok, err, eof := scanner.Next(); !eof; tok, err, eof = scanner.Next() {
                fmt.Println("token ==", tok)
                fmt.Println("err ==", err)
        }
}
token == 0 "line1" 0 (1, 1)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
...
@xaionaro xaionaro changed the title an infinite loop an infinite loop [token == 0 "" L (1, n)-(1, n-1)] Feb 26, 2018
@ty2
Copy link

ty2 commented Feb 26, 2018

I had run your code. It should fixed on #21 cd38be4

@xaionaro
Copy link
Author

Yep, there's no infinite loop, now. However the behavior seems to be wrong:

$ ./test
token == 0 "line1" 0 (1, 1)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "line2" 7 (2, 1)-(2, 5)
err == <nil>

There should be an error, IMHO.

Expected behavior:

token == 0 "line1" 0 (1, 1)-(1, 5)
err == <nil>
token == <nil>
err == Lexer error: could not match text starting at 1:6 failing at 2:0.
        unmatched text: ":"

@ty2
Copy link

ty2 commented Feb 26, 2018

It should work if replace * to + that matches at least one character in the group

use this regex:
([a-z]|[A-Z]|[0-9]|_|\-|\.|=)+

@xaionaro
Copy link
Author

xaionaro commented Feb 26, 2018

Yes, it will work if I replace the character. However it should report an error <unmatcher text ":"> with "*" too, IMHO. There's no expression for ":", how this character was skipped?

token == 0 "line1" 0 (1, 1)-(1, 5)
err == <nil>
token == 0 "" 5 (1, 6)-(1, 5)
err == <nil>
token == 0 "line2" 7 (2, 1)-(2, 5)
err == <nil>

@ty2
Copy link

ty2 commented Feb 26, 2018

@xaionaro you are right.

Also, if input string is start from non matched char, it will panic.

func main() {
        lex := newLexer()
        scanner, err := lex.Scanner([]byte("@line1:\nline2"))
        if err != nil {
                panic(err)
        }

        for tok, err, eof := scanner.Next(); !eof; tok, err, eof = scanner.Next() {
                fmt.Println("token ==", tok)
                fmt.Println("err ==", err)
        }
}
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/ty2-exp/gdlparser/vendor/github.com/timtadh/lexmachine/machines.DFALexerEngine.func1(0x0, 0x4, 0x4, 0xc4200e2000, 0x4, 0x4)
        /Users/terry/go/src/github.com/ty2-exp/gdlparser/vendor/github.com/timtadh/lexmachine/machines/dfa_machine.go:65 +0x955
github.com/ty2-exp/gdlparser/vendor/github.com/timtadh/lexmachine.(*Scanner).Next(0xc420093320, 0xc42003df48, 0x4, 0x20, 0xc420093320, 0x0)
        /Users/terry/go/src/github.com/ty2-exp/gdlparser/vendor/github.com/timtadh/lexmachine/lexer.go:146 +0x50
github.com/ty2-exp/gdlparser.(*Lexer).Scan(0xc4200dd120, 0xc42003df48, 0x4, 0x20, 0x4, 0x20, 0x118e0e0, 0xc420076300, 0xc42003df50)
        /Users/terry/go/src/github.com/ty2-exp/gdlparser/lexer.go:252 +0x90
main.main()
        /Users/terry/go/src/github.com/ty2-exp/gdlparser/example/main.go:7 +0x7a
exit status 2

@timtadh
Copy link
Owner

timtadh commented Feb 26, 2018

@xaionaro ok I am going to take a look at this. I am still concerned about the correct behavior when matching the empty string. If you have an unmatchable character (like :) in your example but you also have a token when matches the empty string then no progress can be made. The empty string will be matched over and over again. On the other hand, I can ensure progress by always incrementing the tc (text counter) after a match. This prevents an infinite loop at the cost of potentially skipping characters which it should have returned an error on.

A third option is to disallow at compilation matching the empty string. What do you think?

@xaionaro
Copy link
Author

xaionaro commented Feb 26, 2018

I'm as an user of this library would be happier if matching of empty strings will be forbidden (the third option). It will help me to write better code (on my side).

Sorry for my English :)

@timtadh
Copy link
Owner

timtadh commented Feb 26, 2018

@xaionaro that seems reasonable. I will make a separate PR for that (and change the current one to not preserve progress).

@timtadh
Copy link
Owner

timtadh commented Feb 27, 2018

fixed by #23

@timtadh timtadh closed this as completed Feb 27, 2018
@xaionaro
Copy link
Author

Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants