Two don't cares ('-') adjacent in a literal is parsed as a comment #529

ktbarrett · 2019-08-14T15:41:01Z

The remove_comments(code) function is overly simplistic. It matches any two adjacent - characters until the end of the line. '-' is also used as a "don't care" value for std_logic. If there is an std_logic_vector literal where there are two don't care values in a row (e.g. '----'), it is parsed as a comment and the rest of the line is removed, causing the parse to fail.

vunit/vunit/vhdl_parser.py

Lines 991 to 1007 in 12be973

    
           VHDL_REMOVE_COMMENT_RE = re.compile(r'--[^\n]*') 
        
           def _comment_repl(match): 
        
               """ 
        
               Replace comment with equal amount of whitespace to make 
        
               lexical position unaffected 
        
               """ 
        
               text = match.group(0) 
        
               return " " * len(text) 
        
           def remove_comments(code): 
        
               """ 
        
               Return the code with comments removed 
        
               """ 
        
               return VHDL_REMOVE_COMMENT_RE.sub(_comment_repl, code)

The text was updated successfully, but these errors were encountered:

ktbarrett · 2019-08-14T16:07:51Z

I am fairly certain that you can remove comments if they are the only thing on the line. There are no such thing as multiline string literals in VHDL, so there is no way for -- to appear on a line without any code and not be a comment. Outside of that, whether a -- is a start of a comment or not depends upon the parse of the code previous.

ktbarrett · 2019-08-14T16:10:23Z

Also, I'm uncertain as to why you are replacing the comment with spaces. No code can follow a comment, replacing it with whitespace won't affect character index in error reporting because no error could exist after a comment starts.

LarsAsplund · 2019-08-15T05:16:00Z

The purpose of our parser is to solve the small subset of the full parsing task we need to run VUnit. For example finding design unit and use statements so that we can figure out dependencies. Do you have a use case where this breaks due to don't care assignments or are you thinking about parsing more generally?

I recognize that there may be such use cases but the question is whether or not they motivate extra parsing efforts or should be solved with a workaround. A common use case we address by removing comments is when people comment out part of their code (not using block comments).

Parsing in general is can be tricky when using a simplistic regexp approach

-- dc := "----"; -- Assign "----" to dc.

ktbarrett · 2019-08-15T17:40:10Z

The parser fails because of syntax errors introduced by removing the rest of the line. For example, if a closing or opening paren is removed it will cause a failure to parse exception: ValueError: Failed to find closing delimiter to ( in {code}

dc := SomeFunc("----");

After comment removal...

dc := SomeFunc("

I can't post any of the failing code. And I know I could modify the source to get it to not fail, but that's simply a hack. I want to use vunit for determining compile order, but these failures make it impossible.

go2sh · 2019-08-18T13:27:35Z

It is because of the simple comment regex. It doesn't take into account string literals. Any kind of string literals, not only bit string literals.

I did some test and found a solution here:
^(?:(?:[^\"\-]*\"(?:\"\"|[^\"])*\"[^\"\-]*)+(--.*)|[^-\"]*(--.*))$
It needs the multiline flag. But it is quite expensive compared to the current version.

ktbarrett · 2019-08-19T15:09:00Z

It may be more complicated and take more time, but it's probably the cheapest fix.

ktbarrett · 2019-08-19T16:41:38Z

I have a slightly more optimal and straightforward regex that works on the example given. Although maybe you know more about potential pitfalls than me.

patt = re.compile('^([^\"\-]*(?:\"[^\"]*\"[^\"\-]*)*)--.*')
def remove_comments(match):
    return match.group(1)
patt.sub(remove_comments, code)

LarsAsplund · 2019-08-21T13:01:21Z

@ktbarrett Have you done some measurements on the performance hit?

go2sh · 2019-08-21T14:33:47Z

My test with regex tester show ruffly 6x more steps then the simple version. But I do not think it matters in absolute terms.

kraigher · 2019-09-12T17:22:05Z

The "real" way of solving the parsing problem would be to do lexing/tokenization of the input. But as Lars mentioned, making a correct parser is much more work than making a heuristic parser that is only supposed to be a private implementation detail of a dependency scanner. I have actually written a full VHDL parser in Rust (https://github.com/kraigher/rust_hdl/) FYI

eine · 2019-09-13T01:19:29Z

As a possible future enhancement, does it make sense to pluggable/selectable VHDL parsers? Mimicking what we do with simulator interfaces, it would allow to have default_parser_interface, rusthdl_parser_interface, pyvhdl_parser_interface, ghdl_parser_interface, etc. I think that this might be a cleaner approach than trying to 'fix' the regex based implementation as complexities arise. @kraigher, since you know both VUnit and rust_hdl, do you think this is just a matter of having time to do it? Or are there some features which might need to be added to rust_hdl's API?

kraigher · 2019-09-14T06:29:25Z

First of all I do not think VUnit should have a parser as a publicly visible component. What VUnit needs is a dependency scanner which might have a parser as a private implementation detail. In my opinion the vhdl parser of VUnit already does to much by parsing generic and port lists. Ironically if it did not parse port or generic list it might never have been sensitive to the problem of this issue since the don't care literals probably only appear as initialization values in such lists.

One of the reasons I wrote the VHDL parser of rust_hdl is that I do not believe it is a good idea to write a parser in Python. This is due to:

Lack of performance
Lack of static type system and pattern matching (Makes AST traversal unmaintainable)

Thus if VUnit ever needed a full parser it would use the parser from rust_hdl. However the question is if it would be worth it. The current method works well enough and introducing a platform dependent binary to VUnit would complicate the build & delivery process. Maybe it can be tolerated since most normal users are found on only two platforms and advanced users on other platforms can be expected to build the binary themselves.

I am not so keen on supporting pluggable parsers as you can only assume the least common denominator of functionality and it is twice the code to maintain. I would rather throw away the old and in with the new.

eine · 2019-09-14T06:45:03Z

The current method works well enough and introducing a platform dependent binary to VUnit would complicate the build & delivery process.

This was my main motivation to suggest a pluggable system with, at least, the current (old) implementation and the new (rust_hdl). Users that want to keep installation of VUnit simple and multi-platform could use the old implementation (which can be a default fallback if rust_hdl is not available).

Maybe it can be tolerated since most normal users are found on only two platforms and advanced users on other platforms can be expected to build the binary themselves.

I use VUnit on four platforms: GNU/Linux (amd64, aarch64 and armv7) and Win10 (amd64). This is not a great deal (2 vs 4), and I do have docker images to automate the builds. Nevertheless, it is kind of painful if most users don't have any problem with the regex solution.

I am not so keen on supporting pluggable parsers as you can only assume the least common denominator of functionality and it is twice the code to maintain. I would rather throw away the old and in with the new.

What about the following middle ground?

Do not support pluggable parsers, but two only: the old one and rust_hdl.
Add a note explaining that the old one is no longer maintained and that it is suggested to use rust_hdl. A kind of deprecation warning without explicit deprecation date.
In the following months/years, when some heavy modification to the old parser is required to keep it up-do-date, remove it.

kraigher · 2019-09-14T06:52:38Z

It is just that I cannot think of any important aspect that would be better for the users of VUnit by using the rust hdl parser for vunit dependency scanning at the moment. The current method even if it is primitive and not formally correct works well. If anything I would like to strip away parsing of elements that do not contribute to dependency scanning.

Thus there is really not much which would motivate the work effort to integrate rust hdl. There would need to be a lot of code written to integrate the rust parser and actually use it for dependency scanning.

eine · 2019-09-14T07:02:37Z

If anything I would like to strip away parsing of elements that do not contribute to dependency scanning.

What are those elements used for?

ktbarrett changed the title ~~Two don't cares ('-') in a literal is parsed as a comment~~ Two don't cares ('-') adjacent in a literal is parsed as a comment Aug 14, 2019

ktbarrett mentioned this issue Aug 21, 2019

Improve remove_comments function in the VHDL parser #530

Closed

eine mentioned this issue Oct 26, 2019

Compile Order Issues #577

Open

eine added the Parsing label Oct 26, 2019

eine mentioned this issue Jan 9, 2020

Add keywords and tokens line attributes jeremiah-c-leary/vhdl-style-guide#312

Closed

ktbarrett mentioned this issue Jan 23, 2020

fix parsing adjacent hyphens in a literal #616

Merged

eine closed this as completed in 53f681c Jan 26, 2020

eine mentioned this issue Jan 29, 2020

add_manual_dependency has no effect #617

Open

eine mentioned this issue Jul 20, 2020

Integration with current VHDL VSCode extensions jeremiah-c-leary/vhdl-style-guide#328

Open

umarcor mentioned this issue Jul 30, 2021

Installation and Use VHDL/VHDLDomain#5

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Two don't cares ('-') adjacent in a literal is parsed as a comment #529

Two don't cares ('-') adjacent in a literal is parsed as a comment #529

ktbarrett commented Aug 14, 2019

ktbarrett commented Aug 14, 2019

ktbarrett commented Aug 14, 2019

LarsAsplund commented Aug 15, 2019

ktbarrett commented Aug 15, 2019

go2sh commented Aug 18, 2019 •

edited

Loading

ktbarrett commented Aug 19, 2019

ktbarrett commented Aug 19, 2019

LarsAsplund commented Aug 21, 2019

go2sh commented Aug 21, 2019

kraigher commented Sep 12, 2019

eine commented Sep 13, 2019

kraigher commented Sep 14, 2019 •

edited

Loading

eine commented Sep 14, 2019

kraigher commented Sep 14, 2019

eine commented Sep 14, 2019 •

edited

Loading

Two don't cares ('-') adjacent in a literal is parsed as a comment #529

Two don't cares ('-') adjacent in a literal is parsed as a comment #529

Comments

ktbarrett commented Aug 14, 2019

ktbarrett commented Aug 14, 2019

ktbarrett commented Aug 14, 2019

LarsAsplund commented Aug 15, 2019

ktbarrett commented Aug 15, 2019

go2sh commented Aug 18, 2019 • edited Loading

ktbarrett commented Aug 19, 2019

ktbarrett commented Aug 19, 2019

LarsAsplund commented Aug 21, 2019

go2sh commented Aug 21, 2019

kraigher commented Sep 12, 2019

eine commented Sep 13, 2019

kraigher commented Sep 14, 2019 • edited Loading

eine commented Sep 14, 2019

kraigher commented Sep 14, 2019

eine commented Sep 14, 2019 • edited Loading

go2sh commented Aug 18, 2019 •

edited

Loading

kraigher commented Sep 14, 2019 •

edited

Loading

eine commented Sep 14, 2019 •

edited

Loading