Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two don't cares ('-') adjacent in a literal is parsed as a comment #529

Closed
ktbarrett opened this issue Aug 14, 2019 · 15 comments
Closed

Two don't cares ('-') adjacent in a literal is parsed as a comment #529

ktbarrett opened this issue Aug 14, 2019 · 15 comments
Labels

Comments

@ktbarrett
Copy link
Contributor

The remove_comments(code) function is overly simplistic. It matches any two adjacent - characters until the end of the line. '-' is also used as a "don't care" value for std_logic. If there is an std_logic_vector literal where there are two don't care values in a row (e.g. '----'), it is parsed as a comment and the rest of the line is removed, causing the parse to fail.

vunit/vunit/vhdl_parser.py

Lines 991 to 1007 in 12be973

VHDL_REMOVE_COMMENT_RE = re.compile(r'--[^\n]*')
def _comment_repl(match):
"""
Replace comment with equal amount of whitespace to make
lexical position unaffected
"""
text = match.group(0)
return " " * len(text)
def remove_comments(code):
"""
Return the code with comments removed
"""
return VHDL_REMOVE_COMMENT_RE.sub(_comment_repl, code)

@ktbarrett
Copy link
Contributor Author

I am fairly certain that you can remove comments if they are the only thing on the line. There are no such thing as multiline string literals in VHDL, so there is no way for -- to appear on a line without any code and not be a comment. Outside of that, whether a -- is a start of a comment or not depends upon the parse of the code previous.

@ktbarrett
Copy link
Contributor Author

Also, I'm uncertain as to why you are replacing the comment with spaces. No code can follow a comment, replacing it with whitespace won't affect character index in error reporting because no error could exist after a comment starts.

@ktbarrett ktbarrett changed the title Two don't cares ('-') in a literal is parsed as a comment Two don't cares ('-') adjacent in a literal is parsed as a comment Aug 14, 2019
@LarsAsplund
Copy link
Collaborator

The purpose of our parser is to solve the small subset of the full parsing task we need to run VUnit. For example finding design unit and use statements so that we can figure out dependencies. Do you have a use case where this breaks due to don't care assignments or are you thinking about parsing more generally?

I recognize that there may be such use cases but the question is whether or not they motivate extra parsing efforts or should be solved with a workaround. A common use case we address by removing comments is when people comment out part of their code (not using block comments).

Parsing in general is can be tricky when using a simplistic regexp approach

-- dc := "----"; -- Assign "----" to dc.

@ktbarrett
Copy link
Contributor Author

The parser fails because of syntax errors introduced by removing the rest of the line. For example, if a closing or opening paren is removed it will cause a failure to parse exception: ValueError: Failed to find closing delimiter to ( in {code}

dc := SomeFunc("----");

After comment removal...

dc := SomeFunc("

I can't post any of the failing code. And I know I could modify the source to get it to not fail, but that's simply a hack. I want to use vunit for determining compile order, but these failures make it impossible.

@go2sh
Copy link
Contributor

go2sh commented Aug 18, 2019

It is because of the simple comment regex. It doesn't take into account string literals. Any kind of string literals, not only bit string literals.

I did some test and found a solution here:
^(?:(?:[^\"\-]*\"(?:\"\"|[^\"])*\"[^\"\-]*)+(--.*)|[^-\"]*(--.*))$
It needs the multiline flag. But it is quite expensive compared to the current version.

@ktbarrett
Copy link
Contributor Author

It may be more complicated and take more time, but it's probably the cheapest fix.

@ktbarrett
Copy link
Contributor Author

I have a slightly more optimal and straightforward regex that works on the example given. Although maybe you know more about potential pitfalls than me.

patt = re.compile('^([^\"\-]*(?:\"[^\"]*\"[^\"\-]*)*)--.*')
def remove_comments(match):
    return match.group(1)
patt.sub(remove_comments, code)

@LarsAsplund
Copy link
Collaborator

@ktbarrett Have you done some measurements on the performance hit?

@go2sh
Copy link
Contributor

go2sh commented Aug 21, 2019

My test with regex tester show ruffly 6x more steps then the simple version. But I do not think it matters in absolute terms.

@kraigher
Copy link
Collaborator

The "real" way of solving the parsing problem would be to do lexing/tokenization of the input. But as Lars mentioned, making a correct parser is much more work than making a heuristic parser that is only supposed to be a private implementation detail of a dependency scanner. I have actually written a full VHDL parser in Rust (https://github.com/kraigher/rust_hdl/) FYI

@eine
Copy link
Collaborator

eine commented Sep 13, 2019

As a possible future enhancement, does it make sense to pluggable/selectable VHDL parsers? Mimicking what we do with simulator interfaces, it would allow to have default_parser_interface, rusthdl_parser_interface, pyvhdl_parser_interface, ghdl_parser_interface, etc. I think that this might be a cleaner approach than trying to 'fix' the regex based implementation as complexities arise. @kraigher, since you know both VUnit and rust_hdl, do you think this is just a matter of having time to do it? Or are there some features which might need to be added to rust_hdl's API?

@kraigher
Copy link
Collaborator

kraigher commented Sep 14, 2019

First of all I do not think VUnit should have a parser as a publicly visible component. What VUnit needs is a dependency scanner which might have a parser as a private implementation detail. In my opinion the vhdl parser of VUnit already does to much by parsing generic and port lists. Ironically if it did not parse port or generic list it might never have been sensitive to the problem of this issue since the don't care literals probably only appear as initialization values in such lists.

One of the reasons I wrote the VHDL parser of rust_hdl is that I do not believe it is a good idea to write a parser in Python. This is due to:

  1. Lack of performance
  2. Lack of static type system and pattern matching (Makes AST traversal unmaintainable)

Thus if VUnit ever needed a full parser it would use the parser from rust_hdl. However the question is if it would be worth it. The current method works well enough and introducing a platform dependent binary to VUnit would complicate the build & delivery process. Maybe it can be tolerated since most normal users are found on only two platforms and advanced users on other platforms can be expected to build the binary themselves.

I am not so keen on supporting pluggable parsers as you can only assume the least common denominator of functionality and it is twice the code to maintain. I would rather throw away the old and in with the new.

@eine
Copy link
Collaborator

eine commented Sep 14, 2019

The current method works well enough and introducing a platform dependent binary to VUnit would complicate the build & delivery process.

This was my main motivation to suggest a pluggable system with, at least, the current (old) implementation and the new (rust_hdl). Users that want to keep installation of VUnit simple and multi-platform could use the old implementation (which can be a default fallback if rust_hdl is not available).

Maybe it can be tolerated since most normal users are found on only two platforms and advanced users on other platforms can be expected to build the binary themselves.

I use VUnit on four platforms: GNU/Linux (amd64, aarch64 and armv7) and Win10 (amd64). This is not a great deal (2 vs 4), and I do have docker images to automate the builds. Nevertheless, it is kind of painful if most users don't have any problem with the regex solution.

I am not so keen on supporting pluggable parsers as you can only assume the least common denominator of functionality and it is twice the code to maintain. I would rather throw away the old and in with the new.

What about the following middle ground?

  • Do not support pluggable parsers, but two only: the old one and rust_hdl.
  • Add a note explaining that the old one is no longer maintained and that it is suggested to use rust_hdl. A kind of deprecation warning without explicit deprecation date.
  • In the following months/years, when some heavy modification to the old parser is required to keep it up-do-date, remove it.

@kraigher
Copy link
Collaborator

It is just that I cannot think of any important aspect that would be better for the users of VUnit by using the rust hdl parser for vunit dependency scanning at the moment. The current method even if it is primitive and not formally correct works well. If anything I would like to strip away parsing of elements that do not contribute to dependency scanning.

Thus there is really not much which would motivate the work effort to integrate rust hdl. There would need to be a lot of code written to integrate the rust parser and actually use it for dependency scanning.

@eine
Copy link
Collaborator

eine commented Sep 14, 2019

If anything I would like to strip away parsing of elements that do not contribute to dependency scanning.

What are those elements used for?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants