-
Notifications
You must be signed in to change notification settings - Fork 13.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement table name extraction. #1598
Conversation
|
||
def test_join(self): | ||
query = "SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.a = t2.a;" | ||
self.assertEquals(["t1", "t2"], sql_parse.extract_tables(query)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably use sets instead of lists as ordering shouldn't get in the way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add more nesting, say 3 levels deep, or a UNION ALL within a subquery, a subquery whithin a union all.
also a select in an expression
SELECT f1, (SELECT count(1) FROM t2) FROM t1
|
||
RESULT_OPERATIONS = {'UNION', 'INTERSECT', 'EXCEPT'} | ||
PRECEDES_TABLE_NAME = {'FROM', 'JOIN', 'DESC', 'DESCRIBE', 'WITH'} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be nice to have a SqlStatement and/or SqlSegment class(es)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep it as simple as I can for now. Happy to redesign it once we have other use cases.
@mistercrunch - build is green, could you take another look ? |
@bkyryliuk have you thought about testing columns named the same as reserved keywords which need escaping. |
You also may want to look at other join types: (LEFT, RIGHT) INNER, (LEFT, RIGHT) OUTER, FULL OUTER, LEFT SEMI. |
def process_identifier(identifier, table_names, aliases): | ||
# exclude subselects | ||
if '(' not in '{}'.format(identifier): | ||
table_names.append(get_full_name(identifier)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize this was going on when I first read the code but it's not a good idea to have a function mutate their input as a way to "return", that's just a no-no. This is a sign that you need an object, because an object's method is expected to mutate its properties.
Alternatively, if you want to go "functional programming" on this one and want to return multiple things, you can return a tuple (preferably namedtuple) or a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it's more c++ approach, I'll try to rewrite it using yield - it will be more pythonic way
def extract_tables(sql): | ||
table_names = [] | ||
aliases = [] | ||
extract_from_token(sqlparse.parse(sql)[0], table_names, aliases) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd advise to raise if there are multiple statements as it may be misleading to only return for the first statement. The caller should make sure they have a single statement before calling.
table_names.append(get_full_name(identifier)) | ||
else: | ||
# store aliases | ||
if hasattr(identifier, 'get_alias'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh interesting, can the identifier object be of different types? What are these types? Should the condition be using isinstance
?
I still feel like an object there would be useful, even some of the logic in |
|
||
|
||
# TODO: some sql_lab logic here. | ||
class SupersetQuery: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always derive object
when supporting py2
.filter_by(datasource_name=datasource_name) | ||
.all() | ||
) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return None
is always implicit in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, I prefer to return in cases when the function value will be used.
The post on the stackoverflow explains better:
http://stackoverflow.com/questions/15300550/python-return-return-none-and-no-return-at-all
Using return None:
This tells that the function is indeed meant to return a value for later use, and in this case it returns None.
class SupersetQuery(object): | ||
def __init__(self, sql_statement): | ||
self._tokens = [] | ||
self._sql = sql_statement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public
LGTM |
This function parses SQL statement and extracts table names using sqlparse.
Resolves: #1607
Reviewers:
If you have tests in mind - please add SQL statements in the comments, happy to extend test coverage.