Implement table name extraction. #1598

bkyryliuk · 2016-11-14T18:53:27Z

This function parses a SQL statement and extracts table names using sqlparse.

Resolves: #1607

Reviewers:

If you have tests in mind, please add SQL statements in the comments; I'm happy to extend the test coverage.

mistercrunch · 2016-11-14T18:56:32Z

tests/sql_parse_tests.py

+
+    def test_join(self):
+        query = "SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.a = t2.a;"
+        self.assertEquals(["t1", "t2"], sql_parse.extract_tables(query))


should probably use sets instead of lists as ordering shouldn't get in the way
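A minimal sketch of the set-based assertion suggested here. `extract_tables_stub` is a hypothetical stand-in for `sql_parse.extract_tables` (it is not the PR's code) and deliberately returns the tables in a different order than the test lists them.

```python
import unittest


def extract_tables_stub(query):
    # Hypothetical stand-in for sql_parse.extract_tables; it returns the
    # tables in an order the test should not depend on.
    return ["t2", "t1"]


class JoinTest(unittest.TestCase):
    def test_join(self):
        query = "SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.a = t2.a;"
        # Comparing sets makes the assertion independent of result order.
        self.assertEqual({"t1", "t2"}, set(extract_tables_stub(query)))
```

If duplicates ever matter, `assertCountEqual` (Python 3) compares contents while still ignoring order.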

mistercrunch

I'd add more nesting, say 3 levels deep, or a UNION ALL within a subquery, or a subquery within a UNION ALL.

also a select in an expression

SELECT f1, (SELECT count(1) FROM t2) FROM t1
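The cases listed above could be written as a small table of (query, expected tables) pairs. This is only a sketch: `naive_extract_tables` is a hypothetical regex stand-in so that the cases can run on their own. It is not the sqlparse-based extractor from this PR, and it would miss comma-separated tables, quoted names, and more.

```python
import re

# Hypothetical stand-in, NOT the sqlparse-based extractor from this PR:
# it only picks up a bare name that directly follows FROM or JOIN.
_TABLE_AFTER_KEYWORD = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def naive_extract_tables(sql):
    return set(_TABLE_AFTER_KEYWORD.findall(sql))


CASES = [
    # three levels of nesting
    ("SELECT * FROM (SELECT * FROM (SELECT * FROM t1) a) b", {"t1"}),
    # UNION ALL within a subquery
    ("SELECT * FROM (SELECT a FROM t1 UNION ALL SELECT a FROM t2) u",
     {"t1", "t2"}),
    # subquery within a UNION ALL
    ("SELECT a FROM t1 UNION ALL SELECT a FROM (SELECT a FROM t2) s",
     {"t1", "t2"}),
    # SELECT inside an expression
    ("SELECT f1, (SELECT count(1) FROM t2) FROM t1", {"t1", "t2"}),
]


def failing_cases(extract=naive_extract_tables):
    """Return the queries for which the extractor disagrees with CASES."""
    return [sql for sql, expected in CASES if extract(sql) != expected]
```

Passing the real extractor as `extract` would reuse the same cases against it.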

mistercrunch · 2016-11-15T04:29:19Z

superset/sql_parse.py

+
+RESULT_OPERATIONS = {'UNION', 'INTERSECT', 'EXCEPT'}
+PRECEDES_TABLE_NAME = {'FROM', 'JOIN', 'DESC', 'DESCRIBE', 'WITH'}
+


could be nice to have a SqlStatement and/or SqlSegment class(es)

I want to keep it as simple as I can for now. Happy to redesign it once we have other use cases.

bkyryliuk · 2016-11-16T01:48:51Z

@mistercrunch - build is green, could you take another look?

john-bodley · 2016-11-16T02:17:08Z

@bkyryliuk have you thought about testing columns named the same as reserved keywords, which need escaping?

john-bodley · 2016-11-16T02:22:18Z

You also may want to look at other join types: (LEFT, RIGHT) INNER, (LEFT, RIGHT) OUTER, FULL OUTER, LEFT SEMI.
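One way to cover these variants is to generate a query per join type and expect the same two tables each time. A sketch only: the exact list of spellings is my reading of the comment above, and which of them a given engine accepts is dialect-specific.

```python
# Join spellings based on the comment above; which ones a given engine
# accepts is dialect-specific (LEFT SEMI JOIN is Hive-style syntax).
JOIN_TYPES = [
    "JOIN",
    "INNER JOIN",
    "LEFT JOIN",
    "RIGHT JOIN",
    "LEFT OUTER JOIN",
    "RIGHT OUTER JOIN",
    "FULL OUTER JOIN",
    "LEFT SEMI JOIN",
]

# Every generated query references exactly these two tables.
EXPECTED_TABLES = {"t1", "t2"}


def join_queries():
    """Build one query per join type; each should yield EXPECTED_TABLES."""
    template = "SELECT t1.a FROM t1 {join} t2 ON t1.a = t2.a"
    return [template.format(join=join) for join in JOIN_TYPES]
```

A test could then loop over `join_queries()` and compare the extractor's result with `EXPECTED_TABLES` as a set.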

mistercrunch · 2016-11-22T01:31:11Z

superset/sql_parse.py

+def process_identifier(identifier, table_names, aliases):
+    # exclude subselects
+    if '(' not in '{}'.format(identifier):
+        table_names.append(get_full_name(identifier))


I didn't realize this was going on when I first read the code, but it's not a good idea to have a function mutate its input as a way to "return"; that's just a no-no. This is a sign that you need an object, because an object's method is expected to mutate its properties.

Alternatively, if you want to go "functional programming" on this one and want to return multiple things, you can return a tuple (preferably namedtuple) or a dict.
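A rough illustration of the namedtuple option. The names `ExtractionResult` and `classify_identifiers` are hypothetical, and plain (name, alias) pairs stand in for sqlparse identifiers.

```python
from collections import namedtuple

# Hypothetical result type; the field names mirror the two lists in the diff.
ExtractionResult = namedtuple("ExtractionResult", ["table_names", "aliases"])


def classify_identifiers(identifiers):
    """Split (name, alias) pairs into two new lists; the input is untouched."""
    table_names = []
    aliases = []
    for name, alias in identifiers:
        table_names.append(name)
        if alias is not None:
            aliases.append(alias)
    return ExtractionResult(table_names, aliases)


result = classify_identifiers([("t1", "a"), ("t2", None)])
print(result.table_names)  # ['t1', 't2']
print(result.aliases)      # ['a']
```

The caller gets both results by name, and nothing passed in is modified.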

Yeah, it's more of a C++ approach. I'll try to rewrite it using yield, which will be more Pythonic.
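A generator version might look roughly like this. It is a sketch, not the rewrite that landed: `iter_table_names` is a hypothetical name, and strings stand in for sqlparse identifiers. The parenthesis check mirrors the one in the diff above.

```python
def iter_table_names(identifiers):
    """Yield table names, skipping subselects (anything with a parenthesis)."""
    for identifier in identifiers:
        text = str(identifier)
        if "(" not in text:
            yield text


# The caller decides how to collect the results; no list is passed in.
names = set(iter_table_names(["t1", "(SELECT * FROM t3)", "schema.t2"]))
print(names == {"t1", "schema.t2"})  # True
```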

mistercrunch · 2016-11-22T01:39:00Z

superset/sql_parse.py

+def extract_tables(sql):
+    table_names = []
+    aliases = []
+    extract_from_token(sqlparse.parse(sql)[0], table_names, aliases)


I'd advise raising an error if there are multiple statements, as it may be misleading to return results for only the first statement. The caller should make sure they have a single statement before calling.
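A sketch of such a guard. It assumes `statements` is the sequence a parser returns, such as the result of `sqlparse.parse(sql)`; plain strings stand in for parsed statements here. The helper name and the choice of `ValueError` are my assumptions.

```python
def single_statement(statements):
    """Return the only non-empty statement, or raise if there is not exactly one."""
    non_empty = [s for s in statements if str(s).strip()]
    if len(non_empty) != 1:
        raise ValueError(
            "expected exactly one SQL statement, got {}".format(len(non_empty))
        )
    return non_empty[0]


print(single_statement(["SELECT * FROM t1;"]))  # SELECT * FROM t1;
```

With this in place, `extract_tables` would fail loudly instead of silently ignoring everything after the first statement.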

mistercrunch · 2016-11-22T01:42:34Z

superset/sql_parse.py

+        table_names.append(get_full_name(identifier))
+    else:
+        # store aliases
+        if hasattr(identifier, 'get_alias'):


Oh interesting, can the identifier object be of different types? What are these types? Should the condition be using isinstance?

mistercrunch · 2016-11-22T01:47:05Z

I still feel like an object there would be useful. Even some of the logic in sql_lab.py could be moved to this object; it would let us abstract sqlparse, tokenize only once, and run multiple tests against the result all in one place.
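To make the idea concrete, here is a sketch of an object that tokenizes once and answers several questions from the same tokens. `QuerySketch` is a hypothetical name, and a crude regex tokenizer stands in for sqlparse, so this is not the class from this PR.

```python
import re


class QuerySketch(object):
    """Hypothetical wrapper: tokenize once, answer several questions."""

    def __init__(self, sql):
        self.sql = sql
        # Crude stand-in for sqlparse: words (with dots) or single symbols.
        self._tokens = re.findall(r"[\w.]+|[^\w\s]", sql)

    def is_select(self):
        return bool(self._tokens) and self._tokens[0].upper() == "SELECT"

    def tables(self):
        found = set()
        # A token that follows FROM or JOIN is a table unless it opens
        # a subselect.
        for prev, cur in zip(self._tokens, self._tokens[1:]):
            if prev.upper() in ("FROM", "JOIN") and cur != "(":
                found.add(cur)
        return found


query = QuerySketch("SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.a = t2.a;")
print(query.is_select())                 # True
print(query.tables() == {"t1", "t2"})    # True
```

Callers never see the tokenizer, so swapping in a real parser later would not change the interface.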

mistercrunch · 2016-11-28T03:15:46Z

superset/sql_parse.py

+
+
+# TODO: some sql_lab logic here.
+class SupersetQuery:


always derive from object when supporting py2

mistercrunch · 2016-11-29T19:47:35Z

superset/source_registry.py

+                .filter_by(datasource_name=datasource_name)
+                .all()
+            )
+        return None


return None is always implicit in Python

True, but I prefer an explicit return in cases where the function's value will be used.
This Stack Overflow post explains it better:
http://stackoverflow.com/questions/15300550/python-return-return-none-and-no-return-at-all

Using return None: This tells that the function is indeed meant to return a value for later use, and in this case it returns None.
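For reference, the two styles behave identically; the difference is only in what they signal to the reader. The function names below are hypothetical.

```python
def find_implicit(items, wanted):
    for item in items:
        if item == wanted:
            return item
    # Falling off the end returns None implicitly.


def find_explicit(items, wanted):
    for item in items:
        if item == wanted:
            return item
    return None  # same result, but signals the value is meant to be used


print(find_implicit(["a", "b"], "z") is None)  # True
print(find_explicit(["a", "b"], "z") is None)  # True
```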

mistercrunch · 2016-11-29T19:54:33Z

superset/sql_parse.py

+class SupersetQuery(object):
+    def __init__(self, sql_statement):
+        self._tokens = []
+        self._sql = sql_statement


mistercrunch · 2016-11-29T19:58:36Z

LGTM

mistercrunch reviewed Nov 14, 2016

mistercrunch reviewed Nov 15, 2016

bkyryliuk changed the title ~~WIP. Implement table name extraction tests.~~ Implement table name extraction tests. Nov 15, 2016

bkyryliuk changed the title ~~Implement table name extraction tests.~~ Implement table name extraction. Nov 16, 2016

mistercrunch requested changes Nov 22, 2016

mistercrunch reviewed Nov 28, 2016

mistercrunch reviewed Nov 29, 2016

mistercrunch approved these changes Nov 29, 2016

Bogdan Kyryliuk added 12 commits November 29, 2016 12:15

Implement table name extraction tests.

cffb966

Address comments.

c1e5c84

Fix tests and reimplement the token processing.

6d55ea6

Exclude aliases.

b653a84

Clean up print statements and code.

1fd5ec9

Reverse select test.

3c40992

Fix failing test.

70819cc

Test JOINs

bfc7b22

refactore as a class

1ffdb96

Check for permissions in SQL Lab.

fa29aed

Implement permissions check for the datasources in sql_lab

2417a83

Address comments.

3609dac

bkyryliuk merged commit dc98c67 into apache:master Nov 29, 2016

victorarbuesmallada mentioned this pull request Aug 26, 2022

ParsedQuery to return tables within subselects #21207

Closed

mistercrunch added the 🏷️ bot and 🚢 0.14.0 labels Feb 19, 2024
