Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GX fails with SAP HANA #10039

Closed
Someone894 opened this issue Jun 17, 2024 · 1 comment
Closed

GX fails with SAP HANA #10039

Someone894 opened this issue Jun 17, 2024 · 1 comment

Comments

@Someone894
Copy link

Someone894 commented Jun 17, 2024

Describe the bug

Currently I am testing out how GX could integrate in our existing infrastructure. We store most of oure Bussines Data within the SAP HANA. Since GX uses SQLAlchemy and its Dialects support SAP HANA via sqlalchemy-hana I hoped that GX could work with SAP HANA too, but it throws an error when I try out the "Get started with GX and SQL" tutorial with the HANA DB. See the details below.

To Reproduce

Here is the first part of the code till the line that produces the error:

context = gx.get_context(project_root_dir=to_absolute_path("./gx_hana"))

datasource = context.sources.add_or_update_sql(
    name="PHT",
    connection_string="hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}",
)

# datasource.add_table_asset(name="POS", table_name="POS_DATA.RESULT_BY_ID")
datasource.add_query_asset(name="POS", query="SELECT * FROM POS_DATA.RESULT_BY_ID;")

batch_request = datasource.get_asset("POS").build_batch_request()

expectation_suite_name = "insert_your_expectation_suite_name_here"
context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)

validator.expect_column_values_to_not_be_null(column="COUNTRY")

Where the last line is the problematic one and creates the following error:

validator.expect_column_values_to_not_be_null(column="COUNTRY")
2

Calculating Metrics:   0%|                                                                                                                                                              | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 548, in _process_direct_and_bundled_metric_computation_configurations
    ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\metric_provider.py", line 60, in inner_func
    return metric_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_column_types.py", line 73, in _sqlalchemy
    return _get_sqlalchemy_column_metadata(execution_engine, batch_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_column_types.py", line 112, in _get_sqlalchemy_column_metadata
    return get_sqlalchemy_column_metadata(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\util.py", line 427, in get_sqlalchemy_column_metadata
    columns = column_reflection_fallback(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\util.py", line 679, in column_reflection_fallback
    result_object = connection.execute(query)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 2353, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (hdbcli.dbapi.ProgrammingError) (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
[SQL: SELECT *
FROM (SELECT * FROM POS_DATA.RESULT_BY_ID FROM DUMMY) AS anon_1
WHERE true = TRUE]
(Background on this error at: https://sqlalche.me/e/20/f405)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 590, in inst_expectation
    raise err
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 553, in inst_expectation
    validation_result = expectation.validate(
                        ^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\expectation.py", line 1314, in validate
    ] = validator.graph_validate(
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1065, in graph_validate
    raise err
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1044, in graph_validate
    ) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1200, in _resolve_suite_level_graph_and_process_metric_evaluation_errors
    ) = self._metrics_calculator.resolve_validation_graph(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\metrics_calculator.py", line 274, in resolve_validation_graph
    resolved_metrics, aborted_metrics_info = graph.resolve(
                                             ^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 202, in resolve
    aborted_metrics_info: _AbortedMetricsInfoDict = self._resolve(
                                                    ^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 302, in _resolve
    raise err
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 269, in _resolve
    self._execution_engine.resolve_metrics(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 283, in resolve_metrics
    return self._process_direct_and_bundled_metric_computation_configurations(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 552, in _process_direct_and_bundled_metric_computation_configurations
    raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: (hdbcli.dbapi.ProgrammingError) (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
[SQL: SELECT *
FROM (SELECT * FROM POS_DATA.RESULT_BY_ID FROM DUMMY) AS anon_1
WHERE true = TRUE]
(Background on this error at: https://sqlalche.me/e/20/f405)

Here is my great_expectations.yml file

config_version: 3.0

datasources: {}

config_variables_file_path: uncommitted/config_variables.yml

plugins_directory: plugins/

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    class_name: EvaluationParameterStore
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

  profiler_store:
    class_name: ProfilerStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: profilers/

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: false
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

anonymous_usage_statistics:
  data_context_id: e41e7ae4-9958-4d80-a3b8-8668e6f1a11a
  enabled: false
fluent_datasources:
  PHT:
    type: sql
    assets:
      POS:
        type: query
        order_by: []
        batch_metadata: {}
        query: SELECT * FROM POS_DATA.RESULT_BY_ID;
    connection_string: hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}
notebooks:
include_rendered_content:
  globally: false
  expectation_suite: false
  expectation_validation_result: false

Expected behavior

I expected GX to be able to check the expectation without crashing.

Environment (please complete the following information):

  • Operating System: Windows 11 (but its only for testing and development, in production i would use docker containers on linux)
  • Great Expectations Version: 0.18.15
    • sqlalchemy: 2.0.30
    • sqlalchemy-hana: 2.1.0
  • Data Source: SAP HANA
  • Cloud environment: None

Additional context

I also went ahead and tested the SAP HANA connection (and connection string) via
SQLAlchemy directly, which worked great:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}", echo=True)
pd.read_sql('SELECT * FROM "POS_DATA"."RESULT_BY_ID";', con=engine)

So im pretty sure that the SQL code GX is generating clashes with what the SAP HANA is expecting
since when I run the GX generated code in the SAP HANA natively it also throws the same error:

SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID" FROM DUMMY) AS anon_1 WHERE true = TRUE;

Could not execute 'SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID" FROM DUMMY) AS anon_1 WHERE true = TRUE'
SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "FROM": line 1 col 56 (at pos 56)

Then I changed the SQL statement to this:

SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID") AS anon_1 WHERE true = TRUE;

and it worked out fine in the SAP HANA Studio.

@molliemarie
Copy link
Contributor

Hello @Someone894. With the launch of Great Expectations Core (GX 1.0), we are closing old issues posted regarding previous versions. Moving forward, we will focus our resources on supporting and improving GX Core (version 1.0 and beyond). If you find that an issue you previously reported still exists in GX Core, we encourage you to resubmit it against the new version. With more resources dedicated to community support, we aim to tackle new issues swiftly. For specific details on what is GX-supported vs community-supported, you can reference our integration and support policy.

To get started on your transition to GX Core, check out the GX Core quickstart (click “Full example code” tab to see a code example).

You can also join our upcoming community meeting on August 28th at 9am PT (noon ET / 4pm UTC) for a comprehensive rundown of everything GX Core, plus Q&A as time permits. Go to https://greatexpectations.io/meetup and click “follow calendar” to follow the GX community calendar.

Thank you for being part of the GX community and thank you for submitting this issue. We're excited about this new chapter and look forward to your feedback on GX Core. 🤗

@molliemarie molliemarie closed this as not planned Won't fix, can't repro, duplicate, stale Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants