initial Athena Clickhouse connector commit related to issue 1754 #1770

Merged 20 commits on Jun 25, 2024

Commits
a59ee61
initial Athena Clickhouse connector commit related to https://github.…
Feb 22, 2024
9dbdb76
Merge branch 'master' into master
bishrtabbaa Feb 29, 2024
084ceee
Merge branch 'master' into master
bishrtabbaa Mar 5, 2024
0c6c207
Merge branch 'awslabs:master' into master
bishrtabbaa Mar 20, 2024
384d833
incorporating Athena team feedback to reuse and extend Athena MySqlMe…
Mar 21, 2024
7efdc55
incorporate Athena service team feedback
bishrtabbaa Apr 9, 2024
cd4a577
Merge pull request #1 from awslabs/master
bishrtabbaa Apr 9, 2024
48d9290
Delete athena-clickhouse/.aws-sam/build.toml per service team feedback
bishrtabbaa Apr 9, 2024
c3def5d
updating jar lib version number
bishrtabbaa Apr 10, 2024
d63b413
Merge branch 'master' into master
bishrtabbaa Apr 26, 2024
5acee22
Merge branch 'master' into master
bishrtabbaa May 3, 2024
37ada59
Merge branch 'awslabs:master' into master
bishrtabbaa Jun 6, 2024
953263f
incorporating Athena service team feedback related to pull/1770
bishrtabbaa Jun 6, 2024
15e2b0c
Merge branch 'master' into master
bishrtabbaa Jun 11, 2024
0190490
prepared README for public GA release and streamlined SAM CLI instruc…
bishrtabbaa Jun 12, 2024
80b0ecd
switched from curl to wget for jar file download instructions
bishrtabbaa Jun 13, 2024
303e084
cleaned up download instructions
bishrtabbaa Jun 13, 2024
9728d46
Merge branch 'master' into master
chngpe Jun 18, 2024
e85080e
incorporated Athena service team feedback from https://github.com/aws…
bishrtabbaa Jun 24, 2024
0979abd
Merge branch 'master' into master
aimethed Jun 25, 2024
prepared README for public GA release and streamlined SAM CLI instructions
bishrtabbaa committed Jun 12, 2024
commit 019049047d1e7c7fdd52942ebbb8da48f930c330
80 changes: 79 additions & 1 deletion athena-clickhouse/README.md
@@ -2,4 +2,82 @@

This connector enables Amazon Athena to access your ClickHouse databases.

Documentation has moved [here](https://docs.aws.amazon.com/athena/latest/ug/connectors-athena.html).
Official public documentation has moved [here](https://docs.aws.amazon.com/athena/latest/ug/connectors-athena.html).

This README walks through the SAM CLI installation method (not Serverless Application Repository via AWS Console).

### 1. Download the Athena Clickhouse Connector source and release JAR

Download latest Athena source
```
git clone https://github.com/awslabs/aws-athena-query-federation
cd aws-athena-query-federation/athena-clickhouse
```

Download the latest Athena Clickhouse JAR binary. Browse to https://github.com/awslabs/aws-athena-query-federation/releases, choose the latest release, and download its `athena-clickhouse` JAR:
```
curl -L -O https://github.com/awslabs/aws-athena-query-federation/releases/download/v2024.19.1/athena-clickhouse-2024.19.1.jar
```
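
As a minimal sketch of the download step, the release URL above can be derived from a single version string so that the tag and the file name stay in step. The helper names `release_url` and `download_connector_jar` are hypothetical and not part of the connector, and `2024.19.1` is simply the version used in this README. `-L` is included because GitHub serves release assets through a redirect.

```shell
# Hypothetical helper (not part of the connector): builds the release asset URL
# for a given version, following the URL pattern shown above.
release_url() {
  version="$1"
  printf 'https://github.com/awslabs/aws-athena-query-federation/releases/download/v%s/athena-clickhouse-%s.jar\n' \
    "$version" "$version"
}

# Downloads the JAR into the current directory (defined only, not invoked here).
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
download_connector_jar() {
  curl -fL -O "$(release_url "$1")"
}

# Print the URL that would be fetched for the example version.
release_url 2024.19.1
```

To fetch a different release, call `download_connector_jar` with that release's version string.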

### 2. Copy the Athena Clickhouse Connector JAR binary to S3 because it exceeds the 50 MB local file upload limit

You **MUST** replace the S3 bucket and prefix folder below with your own. The SAM deployment in step 4 references the Connector JAR file at this location.

```
aws s3 cp --region us-east-2 athena-clickhouse-2024.19.1.jar s3://my-athena-demo/code/
```
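
The bucket and key used here must match the `LambdaS3CodeUriBucket` and `LambdaS3CodeUriKey` values passed to `sam deploy` in step 4. One way to keep them in sync, sketched below, is to derive both from the same shell variables. The variable and function names are illustrative assumptions, and the values are the placeholders from this README.

```shell
# Placeholder values from this README; replace with your own bucket and prefix.
BUCKET="my-athena-demo"
PREFIX="code"
JAR="athena-clickhouse-2024.19.1.jar"

# Destination for `aws s3 cp` (step 2).
s3_destination() {
  printf 's3://%s/%s/\n' "$BUCKET" "$PREFIX"
}

# Value for the LambdaS3CodeUriKey parameter (step 4).
code_uri_key() {
  printf '%s/%s\n' "$PREFIX" "$JAR"
}

# Upload step, wrapped in a function so that running this snippet has no side effects.
upload_connector_jar() {
  aws s3 cp --region us-east-2 "$JAR" "$(s3_destination)"
}

# Show the two parameter values that step 4 must receive.
echo "LambdaS3CodeUriBucket=$BUCKET LambdaS3CodeUriKey=$(code_uri_key)"
```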

### 3. Validate Athena Clickhouse Connector as Serverless CloudFormation stack

```
sam validate --region us-east-2 --template-file athena-clickhouse.yaml
```

### 4. Deploy Athena Clickhouse Connector as Serverless CloudFormation stack

You can change the Lambda function configuration at deployment time and also after the stack has been deployed. The parameters that **MUST** change are listed in the References section below.

Also, note that you **MUST** create and configure VPC endpoints for S3 (and *optionally* Secrets Manager) because the Athena connector's Lambda function will be deployed within a VPC.
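
A hedged sketch of creating those endpoints with the AWS CLI follows. The VPC ID and route table ID arguments are hypothetical placeholders that you supply, the subnet and security group IDs are the sample values from this README, and the functions are only defined here, not invoked. It assumes the usual setup in which S3 is reached through a Gateway endpoint attached to route tables and Secrets Manager through an Interface endpoint attached to subnets and security groups.

```shell
REGION="us-east-2"

# AWS endpoint service names follow the pattern com.amazonaws.<region>.<service>.
endpoint_service_name() {
  printf 'com.amazonaws.%s.%s\n' "$REGION" "$1"
}

# Gateway endpoint for S3 (used for the spill bucket).
# $1 = VPC ID, $2 = route table ID (both are placeholders you must supply).
create_s3_endpoint() {
  aws ec2 create-vpc-endpoint --region "$REGION" \
    --vpc-id "$1" \
    --vpc-endpoint-type Gateway \
    --service-name "$(endpoint_service_name s3)" \
    --route-table-ids "$2"
}

# Interface endpoint for Secrets Manager (only needed for the indirect credentials option).
# $1 = VPC ID; subnet and security group IDs are the sample values used in this README.
create_secretsmanager_endpoint() {
  aws ec2 create-vpc-endpoint --region "$REGION" \
    --vpc-id "$1" \
    --vpc-endpoint-type Interface \
    --service-name "$(endpoint_service_name secretsmanager)" \
    --subnet-ids subnet-bc1f0ac6 subnet-db9f40b0 \
    --security-group-ids sg-ab9282d4
}
```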

Direct Configuration of Credentials in Connector's connection string:
```
sam deploy --guided --region us-east-2 --template-file athena-clickhouse.yaml --stack-name AthenaClickhouseConnectorStack --capabilities CAPABILITY_NAMED_IAM --parameter-overrides LambdaFunctionName=athenaclickhouseconnectorfunction DefaultConnectionString='clickhouse://jdbc:clickhouse:https://myclickhouseserver.xyzware.io:8443/default?user=foo&password=bar&sslmode=none' DisableSpillEncryption=true SecretNamePrefix=AthenaClickhouse SpillBucket=my-athena-demo SpillPrefix=athena-spill SecurityGroupIds=sg-ab9282d4 SubnetIds=subnet-bc1f0ac6,subnet-db9f40b0 LambdaS3CodeUriBucket=my-athena-demo LambdaS3CodeUriKey=code/athena-clickhouse-2024.19.1.jar
```

Indirect Configuration of Credentials in Connector's connection string using AWS Secrets Manager:
```
sam deploy --guided --region us-east-2 --template-file athena-clickhouse.yaml --stack-name AthenaClickhouseConnectorStack --capabilities CAPABILITY_NAMED_IAM --parameter-overrides LambdaFunctionName=athenaclickhouseconnectorfunction DefaultConnectionString='clickhouse://jdbc:clickhouse:https://myclickhouseserver.xyzware.io:8443/default?${AthenaClickhouse}&sslmode=none' DisableSpillEncryption=true SecretNamePrefix=AthenaClickhouse SpillBucket=my-athena-demo SpillPrefix=athena-spill SecurityGroupIds=sg-ab9282d4 SubnetIds=subnet-bc1f0ac6,subnet-db9f40b0 LambdaS3CodeUriBucket=my-athena-demo LambdaS3CodeUriKey=code/athena-clickhouse-2024.19.1.jar
```
### References

**Parameters** are listed below. You **MUST** change the `DefaultConnectionString`, `SpillBucket`, `SpillPrefix`, `SecurityGroupIds`, `SubnetIds`, `LambdaS3CodeUriBucket`, and `LambdaS3CodeUriKey`.

Also note that the `DefaultConnectionString` differs depending on whether you configure credentials directly in the URL or indirectly through AWS Secrets Manager.

If you decide to directly configure credentials in the URL, make sure that the URL contains parameters for the `user` and `password`.

If you decide to indirectly configure credentials using AWS Secrets Manager, make sure that the Secret contains parameters for the `username` and `password`. Also make sure that you wrap the `DefaultConnectionString` in single quotes on the command line; otherwise your shell treats `${AthenaClickhouse}` as a `$` variable and expands it before the value reaches SAM.
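
The quoting point can be checked locally. The sketch below, which assumes a POSIX shell, shows that single quotes keep the `${AthenaClickhouse}` placeholder intact while double quotes let the shell expand it. It also sketches creating the secret with the AWS CLI: the secret name `AthenaClickhouse` matches the placeholder in this README, the JSON layout with `username` and `password` keys is an assumption based on the paragraph above, and `foo`/`bar` are sample credentials.

```shell
# Stand-in for a shell in which this variable holds nothing useful.
AthenaClickhouse=""

# Single quotes: the placeholder is passed through unchanged.
single='clickhouse://jdbc:clickhouse:https://myclickhouseserver.xyzware.io:8443/default?${AthenaClickhouse}&sslmode=none'

# Double quotes: the shell substitutes the (empty) variable and the placeholder is lost.
double="clickhouse://jdbc:clickhouse:https://myclickhouseserver.xyzware.io:8443/default?${AthenaClickhouse}&sslmode=none"

echo "single: $single"
echo "double: $double"

# Creates the secret that the placeholder refers to (defined only, not invoked here).
create_connector_secret() {
  aws secretsmanager create-secret --region us-east-2 \
    --name AthenaClickhouse \
    --secret-string '{"username":"foo","password":"bar"}'
}
```

The `double` value is what the connector would receive if the single quotes were dropped from the `sam deploy` command.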

* LambdaFunctionName=athenaclickhouseconnectorfunction
* DisableSpillEncryption=true [optional]
* SecretNamePrefix=AthenaClickhouse [optional]
* LOG_LEVEL=info [optional]
* **DefaultConnectionString**=clickhouse://jdbc:clickhouse:https://myclickhouseserver.xyzware.io:8443/default?user=foo&password=bar&sslmode=none [direct]
* **DefaultConnectionString**=clickhouse://jdbc:clickhouse:https://myclickhouseserver.xyzware.io:8443/default?${AthenaClickhouse}&sslmode=none [indirect]
* **SpillBucket**=my-athena-demo
* **SpillPrefix**=athena-spill
* **SecurityGroupIds**=sg-ab9282d4
* **SubnetIds**=subnet-bc1f0ac6,subnet-db9f40b0
* **LambdaS3CodeUriBucket**=my-athena-demo
* **LambdaS3CodeUriKey**=code/athena-clickhouse-2024.19.1.jar

**Links**
* https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html
* https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source-lambda.html
* https://docs.aws.amazon.com/athena/latest/ug/connectors-mysql.html
* https://github.com/awslabs/aws-athena-query-federation/wiki
* https://github.com/awslabs/aws-athena-query-federation/wiki/Deploy-the-Athena-PostgreSQL-Connector-without-using-SAM
* https://db-engines.com/en/system/ClickHouse


19 changes: 17 additions & 2 deletions athena-clickhouse/athena-clickhouse.yaml
@@ -56,6 +56,12 @@ Parameters:
Description: "(Optional) An IAM policy ARN to use as the PermissionsBoundary for the created Lambda function's execution role"
Default: ''
Type: String
LambdaS3CodeUriBucket:
Description: This must be set to an S3 bucket because the JAR is larger than the local SAM 50 MB limit and must be referenced from the location used in the prior s3 cp deploy step.
Type: String
LambdaS3CodeUriKey:
Description: This must be set to an S3 folder/file (e.g. code/athena-clickhouse-2022.47.1.jar) because the JAR is larger than the local SAM 50 MB limit and must be referenced from the location used in the prior s3 cp deploy step.
Type: String
Conditions:
HasPermissionsBoundary: !Not [ !Equals [ !Ref PermissionsBoundaryARN, "" ] ]
NotHasLambdaRole: !Equals [!Ref LambdaRoleARN, ""]
@@ -71,7 +77,9 @@ Resources:
default: !Ref DefaultConnectionString
FunctionName: !Ref LambdaFunctionName
Handler: "com.amazonaws.athena.connectors.clickhouse.ClickHouseMuxCompositeHandler"
CodeUri: "./target/athena-clickhouse-2022.47.1.jar"
CodeUri:
Review comment (Contributor): We don't need to do that as SAR can handle this.

Review comment (chngpe, Contributor, Jun 18, 2024): Customer can actually invoke sam deploy which will take care of that part

Bucket: !Ref LambdaS3CodeUriBucket
Key: !Ref LambdaS3CodeUriKey
Description: "Enables Amazon Athena to communicate with ClickHouse using JDBC"
Runtime: java11
Timeout: !Ref LambdaTimeout
@@ -107,9 +115,16 @@ Resources:
Version: 2012-10-17
Statement:
- Action:
- secretsmanager:DescribeSecret
- secretsmanager:GetSecretValue
- secretsmanager:GetResourcePolicy
- secretsmanager:ListSecretVersionIds
Effect: Allow
Resource: !Sub 'arn:${AWS::Partition}:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:${SecretNamePrefix}*'
Resource: !Sub 'arn:${AWS::Partition}:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:*'
- Action:
- secretsmanager:ListSecrets
Effect: Allow
Resource: '*'
- Action:
- logs:CreateLogGroup
Effect: Allow