

Michael Hackett edited this page May 4, 2023 · 7 revisions

How To Build A Connector Or UDF

We've built a detailed tutorial that walks you through how to get, build, and deploy connectors and UDFs from source. The detailed tutorial can be found in the athena-example module of this repository. We recommend you view that module's README.md for the latest, most detailed documentation. Below is a summary of the steps involved.

*These steps assume you are using an OS capable of running a bash terminal (all commands were tested on Amazon Linux and CentOS).

Step 1: Download The SDK + Connectors

  1. At your terminal run git clone https://github.com/awslabs/aws-athena-query-federation.git to get a copy of the Amazon Athena Query Federation SDK, Connector Suite, and Example Connector.

Step 2: Install Development Tools (Prerequisites)

This step is optional if you already have a development environment with Apache Maven, the AWS CLI, the GitHub CLI, and the AWS SAM build tool for Serverless Applications.

If you are using Linux and have Docker installed, you can follow this guide to set up your development environment easily: https://github.com/awslabs/aws-athena-query-federation/wiki/Building-aws-athena-query-federation-using-container_env

Otherwise, if you are on a Mac, you can run the ./tools/prepare_dev_env.sh script in the root of the GitHub project you checked out. To ensure your terminal can see the newly installed tools, run source ~/.profile or open a fresh terminal. If you skip this step, you will get errors later about the AWS CLI or SAM build tool not being able to publish your connector.

Now run mvn clean install from the athena-federation-sdk directory within the GitHub project you checked out earlier.
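Taken together, the setup commands above can be sketched as the script below. It only prints the commands (rather than executing them) so the sequence can be reviewed first; the paths assume the repository layout checked out in Step 1.

```shell
# Sketch of the Step 2 setup sequence, printed for review rather than run.
PREP_CMD="./tools/prepare_dev_env.sh"   # macOS: installs Maven, AWS CLI, SAM
BUILD_CMD="mvn clean install"

echo "+ ${PREP_CMD}"
echo "+ source ~/.profile"              # pick up the newly installed tools
echo "+ cd athena-federation-sdk"
echo "+ ${BUILD_CMD}"                   # build the SDK before any connector
```

Remove the `echo "+ ...`  wrappers to execute the commands for real, starting from the root of the checked-out repository.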

Step 3: Write The Code

If you are following along with the athena-example, you'll need to complete the steps below. If you are building one of the ready-made connectors (CloudWatch, DynamoDB, etc.), you can skip this step and go to Step 4.

  1. (If using Cloud9) Navigate to the aws-athena-query-federation/athena-example folder on the left nav. This is the code you checked out back in Step 1.
  2. Complete the TODOs in ExampleMetadataHandler by uncommenting the provided example code and providing missing code where indicated.
  3. Complete the TODOs in ExampleRecordHandler by uncommenting the provided example code and providing missing code where indicated.
  4. Complete the TODOs in ExampleUserDefinedFuncHandler by uncommenting the provided example code and providing missing code where indicated.
  5. Create an S3 bucket (in the same region where you will deploy the connector) that the connector can use for spill and that can hold some sample data, using the following command: aws s3 mb s3://BUCKET_NAME. Be sure to substitute your actual bucket name, and pick something unlikely to already exist.
  6. Upload the sample data by running the following command from the aws-athena-query-federation/athena-example directory, replacing BUCKET_NAME with the name of the bucket you created earlier: aws s3 cp ./sample_data.csv s3://BUCKET_NAME/2017/11/1/sample_data.csv
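The two S3 commands above can be sketched as follows. BUCKET_NAME here is a hypothetical placeholder (S3 bucket names are global, so pick something unique), and the commands are printed for review rather than executed.

```shell
# Sketch of the Step 3 S3 setup; BUCKET_NAME is a hypothetical example.
BUCKET_NAME="my-athena-example-spill-123456"

MB_CMD="aws s3 mb s3://${BUCKET_NAME}"
CP_CMD="aws s3 cp ./sample_data.csv s3://${BUCKET_NAME}/2017/11/1/sample_data.csv"

# Print the commands for review; run them yourself once your AWS CLI is
# configured for the region you will deploy the connector in.
echo "+ ${MB_CMD}"
echo "+ ${CP_CMD}"
```

Note that the 2017/11/1 prefix matters: it provides the year/month/day partition values used by the example query later on.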

Step 4: Package and Deploy Your New Connector

For fast development, we can bypass the standard Serverless Application Repository setup by directly deploying our CloudFormation stack, which will create all our IAM policies/roles and the Lambda function on our behalf.

cd into the connector module and run sam deploy --template-file <template_file>.yaml -g. You can add the --profile flag if you want to use a specific profile in your ~/.aws/config. Follow the guided prompts, making sure to use a lowercase name for your catalog and lambda function when providing the parameter options. Once the deployment finishes, you should be able to see your stack in CloudFormation and your Lambda function should have been created.
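The deployment step can be sketched as below. TEMPLATE and PROFILE are hypothetical examples (each connector module ships its own .yaml template), and the command is printed for review rather than executed.

```shell
# Sketch of a Step 4 guided deployment; TEMPLATE and PROFILE are
# hypothetical example values.
TEMPLATE="athena-example.yaml"
PROFILE="default"

DEPLOY_CMD="sam deploy --template-file ${TEMPLATE} -g --profile ${PROFILE}"

# Printed for review; the -g flag makes `sam deploy` prompt interactively
# for the real parameter values (use lowercase catalog/function names).
echo "+ ${DEPLOY_CMD}"
```

The `--profile` flag is optional; omit it to use your default AWS CLI credentials.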

Step 5: Validate Your Connector

One of the most challenging aspects of integrating systems (in this case our connector and Athena) is testing how the two will work together. Lambda will capture logging from our connector in CloudWatch Logs, but we've also tried to provide some tools to streamline detecting and correcting common semantic and logical issues with your custom connector. By running Athena's connector validation tool, you can simulate how Athena will interact with your Lambda function and get access to diagnostic information that would normally only be available within Athena or require you to add extra diagnostics to your connector.

Run ../tools/validate_connector.sh --lambda-func <function_name> --schema schema1 --table table1 --constraints year=2017,month=11,day=1. Be sure to replace <function_name> with the name you gave to your function/catalog when you deployed it in Step 4.
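As a sketch, with the placeholder filled in (FUNC below is a hypothetical example of the lowercase name given to the Lambda function at deploy time):

```shell
# Sketch of the validation command; FUNC is a hypothetical example name.
FUNC="myexampleconnector"

VALIDATE_CMD="../tools/validate_connector.sh --lambda-func ${FUNC} --schema schema1 --table table1 --constraints year=2017,month=11,day=1"

# Printed for review; run it from within a connector module directory so
# the ../tools/ relative path resolves.
echo "+ ${VALIDATE_CMD}"
```

The constraints match the 2017/11/1 prefix the sample data was uploaded under, so the validator can exercise partition pruning.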

If everything worked as expected you should see the script generate useful debugging info and end with:

2019-11-07 20:25:08 <> INFO  ConnectorValidator:==================================================
2019-11-07 20:25:08 <> INFO  ConnectorValidator:Successfully Passed Validation!
2019-11-07 20:25:08 <> INFO  ConnectorValidator:==================================================

Step 6: Run a Query!

OK, now we are ready to try running some queries using our new connector. A good example to try is the following (be sure to substitute your actual database and table names):

USING 
EXTERNAL FUNCTION extract_tx_id(value ROW(id INT, completed boolean)) 
		RETURNS INT LAMBDA '<function_name>',
EXTERNAL FUNCTION decrypt(payload VARCHAR) 
		RETURNS VARCHAR LAMBDA '<function_name>'
SELECT year,
         month,
         day,
         account_id,
         decrypt(encrypted_payload) AS decrypted_payload,
         extract_tx_id(transaction) AS tx_id
FROM "lambda:<function_name>".schema1.table1
WHERE year=2017
        AND month=11
        AND day=1;

*Note that <function_name> corresponds to the name of your Lambda function.

Upgrading

If you've made code changes after deploying a connector or UDF, the steps to upgrade are slightly different. You have two options:

The SAR route

  1. Increment the semantic version in the connector's YAML file.
  2. Run the publish script (tools/publish.sh in the repository root) to republish the connector to the Serverless Application Repository.
  3. Navigate to the SAR console (you can click on the link that the script outputs).
  4. Click on Available Applications on the left-hand side.
  5. Click on the Private Applications tab.
  6. Check the box "Show apps that create custom IAM roles or resource policies".
  7. Click on your connector (e.g. AthenaDynamoDBConnector).
  8. Fill out the Application settings form (use the same SpillBucket and AthenaCatalogName) and click Deploy.
  9. The new version will replace the existing function and update the stack as needed, but you may have to manually refresh your table list in the Athena UI to see the changes. Reconnecting the connector is not needed.

The Lambda route

  1. Build the connector (and the SDK first if you haven't): mvn clean install
  2. Update the Lambda jar directly: aws lambda update-function-code --function-name <Lambda name> --zip-file fileb://target/athena-<connector>-1.0.jar (replace <Lambda name> with the name of the running Lambda and <connector> with the specific connector, e.g. fileb://target/athena-dynamodb-1.0.jar)
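The Lambda route can be sketched as the script below. CONNECTOR and LAMBDA_NAME are hypothetical examples (the jar version matches the step above), and the commands are printed for review rather than executed.

```shell
# Sketch of the "Lambda route" upgrade; CONNECTOR and LAMBDA_NAME are
# hypothetical example values.
CONNECTOR="dynamodb"
LAMBDA_NAME="myexampleconnector"

BUILD_CMD="mvn clean install"
UPDATE_CMD="aws lambda update-function-code --function-name ${LAMBDA_NAME} --zip-file fileb://target/athena-${CONNECTOR}-1.0.jar"

# Printed for review; run the build from the connector module directory,
# then run the update once the jar exists under target/.
echo "+ ${BUILD_CMD}"
echo "+ ${UPDATE_CMD}"
```

This path updates only the function code; unlike the SAR route, it will not pick up changes to the connector's CloudFormation template or IAM policies.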