Skip to content

yesidevelop/iceberg-athena-aws

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Apache Iceberg on AWS

Table of contents

What's included

The repo is to supplement the youtube video on Iceberg in AWS.

Set up

  1. Run the cloud formation template. This will create;
  • The S3 bucket
  • Glue database
  • Athena work group
  1. Upload the data from the data folder

Main Tutorial

  1. Upload the CSV data from ./data/csv

  2. Create the CSV table using athena

    CREATE EXTERNAL TABLE
      iceberg_tutorial_db.customer_csv
      (
            index_field bigint,
            customer_id string,
            first_name string,
            last_name string,
            company string,
            city string,
            country string,
            phone_1 string,
            phone_2 string,
            email string,
            subscription_date timestamp,
            website string
      )
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 's3://customer-data-iceberg/csv/';
    
  3. Create an Apache Iceberg table

    CREATE TABLE
      iceberg_tutorial_db.customer_iceberg 
      (
            index_field bigint,
            customer_id string,
            first_name string,
            last_name string,
            company string,
            city string,
            country string,
            phone_1 string,
            phone_2 string,
            email string,
            subscription_date timestamp,
            website string
      )
      PARTITIONED BY (day(subscription_date))
      LOCATION 's3://customer-data-iceberg/customer_iceberg/'
      TBLPROPERTIES ( 'table_type' ='ICEBERG'  );
    
  4. Run a query to look at the Day partition to see how Iceberg works

    SELECT * FROM customer_iceberg WHERE day(subscription_date) =  24 limit 20;
    
  5. Update some data to see the change take place

    UPDATE customer_iceberg SET company = 'Company2022' WHERE year(subscription_date) =2022;
    
  6. Select the updated data

    SELECT * FROM customer_iceberg WHERE year(subscription_date) =2022 limit 10;
    
  7. Time travel query

    New Query with Athena version 3

    SELECT * FROM customer_iceberg FOR TIMESTAMP AS OF (current_timestamp - interval '1' minute)
    WHERE year(subscription_date) =2022 limit 10;
    
  8. Delete from iceberg table

    DELETE FROM customer_iceberg WHERE year(subscription_date) != 2008; 
    

Creators

Johnny Chivers

Useful Links

Enjoy 🤘

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published