[Doc][4.0] Add doc for the coordinated commits writer feature #3261
Open: dhruvarya-db wants to merge 5 commits into delta-io:branch-4.0-preview1 from dhruvarya-db:coordinatedcommits-doc-4.0
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
--- | ||
description: Learn about Delta Coordinated Commits. | ||
--- | ||
|
||
# Delta Coordinated Commits | ||
|
||
.. warning:: This feature is available in preview in <Delta> 4.0-preview. Since this feature is still in preview, it can undergo major changes. Furthermore, a table with this preview feature enabled cannot be written to by future Delta releases until the feature is manually removed from the table. | ||
|
||
[Coordinated Commits](https://github.com/delta-io/delta/issues/2598) is a new commit protocol that makes the commit process more flexible and pluggable by delegating control of the commit to an external commit coordinator. Each coordinated commits table has a designated "commit coordinator", and all commits to the table must go through it. | ||
|
||
|
||
# DynamoDB Commit Coordinator | ||
|
||
<Delta> 4.0-preview also introduces a DynamoDB-backed Commit Coordinator implementation. | ||
|
||
## Quickstart Guide | ||
|
||
### 1. Create the DynamoDB table | ||
The DynamoDB Commit Coordinator requires a backend DynamoDB table to coordinate commits. One DynamoDB table can be used to manage multiple Delta tables. You have the choice of creating the DynamoDB table yourself (recommended) or having it created for you automatically. | ||
|
||
- Creating the DynamoDB table yourself | ||
|
||
This DynamoDB table will maintain commit metadata for multiple Delta tables, and it is important that it is configured with the [Read/Write Capacity Mode](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) (for example, on-demand or provisioned) that is right for your use cases. As such, we strongly recommend that you create your DynamoDB table yourself. The following example uses the AWS CLI. To learn more, see the [create-table](https://docs.aws.amazon.com/cli/latest/reference/dynamodb/create-table.html) command reference. | ||
|
||
```bash | ||
aws dynamodb create-table \ | ||
--region us-west-2 \ | ||
--table-name dynamodb_delta_commit_coordinator \ | ||
--attribute-definitions AttributeName=tableId,AttributeType=S \ | ||
--key-schema AttributeName=tableId,KeyType=HASH \ | ||
--billing-mode PAY_PER_REQUEST | ||
``` | ||
|
||
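Before pointing Delta at the new table, it can help to confirm that it exists and has the expected key schema. The following is a minimal sketch using the AWS CLI `wait table-exists` and `describe-table` subcommands; the function name `verify_ddb_table` is a hypothetical helper, not part of Delta or the AWS CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: block until the DynamoDB table exists, then print
# its status and key schema so the tableId hash key can be checked.
verify_ddb_table() {
  local table_name="$1" region="$2"
  aws dynamodb wait table-exists \
    --table-name "$table_name" --region "$region" || return 1
  aws dynamodb describe-table \
    --table-name "$table_name" --region "$region" \
    --query 'Table.{Status: TableStatus, Keys: KeySchema}' --output json
}

# Example call (requires AWS credentials with DynamoDB access):
# verify_ddb_table dynamodb_delta_commit_coordinator us-west-2
```
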
- Automatic DynamoDB table creation | ||
|
||
When you create a DynamoDB coordinated table, the commit coordinator will try to create a backing DynamoDB table if it does not exist. This default table supports 5 strongly consistent reads and 5 writes per second. You may change these default values using the table-creation-only configuration keys `spark.databricks.delta.coordinatedCommits.commitCoordinator.dynamodb.writeCapacityUnits` and `spark.databricks.delta.coordinatedCommits.commitCoordinator.dynamodb.readCapacityUnits`. | ||
|
||
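As a sketch of how the two capacity keys above might be supplied, the snippet below assembles them as `--conf` flags in a bash array. The value `10` is an arbitrary assumption for illustration, and the variable names are hypothetical; the script only prints the flags so they can be reviewed.

```bash
#!/usr/bin/env bash
# Capacity for the auto-created DynamoDB table (10 is an arbitrary example value).
read_units=10
write_units=10

conf_prefix="spark.databricks.delta.coordinatedCommits.commitCoordinator.dynamodb"

# Flags intended to be appended to a spark-shell or spark-submit command line.
capacity_flags=(
  --conf "${conf_prefix}.readCapacityUnits=${read_units}"
  --conf "${conf_prefix}.writeCapacityUnits=${write_units}"
)

# Print one flag or value per line for review.
printf '%s\n' "${capacity_flags[@]}"
```

Because these keys only apply at table-creation time, they would need to be set in the session that creates the first coordinated table against a not-yet-existing DynamoDB table.
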
### 2. Launch a Spark shell with the right packages: | ||
|
||
```bash | ||
bin/spark-shell \ | ||
--packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0,com.amazonaws:aws-java-sdk-bundle:1.12.262 \ | ||
--repositories https://oss.sonatype.org/content/repositories/iodelta-1149 \ | ||
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \ | ||
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ | ||
--conf spark.databricks.delta.coordinatedCommits.commitCoordinator.ddb.awsCredentialsProviderName=<credentialsProviderName> | ||
``` | ||
|
||
`<credentialsProviderName>` must be the fully qualified class name of the [AWS Credentials Provider](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/AWSCredentialsProvider.html) (e.g. `com.amazonaws.auth.profile.ProfileCredentialsProvider`) that is used to authenticate with the desired DynamoDB instance. The exact delta-spark package name and repository can vary depending on the release that you are trying to use. | ||
|
||
### 3. Create a table with DynamoDB as the commit coordinator by running the following command: | ||
|
||
```sql | ||
CREATE TABLE <table_name> (id string) USING DELTA | ||
TBLPROPERTIES ('delta.coordinatedCommits.commitCoordinator-preview' = 'dynamodb', 'delta.coordinatedCommits.commitCoordinatorConf-preview' = '{\"dynamoDBTableName\": \"<dynamodb_table_name>\",\"dynamoDBEndpoint\": \"<dynamodb_region_endpoint>\"}'); | ||
``` | ||
|
||
Note that `delta.coordinatedCommits.commitCoordinatorConf-preview` is a serialized JSON object with two top-level properties: | ||
1. `dynamoDBTableName`: The name of the DynamoDB table that the commit coordinator client uses to store information about the Delta tables it manages. This is the same table that was created in step 1. | ||
2. `dynamoDBEndpoint`: The fully qualified URL endpoint (e.g. `https://dynamodb.us-west-2.amazonaws.com`) of the DynamoDB instance. The full list of endpoints can be found [here](https://docs.aws.amazon.com/general/latest/gr/ddb.html). | ||
|
||
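Hand-escaping the JSON inside a SQL string literal is easy to get wrong. As a minimal sketch, the statement from this step could be assembled in the shell instead. The Delta table name `my_table`, the variable names, and the endpoint are placeholder assumptions, not values required by Delta.

```bash
#!/usr/bin/env bash
# Placeholder values (assumptions for illustration only).
delta_table="my_table"
ddb_table="dynamodb_delta_commit_coordinator"
ddb_endpoint="https://dynamodb.us-west-2.amazonaws.com"

# Serialize the two top-level properties into one JSON object.
conf_json=$(printf '{"dynamoDBTableName": "%s", "dynamoDBEndpoint": "%s"}' \
  "$ddb_table" "$ddb_endpoint")

# Escape each double quote as \" to match the SQL example above.
escaped_conf=${conf_json//\"/\\\"}

create_sql="CREATE TABLE ${delta_table} (id string) USING DELTA TBLPROPERTIES ('delta.coordinatedCommits.commitCoordinator-preview' = 'dynamodb', 'delta.coordinatedCommits.commitCoordinatorConf-preview' = '${escaped_conf}');"

# Print the statement so it can be pasted into a SQL session.
echo "$create_sql"
```
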
Any future commit to this table will now be coordinated by the DynamoDB Commit Coordinator. You can also convert an existing table that does not use coordinated commits by running: | ||
|
||
```sql | ||
ALTER TABLE <table_name> | ||
SET TBLPROPERTIES ('delta.coordinatedCommits.commitCoordinator-preview' = 'dynamodb', 'delta.coordinatedCommits.commitCoordinatorConf-preview' = '{\"dynamoDBTableName\": \"<dynamodb_table_name>\",\"dynamoDBEndpoint\": \"<dynamodb_region_endpoint>\"}'); | ||
``` | ||
|
||
.. warning:: The commit that converts a table to a coordinated commits table goes through the configured `LogStore` directly. This means the multi-cluster write restrictions imposed by the configured `LogStore` implementation still apply. To avoid corruption on filesystems where concurrent commits are not safe, make sure that no concurrent commits are performed while the conversion to coordinated commits is in progress. | ||
|
||
.. note:: Instead of specifying the table properties for each table creation, you can set them as default table properties to be used for every new table via Spark configurations. To do this, set the Spark properties `spark.databricks.delta.properties.defaults.coordinatedCommits.commitCoordinator-preview` and `spark.databricks.delta.properties.defaults.coordinatedCommits.commitCoordinatorConf-preview`. | ||
|
||
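A sketch of how the two default properties from the note above might be passed at launch time. It assumes the JSON value can be given unescaped as long as the shell keeps it as a single argument (the array quoting below does that); the variable names and placeholder values are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Placeholder values (assumptions for illustration only).
ddb_table="dynamodb_delta_commit_coordinator"
ddb_endpoint="https://dynamodb.us-west-2.amazonaws.com"
defaults_prefix="spark.databricks.delta.properties.defaults.coordinatedCommits"

# Same JSON shape as the commitCoordinatorConf-preview table property.
conf_json=$(printf '{"dynamoDBTableName": "%s", "dynamoDBEndpoint": "%s"}' \
  "$ddb_table" "$ddb_endpoint")

# Each array element stays one argument, so spaces and quotes in the JSON survive.
default_flags=(
  --conf "${defaults_prefix}.commitCoordinator-preview=dynamodb"
  --conf "${defaults_prefix}.commitCoordinatorConf-preview=${conf_json}"
)

# Print one flag or value per line for review.
printf '%s\n' "${default_flags[@]}"
```

The array can then be expanded as `"${default_flags[@]}"` at the end of the `spark-shell` command from step 2.
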
|
||
## Removing the Coordinated Commits Feature | ||
|
||
The feature can be removed from a Delta table by using the `DROP FEATURE` command: | ||
|
||
```sql | ||
ALTER TABLE <table_name> DROP FEATURE 'coordinatedCommits-preview' [TRUNCATE HISTORY] | ||
``` | ||
|
||
.. include:: /shared/replacements.md | ||
|
||
## Compatibility | ||
|
||
Coordinated Commits is a writer table feature, so only clients that recognize the feature can write to these tables. | ||
Older clients that do not understand this table feature can still read a coordinated commits table. However, the read may return stale results depending on the table's [commit coordinator backfill policy](https://github.com/delta-io/delta/blob/branch-4.0-preview1/protocol_rfcs/coordinated-commits.md#commit-backfills). Note that the DynamoDB Commit Coordinator tries to backfill all commits immediately. | ||
|
||
|
||
## Dependencies | ||
|
||
The Coordinated Commits feature depends on two other features to function correctly: [In Commit Timestamps](https://github.com/delta-io/delta/issues/2532) and [Vacuum Protocol Check](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). These features will be enabled automatically (if not already enabled) when Coordinated Commits is activated. |
Should the file name be "delta-coordinated-commits.md" rather than "delta-coordinated-commit.md"?