
TODO

  • Complete the intro
  • Add circle ci badge
  • Update Linking section
  • Add time travel code snippets
  • Verify all the links everywhere

Delta Lake Core

Delta Lake Core is an open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

See the Delta Lake Documentation for more details.

Usage Guide

Linking

To use Delta Lake, link your project against it using Delta Lake's Maven coordinates:

groupId: com.databricks.delta
artifactId: delta-core
version: 0.1.0
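
For example, if you build with SBT (as this repository itself does), the dependency can be declared roughly as follows. This is a sketch based on the coordinates above; whether the artifact is cross-published per Scala version is an assumption to verify (if it is, use %% instead of %).

// build.sbt -- a sketch using the Maven coordinates listed above
libraryDependencies += "com.databricks.delta" % "delta-core" % "0.1.0"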

Reading from and Writing to Delta Lake Tables using Apache Spark

You can read from and write to Delta Lake tables using the standard Apache Spark DataFrame APIs.

Scala API

import org.apache.spark.sql.SparkSession

val spark: SparkSession = ...  // create SparkSession

// Writing to a Delta Lake table from a batch job
dataframe.write
  .format("delta")
  .mode("overwrite")
  .save("pathToDeltaTable")

// Reading from a Delta Lake table in a batch job 
val dataframe = spark.read
  .format("delta")
  .load("pathToDeltaTable")

// Writing to a Delta Lake table from a streaming job
streamingDataFrame.writeStream
  .format("delta")
  .start("pathToDeltaTable")
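  // Note: a streaming write typically also requires a checkpoint location,
  // e.g. .option("checkpointLocation", "pathToCheckpointDir") before .start(...);
  // "pathToCheckpointDir" is just a placeholder here.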
   
// Reading from a Delta Lake table in a streaming job
val streamingDataFrame = spark.readStream
  .format("delta")
  .load("pathToDeltaTable")

Java API

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

SparkSession spark = ...   // create SparkSession

// Writing to a Delta Lake table from a batch job
dataframe.write()
  .format("delta")
  .mode("overwrite")
  .save("pathToDeltaTable");

// Reading from a Delta Lake table in a batch job 
Dataset<Row> dataframe = spark.read()
  .format("delta")
  .load("pathToDeltaTable");

// Writing to a Delta Lake table from a streaming job
streamingDataFrame.writeStream()
  .format("delta")
  .start("pathToDeltaTable");
   
// Reading from a Delta Lake table in a streaming job
Dataset<Row> streamingDataFrame = spark.readStream()
  .format("delta")
  .load("pathToDeltaTable");

Python API

from pyspark.sql import SparkSession

spark = ... # create SparkSession

# Writing to a Delta Lake table from a batch job
dataframe.write \
  .format("delta") \
  .mode("overwrite") \
  .save("pathToDeltaTable")

# Reading from a Delta Lake table in a batch job 
dataframe = spark.read \
  .format("delta") \
  .load("pathToDeltaTable")

# Writing to a Delta Lake table from a streaming job
streamingDataFrame.writeStream  \
  .format("delta") \
  .start("pathToDeltaTable")
   
# Reading from a Delta Lake table in a streaming job
streamingDataFrame = spark.readStream \
  .format("delta") \
  .load("pathToDeltaTable")

Compatibility

This section describes the compatibility guarantees provided by the current version of Delta Lake.

Compatibility with Spark Versions

The current version of Delta Lake depends on a SNAPSHOT build of Apache Spark 2.4 (nightly snapshot after XXXX). This is because Delta Lake requires the changes made by SPARK-27453 for table partitioning to work, and these changes are not yet available in an official Apache release. A new release of Delta Lake that depends on an official Spark version will be published as soon as there is an Apache Spark release.
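
Until such a release exists, building an application against the snapshot means pulling Spark from a snapshot repository. A minimal SBT sketch, assuming the Apache snapshots repository and a hypothetical 2.4 snapshot version (replace it with the actual nightly build you need):

// build.sbt -- sketch only; the snapshot version below is a placeholder
resolvers += "Apache Snapshots" at "https://repository.apache.org/content/repositories/snapshots/"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1-SNAPSHOT"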

Compatibility with storage systems

Delta Lake stores the transaction log of a table in the same storage system as the table. Hence, Delta Lake's ACID guarantees are predicated on the atomicity and durability guarantees of the storage system. Specifically, it requires the storage system to provide the following:

  1. Atomic visibility of files: There must be a way for a file to be visible in its entirety or not visible at all.
  2. Consistent listing: Once a file has been written in a directory, all future listings for that directory must return that file.

Open-source Delta Lake currently provides these guarantees only on HDFS. It is possible to make it work with other storage systems by plugging in custom implementations of the LogStore API. However, Managed Delta Lake supports AWS S3 and Azure Blob Storage.
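
As a rough illustration of that plug-in point, a custom LogStore implementation could be selected through Spark configuration when creating the SparkSession. This is only a sketch: the configuration key spark.delta.logStore.class and the implementation class name are assumptions; check the LogStore API source for the exact key and interface contract.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-custom-logstore")
  // com.example.MyLogStore is a placeholder for a class implementing the LogStore API
  .config("spark.delta.logStore.class", "com.example.MyLogStore")
  .getOrCreate()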

API Compatibility

Delta Lake guarantees compatibility with the Apache Spark DataFrameReader/Writer APIs, that is, spark.read, df.write, spark.readStream, and df.writeStream. The interfaces of the implementation are currently considered internal and may change across minor versions.

Data Compatibility

Delta Lake guarantees backward compatibility for all Delta Lake tables, that is, newer versions of Delta Lake will always be able to read tables written by older versions of Delta Lake. However, we reserve the right to break forward compatibility, that is, older versions of Delta Lake may not be able to read tables written by newer versions of Delta Lake, because we may add features that are usable only in newer versions. Users attempting to read tables written by a newer version from an older version of Delta Lake should get a clear error message when forward compatibility is broken.

Building

Delta Lake Core is compiled using SBT.

To compile, run

bin/sbt compile

To generate artifacts, run

bin/sbt package

To execute tests, run

bin/sbt test
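
To run a single test suite, SBT's testOnly task can be used (the suite name below is only a placeholder), for example

bin/sbt "testOnly *SomeSuite"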

Refer to SBT docs for more commands.

Reporting issues

We use GitHub Issues to track community-reported issues. You can also contact the community to get answers.

Contributing

We happily welcome contributions to Delta Lake. We use GitHub Pull Requests to accept changes.

Community

There are two channels of communication within the Delta Lake community.
