Overview
Usage
TODO
MVCC in entangleDB
SQL Query Execution in entangleDB
entangleDB Raft Consensus Engine
What I am trying to build
Proposed Architecture
SQL Engine
Raft Engine
Storage Engine
entangleDB Peers
Example SQL Queries that you will be able to execute in entangleDB
Learning Resources I've been using for building the database

Overview

I'm working on creating entangleDB, a project that's all about really getting to know how databases work from the inside out. My aim is to deeply understand everything about databases, from the big picture down to the small details. It's a way for me to build a strong foundation in database.

The name "entangleDB" is special because it's in honor of a friend who loves databases just as much as I do.

The plan is to write the database in Rust. My main goal is to create something that's not only useful for me to learn from but also helpful for others who are interested in diving deep into how databases work. I'm hoping to make it postgresSQL compatible.

Usage

Pre-requisite is to have the Rust compiler; follow this doc to install the Rust compiler

entangledb cluster can be started on localhost ports 3201 to 3205:

(cd husky/cloud && ./build.sh)

Client can be used to connect with the node on localhost port 3205:

cargo run --release --bin entanglesql

Connected to EntangleDB node "5". Enter !help for instructions.
entangledb> SELECT * FROM dishes;
poha
breads
korma

TODO

Make the isolation level configurable; currently, it is set to repeatable read (snapshot).
Implement partitions, both hash and range types.
Utilize generics throughout in Rust, thereby eliminating the need for std::fmt::Display + Send + Sync.
Consider the use of runtime assertions instead of employing Error::Internal ubiquitously.
Revisit the implementation of time-travel queries

MVCC in entangleDB

SQL Query Execution in entangleDB

entangleDB Raft Consensus Engine

What I am trying to build

1. Distributed Consensus Engine

The design for entangleDB centers around a custom-built consensus engine, intended for high availability in distributed settings. This engine will be crucial in maintaining consistent and reliable state management across various nodes.

A key focus will be on linearizable state machine replication, an essential feature for ensuring data consistency across all nodes, especially for applications that require strong consistency.

2. Transaction Engine

The proposed transaction engine for entangleDB is committed to adhering to ACID properties, ensuring reliability and integrity in every transaction.

The plan includes the implementation of Snapshot Isolation and Serializable Isolation, with the aim of optimizing transaction handling for enhanced concurrency and data integrity.

3. Storage Engine

The planned storage engine for entangleDB will explore a variety of storage formats to find and utilize the most efficient methods for data storage and retrieval.

The storage layer is being designed for flexibility, to support a range of backend technologies and meet diverse storage requirements.

4. Query Engine

The development of the query engine will focus on rapid and effective query processing, utilizing advanced optimization algorithms.

A distinctive feature of entangleDB will be its ability to handle time-travel queries, allowing users to access and analyze data from different historical states.

5. SQL Interface and PostgreSQL Compatibility

The SQL interface for entangleDB is intended to support a wide array of SQL functionalities, including complex queries, joins, aggregates, and window functions.

Compatibility with PostgreSQL’s wire protocol is a goal, to facilitate smooth integration with existing PostgreSQL setups and offer a solid alternative for database system upgrades or migrations.

Proposed Architecture

SQL Engine

The SQL Engine is responsible for the intake and processing of SQL queries. It consists of:

SQL Session: The processing pipeline within a session includes:
- Parser: Interprets SQL queries and converts them into a machine-understandable format.
- Planner: Devises an execution plan based on the parsed input.
- Executor: Carries out the plan, accessing and modifying the database.

Adjacent to the session is the:

SQL Storage Raft Backend: This component integrates with the Raft consensus protocol to ensure distributed transactions are consistent and resilient.

Raft Engine

The Raft Engine is crucial for maintaining a consistent state across the distributed system:

Raft Node: This consensus node confirms that all database transactions are in sync across the network.
Raft Log: A record of all transactions agreed upon by the Raft consensus algorithm, which is crucial for data integrity and fault tolerance.

Storage Engine

The Storage Engine is where the actual data is stored and managed:

State Machine Driver: Comprising of:
- State Machine Interface: An intermediary that conveys state changes from the Raft log to the storage layer.
- Key Value Backend: The primary storage layer, consisting of:
  - Bitcask Engine: A simple, fast on-disk storage system for key-value data.
  - MVCC Storage: Handles multiple versions of data for read-write concurrency control.

entangleDB Peers

interaction between multiple database instances or "peers".

Example SQL Queries that you will be able to execute in entangleDB

-- Transaction example with a table creation, data insertion, and selection
BEGIN;

CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR, department VARCHAR);
INSERT INTO employees VALUES (1, 'Alice', 'Engineering'), (2, 'Bob', 'HR');
SELECT * FROM employees;

COMMIT;

-- Aggregation query with JOIN
SELECT department, AVG(salary) FROM employees JOIN salaries ON employees.id = salaries.emp_id GROUP BY department;

-- Time-travel query
SELECT * FROM employees AS OF SYSTEM TIME '-5m';

Learning Resources I've been using for building the database

For a comprehensive list of resources that have been learning what to build in a distributed database, check out the Learning Resources page.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
config		config
husky/cloud		husky/cloud
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
learning_resources.md		learning_resources.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Table of Contents

Overview

Usage

TODO

MVCC in entangleDB

SQL Query Execution in entangleDB

entangleDB Raft Consensus Engine

What I am trying to build

1. Distributed Consensus Engine

2. Transaction Engine

3. Storage Engine

4. Query Engine

5. SQL Interface and PostgreSQL Compatibility

Proposed Architecture

SQL Engine

Raft Engine

Storage Engine

entangleDB Peers

Example SQL Queries that you will be able to execute in entangleDB

Learning Resources I've been using for building the database

About

Releases 1

Packages

Contributors 2

Languages

TypicalDefender/entangleDB

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

Overview

Usage

TODO

MVCC in entangleDB

SQL Query Execution in entangleDB

entangleDB Raft Consensus Engine

What I am trying to build

1. Distributed Consensus Engine

2. Transaction Engine

3. Storage Engine

4. Query Engine

5. SQL Interface and PostgreSQL Compatibility

Proposed Architecture

SQL Engine

Raft Engine

Storage Engine

entangleDB Peers

Example SQL Queries that you will be able to execute in entangleDB

Learning Resources I've been using for building the database

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages