This repository has been archived by the owner on May 14, 2024. It is now read-only.

Storage engines #1

Open
pkieltyka opened this issue Oct 12, 2016 · 13 comments

Comments

@pkieltyka

Awesome project! Any thought to having a pluggable storage engine, where the default is memory? Having a BoltDB option would be very nice.

@tidwall
Owner

tidwall commented Oct 12, 2016

It's certainly possible. BoltDB would be a great alternative.

Right now SummitDB uses BuntDB as its database library. It's similar to Bolt but has secondary indexing and spatial indexing built in. Unfortunately, Bolt does not have indexing at this time, so there would need to be some extra work to create indexes using buckets. I'm assuming that R-tree support in Bolt is neither available nor on its roadmap, but Summit could reimplement an R-tree structure specifically for Bolt.

Perhaps one option is to use BoltDB for the key space and BuntDB for the indexing.

Bolt and indexes
R-tree library
BuntDB
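
One way to picture "creating indexes using buckets" is to keep a second ordered bucket whose keys are `value + "\x00" + primaryKey`, so that iterating it in key order yields primary keys sorted by value. The sketch below is only an illustration under that assumption: the `bucket` and `store` types are hypothetical stand-ins written with the standard library, not the Bolt or BuntDB API, and values are assumed not to contain a zero byte.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// bucket is a hypothetical stand-in for an ordered key/value bucket
// whose keys iterate in byte order. It is NOT the Bolt API.
type bucket struct {
	keys []string          // kept sorted
	vals map[string]string // key -> value
}

func newBucket() *bucket { return &bucket{vals: map[string]string{}} }

// put inserts or overwrites a key, keeping keys sorted.
func (b *bucket) put(k, v string) {
	if _, ok := b.vals[k]; !ok {
		i := sort.SearchStrings(b.keys, k)
		b.keys = append(b.keys, "")
		copy(b.keys[i+1:], b.keys[i:])
		b.keys[i] = k
	}
	b.vals[k] = v
}

// del removes a key if it is present.
func (b *bucket) del(k string) {
	if _, ok := b.vals[k]; !ok {
		return
	}
	i := sort.SearchStrings(b.keys, k)
	b.keys = append(b.keys[:i], b.keys[i+1:]...)
	delete(b.vals, k)
}

// store pairs a primary bucket with one secondary-index bucket.
type store struct {
	data  *bucket // primaryKey -> value
	index *bucket // value + "\x00" + primaryKey -> ""
}

// set writes the value and keeps the index bucket in sync.
func (s *store) set(key, val string) {
	if old, ok := s.data.vals[key]; ok {
		s.index.del(old + "\x00" + key) // drop the stale index entry
	}
	s.data.put(key, val)
	s.index.put(val+"\x00"+key, "")
}

// keysByValue walks the index bucket and returns primary keys ordered by value.
func (s *store) keysByValue() []string {
	out := make([]string, 0, len(s.index.keys))
	for _, ik := range s.index.keys {
		out = append(out, ik[strings.IndexByte(ik, 0)+1:])
	}
	return out
}

func demo() []string {
	s := &store{data: newBucket(), index: newBucket()}
	s.set("user:1", "tom")
	s.set("user:2", "alice")
	s.set("user:3", "bob")
	s.set("user:1", "zed") // update: the "tom" index entry is removed
	return s.keysByValue()
}

func main() {
	fmt.Println(demo()) // [user:2 user:3 user:1]
}
```

In a real disk-backed engine both writes would presumably need to happen in the same transaction so the index never drifts from the data.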

@pkieltyka
Author

@tidwall also, it's confusing that the description says it's an in-memory NoSQL database, yet the default is to have data persistence. I am happy that it has data persistence, but the description threw me off at first read.

@tidwall
Owner

tidwall commented Oct 12, 2016

Thanks for the feedback. I see how that might be confusing. Perhaps I should rephrase that to "In-memory database with disk persistence" or something along that line.

@pkieltyka
Author

@tidwall what does the in-memory part mean here? That the working set is in memory, or that the entire DB is in memory?

@tidwall
Owner

tidwall commented Oct 12, 2016

The database is entirely in memory; the in-memory data is the working dataset. Each write command is appended to a file, and that file is used to rebuild the database when it is restarted.

This is similar to Redis AOF persistence.
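
As a rough illustration of that append-and-replay idea, here is a minimal sketch. It assumes a made-up line format (`SET key value` and `DEL key`) and a hypothetical `DB` type; it is not SummitDB's or Redis's actual file format, and a `bytes.Buffer` stands in for the file on disk.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// DB is a hypothetical in-memory store that logs every write command.
type DB struct {
	data map[string]string
	aof  io.Writer // append-only log (a file in practice)
}

// Open rebuilds the in-memory state by replaying an existing log,
// then directs new writes to aof.
func Open(existing io.Reader, aof io.Writer) (*DB, error) {
	db := &DB{data: map[string]string{}, aof: aof}
	sc := bufio.NewScanner(existing)
	for sc.Scan() {
		db.apply(sc.Text())
	}
	return db, sc.Err()
}

// apply mutates memory only; unknown or malformed commands are ignored.
func (db *DB) apply(cmd string) {
	parts := strings.SplitN(cmd, " ", 3)
	switch {
	case len(parts) == 3 && parts[0] == "SET":
		db.data[parts[1]] = parts[2]
	case len(parts) == 2 && parts[0] == "DEL":
		delete(db.data, parts[1])
	}
}

// Exec appends the command to the log first, then applies it in memory.
func (db *DB) Exec(cmd string) error {
	if _, err := fmt.Fprintln(db.aof, cmd); err != nil {
		return err
	}
	db.apply(cmd)
	return nil
}

// demo writes three commands, then "restarts" by replaying the log.
func demo() map[string]string {
	var file bytes.Buffer // stands in for the on-disk log
	db, err := Open(strings.NewReader(""), &file)
	if err != nil {
		panic(err)
	}
	for _, cmd := range []string{"SET a 1", "SET b 2", "DEL a"} {
		if err := db.Exec(cmd); err != nil {
			panic(err)
		}
	}
	restored, err := Open(bytes.NewReader(file.Bytes()), io.Discard)
	if err != nil {
		panic(err)
	}
	return restored.data
}

func main() {
	fmt.Println(demo()) // map[b:2]
}
```

A real implementation would also have to decide when to sync the file to disk and how to compact the log as it grows; both are left out here.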

@pkieltyka
Author

pkieltyka commented Oct 12, 2016

@tidwall I see. If summitdb requires that the entire data set fit in memory, and the data in memory is the core working set, then I agree it is an in-memory database with disk persistence options. But then I'd wonder why someone would choose this over Redis. Just because of Raft clustering with strong consistency? I personally feel a gap in database products is something like Redis (compatible) that supports data persistence engines for data sets that can grow to 100GB+. Perhaps that's a different product, like ledisdb or ssdb. The Raft angle is pretty cool, of course.

@tidwall
Owner

tidwall commented Oct 12, 2016

Under the hood SummitDB is quite a bit different from Redis. SummitDB is more suited as a NoSQL data store. In a way it's more like MongoDB.

I just open sourced the project yesterday, so it's too early to say if anyone will use it over Redis (that's not really my goal). The best I can tell you is why I wrote it and why I'm going to use it:

  • Raft. Strong consistency is something lacking from the Redis Master/Follower model right now. If Redis had this today, I probably would not have created summitdb.
  • Ordered key space. Getting a single key in Redis is super fast, but iteration, paging, and sorting on many keys can be somewhat more challenging and sometimes require multiple steps.
  • Secondary indexing. There are ways to kinda do indexing in Redis, but it requires using combinations of sorted sets mixed with other data types.
  • Spatial indexing. A built in R-tree structure can be super versatile and allow for multi-dimensional data like geospatial and statistics. This is not available in Redis.
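
To make the "ordered key space" point concrete: when keys are kept in sorted order, paging reduces to a binary search for the cursor followed by a slice. The sketch below assumes a plain sorted slice and a hypothetical `page` helper; it says nothing about how SummitDB or BuntDB store keys internally.

```go
package main

import (
	"fmt"
	"sort"
)

// page returns up to limit keys strictly greater than cursor, plus the
// cursor to pass for the following page. sorted must be in ascending order
// and limit must be non-negative.
func page(sorted []string, cursor string, limit int) (keys []string, next string) {
	// First position whose key sorts after the cursor.
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i] > cursor })
	j := i + limit
	if j > len(sorted) {
		j = len(sorted)
	}
	keys = sorted[i:j]
	next = cursor // unchanged when there is nothing left
	if len(keys) > 0 {
		next = keys[len(keys)-1]
	}
	return keys, next
}

// demo walks the whole key space two keys at a time.
func demo() [][]string {
	keys := []string{"user:3", "user:1", "user:5", "user:2", "user:4"}
	sort.Strings(keys)

	var pages [][]string
	cursor := ""
	for {
		p, next := page(keys, cursor, 2)
		if len(p) == 0 {
			break
		}
		pages = append(pages, p)
		cursor = next
	}
	return pages
}

func main() {
	fmt.Println(demo()) // [[user:1 user:2] [user:3 user:4] [user:5]]
}
```

Because the cursor is a key rather than an offset, a page stays stable even if earlier keys are inserted or deleted between requests.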

I've been using Redis for years as a general purpose data store. Sometimes in combination with MongoDB (and recently Tile38 for geospatial). My desire is to merge what I see as common overlaps into a single platform.

I know that being in-memory will not suit everyone and I'm OK with that. I'm hopeful that those who use Redis as a primary data store might find value in a tool like SummitDB.

All that being said, I do like the idea of trying BoltDB as a disk-based storage option in the future.

@railsmechanic

I have to agree with @pkieltyka. I'm searching for a solution like Redis but with disk-based storage, one that supports data sets larger than the available RAM and supports clustering without configuring complicated third-party proxies etc. So I absolutely like the idea of an additional disk-based storage engine for summitdb. This IMO fills the gap between Redis, which is memory only, and a full-fledged document database.

@tidwall many thanks for your work

@tidwall
Owner

tidwall commented Oct 15, 2016

@railsmechanic Good feedback. While disk-based storage is not what I personally desire for my applications, there is a clear interest in the community that can't be ignored. Perhaps it may be as easy as dropping in BoltDB or perhaps the current BuntDB implementation can be modified to support offloading to disk. I plan on researching this topic further.

@pedigree

I could see myself using it in tandem with Redis. If it supported replication then I could retire the fork I have of Redis, redis-interval-sets, and look at using the spatial index for IP-to-block mapping. I like it and will be playing with it this weekend to do IP-to-subnet and IP-to-ASN mapping with R-trees.
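
The IP-to-block idea can be read as a one-dimensional interval problem: each subnet becomes an inclusive range of integers, and a lookup asks which ranges contain a point. The sketch below is an assumption about how that might look, not the commenter's code. It is IPv4 only, uses a linear scan where an R-tree or interval index would prune the search, and the labels are placeholders.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// block is one subnet expressed as an inclusive integer interval.
type block struct {
	lo, hi uint32
	label  string // placeholder label, e.g. an ASN
}

// toU32 converts an IPv4 address to its big-endian integer value.
func toU32(a netip.Addr) uint32 {
	b := a.As4() // panics for non-IPv4 input; acceptable for a sketch
	return binary.BigEndian.Uint32(b[:])
}

// newBlock turns a CIDR string into its [first, last] address interval.
func newBlock(cidr, label string) block {
	p := netip.MustParsePrefix(cidr).Masked()
	lo := toU32(p.Addr())
	span := uint32(1)<<(32-p.Bits()) - 1 // number of addresses minus one
	return block{lo: lo, hi: lo + span, label: label}
}

// lookup returns the label of the narrowest block containing ip.
// A spatial index would replace this linear scan with a point query.
func lookup(blocks []block, ip string) (string, bool) {
	v := toU32(netip.MustParseAddr(ip))
	var best block
	found := false
	for _, b := range blocks {
		if v < b.lo || v > b.hi {
			continue
		}
		if !found || b.hi-b.lo < best.hi-best.lo {
			best, found = b, true
		}
	}
	return best.label, found
}

func sampleBlocks() []block {
	return []block{
		newBlock("10.0.0.0/8", "AS64500"),
		newBlock("10.1.0.0/16", "AS64501"),
		newBlock("192.0.2.0/24", "AS64502"),
	}
}

func main() {
	blocks := sampleBlocks()
	for _, ip := range []string{"10.1.2.3", "10.2.0.1", "198.51.100.7"} {
		label, ok := lookup(blocks, ip)
		fmt.Println(ip, label, ok)
	}
}
```

Nested blocks are resolved by picking the narrowest interval, which corresponds to the most specific prefix.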

@tidwall
Owner

tidwall commented Oct 16, 2016

Hi @pedigree. SummitDB currently supports State Machine Replication using the Raft Consensus Algorithm instead of Redis-style replication. I hope summitdb helps with your solution and thanks for your interest in the project.
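
For readers unfamiliar with the term, state machine replication means every node applies the same ordered log of commands to a deterministic state machine, so all nodes converge on the same state. The sketch below shows only that core idea plus a majority check before committing. It is a deliberately simplified illustration with hypothetical types, not Raft itself (no terms, elections, or log repair) and not SummitDB's implementation.

```go
package main

import "fmt"

// command is one write in the replicated log.
type command struct{ key, val string }

// replica holds a copy of the log and the state built from it.
type replica struct {
	log     []command
	applied int // number of log entries already applied
	state   map[string]string
}

func newReplica() *replica { return &replica{state: map[string]string{}} }

// applyCommitted applies, in order, every entry up to the commit count.
func (r *replica) applyCommitted(commit int) {
	for r.applied < commit && r.applied < len(r.log) {
		c := r.log[r.applied]
		r.state[c.key] = c.val
		r.applied++
	}
}

// quorum reports whether acks is a strict majority of n nodes.
func quorum(acks, n int) bool { return acks > n/2 }

// demo replicates three commands while one of three nodes is unreachable,
// then lets that node catch up from the first node's log.
func demo() (first, last string) {
	nodes := []*replica{newReplica(), newReplica(), newReplica()}
	cmds := []command{{"x", "1"}, {"y", "2"}, {"x", "3"}}

	commit := 0
	for _, c := range cmds {
		acks := 0
		for _, n := range nodes[:2] { // node 2 is unreachable for now
			n.log = append(n.log, c)
			acks++
		}
		if quorum(acks, len(nodes)) {
			commit++ // safe to apply: a majority stores the entry
		}
	}

	// Node 2 returns and copies the missing entries.
	nodes[2].log = append(nodes[2].log, nodes[0].log...)
	for _, n := range nodes {
		n.applyCommitted(commit)
	}
	return fmt.Sprint(nodes[0].state), fmt.Sprint(nodes[2].state)
}

func main() {
	a, b := demo()
	fmt.Println(a, b, a == b)
}
```

The contrast with asynchronous leader/follower replication is that a write counts as committed only once a majority holds it, which is where the stronger consistency comes from.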

@pedigree

I have several read-only geographic replicas configured for API nodes, and they run a local copy of the Redis database in order to provide local access rather than HA. I love the project :)

tidwall pushed a commit that referenced this issue Feb 25, 2017
typo "fync" to "fsync" in readme

@jjzazuet

jjzazuet commented Jan 9, 2020

Hi. Just following up on this issue in 2020. Having a disk-backed MongoDB alternative would be awesome. Thanks!
