-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agate: refactor for memtable impl #37
Conversation
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
@BusyJay @Connor1996 This PR only contains some prototypes and is relatively easy to review, and yet is critical for implementing new functionalities. I suggest taking a look at this patch first to help me get this project move on. Thanks! Currently, we have many PRs waiting for reviewing. Some of them are small patches and add small features, and some of them are about implementing new structures to make this project move on and complete. I have added priority labels for those PRs that is on critical path and may block me from moving ahead. For other PRs, I think it's okay to review them later. |
Signed-off-by: Alex Chi <[email protected]>
… into refactor-for-memtable
} | ||
|
||
impl MemTable { | ||
pub fn new(skl: Skiplist<Comparator>, wal: Option<Wal>, opt: AgateOptions) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how it is used. But I think skl
and wal
should be created in MemTable
instead of passing from outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAL name is decided and generated by database instance, so it is passed from outside. And for skiplist, I think we could move it inside memtable.
immutable: VecDeque::with_capacity(max_count - 1), | ||
} | ||
impl MemTables { | ||
pub(crate) fn new(mutable: MemTable, immutable: VecDeque<MemTable>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the previous with_capacity
instead of new with mutable
and immutable
from outside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "open memtable" logic is implemented in db.rs
, the database engine. Therefore, it's necessary to provide an interface to construct Memtables
from a single Memtable
.
Signed-off-by: Alex Chi <[email protected]>
f13a71d
to
b621563
Compare
pub use skiplist::{FixedLengthSuffixComparator, KeyComparator}; | ||
use std::{cmp, ptr}; | ||
|
||
pub static COMPARATOR: FixedLengthSuffixComparator = FixedLengthSuffixComparator::new(8); | ||
pub static COMPARATOR: FixedLengthSuffixComparator = make_comparator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between Comparator::new(8)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No difference, just to make the interface unified. What about reverting this change?
src/util.rs
Outdated
@@ -56,3 +62,15 @@ where | |||
} | |||
i | |||
} | |||
|
|||
/// Fill a file with 0. | |||
pub fn fill_file(file: &mut impl std::io::Write, mut size: u64) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just file.set_len(size)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has been removed for now. We will directly call set_len
on File
, instead of creating this new function to handle that.
src/db.rs
Outdated
|
||
pub struct Core { | ||
wal: Wal, | ||
memtable: MemTable, | ||
mt: RwLock<MemTables>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about move the lock into MemTables
? And Mutex
is probably a better choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This RwLock
guards list of MemTables
(e.g. immutable list). Using RwLock
ensures that we could parallel fetch data from memtables, while blocking reads only when we're flushing mutable table to immutable one.
If we're using Mutex, we may have to hide all memtables behind an Current develop branch already uses Arc
to reduce lock contention when parallel read data. Is it right?Arc<Memtable>
, we do not need to worry about this point.
And for moving that lock into MemTables
, I think that is OK. Meanwhile, I would also like to know, is it our design principle to hide locks in structs, and make all functions callable with immutable reference in structs like MemTable
, Agate
, LevelController
, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're using Mutex, we may have to hide all memtables behind an Arc to reduce lock contention when parallel read data. Is it right?
Why? I think the time it consumes is just creating MemTableView
, which should be very quick. Mutex
usually has a better performance than RwLock
, you can bench and see if it's an improvement.
All functions of Agate
should should take immutable reference and Agate
should be Sync + Send
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks for pointing that out. I'll use Mutex
for MemTables
.
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
… into refactor-for-memtable
@Connor1996 PTAL, thanks! |
pub(crate) version: u64, | ||
} | ||
|
||
pub struct EntryRef<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use &Entry
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entry
uses Bytes
internally. But sometimes we could only get &[u8]
(if not copy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, https://github.com/tikv/agatedb/blob/develop/src/value.rs#L185
The key and value is a Vec<u8>
(or you may think of it as BytesMut
), which will only be valid for one iteration. This way, we let our user to decide whether to copy it into a Bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
|
||
pub create_if_not_exists: bool, | ||
pub num_memtables: usize, | ||
pub mem_table_size: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using the same name as badger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added all supported options in develop branch. Now they have the same naming convention as badger.
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pub base_table_size: u64, | ||
pub base_level_size: u64, | ||
pub level_size_multiplier: usize, | ||
pub table_size_multiplier: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is table_size_multiplier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option is defined at https://github.com/dgraph-io/badger/blob/master/options.go#L58 , which seems to be used when calculating target file size of a LSM level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out I'm referencing an outdated version. 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have forked badger@45bca18 at https://github.com/skyzh/badger . Seems that badger is under constant change, with new things added and some logic changed. We could always use my fork as reference implementation.
src/memtable.rs
Outdated
pub struct MemTable { | ||
pub(crate) skl: Skiplist<Comparator>, | ||
opt: AgateOptions, | ||
core: RwLock<MemTableCore>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use RwLock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This RwLock
protects Wal
. It seems that when database is running, we only write to Wal
, and we don't need to read from it when reading from memtable. Perhaps Mutex
is a better choice, I'll change that.
src/wal.rs
Outdated
@@ -59,7 +69,7 @@ impl Header { | |||
} | |||
|
|||
/// Decode header from bytes | |||
pub fn decode(&mut self, bytes: &mut Bytes) -> Result<usize> { | |||
pub fn decode(&mut self, bytes: &[u8]) -> Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should break implementation of iterator, which depends on decode
mutating Bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could always convert &Bytes
to &[u8]
without overhead. And when we're decoding header from EntryRef
, we had to use &[u8]
. So I think this change is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not objecting the changes. I'm suggesting the change introduces a bug at
Lines 123 to 139 in afb59d2
let mut entry_data = self.data.slice(start_offset as usize..end_offset as usize); | |
let mut header = Header::default(); | |
header.decode(&mut entry_data); | |
// TODO: merge this truncate with the following key truncate | |
if header.overlap > self.perv_overlap { | |
self.key.truncate(self.perv_overlap as usize); | |
self.key.extend_from_slice( | |
&self.base_key[self.perv_overlap as usize..header.overlap as usize], | |
); | |
} | |
self.perv_overlap = header.overlap; | |
let diff_key = &entry_data[..header.diff as usize]; | |
self.key.truncate(header.overlap as usize); | |
self.key.extend_from_slice(diff_key); | |
self.val = entry_data.slice(header.diff as usize..); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the same header. I think this naming might be confusing. This is the header for WAL, and the one in iterator is the SST header. I think we could later unify these interfaces :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using TableHeader
for SST, and WalHeader
for write-ahead log / vlog? Meanwhile, we could change all decode
to use &[u8]
as parameter to reduce the need of using slice
on Bytes
(which introduces overhead of increasing the reference counter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but better another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I've reverted this change.
Signed-off-by: Alex Chi <[email protected]>
Signed-off-by: Alex Chi <[email protected]>
This PR adds prototypes of memtable / wal implementation. At the same time,
MemTable
is refactored to use one Wal for each memtable.