Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework on KAFKA-3968: fsync the parent directory of a segment file when the file is created #10680

Merged
merged 1 commit into from
May 14, 2021

Conversation

ccding
Copy link
Contributor

@ccding ccding commented May 12, 2021

(reverted #10405). #10405 has several issues, for example:

  • It fails to create a topic with 9000 partitions.
  • It flushes in several unnecessary places.
  • If multiple segments of the same partition are flushed at roughly the same time, we may end up doing multiple unnecessary flushes: the logic of handling the flush in LogSegments.scala is weird.

Kafka does not call fsync() on the directory when a new log segment is created and flushed to disk.

The problem is that following sequence of calls doesn't guarantee file durability:

fd = open("log", O_RDWR | O_CREATE); // suppose open creates "log"
write(fd);
fsync(fd);

If the system crashes after fsync() but before the parent directory has been flushed to disk, the log file can disappear.

This PR is to flush the directory when flush() is called for the first time.

Did performance test which shows this PR has a minimal performance impact on Kafka clusters.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ccding ccding marked this pull request as draft May 12, 2021 20:01
@ccding ccding marked this pull request as ready for review May 13, 2021 18:19
@ccding
Copy link
Contributor Author

ccding commented May 13, 2021

I finished performance tests and this PR has minimal impact on performance. It can successfully create a topic with 9000 partitions. @junrao please take a look

cc @ijuma

@ccding
Copy link
Contributor Author

ccding commented May 13, 2021

The two failed tests are for Scala 2.13, but not Scala 2.12.

Both passed on my local run with

./gradlew -PscalaVersion=2.13 core:test --tests kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopics
./gradlew -PscalaVersion=2.13 core:test --tests kafka.server.RaftClusterTest.testCreateClusterAndCreateListDeleteTopic

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ccding : Thanks for the PR. LGTM

@junrao junrao merged commit db3e5e2 into apache:trunk May 14, 2021
ijuma added a commit to ijuma/kafka that referenced this pull request May 16, 2021
…e-allocations-lz4

* apache-github/trunk: (155 commits)
  KAFKA-12728: Upgrade gradle to 7.0.2 and shadow to 7.0.0 (apache#10606)
  KAFKA-12778: Fix QuorumController request timeouts and electLeaders (apache#10688)
  KAFKA-12754: Improve endOffsets for TaskMetadata (apache#10634)
  Rework on KAFKA-3968: fsync the parent directory of a segment file when the file is created (apache#10680)
  MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (apache#10673)
  MINOR: prevent cleanup() from being called while Streams is still shutting down (apache#10666)
  KAFKA-8326: Introduce List Serde (apache#6592)
  KAFKA-12697: Add Global Topic and Partition count metrics to the Quorum Controller (apache#10679)
  KAFKA-12648: MINOR - Add TopologyMetadata.Subtopology class for subtopology metadata (apache#10676)
  MINOR: Update jacoco to 0.8.7 for JDK 16 support (apache#10654)
  MINOR: exclude all `src/generated` and `src/generated-test` (apache#10671)
  KAFKA-12772: Move all transaction state transition rules into their states (apache#10667)
  KAFKA-12758 Added `server-common` module to have server side common classes.  (apache#10638)
  MINOR Removed copying storage libraries specifically as they are already copied. (apache#10647)
  KAFKA-5876: KIP-216 Part 4, Apply InvalidStateStorePartitionException for Interactive Queries (apache#10657)
  KAFKA-12747: Fix flakiness in shouldReturnUUIDsWithStringPrefix (apache#10643)
  MINOR: remove unnecessary placeholder from WorkerSourceTask#recordSent (apache#10659)
  MINOR: Remove unused `scalatest` definition from `dependencies.gradle` (apache#10655)
  MINOR: checkstyle version upgrade: 8.20 -> 8.36.2 (apache#10656)
  KAFKA-12464: minor code cleanup and additional logging in constrained sticky assignment (apache#10645)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants