add Clock interface and LeakyBucket #675

Merged: 14 commits from lukas/leaky-bucket into main, Apr 9, 2024
Conversation

lherman-cs (Contributor) commented Apr 5, 2024:

Modified version of https://github.com/uber-go/ratelimit/tree/main.

Main changes:

  • Allow SetRateLimit on the fly
  • Removed other variants and unstable tests
  • Simplified configuration
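
For orientation, here is a minimal sketch of what the Clock abstraction might look like, inferred from the lb.clock.Sleep call quoted later in this thread; Now and SystemClock are assumptions, and the actual utils/clock.go may differ:

```go
package utils

import "time"

// Clock abstracts the time source so tests can substitute a
// simulated clock for the wall clock.
type Clock interface {
	Now() time.Time
	Sleep(d time.Duration)
}

// SystemClock is the trivial wall-clock implementation.
type SystemClock struct{}

func (SystemClock) Now() time.Time        { return time.Now() }
func (SystemClock) Sleep(d time.Duration) { time.Sleep(d) }
```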

changeset-bot commented Apr 5, 2024:

⚠️ No Changeset found

Latest commit: 25ad782

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


💥 An error occurred when fetching the changed packages and changesets in this PR
Some errors occurred when validating the changesets config:
The package or glob expression "github.com/livekit/protocol" specified in the `fixed` option does not match any package in the project. You may have misspelled the package name or provided an invalid glob expression. Note that glob expressions must be defined according to https://www.npmjs.com/package/micromatch.

utils/rate.go (outdated):
```go
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

// EDIT: slight modification to allow setting rate limit on the fly
```
Contributor:

Can you please add a link to the source where this is taken from?

Is adding the SetRateLimit API the only change compared to the original?

lherman-cs (author):

I did add a test here with a simulated clock, https://github.com/livekit/cloud-protocol/pull/747/files#diff-bf1c81b11717ede4526978ba2d3024bb771a5732c0768b6bafebb86af9f6f516R218-R241

I'll look into adding a unit test here as well.
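
For reference, a deterministic unit test driven by a simulated clock could look roughly like this; fakeClock and NewLeakyBucketWithClock are hypothetical names for illustration, not the code from the linked PR, and it assumes Take returns the permit's issue time as in uber-go/ratelimit:

```go
package utils

import (
	"testing"
	"time"
)

// fakeClock is a hypothetical manual clock: Sleep advances time
// instantly instead of blocking, keeping the test deterministic.
type fakeClock struct{ now time.Time }

func (c *fakeClock) Now() time.Time        { return c.now }
func (c *fakeClock) Sleep(d time.Duration) { c.now = c.now.Add(d) }

func TestLeakyBucketSpacing(t *testing.T) {
	clock := &fakeClock{now: time.Unix(0, 0)}
	// Hypothetical constructor: 100 permits per second on the fake clock.
	lb := NewLeakyBucketWithClock(100, clock)

	prev := lb.Take()
	for i := 0; i < 10; i++ {
		next := lb.Take()
		if got := next.Sub(prev); got < 10*time.Millisecond {
			t.Fatalf("permits spaced %v apart, want >= 10ms", got)
		}
		prev = next
	}
}
```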

lherman-cs (author):

Yes, it is only adding SetRateLimit. Per the comments in this PR, the change will evolve to modify the internal logic a bit.

utils/rate.go (outdated):
```go
case timeOfNextPermissionIssue == 0 || (t.maxSlack == 0 && now-timeOfNextPermissionIssue > int64(t.perRequest)):
	// if this is our first call or t.maxSlack == 0 we need to shrink issue time to now
	newTimeOfNextPermissionIssue = now
case t.maxSlack > 0 && now-timeOfNextPermissionIssue > int64(t.maxSlack)+int64(t.perRequest):
```
Contributor:

perRequest is not locked and not atomic either. Are there any downsides to this running while the rate limit is being set on the fly?

lherman-cs (author):

I think this is okay; the effect of changing the rate just won't apply immediately. I don't see a more appropriate behavior, though.

lherman-cs (author), Apr 8, 2024:

I think I misunderstood the question. It was originally meant to be synchronized externally. @paulwe also raised a similar concern, #675 (comment). In the end, I think I should handle the synchronization here after all to avoid unexpected race errors.

lherman-cs (author):

Replaced the external synchronization with a mutex instead. I put the details in @paulwe's comment thread.

utils/rate.go (outdated):
```go
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
// EDIT: slight modification to allow setting rate limit on the fly
```
Contributor:

Can you add the source of this?

Also, were you able to augment the tests with an on-the-fly rate limit change? I am not reading through this closely since it looks like an existing lib.

lherman-cs (author), Apr 8, 2024:

@boks1971 added the source and a test for setting the rate limit.

utils/clock.go (outdated)
utils/rate.go (outdated):
```go
}

func (lb *LeakyBucket) SetRateLimit(rateLimit int) {
	lb.perRequest = time.Second / time.Duration(rateLimit)
}
```
Contributor:

Are SetRateLimit and Take meant to be externally synchronized? The Go race detector will raise errors about this. We don't seem to have a test for this.

lherman-cs (author):

Yes, they were meant to be externally synchronized. But rethinking this, it would make more sense to handle the synchronization here, especially since Take is already synchronized. I'll work on this.
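
A test for that synchronization might look like the following sketch; NewLeakyBucket is a hypothetical constructor, and the point is simply to exercise both methods concurrently under `go test -race`:

```go
package utils

import (
	"sync"
	"testing"
)

// Hammer Take and SetRateLimit from separate goroutines so the race
// detector can flag any unsynchronized access to the bucket's state.
func TestLeakyBucketRace(t *testing.T) {
	lb := NewLeakyBucket(1000) // hypothetical constructor, permits per second

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < 100; i++ {
			lb.Take()
		}
	}()
	go func() {
		defer wg.Done()
		for r := 1; r <= 100; r++ {
			lb.SetRateLimit(r * 100)
		}
	}()
	wg.Wait()
}
```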

lherman-cs (author), Apr 8, 2024:

Spent too much time making this work with atomics. In the end, the code became complicated since the state needed more than 64 bits. It's possible to cram the rate limit state into 64 bits, but it has to be a uint64, and then handling time without negative numbers became tricky...

In the end, I reverted to the good ol' mutex. For now, we're not going to run into a performance issue since we only use this in QoS from a single goroutine. In the future, we might revisit this if a large number of goroutines need to access it.
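
The shape of the mutex-based version is roughly the following. This is a sketch modeled on the uber-go/ratelimit mutex implementation, with slack handling (maxSlack) omitted, so the merged code may differ:

```go
package utils

import (
	"sync"
	"time"
)

// LeakyBucket spaces permits perRequest apart (slack handling omitted).
type LeakyBucket struct {
	mu         sync.Mutex
	clock      Clock // the Clock interface sketched above
	perRequest time.Duration
	sleepFor   time.Duration
	last       time.Time
}

// SetRateLimit is guarded by the same mutex as Take, which keeps the
// race detector quiet without atomics.
func (lb *LeakyBucket) SetRateLimit(rateLimit int) {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	lb.perRequest = time.Second / time.Duration(rateLimit)
}

// Take blocks until the next permit may be issued. Note that it
// sleeps with the lock held, so concurrent callers queue on the
// mutex; that design choice is discussed below.
func (lb *LeakyBucket) Take() time.Time {
	lb.mu.Lock()
	defer lb.mu.Unlock()

	now := lb.clock.Now()
	if lb.last.IsZero() {
		// First call: issue immediately.
		lb.last = now
		return lb.last
	}

	lb.sleepFor += lb.perRequest - now.Sub(lb.last)
	if lb.sleepFor > 0 {
		lb.clock.Sleep(lb.sleepFor)
		lb.last = now.Add(lb.sleepFor)
		lb.sleepFor = 0
	} else {
		lb.last = now
	}
	return lb.last
}
```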

lherman-cs (author):

@paulwe added a test

boks1971 (Contributor) left a comment:

LGTM! Just one comment about sleeping with the lock held.


```go
// If sleepFor is positive, then we should sleep now.
if lb.sleepFor > 0 {
	lb.clock.Sleep(lb.sleepFor)
```
Contributor:

Did you determine that sleeping with the lock held is fine?

Contributor:

If it is fine, please add a comment here explaining why it is okay to sleep with the lock held.

From experience: three months down the line, when somebody is debugging some hairy issue, this (i.e. sleeping with the lock held) will pop up as somewhat of a mystery, and they will wonder whether it is contributing to some performance issue, bad behaviour, or bug.

You have mentioned BLOCKING, but an explicit acknowledgement that there is a sleep with the lock held, and that it is by design, would give more confidence to somebody reading this three months later :-)

lherman-cs (author):

Performance-wise, I'm not expecting the current implementation to be great, especially with a lot of goroutines contending. In that case, we would need to revisit the atomic implementation.

For locking safety: the Take method is essentially a variable-length time.Sleep. It blocks execution until the scheduler decides to wake the goroutine up. The mutex here is intended to act as a synchronized queue. While a goroutine is sleeping or waiting for the lock, it yields to the scheduler, allowing other goroutines that are not waiting on Take to keep running.

One caveat is that when many goroutines are waiting on the mutex, we give up control over entry order to the Go scheduler. Currently, sync.Mutex has two modes: normal (roughly FIFO) and starvation (entered when a goroutine has waited more than 1ms). In the normal case, I think FIFO should be reasonably fair for this problem: every goroutine waits in line to access a single resource. https://go-review.googlesource.com/c/go/+/34310/8/src/sync/mutex.go

Perhaps this could cause the kind of subtle unexpected behavior you're referring to. As an alternative, we could get rid of the lock and let synchronization be owned by the LeakyBucket's owner, who can decide how to schedule.
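
For contrast, here is a sketch of a related variant (not what was merged, and not exactly the external-synchronization alternative either): reserve an issue time under the lock, then sleep outside it. This avoids holding the lock across the sleep at the cost of leaving wakeup order to the scheduler and timers. It builds on the hypothetical LeakyBucket fields sketched above:

```go
// Variant sketch: reserve an issue time under the lock, then sleep
// without holding it. Waiters no longer queue on the mutex for the
// whole wait, but wakeup order is no longer serialized either.
func (lb *LeakyBucket) takeUnlockedSleep() time.Time {
	lb.mu.Lock()
	now := lb.clock.Now()
	issue := lb.last.Add(lb.perRequest)
	if issue.Before(now) {
		issue = now // bucket is idle; issue immediately
	}
	lb.last = issue
	lb.mu.Unlock()

	if wait := issue.Sub(now); wait > 0 {
		lb.clock.Sleep(wait)
	}
	return issue
}
```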

lherman-cs (author):

Synced offline. For now, this is good enough for performance and safety. We will revisit this if needed.

@lherman-cs merged commit c69c1b0 into main on Apr 9, 2024 (3 checks passed) and deleted the lukas/leaky-bucket branch on April 9, 2024 at 15:32.