add Clock interface and LeakyBucket #675
Conversation
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
// EDIT: slight modification to allow setting rate limit on the fly
Can you please add a link to the source this is taken from?
Is the only change the added `SetRateLimit` API compared to the original?
I did add a test here with a simulated clock, https://github.com/livekit/cloud-protocol/pull/747/files#diff-bf1c81b11717ede4526978ba2d3024bb771a5732c0768b6bafebb86af9f6f516R218-R241
I'll look into adding a unit test here as well
Source: https://github.com/uber-go/ratelimit/blob/main/limiter_atomic_int64.go. I'll add this in the comment too.
Yes, it only adds `SetRateLimit`. Per comments in this PR, the change will evolve to modify the internal logic a bit.
utils/rate.go
Outdated
case timeOfNextPermissionIssue == 0 || (t.maxSlack == 0 && now-timeOfNextPermissionIssue > int64(t.perRequest)):
	// if this is our first call or t.maxSlack == 0 we need to shrink issue time to now
	newTimeOfNextPermissionIssue = now
case t.maxSlack > 0 && now-timeOfNextPermissionIssue > int64(t.maxSlack)+int64(t.perRequest):
`perRequest` is not locked and also not atomic. Are there any downsides to this running while the rate limit is being set on the fly?
I think this is okay. The effect of changing the rate won't apply immediately. I don't see a different behavior that's more appropriate though.
I think I misunderstood the question. It was originally meant to be synchronized externally. @paulwe also raised a similar concern, #675 (comment). In the end, I think I should handle the synchronization here after all to avoid unexpected race errors.
Replaced the synchronization handling with a mutex instead. I put the details in @paulwe's comment thread.
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
// EDIT: slight modification to allow setting rate limit on the fly
Can you add the source of this?
Also, were you able to augment the tests with a rate limit change on the fly? I am not reading through this since it looks like an existing lib.
@boks1971 added source and added a test for setting rate limit
utils/rate.go
Outdated
}

func (lb *LeakyBucket) SetRateLimit(rateLimit int) {
	lb.perRequest = time.Second / time.Duration(rateLimit)
Are `SetRateLimit` and `Take` meant to be externally synchronized? The Go race detector will raise errors about this. We don't seem to have a test for this.
Yes, they were meant to be externally synchronized. But, rethinking this, it would make more sense to handle the synchronization here, especially since `Take` is already synchronized. I'll work on this.
Spent too much time making this work with atomics. The code became complicated since `state` occupied 64 bits. It's possible to cram the rate limit state into 64 bits, but it has to be u64, and then handling time without negative numbers became tricky...
In the end, I reverted to the good ol' mutex. For now, we won't run into a performance issue since we only use this in QoS from a single goroutine. We might revisit this in the future if a large number of goroutines needs to access it.
@paulwe added a test
lgtm! just one comment about sleeping with lock held.
// If sleepFor is positive, then we should sleep now.
if lb.sleepFor > 0 {
	lb.clock.Sleep(lb.sleepFor)
did you determine that sleeping with lock held is fine?
If it is fine, could you add a comment here explaining why it is okay to sleep with the lock held?
From experience, three months down the line, if somebody is debugging some hairy issue, this (i.e. sleeping with the lock held) will pop up as somewhat of a mystery, and they will wonder whether it is contributing to a performance issue, some bad behaviour, or a bug.
You have mentioned `BLOCKING`, but an explicit ack that there is a sleep with the lock held and that it is by design would give more confidence to somebody reading this three months later :-)
Performance-wise, I'm not expecting the current implementation to be great, especially with a lot of goroutines contending. In that case, we'd need to revisit the atomic implementation.
For locking safety: the `Take` method is essentially a variable-length `time.Sleep`. It blocks execution until the scheduler decides to wake it up. The mutex here is intended to act as a synchronized queue. When a goroutine is sleeping or waiting for the lock, it yields to the scheduler, allowing other goroutines that are not waiting on `Take` to keep running.
One caveat is that when many goroutines are waiting on the mutex, we give up control of the entry order to the Go scheduler. Currently, there are 2 scheduling modes: Normal (FIFO) and Starvation (1ms upper bound). Starvation mode is triggered when a goroutine has been waiting for a long time. In the normal case, I think FIFO should be reasonably fair for this problem: every goroutine waits in line to access a single resource. https://go-review.googlesource.com/c/go/+/34310/8/src/sync/mutex.go
Perhaps this could cause the kind of subtle unexpected behavior you're referring to. As an alternative, we could get rid of the lock and let the synchronization be owned by the `LeakyBucket` owner, who can decide how to schedule.
Synced offline. For now, this is good enough for performance and safety. We will revisit this if needed.
Modified version of https://github.com/uber-go/ratelimit/tree/main.
Main changes: