Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output/cloudv2: Error handling for flush #3082

Merged
merged 12 commits into from
May 29, 2023
Merged

Conversation

codebien
Copy link
Contributor

It is a small refactor for the error handling on the flush operation.

@codebien codebien self-assigned this May 21, 2023
@github-actions github-actions bot requested review from imiric and mstoykov May 21, 2023 19:26
@codecov-commenter
Copy link

codecov-commenter commented May 21, 2023

Codecov Report

Merging #3082 (c1d61df) into master (df6cbce) will increase coverage by 0.17%.
The diff coverage is 87.34%.

❗ Current head c1d61df differs from pull request most recent head 2520bb2. Consider uploading reports for the commit 2520bb2 to get more accurate results

@@            Coverage Diff             @@
##           master    #3082      +/-   ##
==========================================
+ Coverage   73.51%   73.69%   +0.17%     
==========================================
  Files         238      239       +1     
  Lines       18220    18258      +38     
==========================================
+ Hits        13395    13455      +60     
+ Misses       3954     3934      -20     
+ Partials      871      869       -2     
Flag Coverage Δ
ubuntu 73.69% <87.34%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
js/initcontext.go 88.88% <ø> (+8.88%) ⬆️
js/modules/gomodule.go 77.27% <ø> (ø)
js/modules/require_impl.go 76.47% <76.47%> (ø)
output/cloud/expv2/output.go 81.70% <85.10%> (+36.11%) ⬆️
js/bundle.go 85.09% <88.23%> (-0.15%) ⬇️
js/modules/resolution.go 90.32% <90.32%> (ø)
js/modules/cjsmodule.go 78.26% <100.00%> (ø)
js/modulestest/runtime.go 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

This was referenced May 21, 2023
imiric
imiric previously approved these changes May 24, 2023
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor suggestions.

output/cloud/expv2/output.go Outdated Show resolved Hide resolved
output/cloud/expv2/output.go Outdated Show resolved Hide resolved
output/cloud/expv2/output.go Outdated Show resolved Hide resolved
output/cloud/expv2/output.go Outdated Show resolved Hide resolved
Base automatically changed from ingestion/binary-proto to master May 26, 2023 11:04
The previous version using the PeriodicFlusher helper was not correct.
On Stop the PeriodicFlusher triggers again the callback so before
really stopping we was calling again the flush.

The new architecture does the same as the PeriodicFlusher but it does
not call again on stop. The flush on correct Stop is handled directly
from the stop method.
@codebien codebien requested a review from imiric May 26, 2023 11:07
@codebien codebien requested a review from mstoykov May 26, 2023 15:24
output/cloud/expv2/output.go Outdated Show resolved Hide resolved
Comment on lines +79 to +80
o.periodicInvoke(o.config.MetricPushInterval.TimeDuration(), o.flushMetrics)
o.periodicInvoke(o.config.AggregationPeriod.TimeDuration(), o.collectSamples)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like that this is a method instead of a function. But looking at it - it won't be particularly good if it's too generic and if it's too specific it will likely never be reused.

In both cases it will likely look worse.

I am leaving this mostly for other people if somebody has some ideas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unique alternative I have in mind is to have it directly as a function in the Start method.

func (o Out) Start() {
periodic := func() {
  ...
}
periodic(d, collect)
periodic(d, flush)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just modify PeriodicFlusher to not call the callback if configured so? We're essentially duplicating what it does to avoid that final call.

Otherwise, if we're fine with the duplication, then this being a method doesn't bother me, since it likely won't be needed anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may evaluate extending the current periodic helper but we need to be sure this is the final architecture. As commented here #3082 (comment), I would like to stabilize the rest then go back here and make some attempts for refactoring it in a new PR.

Comment on lines 205 to 212
// Do not close multiple times (that would panic) in the case
// we hit this multiple times and/or concurrently
select {
case <-o.stopSamplesCollection:
return
default:
close(o.stopSamplesCollection)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this will work as handleFlushError is only called in flush and that is called only in one placea at a time.

I kind of feel we are doing this in the wrong place - if there was an error that will abort teh flushing of metrics we should stop flushing metrics. Not only collecting them.

Copy link
Contributor Author

@codebien codebien May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check this all the time when the expectation for this is to get it very few times? Why should we continuously sync over this select statement for a successful test where we will never hit it? We are already doing it for AddMetricSamples so it would require adding another one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we make a network call trhat will fail ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in practice you can just have it in the Stop method I guess - no need to go through all the different parts if we are not going to flush anything.

Copy link
Contributor Author

@codebien codebien May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we make a network call trhat will fail ?

I'm not getting what failing network call

Also in practice you can just have it in the Stop method I guess

The problem with this is when we will move to have multiple-concurrent flushers then we could get this function called multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may try to have this handler as a goroutine and get the error on a channel in the classic go way. In this way, we have a centralized place for doing the close.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do something like this in the v1 maybe more of the structure should be copied from there

Copy link
Contributor Author

@codebien codebien May 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mstoykov I tried but I don't think it is cleaner. How does it achieve the goal?! Invoking the flush operation from the aggregation goroutine that it sounds not optimal because it mixes responsibilities.

I'm not against doing the refactor and I am happy to improve it, but it doesn't sound to me like a priority right now. I have the feeling that we could go back and forth a few times before we find the right fit.
Performances should be fine and the logic has been fixed not panicking.

For now, I included it in the reminders list. I would like to merge the chain before and the retry + concurrent flushers then with a better picture we can do the refactor.

Also, the metadata flush will be added soon, so the risk is to refactor this multiple times in a few days.

It is a more effective naming and it represents better what it does.
It interrupts all operations, not only the samples collection.
@codebien
Copy link
Contributor Author

I renamed stopSamplesCollection to abort because it sounds more generic considering that it stops multiple operations.

mstoykov
mstoykov previously approved these changes May 29, 2023
imiric
imiric previously approved these changes May 29, 2023
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

output/cloud/expv2/output.go Outdated Show resolved Hide resolved
output/cloud/expv2/output_test.go Show resolved Hide resolved
Comment on lines +79 to +80
o.periodicInvoke(o.config.MetricPushInterval.TimeDuration(), o.flushMetrics)
o.periodicInvoke(o.config.AggregationPeriod.TimeDuration(), o.collectSamples)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just modify PeriodicFlusher to not call the callback if configured so? We're essentially duplicating what it does to avoid that final call.

Otherwise, if we're fine with the duplication, then this being a method doesn't bother me, since it likely won't be needed anywhere else.

@codebien codebien dismissed stale reviews from imiric and mstoykov via 2520bb2 May 29, 2023 09:29
@codebien codebien merged commit ecca789 into master May 29, 2023
@codebien codebien deleted the cloud-v2-handle-flush-err branch May 29, 2023 10:05
@codebien codebien added this to the v0.45.0 milestone May 30, 2023
@codebien codebien mentioned this pull request Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants