output/cloudv2: Error handling for flush #3082

codebien · 2023-05-21T19:25:48Z

This is a small refactor of the error handling for the flush operation.

codecov-commenter · 2023-05-21T19:33:39Z

Codecov Report

Merging #3082 (c1d61df) into master (df6cbce) will increase coverage by 0.17%.
The diff coverage is 87.34%.

❗ Current head c1d61df differs from pull request most recent head 2520bb2. Consider uploading reports for the commit 2520bb2 to get more accurate results

@@            Coverage Diff             @@
##           master    #3082      +/-   ##
==========================================
+ Coverage   73.51%   73.69%   +0.17%     
==========================================
  Files         238      239       +1     
  Lines       18220    18258      +38     
==========================================
+ Hits        13395    13455      +60     
+ Misses       3954     3934      -20     
+ Partials      871      869       -2

Flag	Coverage Δ
ubuntu	`73.69% <87.34%> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
js/initcontext.go	`88.88% <ø> (+8.88%)`	⬆️
js/modules/gomodule.go	`77.27% <ø> (ø)`
js/modules/require_impl.go	`76.47% <76.47%> (ø)`
output/cloud/expv2/output.go	`81.70% <85.10%> (+36.11%)`	⬆️
js/bundle.go	`85.09% <88.23%> (-0.15%)`	⬇️
js/modules/resolution.go	`90.32% <90.32%> (ø)`
js/modules/cjsmodule.go	`78.26% <100.00%> (ø)`
js/modulestest/runtime.go	`100.00% <100.00%> (ø)`

... and 2 files with indirect coverage changes

imiric

LGTM, just minor suggestions.

output/cloud/expv2/output.go

The previous version, which used the PeriodicFlusher helper, was not correct. On Stop, the PeriodicFlusher triggers the callback once more, so we were calling flush again before actually stopping. The new architecture does the same as the PeriodicFlusher, but it does not invoke the callback again on stop. The final flush on a regular Stop is handled directly by the Stop method.
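For readers following along, here is a minimal sketch of that idea. It is not the code from this PR: the `output` type, its fields, and `flushesOnStop` are hypothetical stand-ins, and only the name `periodicInvoke` is borrowed from the diff. The loop exits on stop without a final callback, and `Stop` performs the last flush itself.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// output is a hypothetical stand-in for the real Output type.
type output struct {
	stop    chan struct{}  // closed by Stop to end the periodic loops
	wg      sync.WaitGroup // tracks the running periodic goroutines
	mu      sync.Mutex     // guards flushes
	flushes int            // number of flush calls, for demonstration only
}

// periodicInvoke runs callback every d until stop is closed.
// It returns without invoking the callback one last time.
func (o *output) periodicInvoke(d time.Duration, callback func()) {
	o.wg.Add(1)
	go func() {
		defer o.wg.Done()
		t := time.NewTicker(d)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				callback()
			case <-o.stop:
				return
			}
		}
	}()
}

// flush only counts invocations in this sketch.
func (o *output) flush() {
	o.mu.Lock()
	o.flushes++
	o.mu.Unlock()
}

// Stop ends the loops, waits for them to exit, then flushes exactly once.
func (o *output) Stop() {
	close(o.stop)
	o.wg.Wait()
	o.flush()
}

// flushesOnStop reports how many flushes happen when Stop is called
// before the ticker has had a chance to fire.
func flushesOnStop() int {
	o := &output{stop: make(chan struct{})}
	o.periodicInvoke(time.Hour, o.flush)
	o.Stop()
	return o.flushes
}

func main() {
	fmt.Println(flushesOnStop())
}
```

With an interval far longer than the run, the only flush is the one issued by `Stop`, which is the behavior described above.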

output/cloud/expv2/output.go

mstoykov · 2023-05-26T15:47:52Z

output/cloud/expv2/output.go

+	o.periodicInvoke(o.config.MetricPushInterval.TimeDuration(), o.flushMetrics)
+	o.periodicInvoke(o.config.AggregationPeriod.TimeDuration(), o.collectSamples)


I don't really like that this is a method instead of a function. But looking at it, it won't be particularly good if it's too generic, and if it's too specific it will likely never be reused.

In both cases it will likely look worse.

I am leaving this mostly for other people if somebody has some ideas.

The only alternative I have in mind is to define it directly as a function inside the Start method.

func (o Out) Start() {
	periodic := func() { ... }
	periodic(d, collect)
	periodic(d, flush)
}

Can we just modify PeriodicFlusher to not call the callback if configured so? We're essentially duplicating what it does to avoid that final call.

Otherwise, if we're fine with the duplication, then this being a method doesn't bother me, since it likely won't be needed anywhere else.

We may evaluate extending the current periodic helper, but we need to be sure this is the final architecture. As commented here #3082 (comment), I would like to stabilize the rest first, then come back to this and attempt the refactoring in a new PR.

mstoykov · 2023-05-26T16:10:19Z

output/cloud/expv2/output.go

+	// Do not close multiple times (that would panic) in the case
+	// we hit this multiple times and/or concurrently
+	select {
+	case <-o.stopSamplesCollection:
+		return
+	default:
+		close(o.stopSamplesCollection)
+	}


This will work, since handleFlushError is only called in flush, and that is called from only one place at a time.
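To illustrate why the single-caller assumption matters: a `select` with a `default` branch followed by `close` is a check-then-act sequence, so two goroutines could both reach `close` and panic. A hedged sketch of one alternative that stays safe under concurrent calls uses `sync.Once`. The `aborter` type and its methods are hypothetical names, not part of this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// aborter is a hypothetical helper; the names are not from the PR.
type aborter struct {
	once sync.Once
	done chan struct{}
}

func newAborter() *aborter {
	return &aborter{done: make(chan struct{})}
}

// Abort closes done at most once, even when called concurrently.
func (a *aborter) Abort() {
	a.once.Do(func() { close(a.done) })
}

// Aborted reports, without blocking, whether Abort has been called.
func (a *aborter) Aborted() bool {
	select {
	case <-a.done:
		return true
	default:
		return false
	}
}

// abortConcurrently calls Abort from n goroutines and returns the final state.
func abortConcurrently(n int) bool {
	a := newAborter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.Abort()
		}()
	}
	wg.Wait()
	return a.Aborted()
}

func main() {
	fmt.Println(abortConcurrently(100))
}
```

The trade-off is an extra field on the struct; in exchange, the close no longer depends on callers being serialized.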

I kind of feel we are doing this in the wrong place. If there was an error that will abort the flushing of metrics, we should stop flushing metrics, not only collecting them.

Do we want to check this all the time when the expectation for this is to get it very few times? Why should we continuously sync over this select statement for a successful test where we will never hit it? We are already doing it for AddMetricSamples so it would require adding another one.

Why should we make a network call that will fail?

Also in practice you can just have it in the Stop method I guess - no need to go through all the different parts if we are not going to flush anything.

Why should we make a network call that will fail?

I'm not getting which network call would fail.

Also in practice you can just have it in the Stop method I guess

The problem with this is that once we move to multiple concurrent flushers, this function could be called multiple times.

I think we could try running this handler as a goroutine and receiving the error on a channel, in the classic Go way. That way, we have a centralized place for doing the close.
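A rough sketch of what that could look like, under the assumption that every flusher reports its error on a shared channel and a single handler goroutine owns the close. All names here (`runFlushers`, `errFlush`, the channels) are hypothetical and only illustrate the pattern.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errFlush is a hypothetical error used only for this demonstration.
var errFlush = errors.New("flush failed")

// runFlushers starts n hypothetical flushers that each report one error,
// plus a single handler goroutine that owns the abort channel.
// It returns how many times the abort channel was closed.
func runFlushers(n int) int {
	errs := make(chan error)
	abort := make(chan struct{})
	handlerDone := make(chan struct{})
	closes := 0

	// The handler is the only goroutine that ever closes abort,
	// so no extra synchronization is needed around the close.
	go func() {
		defer close(handlerDone)
		for err := range errs {
			if err != nil && closes == 0 {
				close(abort)
				closes++
			}
		}
	}()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each flusher fails and reports the failure.
			errs <- fmt.Errorf("flusher %d: %w", id, errFlush)
		}(i)
	}

	wg.Wait()
	close(errs)   // no more errors will be sent
	<-handlerDone // wait until the handler has drained the channel
	return closes
}

func main() {
	fmt.Println(runFlushers(5))
}
```

However many flushers fail, the close happens once, because only the handler touches the abort channel.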

We do something like this in v1. Maybe more of the structure should be copied from there.

@mstoykov I tried, but I don't think it is cleaner. How does it achieve the goal? Invoking the flush operation from the aggregation goroutine doesn't sound optimal, because it mixes responsibilities.

I'm not against doing the refactor and I am happy to improve it, but it doesn't sound to me like a priority right now. I have the feeling that we could go back and forth a few times before we find the right fit.
Performance should be fine, and the logic has been fixed so that it no longer panics.

For now, I have added it to the reminders list. I would like to first merge the chain and the retry + concurrent flushers; then, with a better picture, we can do the refactor.

Also, the metadata flush will be added soon, so we risk refactoring this multiple times within a few days.

Co-authored-by: Mihail Stoykov <[email protected]>

The new name is more effective and better represents what it does: it interrupts all operations, not only the samples collection.

codebien · 2023-05-27T10:56:57Z

I renamed stopSamplesCollection to abort because it sounds more generic considering that it stops multiple operations.
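As a small illustration of why a generic name fits, here is a sketch of one `abort` channel being checked on the sample-ingestion path. The thread mentions that `AddMetricSamples` already does such a check; the `collector` type, the `[]float64` sample representation, and `bufferedAfterAbort` below are assumptions made only for this example.

```go
package main

import "fmt"

// collector is a hypothetical, single-goroutine stand-in for the output.
type collector struct {
	abort   chan struct{} // closed when all operations must stop
	samples []float64     // buffered samples (not safe for concurrent use)
}

// AddMetricSamples buffers samples unless abort has been closed.
func (c *collector) AddMetricSamples(s []float64) {
	select {
	case <-c.abort:
		return // aborted: drop the samples instead of buffering them
	default:
	}
	c.samples = append(c.samples, s...)
}

// bufferedAfterAbort adds two samples, aborts, adds one more,
// and returns how many samples were actually buffered.
func bufferedAfterAbort() int {
	c := &collector{abort: make(chan struct{})}
	c.AddMetricSamples([]float64{1, 2})
	close(c.abort)
	c.AddMetricSamples([]float64{3})
	return len(c.samples)
}

func main() {
	fmt.Println(bufferedAfterAbort())
}
```

The same channel could also be observed by the periodic flush and collection loops, which is what makes `abort` a more accurate name than `stopSamplesCollection`.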

imiric

LGTM 👍

output/cloud/expv2/output.go

output/cloud/expv2/output_test.go

imiric · 2023-05-29T08:52:26Z

output/cloud/expv2/output.go

+	o.periodicInvoke(o.config.MetricPushInterval.TimeDuration(), o.flushMetrics)
+	o.periodicInvoke(o.config.AggregationPeriod.TimeDuration(), o.collectSamples)


Can we just modify PeriodicFlusher to not call the callback if configured so? We're essentially duplicating what it does to avoid that final call.

Otherwise, if we're fine with the duplication, then this being a method doesn't bother me, since it likely won't be needed anywhere else.

Co-authored-by: Ivan Mirić <[email protected]>

codebien self-assigned this May 21, 2023

github-actions bot requested review from imiric and mstoykov May 21, 2023 19:26

codebien force-pushed the cloud-v2-handle-flush-err branch from 8f29f8f to 0e50e23 Compare May 21, 2023 20:25

This was referenced May 21, 2023

cloud: New output v2 #3072

Merged

output/cloudv2: Aggregation #3071

Merged

imiric previously approved these changes May 24, 2023

codebien force-pushed the ingestion/binary-proto branch from 0cddc41 to 8e71152 Compare May 24, 2023 16:09

codebien mentioned this pull request May 25, 2023

cloud: Binary-based ingestion #2954

Closed

codebien force-pushed the ingestion/binary-proto branch from 8e71152 to 3d401fe Compare May 25, 2023 15:44

codebien force-pushed the cloud-v2-handle-flush-err branch from 0e50e23 to 9531f91 Compare May 25, 2023 21:53

codebien force-pushed the ingestion/binary-proto branch from 4e8347e to 76b11c0 Compare May 26, 2023 08:19

Base automatically changed from ingestion/binary-proto to master May 26, 2023 11:04

codebien dismissed imiric’s stale review via b5e3a64 May 26, 2023 11:04

codebien added 4 commits May 26, 2023 13:06

Error handling for flush operation

a7ebb72

Test AddMetricSamples

9bc593f

Rename to stopSamplesCollection

9a9adc1

codebien force-pushed the cloud-v2-handle-flush-err branch from b5e3a64 to 9a9adc1 Compare May 26, 2023 11:07

codebien requested a review from imiric May 26, 2023 11:07

mstoykov reviewed May 26, 2023

mstoykov reviewed May 26, 2023

codebien added 3 commits May 26, 2023 16:27

fixup! Correct stop on error signal

ed7987c

Test for multiple calls to handle error

f4fa415

Tparallel fix

540a94b

codebien force-pushed the cloud-v2-handle-flush-err branch from c5d7c92 to 540a94b Compare May 26, 2023 14:44

Less fragile concurrent API

8cf3ecf

codebien requested a review from mstoykov May 26, 2023 15:24

mstoykov reviewed May 26, 2023

Update output/cloud/expv2/output.go

b32b8d1

Co-authored-by: Mihail Stoykov <[email protected]>

codebien force-pushed the cloud-v2-handle-flush-err branch from 8fef178 to b32b8d1 Compare May 26, 2023 16:37

Renamed stopSamplesCollection to abort

b35d7ca

The new name is more effective and better represents what it does: it interrupts all operations, not only the samples collection.

codebien mentioned this pull request May 27, 2023

output/cloudv2: Optimized metric sinks #3085

Merged

Make sure to not stuck on stop

7072cea

mstoykov previously approved these changes May 29, 2023

imiric previously approved these changes May 29, 2023

Update output/cloud/expv2/output.go

2520bb2

Co-authored-by: Ivan Mirić <[email protected]>

codebien dismissed stale reviews from imiric and mstoykov via 2520bb2 May 29, 2023 09:29

imiric approved these changes May 29, 2023

mstoykov approved these changes May 29, 2023

codebien merged commit ecca789 into master May 29, 2023

codebien deleted the cloud-v2-handle-flush-err branch May 29, 2023 10:05

codebien added this to the v0.45.0 milestone May 30, 2023

codebien mentioned this pull request Jun 15, 2023

Cloud output v2 #3117

Closed
