TRT-1811: fallback queries #1993

neisw · 2024-09-15T20:59:39Z

Queries previous releases without variant, capability, or testid filtering
Caches results so navigating down through capabilities doesn't cause the query to run again
Compares stats from previous releases based on testIdentification and selects the one with the highest pass rate
Includes an Advanced Option for Fallback Basis; the default is currently "ignore"
Fallback test details show the overridden release status:
- Override Release Assessment
- Fallback Release Assessment
- Fallback Release basis with dates
- Override Release basis with dates (new tab)
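The selection rule above (compare each test's stats across previous releases by test identification and keep the release with the highest pass rate) can be sketched roughly as follows. `TestStats`, its fields, and `selectFallback` are hypothetical stand-ins rather than Sippy's actual types; ties keep whichever release comes first in `order`.

```go
package main

import "fmt"

// TestStats is a hypothetical stand-in for one test's counts in one release.
type TestStats struct {
	Release string
	Success int
	Failure int
}

// PassRate returns successes over total runs, or 0 when there are no runs.
func (s TestStats) PassRate() float64 {
	total := s.Success + s.Failure
	if total == 0 {
		return 0
	}
	return float64(s.Success) / float64(total)
}

// selectFallback keeps, per test ID, the stats from the release with the
// highest pass rate. Releases are visited in the given order, and a later
// release only wins if it is strictly better, so ties keep the earlier one.
func selectFallback(byRelease map[string]map[string]TestStats, order []string) map[string]TestStats {
	best := make(map[string]TestStats)
	for _, release := range order {
		for testID, stats := range byRelease[release] {
			current, ok := best[testID]
			if !ok || stats.PassRate() > current.PassRate() {
				best[testID] = stats
			}
		}
	}
	return best
}

func main() {
	// Hypothetical data for a single test across three prior releases.
	data := map[string]map[string]TestStats{
		"4.17": {"etcd-conditions": {Release: "4.17", Success: 90, Failure: 10}},
		"4.16": {"etcd-conditions": {Release: "4.16", Success: 97, Failure: 3}},
		"4.15": {"etcd-conditions": {Release: "4.15", Success: 99, Failure: 1}},
	}
	best := selectFallback(data, []string{"4.17", "4.16", "4.15"})
	fmt.Println(best["etcd-conditions"].Release) // 4.15
}
```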

In testing I'm not seeing many regressions that only show up with fallback (which is good).
But testing with the following settings:
Sample 4.18 9/21 - 9/28
Base 4.17 8/2 - 8/30
Fallback Basis: Keep

generates a regression for operator conditions etcd that falls back to 4.15 (1/29 - 2/28).

The overridden basis data is still available via the Override Tests tab.

openshift-ci-robot · 2024-09-15T20:59:42Z

@neisw: This pull request references TRT-1811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Queries previous releases without variant, capability or testid filtering

Caches results so navigating down through capabilities doesn't cause the query to run again

Compares stats from previous releases based on testIdentification and selects the one with the highest pass rate

Currently just a POC

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2024-09-15T20:59:43Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2024-09-15T20:59:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [neisw]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-10-01T20:18:44Z

@neisw: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/lint	`4fd5899`	link	true	`/test lint`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

sosiouxme

rebase to include #2031
(and preferably the real fix for the 4.18 view)

sosiouxme · 2024-10-02T18:07:28Z

sippy-ng/src/component_readiness/AdvancedOptions.js

+            <p>Fallback Basis: {ignoreFallbackBasis ? 'ignore' : 'keep'}</p>
+            <Switch
+              checked={ignoreFallbackBasis}
+              onChange={handleChangeIgnoreFallbackBasis}
+              name="ignoreFallbackBasis"
+              color="primary"
+            />


Could it please have a tooltip explaining what it's for?

Also, I would find this less confusing if the boolean were reversed, so that we had "Enable Fallback Basis" defaulting to false (for now).

sosiouxme · 2024-10-02T18:28:47Z

sippy-ng/src/component_readiness/CompReadyTestReport.js

+            <Tab label="Basis Tests" {...tabProps(0)} />
+            <Tab label="Override Tests" {...tabProps(1)} />


These titles are too ambiguous for a newb. It's admittedly difficult to come up with better ones. Maybe:

Suggested change

-            <Tab label="Basis Tests" {...tabProps(0)} />
-            <Tab label="Override Tests" {...tabProps(1)} />
+            <Tab label="Severest Result" {...tabProps(0)} />
+            <Tab label="Specified Result" {...tabProps(1)} />

Again, some kind of explanation for newbs would be helpful, though I'm not sure tooltips work well on tabs.

stbenjam

Can you share a localhost link to a page with a fallback? It'll make it easier if I'm running locally
Are you seeing any errors with Redis locally? About 25% of the time things fail to cache. Maybe it's my local network, since the Redis instance is running elsewhere:

INFO[2024-10-03T07:28:16.163-04:00] Fallback (4.15) QueryTestStatus completed in 23.094652875s with 180541 base results from db
WARN[2024-10-03T07:28:16.797-04:00] couldn't persist new item to cache            error="write tcp 192.168.1.157:62924->192.168.1.215:6379: i/o timeout"

Is there any way to wire up ignoreFallback to the cache being unavailable? Running locally without a cache is very, very slow.

Some UI comments:

I think we should focus on showing the user only one square / one assessment per test. Having two muddies the waters: which is the right one? You can put in a tooltip that it was overridden and what the original result was.
Would it be possible to show all the releases in tabs? e.g. the tab titles could be the basis releases, letting me see all the ones we have (e.g. all 4). It'd help for archeology, i.e. to see how the test did over time.

stbenjam · 2024-10-03T10:33:56Z

pkg/util/utils.go

+func Prior30Days(gaDate *time.Time) time.Time {
+	// see if we can leverage ParseCRReleaseTime or other
+	// existing helper
+	return gaDate.Add(-1 * 30 * 24 * time.Hour)


I just noticed we're inconsistent: sometimes we use 30 days before GA, but the buttons in the UI are for 4 weeks (28 days).

Not a new problem, but it seems like we should pick one or the other.

stbenjam · 2024-10-03T10:47:48Z

pkg/api/componentreadiness/component_report.go

+		// this gets removed when we stop loading fallback data for the current base release
+		// and the section up to gets uncommented
+		fallbackRelease, err = previousRelease(fallbackRelease)
+		if err != nil {
+			log.WithError(err).Errorf("Failure determining fallback release for %s", fallbackRelease)
+			continue
+		}


I think there's a bug here: if previousRelease fails, the loop logs the error and continues, causing the same release to be processed again in the next iteration, since previousRelease returns the release it was given on failure.

IMHO it'd be simpler to construct the list of releases you want and then range over it. That also avoids the fixed 4 in the for loop; I don't think you can assume there are always that many.

e.g.

	var selectedReleases []*crtype.Release
	fallbackRelease := f.BaseRelease
	// Get base plus up to 3 fallback releases
	for i := 0; i < 4; i++ {
		var crRelease *crtype.Release
		for j := range releases {
			if releases[j].Release == fallbackRelease {
				crRelease = &releases[j]
				break
			}
		}
		if crRelease != nil {
			selectedReleases = append(selectedReleases, crRelease)
		}
		// Attempt to get the previous release
		var err error
		fallbackRelease, err = previousRelease(fallbackRelease)
		if err != nil {
			log.WithError(err).Errorf("failure determining fallback release for %s", fallbackRelease)
			break // Stop attempting if previousRelease fails
		}
	}
	for _, crRelease := range selectedReleases {
		// launch goroutines to fetch the data for crRelease
	}

stbenjam · 2024-10-03T10:57:40Z

pkg/api/componentreadiness/component_report.go

+				}
+			} else {
+				// if we miss on a prior release we stop
+				break


This `break` belongs to `if cachedTestStatuses, ok := c.cachedFallbackTestStatuses.Releases[priorRelease]; ok`.

Do you want to break if previousRelease returns an error too?

stbenjam · 2024-10-03T11:08:10Z

pkg/api/componentreadiness/component_report.go

+		}
+
+		wg.Add(1)
+		go func(queryRelease *crtype.Release, queryStart, queryEnd time.Time) {


This isn't the only place where we're not doing it, but we need to start propagating the context from the HTTP request down into component readiness and using it in all our goroutines.

When a client disconnects, these goroutines keep executing, which isn't ideal. At a minimum, it makes Sippy a target for a DDoS attack (clients can quickly reconnect over and over again and spawn many goroutines that stay running).

i.e.:

	func (s *Server) jsonComponentReportFromBigQuery(w http.ResponseWriter, req *http.Request) {
		ctx := req.Context()
		// [....]
		componentreadiness.GetComponentReportFromBigQuery(ctx, ...)
		// [...]
	}

	func GetComponentReportFromBigQuery(ctx context.Context, ...) {
		// [...]
		someOtherHelperThatUsesGoRoutines(ctx, ....)
		// [...]
	}

	func someOtherHelperThatUsesGoRoutines(ctx context.Context, ...) {
		// [...]
		go func() {
			select {
			case <-ctx.Done():
				// any clean up required
				return // terminate when context does
			default:
				// actual body of work to do
			}
		}()
		// [...]
	}

I filed a separate card for this: https://issues.redhat.com/browse/TRT-1845

stbenjam · 2024-10-03T12:18:33Z

pkg/api/componentreadiness/component_report.go

+		for _, release := range releases {
+			if release.Release == fallbackRelease {
+				crRelease = &release


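Can't take the address of the loop variable like this: `&release` points at the loop variable, not at the element in `releases`, and before Go 1.22 that variable is reused and overwritten on every iteration. Instead: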

Suggested change

-		for _, release := range releases {
-			if release.Release == fallbackRelease {
-				crRelease = &release
+		for i, release := range releases {
+			if release.Release == fallbackRelease {
+				crRelease = &releases[i]

Or update go.mod to require Go 1.22, where each iteration gets its own loop variable.
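A small sketch of why indexing is preferable, using a hypothetical `Release` type in place of `crtype.Release`: `&releases[i]` points at the element stored in the slice, while `&release` (even on Go 1.22) points at a per-iteration copy, so writes through it would not reach the slice.

```go
package main

import "fmt"

// Release is a hypothetical stand-in for crtype.Release.
type Release struct {
	Release string
	Status  string
}

// findRelease returns a pointer into the slice itself, so it is valid
// regardless of loop-variable semantics and writes through it are visible
// to every other holder of the slice.
func findRelease(releases []Release, name string) *Release {
	for i := range releases {
		if releases[i].Release == name {
			return &releases[i]
		}
	}
	return nil
}

func main() {
	releases := []Release{{Release: "4.17"}, {Release: "4.16"}}
	r := findRelease(releases, "4.16")
	r.Status = "fallback"
	fmt.Println(releases[1].Status) // fallback
}
```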

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 15, 2024

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 15, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 15, 2024

neisw force-pushed the fallback-queries branch 3 times, most recently from 4d62d3a to 5e973fc, October 1, 2024 17:36

TRT-1811: Support fallback queries

4fd5899

neisw force-pushed the fallback-queries branch from 5e973fc to 4fd5899

neisw changed the title ~~TRT-1811: POC fallback queries~~ TRT-1811: fallback queries Oct 1, 2024

neisw marked this pull request as ready for review October 1, 2024 19:45

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 1, 2024

openshift-ci bot requested review from deads2k and stbenjam October 1, 2024 19:46

sosiouxme reviewed Oct 2, 2024

stbenjam reviewed Oct 3, 2024