release-blocking jobs must run in dedicated cluster: ci-kubernetes-build
What should be cleaned up or changed:
This is part of #18549
To properly monitor the outcome of this, you should be a member of k8s-infra-prow-viewers@kubernetes.io. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you're not a member.
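For reference, adding yourself is a small YAML edit to that file. A minimal sketch, assuming the group entry follows the usual groups.yaml layout (the exact surrounding fields and member list will differ; keep members sorted):

```yaml
# groups/groups.yaml (sketch; example addresses are illustrative)
- email-id: k8s-infra-prow-viewers@kubernetes.io
  name: k8s-infra-prow-viewers
  settings:
    ReconcileMembers: "true"
  members:
    - existing-member@example.com
    - your-email@example.com   # add yourself here via PR
```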
NOTE: I am not tagging this as "help wanted" because it is blocked on kubernetes/k8s.io#846. I would also recommend doing ci-kubernetes-build-fast first. Here is my guess at how we could do this:
- create a duplicate job that pushes to the new bucket writable by k8s-infra-prow-build (see the job config sketch after this list)
- ensure it's building and pushing appropriately
- update a release-blocking job to pull from the new bucket
- if no problems, roll out changes progressively:
  - a few more jobs in release-blocking
  - all jobs in release-blocking that use this job's results
  - a job that still runs in the "default" cluster
  - all jobs that use this job's results
- rename jobs / get rid of the job that runs on the "default" cluster
- do the same for release-branch variants; can probably do a faster rollout
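A minimal sketch of what the duplicate periodic could look like, assuming it stays a decorated periodic and only changes the target cluster and destination bucket. The job name, image, entrypoint script, interval, and resource numbers below are illustrative assumptions, not the final config:

```yaml
# config/jobs/... (sketch): duplicate of ci-kubernetes-build on the dedicated cluster
periodics:
- name: ci-kubernetes-build-canary          # hypothetical name for the duplicate
  cluster: k8s-infra-prow-build             # dedicated, community-owned build cluster
  interval: 1h
  decorate: true
  extra_refs:
  - org: kubernetes
    repo: kubernetes
    base_ref: master
  spec:
    containers:
    - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master  # illustrative
      command:
      - runner.sh
      args:
      - ./hack/ci-build.sh                  # placeholder for the real build+push entrypoint
      - --bucket=gs://<new-bucket-writable-by-k8s-infra-prow-build>
      resources:
        requests:                           # requests == limits for predictable scheduling
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
```

Once the canary looks healthy, consumers can be flipped to the new bucket in the progressive order listed above.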
It will be helpful to note the date/time those PRs merge, so you can compare before/after behavior.
Things to watch for the job
- https://prow.k8s.io/?job=ci-kubernetes-build
  - does the job start failing more often?
  - does the job start going into error state?
- https://testgrid.k8s.io/presubmits-kubernetes-blocking#ci-kubernetes-build&graph-metrics=test-duration-minutes
  - does the job duration look worse than before? spikier than before?
- https://storage.googleapis.com/k8s-gubernator/triage/index.html?pr=1&job=ci-kubernetes-build
  - do more failures show up than before?
- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/pr-logs/directory/ci-kubernetes-build
  - (can be used to answer some of the same questions as above)
- metrics explorer: CPU limit utilization for ci-kubernetes-build for 6h
  - is the job wildly underutilizing its CPU limit? if so, perhaps tune it down (if uncertain, post evidence in this issue and ask; see the resources sketch after this list)
  - (it will probably be helpful to look at different time resolutions like 1h, 6h, 1d, 1w)
- metrics explorer: Memory limit utilization for ci-kubernetes-build for 6h
  - is the job wildly underutilizing its memory limit? if so, perhaps tune it down (if uncertain, post evidence in this issue and ask)
  - (it will probably be helpful to look at different time resolutions like 1h, 6h, 1d, 1w)
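Tuning is a small change to the job's container resources. A hedged sketch, assuming observed utilization justified dropping from 4 CPUs / 8Gi to 2 CPUs / 4Gi (the actual numbers and starting values here are made up; take yours from the metrics above):

```yaml
# sketch only: adjust resources based on observed utilization;
# keeping requests equal to limits keeps scheduling on the build cluster predictable
spec:
  containers:
  - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master   # illustrative
    resources:
      requests:
        cpu: "2"        # lowered from an assumed "4" after sustained low CPU utilization
        memory: 4Gi     # lowered from an assumed 8Gi after sustained low memory utilization
      limits:
        cpu: "2"
        memory: 4Gi
```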
Things to watch for the build cluster
Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.
/wg k8s-infra
/sig testing
/area jobs