release-blocking jobs must run in dedicated cluster: ci-kubernetes-build
What should be cleaned up or changed:
This is part of #18549
To properly monitor the outcome of this, you should be a member of k8s-infra-prow-viewers@kubernetes.io. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you're not a member.
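For reference, adding yourself is a small YAML edit to that file. A minimal sketch, assuming the group entry follows the usual groups.yaml layout (the exact surrounding fields and member list will differ; keep members sorted):

```yaml
# groups/groups.yaml (sketch; example addresses are illustrative)
- email-id: k8s-infra-prow-viewers@kubernetes.io
  name: k8s-infra-prow-viewers
  settings:
    ReconcileMembers: "true"
  members:
    - existing-member@example.com
    - your-email@example.com   # add yourself here via PR
```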
NOTE: I am not tagging this as "help wanted" because it is blocked on kubernetes/k8s.io#846. I would also recommend doing ci-kubernetes-build-fast first. Here is my guess at how we could do this:
- create a duplicate job that pushes to the new bucket writable by k8s-infra-prow-build (see the job config sketch after this list)
- ensure it's building and pushing appropriately
- update a release-blocking job to pull from the new bucket
- if no problems, roll out changes progressively:
  - a few more jobs in release-blocking
  - all jobs in release-blocking that use this job's results
  - a job that still runs in the "default" cluster
  - all jobs that use this job's results
- rename jobs / get rid of the job that runs on the "default" cluster
- do the same for release-branch variants; can probably do a faster rollout
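A minimal sketch of what the duplicate periodic could look like, assuming it stays a decorated periodic and only changes the target cluster and destination bucket. The job name, image, entrypoint script, interval, and resource numbers below are illustrative assumptions, not the final config:

```yaml
# config/jobs/... (sketch): duplicate of ci-kubernetes-build on the dedicated cluster
periodics:
- name: ci-kubernetes-build-canary          # hypothetical name for the duplicate
  cluster: k8s-infra-prow-build             # dedicated, community-owned build cluster
  interval: 1h
  decorate: true
  extra_refs:
  - org: kubernetes
    repo: kubernetes
    base_ref: master
  spec:
    containers:
    - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master  # illustrative
      command:
      - runner.sh
      args:
      - ./hack/ci-build.sh                  # placeholder for the real build+push entrypoint
      - --bucket=gs://<new-bucket-writable-by-k8s-infra-prow-build>
      resources:
        requests:                           # requests == limits for predictable scheduling
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
```

Once the canary looks healthy, consumers can be flipped to the new bucket in the progressive order listed above.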
It will be helpful to note the date/time those PRs merge, so you can compare before/after behavior.
Things to watch for the job
- https://prow.k8s.io/?job=ci-kubernetes-build
  - does the job start failing more often?
  - does the job start going into error state?
- https://testgrid.k8s.io/presubmits-kubernetes-blocking#ci-kubernetes-build&graph-metrics=test-duration-minutes
  - does the job duration look worse than before? spikier than before?
- https://storage.googleapis.com/k8s-gubernator/triage/index.html?pr=1&job=ci-kubernetes-build
  - do more failures show up than before?
- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/pr-logs/directory/ci-kubernetes-build
  - (can be used to answer some of the same questions as above)
- metrics explorer: CPU limit utilization for ci-kubernetes-build for 6h
  - is the job wildly underutilizing its CPU limit? if so, perhaps tune it down (if uncertain, post evidence in this issue and ask; see the resources sketch after this list)
  - (it will probably be helpful to look at different time resolutions like 1h, 6h, 1d, 1w)
- metrics explorer: Memory limit utilization for ci-kubernetes-build for 6h
  - is the job wildly underutilizing its memory limit? if so, perhaps tune it down (if uncertain, post evidence in this issue and ask)
  - (it will probably be helpful to look at different time resolutions like 1h, 6h, 1d, 1w)
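Tuning is a small change to the job's container resources. A hedged sketch, assuming observed utilization justified dropping from 4 CPUs / 8Gi to 2 CPUs / 4Gi (the actual numbers and starting values here are made up; take yours from the metrics above):

```yaml
# sketch only: adjust resources based on observed utilization;
# keeping requests equal to limits keeps scheduling on the build cluster predictable
spec:
  containers:
  - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master   # illustrative
    resources:
      requests:
        cpu: "2"        # lowered from an assumed "4" after sustained low CPU utilization
        memory: 4Gi     # lowered from an assumed 8Gi after sustained low memory utilization
      limits:
        cpu: "2"
        memory: 4Gi
```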
Things to watch for the build cluster
Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.
/wg k8s-infra
/sig testing
/area jobs