What happened:
I was investigating kubernetes-retired/kubefed#1024 and I stumbled across an issue which I believe might be a bug in Kubernetes.
I have successfully recreated this issue using some test configuration, so you don't need to deploy kubefed to reproduce it.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: "codeclou/docker-nginx-self-signed-ssl:latest"
        imagePullPolicy: Always
        ports:
        - containerPort: 4443
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 443
    targetPort: 4443
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: null
  name: testconfigs.example.io
spec:
  group: example.io
  version: v1
  versions:
  - name: v1
    storage: true
    served: true
  names:
    kind: TestConfig
    plural: testconfigs
  scope: Namespaced
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          properties:
            TestString:
              description: This is a test string
              type: string
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: test-webhook
webhooks:
- name: testconfigs.example.io
  clientConfig:
    service:
      namespace: default
      name: nginx
  rules:
  - operations:
    - CREATE
    - UPDATE
    apiGroups:
    - example.io
    apiVersions:
    - v1
    resources:
    - testconfigs
  failurePolicy: Fail
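Assuming the four manifests above are saved together as test-config.yaml (the filename is my choice), they can be applied in one go:

❯ kubectl apply -f test-config.yaml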
After applying the YAML, you can see that a pod gets created:
❯ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
nginx-deployment-5cf877cd99-n4n9l   1/1     Running   0          9m52s   10.156.17.45   gke-simons-cluster-preemptible-899b51b7-m0zk   <none>           <none>
As well as a service:
❯ kubectl get services -o wide
NAME    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE   SELECTOR
nginx   ClusterIP   10.156.32.6   <none>        443/TCP   10m   app=nginx
The other two resources deployed are the testconfigs.example.io CustomResourceDefinition (kind: TestConfig) and the test-webhook ValidatingWebhookConfiguration. The webhook is invoked whenever a resource of kind: TestConfig is created or updated. I would expect it to work as follows: the API server makes an HTTPS request to the nginx service, the webhook validates the object, and the object creation succeeds.
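For illustration, a minimal sketch of the request I would expect the API server to send (the AdmissionReview body is abbreviated and the UID is invented; nginx.default.svc only resolves from inside the cluster, and -k skips TLS verification of the test container's self-signed certificate):

❯ curl -k -X POST https://nginx.default.svc:443/ \
    -H 'Content-Type: application/json' \
    -d '{"apiVersion": "admission.k8s.io/v1beta1", "kind": "AdmissionReview",
         "request": {"uid": "...", "operation": "CREATE",
                     "resource": {"group": "example.io", "version": "v1", "resource": "testconfigs"}}}'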
Let's attempt to create a TestConfig object:
---
apiVersion: example.io/v1
kind: TestConfig
metadata:
  name: test-object
  namespace: default
spec:
  TestString: "This is my test string"
I observe that the validation webhook request times out, causing the resource creation to fail.
❯ kubectl apply -f test-resource.yaml
Error from server (Timeout): error when creating "test-resource.yaml": Timeout: request did not complete within requested timeout 30s
After some investigation, I realised that the admission control webhook attempts to hit the pod's IP address (10.156.17.45), rather than the service's (10.156.32.6), on the port specified by the service's targetPort
(4443). This packet is intercepted by my GCE VPC firewall and gets denied.
{
insertId: "a05tpmfdwoniw"
jsonPayload: {
connection: {
dest_ip: "10.156.17.45"
dest_port: 4443
protocol: 6
src_ip: "10.172.0.3"
src_port: 57446
}
disposition: "DENIED"
...
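Note that the denied packet's destination (10.156.17.45:4443) matches the service's Endpoints object exactly, which suggests the API server resolves the service to one of its endpoints itself rather than dialling the ClusterIP (output illustrative, matching the pod above):

❯ kubectl get endpoints nginx
NAME    ENDPOINTS           AGE
nginx   10.156.17.45:4443   10m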
GKE runs the master (control plane) on a separate VPC network in a separate, Google-managed project. A firewall rule is deployed automatically during cluster creation that allows traffic between the master and the node pools on ports 443 and 10250 only.
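As a workaround, that auto-created rule can be extended. A sketch, assuming the rule follows the usual gke-<cluster-name>-<hash>-master naming (list the rules first, since names vary per cluster):

❯ gcloud compute firewall-rules list --filter="name~master"
❯ gcloud compute firewall-rules update gke-simons-cluster-<hash>-master \
    --allow tcp:443,tcp:10250,tcp:4443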
As soon as I add port 4443 to this firewall rule, the request to the pod succeeds, and I observe that the admission machinery now fires a correct request to the service name on the correct port, so the webhook call goes through (in my specific case it fails with a certificate mismatch error instead of a timeout, since no webhook CA was configured and I used a test container with a self-signed certificate):
❯ kubectl apply -f test-resource.yaml
Error from server (InternalError): error when creating "test-resource.yaml": Internal error occurred: failed calling webhook "testconfigs.example.io": Post https://nginx.default.svc:443/?timeout=30s: x509: certificate is valid for local.codeclou.io, not nginx.default.svc
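In a real deployment this mismatch would be avoided by serving a certificate valid for nginx.default.svc and publishing the signing CA in the webhook's clientConfig.caBundle field; a sketch of where that goes, with the value elided:

webhooks:
- name: testconfigs.example.io
  clientConfig:
    caBundle: <base64-encoded CA certificate PEM>
    service:
      namespace: default
      name: nginx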
When I then remove port 4443 from the above-mentioned GCP firewall rule, the timeout issue does not reappear.
What you expected to happen:
I did not expect any communication between the admission webhook caller on the master network and a specific pod's IP address.
The only communication I expected was between the API server and the service.
How to reproduce it (as minimally and precisely as possible):
See the steps above to reproduce the issue, or attempt to deploy the latest kubefed
on a Kubernetes cluster where traffic between the master and node networks is restricted to ports 443 and 10250 only.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-gke.8", Git
- OS (e.g: cat /etc/os-release): Container-Optimized OS
- Kernel (e.g. uname -a): Google managed

/sig api-machinery