
Kubernetes services unable to start due to DNS or Certificate issues #3769

Closed

neiltwist opened this issue Apr 18, 2019 · 11 comments

Comments

@neiltwist

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: D79424CC-C069-4CEC-9E5F-868CAC14C0D0/20190418080540

Expected behavior

  • Select the Checkbox to start Kubernetes
  • Kubernetes starts

Actual behavior

  • Select the checkbox to start Kubernetes
  • Kubernetes is always in the "starting" state
  • kubectl get nodes returns No resources found.

Information

  • Windows Version: 1709
  • Docker for Windows Version: 2.0.3.0 (31778)

Steps to reproduce the behavior

  1. Enable Kubernetes
  2. The Kubernetes containers start up
  3. Look at the logs of the running Kubernetes containers; they contain the following (a way to check the certificate's names is sketched after these steps):
     1 log.go:172] http: TLS handshake error from 192.168.65.3:48722: remote error: tls: bad certificate
     Get https://vm.docker.internal:6443/api/v1/nodes?limit=500&resourceVersion=0: x509: certificate is valid for docker-for-desktop, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, host.docker.internal, not vm.docker.internal
  4. Look at the Docker for Windows logs and see:
     vpnkit.exe: ICMP: destination unreachable from 192.168.65.3
     time="2019-04-18T09:43:57+01:00" msg="DNS failure: docker-desktop.\tIN\t AAAA: errno 9002: DnsQuery: DNS server failure."
     time="2019-04-18T09:43:57+01:00" msg="0/3 system pods running, found labels but still waiting for labels k8s-app=kube-dns, component=kube-controller-manager, component=kube-apiserver..."
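
A quick way to confirm which names the API server certificate actually covers is to dump its SANs from the host. This is a generic check (not from Docker's docs) and assumes openssl.exe is on your PATH:

# Print the certificate served on port 6443 and look for the
# "X509v3 Subject Alternative Name" section; vm.docker.internal must be
# listed there for kubectl to reach the API server via that name.
PS> '' | openssl s_client -connect vm.docker.internal:6443 -servername vm.docker.internal | openssl x509 -noout -text
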
@jpreese

jpreese commented Apr 23, 2019

Was this ever resolved for you, @neiltwist ?

@neiltwist
Author

Unfortunately not. Eventually I managed to make stable (2.0.0.3) work outside the corporate firewall, but edge (2.0.3.0) comes up with this error regardless.

@jpreese

jpreese commented Apr 24, 2019

@neiltwist I tried swapping to the Kubernetes version that you mentioned in this issue and got the same error.

To fix this, I deleted the pki/ folder in C:\ProgramData\DockerDesktop

Restart Docker and it should regenerate the client certs for you, this time including vm.docker.internal
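
A minimal sketch of that procedure, assuming the default install path: quit Docker Desktop from the whale-icon menu first, then use an elevated PowerShell (moving the folder rather than deleting it keeps a backup):

# Back up and remove the old certificate folder
PS> Move-Item 'C:\ProgramData\DockerDesktop\pki' 'C:\ProgramData\DockerDesktop\pki.bak'
# Start Docker Desktop again; it should regenerate pki\ with certs
# that include vm.docker.internal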

@neiltwist
Author

Thanks @jpreese, that's fixed the certificate error, but I'm still getting the DNS error in the main log, and in the etcd logs I get the following:

2019-04-24 10:40:24.215447 I | etcdmain: etcd Version: 3.2.24
2019-04-24 10:40:24.215555 I | etcdmain: Git SHA: 420a45226
2019-04-24 10:40:24.215575 I | etcdmain: Go Version: go1.8.7
2019-04-24 10:40:24.215592 I | etcdmain: Go OS/Arch: linux/amd64
2019-04-24 10:40:24.215644 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2019-04-24 10:40:24.215768 I | embed: peerTLS: cert = /run/config/pki/etcd/peer.crt, key = /run/config/pki/etcd/peer.key, ca = , trusted-ca = /run/config/pki/etcd/ca.crt, client-cert-auth = true
2019-04-24 10:40:24.216477 I | embed: listening for peers on https://192.168.65.3:2380
2019-04-24 10:40:24.216709 I | embed: listening for client requests on 127.0.0.1:2379
2019-04-24 10:40:24.216944 I | embed: listening for client requests on 192.168.65.3:2379
2019-04-24 10:40:24.222461 I | etcdserver: name = docker-desktop
2019-04-24 10:40:24.222738 I | etcdserver: data dir = /var/lib/etcd
2019-04-24 10:40:24.222841 I | etcdserver: member dir = /var/lib/etcd/member
2019-04-24 10:40:24.222947 I | etcdserver: heartbeat = 100ms
2019-04-24 10:40:24.223086 I | etcdserver: election = 1000ms
2019-04-24 10:40:24.223227 I | etcdserver: snapshot count = 10000
2019-04-24 10:40:24.223326 I | etcdserver: advertise client URLs = https://192.168.65.3:2379
2019-04-24 10:40:24.223407 I | etcdserver: initial advertise peer URLs = https://192.168.65.3:2380
2019-04-24 10:40:24.226577 I | etcdserver: initial cluster = docker-desktop=https://192.168.65.3:2380
2019-04-24 10:40:24.233261 I | etcdserver: starting member 5a5fcaeaef75abff in cluster 6246e842008cf04d
2019-04-24 10:40:24.233445 I | raft: 5a5fcaeaef75abff became follower at term 0
2019-04-24 10:40:24.234671 I | raft: newRaft 5a5fcaeaef75abff [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2019-04-24 10:40:24.234754 I | raft: 5a5fcaeaef75abff became follower at term 1
2019-04-24 10:40:24.257276 W | auth: simple token is not cryptographically signed
2019-04-24 10:40:24.270433 I | etcdserver: starting server... [version: 3.2.24, cluster version: to_be_decided]
2019-04-24 10:40:24.271775 I | embed: ClientTLS: cert = /run/config/pki/etcd/server.crt, key = /run/config/pki/etcd/server.key, ca = , trusted-ca = /run/config/pki/etcd/ca.crt, client-cert-auth = true
2019-04-24 10:40:24.272998 I | etcdserver: 5a5fcaeaef75abff as single-node; fast-forwarding 9 ticks (election ticks 10)
2019-04-24 10:40:24.273494 I | etcdserver/membership: added member 5a5fcaeaef75abff [https://192.168.65.3:2380] to cluster 6246e842008cf04d
2019-04-24 10:40:24.535414 I | raft: 5a5fcaeaef75abff is starting a new election at term 1
2019-04-24 10:40:24.535479 I | raft: 5a5fcaeaef75abff became candidate at term 2
2019-04-24 10:40:24.535493 I | raft: 5a5fcaeaef75abff received MsgVoteResp from 5a5fcaeaef75abff at term 2
2019-04-24 10:40:24.535504 I | raft: 5a5fcaeaef75abff became leader at term 2
2019-04-24 10:40:24.535512 I | raft: raft.node: 5a5fcaeaef75abff elected leader 5a5fcaeaef75abff at term 2
2019-04-24 10:40:24.535719 I | etcdserver: setting up the initial cluster version to 3.2
2019-04-24 10:40:24.541223 N | etcdserver/membership: set the initial cluster version to 3.2
2019-04-24 10:40:24.541322 I | etcdserver/api: enabled capabilities for version 3.2
2019-04-24 10:40:24.541391 I | etcdserver: published {Name:docker-desktop ClientURLs:[https://192.168.65.3:2379]} to cluster 6246e842008cf04d
2019-04-24 10:40:24.543362 I | embed: ready to serve client requests
2019-04-24 10:40:24.543695 I | embed: serving client requests on 127.0.0.1:2379
2019-04-24 10:40:24.554563 I | embed: ready to serve client requests
2019-04-24 10:40:24.554857 I | embed: serving client requests on 192.168.65.3:2379
WARNING: 2019/04/24 10:40:24 Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
WARNING: 2019/04/24 10:40:24 Failed to dial 192.168.65.3:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
2019-04-24 10:41:58.692298 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" " with result "range_response_count:1 size:454" took too long (241.2862ms) to execute
2019-04-24 10:41:58.693111 W | etcdserver: read-only range request "key:\"/registry/secrets/kube-system/pod-garbage-collector-token-xg595\" " with result "range_response_count:1 size:2379" took too long (421.3955ms) to execute
2019-04-24 10:41:58.693510 W | etcdserver: read-only range request "key:\"/registry/secrets/kube-system/resourcequota-controller-token-cnxkt\" " with result "range_response_count:1 size:2400" took too long (421.8678ms) to execute

@jpreese

jpreese commented Apr 24, 2019

I personally did not have the DNS issue, only the cert one, so I can't reproduce that behavior to test.

I'd recommend trying some of the proposed solutions here on GitHub (if you haven't already) that address DNS issues; here is one example: #1962

(setting DNS to 8.8.8.8, making sure your hosts file references 127.0.0.1, etc.)
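
For reference, those two checks sketched in PowerShell (8.8.8.8 is just the server suggested in #1962; the hosts-file line is the stock localhost mapping):

# Verify plain upstream DNS works from the host
PS> Resolve-DnsName -Name github.com -Server 8.8.8.8
# Check the hosts file still maps 127.0.0.1 to localhost
PS> Select-String -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Pattern '127.0.0.1'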

@neiltwist
Author

I've been through most of those before. Did you ever see anything like the below in your certificate issues?

WARNING: 2019/04/24 10:40:24 Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
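
In case it helps anyone hitting the same warning: it means a client inside the VM presented a certificate that etcd's CA does not trust. One possible check, sketched under two assumptions: that the apiserver's etcd client cert uses the kubeadm default name apiserver-etcd-client.crt (a guess; the name may differ in Docker Desktop) and sits alongside the etcd PKI paths shown in the log above, and that the community alpine/openssl image (openssl as its entrypoint) is available:

# Absolute Linux paths in -v refer to the Docker Desktop VM's filesystem,
# so this verifies the client cert against etcd's CA from a throwaway container
PS> docker run --rm -v /run/config/pki:/pki alpine/openssl verify -CAfile /pki/etcd/ca.crt /pki/apiserver-etcd-client.crt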

@jpreese

jpreese commented Apr 24, 2019

Once Kubernetes went green and was no longer stuck in a Starting state, I did not dive into the logs much, as everything seemed to be working.

@neiltwist
Author

Ah OK, my Kubernetes is still not going green.

@neiltwist
Author

And just as I say that, it's gone green. I'm not sure it was adhering to the system proxy settings, so I set them manually. I'm still getting the DNS error (despite having the manual DNS set), but the original certificate error is fixed and it's working fine now.

Thanks for your help!
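
For anyone still chasing the DNS side of this, resolution can also be sanity-checked from inside a container (a generic busybox check, nothing Docker-Desktop-specific):

# Does name resolution work from inside the VM's network?
PS> docker run --rm busybox nslookup github.com
PS> docker run --rm busybox nslookup host.docker.internal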

@sanikolov

sanikolov commented May 7, 2019

Tried all the workarounds here and a couple more that were not mentioned (e.g. used a squid proxy, forced DNS resolution via 8.8.8.8, etc.). Rebooted, restarted, reset to factory defaults over and over again. Nothing worked on my Windows 10 box. Strangely, while performing similar mindless repetitive steps on my work laptop, I got k8s working by some miracle.

C:\> kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jul 6, 2020