
Kube DNS Latency #96

Closed
alok87 opened this issue May 25, 2017 · 52 comments

@alok87

alok87 commented May 25, 2017

We have DNS pods running in our cluster (cluster details below).
The issue is that 2-3 requests out of every 5 have a latency of 5 seconds because of DNS.

root@my-nginx-858393261-m3bnl:/# time curl http://myservice.central:8080/status
{
  "host": "myservice-3af719a-805113283-x35p1",
  "status": "OK"
}

real	0m5.523s
user	0m0.004s
sys	0m0.000s
root@my-nginx-858393261-m3bnl:/# time curl http://myservice.central:8080/status
{
  "host": "myservice-3af719a-805113283-x35p1",
  "status": "OK"
}

real	0m0.013s
user	0m0.000s
sys	0m0.004s

Cluster details: we are running the latest Kubernetes version, 1.6.4, installed using kops. It is a multi-AZ cluster in AWS.

Below are the kube-dns image versions:

  • kubedns: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
  • dnsmasq: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
  • sidecar: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1

Our kube-dns pods are running with the following resource requests:

cpu	:	200m	
memory	:	70Mi

Please let us know what the issue might be and how to fix it.

@thockin
Member

thockin commented May 25, 2017 via email

@alok87
Author

alok87 commented May 25, 2017

@thockin No restarts in the kube-dns pods. We are running 10 t2.large nodes and each node has a DNS pod running. This is a new cluster with very little load on it.

@thockin
Member

thockin commented May 25, 2017 via email

@bowei bowei self-assigned this May 25, 2017
@alok87
Author

alok87 commented May 25, 2017

@bowei We are running the same cluster and DNS versions in another environment, but that one is non-AZ, and there is no such lag there. We also tried hitting the cluster IP directly and it works without lag (in both clusters). The issue is only with DNS resolution in this cluster. Can you help debug the problem?

@SleepyBrett

Running 1.14.2 here and we are seeing wild latency:

[screenshot: kube-dns latency graph]

95% in yellow 99% in blue

This is a cluster of 23 nodes, 16 m4.10xlarge and 6 m4.2xlarge (5 control-plane nodes that just run kube components and etcd). There are 9 kube-dns pods currently running.

@bowei
Member

bowei commented May 25, 2017

@SleepyBrett: are you on AWS as well?
@alok87: non-AZ is non-Amazon, meaning you are seeing this only on AWS?

@SleepyBrett

I am on AWS as well. The graphs above are from a multi-AZ cluster; HOWEVER, we are seeing the same thing on a single-AZ cluster.

@bowei
Member

bowei commented May 25, 2017

AZ = Azure?

@SleepyBrett

SleepyBrett commented May 25, 2017

Availability zone. I'm on Slack if you want to chat.

@SleepyBrett

Is this maybe related to #76?

@alok87
Author

alok87 commented May 27, 2017

@SleepyBrett I meant to say not highly available, as that cluster is in Singapore with only two zones (AWS).

@alok87
Author

alok87 commented Jun 1, 2017

@bowei @SleepyBrett Observation: this issue happens in the morning, when the staging cluster gets a lot of load as developers create lots of services, deployments and pods in the k8s cluster; that is when the DNS latency kicks in. But at night, when those resources get deleted, it is OK again. Is the latency because it has to scan a lot of record entries in the morning?

@bowei
Member

bowei commented Jun 1, 2017

What is the number of services, etc.? You can also try setting the logging level to --v=0 for the kube-dns container in the kube-dns pod, as that is impacted by the number of services as well.

@alok87
Author

alok87 commented Jun 5, 2017

@bowei It is happening in our prod cluster also now...
We expose all our product pods using services; these are those services...

@alok87 alok87 closed this as completed Jun 6, 2017
@alok87 alok87 reopened this Jun 6, 2017
@alok87
Author

alok87 commented Jun 6, 2017

@bowei

$ export num=0 && while true; do curl -s -o /dev/null -w "$num = %{time_namelookup}" "http://service.namespace"; echo ""; num=$((num+1)); done
0 = 0.000000
1 = 0.000000
2 = 0.000000
3 = 0.000000
4 = 0.000000
5 = 0.000000
.
.
.
.
60 = 5.223300

After a few hundred requests, one request shows a huge latency of around 5 seconds.

@alok87
Author

alok87 commented Jun 7, 2017

@bowei @thockin Skipping the Kubernetes DNS service completely resolved our issue. We changed resolv.conf in our container to the content below, which skips the kube-dns service entirely and uses the AWS managed DNS service, which resolves our service without any issue.

nameserver 10.0.0.2

instead of

nameserver 100.64.0.10
search namespace-latest.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

So there are two things:

  1. Since for our internal domain resolution we want to use the AWS managed nameserver 10.0.0.2 directly, we did not want the kube-dns service to first search by appending all the search paths to our requests. We think this was the major reason for the latency: as the number of search paths increased, there were a lot of unnecessary lookups (most likely). Can this be done without having to rewrite resolv.conf just after container boot? (See the quick check sketched after the tcpdump output below.)
  2. We also ran tcpdump against the DNS service at 100.64.0.10, and below is the tcpdump log just before the 5-second timeout happened. It is always 5 seconds because that is the default timeout in resolv.conf; the delay matched whatever timeout value was in resolv.conf.
    Also, I am not 100% sure how it is a timeout, as we did get a response every time, just with a delay of 5 seconds in those cases.

request

curl http://www.myorg.com/something

tcpdump output of the kube-dns nameserver

16:39:56.294308 IP kube-dns.kube-system.svc.cluster.local.domain > my-nginx-858393261-9hhl3.37108: 45889 1/0/0 CNAME web-private.org.com. (68)
16:39:56.320395 IP my-nginx-858393261-9hhl3.34649 > kube-dns.kube-system.svc.cluster.local.domain: 17803+ A? www.org.com.central.svc.cluster.local. (58)
16:39:56.320436 IP my-nginx-858393261-9hhl3.34649 > kube-dns.kube-system.svc.cluster.local.domain: 26349+ AAAA? www.org.com.central.svc.cluster.local. (58)
16:39:56.320921 IP kube-dns.kube-system.svc.cluster.local.domain > my-nginx-858393261-9hhl3.34649: 26349 NXDomain 0/1/0 (151)
16:39:56.320923 IP kube-dns.kube-system.svc.cluster.local.domain > my-nginx-858393261-9hhl3.34649: 17803 NXDomain 0/1/0 (151)
16:39:56.320996 IP my-nginx-858393261-9hhl3.59543 > kube-dns.kube-system.svc.cluster.local.domain: 36987+ A? www.org.com.svc.cluster.local. (50)
16:39:56.321036 IP my-nginx-858393261-9hhl3.59543 > kube-dns.kube-system.svc.cluster.local.domain: 57616+ AAAA? www.org.com.svc.cluster.local. (50)
16:39:56.321444 IP kube-dns.kube-system.svc.cluster.local.domain > my-nginx-858393261-9hhl3.59543: 57616 NXDomain 0/1/0 (143)
16:39:56.321446 IP kube-dns.kube-system.svc.cluster.local.domain > my-nginx-858393261-9hhl3.59543: 36987 NXDomain 0/1/0 (143)
16:39:56.321495 IP my-nginx-858393261-9hhl3.54097 > kube-dns.kube-system.svc.cluster.local.domain: 5191+ A? www.org.com.cluster.local. (46)
16:39:56.321608 IP my-nginx-858393261-9hhl3.54097 > kube-dns.kube-system.svc.cluster.local.domain: 17282+ AAAA? www.org.com.cluster.local. (46)
16:39:56.323062 IP kube-dns.kube-system.svc.cluster.local.domain > my-nginx-858393261-9hhl3.54097: 17282 NXDomain 0/1/0 (139)
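
One cheap way to test the search-path theory from point 1 is to query the external name as a fully qualified domain (with a trailing dot), which tells the resolver to skip the search list entirely. A sketch, reusing the hostname from above (www.myorg.com and the -w fields are just for illustration):

# Fully qualified (trailing dot): no namespace/svc.cluster.local suffixes should be tried
time curl -s -o /dev/null -w "namelookup: %{time_namelookup}\n" "http://www.myorg.com./something"

# Unqualified: with ndots:5 every search-path suffix is tried first, as in the tcpdump above
time curl -s -o /dev/null -w "namelookup: %{time_namelookup}\n" "http://www.myorg.com/something"

If the trailing-dot form is consistently fast while the unqualified form occasionally stalls, the search-path expansion is at least part of the problem.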

@andrewgdavis

andrewgdavis commented Jun 7, 2017

I have been looking into this also.

This is from an alpine:3.6 container (after installing curl with apk add --update curl):

/ # time curl -6 -w "@curl-format.txt" -o /dev/null -s http://google.com
            time_namelookup:  0.000000
               time_connect:  0.000000
            time_appconnect:  0.000000
           time_pretransfer:  0.000000
              time_redirect:  0.000000
         time_starttransfer:  0.000000
                              ----------
                 time_total:  0.000000
Command exited with non-zero status 6
real	0m 5.01s
user	0m 0.00s
sys	0m 0.00s
/ # time curl -4 -w "@curl-format.txt" -o /dev/null -s http://google.com
            time_namelookup:  0.011554
               time_connect:  0.099152
            time_appconnect:  0.000000
           time_pretransfer:  0.099260
              time_redirect:  0.000000
         time_starttransfer:  0.197497
                              ----------
                 time_total:  0.197553
real	0m 0.20s
user	0m 0.00s
sys	0m 0.00s
/ # time curl -w "@curl-format.txt" -o /dev/null -s http://google.com
            time_namelookup:  5.008615
               time_connect:  5.096003
            time_appconnect:  0.000000
           time_pretransfer:  5.096111
              time_redirect:  0.000000
         time_starttransfer:  5.197315
                              ----------
                 time_total:  5.197362
real	0m 5.20s
user	0m 0.00s
sys	0m 0.00s

So the [AAAA?] (IPv6) lookup seems to be causing the latency.
I am still looking into a configuration / workaround that does not involve removing kube-dns/dnsmasq.

edit: for reference, the @curl-format.txt comes from: https://blog.josephscott.org/2011/10/14/timing-details-with-curl/
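
For anyone reproducing this, the curl-format.txt referenced above is approximately the following (reconstructed from the fields in the output shown; the exact whitespace is cosmetic):

    time_namelookup:  %{time_namelookup}\n
       time_connect:  %{time_connect}\n
    time_appconnect:  %{time_appconnect}\n
   time_pretransfer:  %{time_pretransfer}\n
      time_redirect:  %{time_redirect}\n
 time_starttransfer:  %{time_starttransfer}\n
                      ----------\n
         time_total:  %{time_total}\n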

@bowei
Member

bowei commented Jun 7, 2017

@andrewgdavis : Is it possible to post (in a gist) the output of tcpdump -n -vvv -i any port 53 (e.g. the DNS requests) from the pod?

(For each of the executions of curl)

@andrewgdavis

andrewgdavis commented Jun 8, 2017

Digging a bit further, this behavior does not happen with a busybox container:

kubectl run bb -it --image=busybox --restart=Never sh
/ # time wget google.com
Connecting to google.com (172.217.2.46:80)
Connecting to www.google.com (172.217.6.36:80)
index.html           100% |********************************| 11226   0:00:00 ETA
real	0m 0.32s
user	0m 0.00s
sys	0m 0.00s

Tried with alpine:3.6 and 3.5, and also added an /etc/nsswitch.conf setting of
echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' > /etc/nsswitch.conf
but that didn't seem to help...

@bowei
Member

bowei commented Jun 8, 2017

Am I reading your test right in that you are making it return NXDOMAIN if mdns does not respond with a name? What happens if you remove mdns4_minimal and mdns4 altogether?

@andrewgdavis

By default there isn't anything in /etc/nsswitch.conf in the alpine:{3.6,3.5} containers.

... is there some other configuration that you want me to try?

@andrewgdavis

Narrowing this down a bit more... by modifying the configmap from:

 data:
   upstreamNameservers: |
-    ["10.0.2.3", "8.8.8.8", "8.8.4.4"]
+    ["${corp.internal.nameserver.ip}"]

(where the 10.0.2.3 nameserver was from a coreos vagrant setup)

to a singular corporate DNS nameserver, alpine works fine. Still investigating, but it seems like at least a valid workaround...

kubectl run alpine36 -it --image=alpine:3.6 --restart=Never sh
/ # time curl -w "@curl-format.txt" -o /dev/null -s http://google.com
            time_namelookup:  0.001284
               time_connect:  0.087770
            time_appconnect:  0.000000
           time_pretransfer:  0.087881
              time_redirect:  0.000000
         time_starttransfer:  0.188864
                              ----------
                 time_total:  0.188917
real	0m 0.19s
user	0m 0.00s
sys	0m 0.00s

@bowei
Member

bowei commented Jun 8, 2017

Does 10.0.2.3 have different entries than the public DNS servers?

All servers in the upstreamNameservers list must return the same responses as dnsmasq will randomly pick one to query, otherwise you will get inconsistent DNS resolution.

From http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html

By default, dnsmasq will send queries to any of the upstream servers it knows about and tries to favour servers that are known to be up. 

@alok87
Author

alok87 commented Jun 8, 2017

@andrewgdavis In which config map are you defining these nameservers? I also want to use a corporate nameserver for a particular domain (say www.example.com), and for the rest (say myservice.svc.cluster.local) I want to keep using the kube-dns nameserver. How do we do that?
Currently we have disabled the kube-dns nameserver by changing resolv.conf.

@bowei
Member

bowei commented Jun 8, 2017

http://blog.kubernetes.io/2017/04/configuring-private-dns-zones-upstream-nameservers-kubernetes.html

stubDomains: {
   "example.com": "1.2.3.4"
}
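
For completeness, a full ConfigMap in the format described in that post might look like the sketch below. The domain and addresses (example.com, 1.2.3.4, 10.0.0.2) are placeholders for your private zone, corporate resolver and AWS resolver; note that the documented stubDomains format takes a list of server IPs per domain:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.com": ["1.2.3.4"]}
  upstreamNameservers: |
    ["10.0.0.2"]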

@alok87
Author

alok87 commented Jun 8, 2017

@bowei The DNS lookup issue came back... even after skipping kube-dns completely by changing resolv.conf to point to our AWS managed DNS service (10.0.0.2). But the number of occurrences has reduced.

@bowei
Member

bowei commented Jun 8, 2017

If you are not using kube-dns, then it seems like there is an issue with your 10.0.0.2 DNS service. What DNS service/server are you using?

@alok87
Author

alok87 commented Jun 8, 2017

@bowei The numbers have reduced significantly, I think because the number of lookups has dropped now that there is no search happening for namespace-latest.svc.cluster.local svc.cluster.local cluster.local. 10.0.0.2 is the default AWS managed DNS nameserver.

@bboreham
Contributor

bboreham commented Jun 8, 2017

From the tcpdump trace shown, it could also be caused by intermittent packet loss at the network level.

@alok87
Author

alok87 commented Jun 8, 2017

@bboreham @bowei I have configured custom DNS forwarding from the DNS cache for our internal domain. The dnsmasq pod has restarted and picked up the new config map. How can I verify that requests for my domain are being forwarded directly from dnsmasq to our custom DNS?

@SleepyBrett

My current test harness may be flawed, and I'd love some input on that point, but I've run a few tests as follows:

I've set up a special kube-dns pod that isn't in the service, with an extra sidecar container with dnsperf in it. I've added the arg --log-queries=extra to the dnsmasq-nanny, which gives me logging from dnsmasq about when queries were made and when replies were sent with sub-second accuracy (and here I have some doubts: am I seeing artifacts of logging hitching?).

For completeness here are the args I'm passing to the masq nanny:

- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --log-facility=-
- --log-queries=extra
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/ip6.arpa/127.0.0.1#10053

Then I was running the following dnsperf command in the pod: dnsperf -s 127.0.0.1 -p 53 -d /mnt/hn.short -c 10 -n 10000 -Q 2000

hn.short contains only the following:

google.com A
amazon.com A
jira.nordstrom.net A
git.nordstrom.net A
github.com A
youtube.com A
cnn.com A
nordstrom.com A
confluence.nordstrom.net A
current.platform.prod.aws.cloud.nordstrom.net A
current.platform.r53.nordstrom.net A
imdb.com A
blizzard.com A
nbc.com A
newsblur.com A

And therefore it shouldn't ever cause the cache to overflow (if I understand the functionality of dnsmasq).

In my thinking this means that, aside from TTL expiry, after the first round of queries the only communication should be to and from the dnsmasq cache.

I then wrote a little perl script that takes each query's "start time" and the time of its first reply and calls that the query's "response time". The script outputs this response time, the time the query was received by dnsmasq, and its correlation id (a rough equivalent is sketched at the end of this comment).

Once that's sent through gnu plot I get plots like this:
[screenshot: plot of query response times]

Very localized spikes of latency (the x axis is seconds of the timestamp at which the query started; the y axis is response time in milliseconds).
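
For reference, a rough Python equivalent of that pairing script might look like the sketch below. The log line format is an assumption (a serial number and client address after the dnsmasq prefix, with timestamps prepended by e.g. kubectl logs --timestamps); adjust the regex to whatever your dnsmasq actually emits.

#!/usr/bin/env python3
# Sketch: pair each dnsmasq query line with its first reply/cached answer and
# print the response time. Assumed input lines look roughly like:
#   2017-06-08T15:12:47.123456789Z dnsmasq[1]: 1234 127.0.0.1/40000 query[A] google.com from 127.0.0.1
#   2017-06-08T15:12:47.125456789Z dnsmasq[1]: 1234 127.0.0.1/40000 reply google.com is 172.217.2.46
import re
import sys
from datetime import datetime

LINE = re.compile(
    r"^(?P<ts>\S+)\s+dnsmasq(?:\[\d+\])?:\s+(?P<serial>\d+)\s+\S+\s+(?P<what>query|reply|cached)"
)

def parse_ts(raw: str) -> datetime:
    # RFC3339Nano -> datetime (pad/truncate the fraction to microseconds).
    raw = raw.rstrip("Z")
    head, _, frac = raw.partition(".")
    return datetime.fromisoformat(f"{head}.{(frac + '000000')[:6]}")

starts = {}
for line in sys.stdin:
    m = LINE.match(line)
    if not m:
        continue
    serial, what, ts = m["serial"], m["what"], parse_ts(m["ts"])
    if what == "query":
        starts.setdefault(serial, ts)      # time the query was received by dnsmasq
    elif serial in starts:                 # first reply/cached answer wins
        start = starts.pop(serial)
        ms = (ts - start).total_seconds() * 1000.0
        # columns: response time (ms), query start time, correlation id
        print(f"{ms:.3f}\t{start.isoformat()}\t{serial}")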

@andrewgdavis

Sorry if there are conflated issues here: latency can be caused by many different factors. My particular issue seems to point to a problem with alpine containers in conjunction with virtualbox networking when resolving IPv6 traffic:
https://bugs.alpinelinux.org/issues/6221

Workarounds:
a) use busybox instead of alpine
b) modify the upstream nameservers to be compliant with whatever alpine wants (unfortunately I don't know the explicit details; I just got lucky by modifying the kube-dns configmap to use a singular nameserver)

@bowei
Member

bowei commented Jun 8, 2017

@alok87 -- you can check the logs for dnsmasq-nanny; it will log when an update has been applied. The config map update interval is around ~30s by default, I think.
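
A quick way to check that, assuming the stock kube-dns deployment labels and container names (the pod name placeholder is yours to fill in):

# find the kube-dns pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

# follow the dnsmasq/dnsmasq-nanny container logs and watch for the new
# --server flags being applied after the ConfigMap change
kubectl logs -n kube-system <kube-dns-pod-name> -c dnsmasq -f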

@alok87
Author

alok87 commented Jun 9, 2017

@bowei Yep, it's showing the server flag for my domain.
Will dnsmasq cache requests for my private domain (example.com.)? Because I still get the latency even after using the custom DNS.

@bowei
Member

bowei commented Jun 9, 2017

As alluded to above, if you are still seeing delays even when you skip kube-dns, then I suspect an issue with your DNS server itself, and I would investigate that leg of the DNS query path.

@alok87
Author

alok87 commented Jun 9, 2017

@bowei Yep, the issue is with that DNS, but I wanted to cache requests in the dnsmasq we already have in place to reduce the number of requests to the AWS managed DNS. Does dnsmasq cache custom domains if I use the private DNS setup you mentioned?

@bowei
Member

bowei commented Jun 9, 2017

Yes, it will be cached with the TTL given by the server. Is it OK to close this issue, as it is not related to k8s itself?

@alok87 alok87 closed this as completed Jun 9, 2017
@alok87
Author

alok87 commented Jun 12, 2017

@bowei We have solved this issue in our production cluster. We did the following:

  • Completely bypassed kube-dns and used the AWS managed nameserver (10.0.0.2) in /etc/resolv.conf:
nameserver 10.0.0.2
  • Even after switching to the AWS managed nameserver our timeouts still happened, but the number was reduced. We figured out that AWS was rate limiting us, as we were doing a huge number of lookups for our logging.example.com, which was being looked up 2-3 times every few seconds. Reducing this unnecessary lookup fixed our issue. But we still face the issue if we use kube-dns (with a private stubDomain for our example.com). There does seem to be some performance bottleneck with kube-dns. Should I reopen this issue?

@jsravn

jsravn commented Jun 12, 2017

Might not be the same issue, but we have noticed systemic kube-dns problems since moving from 1.4 to 1.6, also in AWS: sporadic kube-dns failures. It also happens consistently (but somewhat rarely) when kube-dns is deployed. We only use kube-dns for internal resolution; everything else gets directed to our own DNS servers, which have never had a problem. When I have some time I'll try to reproduce it.

@alok87
Author

alok87 commented Jun 13, 2017

@bowei I ran the perf test on our prod cluster and queried example.com A. Here is the result: https://gist.github.com/alok87/963f59011bbcd9256287c30294a1a8fd. Please take a look.

@bowei
Member

bowei commented Jun 13, 2017

I suspect you probably exceeded some QPS bounds (the tool reports 40k/s) resulting in dropped packets...

@alok87
Author

alok87 commented Jun 13, 2017

@bowei What is the QPS supported by kube-dns at present?

@bowei
Member

bowei commented Jun 14, 2017

That depends on how many replicas you have; it should be scaled to match your QPS. If you are running dnsperf to max out the system, it will keep increasing the number of packets sent until there is loss...

There is also the issue of running out of conntrack entries for UDP.
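
If you suspect the conntrack table, a quick check on the node (paths assume a reasonably recent kernel with the nf_conntrack module loaded):

# current vs. maximum tracked connections on the node
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# rough count of tracked DNS (UDP/53) flows, if the kernel exposes this file
grep -c 'dport=53' /proc/net/nf_conntrack

If the count sits near the max, DNS packets (UDP) are a likely casualty of dropped conntrack entries.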

@Miyurz

Miyurz commented Sep 8, 2017

I suspect this has bitten us too now.

Every morning, one of our microservices that renders images from Cloudinary times out resolving DNS via kube-dns. I scaled the kube-dns pods up from 2 to 9, but in vain. This was even though we had the DNS autoscaler pod from the addon manager.

Error fetching from URL http://res.cloudinary.com/xyz.com//w_136,h_102,c_fill,q_75/c01a0e1c-ed13-4f0c-aeae-19e3f77c146c.webp error { Error: getaddrinfo EAI_AGAIN res.cloudinary.com:80
    at Object.exports._errnoException (util.js:1022:11)
    at errnoException (dns.js:33:15)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:76:26)
  code: 'EAI_AGAIN',
  errno: 'EAI_AGAIN',

So I tried injecting Google's DNS servers into /etc/resolv.conf in my containers, but they were overwritten with the cluster DNS, 100.64.0.10. I don't know what overwrites that.

@alok87 How did you bypass kube-dns to use your own DNS?

Not to forget, my cluster is on k8s 1.5.7.

@bowei
Member

bowei commented Sep 8, 2017

You can set the pod dnsPolicy to Default instead of ClusterFirst to disable the use of kube-dns. What QPS are you seeing for DNS queries? What platform are you on?
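
For example, a minimal pod spec sketch (the pod name and image are placeholders; dnsPolicy: Default makes the pod inherit the node's /etc/resolv.conf instead of pointing at the cluster DNS service):

apiVersion: v1
kind: Pod
metadata:
  name: dns-default-example
spec:
  dnsPolicy: Default      # use the node's resolver config, bypassing kube-dns
  containers:
  - name: app
    image: alpine:3.6
    command: ["sleep", "3600"]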

@alok87
Author

alok87 commented Sep 12, 2017

@Miyurz Our docker-entrypoint.sh requires kube-dns, hence we bypass it by changing resolv.conf only when required. But the better way is what @bowei suggested above, if you want to bypass it completely.

# Point the container at the cluster DNS service (kube-dns).
set_kube_dns () {
  echo "nameserver $KUBE_DNS_SERVICE_IP" > ~/resolv.conf
  echo "search svc.cluster.local" >> ~/resolv.conf
  cp -f ~/resolv.conf /etc/resolv.conf
  rm -f ~/resolv.conf
}

# Point the container at the AWS managed (VPC) resolver, bypassing kube-dns.
set_default_dns () {
  echo "nameserver 10.0.0.2" > ~/resolv.conf
  cp -f ~/resolv.conf /etc/resolv.conf
  rm -f ~/resolv.conf
}

@bowei I hope we can fix this problem permanently...

@hanikesn
Copy link

We're having similar problems in our cluster. Occasionally DNS requests time out for clients. We already scaled the kube-dns deployment up to about 8 pods manually, so we have about 20% CPU load per pod, shared between kube-dns and dnsmasq. We're also not hitting nf_conntrack limits, and we're using Calico in AWS with a single-AZ setup.

@jsravn

jsravn commented Nov 22, 2017

Similar to @hanikesn, I'm also observing sporadic DNS failures against kube-dns. Even using dnsmasq to mitigate it, kube-dns requests will inexplicably time out over a period of seconds. External DNS requests to our own DNS servers are all fine. I haven't dug into it yet, but there seems to be some sort of systemic issue based on other people's experiences.

@markine

markine commented Nov 22, 2017

Cross posting as this all seems related: kubernetes/kubernetes#45976 (comment)

@bowei
Member

bowei commented Nov 27, 2017

What are your environments (platform, kernel, k8s version)? Can you open a new bug, as this one is closed?

@krmayankk

/area dns
