One thing to consider if you're packing in A records but want to fit in 512 bytes is that there may be an intermediary server doing DNS64, turning your A records into AAAA records at a much larger size. It's been a while since I looked at that, but I was encouraged to return no more than 8 A records, so that T-Mobile could return 8 AAAA records and not go over the 512-byte length where things tend to get dicey.
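A rough sketch of the arithmetic, assuming a plain response with no EDNS0 and full name compression (every answer owner name is a 2-byte pointer); the query name is a made-up placeholder. It only shows how much of the 512-byte budget 8 A versus 8 AAAA answers consume; real responses with authority/additional records or longer names will be bigger.

    def response_size(qname: str, n_answers: int, rdata_len: int) -> int:
        header = 12                              # fixed DNS header
        question = (len(qname) + 2) + 2 + 2      # wire-form QNAME + QTYPE + QCLASS
        # per answer: 2-byte name pointer + TYPE + CLASS + TTL + RDLENGTH + RDATA
        return header + question + n_answers * (2 + 2 + 2 + 4 + 2 + rdata_len)

    qname = "api.example.com"                    # hypothetical name
    print(response_size(qname, 8, 4))            # 8 A records    -> 161 bytes
    print(response_size(qname, 8, 16))           # 8 AAAA records -> 257 bytes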
A fun bonus detail: the libc used by many Docker containers (musl) doesn't implement TCP DNS at all, and just returns the truncated result from the UDP response as if it were the whole answer.
Probably. Thankfully RFC 9210[0] now makes the musl implementation clearly in violation of the spec, so their reasoning for just truncating replies is no longer relevant.
It’s used by malware, and is detectable via the high-entropy nature of the traffic (most domains don’t have thousands of TXT records containing base64). Recently I sent a 1 MB file over DNS, which terminated abruptly; I suspect DNS resolvers blacklisted the domain.
It also multiplies bandwidth by at least 2-3x, I think, so I’d call it bad etiquette to use unless you have a very, very good reason to.
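A back-of-the-envelope estimate of that multiplier, assuming base64 payloads carried one chunk per query/response round trip; the per-packet sizes are guesses for illustration, not measurements from the setup above.

    payload = 1_000_000              # ~1 MB file, as in the anecdote above
    chunk = 255                      # one TXT character-string maxes out at 255 bytes
    b64_chunks = payload * 4 / 3 / chunk     # base64 inflates by 4/3
    per_round_trip = 80 + 350        # assumed query size + assumed response size
                                     # (headers, tunnel domain, base64 chunk)
    total = b64_chunks * per_round_trip
    print(f"{total / payload:.1f}x the original size on the wire")   # ~2.2x here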
I used to use ICMP Echo tunnelling, and had really fast internet compared to the rest of the 50-person office, which had a heavily throttled link except for ICMP.
For requests, I had to investigate a really long domain one time, and both glibc and curl had limits lower than the protocol maximum of 255 octets unless you recompiled them. To this day I have no idea why.
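For reference, the protocol-level limits from RFC 1035 are easy to check by hand: each label is at most 63 octets and the whole encoded name at most 255 octets. A minimal validator, separate from whatever compile-time caps glibc or curl add on top:

    def valid_wire_length(name: str) -> bool:
        labels = name.rstrip(".").split(".")
        if any(len(label) == 0 or len(label) > 63 for label in labels):
            return False                                       # empty or oversized label
        wire = sum(len(label) + 1 for label in labels) + 1     # length byte per label + root byte
        return wire <= 255

    print(valid_wire_length("a" * 63 + ".example.com"))   # True
    print(valid_wire_length("a" * 64 + ".example.com"))   # False: label too long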
Why EDNS0
Because DNSSEC
Why DNSSEC
Because cache poisoning
Why cache poisoning
Because remote, shared caches
Why remote, shared caches
Was the idea for remote, shared caches suggested in RFC 1035?
RFC 1035 suggests that the resolver and the cache are part of the user's operating system.
"User queries will typically be operating system calls, and the resolver and its cache will be part of the host operating system."
The expectation is that the cache is local, not remote.
"Resolvers answer user queries with information they acquire via queries to foreign name servers and the local cache."
Mockapetris' DNS does not direct anyone to make use of remote, shared caches. Their use is entirely optional.
"Foreign nameservers" refers to authoritative servers, not third party caches.
DNS can work without using third party DNS service. I have been using it this way for 14 years. Remote, shared caches are optional. As is making remote DNS queries contemporaneous with HTTP requests, generally. For myself, the DNS data I need can be collected in bulk and stored.
IMO, a general historical principle that applies to computer software is that "features" can carry risks. In a non-DNS context, Apple's latest "Lockdown Mode" has now provided a popular contemporary illustration. Not every "feature" makes sense for every user in every situation. EDNS0 is an optional feature. Some computer owners may choose not to use it.
There are risks in DNS "extensions".^1 It is for users to decide whether they wish to undertake those risks by using them. Personally, I still adhere to original DNS packet sizes on the networks I control. The now relatively small 512 byte packets are one of the things I like most about DNS. I do not enable EDNS0. I do not send ECS.
NB. I am not against extensions, per se. RFC 1035 contemplated them. I remain keen to learn what the "Z" bits will eventually be used for. However, using remote caches "safely" and facilitating someone else's DNS-based load balancing are not "features" that interest me.
1. EDNS0/DNSSEC becoming a DoS vector and user privacy leakage via ECS.
Notes
EDNS0 may have spawned further extensions with different impetuses, EDNS Client Subnet (ECS) being an obvious example. Here I refer to what appears to be the initial impetus for EDNS0.
ECS has been the source of problems, e.g., with user privacy.^2 Its impetus was to help CDNs, not users.^3 It may or may not also be used to further the commercially useful data collection (read: advertising-related services) of those who provide third party DNS service.
Remote, shared caches are sometimes referred to as "open resolvers". Problems have been associated with open resolvers.
2. User privacy against the third party and its commercial partners, not against the operator of the authoritative nameserver.
3. Whether it is "mandatory" (cf. optional) for CDNs is debatable. Cloudflare, a large CDN and third party DNS provider, has said they do not send it. Unless the RFCs have changed, IETF recommends that it be disabled unless it provides a clear benefit to clients.
These extensions aren’t solely intended for “greedy CDN corporations”.
Anyone who has set up multi-site load balancers knows that telcos aggregating DNS via just a couple of egress IP addresses makes load balancing too coarse. Similarly, session stickiness may not work.
These extensions largely fix this issue.
Having said all this, anycast routing is arguably superior for solving this problem, but nonetheless it’s a real problem that needs some solution.
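For the curious, this is roughly what the ECS option looks like on the wire per RFC 7871: a small payload, carried inside the EDNS0 OPT record, that tells the authoritative side which client subnet the query came from. A sketch using only the Python standard library; the /24 prefix is an arbitrary example, and a real resolver would still have to wrap this in an OPT RR in the additional section.

    import ipaddress
    import struct

    def ecs_option(subnet: str) -> bytes:
        net = ipaddress.ip_network(subnet)
        family = 1 if net.version == 4 else 2                        # 1 = IPv4, 2 = IPv6
        addr = net.network_address.packed[:(net.prefixlen + 7) // 8] # address truncated to prefix
        body = struct.pack("!HBB", family, net.prefixlen, 0) + addr  # scope prefix length 0 in queries
        return struct.pack("!HH", 8, len(body)) + body               # OPTION-CODE 8 = ECS

    print(ecs_option("198.51.100.0/24").hex())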
RFC 1035 did in one part suggest a model in which some bigger, more powerful machines might run a recursive resolver as their local resolver, while other machines would probably use a stub resolver that forwarded queries to a recursive resolver run as a service on the local network.
This had benefits, like the local network's recursive resolver providing a centralized cache that gets more useful the more machines share it.
Implementation-wise, most implementations are either stub resolvers or full-blown DNS servers capable of serving up arbitrary sets of domains authoritatively. There are fewer implementations that do recursive lookup without a full authoritative server implementation. Obviously including a full-capability DNS server in every OS install is absurd, so they come with stub resolvers instead.
And originally, when the Internet was just big organizations connecting their networks, each network running a recursive resolver for its other machines made sense and worked fine. But along came home internet, where originally just one machine running Windows was getting connected. ISPs could perhaps have required users to run their own recursive resolvers, but this would have been painful and inefficient. (Keep in mind that a recursive resolver's cache is more efficient the more machines it is providing services to.) So the ISPs ended up running recursive resolvers for customers.
But now, since the ISPs' customers don't all trust each other, concerns like cache poisoning become possible, which were not much of an issue when you ran and trusted your own network's recursive resolver.
You forgot "ISPs' customers don't trust the ISP", because the ISP is selling their DNS info (what's being resolved), returning false ad-filled spam instead of an NXDOMAIN...
Mostly, FAANGs use load balancers and don't return more than one A/AAAA record. Sometimes there's some CNAME chaining, which isn't ideal but might be expedient. Otherwise, not too much to optimize.
On the other hand, they like their really short TTLs, which results in a lot of queries.
Yeah, that's the idea, but then there's stuff like this:
forums.adobe.com. 5 IN CNAME forum-redirects.trafficmanager.net.
forum-redirects.trafficmanager.net. 60 IN CNAME encoderfuncus.azurewebsites.net.
encoderfuncus.azurewebsites.net. 30 IN CNAME waws-prod-mwh-007.sip.azurewebsites.windows.net.
waws-prod-mwh-007.sip.azurewebsites.windows.net. 1800 IN CNAME waws-prod-mwh-007.cloudapp.net.
waws-prod-mwh-007.cloudapp.net. 60 IN A 52.175.254.10
On a totally cold lookup, that takes at least 13 queries to resolve, because of the long CNAME chain, and azurewebsites.windows.net is sub-delegated with NS records (with a 5-minute TTL). That's a lot of back and forth to chase, and then much of it needs to be redone if someone comes back soon.
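If you want to see a chain like that from the client side, a short sketch with dnspython (assumed installed) prints every RRset the recursive resolver hands back for the final A lookup, TTLs included. The 13 back-and-forth queries happen inside the resolver; the stub only sees the assembled answer, and the exact hops depend on your resolver and whatever Adobe/Azure currently publish.

    import dns.resolver

    # The answer section normally contains the whole CNAME chain plus the
    # final A record, each RRset with its own TTL.
    answer = dns.resolver.resolve("forums.adobe.com", "A")
    for rrset in answer.response.answer:
        print(rrset.to_text())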
"Now just about every website on this here internet will tell you that the DNS uses UDP port 53, and that any response must fit into a single 512 byte UDP packet, and of course that answer is right. Except when it isn't. But let's start with that assumption and see how much data we can then fit into a single 512 byte response:"
Worked for a datacenter once where the Check Point firewall security expert had blocked outbound TCP port 53 from a major customer because "DNS will just use UDP port 53 instead and it's more secure with TCP blocked".
The customer complained that outbound email wasn't being reliably delivered to domains that had long MX lists - e.g. Yahoo, Google etc. Their sendmail queues were long and unwieldy. It didn't take me long to see that the MX list was too big to fit in a 512-byte UDP response packet and was being truncated.
Took a while to convince the stubborn security engineer that this was the problem and get him to unblock TCP port 53 - after which, mail started flowing smoothly.
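A quick way to spot that failure mode today: send the MX query over plain UDP without EDNS0 (so the old 512-byte cap applies), check the TC bit, and retry over TCP. A sketch with dnspython; the resolver address and domain are placeholders, not the customer's setup.

    import dns.flags
    import dns.message
    import dns.query

    q = dns.message.make_query("example.com", "MX")     # no EDNS0 by default
    udp_reply = dns.query.udp(q, "8.8.8.8", timeout=3)
    if udp_reply.flags & dns.flags.TC:
        print("truncated over UDP, retrying over TCP")
        tcp_reply = dns.query.tcp(q, "8.8.8.8", timeout=3)
        print(tcp_reply.answer)
    else:
        print(udp_reply.answer)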