Ask HN: why are domain names reversely ordered?
25 points by riobard on Oct 17, 2009 | 45 comments
Why is it www.google.com instead of com.google.www? I tried searching for a good explanation but found nothing helpful. Are there any solid reasons for the arrangement, or is it just a random choice?

[EDIT]: as bajsejohannes points out, the major problem of the current arrangement is that it differs from the order of the path component, as in

    www.google.com/path/to/the/file

Given that, it really makes more sense to say

    com.google.www/path/to/the/file



I think it comes down to history. Host names existed before domain names. When domains were bolted on they used the idea of a default domain for each host and that made sense to be on the end.

Consider:

  telnet hosta          # established way
  telnet hosta.abc      # domain bolted on back
  telnet abc.hosta      # domain bolted on front
Since people knew the host names and were used to dealing with them, the suffix was more natural since it kept the domain cruft out at the edge.


This seems pretty much right based on the historical data.

It doesn't seem like ordering of domain name parts was given much thought. RFC 882, which first defined the domain name space, said only this:

By convention, the labels that compose a domain name are read left to right, from the most specific (lowest) to the least specific (highest).

Additionally, it seems that when the first TLD, .arpa, was defined, some people hacked in support for it by simply appending .arpa to all the host names. Appending, of course, is much easier than prepending when you are programming in C. Note the following from RFC 881, which described the transition to using domain names, at a time when the domain name mapping was all stored in HOSTS.TXT files:

So far, no new domains have been introduced. Only a table with all the entries having official names in the ARPA domain has been provided. This should allow programs to be constructed to deal with domain style names in a general way without any special hacks to add or delete the string ".ARPA" to or from host names.


(Just to clarify...) I know what you mean, but I don't think your question makes sense without comparing it to something. That something would naturally be the path.

www.google.com makes as much sense as com.google.www

but

www.google.com/root/sub/subsub makes less sense than com.google.www/root/sub/subsub or subsub/sub/root/www.google.com

Since the latter has consistent increasing or decreasing of specificity, while the former has not.

This incidentally mirrors what I don't like about the date format MM/DD/YY.


That's exactly what is on my mind! The mismatch between domain name order and path order has really bothered me for a while, and I just couldn't find a sound reason for it.

As for the date format, AFAIK only English-speaking countries use MM/DD/YY, possibly because it matches the order in plain writing, e.g. October 17, 2009. This looks completely insane to virtually every other part of the world, because logically it's either YY/MM/DD (big to small) or DD/MM/YY (small to big), but NEVER MIXED!


Actually, only the USA does the ridiculous MM/DD/YY. The UK is DD/MM/YY.


Plus, when saying a date in the UK it is usual to say the day first, then the month if it's not clear from context. The US date ordering is one of those things (like non-metric units) that make me feel that they're being purposefully obstructive; kinda like Microsoft, always avoiding other people's agreed standards.


Actually, com.google.www/root/sub/subsub would make the most sense.

The Commercial domain. The Google host/company. The www service/sub-domain. The path.


I don't think that there's anything inherently more sensible about going from least to most specific instead of the opposite, as long as it's consistent.


The added disadvantage of the current scheme is that it allows all kinds of phishing tricks that would have been a lot harder to get away with if the domain elements were in the reverse order.

htp://twitter.com.someidiotdomain.info/enteryourpaswordhere

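would stand out immediately in reversed form, since the registrant's real domain (someidiotdomain.info) would come first rather than being buried in the middle.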

(only one 't' so HN doesn't turn it into a link).
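
To make that concrete, here is a minimal Python sketch. The helper name `reverse_host` is made up for illustration; it only reverses the dot-separated labels and knows nothing about public suffixes or real registrations.

```python
def reverse_host(host: str) -> str:
    """Return the host's labels in most-significant-first order (hypothetical helper)."""
    # Host names are case-insensitive, so lowercase before splitting on dots.
    return ".".join(reversed(host.lower().split(".")))


# The phishing host from above versus the genuine one.
print(reverse_host("twitter.com.someidiotdomain.info"))  # info.someidiotdomain.com.twitter
print(reverse_host("twitter.com"))                        # com.twitter
```

Read left to right, the reversed phishing host starts with info.someidiotdomain, while the genuine one starts with com.twitter.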


Interview with Tim Berners-Lee where he says he regrets this:

http://www.impactlab.com/2006/03/25/interview-with-tim-berne...

Looking at what he accomplished, I forgive him.


If he had done that it would've been massively confusing, given that domain names would still be the current ordering for everything else (e-mail etc.).


So this is a mistake ...


Why? Independent invention of the domain name system, and the URL-path system, at different times by different people.

URL-inventors did not think it important enough to break the already-established convention for domain names to achieve hierarchical ordering consistency. With hindsight, nearly 20 years later, it might seem that it would have been worth the nonstandard novelty, but I still wouldn't be so sure.

Consistency with telnet and email was very important for early URL comprehension and adoption among technical folk. And, the reversal-of-ordering corresponds with an important threshold in URL-resolution, from one system (network-layer and owned-domains in a collaborative framework) to another (a single hostname's internal organization, usually under a unified authority). That signification can be helpful even if it's hard to explain why. (The same goes for seemingly arbitrary, path-dependent choices in natural language grammars that nonetheless dominate logically-designed synthetic languages.)

It's easy enough for specialized applications to adopt a reversed form; junklight mentions the 'SURT' form used by my project, Heritrix, which reorders a URI internally for certain scoping/sorting/policy-decision purposes as:

  http://(com,example,www,)/path/to/the/resource
(joshu's proposed move of the port/protocol to 'between' the host and path also has a strong puff of logic about it. I think Google, in their BigTable URI-keys, tends to reverse domain-segments and put the protocol/port later.)
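
For the curious, a rough Python sketch of that kind of reordering follows. It is not Heritrix's actual code: the function name `to_surt_like` is invented here, and it ignores ports, userinfo and fragments, which a real implementation would have to handle.

```python
from urllib.parse import urlsplit


def to_surt_like(url: str) -> str:
    """Rewrite a URL with its host labels reversed and comma-terminated.

    Simplified illustration only: ports, userinfo and fragments are dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []
    # Most significant label first, each followed by a comma.
    reversed_host = "".join(label + "," for label in reversed(labels))
    rest = parts.path or "/"
    if parts.query:
        rest += "?" + parts.query
    return f"{parts.scheme}://({reversed_host}){rest}"


print(to_surt_like("http://www.example.com/path/to/the/resource"))
# http://(com,example,www,)/path/to/the/resource
```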


It's just a convention, and one that took a while to work itself out. Some older mail servers used to route (and even rewrite -- shudder) domains the other way round. Like little- and big-endian numbers, the merits of either method are less important than having everyone do it the same way.

http://catb.org/jargon/html/B/big-endian.html


I have no idea. UUCP (Usenet) addresses used to be (are?) "backwards" and ! separated. Maybe some historical references related to that would point you to more info.

If I were to guess though, I'd guess email addresses. me@machine being originally valid makes me@tld.machine harder to parse?


They weren't really addresses; it was a series of hosts. It was thus more like a route.


I think everybody who has to write applications that deal with URLs as core identifiers has asked this. It's also hairy because the leading part of a URL (protocol and hostname) is case insensitive but the trailing part (path, query string and fragment) isn't. At my last job I created a framework for normalizing and canonicalizing URLs as well as storing them consistently (with the hostname components reversed); it was a big improvement for retrieval and duplicate detection accuracy and performance.
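
A minimal sketch of the idea in Python, assuming a made-up key layout of reversed host, then scheme, then path (the real framework is not shown here, and `storage_key` is a hypothetical name):

```python
from urllib.parse import urlsplit


def storage_key(url: str) -> str:
    """Build a canonical lookup key: case-folded scheme and host, host labels reversed.

    The path and query keep their original case because they are case-sensitive.
    The fragment is dropped (an assumption of this sketch).
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    reversed_host = ".".join(reversed(host.split(".")))
    key = f"{reversed_host}:{scheme}{parts.path or '/'}"
    if parts.query:
        key += "?" + parts.query
    return key


# Two spellings of the same URL collapse to one key ...
print(storage_key("HTTP://WWW.Example.COM/Some/Path"))
print(storage_key("http://www.example.com/Some/Path"))
# ... while a path that differs only in case stays distinct.
print(storage_key("http://www.example.com/some/path"))
```

Keys built this way also sort so that all hosts under one domain end up next to each other.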


For the same reason your address doesn't go 'USA, California, San Jose, 1297 Fray way'.


Bad analogy:

email:

   georgex@somedomain.sometld
www:

   http://somedomain.sometld/~georgex/
real life:

   George Xandros 
   553 Sunset Blvd
   12322 San Jose
   USA
So, the web is mixed up, and the other two are 'backwards'. None of them get it in the most logical way.

Same with dates:

   2002.10.09
vs

   09.10.2002
or

   10.09.2002
It's all about conventions, what we're used to. So to say 'the same reason' is to call upon a reason that you don't state, and which, if you thought about it long enough, would boil down to 'that's how we've always done it'.

But that doesn't really answer the question.

The most 'logical' order is to put the larger units up front, and smaller ones towards the end.

But since we are where we are that isn't going to change 'for the same reason' (too much investment) that England isn't going to switch to driving on the 'right' side of the road.

Once a convention is established even if there is a marginally better one the cost of switching usually outweighs the advantage of the switch, not to mention a whole bunch of messy stuff during the transition.


Is there a clear benefit to driving on the right hand side of the road, over the left?


On the whole there might be, but I'm not sure of that. In the current situation, yes, because there is a cost associated with having to support two kinds of driving styles.

I have both 'LHD' and 'RHD' cars and find that I can switch at will, but most people are uncomfortable in the 'wrong' kind of car because it messes up your overview when overtaking on 'b' roads (two laners).

If everybody drove the same kind of car on the same kind of road then that problem would go away.


In China, addresses do work that way. Word order of addresses is a language-specific rule that varies by language.


Same for Russia. Also, we never use MM/DD/YY.


And what reason is that?


The most local comes first because the rest is optional. Both in real life (“give that to Peter”) and to some degree for computer network addresses.

You can email ‘peter’ which would be on same machine, ‘peter@arts’ which would be on the ‘arts’ machine on your LAN, or ‘peter@BigCo.com’ which would be on the WAN.

The http protocol takes existing domain names and tags on a path. The domain name system was already in place, so if anything had to be reversed, it would need to be the path, but that would be rather problematic.


Because the US internet won out over the UK JANET?

http://en.wikipedia.org/wiki/JANET_NRS

Which caused no end of problems in mail relay at the time.


Yes, I think this is the answer. There are two choices, so being human, we chose both. Then eventually chose one of those two. I used to have an email address jmsd@uk.ac.cam.cl i.e. with the domains reversed.

~Matt


If it was ordered com.google.www, you might not even need to use different characters for path separators than domain separators.

com.google.www.path.to.the.file

com/google/www/path/to/the/file


Has anyone written a browser plugin that makes it seem like everything is reversed (or 'correct' if that's your view)?


That would be pretty invasive, because plenty of the places where the urls appear are not so easy to get to.

Statusbar, url bar, hovers over links, view source etc.


com.google:http/80:/path/to/the/file perhaps?


Or even

  com/google:http/80:path/to/the/file
However, it could just as conceivably be

  http/80:com/google:path/to/the/file
because, imho, the protocol and port are not really part of the hierarchical structure (part of the url, but not necessarily part of the uri).


This is widely considered to be a mistake.


This is widely considered to be a mistake.

Citations, please? I've been able to deal with it over the years.


It turns out to be a complete pain for working out scope in Web crawlers because you need to separate the domain from the path part and deal with them separately. If it was in the opposite order it would be much simpler to process.

The Heritrix crawler (primarily worked on by the Internet Archive) introduces a "SURT" form, which is basically the domain in the same order as the path, so that reading from left to right it goes from least specific to most specific.
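
As a sketch of why that helps with scoping (plain Python, not Heritrix code; `host_key` and `in_scope` are names invented for this example), a scope test becomes a simple prefix comparison once the host labels are reversed:

```python
from urllib.parse import urlsplit


def host_key(url: str) -> str:
    """Reverse the host labels and terminate each with a comma."""
    host = (urlsplit(url).hostname or "").lower()
    return "".join(label + "," for label in reversed(host.split(".")))


def in_scope(url: str, scope_prefix: str) -> bool:
    """True if the URL's reversed host starts with the given scope prefix."""
    return host_key(url).startswith(scope_prefix)


scope = "com,example,"  # everything at or below example.com
print(in_scope("http://www.example.com/a", scope))       # True
print(in_scope("http://blog.example.com/b", scope))      # True
print(in_scope("http://examples.com/", scope))           # False
print(in_scope("http://example.com.evil.info/", scope))  # False
```

The trailing comma on each label is what keeps examples.com from matching a scope meant for example.com.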


For crawling it is an inconvenience; sure, it is problematic, but phishing is much more problematic. You can train a machine to parse that thing right-to-left with no problem, but telling users to read from the middle leftwards and then from the middle rightwards is too much of a burden.


Agreed. Not so much "a burden" as something that you will never be able to teach a large number of people.


That you've been 'able to deal with it' is simply because that's what you are used to, but that does not mean that that is better or worse, simply a way of life.

Anyway, anything I search for with 'order' and 'domainname' wants to sell me domains, so no citations, but the basic complaint was that it breaks the sequence of a url, where the 'highest' entity should be on the left, and the smallest entity on the right.

So, iirc, the optimum would have been something like:

http/com/ibm/www/80/somepath/somefile

That wasn't it exactly, but it gives the general idea.

The DNS was long established by the time URLs rolled around, so I doubt anything could have been done about it anyway.

If the phishing troubles resulting from the DNS order had been foreseen, I'm pretty sure that they would have picked the 'other' way.

edit:

found something about all this after some digging:

http://answers.google.com/answers/threadview/id/754114.html

http://bandb.blogspot.com/2009/01/are-domain-name-backwards....

Since the DNS is a hierarchical system, the 'root' of the hierarchy should have been at the beginning, just like in unix you don't start with the name of the file but with the 'root'.

Anyway, the quote I'm looking for is by none other than Tim Berners-Lee, I think it may have been in his book though, not online.


you'd have to have at least a protocol/host/path separator.

http/com/ibm/www/80/somepath/somefile -> is this referring to http://ibm.com/www... or http://www.ibm.com/80...

Instead, I would imagine it should be something like:

com.ibm:http/80:/somepath

You could even do a lookup to com.ibm and ask for http SRV record to find the actual host to connect to.
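
A small Python sketch of how such a form could be taken apart. The syntax itself is only the proposal above, `parse_reversed_uri` is a hypothetical name, and the SRV owner name follows the usual `_service._proto.name` pattern with TCP assumed; no actual DNS query is made.

```python
def parse_reversed_uri(uri: str) -> dict:
    """Split 'com.ibm:http/80:/somepath' into its parts (hypothetical syntax)."""
    # Three colon-separated sections: reversed domain, service/port, path.
    domain_part, service_part, path = uri.split(":", 2)
    scheme, _, port = service_part.partition("/")
    # Put the domain back into conventional DNS order for the lookup.
    domain = ".".join(reversed(domain_part.split(".")))
    return {
        "domain": domain,
        "scheme": scheme,
        "port": int(port) if port else None,
        "path": path,
        # Name one would query for an SRV record, assuming TCP transport.
        "srv_name": f"_{scheme}._tcp.{domain}",
    }


print(parse_reversed_uri("com.ibm:http/80:/somepath"))
```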


Someone above posted the interview with Tim Berners-Lee; I take it that's citation enough?

The relevant quote:

"Looking back on 15 years or so of development of the Web is there anything you would do differently given the chance?

I would have skipped on the double slash - there’s no need for it. Also I would have put the domain name in the reverse order - in order of size so, for example, the BCS address would read: http:/uk.org.bcs/members. The last two terms of this example could both be servers if necessary."


He didn't invent domain names. He invented the web. So, no, it's not citation enough.

Also, given that domain names were already in the current order, if he'd put it in reverse order for the web, it'd be far worse than what we have now, which at least is consistent across different types of services.

EDIT: I guess you could argue it's citation enough if we assume, based on his examples, that the original question refers only to web usage.


I figured Tim Berners-Lee is in an excellent position to criticize not only his own work, but also the more general case of the domain name system.

Obviously, if he had done it the other way around in URLs, that would have been a fairly strong point of critique against the DNS. The fact that, in retrospect, he thinks he would have been better off choosing the alternative in spite of creating two different systems makes that critique even stronger.


Not that it matters, but quoting one person about what they would have done doesn't count as "widely considered". I don't believe any large proportion of people consider it at all, let alone in a consistent direction.


As phishing becomes more and more of a problem this is getting wider 'play'. People who are security conscious have commented on this for years, and the hierarchical break between domain names and paths always was an eyesore.

I've seen this crop up in many places. I was looking for Tim Berners-Lee's statement because I figure he's the authority in the field.

You may disagree with that of course.


I figured Tim Berners-Lee is in an excellent position to criticize not only his own work, but also the more general case of the domain name system.

And I agree, being the person who asked the question, and thank you for pointing out what he has said on the issue.



