Using http:// and .com is madness

ryanwaggoner · on May 17, 2009

Actually, the one main benefit is that they can be parsed as URLs, both visually and by things like email programs. Compare:

"For the latest blog post, just head to elliottkember."

"For the latest blog post, just head to http://elliottkember.com."

The first one makes no sense, while the second is obviously talking about a web URL. I'm sure there are other ways to make this work, but this author's post feels like a solution in search of a problem. At the very least, it's hardly "madness".

aneesh · on May 17, 2009

I remember a psychology study that showed people recognize www.yahoo.com as a url more quickly than yahoo.com -- with the latter, you don't see the ".com" until you get to the end of the phrase, and that's when you realize it's a url. I can only imagine that getting rid of the .com would make things even worse.

marcus · on May 17, 2009

That is because of a psychological mechanism called priming: basically, an earlier input can affect your perception of a later input.

http://en.wikipedia.org/wiki/Priming_(psychology)

jackchristopher · on May 17, 2009

When I give my email I omit the .com. But I think this works with popular hosts only, like gmail or yahoo. I say "myname at yahoo" and people know to fill the rest.

If I can keep something implicit I will.

zandorg · on May 17, 2009

People who advertise their company (for example, on the side of one of their trucks) sometimes start it with a www. which would fail one of those OS-detected http:// tests which only look for http:// and not www.
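As a rough illustration of the kind of detection being described, here is a minimal sketch of two URL detectors: one that only looks for an explicit scheme, and one that also accepts a leading "www.". The regular expressions and the example.com address are assumptions made up for this sketch, not what any particular OS or mail client actually uses.

```python
import re

# Hypothetical detectors, for illustration only.
STRICT = re.compile(r"https?://\S+")               # requires an explicit scheme
LENIENT = re.compile(r"(?:https?://|www\.)\S+")    # also accepts a bare "www." prefix

def find_urls(text, pattern):
    """Return every substring of text that the given pattern treats as a URL."""
    return pattern.findall(text)

print(find_urls("just head to http://elliottkember.com", STRICT))
print(find_urls("call us or visit www.example.com today", STRICT))
print(find_urls("call us or visit www.example.com today", LENIENT))
print(find_urls("just head to elliottkember", LENIENT))
```

Neither pattern has anything to latch onto in the bare "elliottkember" case, which is the point made at the top of the thread.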

skwiddor · on May 17, 2009

My OS isn't very good at reading the side of trucks

randomwalker · on May 17, 2009

This is even more of an issue in conversation. Although the http can be omitted, the .com is often the only clue that the speaker is referring to a web address.

chaosmachine · on May 17, 2009

A good .com domain is already very hard to get. Let's not overvalue them any more than they already are by giving them even more advantages over the other tlds.

gojomo · on May 17, 2009

Someone who highlights words and phrases in four different colors for various kinds of emphasis is lecturing the world about unnecessary signifiers?

silentOpen · on May 17, 2009

He's also looking for an MD5 fixed point. Something is madness but it's not URL syntax...

gojomo · on May 17, 2009

My intuition tells me there's a fair chance such a fixed-point exists, if MD5 behaves as a random oracle.

Specifically, there's a 1-in-2^128 chance that any 128-bit input will give itself as the output. Over 2^128 trials the chance that no input answers itself would be:

  (1-(2^-128))^(2^128)

...which mAlphaMatica helpfully calculates as...

http://www.wolframalpha.com/input/?i=(1-2^-128)^(2^128)

  0.367879441171442321595523770

That is, there's about a 63% chance Kember's quest to find an input whose MD5 output is itself will succeed.
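The arithmetic above can be checked without a computer algebra system. This is a minimal sketch in ordinary double-precision floats; the only trick is using log1p, since 1 - 2^-128 rounds to exactly 1.0 in floating point and the naive expression gives the wrong answer. The variable names are just for this example.

```python
import math

n = 2 ** 128

# Naive evaluation fails: 2**-128 is far below float precision, so the base rounds to 1.0.
naive = (1 - 2.0 ** -128) ** (2.0 ** 128)

# log1p(x) computes log(1 + x) accurately for tiny x, so n * log1p(-1/n) stays close to -1.
p_no_fixed_point = math.exp(n * math.log1p(-1.0 / n))
p_some_fixed_point = 1 - p_no_fixed_point

print(naive)
print(p_no_fixed_point)
print(p_some_fixed_point)
```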

Or, is there something in MD5's construction making this impossible, making the random-oracle model inapplicable?

madcaptenor · on May 18, 2009

The number you computed is nearly 1/e. Basically, (1-1/n)^n = exp(n log (1-1/n)) = exp(n (-1/n - 1/(2n^2) + O(1/n^3) )) = exp(-1 - 1/(2n) + O(1/n^2) ) = 1/e - 1/(2en) + O(1/n^2) and so your answer should agree with 1/e to, say, the first forty digits or so.
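For anyone who wants to see the size of that correction term, here is a small sketch using Python's decimal module at 100 digits of precision (the precision is an arbitrary choice for this example). It compares the difference between (1-1/n)^n and 1/e against the leading correction -1/(2en) for n = 2^128.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100            # enough digits to see a correction of order 1e-40

n = Decimal(2) ** 128
value = (n * (1 - 1 / n).ln()).exp()   # (1 - 1/n)^n computed as exp(n * log(1 - 1/n))
inv_e = Decimal(-1).exp()              # 1/e

diff = value - inv_e                   # how far the result is from 1/e
predicted = -inv_e / (2 * n)           # leading correction term, -1/(2en)

print(diff)
print(predicted)
```

Both printed numbers are on the order of 10^-40, which is why the result agrees with 1/e for roughly the first forty digits.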

(Incidentally, Maple choked when I plugged in (1-2^-128)^(2^128), because it tried to evaluate it as a rational number, the numerator and denominator of which would have well over 2^128 digits.)

silentOpen · on May 18, 2009

Very interesting. I had not worked out or seen the math for this. I don't doubt that such a number could exist. I do question the speed with which one could find the number and the utility once it's found.

madcaptenor · on May 18, 2009

How to find it: do MD5 on some random input. Then do MD5 on that output, and repeat until it converges. The problem is that you could end up in a cycle, instead of at a fixed point.
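A toy version of that iteration, as a sketch: the real 128-bit space is far too large to explore, so this assumes MD5 truncated to 2 bytes, which is purely a simplification for the demo. It feeds each output back in as the next input and reports how long the "slide" is before the sequence enters a cycle, and how long the cycle is. A cycle of length 1 would be a fixed point.

```python
import hashlib

def step(x: bytes, nbytes: int = 2) -> bytes:
    """One iteration: MD5 of x, truncated to nbytes (truncation is an assumption for the demo)."""
    return hashlib.md5(x).digest()[:nbytes]

def iterate_until_repeat(seed: bytes, nbytes: int = 2):
    """Feed each output back in until a value repeats.

    Returns (tail_length, cycle_length, first_value_on_cycle).
    """
    seen = {}                      # value -> index at which it first appeared
    x = step(seed, nbytes)
    i = 0
    while x not in seen:
        seen[x] = i
        x = step(x, nbytes)
        i += 1
    tail = seen[x]                 # steps taken before reaching the cycle
    cycle = i - seen[x]            # number of distinct values on the cycle
    return tail, cycle, x

tail, cycle, entry = iterate_until_repeat(b"some random input")
print(f"tail={tail}, cycle length={cycle}, fixed point={cycle == 1}")
```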

The speed with which it can be found: I'm going to wave my hands and claim this is in "Analytic Combinatorics" by Flajolet and Sedgewick. Seriously, though, under the assumptions we've been throwing around here this is a "random mapping" and these are reasonably well-studied objects.

silentOpen · on May 18, 2009

It would be interesting to try to map the group properties of MD5-space. Every 128-bit number would either be in a slide to a cycle or in a cycle (a fixed point is a cycle of size 1).

This still doesn't change the huge-normousness of the task, though.

gojomo · on May 18, 2009

No need to worry about cycles from using one output as the next input; just iterate over all possible 128-bit inputs, in any order. Unless you're trying to leverage some known deviation of MD5 from being a true random oracle, each input is just as likely as any other to be an identity input.
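The exhaustive approach is easy to sketch on a shrunken problem. This example assumes MD5 truncated to 2 bytes so the whole input space (65,536 values) can be enumerated; the truncation and the names are assumptions for illustration. Under the random-mapping model the number of fixed points is roughly Poisson with mean 1, so an empty result is entirely possible.

```python
import hashlib

NBYTES = 2   # toy width; the real question is about all 16 bytes of MD5 output

def truncated_md5(x: bytes) -> bytes:
    """MD5 of x, keeping only the first NBYTES bytes (an assumption for the demo)."""
    return hashlib.md5(x).digest()[:NBYTES]

def find_fixed_points():
    """Try every NBYTES-byte input and collect those that hash to themselves."""
    fixed = []
    for i in range(2 ** (8 * NBYTES)):
        x = i.to_bytes(NBYTES, "big")
        if truncated_md5(x) == x:
            fixed.append(x)
    return fixed

fixed_points = find_fixed_points()
print(len(fixed_points), [fp.hex() for fp in fixed_points])
```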

helium · on May 17, 2009

This is kind of funny coming from a global perspective. I'm a South African who has worked in Europe. In the Netherlands ALL the websites have a .nl domain. In Switzerland it's all .ch. In South Africa it's all .co.za. If someone referred to just justsomewebsite/somepage, I would be pretty confused as to which domain it belongs to, especially as I have Google Chrome installed which defaults all my pages to have a .nl domain if I don't enter it.

swombat · on May 17, 2009

Non-issue, pointless to discuss.

janitha · on May 17, 2009

It's not pointless to discuss. These are lessons to be learned when new systems are to be designed. These are things that were not meant for mainstream usage when they were designed. They just caught on. Now they're too widely adopted to do anything about it.

Note: Personally, I think URLs should be backwards to make sense: com.google.mail/sub/folder (go in order of largest authority down to the smallest subfolder).

At the moment it's

http://tiny.small.big.bigger/big/small/tiny

zacharydanger · on May 17, 2009

Your example of tiny.small.big.bigger is the wrong way to look at it. Instead, look at it as http://<unique host identifier>/path/to/resource and it makes more sense. tiny.small.big.bigger isn't even correct since you're trying to map "size" to subdomain level. As if a subunit with 5 parents can't be larger than a subunit with just 2 parents.

And com.google.mail would mean specifying the least meaningful information first. Would you rather be introduced to someone as "Widgets Incorporated, Director of Widgets, John Doe" or "John Doe, Director of Widgets at Widgets Incorporated"?

dangrover · on May 17, 2009

I agree. If people knew how to read URLs, then phishing scams would no longer exist.

"Okay, the domain in the URL for this link is X, but I know my bank's domain is Y, therefore it's a scam."

Part of it is the URL syntax being unintuitive, part of it is simple lack of proper education/training. But it seems like in the time you tell people "DON'T CLICK LINKS IN EMAIL OMG", you could just teach them how to read a URL.

Maybe if URLs had spaces and looked a little more like a postal address than a confusing jumble, people might actually read them. As it stands now, though, most people just type everything into a search page -- even if they're actually typing in a URL as a search term. Kind of like that story about the number one search term on MSN being Google.

madcaptenor · on May 18, 2009

I actually search for Google every so often, because I get confused about whether I'm on my home computer (Chrome) or office computer (Firefox).

anamax · on May 18, 2009

The Brits proposed biggest.big.small.tiny for host names but got shot down.

gojomo · on May 17, 2009

Because of this inherent hierarchical inconsistency in standard URI formats, my projects sometimes internally use a reordered variant of the URI we call 'SURT form'. For example:

http://(com,google,mail,)/sub/folder

The parentheses and commas make it absolutely clear that you're dealing with a reordered URI-authority-component.

Even the trailing comma and off-paren are significant in our most common use-case: determining whether a URI falls within a certain hierarchical collection-scope by making a simple prefix-comparison. Compare the prefixes:

http://(com,google

http://(com,google,

http://(com,google,)

The first would also accept < http://(com,googlepages, > URIs, while the second would accept all subdomains of < com,google, >, and the third accepts only URIs strictly on < com,google, > but not its subdomains.
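To make the prefix comparison concrete, here is a simplified sketch of the reordering and the scope test. The to_surt and in_scope helpers are hypothetical names written for this example, not the code used in the projects mentioned above, and they ignore ports, userinfo and other details a real implementation would need to handle.

```python
from urllib.parse import urlsplit

def to_surt(url: str) -> str:
    """Rewrite a URL with its host labels reversed, comma-separated, and wrapped in parens."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    authority = ",".join(reversed(labels)) + ","      # trailing comma is significant
    rest = parts.path + (f"?{parts.query}" if parts.query else "")
    return f"{parts.scheme}://({authority}){rest}"

def in_scope(url: str, surt_prefix: str) -> bool:
    """A URL is in scope when its reordered form starts with the given prefix."""
    return to_surt(url).startswith(surt_prefix)

print(to_surt("http://mail.google.com/sub/folder"))
for prefix in ("http://(com,google", "http://(com,google,", "http://(com,google,)"):
    print(prefix,
          in_scope("http://googlepages.com/", prefix),
          in_scope("http://mail.google.com/sub/folder", prefix),
          in_scope("http://google.com/", prefix))
```

The three prefixes behave as described: only the first lets googlepages.com through, the second covers google.com and its subdomains, and the third matches google.com alone.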

cool-RR · on May 17, 2009

swombat · on May 17, 2009

Because no one really cares, it really doesn't matter, and it won't get changed anyway (too many things depend on it). There are so many problems in the world that can be addressed and are worth addressing... this is not one of them.

byrneseyeview · on May 17, 2009

The only browser I use that requires http:// and .com is w3m.

cool-RR · on May 17, 2009

Just tried it on Firefox and Chrome, they both send me to Google. (Firefox with "I'm Feeling Lucky".)

andymoe · on May 17, 2009

This is a solved problem... We tried this with AOL keywords, and registrars hijacking DNS for search and both of those turned out to be a bad idea. Browsers these days will bring you to the .com domain by default anyway when you have a search provider selected. PS. Anyone else notice the Snake game on the site?

Sephr · on May 17, 2009

Sure, because ftp://somewhere.com and http://somewhere.com are always the same website.

potatolicious · on May 17, 2009

I'm surprised this argument came from someone who touts himself as a skilled web developer... The use of http:// is pretty obvious to anyone with even a rudimentary amount of networking knowledge.

noamsml · on May 17, 2009

But then how would we know what's a URL and what's not? And how will I distinguish outside sites from local machines and stuff I set in my hosts file?

rbanffy · on May 17, 2009

Can we bury this?

mcantelon · on May 17, 2009

Explicit is better than implicit. The semantic web will require, more than ever, that we specify exactly what our data represents, rather than leaving the interpretation to humans.

amalcon · on May 17, 2009

How would your browser distinguish between, say, www.net and www.net.com? How about a hypothetical www.net.com.com?

stevejalim · on May 17, 2009

Break the syntactic rules of a URL by having, essentially, an optional TLD and it'll just make things harder for people who are online, but don't quite get it (eg, my dad, etc)

rbanffy · on May 17, 2009

It's like the "www." discussion. URLs exist for a reason someone vocal may or may not understand. He only uses http URLs and .com domains, but I regularly use git://, svn:// and even faked a couple URL styles for my own use that I felt I could easily parse.

anigbrowl · on May 17, 2009

Although the intent was comic, the best verbalization I've ever heard (and one I now use frequently and effectively) is 'wuh wuh wuh something dot com'. Everyone gets it, and for people who hate writing down URLs (because it's techie and unnatural to them) it gives them a cheap laugh which puts them in a good mood.

Now that I have made you aware of this, you will be unable to get it out of your head.

joechung · on May 17, 2009

"dub dub dub something dot com" is something I've heard a lot to verbalize a URL.

pam_gamble · on May 17, 2009

I've used, and heard many others use, dub dub dub to verbalize the www at the beginning of most URLs.

swombat · on May 17, 2009

Sometimes, I try "triple-u" for comic effect.

Yeah, my jokes suck.

ssharp · on May 18, 2009

Two things bothered me about this page:

#1 - The highlighting is too much and a bit ridiculous - to the point of being counterproductive.

#2 - The little page peel thing is pretty neat but the HTML behind it isn't the source of the page. Geeky complaint, I know.

stcredzero · on May 17, 2009

http:// and .com are madness along the lines of _The Inmates are Running the Asylum_. If you step back for a moment, isn't it strange that abbreviations of protocols and double-slash separators are presented to the naive end-user?

Maybe AOL keywords were before their time, but I don't understand why something like a Book Title can't be used for the 1st part of the URL.

silentOpen · on May 17, 2009

Isn't it strange that telephone numbers are segmented arbitrarily? Weird that you know what a "country code", "area code", and "exchange" are?

It's not madness. It's topological addressing.

stcredzero · on May 18, 2009

Telephone numbers are just meant to be somewhat organized line noise. Humans can just treat them as a big arbitrary blob of data they have to type in. Domain names, though, are meant to be "human readable", and just look at the mess the URL makes of them.

Phone numbers were that way because telephone companies had to organize a hierarchical set of wires that could make physical connections between any two endpoints with limited relay-driven processing. Packet switched networks are supposed to abstract away topological addressing for the higher layers. Why make users type it in? Why, with all of the computing power we have available, do we make users put in the protocol? Protocol is certainly not relevant to most users. I've even known CIOs to mess that one up. Why put it in front of them at all for the default situation?

pkulak · on May 17, 2009

People are still putting www in front of their URLs. Let's get rid of that first.

skwiddor · on May 17, 2009

file://

rtsp://

mailto:

irc:// (if you have chatzilla or similar installed)

joezydeco · on May 17, 2009

what about gopher:// !!!!

skwiddor · on May 18, 2009

blimey, I forgot about that, my friend __20h__ will be annoyed considering the channel I hang out in irc is called #gopher and we promote gopher uri's in the title!!