That's like a whole extra level of sad, because it's not that someone was oblivious to list context in Perl, but that they knew about it and still didn't quite get it right.
Or it was purposeful, and he liked how it left out the attributions on single-line quotes or made a weird half-aside out of multi-line ones, and it's just another case of someone not realizing that putting random content up under your own name can lead to problems. Also sad, but less unique.
Microsoft also provides a URL lengthener via "Advanced Outlook.com security for Office 365 subscribers". Thanks to them, any URL we send or receive by email at work turns into an unrecognizable monster. At least the links include comforting words like "safelinks" and "protection" so that everyone can feel safe and protected.
This is so frustrating. I just want the URL, but NO, you get an escaped version of the URL embedded inside another escaped blob instead. Ditto with Facebook.
I'm sure this is something they've considered, as their own analytics program uses onclick to track outbound traffic. I assume it's a mix of speed, reliability, and the ability to track without JavaScript.
Plus, if it were done with onclick, there would definitely be an add-on to disable the tracking on all browsers.
You put a `ping` attribute on a link, and when the user follows the link, the browser sends a request to the ping URL notifying it of the action.
Firefox disables it by default, which doesn’t seem like much of a privacy win considering JavaScript is enabled by default and can polyfill this functionality.
I've actually tried switching to Bing, just because Bing's search results include the actual URL of the page when you right-click (instead of some Google-riffic monstrosity...)
Copying the canonical PDF link from a Google result on an Android device is extremely frustrating, because Chrome on Android can't open PDFs, so it opens a reader app instead. Does anyone have a solution?
Oh my goodness, yes, safelinks: no doubt they provide some security benefits but from a UX perspective I detest them. The redirect to the real URL often takes far too long and sometimes Outlook, being the apparently single-threaded "let's make all network requests on the UI thread" pile of stinking fecal matter that it is, appears not to respond to your click, so you click again, and then maybe again, and then maybe yet again... and then all of a sudden you've got four browser tabs opening up to load the same URL and taking several seconds to redirect.
The Office 365 mail crap is the worst. We had to use it in college and it would always click password reset links, expiring them before you could even read your mail.
It used to be longer, but name.com, which I use nowadays, doesn't support ones that long. :( I think there were a few more a's, up to the technical limit (which I forget now).
I clicked on your link, half expecting to get rick-rolled, and was pleasantly surprised to find myself back on this same page.... but also kinda disappointed. :)
These days it's all about personalised, targeted rolling. You have to find out what song your friend hates the most (say, "Africa" by Toto) and try to trick them into listening to ever-more-horrible cover versions of it.
At least, that seems to be the thing in my friend group.
I don't know if that really works though. It's a bit like gift giving: the point is arguably more to show off the effort you personally put in just to prank your friend.
Thought I'd comment to note that (as of 2021-05-14T22:07:00Z) this just does the alert() POC and isn't nefarious, if anyone is deeply curious enough to click but cautious enough to avoid a deaddove.jpg situation
Well, I use Replit as an IDE, and just hitting the run button meant my fix was immediately deployed. I didn't have to push to git or SSH into a machine to pull from master and restart.
We took "Show HN" out of the title because it implies that the project is the submitter's own personal work. Please don't do that if it isn't. The rules are here: https://news.ycombinator.com/showhn.html.
Back in the early days of web hosting I recall that a customer had a domain name that was literally as long as ICANN would allow at the time. It was very nearly a full sentence. I don't recall the limit but this domain
is 57 characters (not including .com), and I think that sounds familiar. One could have a lot of fun with that if they wanted to (as was done in this case).
> Back in the early days of web hosting I recall that a customer had a domain name that was literally as long as ICANN would allow at the time. It was very nearly a full sentence. I don't recall the limit
RFC 1034 specifies:
"Each node has a label, which is zero to 63 octets in length."
And:
"To simplify implementations, the total number of octets that represent domain name (i.e., the sum of all label octets and label lengths) is limited to 255."
It's been years since I've seen or heard an ad for dontbeaweekendparent.com, but I still remember it, even though there's zero chance I'll ever need their services. Sometimes a long domain name can be useful.
Welsh place names can be brilliantly literal, Aberystwyth for example is "mouth of the river Ystwyth". Llanfairpwllgwyngyll... means "St Mary's church of the pool of the white hazels over against the pool of St Tysilio Gogo" but the name was completely contrived in the Victorian era to promote tourism to Anglesey.
I was aware of that place name, and clicked the link hoping to find an audio file or link to youtube video that would give an example of how to pronounce it.
This reminds me, I did a little investigation into what the actual URL length limits are per browser. Here is the blog post in case you are interested:
> Disclaimer: Since there is a maximum letter count to a URL, there is a slight chance your resulting URL will be too long to use. No worries, try running it through a site like bit.ly and then paste that one over here. Every resulting URL ends up having a minimum number of characters, so your URL will still be plenty long!
This seems like something you can trivially solve yourself. Is there any good reason why you push this issue on the user?
I think it's because it isn't implemented the way I imagine URL shorteners usually are, with a database back end mapping short tokens to the actual sites. Instead it just performs a reversible transform on the URL. This way the site is really just a static website (and, I imagine, a lot cheaper for the owner, who can't be interested in laying out a lot of money for this).
But this is just a lot of speculation on my part which means I'm probably wrong about at least one aspect.
There isn't really "a" maximum count. There's a variety of enforced limits. If the standard(s) even set an official one, it's not something a lot of things pay attention to. You can also run into things like command lines enforcing their own limits, so a URL that your browser may or may not be happy with might not be something you can directly curl or wget.
Since there really isn't any set maximum, I didn't have anything to base it off of. I might actually remove that section, since I've seen 24k-long URLs work perfectly fine. Also, I would have to do some reworking to make that happen, since the URL lengthening is all browser-side right now.
For many years (but no longer) Vince Cate's Offshore Information Services had a DNS A record for the ccTLD "ai", so that http://ai/ was a valid URL that worked in browsers. (A few people also had e-mail addresses @ai.)
Today, there are no longer any TLDs of any kind with their own A records.
The DNS record is no longer in the root zone itself (I think it used to be?) and some recursive resolvers seem to filter out the bare-TLD query or response. But the A record still exists inside the ai zone... so I'm not sure who is right, so to speak!
The A record was never in the root zone. The root zone has an NS record for ai TLD which delegates authority to ai nameservers.
You can try `dig @a.root-servers.net ai NS` to get the nameserver `a.lactld.org.`, which is authoritative for ai. You can then try `dig @a.lactld.org ai A` to get it (that's how recursive resolvers generally work).
Yeah, so I think I'm seeing something weird where some software isn't allowing this particular recursive query? I'll dig into it more.
By the way:
> The A record was never in the root zone.
Are you sure of that? I don't have any proof, but my impression was that the DNS registry for ai (which was also just Vince Cate...) was able to ask for it because at one time the root zone was less restrictive in what RRtypes could be placed there for TLDs. But that might just be repeating someone else's mistaken impression.
and this version of the root zone from 2009 does indeed not have an A record for ai, just ordinary NS delegations. So it seems like your explanation is confirmed, and there must be a change in some recursive resolver behavior. (I tried from four different Linux systems on different networks and all refused to resolve it, so there really must be something that's changed more widely, not just my home router.)
I checked an archive of root zone data from June 1999 to May 2021[0], and there don't seem to be A records for any TLD. Not sure why you're having this issue, but I'm curious to know which Linux distro/software doesn't resolve ai.
It looks like it's systemd-resolve. I just checked on several machines and, when using nslookup to query the recursive nameserver that systemd-resolve is forwarding to, the ai A record does resolve, while when using systemd-resolve itself, or the local instance on 127.0.0.53 configured via /etc/resolv.conf, it returns a failure every time.
I'll have to look into this some more.
Edit: I think I found it! From systemd-resolved.service(8):
· Single-label names are routed to all local interfaces capable of IP multicasting, using the LLMNR protocol. Lookups for IPv4 addresses are only sent via LLMNR on IPv4, and lookups for IPv6 addresses are only sent via LLMNR on IPv6. Lookups for the locally configured host name and the "_gateway" host name are never routed to LLMNR.
· Multi-label names are routed to all local interfaces that have a DNS server configured, plus the globally configured DNS server if there is one. Address lookups from the link-local address range are never routed to DNS.
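In other words, a single-label name like "ai" is never sent to global DNS by the stub. A quick way to see it on an affected machine (a sketch, assuming Python and the systemd-resolved stub on 127.0.0.53; behavior will differ with other resolvers):

    import socket

    # The single-label lookup below typically fails through the systemd-resolved
    # stub, even though the "ai" zone itself still serves an A record (see the
    # dig commands earlier in the thread).
    try:
        print("ai ->", socket.gethostbyname("ai"))
    except socket.gaierror as err:
        print("ai -> lookup failed:", err)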
I can't edit this anymore, but further down in this thread it turned out that it's just systemd that refuses to attempt a global DNS A lookup for a single-label name, so this record DOES still exist, just as it always did, but all the machines I've tried on for the past few years, as well as yesterday, were Linux systems using a local systemd stub resolver that enforced this rule. :-(
I guess I should get some servers running other operating systems, or something.
It might be difficult to convince the systemd developers that this special case should be removed just because of this one anomalous DNS name...
Mind sharing some background? I'm curious about how you snagged the domain, how you've managed to monetize it, and what the competition is like in this space.
You've shortened 6,463,545 links so far. Assuming you're using a charset of [A-Za-z0-9], that gives you 14,776,336 total possible 4-character URLs. Almost halfway there!
What's your plan once you reach the 4-character limit? Roll over to 5, or something fancier?
Yes, you are correct. I don't have anything fancy planned. Users with accounts are able to customize the ending, so this will increase the availability. Currently, Bitly is up to 7 characters long when you create a random short URL. T.LY is already 2 characters shorter, and there are still plenty of short random URLs: 62^5 = 916,132,832.
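For anyone double-checking the arithmetic in this sub-thread (just the numbers, nothing specific to either service):

    charset_size = 62  # [A-Za-z0-9]
    print(charset_size ** 4)  # 14,776,336 possible 4-character endings
    print(charset_size ** 5)  # 916,132,832 possible 5-character endings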
Years ago, we asked bit.ly if they could vendor our link shortening and tracking. It turned out we were already a couple of multiples larger than them in terms of link processing. Our use case is a bit different, though: instead of one link given to many, we tend to be one link given to one recipient. Optimizing this kind of problem is interesting and fun.
I actually like using it. There's something comfortable about using an operating system that's old enough to have kids. Another benefit is that screens are high enough resolution nowadays that it doesn't look like a child's toy.
TAC* is the best compression format available for the web today! By using revolutionary scientific methods, research teams at RSG and the Beige Programming ensemble were able to compose a complex software tool that expels many of the myths that surround modern file compression techniques. The secret of TAC compression is not that it makes files smaller, but that it makes files bigger, much bigger.* This provides the end user with a compression tool to meet almost any need in today's bandwidth and gig overloaded computing world.
A nice feature is that when you design a URL lengthener, you can make the service stateless. The backend doesn't need to store every URL that it was ever given. This contrasts with URL shortener services, which must store state in order to generate very short URLs. (In theory, shorteners can use stateless compression like zlib, but it wouldn't save enough space to be worth it.)
Dumb question: the lengthener prepends the URL with "\x20\x0b" strings which are then not removed on the other end. So the link inserted in the <meta> redirect ends up looking like
So what you're seeing is the creation of a zero-width space. When you combine an \x20 and an \x0b, it makes a zero-width space, which your browser ends up ignoring. The only reason these are in there is to ensure the URLs are at a minimum length. Who wants a mere 20-character URL when you can have a minimum of 250 characters?
Reminds me of an API I worked on. We had a max length for some token. A client had an incrementing sequence number, so they decided to pad it out to the max length. It looked something like this:
It basically just base64-encodes the URL to generate the link and decodes the URL arg to do the redirect.
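For the curious, a minimal sketch of that kind of stateless scheme (illustrative only; the domain and function names here are made up, not the site's actual code):

    import base64

    def lengthen(url: str) -> str:
        # Reversible transform: no database needed, the token IS the original URL.
        token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
        return "https://example-lengthener.test/" + token

    def resolve(lengthened: str) -> str:
        token = lengthened.rsplit("/", 1)[-1]
        return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")

    link = lengthen("https://news.ycombinator.com/")
    print(link)
    print(resolve(link))  # round-trips back to the original URL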
I also own this email address that I made just for fun:
JesuisLeProprietaireDeCeNomDeDomaineTresLongBonjourCaVaBienAller@icirickynotarodemontrealetceciestunnomdedomainebeaucouptroplong.club
The multiple efforts thing is a lesson I should learn but refuse to do.
It just feels so spammy. I don't want to touch that world intentionally but it works.
It gets worse
I had two similar services. One used a page's meta information to create URL stubs so that the link would carry the semantic meaning instead of, say, "id=27158278". It'd also (this was about 10 years ago) fill in OpenGraph holes if it found any and present the social media card generators with more complete information.
It also had a JavaScript plugin that would fill in title tags to your anchor links so that you could hover over a link and get the destination page's title.
I thought it was really useful but I literally got nothing but either silence or criticism. It was really demotivating. I just abandoned it.
It sucks to create something that you believe in, that you like, that you find value in and get nothing but hot bottles of shit from everyone else. Nothing constructive, just demoralizing abuse. I've tried so hard to never be that person. The receiving end of that is awful. It's never ok. (I should make a list of "never events" in open source/programming (https://en.m.wikipedia.org/wiki/Never_event) not in engineering negligence but human empathy negligence.)
Anyways, then I had another project (also about 10 years ago) where I registered a bunch of news-sounding sites, like, say, themercurystar.com, and as a URL "shortener" it created absurd, sensationalist news headlines as the "shortened" URL from a grammar generator. So, for instance, you might get a URL like themercurystar.com/arts/current/henry-kissinger-sings-dances-on-broadway-in-bye-bye-birdie or /taylor-swift-proves-hodge-conjecture-pnas etc.
It was complete with an OpenGraph stock image and a repurposing of the generated title with more filler to make it look real, redirecting the crawlers to a fake shell page to satisfy the meta content and redirecting the humans to the actual link to be shortened.
That one was probably too good. I found it hilarious but apparently I was the only one in on the joke.
So failure again. Ah well.
They're still great and I'd gladly launch them again.
It's more like you work on something that flops and you see similar things get traction.
There's bookshelves full of analysis on this problem. I've got a few of those bookshelves in my library, a major preoccupation of mine for maybe 15 years.
But one of the legitimate factors the big books don't touch upon is the agitation and hustle game. Probably because those authors just do it without thinking about it.
Geoffrey Moore, Steve Blank, Clayton Christensen, there's a certain aggrandizing precocity they all have that they seem to look past. It's somewhere on the road to carnival barking and clickbaiting.
The line on that road that I refuse to cross is likely way too conservative.
In fact I've had things become popular through other random people playing that game, people I've never met, just for the social cachet or whatever endorphins that thing gives those people.
That's the core strategy of virality and trying to hook influencers.
It's a trend I've been noticing within the past 6 months or so. When something catches I'll do some research and find an almost irritating number of failed nearly identical attempts.
The "one note band approach" looks like it's a decent strategy, I just have to get over how objectionable it feels to me.
Being a bit more shameless doesn't necessarily always appear to be a bad move
Anyway, cool project. It's okay to build something for a laugh every now and then. I know these days I sometimes forget I used to do this for fun. Have a good weekend, all.
Until this post, I'd never thought about a URL that can't be accessed by specific browsers because of its length as something someone would explicitly want, rather than something to avoid.
On iOS, the location bar animates scrolling through the aaaaaas when you navigate to the site. Someone had to design that animation. Thank you, anonymous designer.
I guess my tinfoil hat is too tight. While it's cool and funny, I inherently don't trust things like this.
Edit: the concern is about data collection and profiling over time. It could essentially be an advertising model: you get an idea of the things a particular cookie/IP/fingerprint does. Depending on what is in your original link, all kinds of IDs and data can be sitting in the URL alone. Does a link to my social media account potentially expose my PII?
The problem with this is that when I press and hold "a" in the address bar, on Mac at least, the alternate-characters menu pops up instead of the key repeating. I'd suggest this new domain:
EDIT: Looks like the site is actively being updated, there are new option checkboxes, the self-reference link only works with "Use a path instead of an URL" checked
EDIT: I repeatedly lengthened maybe a dozen times, that seems to get it stuck in a loop:
I think you were accessing the site while I was working on some changes, haha! I think what's happening in the one that goes into an endless loop is that your URL legitimately got too long for browsers to handle, so some data got cut off.
Having written two apps that do this, I can confirm that it is an amazing technique, and criminally underutilized! My projects store and render entire pages from the URL and encrypt URLs (respectively):
Hadn't considered it originally because I was concerned about allowable characters in the URL, and jumped to base64 instead of leaning more heavily on the actual set of acceptable characters to make it readable.
In hindsight, this is a great thing to add to my next project that (inevitably) uses this technique, thanks!
Not tracking actual users, just more general metrics. I genuinely wanted to see the number of visitors, and since the site is static this was probably the most privacy-oriented option I could think of. Trust me, I'm not sure what data I'd be able to sell from this legitimate joke of a site lol
Fun fact: in the browser wars of 1993(?), I looked at the specs from Netscape (Mozilla's daddy, for the young folks) and Microsoft (what W3C? ha!). Netscape released a browser spec that said "X must support up to Y", as in "URL must be up to 1024 chars", "cookies must be up to 1mb", etc...
Then Microsoft released the IE4 (or 6?) web spec. It was literally a copy of Netscape's, but with "up to" replaced with "at least".
And from that day on, nobody knew the limits in the standard and everything was up in the air, just so sites could work on IE4 and be broken on Netscape. Thanks, Microsoft!
I did some experiments to test the actual URL limit of IE. At the time it was around 4MB, but IE would still go over it if you got creative with hostname levels and odd schemes.
--
quick edit:
Keep in mind that in 1993 the money from giving out free browsers was on the servers: Netscape server vs Microsoft IIS (just like today, where the money in giving away free browsers is in making it easier to access YOUR content -- e.g. default search, etc.).
Making your browser crash the competitor's server meant that server was seen as lower quality. (Same thing with Google deliberately crashing the performance of Firefox on their services today[0].)
The point of Microsoft making this change was to force Netscape to update their server as Microsoft arbitrarily increased the URL limit for all IE users.
I was on a 12-person failed project, the kind where you owe millions to the govt. We had a problem with the search: we couldn't get the performance.
I told my boss: “See, they wrote ‘The old search responded in 2 seconds. The new search must take at least the same time.’ We could almost add a sleep(2000) before starting the search.”
He went with it. They made a deal to drop the requirement on the performance of the search by "mutual agreement."
Ah yes. Checkbox Driven Development. AKA Monkey Paw Development, where you give exactly what was asked for; it remains surprisingly popular in the government and enterprise spaces.
I've worked in such places. The reason it is that way is because you will receive a broken description/specification/story of what you are supposed to implement. You have a choice to make when that happens, you either implement it as specified or you reject it because it is broken. The problem is that if you do reject it then it will take about 6 months to get back a specification that is broken in another way and then you have to make the same choice...
So after a few iterations you just say "fuck it" and implement it as specified and then hope that you get a chance to fix it before shipping it (or that it doesn't become your headache later on...).
I've been there too, and I know. I'm not speaking to the choices devs make (rock and hard place, as you say), but the choices the org makes. For government, the work is driven by Congress's procurement process, but for enterprise it's entirely down to upper leadership's perceived need to avoid risk. Which is ironically hilarious, since such approaches guarantee higher risk, in that they pretty much universally lead to late delivery of broken features.
Enterprise developer here. Exactly this. If you reject the spec, you won't get another one before the deadline that was committed before you got the spec you want to reject.
The intention usually isn't obvious. The stakeholders have spent so much time documenting, in depth, the solution that they want, while spending no real time documenting or communicating the specific intricacies of the -problem-.
That's the issue in a nutshell; Checkbox Driven Development implies "if we just define the solution well enough upfront, we'll get what we need!" instead of "if we define the problem well enough, and let dev pitch us solutions, and iterate as we go, we'll get what we need". Which implies that the devs are not to be trusted to come up with a solution themselves.
To deviate from expectations and be congratulated, you have to, A. Be certain you're doing the right thing, and B. Have an audience that can recognize you did the right thing. Both of those require a level of trust that is just missing in this sort of org.
Exactly, and sometimes fixing the issues with the spec means going far enough off script that it gets very obvious to people you don't want attention from.
Yeah, I just coined it while making the post. :P Less a "also (currently) known as" and more of a "also (should be) known as". Certainly how I'll be referring to it in cynical moments from here on out.
Requirements are hard upfront, period, to the point I'd say that any organization trying to set them upfront is dysfunctional, tautologically. Making all the decisions when you have the least amount of information is a Bad Idea.
There are requirements that will affect architecture that, if they're guessed at and turn out to be wrong, will lead to massive refactoring and/or large amounts of effort being thrown out. 100%.
Where I disagree from most businesses is in the implicit belief they have that seems to be "better for devs to be idle than for devs to work on code that will be thrown away". I'd rather take a guess and start work; best case we're right and are ahead; worst case we're wrong and have learned some useful lessons.
Which you'll note is the same dilemma as every other decision related to the project, with the only difference being the scope.
Do you happen to have the exact wording? As far as I can tell these mean the same thing.
1. "You must support URL length up to 100 characters" -> your browser must support URLs that are 100 characters or less (and may or may not support longer ones)
2. "Your supported URL length must be at least 100 character" -> You must support URLs that are 100 characters or less (and may or may not support longer ones)
So if you were a programmer on a project and you were given a spec that says "up to 100", you would just make it unbounded, and for all intents and purposes completely ignore the spec?
"Must" and "Must Not" are keywords in formal spec. If it says "Must support up to 100" and doesn't say "Must Not support over 100" then I would consider the upper limit to be whatever limit is sane for the data type.
So you would pick an arbitrary upper limit based on your own notion of what is sane. Picking such a limit, you would still need to write the same error handling code for limits, but it would happen at maybe 200. And the next programmer who inherits your code looks at the spec and your code and has to guess "why 200"? And it becomes lore. Which is specifically worse than writing to the spec.
I can see where you're coming from; it does read like "MUST support up to 100 characters (and MAY support more if you choose)".
But honestly I think it's a bad practice to build the "may" part, because it's not explicit. The person who wrote the spec just as easily could have intended it to be "MUST support up to 100 (and may not go over 100)". So by not setting a bound you're gambling with your implementation being rejected, but setting a bound at 100 satisfies both possible "implied clauses" of the requirement and should not be rejected.
I spent some time looking at similar specs for more recent browsers, but wasn't able to find anything useful. This was for a proof-of-concept I made that stores entire web pages in URLs (creatively named "URL Pages") by base64-encoding them and putting them in the URL fragment (the part after the "#").
The URLs this thing generates get pretty damn big sometimes, since I never got around to implementing compression. I can confirm that massive URLs from pages with inline images do work, but probably take some not-so-optimized code paths because they make my computer's fans spin up. Click at your own peril:
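The construction itself is simple. A rough sketch of building such a URL (the viewer domain here is a placeholder; the real project decodes location.hash client-side):

    import base64

    page = "<!DOCTYPE html><html><body><h1>Hello from a URL</h1></body></html>"

    # Everything after "#" never reaches the server; a small viewer page
    # decodes and renders it entirely in the browser.
    fragment = base64.b64encode(page.encode("utf-8")).decode("ascii")
    print("https://example-viewer.test/#" + fragment)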
I made a similar service to store arbitrary files as URLs. The hard part is files that are too large; I can handle files up to 5MB if you click on them all, via local storage. Compression helps a lot, as base64-encoding them increases the size quite a bit.
Yes, I did this for self-contained reports in maybe 2014. All referenced images (containing diagrams) were embedded as data URIs. Restrictions are AFAIK pickier now, though, so YMMV in 2021.
I needed to send data over GET in 2012/2013 and built my own tiny LZW-alike compression to squeeze as much as possible into the 100KB that seemed to be the safe limit for non-IE browsers at the time.
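These days you'd probably reach for zlib instead of hand-rolling LZW. A quick sketch of the size tradeoff (numbers vary with the input; the base64 overhead is roughly a third):

    import base64
    import zlib

    data = ("some fairly repetitive report text " * 200).encode("utf-8")

    plain = base64.urlsafe_b64encode(data)
    packed = base64.urlsafe_b64encode(zlib.compress(data, 9))

    print(len(data), "raw bytes")
    print(len(plain), "bytes as plain base64 (about a third bigger)")
    print(len(packed), "bytes as zlib + base64")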
That's really interesting, I'd wondered if that was feasible! A few years ago I needed to send myself notes and URLs from a work computer to look at later, so I put it into the browser as https://< my website >.com/index.html?saveforlater=note%20to%20myself
When I got home I'd search the server logs for "saveforlater" and retrieve my note. Though it might have been faster to just write it on a slip of paper.
This is true, but linking to data URIs no longer works. Many browsers block them for "security reasons." In Firefox, a link to that page is not clickable for me:
From a convenience standpoint, it's also far less likely that a URL with an http: scheme will be blocked by a random web application than one with a data: scheme. For example it makes sharing on social media sites and chat applications more feasible.
I don't know. There are a lot of problems, but to me "at least" sounds like the more helpful phrasing. Browsers run in such heterogeneous compute environments (even back then) that "up to" basically cripples you to the lowest common denominator of all the platforms you target. "At least" makes it mostly the HW vendor's problem. Sure, MS ran into this problem more because Windows ran on such a large range of HW, but think about what the world would look like today if browser vendors put caps on desktop browsers based on what mobile could support.
EDIT: For some limits. For other limits "up to" wording may be more appropriate & is still in use (e.g. storage).
"At least" seems like a very good way of introducing a DoS vector.
I think that 1024 was probably too short as a limit, but I think that it does make sense to impose an arbitrary upper bound to reject malformed requests early.
I don't see what you mean by "the HW vendor's problem", I can assure you that any browser in existence is going to have an issue if you send a 1TB URL, while the NIC will have no issue transmitting it.
And here's the answer to the sibling asking why it's a problem, since they mean exactly the same thing in practice :)
What it literally means and what people understand when reading it aren't the same thing. In this case, for people creating sites, "up to" leads immediately to the real meaning of the phrase, while "at least" strongly implies the opposite. But for people creating browsers, the implication is inverted.
The URL can be up to 1024 characters. The browser must support at least 1024 character URLs.
They're 2 sides of the same coin, but MS didn't actually rephrase the sentence properly. Their version would have every URL have at least 1024 characters in it. Any less than that, and the browser should reject the URL as invalid.
> Any less than that, and the browser should reject the URL as invalid.
lol. That would have been awesome. Domain squatters would be running for the 1000-character names while crying about all the money they paid for three-letter ones :)
I remember there was a site/tool that fit whole web page contents within its URL. And it was precisely limited by this "standard" where every browser behaves differently.
This is the kind of project we need more of. No politics, no perverse economic incentives. Just one human helping other humans utterly obliterate their address bar.
Unfortunately, due to the way your service performs redirects(?), link previews give away the game. If I post a shadyurl.com link I get a link preview of the target website, but link previews for a(x56) show a preview for your website. Here's an example from a Messenger conversation where I used both services:
This is going in the toolbox along with http://shadyurl.com