For my personal website, I have gone back and forth on using "cool URIs" without the ".html" extension. When I first began building my website in the early 2000s, I configured my web server to handle requests to /blog/{slug} by serving the corresponding {slug}.html file stored on disk. Over time, however, I opted for simplicity and got rid of such server configurations. I now simply expose /blog/{slug}.html in the URLs.

The popular "Cool URIs don't change" article at <https://www.w3.org/Provider/Style/URI> says:

> What to leave out

> ...

> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid.

But I have been running my website for over 20 years now, and I do think I'll stick with ".html" for the foreseeable future. This, combined with the fact that I strictly use relative links for cross-linking between pages and for loading CSS, images, favicons, etc., means that I can also browse my website offline (directly from my local disk) simply by opening the local index.html file in my web browser.



I recently thought through this problem and came up with the concept of building a list of "candidates" for a given URL. The caller then loops through them and returns the first candidate that actually exists. It's a nice boundary between functions. I wrote up my solution in literate Markdown (and JavaScript) here [0].

(Apart from supporting optional extensions, this code also throws an error if someone prepends dots to the URL, which, for me, indicates someone probing the server for weaknesses rather than a legitimate request.)
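For illustration, here is a minimal Python sketch of the candidates idea. It is not the code behind [0] (that write-up is JavaScript). It assumes all files live under a single document root and treats any path segment starting with a dot (such as `..` or `.git`) as a probe; the names `url_to_candidates`, `resolve` and `SuspiciousPath` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


class SuspiciousPath(ValueError):
    """Hypothetical error for URL paths that look like probing."""


def url_to_candidates(url_path: str) -> List[str]:
    """Return relative file paths to try for a URL path, in priority order."""
    segments = [s for s in url_path.split("/") if s]
    # Any segment starting with "." (e.g. "..", ".git") is treated as a probe.
    if any(s.startswith(".") for s in segments):
        raise SuspiciousPath(url_path)
    rel = "/".join(segments)
    if not rel:
        return ["index.html"]
    if url_path.endswith("/"):
        return [f"{rel}/index.html"]
    # Exact name first (the URL already has an extension), then the
    # optional ".html" extension, then a directory index.
    return [rel, f"{rel}.html", f"{rel}/index.html"]


def resolve(root: Path, url_path: str) -> Optional[Path]:
    """Return the first candidate that exists as a regular file under root."""
    for candidate in url_to_candidates(url_path):
        path = root / candidate
        if path.is_file():
            return path
    return None
```

Keeping candidate generation pure (no filesystem access) makes it trivial to test, while `resolve` is the only part that touches the disk.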

The funny thing is that I still often use file extensions, since IntelliJ only lets me easily navigate to a file or check that it exists if I use the extension.

Eventually I'll support slugs in the filename by just ignoring everything after the first dash.

[0] https://simpatico.io/reflector#urltofilename


How I wish they were right about .html. I wish we had something else by now.

Personally, I'm a fan of including a post ID in the URL, e.g. /category/123/post-name. That way, if you want or need to change the URL later, you can simply parse the URL to get the ID back and create redirects. Many sites, large and small, don't implement redirects, which makes me sad.
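A rough Python sketch of the ID-based redirect idea, assuming URLs shaped like /category/123/post-name and a hypothetical in-memory `CANONICAL` table that maps post IDs to their current paths (a real site would look this up in its own data store). Only the numeric ID is trusted; the category and slug are ignored when matching.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical data: post ID -> current canonical path.
CANONICAL: Dict[int, str] = {
    123: "/category/123/post-name",
    124: "/new-category/124/renamed-post",
}

# /<category>/<numeric id>[/<slug>][/]; only the ID is used for lookup.
POST_URL = re.compile(r"^/[^/]+/(\d+)(?:/[^/]*)?/?$")


def route(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_location) for a request path."""
    match = POST_URL.match(path)
    if match is None:
        return 404, None
    canonical = CANONICAL.get(int(match.group(1)))
    if canonical is None:
        return 404, None  # unknown ID
    if path == canonical:
        return 200, None  # already the current URL
    return 301, canonical  # category or slug changed: permanent redirect
```

With this shape, renaming a post or moving it to another category only means updating the table; old links keep working through the 301.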

I think there was a news site acquired by Bloomberg; I forget the name. When you visited an article on the old domain, it redirected to a landing page on Bloomberg saying the site was now part of Bloomberg, instead of redirecting to the article's new URL.


> How I wish they were right about .html. I wish we had something else by now.

You can thank the browser complexity moat for that. If browsers were simpler to implement someone would have started experimenting with this (markdown at least) years ago and other browsers would have picked it up.


I am now leaning towards the same approach. In 20 years, you could always serve html files and serve a new format alongside (for example, markdown).


How many hypertext formats apart from HTML are supported without plugins on major browsers?

Asking genuinely, I don't know, but it's an important fact to take into account if you're planning ahead.


SVG? Maybe XML/XSLT? We also have PDF (yes, I know it's not text). Otherwise, none that I know of.

Using plugins, you could think about Markdown, wiki markup, ...


PDF is handled via an internal plugin. A standards-compliant web browser doesn't have to do anything with PDF; major browsers simply ship an internal type handler for it.

A similar type handler is engaged for XML. That is, unless you can use W3C standards to implement a custom markup language with XML/XSLT and have it work across browsers without plugins.

SVG is vector graphics.

For another full markup to be even considered there would have to be one that's widely adopted and realized through plugins. Nobody is making interventions in standards to open up venues for easy implementation of custom markups when those markups are used by 0.001% of publishers.


Markdown, wiki markup, etc. have been around for a long time and there has never been any talk of supporting them natively in the browser.

I don't see why that would change.


A great example of browser complexity moats holding back potential useful innovation.

If browsers were easier to make, someone could experiment with content negotiation for Markdown and rendering it client-side.


Yeah, sending a .md for client-side rendering would allow the client to reformat it more easily based on user preferences. Then again, Safari's and Firefox's reader modes already do an OK job with HTML for this.


But we could go so much further than reader mode. Users should have way more control over how content is rendered. But I'm something of an extremist. I don't really consider CSS/JS part of the web.


I don't really agree about CSS/JS, but either way, I've been in plenty of situations operating informational sites that just want to serve mixed text/image without worrying too much about how it's formatted. Unfortunately there isn't such an option. Regular HTML tags are supposed to do this, but most browsers won't format those in a modern-looking way. It'd save a lot of collective time if they could.


Back when those "informational" sites were normal, 15 years ago, browsers like Opera had user CSS that you could use to override a site's styles, and they came with a number of presets. You could format a site to look like C64 BASIC.

The stuff you're talking about isn't about browsers; it's about the websites.

If you had a website that used JavaScript to parse Markdown (or any other markup) and spit it out as trivial HTML with a light DOM, client-side formatting could do everything you want.

The problem is that modern websites use patterns that work around users' ability to customize the presentation of the website. They do not want you to look at their site the way you want.


Browsers can reformat clean HTML easily in theory, but I mean the defaults aren't nice, and most users aren't changing them. You have to use CSS to make a site look good by default.

I guess the best solution to that isn't browser-side .md rendering, though.


How is this any different from rendering PDF in the browser? PDF is not a web standard. Browsers choose to ship an internal plugin to handle PDF.


From my perspective, this W3C recommendation is somewhat stupid.

I don't expect that url.html is a static HTML file. I expect it to be server-side generated in 2024. For me, site.com/page and site.com/page.html are the same. I do not expect different behavior on the client side. So I could switch backend engines every year and just keep routing requests for page.html, and that's it.

What's way worse than this is using non-HTML extensions for emitting HTML. I go to pichost.com/image.jpg and I get a webpage served. This is a bad pattern and it needs to go away. And I'm not even going into responding differently depending on user-agent or referrer: if you have the right combination of these, you get a JPG; if you don't, you get a webpage.


> What's way worse than this is using non-HTML extensions for emitting HTML. I go to pichost.com/image.jpg and I get a webpage served. This is a bad pattern and it needs to go away. And I'm not even going into responding differently depending on user-agent or referrer: if you have the right combination of these, you get a JPG; if you don't, you get a webpage.

It's mostly based on the Accept header these days (browsers don't tend to include HTML there in image contexts) and the Referer should have been removed decades ago. This means browsers (the ones with a large market share at least) are 100% complicit in enabling this behavior.
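A sketch of the kind of Accept-header check described above, in Python. It assumes a hypothetical image host deciding between an HTML page and the raw JPEG for /image.jpg. It honors q-values only for text/html and deliberately ignores wildcards such as */*, since browser requests made in image contexts commonly include one. Full content negotiation (RFC 9110) is more involved than this.

```python
def wants_html(accept: str) -> bool:
    """True if text/html is explicitly listed in the Accept header with q > 0.

    Wildcards like */* are ignored on purpose: counting them would make
    image requests that end in */* look like they accept HTML.
    """
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, params = fields[0].lower(), fields[1:]
        if media_type != "text/html":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q: treat as not acceptable
        return q > 0
    return False


def choose_representation(accept: str) -> str:
    """What a hypothetical image host would send for /image.jpg."""
    return "text/html" if wants_html(accept) else "image/jpeg"
```

Because the decision hangs off a header rather than the URL, the same /image.jpg legitimately yields two different representations, which is exactly the behavior being criticized.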


The HTTP standard specifies this behaviour.

HTTP has no concept of a file extension.


> I don't expect that url.html is a static html file. I expect it to be server-side generated in 2024.

Needless complexity if all you need is best served by a static html file.


Agreed... but that's not what I was talking about. HTTP has no files or extensions; it's just a URL that someone named dot-something. Since there doesn't have to be a file of that type behind it, I don't expect there to be one.


The internal framework we have at my company directly ties the endpoint's extension to the MIME type the controller is expected to return. So with endpoint.html / endpoint.xml / endpoint.json / endpoint.csv, you always know what you are getting. Only the implemented extensions work, defined per controller; there's no magic here.

There is an escape mechanism for making endpoints without an extension but we rarely use it.

It’s a weird design I probably wouldn’t make these days, but for debugging at a glance it’s honestly pretty nice to look at the stream of requests and just know the type of each.
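A minimal Python sketch of extension-driven dispatch along these lines, where only the extensions a controller registers are routable and the extension alone decides the Content-Type. The `CONTROLLERS` registry, the `endpoint` controller and `dispatch` are hypothetical stand-ins, not the commenter's internal framework, and the extensionless escape hatch isn't modeled.

```python
import json
from typing import Callable, Dict, Optional, Tuple

# Extension -> Content-Type: the suffix alone determines the response type.
MIME_TYPES: Dict[str, str] = {
    "html": "text/html",
    "json": "application/json",
    "csv": "text/csv",
}

Renderer = Callable[[dict], str]

# Hypothetical registry: controller name -> {extension: renderer}.
# Only extensions listed for a controller are routable.
CONTROLLERS: Dict[str, Dict[str, Renderer]] = {
    "endpoint": {
        "json": lambda data: json.dumps(data),
        "csv": lambda data: "\n".join(f"{k},{v}" for k, v in data.items()),
    },
}


def dispatch(path: str, data: dict) -> Tuple[int, Optional[str], str]:
    """Return (status, content_type, body) for a path like /endpoint.json."""
    name, dot, ext = path.lstrip("/").rpartition(".")
    if not dot:
        return 404, None, ""  # no extension: escape hatch not modeled here
    renderer = CONTROLLERS.get(name, {}).get(ext)
    if renderer is None:
        return 404, None, ""  # extension not implemented for this controller
    return 200, MIME_TYPES[ext], renderer(data)
```

Here /endpoint.html returns 404 because the controller never registered an HTML renderer, which is the "no magic" property described above.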


That's an interesting choice. I like it from an ease-of-use perspective, but I don't love it from the perspective of knowing what you're actually accessing, i.e., if it's a .json URL, I'm expecting to be served a static JSON file rather than a script that serves JSON dynamically. I feel the same way about certain uses of HTTP status codes: if I get a 404, I expect it to be because the page wasn't found, not because a POST parameter was wrong. The worst offenders don't even serve an error message with the status code, but I'm getting off track here.


That's clearly incorrect semantics, and should be 400 Bad Request. Unfortunately the semantics of HTTP status codes are unenforceable with some obvious exceptions.

There's no excuse for not implementing them properly, however. I'm less of a fan of the existence of verbs, which I consider to be part of the URI that just isn't written in the URI itself. Things would be better if one URI were one endpoint, rather than potentially as many endpoints as there are verbs.


Most people have a /blog/{slug} directory with an index.html inside it. This is also a nice place to put images and other files you only include in a single page.


> Most people have a /blog/{slug} directory with an index.html inside it.

That's /blog/slug/ which should return the default file for that directory or generate an index (ahem) of what's in that directory.

./slug.html <-> /blog/slug

./slug/index.html <-> /blog/slug/
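The mapping above can be sketched as a pair of Python functions, assuming the blog's files live in one directory served under a /blog prefix (the function names and the `prefix` parameter are hypothetical). Unlike a fallback-candidates approach, this is strict: /blog/slug and /blog/slug/ are different resources backed by different files.

```python
from pathlib import PurePosixPath


def file_to_url(rel_file: str, prefix: str = "/blog") -> str:
    """Map a file in the blog directory to its public URL.

    slug.html       -> /blog/slug
    slug/index.html -> /blog/slug/
    """
    path = PurePosixPath(rel_file)  # collapses a leading "./"
    if path.suffix != ".html":
        raise ValueError(f"not an HTML file: {rel_file}")
    if path.name == "index.html":
        parent = path.parent.as_posix()
        return f"{prefix}/" if parent == "." else f"{prefix}/{parent}/"
    return f"{prefix}/{path.with_suffix('').as_posix()}"


def url_to_file(url: str, prefix: str = "/blog") -> str:
    """Inverse of file_to_url for URLs under prefix."""
    if not url.startswith(prefix + "/"):
        raise ValueError(f"outside {prefix}: {url}")
    rest = url[len(prefix) + 1:]
    # A trailing slash (or the bare prefix) means "directory index".
    if rest == "" or rest.endswith("/"):
        return rest + "index.html"
    return rest + ".html"
```

Since the two functions are inverses, a static site generator could use `file_to_url` to build links and a server could use `url_to_file` to serve them without the two ever disagreeing.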



