I'm going to ask a stupid question at my own peril... please go easy on me.
Does serving a custom 404 page still allow 404 as the response code? Or is it technically a 200, since the custom 404 page is being served successfully?
If you think about it, any response you get is a "200" by that definition since the server successfully gave you...something.
Browsers have no special handling for most response codes, so serving a 404 page with a 404 status code is fine and expected; it lets whatever is on the other end (browser, scraper, etc.) respond appropriately. A browser just renders the page, but if you were scraping, you'd want to ignore that result.
It is frustrating to work with APIs that return something like:
200
{
  "meta": {
    "status": 404,
    "message": "field <x> not found"
  }
}
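For what it's worth, client code that has to live with such an API usually ends up unwrapping the envelope itself. A minimal sketch, assuming the `meta.status` layout shown above; the function name `effective_status` is made up for illustration:

```python
import json


def effective_status(http_status: int, body: str) -> int:
    """Return the status a caller should act on.

    If the transport says 200 but the JSON envelope carries its own
    integer meta.status, trust the envelope. Otherwise trust HTTP.
    """
    if http_status != 200:
        return http_status
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return http_status  # not JSON, nothing to unwrap
    if not isinstance(payload, dict):
        return http_status
    meta = payload.get("meta")
    if isinstance(meta, dict) and isinstance(meta.get("status"), int):
        return meta["status"]
    return http_status


if __name__ == "__main__":
    wrapped = '{"meta": {"status": 404, "message": "field <x> not found"}}'
    print(effective_status(200, wrapped))
```

Every client has to carry a shim like this, which is exactly the extra work a real 404 status would have avoided.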
If you request a page like www.ycombinator.com/monkeybusiness, there is no page called monkeybusiness, so the web server will return a 404 error and display either a default page generated by the web server software or, optionally, a custom page.
Often that custom 404 page is a file on the server with its own URL that you never see.
If you happened to know that the page was www.ycombinator.com/error404.html, you could load that page directly and it would return 200 OK.
In very simplified terms, "404" is just a number that's included in the "invisible" HTTP header of the web page you're visiting. Whether this number is "200" (success) or "404" (not found), it doesn't affect how your browser renders the web page.
Technically you could have a website where you serve real web pages full of content using the (wrong) 404 code, or serve web pages that tell the user "not found" using the (wrong) 200 code. It would massively mess up bots, search engines, browser extensions, and any other software that needs to know whether a page actually exists - but it would be fully browsable like normal by a human with a web browser, since humans don't see or read HTTP headers.
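To make the point concrete, here is a rough sketch using only Python's standard library: a toy server that sends a branded body together with a 404 status, and a client that reads both. The page contents, the `Handler` class, and the `demo` helper are all invented for illustration, not how any particular site does it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site: one real page plus a branded "not found" body.
PAGES = {"/": b"<h1>Home</h1>"}
NOT_FOUND_BODY = b"<h1>Oops, nothing here</h1>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        status = 200 if body is not None else 404
        if body is None:
            body = NOT_FOUND_BODY  # still a full HTML page, just with a 404 status
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch(port, path):
    """Return (status, body); the body is read regardless of status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(server.server_port, "/"), fetch(server.server_port, "/monkeybusiness")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Both requests come back with a renderable HTML body; only the status number differs, and that number is what a bot or scraper would key off.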
Brave offers to check the Wayback Machine for a cached version of the page.
Basically: 404 tells crawlers that the URL is invalid.
The HTTP server also has to return something. It could simply return 0-length content and allow the browser to show its error page, but that wouldn't be "on brand."
I had a similar debate with coworkers about returning 404 when a DB webhook query found no rows. It added extra complexity to client code trying to figure out if it was a bad URL, bad query, or just no rows.
404 means the server cannot find the requested resource. In the case of a database, the "resource" is the database endpoint.
So, 404 would be used in case the database endpoint does not exist at the URL you tried to access it at. A query returning zero rows would be a "success" in HTTP terms.
Yep, for that 204 (No Content) exists. But if you serve a browser a page with a 204 status, it won't render anything, since a 204 response must not include a body, so the browser doesn't even look for one.
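The "unknown endpoint vs. zero rows" distinction could be sketched like this. It is only an illustration: the in-memory `TABLES` dict and the `handle_query` function are hypothetical stand-ins for a real database endpoint, and it returns 200 with an empty list rather than 204 so the client always gets a JSON body to parse.

```python
import json

# Hypothetical in-memory "tables", keyed by endpoint path.
TABLES = {
    "/users": [
        {"id": 1, "name": "ada"},
        {"id": 2, "name": "grace"},
    ],
}


def handle_query(path, name=None):
    """Return (status, json_body) for a query against an endpoint."""
    rows = TABLES.get(path)
    if rows is None:
        # The resource (the endpoint itself) does not exist: 404.
        return 404, json.dumps({"error": "no such endpoint"})
    matches = [r for r in rows if name is None or r["name"] == name]
    # The endpoint exists and the query ran: 200, even with zero rows.
    return 200, json.dumps({"rows": matches})


if __name__ == "__main__":
    print(handle_query("/users", "nobody"))
    print(handle_query("/orders"))
```

With this split, the client can treat 404 as "bad URL" and an empty `rows` list as "nothing matched" without guessing.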
It is a good question! The 404 status code is useful context. The browser (or user agent, crawler, your code!) can act however it likes in response to the 404. A browser will still render the page, hence all the funky 404 pages, since sites can apply their branding to the error. If this were not the case, the browser might show a generic message to the user (people would get used to this).
An example of where the browser might ignore the body is a 301 redirect.
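As a rough illustration of "the client can act however it likes", here is how a crawler might branch on the status code. The policy and the `crawler_action` name are invented for this sketch; real crawlers have their own, more elaborate rules.

```python
from http import HTTPStatus

# Statuses where the interesting part is the Location header, not the body.
REDIRECTS = (
    HTTPStatus.MOVED_PERMANENTLY,   # 301
    HTTPStatus.FOUND,               # 302
    HTTPStatus.SEE_OTHER,           # 303
    HTTPStatus.TEMPORARY_REDIRECT,  # 307
    HTTPStatus.PERMANENT_REDIRECT,  # 308
)


def crawler_action(status: int) -> str:
    """Hypothetical policy: decide what to do with a response by status alone."""
    if status in REDIRECTS:
        return "follow Location, ignore body"
    if status == HTTPStatus.NO_CONTENT:
        return "nothing to index"
    if status in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        return "drop URL from index"
    if 200 <= status < 300:
        return "index body"
    return "skip for now"


if __name__ == "__main__":
    for code in (200, 204, 301, 404, 503):
        print(code, crawler_action(code))
```

A browser makes a different choice for 404 (render the body anyway), which is why branded error pages work at all.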