> You may have a lot of things you do not want the world to know about. (Eg user uploaded media). Use robots.txt to hide them from search engines.
That's not what a robots.txt file is for. It doesn't hide files; if anything, it makes the files you don't want the world to know about easier to find.
For something like an image-hosting or pastebin site, isn't that possibly a privacy concern?
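To make that concrete (the paths below are made up): robots.txt is publicly readable at /robots.txt, so every Disallow line is effectively a signpost pointing at the content you wanted to keep quiet.

    User-agent: *
    Disallow: /uploads/private/
    Disallow: /admin/

Anyone (or any crawler that ignores the file) can fetch it and go straight to the interesting paths. If something must stay private, put it behind authentication or use unguessable URLs, not robots.txt.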
I'm curious, what is the standard practice for the storage of static media - where are the static files (text/images) stored on the server, and how are the appropriate config files that concern that media set up?
If the images are relevant then you're tossing away earned search engine traffic and extra value in the other ranking signals Google uses. If you have 1000 posts on gardening and an image for each with gardening in the filename, you're going to have an easier time ranking for gardening.
I feel like this is a checklist that makes you feel like you're accomplishing something (checking off items on the list) while never actually launching your website.
In a way, it gives you a bunch of excuses to put off your public launch - "I just need to fix some CSS bugs in Firefox 2" or "I just need to test the backup restore process one more time to make sure it works". Most of us aren't building bank software here, and if you're building anything consumer-facing (think Facebook), the advantage of having a site up and getting feedback on it over having all these things checked off is astronomical.
I agree with you on this, but for a beginner I think it's a great list to start with. I think a lot of people miss a lot of the things on this list and don't have a good launch because of it.
Hmm, 13 drawn-out items and none of them include a SQLi, XSS, or CSRF audit? Odds are there are plenty. Once your db and CEO's sexting logs end up on pastebin, you'll probably be thinking that the robots.txt was pretty minor.
Regarding 2: If you haven't done proper index/join stress tests with hundreds of thousands to millions of rows of data pre-launch, then I really don't know what else to say besides you're doing it wrong. Finding out two days after launch that your schema is complete garbage or improperly indexed is an amateur move. Placing JavaScript (that's not a shiv or script loader) in your <head> to try to disguise this isn't a good start.
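As a rough sketch of the kind of pre-launch check I mean (the schema, row count, and query are made up for illustration), seed a throwaway table and compare query plans with and without an index:

    # Hedged sketch: seed a throwaway SQLite table with a million rows,
    # then compare the query plan before and after adding an index.
    # Schema, row count, and query are made up for illustration.
    import random
    import sqlite3
    import string

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, views INTEGER)")
    rows = ((''.join(random.choices(string.ascii_lowercase, k=12)), random.randrange(10000))
            for _ in range(1_000_000))
    db.executemany("INSERT INTO posts (slug, views) VALUES (?, ?)", rows)

    query = "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE slug = 'gardening'"
    print(db.execute(query).fetchall())   # full table scan
    db.execute("CREATE INDEX idx_posts_slug ON posts (slug)")
    print(db.execute(query).fetchall())   # index search

The same idea applies to MySQL/Postgres with EXPLAIN; the point is to see the bad plan on realistic data volumes before your users do.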
Given that the reply seemingly has nothing to do with your post, I think this is a case of someone replying to a comment that is higher up on the page rather than starting their own thread.
I know the text says "... or equivalent software," but is there any great advantage to using Google Analytics over a component hosted on your own server?
I feel it's impolite to subject your users to tracking by Google and other huge companies just because they visited your website.
Is there at least any tracking service that respects Mozilla's DNT[1]?
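If you run the tracker yourself you can at least honour the header directly. A minimal sketch (WSGI-style; the endpoint and log file are hypothetical, not any particular product's API):

    # Hedged sketch: a self-hosted tracking beacon that honours DNT before
    # logging anything. The log file and endpoint are made-up examples.
    import logging

    logging.basicConfig(filename="visits.log", level=logging.INFO)

    def beacon(environ, start_response):
        # A "DNT: 1" request header shows up as HTTP_DNT in the WSGI environ.
        if environ.get("HTTP_DNT") != "1":
            logging.info("%s %s", environ.get("REMOTE_ADDR"), environ.get("PATH_INFO"))
        start_response("204 No Content", [])
        return [b""]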
Piwik [1] is a great alternative to Google Analytics. It is free, open source software, and it is typically self-hosted.
While I do not think it supports Do-Not-Track at this time, Piwik does have configuration options to respect emerging laws on data protection and user privacy.
> I feel it's impolite to subject your users to tracking by Google and other huge companies just because they visited your website.
It's also probably illegal in Europe now, though at least in the UK the authorities have declared a one year moratorium on taking action over it because the new privacy rules on cookies etc. are practically unworkable even if well-intentioned.
Analytics software on your own server has the advantage of tracking requests for images, CSS, JS, and other non-HTML files, and it still works when your users run add-ons like NoScript, Adblock, or Ghostery.
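A toy version of that idea, assuming a combined-format access log at a made-up path:

    # Hedged sketch: count hits per URL straight from the web server's
    # access log. This sees image/CSS/JS requests and is unaffected by
    # client-side blockers. Log path and combined log format are assumptions.
    from collections import Counter

    hits = Counter()
    with open("/var/log/nginx/access.log") as log:
        for line in log:
            parts = line.split('"')
            if len(parts) > 1:
                request = parts[1].split()   # e.g. ['GET', '/logo.png', 'HTTP/1.1']
                if len(request) >= 2:
                    hits[request[1]] += 1

    for path, count in hits.most_common(10):
        print(count, path)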
That's ugly. But who cares? A sophisticated user (read: anyone using Pingdom) won't be using the same password for their Pingdom account and their email. Though the consequences if they did would be ugly.
Same here, although I don't use a password manager; I generate passwords from a fixed seed and the website's domain using a simple algorithm. One less backup to depend on ;)
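For anyone curious what that kind of scheme can look like (this is only a sketch of the general idea, not the parent's actual algorithm), an HMAC of the domain under a memorized master secret does the trick:

    # Hedged sketch of the general idea (not the parent's actual scheme):
    # derive a per-site password from a memorized master secret plus the
    # site's domain, so there is nothing to store or back up.
    import base64
    import hashlib
    import hmac

    def site_password(master_secret: str, domain: str, length: int = 16) -> str:
        digest = hmac.new(master_secret.encode(), domain.encode(), hashlib.sha256).digest()
        return base64.urlsafe_b64encode(digest).decode()[:length]

    print(site_password("correct horse battery staple", "example.com"))

The obvious trade-off: if the master secret leaks, every derived password goes with it, and you can't rotate a single site's password without changing the scheme.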
Sending an email on every error might be a bad idea. We had this set up on a server that powers an API (along with our main site). During a network problem in the data center, when the server could not reach MySQL, we got about 10,000 requests on the server = 10,000 emails = the Outlook guys having to reboot the Exchange server because of some odd interaction with their spam-filtering appliance.
I have a log that will mail me the first time an error occurs. If it happens again, it will still be logged, but I don't get an email. Once I mark the error as resolved, if it happens again, I get an email.
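For reference, the core of that dedupe logic is tiny. A sketch (in-memory state; a real version would persist it and the resolve step would come from an admin UI; all names here are made up):

    # Hedged sketch of "email only the first occurrence until resolved".
    # State is kept in memory; names and the send_mail stub are made up.
    seen_errors = set()

    def send_mail(subject: str) -> None:
        print("EMAIL:", subject)   # stand-in for a real mail call

    def report_error(signature: str, message: str) -> None:
        if signature not in seen_errors:
            seen_errors.add(signature)
            send_mail(f"New error: {message}")
        # always write to the log regardless of the email decision
        with open("errors.log", "a") as log:
            log.write(message + "\n")

    def mark_resolved(signature: str) -> None:
        seen_errors.discard(signature)   # next occurrence emails again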
We solve this by having the e-mails come out in hourly batches for errors, with a separate, single, immediate e-mail for "the site is down" type problems.
I was a bit surprised to see no mention of security. Any website checklist that includes backups because "your website data is too precious" should have a security assessment on it.
OWASP is an awesome project. If you really want to be secure, use that as a guideline! Thanks for reminding me of that; I couldn't remember the name. All I could think of was WebScarab...