HTML5 Security Cheatsheet: What your browser does when you look away... (html5sec.org)
74 points by tshtf on Nov 16, 2012 | 14 comments



I don't get it. This page basically shows all available event handlers and other attributes for HTML elements and says "you can put JavaScript here". Well, thanks.

Letting your users write HTML/CSS (or not escaping input) is a bad idea to begin with.


Sometimes you have to allow some user input, like <b>, <font>, or <i> tags in those WYSIWYG editors. This is a reference for what you should remove from your whitelist or add to your blacklist.


The right way to handle this problem is to scrape the content out of the incoming HTML, do a best-effort pass at remembering the formatting rules expressed in it, entity-encode everything, and regenerate the HTML markup from scratch.

It is never a good idea to accept HTML from a client, attempt to clean it up, and then pass it directly into the DOM of a server-generated page.
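
A minimal sketch of the "parse, encode, regenerate" approach described above, using Python's standard `html.parser`. The `ALLOWED_TAGS` policy and the names `Rebuilder` and `rebuild` are assumptions made for illustration, not something from the cheatsheet. Nothing from the input is copied through as markup: allowed tags are re-emitted without attributes, text is entity-encoded, and everything else is dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: only these formatting tags survive, with no attributes.
ALLOWED_TAGS = {"b", "i", "em", "strong"}


class Rebuilder(HTMLParser):
    """Collects text and allowed formatting, then emits fresh markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []   # allowed tags currently open, innermost last
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Attributes are never copied, so on* handlers cannot survive.
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.open_tags:
            # Close everything up to and including the matching tag.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is entity-encoded before it goes anywhere near the output.
            self.out.append(escape(data))


def rebuild(untrusted_html):
    parser = Rebuilder()
    parser.feed(untrusted_html)
    parser.close()
    # Close anything the input left open so the output stays balanced.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

With this policy, `rebuild('<b onfocus="alert(1)">hi</b><script>alert(2)</script>')` returns `<b>hi</b>`. A production sanitizer would need a parser that matches browser behaviour more closely than `html.parser` does; the point here is only that the output is generated, not filtered.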


I can't believe anyone has attributes like "onfocus" on their whitelist.

Although I respect the site's aim for completeness, the whole site could be shortened to one example of each issue. One example of "on..." attributes, one example of "javascript:" URLs, and so on. I don't see the value of the second, third, yet-another "on..." example. This is just hiding the deeper issues in a mess of seemingly clever examples.

Regarding the blacklist proposal, I really hope that nobody is seriously using those for HTML! One typo, one forgotten entry, or one new browser feature, and the blacklist's security drops to zero.


Still, OP's right. If you need to let them use plain HTML (it's better to use markdown or something similar) just parse it and remove any attributes and tags not on your whitelist.


Or make them use Markdown, org-mode syntax, BBCode or s-expressions ({b text in bold {i and italics}}). User formatting should never go directly to the browser, without being reinterpreted by the website.
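
As a rough illustration of "reinterpreted by the website", here is a small sketch of a renderer for the brace syntax in the example above. The `TAGS` table and the `render` function are hypothetical; the grammar (a known name, one space, then content) is an assumption, since the comment only gives a single example. The user never writes HTML, so the only tags in the output are the ones the renderer itself emits.

```python
from html import escape

# Hypothetical mapping from markup names to the HTML tags the site emits.
TAGS = {"b": "b", "i": "i"}


def render(src):
    """Render {name text {name nested}} markup to HTML; all text is encoded."""
    out = []
    stack = []  # closing strings for the groups currently open
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "{":
            # Read the name: the letters directly after the brace.
            j = i + 1
            while j < n and src[j].isalpha():
                j += 1
            name = src[i + 1:j]
            if name in TAGS and j < n and src[j] == " ":
                out.append(f"<{TAGS[name]}>")
                stack.append(f"</{TAGS[name]}>")
                i = j + 1  # skip the name and the separating space
            else:
                # Unknown group: keep the braces as literal text.
                out.append("{")
                stack.append("}")
                i += 1
        elif ch == "}":
            out.append(stack.pop() if stack else "}")
            i += 1
        else:
            out.append(escape(ch))
            i += 1
    # Terminate any groups the author left open.
    while stack:
        out.append(stack.pop())
    return "".join(out)
```

For the example above, `render("{b text in bold {i and italics}}")` yields `<b>text in bold <i>and italics</i></b>`, and any `<script>` typed by the user comes out as encoded text.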


Yeah, I thought the same. However, there are some useful gems in there, the JavaScript embedded in SVG images being one example.

It's just a pity the good content is so watered down with dozens of obvious "you should sanitise user input" examples and variations of the same attacks.


I have the suspicion this might not be a guide for people who want to prevent attacks, but a guide for people who want to attack poorly secured sites.


Some of these are ingenious. Placing an input box far down a page, setting it to autofocus, and then reacting to body onscroll? I'd never have thought of that. Glad these references exist!

I have exploited the embedding of Flash to do XSS before. It's funny that while the site I was on heavily filtered any JS input in its rich text editor, you could easily upload a Flash file (served from the SAME domain!), and XSS it.


> Some of these are ingenious. Placing an input box far down a page, setting it to autofocus, and then reacting to body onscroll?

Although this sounds quite clever, I don't see the value in this. If your HTML whitelist contains "on..." attributes such as "onscroll", you have deeper issues than this clever trick. Almost certainly you are vulnerable to stuff like "onclick", too.

Except if you use blacklists where you added "onclick" and forgot to add "onscroll". But in that case, if you are using blacklists instead of whitelists, you are almost certainly doomed anyway. (see https://news.ycombinator.com/item?id=4794745)
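
A small sketch of that failure mode. The attribute sets and function names below are hypothetical, chosen only to show how the two filters differ when an entry is forgotten.

```python
# Hypothetical blacklist: the handlers someone happened to think of.
BLOCKED_ATTRS = {"onclick", "onload", "onerror", "onmouseover"}

# Hypothetical whitelist: the only attributes the site intends to support.
ALLOWED_ATTRS = {"href", "title"}


def blacklist_filter(attrs):
    """Keep every attribute that is not explicitly blocked."""
    return {k: v for k, v in attrs.items() if k.lower() not in BLOCKED_ATTRS}


def whitelist_filter(attrs):
    """Keep only attributes that are explicitly allowed."""
    return {k: v for k, v in attrs.items() if k.lower() in ALLOWED_ATTRS}


attrs = {"title": "hello", "onclick": "steal()", "onscroll": "steal()"}

print(blacklist_filter(attrs))  # onscroll slips through: it was never listed
print(whitelist_filter(attrs))  # only title survives
```

The blacklist fails open on anything it has not heard of, while the whitelist fails closed. Even an allowed attribute such as `href` still needs its value checked, since "javascript:" URLs are a separate issue from attribute names.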


I'm not sure what this is saying. If it's talking about things the browser can be made to do my response is "so? that's intentional".

If it's about things that can happen if you embed user (read: attacker) provided content in your page, then you have already lost. There's never a time that you can safely embed content from an untrusted source in your page. No blacklist- or whitelist-based filtering of that content is going to be safe. The correct approach to user-provided content is to parse it, drop anything you don't understand or recognize exactly, escape all of the leftover content, and reconstruct the markup at the end.

You could use Markdown to do this for you, or you could do it manually if you want your own rules (and/or <>-like syntax).

Filtering content is just not sound and every time I see something that seems to imply that it is, it makes me cry.

(An example of this taken to its extreme is WebGL shader parsing. A correct + "safe" implementation of WebGL must at the very least:

1. parse the shader itself, dropping all comments, etc.
2. perform strict semantic analysis on the result of <1> (especially as many GL drivers don't)
3. take the result of <2> and turn that back into text
4. throw the result of <3> at the GL engine

This is necessary to ensure not only that the shader is correct (in WebGL terms), but also that no parsing oddities can get through (e.g. something seen as a comment terminator in the driver but not the validator; bugs like this have happened with multiple validators in multiple contexts over the last few decades).)


This page could be much better with just a little bit more prose.


Quite a list. Kinda makes me feel bad for all the people writing HTML sanitizers :(


It's really not that difficult if you use a parser + whitelist. You don't have to care about this sort of thing if you limit people to using certain tags/attributes in WYSIWYG editors and other inputs.



