Arg, scooped! I was working on this exact same thing! :D
Since you've beaten me to it, let me offer up a couple of additional tricks you might want to use. If you want to make this completely independent of browser APIs, you can eliminate the dependence on window.location (or atob/btoa as the sla.ckers.org poster did).
Trick #1 is to get the letter "S".
You can extract this from the source code of the String constructor, but you want to be careful to make this as portable as possible. The ES spec doesn't mandate much about the results of Function.prototype.toString, although it "suggests" that it should be in the form of a FunctionDeclaration. In practice you can count on it starting with [whitespace] "function" [whitespace] [function name]. So how to eliminate the whitespace?
For this, we can make use of JS's broken isNaN global function, which coerces its argument to a number before doing its test. It just so happens that whitespace coerces to NaN, whereas alphabetical characters coerce to 0. So isNaN is just the predicate we need to strip out the whitespace characters. So we can reliably get the string "S" from:
[].slice.call(String+"").filter(isNaN)[8]
Of course, to get isNaN you need the Function("return isNaN")() trick, and you know how the rest of the encoding works.
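To make Trick #1 concrete, here is a minimal sketch in plain (un-encoded) JavaScript. The variable names are mine, and it assumes the engine's String source starts with optional whitespace, "function", whitespace, then "String", as described above. Note that it is whitespace that coerces to 0 and letters that coerce to NaN, as the PS further down points out.

```javascript
// Fetch the global isNaN through the Function constructor, as the encoding would.
var isNaNFn = Function("return isNaN")();

// Source text of the String constructor, e.g. "function String() { [native code] }".
var source = String + "";

// Split into single characters, then keep only those that coerce to NaN.
// Whitespace coerces to 0 (and digits to themselves), so those are dropped;
// letters and punctuation coerce to NaN, so they are kept.
var chars = [].slice.call(source);
var stripped = chars.filter(isNaNFn);

// "function" takes up indices 0..7, so index 8 is the first letter of the name.
var letterS = stripped[8];
```

Because leading whitespace is filtered out, the index 8 holds no matter how the engine pads the output.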
Trick #2 then lets you get any lowercase letter, in particular "p".
For this, we can make use of the fact that toString on a number allows you to pick a radix other than 2, 8, 10, or 16. Again, the ES spec doesn't mandate this, but in practice it's widely implemented, and the spec does say that if you implement it its behavior needs to be the proper generalization of the other radices. So we can get things like:
(25).toString(26) // "p"
(17).toString(18) // "h"
(22).toString(23) // "m"
and other hard-to-achieve letters.
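The pattern behind those three lines generalizes: in any radix above 10, the digit with value 10 is "a", 11 is "b", and so on, so a letter is the last digit of radix (value + 1). A rough sketch, where letterViaRadix is a hypothetical helper name of mine and not part of the original encoder:

```javascript
// Hypothetical helper: produce a lowercase letter via (n).toString(n + 1).
// Assumes the engine supports radices up to 36, which is widely the case.
function letterViaRadix(letter) {
  // "a" has char code 97 and digit value 10, hence the offset of 87.
  var digitValue = letter.charCodeAt(0) - 87;
  // In radix digitValue + 1, digitValue is the largest single digit.
  return digitValue.toString(digitValue + 1);
}

var letterP = letterViaRadix("p"); // same as (25).toString(26)
var letterH = letterViaRadix("h"); // same as (17).toString(18)
var letterM = letterViaRadix("m"); // same as (22).toString(23)
```

In the real encoding the numbers themselves would of course be built from the symbol set, and toString would be looked up by a string key.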
But once you've got "p", you're home free with escape and unescape, as you said in your post.
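For anyone wondering why "p" is the key: it is the one letter needed to spell escape and unescape, and those two give you arbitrary characters. A small sketch in plain JavaScript; the way I obtain "%" here is my own assumption about one workable route, not necessarily what the encoder in the article does.

```javascript
// Look up the legacy globals by name, as the encoding would via Function(...).
var escapeFn = Function("return escape")();
var unescapeFn = Function("return unescape")();

// escape("[") yields "%5B", so its first character is a percent sign.
var percent = escapeFn("[")[0];

// "%" followed by a two-digit hex code decodes to the matching character.
var upperA = unescapeFn(percent + 41);   // "%41"
var semicolon = unescapeFn(percent + "3b"); // "%3b"
```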
PS I flipped the logic in my explanation; whitespace coerces to 0 and letters coerce to NaN. Which is why filter removes the whitespace and not the letters.
If you like reducing programs to basic expressions you should read up on SKI combinator calculus and the X combinator.
Here is a paper that describes the construction of an efficient X combinator[1].
Reading the paper gave me insight into how simple yet powerful combinatory logic is.
I evalled all pieces of JavaScript of <30 characters in Rhino, which takes 1 minute on my laptop. That gives 4219 possible values, after stripping out some really uninteresting stuff. It doesn't seem to contain anything interesting, unfortunately.
I am not sure about those results. I entered (+[][{}]+{})[+[]] into the Chrome console and got N (from NaN[object Object]) while your code lists it as u. If you replace the first +[] with a 0 you get a u (from undefined...). Interesting.
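The two cases can be checked side by side. This is just a sketch of the evaluation steps, with variable names of my own:

```javascript
// [][{}] looks up the key "[object Object]" on an empty array: undefined.
// Unary + turns undefined into NaN, and NaN + {} concatenates as strings.
var nanString = (+[][{}] + {});
var fromNaN = nanString[+[]]; // index 0

// With 0 in place of the leading +[], the lookup 0[{}] is again undefined,
// but no unary + is applied, so the string starts with "undefined".
var undefinedString = (0[{}] + {});
var fromUndefined = undefinedString[+[]]; // index 0
```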
Did you try to do this locally? The article explains that the "p" is picked up from window.location, assuming it's http or https. If you're using "file://...", that third character index is 'e' instead.
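A quick illustration of the difference, using hypothetical URL strings in place of the real window.location so it runs outside a browser:

```javascript
// Hypothetical stand-ins for what window.location + "" would produce.
var httpLocation = "http://example.com/page.html";
var httpsLocation = "https://example.com/page.html";
var fileLocation = "file:///tmp/page.html";

// Index 3 is "p" for both http and https, but "e" for file.
var fromHttp = httpLocation[3];
var fromHttps = httpsLocation[3];
var fromFile = fileLocation[3];
```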
Performance might not be too bad actually. My understanding is that he's building up a string with the code you run normally and then evaling it, so the performance might not be bad, aside from the start-up cost. Bandwidth... I don't want to speculate on that one :)
Actually, I did a small test and found that after gzip, the file size only expands by about 10x. Running both input and output through bz2, the obfuscated file only comes out 3x larger. If you were very protective of your code, and you had enough of it to justify loading up a bz2 decoder on the client side, you could actually make that economical bandwidth-wise.
That said, this was a very small test; the original file was a random snippet of JS code less than 500 bytes, and that itself took a considerable amount of time for hieroglyphy to chew on, so I can't really do a proper test of a larger input file.
I think if you wanted to make this really robust, you could use more techniques to beef it up.
One technique would be to store verbose or commonly-used string constants in accessible locations like Array.prototype.f. Then you could access, say, the string "prototype" by simply writing
[][(![]+[])[+[]]]
Once you build up a little scratch storage of the most common or hard-to-encode strings, everything starts getting orders of magnitude smaller.
(Technically, this means that you're polluting the space shared with the program being encoded, so for everything to work the program can't make use of it. But that's a pretty simple invariant to ask of the input program: "don't get or set the 'f' property of arrays.")
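Here is roughly how the scratch-storage idea plays out, written un-encoded for readability. The choice of the "f" slot follows the example above; the cleanup at the end is only there so the sketch doesn't leave the prototype polluted.

```javascript
// One-time setup: stash a verbose string on the array prototype.
// (In the real encoding, both the key and the value would be built from symbols.)
Array.prototype.f = "prototype";

// (![]+[]) is "false", and index +[] (0) of that is "f",
// so this reads the "f" property of an empty array via the prototype chain.
var key = (![] + [])[+[]];
var fetched = [][(![] + [])[+[]]];

// Cleanup for this sketch only.
delete Array.prototype.f;
```

Every later use of "prototype" then costs 17 characters instead of a full character-by-character encoding.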
Another technique would be to break up large statements into smaller substatements, to avoid fixed limits of JS engines on statement size. You can always avoid semicolons, since ASI is guaranteed to work if you start your statement with a !.
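A small sketch of the ASI point, with made-up statements. It assumes the substatements are separated by newlines, since ASI only kicks in at a line break here.

```javascript
// No semicolons anywhere: each new line starts with "!", which cannot
// continue the expression on the previous line, so a semicolon is inserted.
// (A line starting with "[" or "(" would instead be parsed as a continuation.)
var log = []
var total = 1 + 2
!log.push("first")
!log.push("second")
```

The leading ! just negates and discards each statement's result, so it doesn't change what the statements do.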
But think of how little complexity (in terms of characters) this language has! It reminds me of the small number of primitives needed to make a LISP machine. Or Brainfuck. Or Unlambda.
I mentioned in another reply that it actually does gzip fairly well (and bzips even better). But you still end up with an order of magnitude expansion with gzip (as compared to gzipping the original source), and around 3x expansion with bz2 (again compared to bz2 on the original source).
That's from a <0.5KB test input, so the expansion might be mitigated a little more for larger files. I was going to test on a 3KB microlibrary, but gave up after about 10 minutes of waiting for the conversion to finish.
... you'd need some preprocessing first. The ASCII characters `0' and `1' aren't easy to use to write a program, though you could do it with nasm and some '-' and '+'s I suppose.
If someone can prove me wrong, I'd be very happy though. Writing a program using just '0' and '1' (the ASCII characters) would be awesome. (in an established programming language, and no homomorphisms. :) )
Dave