If they had started with Python (or anything else) they would have ended up in the same place. They would end up writing their own compiler and VM because they want to and they can. It's actually amazing that they haven't extended the language itself yet.
(I work on HHVM.) We've dipped our toes into the waters of changing the language a bit. Setting the Eval.EnableHipHopSyntax and Eval.EnableHipHopExperimentalSyntax flags in HipHop enables a few differences, the most notable of which is a Python-inspired 'yield' syntax for generators.
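For readers who haven't seen the Python feature being borrowed, this is what generator `yield` looks like in Python itself (plain CPython, nothing HipHop-specific): a function that suspends its state between values instead of building a whole result up front.

```python
def fib():
    """Infinite Fibonacci generator; local state is suspended between yields."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

g = fib()
first_five = [next(g) for _ in range(5)]
print(first_five)  # [0, 1, 1, 2, 3]
```

The appeal for PHP is the same as in Python: lazy sequences and iterator-producing functions without hand-writing an iterator class.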
... or they would start their own project similar to Unladen Swallow.
I think picking a simple language like PHP makes it much easier to create your own fully custom stack once you need one.
Of course, Facebook's problems will never be your problems, so everyone should pick what is right for them at any given moment and stop dreaming about (im)possible futures.
Actually, I think it would be a different case: they probably would not need to reimplement the language in their own way, because the speed benefits are already there (when it comes to PyPy). Instead they would patch other libraries to work with PyPy on an as-needed basis, which seems like far less work than starting from scratch and implementing a whole language, and the community would benefit a lot from it too. But since it will never happen, it's just speculation at this point.
It's impossible to know what you'll need in the future. No one should be wasting too much time trying to figure it out.
How could Zuckerberg -- or anyone, for that matter -- have predicted the kind of rampant success Facebook has had? Who thought in 2005 that Facebook would become the largest photo sharing site on the web? Or that they would need to handle almost 140 million unique visitors per month?
If the largest problem of your business is your application stack, you are in a very good place. Most businesses have much larger problems like, say, cashflow.
The state of the art in JITs is LuaJIT2, despite almost no fanfare (and lots of fanfare by other JIT projects). I haven't had a look at hhvm yet, but if anyone wants a cool project, make hiphop emit Lua instead of C++, and run it under LuaJIT2.
LuaJIT2 really is magic, and Mike Pall is a humble but talented magician.
LLVM isn't the world's greatest target for a dynamic language, as Rubinius and Unladen Swallow found out. It does a lot of low-level optimization, but a basic prerequisite for making a dynamic language fast is being able to do some sort of optimistic type-based optimization, and LLVM has no infrastructure for that.
It looks like HipHopVM uses tracing, which is probably a better way to go.
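To make "optimistic type-based optimization" concrete, here is a toy sketch in Python (not how HHVM or any real JIT is implemented): the runtime observes which types flow through an operation, emits a fast path specialized for them, and protects it with a cheap guard that falls back to the fully generic path when the assumption breaks.

```python
def generic_add(a, b):
    """Fully dynamic path: handles anything Python's + supports."""
    return a + b

def make_specialized_add(observed_type):
    """Toy 'optimistic' specialization: fast path protected by a type guard."""
    if observed_type is int:
        def add(a, b):
            # Guard: verify the optimistic type assumption still holds.
            if type(a) is int and type(b) is int:
                return a + b              # specialized fast path
            return generic_add(a, b)      # "deoptimize" to the generic path
        return add
    return generic_add

add_ints = make_specialized_add(int)
print(add_ints(2, 3))      # guard passes: fast path
print(add_ints("a", "b"))  # guard fails: falls back to generic path
```

LLVM gives you excellent machinery below this level (register allocation, instruction selection), but the guard/specialize/deoptimize machinery sketched above is exactly what it leaves the language implementer to build themselves.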
HHVM is a nice piece of software, but it also took a lot of resources to build. This will always remind me to be very careful when making a design decision, like what language to use for a project.
The justification for why it matters seems a bit "off" to me:
For perspective on why this matters, consider that many Facebook engineers
spend their days developing PHP code in an endless edit-reload-debug cycle.
The difference between 8-second and 5-second reloads due to switching from
hphpi to the hhvm interpreter makes a big difference to productivity,
and this improvement will be even more dramatic once we enable the translator.
Big leap of intuition follows, bear with me:
Clearly there are some very talented engineers working at Facebook, as evidenced by this project. On the other hand, apparently a large number of Facebook engineers are spending all their time in a run-debug cycle, trying to "make this darn thing work," and the talented engineers are being used to incrementally improve the mediocre coders' lackluster productivity.
Guys, if three seconds of compile overhead makes such a difference in your productivity, maybe you should think for a few seconds about code correctness before you hit the compile button.
All engineers, including those of us who work on HHVM, spend their lives in a run-debug cycle trying to "make this darned thing work," whether "this darned thing" is a language runtime or the new photo uploader. The tighter you can make that loop, the more productive the engineer is, because she spends less time keeping all the items she's mentally juggling pinned in volatile short-term memory.
Sure, we all go through life via a form of trial and error, learning from our mistakes.
Personally, I think the fewer times you have to go around that wheel, the better.
I don't want to diminish the technical excellence of the achievements touted, and as a coder of primarily compiled languages, I would welcome any such improvements in the compilers I use.
However, my larger point stands: if a few seconds shaved off the run-debug loop is really a big deal for your total productivity, it means you're looping too much.
Trial and error is a fine way to learn a language, or to debug truly mysterious issues, such as those that exist outside of the abstraction layer you're working in. But in my opinion it's a poor way to work. It means that you don't understand the code you're writing.
One of the other annoyances we had (I think it's addressed in the note?) is that until now the interpreter (on our dev machines) and the compiler (hphpc in production) occasionally differed in small but painful ways. Unifying our development and production environments will eliminate another potential source of bugs.
Also: tight loops don't imply a lack of understanding. Often when you're trying to get the CSS on a page just right (across all browsers), quick iteration is required, even if you understand CSS well.
I disagree. My anecdotal evidence, as a person who prefers to meditate on code, is that many times the person who iterates quickly even beats people who understand the language deeply. A deep understanding of the language can, in fact, sometimes be a hindrance to getting things done. I realize that's anecdotal, but if you have an organization (say 1,000 people, or whatever Facebook has) doing these iterations all day, and you can shave five more minutes out of each person's day, that's 5,000 minutes per day :)
As much as I enjoy developing in dynamic, interpreted languages with a repl, I think this is a perfectly valid perspective. If a few seconds savings per compile is really multiplying out to a significant chunk of time, maybe it wouldn't hurt to take a more meditative approach to the code.
Note that in many dynamic languages (and I think that includes PHP), syntax checking is done right before you try to run it, and type checking is performed at runtime. When you're programming in a Java IDE, these kinds of errors (and related ones, like typos in variable names, etc.) all get resolved while you're typing. In PHP or JavaScript you only find out once you run the program, so saving 3 seconds can be a huge win.
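As a concrete illustration of that point (using ordinary CPython behavior, though PHP behaves similarly): a typo in a variable name loads without complaint and only blows up when the line actually executes, which is exactly the class of error a Java IDE would flag while you type.

```python
def greet(name):
    message = "Hello, " + name
    return mesage  # typo for 'message': not detected until greet() runs

# Defining the function succeeds; the bug only surfaces at call time.
try:
    greet("world")
    caught = None
except NameError as e:
    caught = str(e)

print("caught:", caught)
```

Every round trip through "edit, reload, hit the code path, read the error" pays the full reload cost, which is why shaving seconds off it compounds.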
Of course, you can do static analysis to some degree which would cut down that time even further, but you may also get false alerts or miss some issues.
"The x64 machine code that the translator generates consumes approximately ten times as much memory as the corresponding HHBC. CPU instruction cache misses are a limiting factor for the large PHP applications that Facebook runs, so a hybrid between interpretation and translation may outperform pure translation."
This tells me that the engineers on the hhvm project are at least smarter than the engineers on the Java language.
>The first 90% of the hhvm project is done; now we're on to the second 90% as we make it really shine.