If they had started with Python (or anything else) they would have ended up in the same place. They would end up writing their own compiler and VM because they want to and they can. It's actually amazing that they haven't extended the language itself yet.
(I work on HHVM.) We've dipped our toes into the waters of changing the language a bit. Setting the Eval.EnableHipHopSyntax and Eval.EnableHipHopExperimentalSyntax flags in HipHop enables a few differences, the most notable of which is a Python-inspired 'yield' syntax for generators.
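For readers who haven't seen the Python feature being borrowed, this is what generator `yield` looks like in Python itself (plain CPython, nothing HipHop-specific): a function that suspends its state between values instead of building a whole result up front.

```python
def fib():
    """Infinite Fibonacci generator; local state is suspended between yields."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

g = fib()
first_five = [next(g) for _ in range(5)]
print(first_five)  # [0, 1, 1, 2, 3]
```

The appeal for PHP is the same as in Python: lazy sequences and iterator-producing functions without hand-writing an iterator class.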
... or they would start their own project similar to Unladen Swallow.
I think picking a simple language like PHP makes it much easier to create your own fully custom stack once you need one.
Of course, Facebook's problems will never be your problems, so everyone should pick what is right for them at any given moment and stop dreaming about (im)possible futures.
Actually, I think it would be a different case: they probably would not need to reimplement the language in their own way, because the speed benefits are already there (when it comes to PyPy). Instead they would patch other libraries to work with PyPy on an as-needed basis, which seems like far less work than starting from scratch and implementing a whole language, and the community would benefit a lot from it too. But since it will never happen, it's just speculation at this point.
It's impossible to know what you'll need in the future. No one should be wasting too much time trying to figure it out.
How could Zuckerberg -- or anyone, for that matter -- have predicted the kind of rampant success Facebook has had? Who thought in 2005 that Facebook would become the largest photo sharing site on the web? Or that they would need to handle almost 140 million unique visitors per month?
If the largest problem of your business is your application stack, you are in a very good place. Most businesses have much larger problems like, say, cashflow.
The state of the art in JITs is LuaJIT2, despite almost no fanfare (and lots of fanfare by other JIT projects). I haven't had a look at hhvm yet, but if anyone wants a cool project, make hiphop emit Lua instead of C++, and run it under LuaJIT2.
LuaJIT2 really is magic, and Mike Pall is a humble but talented magician.
LLVM isn't the world's greatest target for a dynamic language, as Rubinius and Unladen Swallow found out. It does a lot of low-level optimization, but a basic prerequisite for making a dynamic language fast is being able to do some sort of optimistic type-based optimization, and LLVM has no infrastructure for that.
It looks like HipHopVM uses tracing, which is probably a better way to go.
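To make "optimistic type-based optimization" concrete, here is a toy sketch in Python (not how HHVM or any real JIT is implemented): the runtime observes which types flow through an operation, emits a fast path specialized for them, and protects it with a cheap guard that falls back to the fully generic path when the assumption breaks.

```python
def generic_add(a, b):
    """Fully dynamic path: handles anything Python's + supports."""
    return a + b

def make_specialized_add(observed_type):
    """Toy 'optimistic' specialization: fast path protected by a type guard."""
    if observed_type is int:
        def add(a, b):
            # Guard: verify the optimistic type assumption still holds.
            if type(a) is int and type(b) is int:
                return a + b              # specialized fast path
            return generic_add(a, b)      # "deoptimize" to the generic path
        return add
    return generic_add

add_ints = make_specialized_add(int)
print(add_ints(2, 3))      # guard passes: fast path
print(add_ints("a", "b"))  # guard fails: falls back to generic path
```

LLVM gives you excellent machinery below this level (register allocation, instruction selection), but the guard/specialize/deoptimize machinery sketched above is exactly what it leaves the language implementer to build themselves.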
HHVM is a nice piece of software, but it also took a lot of resources to build. This will always remind me to be very careful when making a design decision, like what language to use for a project.
The justification for why it matters seems a bit "off" to me:
For perspective on why this matters, consider that many Facebook engineers
spend their days developing PHP code in an endless edit-reload-debug cycle.
The difference between 8-second and 5-second reloads due to switching from
hphpi to the hhvm interpreter makes a big difference to productivity,
and this improvement will be even more dramatic once we enable the translator.
Big leap of intuition follows, bear with me:
Clearly there are some very talented engineers working at Facebook, as evidenced by this project. On the other hand, apparently a large number of Facebook engineers are spending all their time in a run-debug cycle, trying to "make this darn thing work," and the talented engineers are being used to incrementally improve the mediocre coders' lackluster productivity.
Guys, if three seconds of compile overhead makes such a difference in your productivity, maybe you should think for a few seconds about code correctness before you hit the compile button.
All engineers, including those of us who work on HHVM, spend their lives in a run-debug cycle trying to "make this darned thing work," whether "this darned thing" is a language runtime or the new photo uploader. The tighter you can make that loop, the more productive the engineer is, because she spends less time keeping all the items she's mentally juggling pinned in volatile short-term memory.
Sure, we all go through life via a form of trial and error, learning from our mistakes.
Personally, I think the fewer times you have to go around that wheel, the better.
I don't want to diminish the technical excellence of the achievements touted, and as a coder of primarily compiled languages, I would welcome any such improvements in the compilers I use.
However, my larger point stands: if a few seconds shaved off the run-debug loop is really a big deal for your total productivity, it means you're looping too much.
Trial and error is a fine way to learn a language, or to debug truly mysterious issues, such as those that exist outside of the abstraction layer you're working in. But in my opinion it's a poor way to work. It means that you don't understand the code you're writing.
One of the other annoyances we had (I think it's addressed in the note?) is that until now the interpreter (on our dev machines) and the compiler (hphpc in production) occasionally differed in small but painful ways. Unifying our development and production environments will eliminate another potential source of bugs.
Also: tight loops don't imply a lack of understanding. Often when you're trying to get the CSS on a page just right (across all browsers), quick iteration is required, even if you understand CSS well.
I disagree. My anecdotal evidence, as a person who prefers to meditate on code, is that many times the person who iterates quickly even beats people who understand the language deeply. A deep understanding of the language can, in fact, sometimes be a hindrance to getting things done. I realize that's anecdotal, but if you have an organization (say 1,000 people, or whatever Facebook has) doing these iterations all day, and you can shave five more minutes out of each person's day, that's 5,000 minutes per day :)
As much as I enjoy developing in dynamic, interpreted languages with a repl, I think this is a perfectly valid perspective. If a few seconds savings per compile is really multiplying out to a significant chunk of time, maybe it wouldn't hurt to take a more meditative approach to the code.
Note that in many dynamic languages (and I think that includes PHP), syntax checking is done right before you try to run it, and type checking is performed at runtime. When you're programming in a Java IDE, these kinds of errors (and related ones, like typos in variable names, etc.) all get resolved while you're typing. In PHP or JavaScript you only find out once you run the program, so saving 3 seconds can be a huge win.
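As a concrete illustration of that point (using ordinary CPython behavior, though PHP behaves similarly): a typo in a variable name loads without complaint and only blows up when the line actually executes, which is exactly the class of error a Java IDE would flag while you type.

```python
def greet(name):
    message = "Hello, " + name
    return mesage  # typo for 'message': not detected until greet() runs

# Defining the function succeeds; the bug only surfaces at call time.
try:
    greet("world")
    caught = None
except NameError as e:
    caught = str(e)

print("caught:", caught)
```

Every round trip through "edit, reload, hit the code path, read the error" pays the full reload cost, which is why shaving seconds off it compounds.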
Of course, you can do static analysis to some degree which would cut down that time even further, but you may also get false alerts or miss some issues.
"The x64 machine code that the translator generates consumes approximately ten times as much memory as the corresponding HHBC. CPU instruction cache misses are a limiting factor for the large PHP applications that Facebook runs, so a hybrid between interpretation and translation may outperform pure translation."
This tells me that the engineers on the hhvm project are at least smarter than the engineers on the Java language.
>The first 90% of the hhvm project is done; now we're on to the second 90% as we make it really shine.