Facebook developers announced and released HipHop, a compiler that takes PHP code and outputs C++ code to later be compiled by g++. I'll disclaim that I haven't seen the source code (still not on GitHub as I'm writing this) and am basing my conclusions only on their blog post, but I don't think there will be a "tremendous impact" from this project. It's basic cost/benefit analysis.
The benefits aren't nearly great enough. Only a 50% CPU reduction? Let's walk through the math of what that might mean for Facebook in terms of cost savings. Their post says they serve 400B PHP impressions per month, which works out to about 150k requests/sec. That's a phenomenal number, and a challenge for any site to sustain. Facebook describes PHP as being used almost exclusively for the front end, with back-end services powered by Erlang, Java, Python, etc. So if we assume they've got the front and back ends fairly well separated and can scale the front end based entirely on CPU load, we can make up some numbers to estimate their total cost.
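To sanity-check that traffic figure, here's the quick arithmetic as a Python snippet. The only input is the 400B/month number from their post; the 30-day month is my simplification:

```python
# Convert Facebook's stated monthly PHP impressions into a per-second rate.
impressions_per_month = 400e9          # 400 billion, from the HipHop blog post
seconds_per_month = 30 * 24 * 3600    # 2,592,000 seconds in a 30-day month

requests_per_sec = impressions_per_month / seconds_per_month
print(round(requests_per_sec))         # prints 154321, i.e. roughly 150k/sec
```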
In my experience, even a poorly implemented PHP site can serve around 20 requests per second per core. Assuming Facebook uses beefy 8-core machines, that would be 160 requests/sec per box, and roughly 1,000 machines to serve their 150k/sec load. Given that their load probably peaks with US traffic, and that they probably need double capacity to handle a colocation facility failing, I could see that going up to 4,000 machines.
Facebook has over 30,000 machines, most of which I'm sure are used to handle photo data and their tremendous memcached layer, so 4,000 web-serving machines sounds about right. At Facebook's scale they can probably get a beefy 8-core machine relatively cheaply, but let's use $5k as a number; that means eliminating 2,000 machines saves Facebook $10MM. That's pretty impressive on its own, but 2,000 machines against their total fleet of 30k isn't: it's about 7%.
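To make the back-of-envelope model above explicit, here's the same arithmetic in Python. Every input is my guess from the reasoning above, not a published Facebook number:

```python
# Rough front-end fleet and savings model; all inputs are assumptions.
requests_per_sec = 150_000   # estimated front-end load
reqs_per_core = 20           # even a poorly implemented PHP site, per core
cores_per_box = 8            # assumed beefy 8-core machines

base_machines = requests_per_sec / (reqs_per_core * cores_per_box)
print(base_machines)         # prints 937.5, i.e. ~1,000 machines at steady load

provisioned = 4_000          # ~4x for peak US traffic plus failover capacity
machines_saved = provisioned // 2   # a 50% CPU cut halves the front-end fleet
cost_per_machine = 5_000     # dollars, a guessed price at Facebook's scale

print(machines_saved, machines_saved * cost_per_machine)  # prints 2000 10000000
```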
Further, if we expect this to catch on and have a "tremendous impact" on the world at large, it would have to be a big benefit for all the small PHP sites out there. I'd wager that almost all small sites are bottlenecked not on PHP CPU performance but on database performance. Even if it were PHP performance, for most small sites the savings would likely mean going from 30 machines to 15, which is almost unnoticeable. For this system to be really effective, I'd want to see a speedup of 10-100x, not merely 2x.
So I've laid out that the savings aren't that great. But they aren't nothing: a 2x speedup that saves Facebook $10MM is still impressive. But what does it cost? I suspect the costs are high and measured almost exclusively in developer productivity. Let's set aside the man-year of development their lead developer spent building this technology. Facebook has to integrate this system into its development process, and there are two obvious ways to do it.
1. Developers add a "compile" step to their development cycle and test using HipHop on their development boxes. This, to me, would represent the worst-case outcome (and from their blog post is almost certainly not what they do). One of the many development-speed benefits of scripting languages like PHP and Ruby is that developers don't have a compile cycle. While that compile cycle seems trivial (how hard is it to interrupt your code/test loop with a two-minute compile?), in my experience it accounts for a large share of the benefit. Sure, dynamic typing is another major benefit, but I believe that if every Facebook developer now has to add a compile cycle, their overall productivity will plummet.
2. Alternatively (and more likely), they keep developers on an interpreted language on their dev boxes and move the compile step into the release process. Their blog post suggests this by mentioning HPHPi, their experimental interpreter. The problem here is that you've now doubled your testing. A developer writes code and tests it locally using HPHPi, or even the standard PHP install. After that, they have to compile it with HipHop and test it again to make sure the compiler didn't break anything. If the compiler were a mature technology, you could probably skip this step, but with an experimental compiler you simply can't trust that it will work on your code. Further, when you find an issue in production that you didn't anticipate, you're no longer sure whether it's the compiler or the code that is broken. Given how hard it already is to debug live-site bugs that arise from circumstances you can't duplicate in the test environment, this seems like a disaster.
So we're left with a system that doesn't provide enough benefit to justify the costs that come with it. A neat toy, but if I were Facebook, I would value developer productivity well over the $10MM I might save in server costs. What do you think?