Loved by millions, Facebook has risen from a small-time university social networking service to become the biggest phenomenon on the Internet. But in Facebook's case, popularity doesn’t come easily. With some 400 million unique home pages, Facebook is pushing the boundaries of traditional Web application scalability, and the company is not shy about admitting that much of this success has been achieved by leveraging open source software.
Late last year Facebook announced it had surpassed the 300 million user mark. That is a significant (and growing) number in its own right, but what sets Facebook apart is that users do not simply load a page in response to a search query; they actively upload content and interact with other subscribers. If Web analytics company Alexa is accurate, Facebook users spend more than 30 minutes per day on the service, about three times as long as they spend on Google. That level of engagement, combined with all the content users create, makes Facebook's data processing requirements, and therefore its scalability challenges, even more daunting.
Facebook’s core service is built on top of the venerable LAMP stack. Linux, Apache, MySQL and PHP are used by millions of Web sites across the Internet to serve dynamically generated data. Facebook’s rapid rise in popularity in recent years has seen it grow to the point where it now operates the largest single-domain LAMP stack in the world.
Traditional methods won’t cut it for social networks
A popular way to scale a Web application is to continuously add Web and database servers to a cluster in order to distribute the transaction processing demands. Facebook needed to rethink this approach.
David Recordon, senior open programs manager at Facebook, says that when scaling a traditional Web site you can “break up” the information and split databases among your users.
“As you have more users you have more databases, and you scale from that perspective,” Recordon says. “On Facebook everyone is connecting to other people all over the world. You can’t scale based on where your users are. The challenge is: how do we serve all this information quickly with only having a small number of data centres?”
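To make Recordon's contrast concrete, here is a minimal Python sketch of the traditional approach, assuming simple modulo sharding on a numeric user ID. `NUM_SHARDS`, `shard_for_user` and `shards_touched` are hypothetical names for illustration, not Facebook code.

```python
from typing import Iterable, Set

# Hypothetical shard count, chosen only for illustration.
NUM_SHARDS = 4


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Assign a user to a database shard by simple modulo on the user ID."""
    return user_id % num_shards


def shards_touched(user_ids: Iterable[int], num_shards: int = NUM_SHARDS) -> Set[int]:
    """Return the distinct shards a request must query to read these users' rows."""
    return {shard_for_user(uid, num_shards) for uid in user_ids}


if __name__ == "__main__":
    # Reading one user's own data hits exactly one shard...
    print(shards_touched([42]))                        # {2}
    # ...but a page built from 150 friends with scattered IDs hits every shard.
    print(sorted(shards_touched(range(1000, 1150))))   # [0, 1, 2, 3]
```

Partitioning by user works well while each request only needs one user's rows. Once a single page needs data from 150 friends spread across the cluster, nearly every request touches every shard, which is the problem Recordon describes.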
On average, a Facebook user has about 150 friends, and more than 70 per cent of users now come from outside the US. Facebook's custom translation system, for instance, allowed the site to be translated into French in under 24 hours.
“When we render a page on Facebook we are pulling data from many different places,” Recordon says. “To render a page like that we are talking to many different pieces of our infrastructure without being able to separate them apart by user.”
Recordon, who leads open source and open standards initiatives at Facebook, spoke recently at FOSDEM, the Free and Open Source Software Developers European Meeting, in Belgium, to discuss how the social networking giant uses open source software to meet its enormous user demands.
“And every page is different, not just per person, but at what time they saw it,” he says. “We think of relationships as a graph. You have people which are nodes, and edges to represent the relationships between them. But we’ve also seen Facebook grow to be more than just relationships between people.”
Recordon says the fact that Facebook users can become fans of a company or person presents “a very different scaling challenge”: connecting one person to a few hundred other people is a far cry from, say, serving Michael Jackson’s fan page, which has more than 10 million people connected to it.
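Recordon's description of people as nodes and relationships as edges maps naturally onto an adjacency list. The minimal Python sketch below is illustrative only: `SocialGraph` and its methods are hypothetical names, not Facebook's storage model, and it assumes undirected edges for both friendships and fan relationships.

```python
from collections import defaultdict
from typing import Set


class SocialGraph:
    """Toy undirected graph: people and pages are nodes, relationships are edges."""

    def __init__(self) -> None:
        # Adjacency list: node ID -> set of neighbouring node IDs
        self._edges = defaultdict(set)

    def connect(self, a: str, b: str) -> None:
        """Add an undirected edge, e.g. a friendship or a fan relationship."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def neighbours(self, node: str) -> Set[str]:
        """Return a copy of the node's neighbours (empty set for unknown nodes)."""
        return set(self._edges.get(node, set()))

    def degree(self, node: str) -> int:
        """Number of edges attached to a node."""
        return len(self._edges.get(node, ()))
```

Connect one person to three friends and 10,000 users to a page node, and `degree()` returns 3 for the person but 10,000 for the page. Any operation that walks a node's edges, such as notifying fans, costs proportionally more on a heavily connected page, which is why it behaves so differently from an ordinary profile.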
Recordon’s co-presenter, Facebook open source developer advocate Scott MacVicar, also detailed some of the numerous technologies the portal uses to achieve the necessary scale.
“The scaling challenges come from what people do on Facebook,” MacVicar says. Some 8 billion minutes are spent on the site every day, and 3.5 billion pieces of content, be it text or multimedia such as photos and videos, are shared every week. With more than 2.5 billion photos added every month, Facebook may also be the biggest photo-sharing site on the Web.
“There’s more than just the Web site, there is the API and the platform people use to build applications with Facebook. There are a million users of that,” MacVicar says, adding there are now around 400 million unique home pages.
“If we take a standard page on Facebook -- say a news feed -- to construct that we need to take data from 150 friends and that’s split across multiple servers and it has to be done in milliseconds. It’s not just your direct friends. If other friends have commented on your photos then that has to be pulled in as well. So we’re pulling from potentially thousands of different sources all just to render that one single page.”
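The fan-out MacVicar describes can be illustrated with a minimal Python sketch. It assumes friends' stories live in in-memory dictionaries standing in for storage shards, and that users are placed on shards by `user_id % NUM_SHARDS`. `Story`, `fetch_from_shard` and `build_feed` are hypothetical names; this is not Facebook's news feed code, and it ignores the indirect activity (such as other friends' comments on your photos) that MacVicar mentions.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical story record: (timestamp, author_id, text)
Story = Tuple[int, int, str]
# Each "shard" is an in-memory stand-in mapping user ID -> that user's stories.
Shard = Dict[int, List[Story]]

NUM_SHARDS = 4


def shard_for_user(user_id: int) -> int:
    """Assumed placement rule: modulo on the user ID."""
    return user_id % NUM_SHARDS


def fetch_from_shard(shard: Shard, user_ids: List[int]) -> List[Story]:
    """Stand-in for one batched query against a single storage shard."""
    stories: List[Story] = []
    for uid in user_ids:
        stories.extend(shard.get(uid, []))
    return stories


def build_feed(friend_ids: Iterable[int], shards: List[Shard], limit: int = 10) -> List[Story]:
    # 1. Group friends by the shard holding their data so each shard is queried once.
    by_shard: Dict[int, List[int]] = defaultdict(list)
    for fid in friend_ids:
        by_shard[shard_for_user(fid)].append(fid)

    # 2. Query every shard involved (sequentially here; a real system would
    #    more likely issue these requests in parallel to stay within milliseconds).
    candidates: List[Story] = []
    for shard_id, uids in by_shard.items():
        candidates.extend(fetch_from_shard(shards[shard_id], uids))

    # 3. Keep only the newest `limit` stories across all sources.
    return heapq.nlargest(limit, candidates, key=lambda story: story[0])
```

Batching friends per shard keeps the number of round trips proportional to the number of shards rather than the number of friends, while `heapq.nlargest` avoids fully sorting every candidate story just to show the newest few.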
A familiar technology architecture
At a high level, Facebook's architecture consists of a load balancer at the top, which spreads requests among a pool of Web servers. These Web servers then call on different services to fetch data.
“It will fall into memcache, which is our in-memory fast database access [tool],” MacVicar says.
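Placing memcache as a fast in-memory layer in front of the databases is commonly implemented as a look-aside (cache-aside) read path: check the cache, fall back to the database on a miss, then fill the cache. The Python sketch below assumes that pattern. A plain `dict` stands in for a memcached pool, `load_from_db` is a hypothetical callback standing in for a MySQL query, and invalidating on write is an assumed policy; none of this is Facebook's client code.

```python
from typing import Callable, Dict, Optional


class LookAsideCache:
    """Read path: try the cache first, fall back to the database on a miss, then fill the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}   # stand-in for a memcached pool
        self._load_from_db = load_from_db  # hypothetical database loader
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:
            # Only cache real rows; missing keys go back to the database next time.
            self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self._cache.pop(key, None)
```

Repeated reads of the same key hit the database only once; after the underlying row changes, calling `invalidate()` forces the next `get()` to reload it.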
Recordon says Facebook's architecture looks like that of any Web site built today and claims that, for the most part, the site isn’t architected any differently from other sites. Where things start to differ is Facebook’s use of PHP, which MacVicar says is popular among the site's developers because it is simple to learn and has a syntax similar to C, making it approachable for new developers.
“It’s an interpreted language, so if you want to make changes you can see them live. And because PHP is a templating language it's good for Facebook because we move fast and can’t spend time waiting for other languages.”
However, MacVicar also highlighted how PHP is problematic for Facebook because of its high CPU usage. “Because we do a lot of data assembly and build the page in the app server we use a lot of CPU for that, as well as memory. We’d also like to reuse more PHP logic.”