Computerworld

Open source helps Facebook achieve massive app scalability

People all over the world spend a total of eight billion minutes a day on Facebook. Some 400 billion Web pages are viewed every month, 3.5 billion pieces of content are shared every week and the site logs a staggering 25TB of data every day. David Recordon, senior open programs manager at Facebook, talks about how the social networking giant uses open source tools to achieve its massive app scalability.

Loved by millions, Facebook has risen from a small-time university social networking service to become the biggest phenomenon on the Internet. But in Facebook's case popularity doesn’t come easily. With some 400 million unique home pages, Facebook is pushing the boundaries of traditional Web application scalability -- and the company is not shy about admitting that much of this success has been achieved by leveraging open source software.

For more on Facebook's use of open source technology, be sure to check out the related slideshow, Open source at Facebook.

Late last year Facebook announced it had surpassed the 300 million user mark. A significant (and growing) number in its own right, but what makes Facebook different is that users do not simply access a page for a search query, but instead actively upload content and interact with other subscribers. If Web analytics company Alexa is accurate, Facebook users spend more than 30 minutes per day using the service -- about three times more than Google. All of this makes Facebook's data processing requirements, and therefore its scalability challenges, even more daunting.

Facebook’s core service is built on top of the venerable LAMP stack. Linux, Apache, MySQL and PHP are used by millions of Web sites across the Internet to serve dynamically generated data. Facebook’s rapid rise in popularity in recent years has seen it grow to the point where it now operates the largest single-domain LAMP stack in the world.

Traditional methods won’t cut it for social networks

A popular way to scale a Web application is to continuously add Web and database servers to a cluster in order to distribute the transaction processing demands. Facebook needed to rethink this approach.

David Recordon, senior open programs manager at Facebook, says when scaling a traditional Web site you are able to “break up” the information and can share databases among your users.

“As you have more users you have more databases, and you scale from that perspective,” Recordon says. “On Facebook everyone is connecting to other people all over the world. You can’t scale based on where your users are. The challenge is: how do we serve all this information quickly with only having a small number of data centres?”

On average, a Facebook user has about 150 different friends and more than 70 per cent of users now come from outside the US. Thanks to its custom translation system, for instance, Facebook was translated into French in under 24 hours.

“When we render a page on Facebook we are pulling data from many different places,” Recordon says. “To render a page like that we are talking to many different pieces of our infrastructure without being able to separate them apart by user.”

Recordon, who leads open source and open standards initiatives at Facebook, spoke recently at FOSDEM, the Free and Open Source Software Developers' European Meeting, in Belgium to discuss how the social networking giant uses open source software to meet its enormous user demands.

“And every page is different, not just per person, but at what time they saw it,” he says. “We think of relationships as a graph. You have people which are nodes, and edges to represent the relationships between them. But we’ve also seen Facebook grow to be more than just relationships between people.”

Recordon says the fact that Facebook users can become a fan of a company or person presents “a very different scaling challenge”, one that involves moving from connecting one person to a few hundred other people versus, say, serving Michael Jackson’s fan page which has more than 10 million people connected to it.
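
That graph model is simple to sketch, even if the scale is not. The snippet below is a toy illustration only, not Facebook code: nodes with adjacency sets, where a person carries a modest edge list and a fan page is just another node whose edge list can run to millions.

    # Toy illustration, not Facebook code: the social graph as nodes with
    # adjacency sets. A person has roughly 150 edges; a fan page is a node
    # whose edge list can reach tens of millions, which is why the two
    # present very different serving problems.
    graph = {
        "alice": {"bob", "carol", "dave"},    # a typical person
        "bob": {"alice", "carol"},
        "michael_jackson_page": set(),        # a fan page: could hold 10M+ edges
    }

    def add_edge(a, b):
        """Record an undirected relationship between two nodes."""
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    def degree(node):
        """Number of connections -- the quantity that makes fan pages hard to serve."""
        return len(graph.get(node, set()))

    add_edge("alice", "michael_jackson_page")
    print(degree("alice"), degree("michael_jackson_page"))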

Recordon’s co-presenter, Facebook open source developer advocate Scott MacVicar, also detailed some of the numerous technologies the site uses to achieve the necessary scale.

"The scaling challenges come from what people do on Facebook," MacVicar says. Some 8 billion minutes are spent by people on the site every day and 3.5 billion pieces of content are shared every week, be text or multimedia elements like photos and videos. With more than 2.5 billion photos alone added every month, Facebook may also be the biggest photo sharing site on Web as well.

“There’s more than just the Web site, there is the API and the platform people use to build applications with Facebook. There are a million users of that,” MacVicar says, adding there are now around 400 million unique home pages.

“If we take a standard page on Facebook -- say a news feed -- to construct that we need to take data from 150 friends and that’s split across multiple servers and it has to be done in milliseconds. It’s not just your direct friends. If other friends have commented on your photos then that has to be pulled in as well. So we’re pulling from potentially thousands of different sources all just to render that one single page.”
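
The news feed example implies a large fan-out of parallel requests. The sketch below shows the general shape of that fan-out in Python, under the assumption of a hypothetical fetch_friend_activity() call standing in for whichever backend service holds a given friend's data; it is illustrative, not Facebook's implementation.

    # Hypothetical sketch of news-feed fan-out: issue one request per friend
    # concurrently, then merge the results into a single feed. The
    # fetch_friend_activity() function is a placeholder, not a real Facebook API.
    from concurrent.futures import ThreadPoolExecutor

    def fetch_friend_activity(friend_id):
        """Placeholder for a call to whichever server holds this friend's data."""
        return {"friend": friend_id, "items": []}

    def build_news_feed(friend_ids):
        with ThreadPoolExecutor(max_workers=50) as pool:
            results = list(pool.map(fetch_friend_activity, friend_ids))
        # Merge the per-friend results; ranking is omitted in this sketch.
        return [item for result in results for item in result["items"]]

    feed = build_news_feed(["friend_%d" % i for i in range(150)])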

A familiar technology architecture

At a high level, Facebook's architecture consists of a load balancer on top, with requests spread among a pool of Web servers. These Web servers then use different services to fetch data.

“It will fall into memcache, which is our in-memory fast database access [tool],” MacVicar says.
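
For readers unfamiliar with memcached, the basic interaction is a simple key-value get and set over the network. The fragment below is a minimal illustration using the pymemcache Python client; the host, port and key naming are assumptions for the example, not Facebook's configuration.

    # Minimal memcached read/write using the pymemcache client library.
    # Host, port and key naming are illustrative assumptions.
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))
    cache.set("user:1234:name", b"Alice", expire=300)   # cache for five minutes
    print(cache.get("user:1234:name"))                  # b'Alice' on a hit, None on a miss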

Recordon says Facebook's architecture looks much like that of any Web site built today and claims that, for the most part, the site isn’t architected any differently from other sites. Where things start to get different is with Facebook’s use of PHP, which MacVicar says is popular among the site's developers because its syntax is similar to C, making it easy for new developers to learn.

“It’s an interpreted language, so if you want to make changes you can see them live. And because PHP is a templating language it's good for Facebook because we move fast and can’t spend time waiting for other languages.”

However, MacVicar also highlighted how PHP is problematic for Facebook because of its high CPU usage. “Because we do a lot of data assembly and build the page in the app server we use a lot of CPU for that, as well as memory. We’d also like to reuse more PHP logic.”


To solve these performance issues, Facebook developed a tool called HipHop that transforms PHP into optimised C++ code. HipHop is designed to give the best of both worlds -- the development speed of an interpreted language and the runtime performance of a compiled language like C++.

“What makes HipHop work really well is we can take the majority of our code base and greatly speed it up,” Recordon says.

Facebook has developed software in C++, as well as Python and Erlang, and found writing extensions for PHP can be difficult because the macros are not well documented.

When asked why the company did not develop applications in C++, MacVicar says the reason is due to the fact that Facebook’s code base is about 4 million lines of PHP, so "the first problem would be how to translate all of that to C++ without holding up development on the site".

With HipHop, Facebook's PHP code goes through a 7-step transformation process involving code optimisation and then compiling.

“We want to minimise the differences between PHP and HipHop so you can use either [and] support for Apache is on the roadmap,” MacVicar says.

Interestingly, Facebook’s adoption of HipHop, with its own embedded Web server, is now pushing Apache out of the stack.

“We’ve generally been using Apache with PHP, but HipHop has its own embedded Web server, which is a really simple Web server built on top of libevent. So now we have been moving to using that,” Recordon says.

HipHop only compiles on Linux, but MacVicar says someone from Microsoft has contacted Facebook and “hopefully they will contribute the Windows port”.

“Hopefully someone in the community will contribute the Mac port, or I will do it myself,” he says.

While Facebook attempts to make the most of the components it has developed in a variety of languages -- C++, Erlang (used for the chat service), Java and Python -- the company's philosophy is not to choose a single language when building infrastructure.

“The entire Web server infrastructure is PHP, but we use many different languages, from a backend infrastructure perspective, depending on the service,” Recordon says.

“Philosophically, we think about ‘when do we need a service’. Is it something that needs to be really quick? Is it something that currently has a big overhead in terms of our application layer deployment and maintenance? Do we want another failure point in our network? We need to balance those.”

Standard tools for common tasks

Facebook's extensive use of open source software has also fostered a culture of giving back to the community. HipHop is open source, as are many of the system administration and data integration tools that Facebook uses.

“We use Thrift to communicate between all [the] services we have, something we open sourced,” MacVicar says. “It is essentially an RPC server and can generate code for you in a number of languages.”
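
In practice a Thrift service definition is compiled into client and server stubs for each language. The sketch below shows how a generated Python client is typically wired up using the Apache Thrift library's transport and protocol classes; the SearchService name and its lookup() method are hypothetical stand-ins, not a real Facebook service.

    # Sketch of a typical Thrift client in Python. TSocket, TBufferedTransport and
    # TBinaryProtocol come from the Apache Thrift library; "SearchService" stands in
    # for a client class the Thrift compiler would generate from a .thrift file.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    def make_transport(host, port):
        """Open a buffered socket transport speaking the Thrift binary protocol."""
        socket = TSocket.TSocket(host, port)
        transport = TTransport.TBufferedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        transport.open()
        return transport, protocol

    # Typical usage once stubs have been generated (service name is hypothetical):
    # transport, protocol = make_transport("search-backend", 9090)
    # client = SearchService.Client(protocol)
    # results = client.lookup("beach photos")
    # transport.close()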

According to Recordon, Facebook looks at a variety of open source and commercial tools to manage its information, not just the free stuff.

With about 400 billion page views a month, Facebook logs a staggering 25TB of data every day. “We used Syslog for logging, and the logging server sort of exploded because we were logging so much information from all these page views,” Recordon says. “So we went and created Scribe and we are able to break up this funnel from a logging perspective. The logging information from servers will get routed into Scribe servers."

Once routed to the Scribe servers, Facebook log data is then condensed and stored in the Hadoop and Hive cluster to be used for future data analysis.

“Logging is a common problem people have so we also open sourced Scribe and it’s now being used by Twitter and they are contributing back to the project,” Recordon says.

Facebook’s other scaling challenge revolves around photo sharing which, like almost all aspects of the site, has ballooned to a massive scale. There are an estimated 40 billion photos hosted by Facebook, each stored in four resolutions for a total of 160 billion photos. In all, about 1.2 million photos are served every second.

“The first thing we did was NFS using a commercial solution, because you have to choose the battles you are going to fight," MacVicar says. “But unfortunately it just didn’t scale. It wasn’t that the commercial solution was bad, but the I/O was so high it simply didn’t work.”

To overcome this challenge, Facebook first optimised its file serving capability and then took a deeper look at how files are kept on a file system.

“We developed a system called Haystack, which allows us to serve photos with one physical read on the disk,” Recordon says. “So it doesn’t matter from a random data access perspective, it’s always one physical read to serve a file. It takes about 300MB of RAM to run this for every terabyte of photos we have. We went from 10 I/O operations per photo to one.”
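
The core trick, as Recordon describes it, is keeping a compact index in memory so the disk is touched exactly once per photo. The toy sketch below captures that idea under simplified assumptions -- photos appended to one large volume file, with an in-memory map of photo id to offset and size -- and is not Haystack's actual on-disk format.

    # Toy sketch of the Haystack idea: append photos to a large volume file and
    # keep a small in-memory index of photo_id -> (offset, size), so a read costs
    # one seek plus one sequential read. Simplified; not Haystack's real format.
    class PhotoVolume:
        def __init__(self, path):
            self.path = path
            self.index = {}                      # photo_id -> (offset, size), held in RAM
            open(path, "ab").close()             # ensure the volume file exists

        def put(self, photo_id, data):
            with open(self.path, "ab") as f:
                offset = f.tell()
                f.write(data)
            self.index[photo_id] = (offset, len(data))

        def get(self, photo_id):
            offset, size = self.index[photo_id]  # in-memory lookup, no disk I/O
            with open(self.path, "rb") as f:
                f.seek(offset)                   # one seek...
                return f.read(size)              # ...one read

    vol = PhotoVolume("photos.vol")
    vol.put("p1", b"fake-jpeg-bytes")
    assert vol.get("p1") == b"fake-jpeg-bytes"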

Haystack is not an open source project yet, but Facebook is working on making it open source because “we really think it’s useful for a lot of sites both large and small,” Recordon says.

Data storage and analysis goes big time

Facebook’s infrastructure relies on memcache for faster database access and this acts as a “middle tier” between the Web servers and the databases.

Memcache was originally developed for the LiveJournal blogging service to improve the performance of its database-driven Web sites and is now used by many of the largest sites on the Internet, from Facebook to Wikipedia.

“It’s great, but our engineers need to use it in a smart manner,” Recordon says. “We currently serve about 120 million memcached requests per second, so we’re incredibly reliant on memcache to make Facebook work.”

The system was stress-tested when Facebook launched its new personalised username capability in June 2009. Making good on its promise to give its millions of users an easier way to share their profiles, Facebook let people choose an alias for their online profiles on a first-come, first-served basis. Members responded in droves, registering new user names at a rate of more than 550 a second. Within the first seven minutes, 345,000 people had claimed user names; within 15 minutes, 500,000 users had grabbed a name.

“We had about 200 million users at the time, so we asked 200 million users to access Facebook at the same time,” Recordon says. “This is like a denial of service attack that you bring upon yourself, but for us it was really a product launch. And a million usernames were assigned in the first hour.”

From a technical perspective, MacVicar describes memcache as “really robust” with some nice features, but says “we wanted to make it better”. As a result, Facebook engineers ported memcache to 64-bit because most of its memcached machines have 16GB of memory. This enhancement has since made it into the main memcache code base.

“We added multithreading so we could utilise all the cores on the processors, and another of the more interesting things we did was adding UDP support,” MacVicar says. “Sometime this year we will do another release of the memcache server and hopefully that will get merged into the upstream version.”

Recordon says memcache is a prime example of an open source technology that existed before Facebook burst onto the scene but that the company has since used extensively and also added to.


Another critical part of Facebook’s infrastructure is Hive, a data warehousing layer that provides an SQL-like query interface on top of Hadoop, the open source framework for distributed data processing whose early development was driven largely by Yahoo.

Facebook first became interested in Hadoop as a means of processing the site's traffic data, which was generating terabytes per day and overtaxing its Oracle database. Though happy with Hadoop, engineers wanted to simplify its use so that frequently used analysis operations could be expressed in SQL. The resulting Hadoop-based data warehouse application became Hive, and it now helps to process more than 10TB of Facebook data daily. Hive is available as an open source subproject of Apache Hadoop.
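
The appeal of Hive is exactly that SQL-like surface: a routine analysis job can be written as a query instead of a hand-coded MapReduce program. The fragment below is an illustration of how such a query might be submitted from Python using the pyhive package; the host, table and column names are hypothetical, not Facebook's schema.

    # Illustrative HiveQL query submitted via the pyhive client; host, table and
    # column names are assumptions. The point is that an aggregation over raw log
    # data can be expressed in SQL rather than as a hand-written MapReduce job.
    from pyhive import hive

    conn = hive.Connection(host="hive-gateway", port=10000)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT country, COUNT(*) AS views
        FROM page_view_log
        WHERE dt = '2010-02-01'
        GROUP BY country
        ORDER BY views DESC
        LIMIT 10
    """)
    for country, views in cursor.fetchall():
        print(country, views)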

“Many companies are building on each other’s open source technologies and that is what we’re big on. We want to open source the things we are finding useful in Facebook,” Recordon says.

Facebook currently runs a large Hadoop-Hive cluster for data processing and analytics. “We have a lot of information coming into this cluster and we couldn’t have done it without Hadoop coming before us,” Recordon says. “An important point, from an open source perspective, is being able to build on top of what others have released, make it solve your use cases better and then release it again for others to build on [and] innovate together.”

The initial idea behind Hive was to simplify Hadoop so non-engineers could work with it. Now about 250 people per month from different parts of Facebook are running jobs on the Hive-Hadoop infrastructure.

Databases

With so much data manipulation going on at Facebook the question persists: where is all that data stored? The open source relational database management system MySQL has played a key role at Facebook from the site's inception, and the company currently runs thousands of MySQL servers in multiple data centres.

“It’s a 'share-nothing' architecture,” MacVicar says. “In an army, if one soldier loses his head then nothing happens. At Facebook it doesn’t really matter if a single server disappears -- we have other servers that will back it up.”

MacVicar says Facebook uses MySQL’s InnoDB engine because it’s the only one with full transaction support. “We use databases as a way to keep persistent data. Memcache is a distributed index and database is persistent storage,” MacVicar says. “The Web server will look at the distributed index to see if the data is there and if that fails it will go to persistent storage. We also have other search services that will look for data.”
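
That read path -- check the distributed index first, fall back to persistent storage on a miss, then repopulate the cache -- is a classic cache-aside pattern, and a rough sketch of it looks something like the following. The client libraries, hosts and schema here are illustrative assumptions, not Facebook's code.

    # Sketch of the cache-aside read path MacVicar describes: try memcached first,
    # fall back to MySQL/InnoDB on a miss, then repopulate the cache. Libraries,
    # hosts and the schema are assumptions for illustration.
    import json
    import pymysql
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))
    db = pymysql.connect(host="db-shard-07", user="app", password="secret", database="users")

    def get_user(user_id):
        key = "user:%d" % user_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)            # cache hit: the database is never touched
        with db.cursor() as cur:                 # cache miss: go to persistent storage
            cur.execute("SELECT name, country FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()
        user = {"name": row[0], "country": row[1]} if row else None
        if user is not None:
            cache.set(key, json.dumps(user), expire=300)
        return user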

In keeping with the rest of its distributed architecture, Facebook has independent database clusters rather than one really large cluster. According to its own internal records, Facebook has experienced about 8000 server-years of runtime without any data loss or corruption -- at least due to the software.

So how does a large site like Facebook prevent data loss? Recordon says the standard approach to running database clusters “applies to us too”.

“Replication is working [and] the lessons for running databases are similar for all types of sites. Those best practices apply to a much larger scale.”

An open innovation culture

From the outside, Facebook may look like a single, unified "grand design" in social networking, but many of the technologies it now relies on were born out of “hack projects”, Recordon says.

“We’re not necessarily going out to solve a specific problem,” he says. “Rather, a few people will just sit down and work on something. When we want to solve a problem we often try two or three ways of doing things. We prototype them then throw two away and move forward with the third one.”

“We have an engineering culture where small groups of people can have a large impact,” Recordon says.

Part of Facebook's engineering culture is to “move fast”, and open source technologies help it achieve that speed. Recordon cites the photo serving tool Haystack, which was built by only three employees, as a good example of the kind of innovation this culture produces.

When asked how Facebook scales internally with its developers and how it ensures all the developers are on a consistent frame of reference when developing new features, Recordon says scaling is "more than technology, it’s about scaling an organisation and scaling an engineering team".

“We’ve developed a variety of internal tools to help us do that,” he says, adding that Facebook has layered apps on top of version control repositories to help combat collaboration issues.

Facebook won’t reveal the exact number of servers it has, only that it’s “tens of thousands” overall. With data centres on the east and west coasts of the US, Facebook also uses content distribution networks such as Akamai to serve images, CSS and JavaScript around the world.

For operating system capacity planning and trending, the tools used by Facebook include Ganglia and Cacti, as well as some internally developed software. The main source code repository is Subversion, but about half of the developers are now using Git, for which Facebook has developed apps that handle code review and documentation.

Another open source project that Facebook uses is Cassandra, a distributed, fault-tolerant database. Cassandra is now hosted by the Apache Software Foundation and since going open source in 2008 has attracted some large sites, including Digg and Twitter. At Facebook, Cassandra is used for inbox search, but Recordon admits the company hasn’t done much active development work on it during the past year.

“At this point I think Digg, Twitter and Rackspace are the largest contributors to Cassandra inside of the Apache project,” Recordon says. “From our perspective it’s great to see a community develop it and push it forward.”

Facebook has mastered open source high-performance computing, but the site also uses the commercial Oracle RAC for some of its data management.

For corporate IT managers and CIOs, the Facebook story is a lesson in innovation and pragmatism. If a start-up can achieve the vast scale of data management that Facebook has using a mixture of public software and in-house code, then perhaps it will prompt others to re-examine established methods built on commercial off-the-shelf software and look farther afield for scalable technology solutions.