FRAMINGHAM (04/24/2000) - The world's largest online bookseller once protected its precious database servers from a leaky Seattle roof with a piece of blue tarpaulin that a staffer rushed out to buy at Home Depot.
Amazon.com Inc. has come a long way since attaching that tarp to the fourth-floor ceiling of its downtown Seattle nerve center two years ago. Today, the retailer is building a second data center in Virginia that will have four times the capacity of its current one, Kim Rachmeler, director of enterprise management, told a Retail Systems 2000 audience here last week.
Amazon is typically tight-lipped about its information technology. So Rachmeler's speech provided a rare glimpse into the behind-the-scenes technical challenges, and sometimes dramatic solutions, that staffers devised on frenetic Internet time to cope with traffic levels that grew 30% each month during the company's first 18 months in business.
"Survival is absolutely the same thing as scaling for us," Rachmeler said.
"Scaling is the most important thing we do. It is our No. 1 strategic initiative."
And it has been driving Amazon's Web architecture decisions. As traffic soared the first few years, Amazon opted for bigger and bigger boxes because it had the money but not the time or the staff to optimize systems, Rachmeler said.
When Amazon opened shop in July 1995, IT staffers set up a bell to ring every time a book order reached Ernie, the sole Sun Microsystems Inc. SPARCstation V box that served Amazon's Web site. But they might have gone deaf if they had left that system in place. The next year, they retired Ernie in favor of a Digital Equipment Corp. Alpha 2000 and later added another one because it was "the biggest box out there," Rachmeler said.
"By changing vendors, we were going to give ourselves more room to expand in the long run," she said.
By the spring of 1997, Amazon eased the strain by substituting two DEC 8400s as it launched the second version of its Web site. More significant architectural changes would come later as the company made plans to link up with major portals and to add features such as recommendations and one-click shopping.
"We were scared," Rachmeler said. "We were about to drink from the fire hose, and we had no idea the kinds of traffic that we were going to get from those situations."
The solution: It removed one of the DEC 8400s in favor of redundant DEC 4100s serving Web pages at the front line. "What it allowed us to do is expand the capacity of the Web site only by buying new machines. Instead of spending human power to get more capacity, we could simply use our credit cards and increase the front line," Rachmeler said, noting that Amazon had fewer than 30 IT staffers at the time.
"It also meant that any one of these online machines could be taken off-line for maintenance or if it had hardware problems, and the store would stay open," Rachmeler said.
But Christmas '97 was coming, "and that shouldn't have been a surprise, but it was," she said. The taxed database server was already running on the biggest machine available, so Amazon couldn't put in another box. Instead, a SWAT team launched Project Database Headroom to squeeze out 30% more performance.
"We would go to executives of the company and ask them not to run their reports during the day. We would tell the financial teams not to execute billing programs during peak periods of time. We sent out messages to the entire Amazon staff that if they had programs accessing the database, they needed to talk to this SWAT team," Rachmeler recalled.
Big Iron for Christmas
When the Christmas rush was over, Amazon brought back its idle DEC 8400 as a hot standby, and then over the next two years began carving off pieces of the main database to run on separate machines. In time for last Christmas, staffers brought in big iron - a Hewlett-Packard Co. V-class machine - to anchor the system. They also split the Web database across four machines, up from one, and increased the number of online servers.
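Carving pieces off one overloaded database amounts to functional partitioning: each logical area of data gets its own server, so each box can be sized and upgraded independently. A minimal sketch, with hypothetical partition names and hostnames that are not Amazon's:

```python
# Hypothetical functional partitions: each logical area of the
# store's data lives on its own database machine.
PARTITIONS = {
    "catalog": "db-catalog.internal",
    "orders": "db-orders.internal",
    "customers": "db-customers.internal",
}

def server_for(table: str) -> str:
    """Route a table to the machine that owns its logical area,
    instead of sending every query to one monolithic server."""
    for area, host in PARTITIONS.items():
        if table.startswith(area):
            return host
    raise KeyError(f"no partition owns table {table!r}")
```

The payoff is the one described above: when orders outgrow their box, only `db-orders.internal` needs bigger iron, not the whole database.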
But in the future, Amazon knows that hardware won't be able to solve every problem. Rachmeler noted that the online retailer's focus will shift to modular software systems, which will ease development and improve maintainability.
Giga Information Group analyst Mike Gilpin said Amazon took the right approach to expansion, given its circumstances. Now the company will experience the "classic set of growth pains" encountered by early adopters as they grow to be large companies, he said.
Amazon will need to take a "more controlled approach to software architecture" and do "more separation of function between different layers of the architecture. There never is an easy time to make those changes," he said.
One challenge, for instance, will be solving a "contentious" middleware issue, since the company now uses software from several vendors, according to Rachmeler.
"Moore's Law is not going to save us anymore," she said. "We're going to have to get smarter about the way that we use our systems, not just increasing [capacity and availability] by steroids."