Computerworld

DBAs reveal how they rev their database without spending a fortune

Lots for admins to choose from on the market, but throwing money's not the answer
  • Eric Lai (Computerworld)
  • 20 August, 2007 08:16

With a flock of new database management packages suddenly hitting the market, you'll forgive your database administrator (DBA) if they suddenly seem to be in a spending mood.

New or upcoming products include Oracle's just-released 11g, which, as usual, promises more features than a DBA can wrap his head around; Microsoft's SQL Server 2008, the follow-on to the successful SQL Server 2005, which is due out next February; the new "Cheetah" release of Informix, with which IBM hopes to chase down lost market share; and Version 9 of FileMaker's namesake software, which the vendor says is Web 2.0-ready but remains as easy to use as ever.

Tempting as these new products can be, experienced DBAs know that new versions usually pack in more features, meaning that any performance boost is more likely to come from the expensive hardware upgrade accompanying a database upgrade, not the database upgrade itself.

And those gains can be limited, if underlying design flaws or operational problems remain.

As Amy Stuemky, a DBA at travel tour operator Globus, puts it, "Throwing hardware at a problem is always the wrong solution."

So how do you solve problems like processes that tend to run for days, or database utilization that mysteriously spikes up to 100%, without spending an arm and a leg on consultants or new hardware? Some DBAs say all it takes is a little resourcefulness and sweat, augmented by a well-chosen third-party tool.

Fixing bad SQL code

For DeWayne Treadway, a database and systems administrator at Dallas-based Lone Star Steel Co., the problem was an "overall general slowness" with the steel maker's pair of Oracle databases that Treadway couldn't pinpoint.

A steel manufacturing plant in the northeastern Texas town of Lone Star, was running Oracle 8i on IBM AIX servers, one of them a 16-way System p 570, the other an older RS/6000 M80, when Treadway arrived two years ago.

Treadway had a hunch the problem had something to do with the SQL code. It had been written by contract developers five years earlier, when Lone Star made a big move off mainframe computers to Unix servers. Lone Star's situation is hardly unique. "Database applications are immortal," wrote Josh Berkus, one of the developers behind the open-source PostGreSQL database, in a recent blog post. "The average life span of a 'temporary, one-off' application is four years, and there is code from the 1960s that is still running today."

So figuring out where the code was not easy for Treadway. The Statspack application that came with the Oracle Database didn't store historical performance data, making it "no good" for the task, he said.

And as Treadway put it, "I'm a DBA, not a developer."

Page Break

After looking at a number of database performance-tuning software packages, including Oracle Enterprise Manager and Quest Software's Spotlight on Oracle, Treadway chose Confio Software's Ignite for Oracle.

Using the tool and its "wait-time analysis" features, Treadway was able to quickly pinpoint three pieces of SQL code that were causing half of the database's delays. For example, Treadway was able to get one piece of code fingered by Ignite rewritten so that a process that formerly took four minutes would run in just 10 seconds.

"If it runs 15 times an hour all day long, you can see how that can accumulate," he said.

Since starting to use Ignite a year ago, Treadway has been able to improve database performance by 30%. He says he could have wrung a 50% boost out of the system with further tweaks, but he didn't bother because the company is about to move from 8i -- an eight-year-old product for which support has ended -- to Oracle 10g R2.

But even so, Treadway figures that by using Ignite he'll be able to better test the new Oracle database's performance as he rolls it out, rather than "just make sure it works, which is what I would have done without Ignite."

Building a database-backed Web site the right way

Backing a dynamic content or e-commerce Web site has become one of the most popular applications for databases in recent years. Yet, the vast majority, at least according to Neil Day, former CIO at Walmart.com, aren't built in an efficient way.

According to Day, in the typical three-tiered Web application architecture, you "beat the crap out of database." To prevent a bottleneck at the database, DBAs need to "insulate the database" from the application's requests by using a combination of caching and very intelligent balancing of traffic.

To do that, Walmart.com created nodes that were each made up of four lightweight, commodity PC servers.

"There wasn't a piece of Cisco, Sun or EMC gear -- it was all cheap x86 boxes and other disposable hardware," he said.

With the entire application stack including the massive Walmart.com database, each node is able to cache and handle all of the traffic requests a particular user might make. That minimizes bottlenecks and the resulting delays, and it ensures that a user's requests always stays local.

According to Day, Google's strategy of replicating its search index database over tens of thousands of PC servers for speed and redundancy, called sharding is a similar, less complicated version of his strategy.

But Day says that sharding doesn't work as well for media content sites or an e-commerce site like Walmart.com, which has a huge inventory of items as well as millions of graphical elements and images, and as a result has a much larger database than Google has.

"The interesting part is how to make sure that most of the data any particular user will need is always near those four servers," he said.

Day is loath to reveal many more details. But using this strategy, Walmart.com was able to become one of the largest e-commerce sites in the world while "running on a couple of million dollars worth of hardware," he said.

Page Break

"At the end of the day, if you're going to be cheap, you're going to have to be clever," he said. And "in a high-transaction-volume world, you may spend a hideous amount of money and still have a poor-performing Web site."

Day has brought a technique similar to the one he used at Walmart.com over to his Web music start-up, MediaMaster.

For DBAs who like what they're reading but don't want to rebuild their infrastructure from scratch, Day says that open-source Java clustering software from Terracotta, created by the former chief architect for Walmart.com, Ari Zilka, effectively uses many of the same tricks.

"Terracota provides a really sophisticated and intelligent cache for data that is used across the application. Just as importantly, it figures out how to get that caching to work seamlessly in code below your application layer, so it happens transparently," he said.

When upgrades go bad

For Amy Stuemky, longtime DBA at Globus, the problem was in the travel company's mission-critical Web site.

"The site would go down and nobody knew why," Stuemky said. "We weren't sure whether it was the database or the Web servers or applications. It was driving us crazy."

Globus has 50 Oracle databases ranging from Versions 8 to 10g, and 30 Microsoft SQL Server 2000 databases. Stuemky is the sole DBA.

That made it harder for Stuemky to devote a lot of time to figuring out what was wrong with Web site, which travel agents and an increasing number of consumers use to book travel tours.

Stuemky even did what she knew wouldn't work: add two more CPUs to the SQL Server databases.

"We did it because our database licensing allowed it, but the databases just took up all of it [utilization] as well," she said.

After looking at several database tools, including Quest's Spotlight (which Stuemky pronounced "excellent but not within our budget"), Globus also turned to Confio's Ignite.

It bettered SQL Server's built-in performance monitor tool, which let Stuemky check the CPU availability but not drill down to see what the users and queries were actually doing.

After installing Ignite on her desktop PC, Stuemky let it run. Within a day, it came back with a diagnosis: a "huge locking problem" in SQL Server. Armed with the accompanying data, Stuemky was able to obtain a hot fix from previously unhelpful Microsoft support staffers.

She was also able to diagnose the main issue: session state tables created by SQL Server to keep track of user visits that needed to be rebuilt from scratch because of an upgrade in Globus' .Net drivers. Because rebuilding the tables was never mentioned in Microsoft's documentation, Stuemky said it may have taken many more months or even years to figure out that problem.

In the past two and a half months, the site has been "much more stable" and up 24/7, Stuemky said.

Page Break

Fixing the business, not the database

Simple adjustments to a company's business process can often pay more dividends than a technical fix. Indeed, fixing a problem -- tuning SQL code, for instance -- can often simply shift a bottleneck from one place to another, according to Craig Shallahamer, an ex-Oracle-turned-independent database consultant with his own firm, Orapub.

By contrast, reducing a database's workload or rebalancing it (moving to a different machine or running a noncritical batch job to a different day or time) can accomplish much more.

So why is this approach so rarely taken? Shallahamer says the problem is the typical personality type of a DBA: introverted and oriented toward problem-solving.

Because of that, says Shallahamer, "they tend to look for any technical problem to solve so when they go home they can feel good, even if that isn't the best solution."

Shallahamer recalls working with a DBA whose Oracle systems were "getting hammered" during the peak month's-end closing period. Just in passing, the DBA mentioned that there were some SQL statements that were part of a message-passing application that were being run every second.

"I asked him if there was any way he could slow that [SQL statement] down to run just once a minute. He said that he would have to check with the users first," he said. "They said, 'Oh yeah, that's not a problem.' So it's really all in how you present it."

DBAs also "often don't feel powerful enough to push for process re-engineering," he said. As the prior example showed, said Shallahamer, that's also a misconception. DBAs can get all of the information they need, such as profiling a workload, to "make a really amazing case."