Web data: Cached or current?

For all of its virtual connotations, the Internet depends entirely upon its physical infrastructure to move information around. And the physical distance from server to end user leaves plenty of time for information, in the form of packets, to get lost, resulting in e-mails that never arrive, Web pages that load incompletely and streaming audio or video that pops, flickers or just dies. So getting files closer to end users can improve performance.

One way to do that is by caching files near the edge of the network, closer to users. Barry Weber, vice president of technical infrastructure at BarnesandNoble.com, says the company's BN.com site saw a 50 percent improvement in performance from the end users' perspective after it started using caching in February last year.

Within the past few years, more companies have embraced caching as a way to push static content out to users, frequently outsourcing the content to external content delivery networks (CDNs). A CDN is a group of Web servers and caching servers; the caching servers are simpler and less expensive than Web servers but can't generate dynamic content.

Companies are increasingly turning to CDNs because they can deliver static content more reliably than the prevailing model of a few clusters of Web servers serving every request. BN.com outsources delivery of its static content to Akamai Technologies. After BN.com uploads new content to one of Akamai's servers, it takes two to three hours for it to become available across Akamai's CDN. The CDN intercepts all IP requests for BN.com's static content (HTML, images, and streaming audio or video) and serves it to users from the available cache that's physically closest to them.
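The idea of serving each request from the closest available cache can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names and coordinates are made up, and great-circle distance stands in for whatever proximity measure a real CDN uses (which is typically based on network conditions rather than geography alone).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheNode:
    """A hypothetical edge cache with a rough geographic position."""
    name: str
    lat: float
    lon: float
    available: bool = True


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_available_cache(nodes, user_lat, user_lon):
    """Return the closest node that is marked available, or None."""
    candidates = [n for n in nodes if n.available]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda n: haversine_km(user_lat, user_lon, n.lat, n.lon),
    )


# Illustrative nodes only; not any vendor's real topology.
NODES = [
    CacheNode("new-york", 40.71, -74.01),
    CacheNode("london", 51.51, -0.13),
    CacheNode("tokyo", 35.68, 139.69),
]
```

With these sample nodes, a user in Paris would be sent to the London cache, and a user in Boston would fall back to London if the New York cache were marked unavailable.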

Meanwhile, requests for dynamic content, such as book inventory levels and targeted banner advertisements, go to BN.com's servers as usual. Both find their way back to the end user, who sees only the finished Web page. Though CDNs are unnecessary on a small scale, the CDN helps keep the site running quickly when, say, a new Stephen King novel comes out and thousands of users are viewing the book's Web page on BN.com every hour.
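The split between cached static content and origin-served dynamic content might look like the following sketch. The extension list and the example URLs are assumptions made for illustration; a production CDN would be configured by its operator rather than by a hard-coded table like this one.

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of file types treated as static for this sketch.
STATIC_EXTENSIONS = {".html", ".gif", ".jpg", ".png", ".css", ".js", ".asf"}


def route(url):
    """Return "cdn" for static-looking URLs and "origin" for everything else."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return "cdn" if extension in STATIC_EXTENSIONS else "origin"
```

Under these assumptions, `route("http://www.example.com/images/cover.gif")` yields `"cdn"`, while an inventory lookup such as `route("http://www.example.com/inventory?sku=123")` yields `"origin"`.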

Now, for the first time, caching is enabling companies to do things that were previously impossible or very unreliable on the Internet, such as streaming catalogs of media files. But caching still leaves something to be desired for retail companies, such as BarnesandNoble.com, that dynamically generate their Web pages with content specifically targeted at individuals.

Some companies have a financial imperative to make their video files reliably available on the Internet. And reliability has been elusive, especially as the number of simultaneous streams has increased.

"If you're throwing these giant streaming files around your worldwide network, capacity becomes an issue very quickly," says Greg Howard, an analyst at HTRC Group.

But caching, he says, "can dramatically reduce costs for streaming, mainly in the areas of maintaining wide-area network capacity." Just as CDNs can put static files closer to end users, so, too, can they keep copies of streaming media files, serving multiple users from multiple locations rather than from just the few centralized streaming servers many companies use.

Take, for example, Coastal Training Technologies, which sells safety and training videos on topics ranging from blood-borne pathogens to oxyfuel welding.

Before customers buy, they want to preview the videos, which can cost up to US$800 each. In the past, Coastal would mail out bunches of preview tapes. But it could take weeks for customers to review them, which made it difficult to close sales with follow-up calls.

Coastal wanted to make decent previews available online but didn't want to have to run Web servers to house the thousands of necessary preview files. After attending the Streaming Media East conference in New York last summer, the company decided to outsource the delivery of its previews to a CDN.

Coastal chose Digital Island Inc. in San Francisco after also evaluating service from Activate, Akamai, Burst.com, Globix and iBeam Broadcasting. Choosing Digital Island over Akamai was practically "a flip of the coin," says Mark Stelbauer, Coastal's director of e-business.

Coastal uses 500K Advanced Streaming Format files. The company uploads 50 or 100 files at a time via file transfer protocol to a Digital Island server, and within a few hours, the files are propagated across the CDN. Unlike many other CDNs, which cache content based solely on popularity, Digital Island also maintains many copies of Coastal videos on several different servers.

"Since we're not targeting the consumer, the files are not going to be requested every 15 seconds. For us, it's maybe every 15 or 20 minutes," says Stelbauer. Thus, a popularity-based model wouldn't work there.
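The difference Stelbauer describes can be illustrated with a small cache that evicts rarely requested files but lets some files be pinned so they are never dropped. This is a sketch, not Digital Island's implementation: least-recently-used eviction is used here as a simple stand-in for popularity-based caching, and the class and file names are hypothetical.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy edge cache: LRU eviction plus a set of pinned, never-evicted files."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of evictable entries
        self._lru = OrderedDict()     # evictable entries, oldest first
        self._pinned = {}             # entries kept regardless of demand

    def pin(self, key, value):
        """Store a file that must stay cached even if rarely requested."""
        self._lru.pop(key, None)
        self._pinned[key] = value

    def put(self, key, value):
        """Store an ordinary file, evicting the least recently used if full."""
        if key in self._pinned:
            self._pinned[key] = value
            return
        self._lru[key] = value
        self._lru.move_to_end(key)
        while len(self._lru) > self.capacity:
            self._lru.popitem(last=False)

    def get(self, key):
        """Return the cached file, or None on a miss."""
        if key in self._pinned:
            return self._pinned[key]
        if key in self._lru:
            self._lru.move_to_end(key)  # mark as recently used
            return self._lru[key]
        return None
```

In a purely demand-driven cache, a preview requested only every 15 or 20 minutes could be pushed out by busier content before the next viewer arrives; pinning avoids that at the cost of storage.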

Coastal wouldn't specify how many users previewed videos exclusively online but did say that once the figure reaches 20 percent to 30 percent of overall users, it will make an impact on the bottom line. Already, however, salespeople are able to call just hours after previews are viewed online, which has helped sales.

Though current hardware and software make it possible for companies to build their own CDNs, HTRC's Howard cautions against it. "People who are building their own CDNs are finding it too difficult or not cost-effective when you include the cost of labor. It just makes sense to go to the service providers in this market," he says. A company such as CDN outsourcer Akamai has 9,700 servers configured in 650 networks across 56 countries, a scale that few do-it-yourselfers would be able to match.

Pricing for outsourced CDNs wasn't available for this article; CDN vendors wouldn't release the information, and the customers interviewed were contractually prohibited from discussing it.

Users say that, in general, vendors divulge little information, making it difficult to compare them when shopping for a CDN. But there are other ways to evaluate CDNs, namely by their performance. That's what BN.com did in February last year, when it pitted its top three CDN choices (which it declined to name) against one another, watching as each hosted the static content on the BN.com site simultaneously. "It's pretty fascinating, because we really had the statistics," says Weber. The company chose Akamai.

Beyond the Static

Caching can speed the delivery of content, but to date, it has only been good for static Web content, not dynamic information such as pricing. Weber says that's the way it has to be for now, given current cache limitations. "I'd like to go beyond caching static content, as soon as possible," he says.

What holds him back, he says, are "distributed databases and distributed applications," which produce the dynamic information on a Web page that's tailored to individual users or which changes quickly. Caches can't handle that content well.

Caching dynamic content is "problematic from a database standpoint, because you need one version of the truth," says Peter Firstbrook, an analyst at Meta Group. Companies need to be able to refresh the information across the CDN whenever a little change occurs, so there's just one version of it. So "you have to be able to delete pages from the cache when a certain event occurs, not just at a certain time," Firstbrook says.
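Firstbrook's distinction between time-based and event-based deletion can be sketched as follows. The class, the dependency tags such as `price:1`, and the injectable clock are all assumptions introduced for illustration; the point is only that a page can be dropped either because it has aged out or because something it depends on has changed.

```python
import time


class PageCache:
    """Toy page cache supporting both time-based and event-based removal."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._pages = {}  # url -> (html, stored_at, dependency tags)

    def store(self, url, html, depends_on=()):
        """Cache a page and record which data items it was built from."""
        self._pages[url] = (html, self.clock(), frozenset(depends_on))

    def fetch(self, url):
        """Return the page unless it is missing or older than the TTL."""
        entry = self._pages.get(url)
        if entry is None:
            return None
        html, stored_at, _tags = entry
        if self.clock() - stored_at > self.ttl:
            del self._pages[url]  # time-based expiration
            return None
        return html

    def on_event(self, tag):
        """Delete every page that depends on the changed item."""
        stale = [url for url, (_h, _t, tags) in self._pages.items()
                 if tag in tags]
        for url in stale:
            del self._pages[url]  # event-based expiration
        return stale
```

A price change would then call something like `cache.on_event("price:1")`, removing the affected page immediately instead of waiting for its timer to run out.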

At Outpost.com, the site of Cyberian Outpost, for instance, the dynamic information on any given Web page can include product information, real-time stock inventory, product categories and order-tracking information.

Even prices change moment by moment. "The average price can change six or 10 times per day on [a] product," says Raymond Karrenbauer, chief technology officer at Outpost.com. Every time new inventory lands in a warehouse, the e-commerce application adjusts pricing based on current inventory supply and customer demand levels.

Some industry initiatives are afoot to let companies push dynamic assembly onto the CDN to increase content delivery speed. One is the Edge Side Includes (ESI) open-standards specification, co-authored by Akamai, Art Technology Group, BEA Systems, Circadence, Digital Island, IBM, Interwoven, Oracle and Vignette. The core of ESI is a series of XML tags that specify how and when information and pages should be assembled within the content management system, application server and CDN. To date, Oracle's 9i application server and Akamai's EdgeSuite infrastructure service support ESI.
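To give a feel for edge-side assembly, the sketch below resolves the ESI `<esi:include src="..."/>` tag in a cached template by splicing in a separately fetched fragment. It is a deliberately minimal illustration: it handles only that one tag in its self-closing form, the template and fragment URL are invented, and fragments come from an in-memory dictionary rather than from an origin server.

```python
import re

# Matches a self-closing include tag such as <esi:include src="/x"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template, fetch_fragment):
    """Replace each include tag with the fragment returned for its src."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), template)


# Static shell that could be cached at the edge (hypothetical markup).
TEMPLATE = (
    "<html><body>"
    "<h1>Book page</h1>"
    '<esi:include src="/fragments/inventory?sku=123"/>'
    "</body></html>"
)

# Stand-in for dynamic fragments that would normally come from the origin.
FRAGMENTS = {"/fragments/inventory?sku=123": "<p>In stock: 7</p>"}

page = assemble(TEMPLATE, FRAGMENTS.__getitem__)
```

The static shell of the page stays in the cache, and only the small inventory fragment has to be generated fresh, which is the division of labor the specification is aiming for.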

Two newer companies are also forging into dynamic delivery territory. Software from SpiderCache in Vancouver, British Columbia, and Chutney Technologies in Atlanta can accelerate dynamic content delivery by using things such as event- or time-based expiration of caches, predictive modeling and real-time cache consistency checks.

But these are baby steps. "The Holy Grail is to move all this stuff out to the edges," says Firstbrook. "But the reality is, I don't think you'll be able to do that anytime soon."
