Computerworld

Stream processing at breakneck speeds

At investment management firm Bridgewater Associates, access to real-time data is measured in market ticks. Data feeds containing quote and trade activity are expected to stream in at 124,000 messages per second this year, so even subsecond delays in the arrival of data can affect trading decisions and put the U.S.-based organization at a disadvantage.

Monitoring high volumes of data that have very low latency requirements is beyond the capabilities of transactional databases, which must write each transaction to disk, so financial services firms traditionally build their own custom applications to keep up.

"There is a lot of effort required to build a framework that could perform and deal with lots of data concurrently," says Ed Thieberger, head of training technology at Bridgewater.

Recently, however, Bridgewater and other financial services firms have found an alternative in stream processing tools. Stream processing software goes by a variety of names, including streaming databases and event stream processing. The technology includes an engine that monitors data as it flows into and out of databases and other applications and can easily tap into external data feeds or internal message queues. All the data the engine gathers is held in memory to speed processing.

With data volumes increasing, organizations are running out of options for real-time processing. Financial services firms have little choice but to pursue stream processing because data quantities are starting to outstrip the capabilities of even custom-developed tools.

"At these volumes, traditional techniques won't scale," says Mike Stonebreaker, co-founder and chief technology officer at Massachusetts-based StreamBase Systems. Bridgewater's custom C++ program could handle 18,000 messages per second -- more than the 900 a relational database could support, but far short of the data volumes it faces this year. In contrast, the StreamBase engine handles 140,000 messages per second, Stonebreaker says.

Having gained a following in financial services, the emerging technology is beginning to spread to other industries that need to monitor operational data and interpret and respond to events in real time. Businesses are using it in areas as diverse as compliance management, network monitoring and real-time fraud detection in telecommunications, retail and e-commerce.

Stream processing software is also ideally suited to leverage message-based data flows within a service-oriented architecture (SOA). "If your organization already has MQ or other message-oriented middleware, then this is relatively straightforward," says Charles Nichols, CEO of SeeWhy Software. Users set up rules- or time-based queries that tell the stream processing engine what to look for. The software then monitors one or more data streams and triggers the appropriate response when one of those conditions is detected.
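The pattern Nichols describes boils down to an event-condition-action loop. A minimal sketch follows, in Python for illustration only: real products expose this through their own query languages or drag-and-drop tools, and the event fields, rule, and threshold here are all assumptions.

```python
# Hypothetical sketch of a rules-based stream listener. Field names,
# the "quotes" source, and the 100.0 threshold are illustrative.
from dataclasses import dataclass
import time

@dataclass
class Event:
    source: str       # e.g., the message queue or feed the event came from
    value: float
    timestamp: float

def price_spike_rule(event: Event) -> bool:
    """Condition: trigger when a quote moves past a threshold."""
    return event.source == "quotes" and event.value > 100.0

def alert(event: Event) -> None:
    """Response: a real engine might page an operator or fire a trade."""
    print(f"ALERT {event.source}: {event.value} at {event.timestamp}")

RULES = [(price_spike_rule, alert)]

def process(stream):
    """Engine loop: evaluate every incoming event against every rule."""
    for event in stream:
        for condition, action in RULES:
            if condition(event):
                action(event)

# Toy usage: the second event satisfies the condition and fires the alert.
process([Event("quotes", 99.5, time.time()), Event("quotes", 101.2, time.time())])
```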

To keep latency low, stream processing systems place data that must be retained in memory and discard everything else. Nothing is stored on disk.

"Streaming databases say, 'Let's not try to store everything. Let's just watch everything as it flies by and keep running totals,'" such as the total number of transactions per second, says Eric Rogge, an analyst at Ventana Research.

At Bridgewater, Thieberger uses StreamBase's streaming technology to watch for delays in data feeds coming in from providers of market data. If one feed falls behind, StreamBase immediately issues an alert and splices in the missing data from another source. "The tool is very well suited to represent all of the rules we want to implement that lead to decisions about how we are trading," Thieberger says.
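A hedged sketch of that feed-lag pattern: compare the latest timestamp seen on each feed and splice in a backup when the primary falls behind. The feed names and the half-second threshold are assumptions, not Bridgewater's actual configuration.

```python
# Illustrative failover logic for a lagging market-data feed.
LAG_THRESHOLD = 0.5  # seconds a feed may trail before we fail over (assumed)

def pick_source(last_seen: dict[str, float], now: float,
                primary: str, backup: str) -> str:
    """Return which feed to trust for the next tick."""
    if now - last_seen.get(primary, 0.0) > LAG_THRESHOLD:
        print(f"ALERT: {primary} is lagging; splicing in {backup}")
        return backup
    return primary

last_seen = {"feed_a": 10.0, "feed_b": 10.4}
print(pick_source(last_seen, now=10.6, primary="feed_a", backup="feed_b"))
```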

He measures the success of stream processing both in reduced development costs and faster time to market. "We haven't had to build a framework that does what StreamBase does," he says. In addition, once StreamBase is pointed at the data streams to be measured, business analysts can construct queries using a drag-and-drop user interface rather than rely on programmers, Thieberger says.

Stream processing also matches up well with another emerging technology: radio frequency identification. "Streaming is the only technology that can handle large volumes of RFID data that need to be analyzed on the fly," says Diaz Nesamoney, founder and CEO of Celequest, a business intelligence tool vendor.

The challenge with RFID tags is that they broadcast the same data continuously, says Jan Vink, IT director at Boekhandels Group Nederland, a Netherlands-based chain of 42 bookstores. When a pilot bookstore recently began checking in more than 1,200 books per day using RFID tags and a tag reader "tunnel," Vink used Progress Software's Apama tool to filter out the repetitive messages and ensure that each book was received in the system just once. The 45 to 50 boxes a day the store receives now take a total of 125 seconds for incoming processing rather than the 125 minutes required before, says Vink.
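The filtering step Vink describes is essentially duplicate suppression: because a tag re-broadcasts the same ID continuously, the engine should pass each tag downstream only once. A minimal sketch, with made-up tag IDs:

```python
# Sketch of RFID duplicate suppression: each tag checks in exactly once.
def dedupe(tag_reads):
    """Yield each RFID tag the first time it is seen; drop repeats."""
    seen = set()
    for tag in tag_reads:
        if tag not in seen:
            seen.add(tag)
            yield tag  # forward to the receiving system just once

reads = ["book-001", "book-001", "book-002", "book-001", "book-002"]
print(list(dedupe(reads)))  # ['book-001', 'book-002']
```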

The toughest part of the project wasn't the technology, however. "It was new for us to work with event streaming," he says. "We were used to batch processing."

Stream processing tools first arrived on the scene several years ago, having grown out of academic research at several universities, including professor emeritus David Luckham's seminal work on complex event processing (CEP) at Stanford University. Some vendors were spawned from those academic efforts. For example, Stonebraker is both an adjunct professor of computer science at MIT and CTO at StreamBase. And Celequest worked with researchers at Stanford in developing its product.

Most players in the market are small, and the technology is still maturing, says Philip Howard, an analyst at Bloor Research in Towcester, England. "None of the big boys have entered this space yet," he says, but that's changing.

Many of the start-ups have only a few customers today, says Gavin Little-Gill, an analyst at TowerGroup. He expects many of the vendors to disappear over the next two years as the market consolidates, new vendors jump in and stream processing capabilities are integrated into databases and other tools. "A bunch of these guys are going to get gobbled up by the Oracles and Microsofts of the world," Little-Gill says.

Already, Tibco Software Inc. has launched its own product, BusinessEvents, and Progress Software acquired startup Apama last year. IBM is working on two CEP projects: one called Active Middleware Technology and another, from its Tivoli Software group, called Active Correlation Technology.

In addition to using stream processing to detect and react to events, some tools also feed real-time updates to a dashboard. "If there's anything that's driving this, it's dashboards," says Rogge.

At first blush, the dashboards presented by stream processing tools sound a lot like business activity monitoring (BAM) tools, but the latter lack the ability to analyze data and perform event processing, says Jeff Wooton, vice president of product strategy at Aleri Labs, a Chicago-based vendor of stream processing technology. Nonetheless, he sees the two areas converging. "Either BAM products will make use of event-processing products, or they will start competing with them," he predicts.

Maja Tibbling, lead enterprise architect at Con-way, says that unlike BAM tools, stream processing can measure what is not happening, which she says is just as important as knowing what is happening. The transportation company uses Tibco's BusinessEvents to track and plan pickups and deliveries and the activities of inbound and outbound trucks to ensure that transportation planners are working with the most up-to-date information.

"In our rules engine, we need to know who cares about [an event] and how long they are allowed not to get [the information]. You can always figure out if something has happened. That absence of events is what's difficult to capture," Tibbling says.

Con-way recently moved toward an SOA as part of an effort to better integrate its systems, and BusinessEvents plays a key role in monitoring some 8 million events a day on the company's enterprise service bus. Events are published on the ESB, where processes that need to know about specific events can subscribe to them in parallel. "All of them, not just one stream, need to complete within a two-minute service-level agreement. If one doesn't complete, we need to know which one," Tibbling says. BusinessEvents triggers that.
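Tibbling's "absence of events" problem can be sketched as a simple set difference: once the SLA window closes, the interesting set is the subscribers that never reported completion. The process names and two-minute window below mirror her example but are otherwise assumptions.

```python
# Sketch of detecting what did NOT happen within a service-level window.
SLA_SECONDS = 120  # two-minute SLA from the Con-way example

def find_missing(started_at: float, completed: set[str],
                 subscribers: set[str], now: float) -> set[str]:
    """After the SLA window closes, flag subscribers that never completed."""
    if now - started_at < SLA_SECONDS:
        return set()  # still inside the window; nothing to flag yet
    return subscribers - completed

subs = {"billing", "dispatch", "tracking"}   # hypothetical subscribers
done = {"billing", "tracking"}               # completions observed so far
print(find_missing(started_at=0.0, completed=done, subscribers=subs, now=130.0))
# {'dispatch'} -- the stream that silently failed to complete
```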

Cendant is deploying Celequest's Analytics Server to keep up with hotel reservation requests coming in from channel partners such as Orbitz. "We wanted to know exactly what was happening with the business in real time," says Nick Forte, director of application architecture. With more than 6,500 hotels to book, Cendant's systems handle up to 500 transactions per second during peak times through various channels. Forte uses dashboards to monitor activity. Having an ESB architecture facilitated integrating the tool with data streams. "An ESB makes it much easier to pluck that information off and look at it," says Forte.

With the system set up on one channel, Cendant has already seen a benefit. During initial testing, the tool revealed that a few hundred thousand rate plans had no inventory allocated to them. As a result, requests against those plans received an error code. "We kept telling them the product is not available," says Forte -- a costly error. The dashboard picked up on the problem, allowing staffers to quickly remedy it and limit a potentially large loss of revenue.

The next step will be to expand the system to all of Cendant's channels, Forte says. He also hopes to use the system to automate yield management. "If we see occupancy rates going up on a property, we might want to trigger an event to send rates higher by some percentage," he says.

Forte describes stream processing and Cendant's move to an SOA as the first steps toward a more proactive approach to operations. "The wave of the future is predictive modeling," he says. But that's in the future. Right now, Forte says, "we're trying to get all of the plumbing into place."

The biggest challenge to stream processing may not be the technology but the change in mind-set that's required to effectively use the tools. "The barrier is changing the way you think about the problem," says Tibbling. "In this case, it's how you think about business problems in multiple dimensions. How do you externalize what your brain does automatically? To put that in software is a difficult matter."

Stream processing tools play by the rules

The secret to using stream processing tools effectively lies in knowing what you want to monitor for and creating the proper rules. The tools for creating queries fall into two competing camps: rules engines built on if/then statement logic, and variants of SQL that have been extended to support time-based queries.

Mike Stonebraker, chief technology officer at StreamBase Systems, prefers the latter. "SQL is a great paradigm. People understand it, and what you do is adapt SQL to real-time streaming data," he says. Other vendors that use SQL include Aleri Labs and Celequest Corp. "The problem with rules engines is there's no standard; there's no rule equivalent of SQL," he says. But while SQL itself is a standard, no standard variant of SQL with extensions for stream-based queries exists yet. Stonebraker also criticizes rules engine syntax, which he says is overly complex. "Rule languages have problems of inscrutable notation, and it's completely unordered," he says.

If anyone were going to support SQL, one would think it would be Progress Software, but the database vendor favors the rules engine that came with its acquisition of Apama. "The important part here is to have a robust enough language to express temporal logic. Time is the first-order issue, as opposed to filtering," says Mark Palmer, vice president of event stream processing at Progress's real-time division.
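To make the two camps concrete, here is an illustrative contrast. The window query a SQL-style engine might accept appears only as a string for comparison; its exact syntax varies by vendor and is an assumption here, as are the symbol names and the 100.0 bound in the if/then version.

```python
# The SQL camp: a time-based window extension, shown as an assumed syntax.
STREAM_SQL_STYLE = """
SELECT symbol, AVG(price)
FROM quotes [RANGE 60 SECONDS]   -- time-based window extension (illustrative)
GROUP BY symbol
"""

# The rules-engine camp expresses the same intent as if/then logic:
def rule(event, state):
    """If a symbol's 60-second average drifts past a bound, then act."""
    window = state.setdefault(event["symbol"], [])
    window.append((event["ts"], event["price"]))
    state[event["symbol"]] = [(t, p) for t, p in window
                              if event["ts"] - t <= 60]
    prices = [p for _, p in state[event["symbol"]]]
    avg = sum(prices) / len(prices)
    if avg > 100.0:
        print(f"then-branch fired for {event['symbol']}: avg={avg:.2f}")

state = {}
for e in [{"symbol": "ABC", "ts": 0, "price": 99.0},
          {"symbol": "ABC", "ts": 30, "price": 103.0}]:
    rule(e, state)  # fires on the second event (avg=101.00)
```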

At a glance: stream processing

What it does: Stream processing software monitors and analyzes operational data flows in real time to detect predetermined conditions or events.

Pros: Can handle very large volumes of streaming data with very low latency requirements; faster than custom-built applications; integrates well with service-oriented architectures.

Cons: Most applications are still in financial services; products are relatively new and still maturing; analysts expect a vendor shakeup in the next two years; capabilities may eventually be embedded within existing tools such as databases and business activity monitoring software.

Best Use: Applications that require fast detection of and reaction to business events embedded in large volumes of data, such as fraud detection, network monitoring or transaction monitoring.

Streaming keeps the brew flowing

Diageo in London has been using stream processing software and a dashboard provided by SeeWhy Software for the past two years to track the status of shipments of Guinness beer to the U.S. Previously, shipments moved from the factory to the port of departure in England with a nearly 100 percent on-time rating, but only 50 percent of shipments that left that port arrived at the U.S. warehouse on schedule. Andy Cullen, head of supply chain planning and exports, brought in SeeWhy, which pulls shipping data from the company's SAP system, gets "in flight" updates from shippers' Web sites, and streams in historical information on the shipper, seasonal factors, routes and other criteria to update each shipment's status and predict whether it will arrive on time. It also continually updates the current overall on-time rating and predicts what the month-end number will be.

Cullen can drill down by carrier or destination to see why shipments are running behind schedule. When the lowest-cost shipper is the slowest, analysts must decide whether it's more cost-effective to use a faster shipper or expand inventory in the U.S. warehouse as a supply buffer. But some decisions are easier to make. For example, Cullen discovered that some higher-cost shippers were also slower than less-expensive competitors. "We've found ways of driving [the percentage of on-time shipments] from 50 percent to 80 percent this month, and we're consistently above 70 percent," he says.

Cullen says organizations that want to try stream processing technology should have a clearly defined problem in mind and start small. Although Diageo now wants to expand the use of the technology to its other business units, Cullen says solving a problem at the departmental level first and working with a start-up like SeeWhy allowed his organization to be more nimble. "If we'd both been huge companies, it would have been very difficult to get this project going," he says.

Where real-time dashboards fit

The benefit of stream processing is the ability to react in real time. So why do some vendors include a dashboard? "The minute you put a human in the loop, you're not talking millisecond [response]. Dashboards are fundamentally long-latency kinds of things," says StreamBase Systems CTO Mike Stonebraker. The value of stream processing lies in the ability to provide a closed-loop system that analyzes the event and reacts in real time, he argues.

That's the ideal, says Tim Bass, principal global architect at Tibco Software. But the reality is that in many cases, human intervention may be required. No one is going to turn down a transaction if the system is only 70 percent sure that the buyer is committing fraud, he says. The trick is to correlate those events with other historical data streams to boost the confidence level to 99 percent. "Then you have capability to take automated action," Bass says.
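The escalation logic Bass describes might be sketched as follows: act automatically only when correlating the live signal with history pushes confidence past a high bar, and otherwise route the case to a human. The combination rule and both thresholds are assumptions for illustration.

```python
# Sketch of confidence-based escalation: automate only above a high bar.
AUTO_ACTION_THRESHOLD = 0.99

def decide(stream_score: float, history_score: float) -> str:
    """Naive correlation: combine two independent fraud signals."""
    combined = 1 - (1 - stream_score) * (1 - history_score)
    if combined >= AUTO_ACTION_THRESHOLD:
        return "block transaction automatically"
    return "flag on dashboard for human review"

print(decide(0.70, 0.50))  # not confident enough -> human in the loop
print(decide(0.70, 0.97))  # correlation boosts confidence -> automated action
```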

Dashboards definitely have their place, says Celequest CEO Diaz Nesamoney. "The dashboard is there more for status check," he says. "In many applications, it's not as though something is going wrong all the time. You also want to know when things are running fine, what are my call volumes right now. You can see that visually."