Computerworld

Column: Data Replication, the 'Net: What to know

  • Shaku Atre (Computerworld)
  • 19 November, 1998 12:01

The Internet has emerged as a low-cost, do-it-yourself data-replication strategy. The timing, however, is ironic: that emergence is occurring just as organizations are rolling out database-aided data replication.

It isn't surprising that an increasing number of organizations are experimenting with replication for the first time. Microsoft Corp. is touting the new, flexible replication in SQL Server 7.0. IBM Corp. provides similar features with its various tools for DB2-only and heterogeneous environments. Sybase Inc., too, offers bidirectional replication between mobile Adaptive Server Anywhere clients and the Sybase Adaptive Server. To make data replication even easier, most vendors offer wizards that help automate the process.

The primary driver behind any data replication strategy is usually the same: multiple people in separate locations needing access to the same data.

Sounds simple, right? Well, it isn't. The fundamental problem is this: As soon as you have the same data in more than one place, you have to be able to tell whether it is, in fact, still the same. If data changes in one place, it needs to change in all places to stay consistent.
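
To make the problem concrete, here's a minimal sketch in Python -- the table contents and hashing convention are illustrative, not drawn from any particular product. Each site computes a fingerprint of its copy of the data; mismatched fingerprints reveal that the copies have drifted apart without shipping every row across the network.

    import hashlib

    def copy_fingerprint(rows):
        """Hash a sorted dump of the rows so two copies can be compared cheaply."""
        digest = hashlib.sha256()
        for row in sorted(rows):
            digest.update(repr(row).encode())
        return digest.hexdigest()

    # Two sites hold "the same" customer table (hypothetical contents).
    site_a = [("cust-1", "Albany"), ("cust-2", "Boston")]
    site_b = [("cust-1", "Albany"), ("cust-2", "Buffalo")]  # changed locally

    # The fingerprints differ, so the copies are no longer the same data.
    assert copy_fingerprint(site_a) != copy_fingerprint(site_b)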

That problem is tough enough in a simple replication scenario, where a single, central copy is replicated out to multiple clients. But usually it turns out that users at those sites also need to make changes to the data. Unless you impose a complex, database-specific scheme that forces every change to originate at the central site and flow outward from there, managing that two-way replication is enormously harder than the already difficult one-way scenario.
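
A short, hypothetical sketch shows why. Before accepting a replica's edit, the central site must check whether the row changed underneath it since the replica last read it -- the schema and version convention below are assumptions for illustration, not any vendor's mechanism.

    from dataclasses import dataclass

    @dataclass
    class Change:
        key: str
        new_value: str
        base_version: int  # the version the replica read before editing

    def apply_replica_changes(central, changes):
        """Apply a replica's edits to the central copy, flagging conflicts.

        central maps key -> (value, version), and every changed key is
        assumed to already exist centrally. An edit applies cleanly only
        if the central version still matches the version the replica
        started from; otherwise both sites changed the row independently.
        """
        conflicts = []
        for change in changes:
            value, version = central[change.key]
            if version == change.base_version:
                central[change.key] = (change.new_value, version + 1)
            else:
                conflicts.append(change.key)  # concurrent edits detected
        return conflicts

In the strictly one-way scenario, the conflict branch never fires, because only the central site changes data. Everything that happens after it -- deciding which edit wins, by business rule or by hand -- is the added cost of letting users write.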

The chief drawbacks to a centralized data approach have traditionally been data-access and retrieval speeds and the investment in the network infrastructure needed to support that access.

The infrastructure issue is what's boosting the Internet as a network surrogate for data access. The virtually maintenance-free and cost-free aspect of Internet computing is making a return to centralized data possible. But, in fact, that "free" infrastructure doesn't come without a cost.

The Internet communications model doesn't support the conversation-like mode of communications common in traditional networking, where there's acknowledgment when data is received. Instead, in Internet computing a message is sent to a destination and is assumed to have arrived safely at some (usually unknown) time. There's no notion of sending, waiting, receiving and acknowledging during the communication.
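
The contrast between the two models looks roughly like this in Python sockets; the host, port and single-byte acknowledgment convention are hypothetical, chosen only to illustrate the difference.

    import socket

    # Conversational mode: send, then block until the receiver acknowledges.
    def send_with_ack(host, port, payload, timeout=5.0):
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            ack = conn.recv(1)      # wait for the receiver's confirmation
            return ack == b"\x06"   # hypothetical convention: ASCII ACK byte

    # Fire-and-forget mode: send and assume arrival at some unknown time.
    def send_and_forget(host, port, payload):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))  # sender never learns the outcome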

Not knowing whether data has been received -- or how much time elapsed between its being sent and its being received -- makes for a dicey data-replication infrastructure.
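
One common remedy, sketched below with a hypothetical sequence-numbering convention, is to stamp every replication message so the receiving site can at least detect the holes and ask the sender to retransmit.

    def find_gaps(received, expected_start=1):
        """Return sequence numbers of replication messages that never arrived."""
        seen = set(received)
        high = max(seen, default=expected_start - 1)
        return [n for n in range(expected_start, high + 1) if n not in seen]

    # Updates 1, 2 and 4 arrived; update 3 is missing and must be re-requested.
    assert find_gaps([1, 2, 4]) == [3]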

You have two choices to solve that problem: Bring the data to the users (standard data replication) or bring the users to the data (access to centralized databases). Which should you choose? Consider the following questions:

-- What is the nature of the system or application? Applications that require very fast response time or online data integrity may not yet be candidates for Internet-based access. Data-query-oriented applications and those that support only low-to-moderate levels of transactions are better suited for such an approach.

-- What kind of communications infrastructure do you have? If you have invested in a wide-area network, why rock the boat? Stick with what you have. If you don't have that infrastructure in place or want to reduce your maintenance and support costs, consider using the Internet instead of your own private network.

(Atre is president of Atre Associates Inc., a consulting firm in New York that specializes in data warehouse and database technologies. Her e-mail address is shaku@atre.com, and her DataWareMart methodology can be found at www.atre.com.)