Computerworld

Getting your data warehouse in shape

No matter how well you tune your data warehouse, reliability of data is tantamount to any successful project. And even simple discrepancies can cause serious repercussions when attempting to make well-founded business decisions based on the warehouse data.

Unreliable data has resulted in such blunders as listing a doctor's patient as male and pregnant, industry observers say.

Operational data warehouse applications provide decision-makers with information that helps them monitor and control the company.

And then there's data marts.

A data mart is a collection of databases and tools focused on a specific business problem. Size alone doesn't define data marts, though they tend to be smaller than data warehouses, which are enterprisewide collections of data that span numerous business areas and topics.

For example, a utility company could use accounting information in a data mart to plot spending over the course of the year to find opportunities for cost reductions. It also could study the annual or quarterly workload and reschedule staff to avoid overtime. Such applications use tools such as agent technology query and reporting tools and online analytical processing and multidimensional databases.

Agent tools, analytical or data mining applications, meanwhile, use sophisticated software to help staff members come up with insights about a company's customers, processes and markets.

For instance, a company might study its customer base and identify new niche audiences for its products. That could enable the company to tailor marketing more closely to customer needs. Tools used here include statistical analysis tools, discovery-based data mining tools and visualisation tools.

Defining your data warehouse goals

Companies which build a poorly defined data mart or data warehouse that aren't tied to solving a particular business problem, will result in a poor system at either analysis or operational monitoring.

For example, a multidimensional database (MDD) lets you study the interdependence of many factors (dimensions) that impact how the business is running (by store, by salesperson, by product line, etc). That can help you monitor the business. You first define these dimensions and then summarise the data to fit them.

What if you want insights into what customers buy? That will be difficult to do with an MDD because the data is already summarised in specific ways that define the problem. You could redefine the dimensions.

But summarised data will never really support wide-ranging analysis. You need to analyse the underlying detailed data. A data mining tool lets you study the detailed data from many angles.

Dividing applications into these two categories and seeing which type appeals to senior executives might also help you determine which is more likely to win approval. Some executives will be more excited by an application that promises direct improvements to operations through strengthened control in spending, personnel deployment, inventory control or manufacturing process efficiency. Others will prefer the "deep stuff" that analyses the data and looks for breakthrough understanding and predictions about customer behaviour.

Both categories have the potential for huge returns.

From build to buy

As data warehousing goes mainstream, the role of packaged applications and turnkey offerings will become more pervasive. That has to happen, just as it happened -- and still happens -- in the world of online transaction processing (OLTP).

Given the choice, most companies would rather buy than build. In the early days of transactional processing, custom application development was the only option.

You had to build an application because you couldn't buy one. Common wisdom held that each company had needs so unique that no software package could meet them. The same belief surrounds packaged data warehousing applications.

There is some truth to that notion. Each company's warehouse will have a different mix of data sources from which to pull together and transform data. Query needs also vary and change depending on company needs. But custom development and maintenance are just too expensive for many businesses.

Some companies buy data marts that they hope will provide turnkey solutions. They include the "get started quick" product bundles from vendors such as NCR, Sybase, SAS Institute, along with data marts based on products from Informatica, Sagent, and IBM's Visual Warehouse.

Those data marts give you a technical capability, but usually they require extensive customisation to do something really useful. How much that costs depends on how far you want to go to reflect your business in the data mart; consultants cost a couple of thousand of dollars per day. So users often skimp on customisation and wind up with a data mart with a fairly generic query capability that limits its return on investment (ROI).

What's needed and what vendors are starting to supply is a depth of business content that lets you exploit the data warehousing opportunities for your industry (vertical applications) or business functions (horizontal applications). This content needs to be built in.

For example: various AS/400 independent software vendors have created a wide range of template-based industry-specific data mart systems. Many can draw data directly from OLTP application packages. Along with the templates, vendors also may bundle consulting services, offering experts who know both the vendor's product and the industry.

Other vendors go further and offer packages that are full applications for data warehousing and data mining. IBM offers the Fraud and Abuse Management System, a data mining application for a data mart that helps insurance companies save millions by identifying suspect patterns.

Platinum is notable for its RiskAdvisor, an insurance application that is essentially a robust enterprise data warehouse in a box.

Expect to see smart data warehousing vendors dev-elop and market packaged products that do even more to help a company achieve ROI by improving business performance.

Such packages have to do two things: take the technical burden from the user without limiting scalability or flexi-bility, and supply robust business content. Not everyone needs or wants a packaged data warehouse.

But many small and midsize businesses do, as do departments within larger companies that don't have the resources to build and manage a data warehouse or data mart. They shouldn't have to. A well-crafted package or application template builds in this expertise.