Teradata Scales a Data Mountain

SAN MATEO (02/14/2000) - Windows NT shops that have longed for a highly scalable and reliable data warehouse are finally seeing their wishes fulfilled.

NCR Corp. has ported its Teradata database system to Windows NT, and the result -- a feature-rich product capable of running and managing the very largest data warehouses -- is a smashing success. Teradata for NT (TNT) earned a score of Excellent in our evaluation, thanks in part to its flexible, scalable architecture.

Teradata has provided unprecedented performance on Unix systems for more than a decade. Major organizations all over the world rely on its scalability and performance to run the most complex and aggressive data warehousing solutions ever deployed. Two years ago, NCR began the process of porting Teradata from Unix to Windows NT as part of its "Open Systems" strategy. Consequently, NCR now provides an excellent solution for businesses that want to use Windows NT with true scalability and performance for their data warehousing solutions without the high cost of a Unix-based Teradata environment.

Hello BYNET

Teradata is a relational database system that provides decision-support capabilities for organizations that need to store and analyze gigabytes and even terabytes of data. The heart of TNT is the way it brings MPP (massively parallel processing) and SMP (symmetric multiprocessing) technologies together.

TNT uses the same BYNET architecture that NCR created for the Unix version, porting BYNET from its native MP-RAS (NCR's Unix distribution) environment. The BYNET architecture loosely couples up to four Windows NT SMP systems, or nodes, into a single, logical database system.

Each of the NT nodes runs two types of virtual processors: PEs (parsing engines) and AMPs (access module processors). PEs process the SQL statements from clients and manage the sessions with the AMPs. AMPs manage the database access and provide the database parallelism.

For example, a client application executes a standard ANSI SQL SELECT statement used to retrieve a results set from the Teradata database. The request is received and processed by the PEs. They parse the SQL statements, optimize the query plans, and send the requests to the AMPs through BYNET. Each AMP processes its portion of the request and sends the results back to the PEs. The PEs combine the results and return the answer set to the client. The result to the business is incredibly fast information retrieval, even with the most complex queries.
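The request flow described above is a scatter/gather pattern, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NCR's implementation: the names `AMP_ROWS`, `amp_scan`, and `parsing_engine` and the sample rows are hypothetical, and a thread pool stands in for BYNET messaging.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "AMP" owns a disjoint slice of the table's rows.
AMP_ROWS = [
    [("east", 120), ("west", 80)],
    [("east", 45), ("north", 60)],
    [("west", 30), ("north", 15)],
]


def amp_scan(rows, region):
    """One AMP filters and partially aggregates only the rows it owns."""
    return sum(amount for row_region, amount in rows if row_region == region)


def parsing_engine(region):
    """The PE fans the request out to every AMP, then merges the partial results."""
    with ThreadPoolExecutor(max_workers=len(AMP_ROWS)) as pool:
        partials = list(pool.map(lambda rows: amp_scan(rows, region), AMP_ROWS))
    return sum(partials)


if __name__ == "__main__":
    # Roughly equivalent to: SELECT SUM(amount) FROM sales WHERE region = 'east'
    print(parsing_engine("east"))
```

Because no AMP needs to see another AMP's rows, adding AMPs shrinks each one's share of the scan, which is the property the architecture relies on.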

I performed my evaluation using an NCR WorldMark 4800 system configured with two NT nodes. The whole system comprised eight Pentium III Xeon 500-MHz CPUs, 2GB of RAM, and eighty 9GB disk drives. Teradata currently supports up to four CPUs in an NT node. This is fewer than the eight supported by the OEM version of NT, but it still provides performance equivalent to that of a comparably configured Unix system.

I began the evaluation by testing the Database Window on the administrative workstation. I was able to find the 10 AMPs and two PEs configured for each node, but Database Window's displays were character-based and more consistent with results I would expect from a Unix system. Fortunately, NCR configures all customer systems before shipping them and also performs all upgrades required by its customers. Therefore, I felt that getting to know this application in detail would not necessarily be a top priority for systems administrators.

I found Teradata Manager, which I ran on the desktop, to be a much better administration tool than the Database Window. It provided a friendly Windows GUI for performance monitoring, alert management, graphical analysis, reporting, and system configuration.

NCR has packed a lot of features into Teradata, so new users should not expect to master the product immediately. I recommend new customers take advantage of the education credits provided by NCR as part of the license cost.

Documentation is nicely presented on a single CD. It was extremely useful and designed specifically for use with Adobe Acrobat 4.0, which is also provided.

Getting to data

I spent a lot of my time using two of Teradata's SQL tools: the WinDDI (Windows Data Dictionary Interface), a GUI for performing SQL DDL (Data Definition Language) commands to manage database objects and users; and Queryman, which I used to enter my SQL query commands. I really began to understand the features and benefits of Teradata while using these utilities.

I created a new table using the CREATE TABLE command in WinDDI. I paid special attention to how I entered the table into the system; with other RDBMSes, performance can be greatly enhanced through careful planning of the physical placement of tables in the database. With Teradata, however, the only syntax that identified where the table would be stored was PRIMARY INDEX (fieldname); no segmentation or paging extensions were required.

Teradata applies a hashing algorithm to the value of fieldname to determine how the table's rows are placed across the AMPs. Table placement was taken care of for me as the administrator, making this task incredibly simple.
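To make the idea of hash-based placement concrete, here is a minimal Python sketch. It is not Teradata's actual hashing algorithm, which the review does not describe: SHA-256 is used purely as a stand-in, the function name `amp_for` and the customer IDs are hypothetical, and the AMP count of 20 assumes the 10 AMPs per node seen on the two-node test system.

```python
import hashlib
from collections import Counter

NUM_AMPS = 20  # assumption: 10 AMPs on each of the two test nodes


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number using a stable hash.

    SHA-256 is a stand-in here; it is not the hash Teradata uses.
    """
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


# Place 10,000 hypothetical customer IDs and count the rows landing on each AMP.
placement = Counter(amp_for(customer_id) for customer_id in range(10_000))

if __name__ == "__main__":
    print("AMPs used:", len(placement))
    print("fewest rows on an AMP:", min(placement.values()))
    print("most rows on an AMP:", max(placement.values()))
```

The same value always hashes to the same AMP, so a lookup by primary index can go straight to one AMP, while a well-chosen index column spreads rows roughly evenly without any placement work by the administrator.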

Organizations that adopt TNT will find such simplified administration fuels faster physical implementations of the data warehouse.

Adding users to Teradata was initially somewhat confusing. I had expected WinDDI to display administrative data by database, in much the same way you would expect things to appear with Microsoft SQL Server. Instead, it displays users.

Users are created using the CREATE USER statement. To access the user's tables, you reference the user's name as you would a database followed by the table name. With tables belonging to users, Teradata manages the table contents across the AMPs without being restricted by a database structure. The user's definition is used to identify table ownership and the maximum space allotted to the application that accesses them. Administrators will appreciate Teradata's simplicity here once it becomes familiar.

I used Queryman to enter SQL query commands, and one window proved to be extremely valuable. It was a list of all queries previously run with full statistics available at the click of the mouse. This was very useful during my evaluation because I frequently selected queries from this view for reuse.

Another impressive feature was the capability of running concurrent queries in the same window. I used this several times to compare results between a single query request and multiple query requests. As an administrator, I find it extremely helpful to run my queries in parallel; this allows me to maximize the resources of the database and get my work done more quickly.

I had no problems entering any of the standard ANSI SQL commands. However, there is no stored procedure language; NCR says it expects to include one in the next release. Stored procedures are typically created to get the database server to perform much of the processing that otherwise occurs in the client application. Teradata doesn't require this capability because the PEs and AMPs are responsible for the query processing; the client only has to issue the command. Overall, the lack of a stored procedure language does not strike me as a significant oversight. Teradata does come with a macro language that resembles a procedure language, but it is limited in functionality.

Using derived tables in complex queries was very useful. This was accomplished by entering a SELECT statement in the FROM clause instead of having to use temporary tables. This should help developers reduce the time they spend building queries. Teradata also includes complex RANK and QUANTILE functions that greatly enhance developers' abilities to write queries that perform complex calculations.
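For readers unfamiliar with derived tables, the sketch below shows the general shape of such a query. It runs against SQLite through Python's standard `sqlite3` module only so that it is self-contained; the `sales` table, its columns, and its rows are hypothetical, and Teradata's own syntax and optimizer behavior may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 120), ('east', 45), ('west', 80), ('west', 30), ('north', 60);
""")

# The inner SELECT in the FROM clause is the derived table. It exists only
# for the duration of this query, so no temporary table has to be created.
query = """
    SELECT t.region, t.total
    FROM (SELECT region, SUM(amount) AS total
          FROM sales
          GROUP BY region) AS t
    WHERE t.total > 100
    ORDER BY t.total DESC
"""
rows = conn.execute(query).fetchall()

if __name__ == "__main__":
    for region, total in rows:
        print(region, total)
```

Without a derived table, the per-region totals would first have to be written to a temporary table and then queried in a second statement.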

Finally, I generated some reports using Cognos Impromptu and Seagate Info. Both products accessed Teradata immediately using ODBC.

NCR is working closely with Cognos and several other companies to help them build functions into their products that take advantage of Teradata's high performance. The goal is to enhance query performance by using Teradata to process the results instead of the workstation. I ran some reports with Cognos using functions that were optimized and they returned in minutes. Reports without the optimized functions required a coffee and donut run. I should add that the functions being tested were very complex queries.

If you're going to deploy Teradata, you'll want to identify the products you intend to use with it, because NCR is working with a long list of third-party vendors to help them take advantage of Teradata's power. If you don't use one of the optimized solutions, you won't get the benefits of this collaboration.

I didn't have a chance to execute client applications, but Teradata supports CLI (call level interface), API, and ODBC interfaces. The company also provides the CLI interface to mainframe platforms, enabling mainframe applications to directly update the Teradata database.

If you're still reluctant to consider Windows NT for your enterprise applications, Teradata for NT may be the product that makes you reconsider your position. Teradata's features and scalability finally make NT a suitable operating system for world-class enterprise data warehousing solutions. The rich tools and utilities provide everything you need to integrate and manage a data warehouse within a business's existing environment.

I give NCR Teradata NT an excellent overall score with only the slightest hesitation on its lack of a stored procedure language. As Windows supports more processors, I hope NCR will continue to enhance the CPU support on the SMP nodes, further improving scalability. Stored procedures and implementation on Windows 2000 are planned in the next release in June 2000. This product will only get better.

Allan Holbrook is a senior systems engineer for Holbrook Consulting Inc. He can be reached at allan@servillian.com.

MPP VS. SMP: Two architectures

To fully understand how NCR implements Teradata NT, one should understand the differences between MPP (massively parallel processing) and SMP (symmetric multiprocessing) technologies.

SMP shares disk storage with all of the CPUs in the system. This is referred to as tightly coupled architecture. Windows NT Enterprise systems implement SMP technology and support up to four CPUs in a single SMP node for Teradata NT.

SMP systems are very scalable up to a certain number of processors. Once that threshold is reached, the overhead to manage them becomes greater than the benefits of adding another CPU.

The number of processors used depends on the speed of the processors. For example, when SMP technology used Intel 486 CPUs, the maximum number of processors was eight; now with the Pentium III Xeon 500-MHz processor, the maximum is about 32.

MPP architecture configures two or more SMP systems. The SMP systems are loosely coupled in that they have their own separate disk storage.

MPP systems are unlimited in their scalability. As SMP nodes are added, the overhead remains the same. Scalability is linear. In fact, some Teradata customers have MPP systems comprising more than 150 CPUs.

NCR Teradata 3.0.1 NT combines MPP and SMP to take advantage of both architectures. Windows NT SMP nodes running Teradata have been certified for up to four CPUs, and four SMP nodes can be connected using MPP.

This results in the scalability of NT to 16 processors at this time. The next release of Teradata NT, scheduled for June, will support up to 16 SMP nodes for scalability to 64 CPUs.
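The arithmetic behind these figures is simply nodes multiplied by CPUs per node. The short Python sketch below spells it out; the function name `max_cpus` is hypothetical, and the four-CPU default reflects the per-node limit quoted above.

```python
def max_cpus(smp_nodes, cpus_per_node=4):
    """Total CPUs in an MPP system built from identical SMP nodes."""
    return smp_nodes * cpus_per_node


current_release = max_cpus(4)   # four SMP nodes of four CPUs each
next_release = max_cpus(16)     # sixteen SMP nodes of four CPUs each

if __name__ == "__main__":
    print(current_release, next_release)
```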

THE BOTTOM LINE: EXCELLENT

Teradata 3.0.1 for Windows NT

Business Case: Teradata for Windows NT is a relational database system that provides decision-support capabilities along with high-end performance and scalability.

Technology Case: Having ported this product from the Unix MP-RAS environment, NCR has provided a level of parallelism that no other RDBMS can claim.

Furthermore, Teradata's database architecture simplifies administration.

Pros:

+ Excellent system scalability

+ Simplified database administration and development

+ Helpful client utilities

Cons:

- Lack of a stored procedure language for the RDBMS database

- Vendor assistance required for system setup

- Some character-based utilities

Cost: $119,000 for software and services; $225,000 including an NCR WorldMark 4400

Platform(s): Windows NT 4.0 Server or Enterprise Edition with Service Pack 5

NCR Corp., Dayton, Ohio; (800) 225-5627; www.ncr.com.
