Relational Databases

Comments

Everyone knows what a simple database is: Telephone directories, mail-order catalogs and dictionaries are all databases of sorts. Databases can be structured or organized in several different ways: as flat files, as hierarchical or networked structures or as related tables. Of these types, relational databases manage most organizations' data.

A database can be described as a table of data consisting of columns and rows, similar to a spreadsheet. Each row contains a single record; each column contains all the instances for each row of one particular piece of data. For example, a typical telephone directory consists of columns holding telephone numbers, subscriber names and subscriber addresses. Each row consists of a number, name and address. This simple form is called a flat file because of its two-dimensional nature and the fact that all data is stored in a single file.

Ideally, every database has at least one column with a unique identifier, or key. Consider the phone book: There may be many entries for J. Smith, but none of the phone numbers will be duplicated. The phone number serves as the key.

In reality, things aren't quite so simple. Two or more people sharing a phone might want a listing for each name; this causes the phone number to appear in two or more places, resulting in multiple rows with keys that aren't unique.

Data Causes Problems

In the simplest databases, every record is complete in a single row, meaning the phone company would need a separate column for every piece of account information. That would require a separate column for a secondary subscriber, another for a tertiary subscriber and so on for as many additional subscribers as might be needed.

This means every record in the database would have to have all the extra columns, even if most would never be used. It also means the database would have to be redesigned and rebuilt every time you rolled out a new service. Add touch-tone service, and you've got to rebuild to add a new column. Add caller ID, call waiting and so on, and you'd have to rebuild again and again.

In the 1960s, only the biggest companies could afford computers to manage their data. Moreover, databases built on static data models and procedural programming languages such as Cobol could be costly to maintain and weren't always reliable. Procedural languages specify sequences of events that the computer should step through to accomplish tasks. Programming such sequences was difficult, especially if you needed to change the structure of the database or cook up a new kind of report.

Powerful Relations

E.F. Codd, a researcher at IBM's San Jose research laboratory, essentially created and defined the relational database concept in his seminal work "A Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, June 1970).

Codd created a model that allowed designers to break down their databases into separate but related tables that made sense for performance while retaining the outward appearance of the original database to end users. Codd has since been considered the father of the relational database.

It works in the following manner: The phone company could build its primary table around a primary key such as telephone number and store it with other basic customer information. The company can define a separate table with columns for that primary key and for value-added services like caller ID and call waiting. It could define yet another table to track billing, with the primary-key phone numbers and billing data in each row.

End users can easily extract the information they want in the form they want it, though it's stored in separate tables. So a phone company customer service representative can retrieve a subscriber's billing information as well as the status of special services or the latest payment received, all on the same screen.

Codd originally specified 12 rules for relational databases, most of which address aspects of data integrity, updates and access, though the first two are quite accessible even to a nontechnical audience.

Rule 1, the information rule, mandates that all information in a relational database be represented as a set of values stored in tables; Rule 2, the guaranteed access rule, mandates that every piece of data in a relational database be accessible by table name, primary key and column name. In other words, all data is stored in tables, and if you know the table, the primary key and the column name of the piece of data you want, you should always be able to retrieve it.

One of the implications of Codd's work is that relational databases use declarative rather than procedural programming languages. Declarative languages such as SQL permit users to, in effect, tell the computer, "I want the following bits of data from all records that fit this set of criteria." The computer then figures out the steps it needs to go through to pull that data from the database.

The vast majority of heavy-duty databases rely on relational database management system software available from vendors such as Oracle Corp., Sybase Inc., IBM and Microsoft Corp.

While most vendors' SQL flavors are interoperable only to a point, SQL makes possible the database-backed Web sites that we increasingly rely on. Easy-to-program interfaces between Web sites and relational databases make it possible for end users to add new records, update existing records and produce reports for diverse services such as conducting online trading and accessing online library catalogs.

A relational database uses multiple tables related to one another by a designated key (here, the PhoneNumber field).