|
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.
DefinitionThere appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate (2):
We shall adopt the surrogate (1) definition throughout this article largely because it is more data model rather than storage model oriented. See Date (1998). An important distinction exists between a surrogate and a primary key, depending on whether the database is a current database or a temporal database. A current database stores only currently valid data, therefore there is a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database; in this case the surrogate may be used as a primary key, resulting in the term surrogate key. However, in a temporal database there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object. Although Hall et alia (1976) say nothing about this, other authors have argued that a surrogate should have the following constraints:
Surrogates in practiceIn a current database, the surrogate key can be the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible that the surrogate key exists in addition to the database-generated uuid, e.g. a HR number for each employee besides the UUID of each employee. A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server "identity column", a PostgreSQL In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, the surrogate key identifies the unique entity in the modelled world; these two keys are not the same. For example, table Staff may contain two rows for "John Smith", one row when he was employed between 1990 and 1999, another row when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows however the primary key will be unique. Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one. A surrogate may also be called a
Some of these terms describe the way of generating new surrogate values rather than the nature of the surrogate concept. Here are some possible candidates for generating surrogates:
Advantages of Surrogate KeysImmutabilitySurrogate keys do not change while the row exists. This has two advantages:
Flexibility for changing requirementsBecause of changing requirements, the attributes that uniquely identify an entity might change. In that case, the attribute(s) initially chosen as the natural key will no longer be a suitable natural key.
In these cases, generally a new attribute must be added to the natural key (e.g. an attribute "original_company"). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change. More generally, in some problem domains it is simply not clear what might be a suitable natural key. Surrogate keys avoid problems from choosing a natural key that later turns out to be incorrect. PerformanceOften surrogate keys are composed of a compact data type, such as a four-byte integer. This allows the database to query faster than it could multiple columns.
CompatibilitySeveral database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate (Java), depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operations and object-to-row mapping. Disadvantages of Surrogate KeysDisassociationBecause the surrogate key is completely unrelated to the data of the row to which it is attached, the key is disassociated from that row. Disassociated keys are unnatural to the application's world, resulting in an additional level of indirection from which to audit. Query OptimizationRelational databases assume a unique index is applied to a table's primary key. The unique index serves two purposes: 1) to enforce entity integrity—primary key data must be unique across rows—and 2) to quickly search for rows queried. Since surrogate keys replace a table's identifying attributes—the natural key—and since the identifying attributes are likely to be those queried, then the query optimizer is forced to perform a full table scan when fulfilling likely queries. The remedy to the full table scan is to apply a (non-unique) index on each of the identifying attributes. However, these additional indexes will take up disk space, slow down inserts, and slow down deletes. NormalizationThe presence of a surrogate key can result in the database administrator forgetting to establish, or accidentally removing, a secondary unique index on the natural key of the table. Without a unique index on the natural key, duplicate rows are likely to appear and are difficult to identify. Business Process ModelingBecause surrogate keys are unnatural, flaws can appear when modeling the business requirements. Business requirements, relying on the natural key, then need to be translated to the surrogate key. Inadvertent DisclosureProprietary information may be leaked if sequential key generators are used. By subtracting a previously generated sequential key from a recently generated sequential key, one could learn the number of rows inserted during that time period. This could expose, for example, the number of transactions or new accounts per period. The solution to the inadvertent disclosure problem is to generate a random primary key. However, a randomly generated primary key must be queried before assigned to prevent duplication and cause an insert rejection. Inadvertent AssumptionsSequentially generated surrogate keys create the illusion that events with a higher primary key value occurred after events with a lower primary key value. This illusion would appear when an event is missed during the normal data entry process and is, instead, inserted after subsequent events were previously inserted. The solution to the inadvertent assumption problem is to generate a random primary key. However, a randomly generated primary key must be queried before assigned to prevent duplication and cause an insert rejection. See alsoReferences
This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL. |
|||||||||||||||||||||||||||||||||||||||
This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.
Mercedes Car
This site monitored by SitePinger.net