Jump to bottom

Null (SQL)

From Wikipedia, the free encyclopedia

Jump to: navigation, search
The Greek lowercase omega (ω) character is used to represent Null in database theory.

Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent Null in database theory. NULL is also an SQL reserved keyword used to identify the Null special marker.

Null has been the focus of controversy and a source of debate because of its associated Three-Valued Logic (3VL), special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL grouping operators. Although special functions and predicates are provided to properly handle Nulls, opponents feel that resolving these issues introduces unnecessary complexity and inconsistency into the relational model of databases.

History

Null was introduced by E. F. Codd as a method of representing missing data in the relational model. Codd later reinforced his requirement that all RDBMS' support Null to indicate missing data in a two-part series published in ComputerWorld magazine.[1][2] Codd also introduced a ternary (three-valued) logic, consisting of the truth values True, False, and Unknown, which is closely tied to the concept of Null. The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.

Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively.[3] Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Null-type values has not gained widespread acceptance.

Three-valued logic (3VL)

Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.[4] The logical result of the expression below, which compares the value 10 to Null, is Unknown:

10 = NULL       -- Results in Unknown

However, certain operations on Null can return values if the value of Null is not relevant to the outcome of the operation. Consider the following example in which the OR statement is evaluated in short-circuited form:

TRUE OR NULL   -- Results in True

In this case, the fact that the value on the right of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the right.

SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL). The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states)"[5]

p AND q p
True False Unknown
q True True False Unknown
False False False False
Unknown Unknown False Unknown
p OR q p
True False Unknown
q True True True True
False True False Unknown
Unknown True Unknown Unknown
p NOT p
True False
False True
Unknown Unknown
p = q p
True False Unknown
q True True False Unknown
False False True Unknown
Unknown Unknown Unknown Unknown

Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates test whether data is, or is not, Null.[6]

Data typing

Null is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific data type.[4] Because of this, it is sometimes mandatory (or desirable) to explicitly convert Nulls to a specific data type. For example, if overloaded functions are supported by the RDBMS, SQL might not be able to automatically resolve to the correct function without knowing the data types of all parameters, including those for which Null is passed.

Data Manipulation Language

SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT, UPDATE, or DELETE DML statements, and are discarded by SELECT queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls.[5] The following simple example demonstrates this fallacy:

SELECT *
FROM t
WHERE i = NULL;

The example query above logically always returns zero rows because the comparison of the i column with Null always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to summarily discard each and every row. (However, in practice, some SQL tools will retrieve rows using a comparison with Null.)

CASE expressions

SQL CASE expressions operate under the same rules as the DML WHERE clause rules for Null. Because it can be evaluated as a series of equality comparison conditions, a simple CASE expression cannot check for the existence of Null directly. A check for Null in a simple CASE expression always results in Unknown, as in the following:

SELECT CASE i WHEN NULL THEN 'Is Null'  -- This will never be returned
              WHEN    0 THEN 'Is Zero'  -- This will be returned when i = 0
              WHEN    1 THEN 'Is One'   -- This will be returned when i = 1
              END
FROM t;

Because the expression i = NULL evaluates to Unknown no matter what value column i contains (even if it contains Null), the string 'Is Null' will never be returned.

A searched CASE expression also returns the first value for which the result of the comparison predicate evaluates to True, including comparisons using the IS NULL and IS NOT NULL comparison predicates. The following example shows how to use a searched CASE expression to properly check for Null:

SELECT CASE WHEN i IS NULL THEN 'Null Result'  -- This will be returned when i is NULL
            WHEN     i = 0 THEN 'Zero'         -- This will be returned when i = 0
            WHEN     i = 1 THEN 'One'          -- This will be returned when i = 1
            END
FROM t;

In the searched CASE expression, the string 'Null Result' is returned for all rows in which i is Null.

Check constraints

The primary place in which SQL three-valued logic intersects with SQL Data Definition Language (DDL) is in the form of check constraints. A check constraint placed on a column operates under a slightly different set of rules than those for the DML WHERE clause. While a DML WHERE clause must evaluate to True for a row, a check constraint must not evaluate to False. This means that a check constraint will succeed if the result of the check is either True or Unknown. The following example table with a check constraint will prohibit any integer values from being inserted into column i, but will allow Null to be inserted since the result of the check will always evaluate to Unknown for Nulls.[7]

CREATE TABLE t (
     i INTEGER,
     CONSTRAINT ck_i CHECK ( i < 0 AND i = 0 AND i > 0 ) );

In order to constrain a column to reject Nulls, the NOT NULL constraint can be applied, as shown in the example below. The NOT NULL constraint is semantically equivalent to a check constraint with an IS NOT NULL predicate.

CREATE TABLE t ( i INTEGER NOT NULL );

Procedural extensions

SQL/PSM (SQL Persistent Stored Modules) defines procedural extensions for SQL, such as the IF statement. However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements and queries. The following code fragment, in ISO SQL standard format, demonstrates the use of Null 3VL in an IF statement.

IF i = NULL THEN
      SELECT 'Result is True'
ELSEIF NOT(i = NULL) THEN
      SELECT 'Result is False'
ELSE
      SELECT 'Result is Unknown';

The IF statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to False or Unknown, the IF statement passes control to the ELSEIF clause, and finally to the ELSE clause. The result of the code above will always be the message 'Result is Unknown' since the comparisons with Null always evaluate to Unknown.