I encounter too many databases in which numeric columns are defined as REAL or FLOAT. These data types are approximate, meaning they can only represent values approximately. They are meant for very large or very small numbers, and they belong in systems that need to express a very wide range of values, such as scientific applications. In most cases, when we need to represent numbers, we actually need exact numbers, such as 15.873 or 98.2: transaction amounts, product sizes or room temperatures.
Using REAL or FLOAT when an exact number is actually required can lead to problems. An application that expects an exact value might behave unexpectedly when given an approximate one. Recently, I was involved in a data migration project, in which I was asked to transfer financial data from the current system to a new system. All the monetary columns in the current database were defined as REAL. Before I continue with the story, let me give you some background about these data types.
When it comes to numeric data types, SQL Server offers two categories of data types: exact and approximate. The exact data types include the integer family (TINYINT, SMALLINT, INT and BIGINT), the DECIMAL and NUMERIC types, and the MONEY and SMALLMONEY types. The approximate data types include the FLOAT and REAL data types.
Let’s put some order in this mess. The integer family data types are quite obvious: they differ only in storage size and in the range of integer values they can hold. There isn’t much more to say about them.
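Just to make the differences concrete, here is a small sketch (the values and variable names are mine) that uses DATALENGTH to show how many bytes each member of the family takes for the same value:

DECLARE
    @Tiny AS TINYINT = 42 ,
    @Small AS SMALLINT = 42 ,
    @Int AS INT = 42 ,
    @Big AS BIGINT = 42;

SELECT
    TinyIntBytes = DATALENGTH (@Tiny) ,   -- 1 byte (0 to 255)
    SmallIntBytes = DATALENGTH (@Small) , -- 2 bytes (-32,768 to 32,767)
    IntBytes = DATALENGTH (@Int) ,        -- 4 bytes (about -2.1 to 2.1 billion)
    BigIntBytes = DATALENGTH (@Big);      -- 8 bytes
GO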
The DECIMAL and NUMERIC types are equivalent. These are synonyms, so let’s talk about DECIMAL from now on. The DECIMAL type takes two parameters: precision (p) and scale (s). The precision determines the maximum total number of decimal digits that can be stored, on both sides of the decimal point. The scale determines how many of those digits can be stored to the right of the decimal point, so 0 <= s <= p. The important thing to note about DECIMAL is that it is exact: it has a fixed number of digits to the left and to the right of the decimal point, and it can represent any decimal value exactly, as long as the value fits within the precision and scale you defined (and within the overall range of the type, -10^38 + 1 to 10^38 – 1).
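For example, here is a quick sketch (my own values) of how precision and scale play together, and of the rounding that happens when a value has more digits to the right of the decimal point than the scale allows:

DECLARE
    @Dec1 AS DECIMAL(7,3) = 1234.567 ,   -- 7 digits in total, 3 of them after the decimal point
    @Dec2 AS DECIMAL(7,3) = 1234.5678;   -- rounded to 1234.568 on assignment

SELECT
    Dec1 = @Dec1 ,
    Dec2 = @Dec2;
GO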
The MONEY and SMALLMONEY types are meant to represent monetary or currency values, but they are essentially exact types with a fixed scale of 4 digits to the right of the decimal point, very much like DECIMAL and NUMERIC, so there isn’t much more to say about them either.
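If you’re curious, here is a tiny sketch (my own values) showing MONEY behaving just like a fixed-scale exact type, side by side with DECIMAL(19,4):

DECLARE
    @Money AS MONEY = 123.456789 ,           -- rounded to 123.4568
    @Decimal AS DECIMAL(19,4) = 123.456789;  -- rounded to 123.4568 as well

SELECT
    MoneyValue = @Money ,
    DecimalValue = @Decimal;
GO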
What I would really like to talk about is the FLOAT and REAL data types. These are approximate data types, and they are implemented based on the IEEE 754 Standard for Floating-Point Arithmetic. Floating point is a mechanism for representing very large or very small numbers. In this mechanism, numbers are represented approximately, to a fixed number of significant digits, and scaled using an exponent. The internal representation of a FLOAT or REAL number consists of three parts: the sign bit (S), the exponent (E) and the fraction (F). The IEEE 754 standard defines several floating-point formats, which differ in the size of the exponent and the fraction parts. SQL Server supports only two of them: single precision and double precision. The following table presents their sizes:
Type | Sign | Exponent | Fraction
Single precision | 1 bit | 8 bits | 23 bits
Double precision | 1 bit | 11 bits | 52 bits
FLOAT lets us choose between the two formats by supplying a parameter that indicates the number of bits used for the mantissa (the stored fraction plus one hidden bit). But SQL Server actually treats any value between 1 and 24 as 24, which means a single precision number, and similarly any value between 25 and 53 is treated as 53, which means a double precision number. So it doesn’t really matter whether you specify FLOAT(25) or FLOAT(37): in both cases you will end up with a double precision number that uses 52 bits for the fraction and 8 bytes overall. The REAL data type is simply a synonym for FLOAT(24), and as you should already know by now, it represents a single precision number, with 23 bits for the fraction and 4 bytes overall. So just like we focused on DECIMAL and ignored NUMERIC, let’s focus on REAL from now on and ignore FLOAT.
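You don’t have to take my word for it. Here is a small sketch (my own variables) that uses DATALENGTH to show the actual storage sizes:

DECLARE
    @Float25 AS FLOAT(25) = 1.234567 ,
    @Float53 AS FLOAT(53) = 1.234567 ,
    @Float24 AS FLOAT(24) = 1.234567 ,
    @Real AS REAL = 1.234567;

SELECT
    Float25Bytes = DATALENGTH (@Float25) , -- 8 bytes (double precision)
    Float53Bytes = DATALENGTH (@Float53) , -- 8 bytes (double precision)
    Float24Bytes = DATALENGTH (@Float24) , -- 4 bytes (single precision)
    RealBytes = DATALENGTH (@Real);        -- 4 bytes (single precision)
GO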
Is it getting too complicated? Let’s look at an example:
DECLARE
    @Real AS REAL = 1.234567;

SELECT
    BinaryRepresentation = CAST (@Real AS BINARY(4));
GO
The binary representation of the REAL number 1.234567 is 0x3F9E064B. Isn’t it great? Let’s analyze this value. The first bit is the sign bit (S), and it determines whether the value is positive (0) or negative (1). The next 8 bits represent the exponent (E), and the remaining 23 bits represent the fraction (F). Without getting into all the bitwise stuff, here is how we can calculate S, E and F:
DECLARE
    @Binary AS BINARY(4) = 0x3F9E064B ,
    @S AS INT ,
    @E AS INT ,
    @F AS INT;

-- Sign bit (S): mask the most significant bit
SET @S = CAST (@Binary AS INT) & 0x80000000;

-- Exponent (E): mask the next 8 bits and shift them 23 places to the right
SET @E = (CAST (@Binary AS INT) & 0x7F800000) / POWER (2,23);

-- Fraction (F): mask the low 23 bits (doubled, so it lines up with the hidden bit used later)
SET @F = (CAST (@Binary AS INT) & 0x007FFFFF) * 2;

SELECT
    S = @S ,
    E = @E ,
    F = @F;
GO
In our case: S = 0, E = 127 and F = 3935382. Now, according to IEEE 754, the formula to calculate the value is: (-1)^S x 2^(E-127) x 1.F, where 1.F means the implicit leading 1 followed by the fraction bits. Here it is in T-SQL:
DECLARE
    @S AS INT = 0 ,
    @E AS INT = 127 ,
    @F AS INT = 3935382;

SELECT
    RealValue = CAST
    (
        CAST (POWER (-1.0 , @S) AS REAL) *                   -- (-1)^S
        CAST (POWER (2.0 , @E - 127) AS REAL) *              -- 2^(E-127)
        CAST ((@F | 0x01000000) AS REAL) / POWER (2 , 24)    -- 1.F (0x01000000 supplies the hidden leading 1)
        AS REAL
    );
GO
Yes, we made it! The value is 1.234567. So far everything is OK. But remember that REAL is an approximate data type? What does this “approximate” mean? Well, here is an example:
DECLARE
    @Real1 AS REAL = 1.23456789 ,
    @Real2 AS REAL = 1.2345678901;

SELECT
    Real1 = @Real1 ,
    Real2 = @Real2;

SELECT
    BinaryReal1 = CAST (@Real1 AS BINARY(4)) ,
    BinaryReal2 = CAST (@Real2 AS BINARY(4));
GO
Both numbers are displayed as 1.234568, and the binary representation of both of them is 0x3F9E0652. These are two different numbers represented by the same approximate value. With REAL, you simply can’t represent the exact original numbers; for that, you’ll have to use DECIMAL.
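Here are the same two numbers again, this time as DECIMAL (the precision and scale are my choice), just to show that both values are kept exactly and remain distinct:

DECLARE
    @Dec1 AS DECIMAL(11,10) = 1.23456789 ,
    @Dec2 AS DECIMAL(11,10) = 1.2345678901;

SELECT
    Dec1 = @Dec1 ,   -- 1.2345678900
    Dec2 = @Dec2;    -- 1.2345678901
GO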
Now, take a look at this one:
DECLARE
    @Real1 AS REAL = 1.2345678 ,
    @Real2 AS REAL = 1.23456789;

SELECT
    Real1 = @Real1 ,
    Real2 = @Real2;

SELECT
    BinaryReal1 = CAST (@Real1 AS BINARY(4)) ,
    BinaryReal2 = CAST (@Real2 AS BINARY(4));
GO
We only changed the original numbers a little bit. Both numbers are still displayed as 1.234568, but this time the binary representations are different: the first is 0x3F9E0651, and the second is 0x3F9E0652. This is problematic, because when you look at the displayed numbers (1.234568), they appear identical, but when you compare them, they are not equal, since the comparison is based on the binary representation:
DECLARE
    @Real1 AS REAL = 1.2345678 ,
    @Real2 AS REAL = 1.23456789;

SELECT
    Real1 = @Real1 ,
    Real2 = @Real2;

SELECT
    BinaryReal1 = CAST (@Real1 AS BINARY(4)) ,
    BinaryReal2 = CAST (@Real2 AS BINARY(4));

IF @Real1 = @Real2
BEGIN
    PRINT N'Numbers are identical';
END
ELSE
BEGIN
    PRINT N'Numbers are not the same';
END;
GO
Now let’s go back to my story. I was asked to transfer data from the current database, in which the monetary columns were defined as REAL, to a new database. The main table was “Billing.Transactions”, and it had two such columns: “OriginalAmount” and “BillingAmount”. One of the migration processes required extracting only the transactions in which the original amount was the same as the billing amount. I used the following simple query:
SELECT
    ...
FROM
    Billing.Transactions
WHERE
    OriginalAmount = BillingAmount;
GO
This query returned about 12,000 rows. The customer told me that it didn’t make sense; he expected this query to return about 19,000 rows. It took me a while to find the problem, but you should be able to guess it by now: there were about 7,000 rows in which the original amount and the billing amount looked the same, but had different binary representations.
How does this happen? Where do these differences come from? Well, these numbers have probably gone through all kinds of calculations, both in the application and in the database, before arriving at their final values; aggregations and rounding operations, for example. Since these are approximate data types, every calculation can introduce a tiny rounding error, so two values that should be identical can end up with slightly different binary representations.
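To see how calculations alone can create such differences, here is a small sketch (not taken from the real system): adding the REAL value 0.1 ten times does not necessarily land exactly on the REAL value 1, because every addition is rounded to the nearest representable value:

DECLARE
    @Sum AS REAL = 0 ,
    @Tenth AS REAL = 0.1 ,
    @One AS REAL = 1 ,
    @i AS INT = 1;

WHILE @i <= 10
BEGIN
    SET @Sum = @Sum + @Tenth;   -- each addition is rounded to the nearest representable REAL
    SET @i = @i + 1;
END;

SELECT
    SumValue = @Sum ,
    OneValue = @One ,
    BinarySum = CAST (@Sum AS BINARY(4)) ,
    BinaryOne = CAST (@One AS BINARY(4)) ,
    AreEqual = CASE WHEN @Sum = @One THEN N'Equal' ELSE N'Not equal' END;
GO

If you run this, you will most likely see that the accumulated sum ends up a tiny bit away from 1, and that the binary representations differ, so the equality test fails. Multiply this effect by thousands of transactions and years of updates, and you get exactly the kind of differences I found in that table.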
The customer couldn’t care less about the binary representation. He expected those 7,000 transactions to be treated just the same as the other 12,000. After verifying with the customer that he was only interested in two digits to the right of the decimal point, I rewrote the query like this:
SELECT
    ...
FROM
    Billing.Transactions
WHERE
    CAST (ROUND (OriginalAmount , 2) AS DECIMAL(19,2))
    =
    CAST (ROUND (BillingAmount , 2) AS DECIMAL(19,2));
GO
This query returned the expected 19,000 rows. You can argue that this query is inefficient, because the expressions in the WHERE clause prevent the optimizer from using any index on these columns. That’s true. This is the price you have to pay when you define columns as REAL instead of DECIMAL to begin with.
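If you’re stuck with REAL columns for a while, one possible mitigation (just a sketch; the column and index names here are hypothetical) is to persist the rounded DECIMAL values as computed columns, so the conversion is done once per row instead of on every query, and the result can even be indexed:

-- Hypothetical computed columns that persist the rounded DECIMAL values
ALTER TABLE Billing.Transactions
    ADD OriginalAmountRounded AS CAST (ROUND (OriginalAmount , 2) AS DECIMAL(19,2)) PERSISTED ,
        BillingAmountRounded AS CAST (ROUND (BillingAmount , 2) AS DECIMAL(19,2)) PERSISTED;

CREATE NONCLUSTERED INDEX IX_Transactions_RoundedAmounts
    ON Billing.Transactions (OriginalAmountRounded , BillingAmountRounded);
GO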
The much more serious problem in this case is the lost 7,000 rows. If the customer hadn’t noticed the missing rows, and we had continued with the original query, it would have caused serious bugs later in the new system, and it would probably have taken much more time and effort to track down the source of the problem.
So here is the bottom line: never use REAL or FLOAT unless you really need them. In 99% of cases, what you really need is DECIMAL, because you actually need to represent exact values.