Suppose you’re writing a query to find all
the invoices that were written on January 6, 2003. You know from the control
totals that 122 invoices were written that day. But when you run this query:
SELECT * FROM Invoices
WHERE InvoiceDate = '2003-01-06'
the result set is empty. What’s going on?
How dates and times are stored
in SQL Server
Before you can effectively query date/time
(or temporal) data, you have to know something about how date/time
values are stored. SQL Server supports two date/time data types: datetime and
smalldatetime. The difference between the two is the amount of storage used.
Datetime uses 8 bytes of storage, while smalldatetime uses only 4 bytes. For
this reason, datetime can represent date/time values within a wider range and
with more precision than smalldatetime. These differences are summarized in
the table below.
Type
|
Minimum
|
Maximum
|
Precision
|
datetime
|
Jan 1, 1753
midnight
|
Dec 31, 9999
23:59:59.997
(0.003 seconds until midnight)
|
To the nearest
3.33 milliseconds
|
smalldatetime
|
Jan 1, 1900
midnight
|
Jun 6, 2079
23:59
(1 minute until midnight)
|
To the nearest
minute
|
Both datetime and smalldatetime represent the
date and time as a value that’s equal to the number of days in relationship
to a base date. In SQL Server, that base date is midnight on January 1, 1900.
As you can see in the table, the smalldatetime type can only represent dates
from this base date on. In contrast, the datetime type can also represent
dates that are before January 1, 1900. To do that, it stores those values as
negative numbers.
To visualize how date/time values are stored,
you can think of them as consisting of two parts. The integer portion
represents the number of whole days since January 1, 1900. The fractional
portion represents the fraction of a day that’s passed since midnight. For
example, the date/time value representing noon on January 4, 1900 is stored as
3.5. In this case, 3 represents three full days since the base date and 0.5
represents one half of a day between midnight and noon. To see this, submit
the following query:
SELECT CAST(CAST('1900-01-04 12:00' AS datetime) AS float)
Note: The CAST function explicitly
changes the data type of a value as specified. So in this statement, the inner
CAST changes the string literal '1900-01-04 12:00' to a value of data type datetime.
Then, the outer CAST changes that datetime value to a value of data type float.
The final result is a floating-point representation of the datetime value that
represents noon on January 4, 1900.
So far, so good. But the problems that crop
up in querying date/time data are caused by confusion over two fundamental
facts that aren’t so obvious. First, date/time data types are approximate
numerics, not exact numerics. Second, date/time data types can’t store a
date without a time or a time without a date.
Date/Time
values are approximate numerics
Datetime and smalldatetime are like the
floating-point data types, float and real, in that they’re approximate
numerics. That means the value retrieved from SQL Server may be different from
the value that was originally stored. For example, if you store the expression
10/3.0 in a column of data type float, you’ll retrieve a value
3.3333330000000001. Although this is a reasonable representation of ten
thirds, it’s not exact since it’s rounded past the 6th digit.
In fact, if you add three such values together, you get 9.9999990000000007,
not 10. Of course, most programmers understand this as a rounding error. And
it’s a persistent problem for all digital computers, not just those running
SQL Server. Still, you need to be aware of it as you code search conditions.
In contrast, when working with exact numeric
data, the value retrieved from SQL Server is exactly the value that was
originally stored. For example, if you store 10/3.0 in a column of data type
int, it’s stored as 3 and retrieved as 3. In this case, SQL Server
implicitly casts the result of the expression as a real value, 3.333333. Then,
SQL Server implicitly casts 3.333333 as an integer because it’s being stored
in a column of type int. Although this is still a rounding error, it occurs
before the value is stored, not as a result of the physical limitations of
computer storage. In other words, the error was introduced by using the wrong
data type, not by the inherent limitation of the data type itself. Since the
system always returns the same value as was stored, the data type is exact.
Now, to see how this affects date/time
values, consider the date and time value for 8:00AM on January 4, 1900. As you
saw above, noon on this day is stored as 3.5, or halfway through the fourth
day. In contrast, 8:00AM is one third of the way through the day, so its
representation will be approximate. To see this for yourself, submit the
following query:
SELECT CAST(CAST('1900-01-04 08:00' AS
datetime) AS float)
You’ll get the following result:
3.3333333333333335
But if you submit this query:
SELECT CAST(3.3333333 AS datetime),
CAST(3.3333334 AS datetime)
you’ll get the following results:
1900-01-04 07:59:59.997
1900-01-04 08:00:00.003
As you can see, these three values are all
quite close. In fact, they’re close enough to be considered 8:00AM for most
applications. However, in a search condition based on a single value, such as:
WHERE (DTValue = '1900-01-04 08:00')
you’d only match those rows where the
stored value exactly matches 3.3333333333333335. You’ll see how to get
around this later in this article.
Dates
without times and times without dates
SQL Server doesn’t provide data types for
storing just the date or just the time. So if you store a date/time value
without an explicit time, the fractional portion of the value is set to zero.
This represents midnight as 00:00:00. Similarly, if you store a date/time
value without an explicit date, the integer portion of the value is set to
zero. This represents the date January 1, 1900. To see this, submit the
following query:
SELECT CAST('1900-01-04' AS datetime),
CAST('10:00' AS datetime)
which returns the following result:
1900-01-04 00:00:00.000
1900-01-01 10:00:00.000
Whether you can ignore the date or the time
component when you query a date/time column depends on how the column has been
designed and used.
The
effect of database design on querying
Database designers don’t always use
date/time columns appropriately. At the time the database is designed, each
date/time column should be identified as to whether it will store both dates
and times, dates only, or times only. The designer, by using defaults,
constraints, and triggers, can enforce these rules to prevent the accidental
storage of data that are either unnecessary or not applicable.
For example, a column in an accounts payable
system for the date an invoice is received is unlikely to need the time. In
that case, the designer should plan to use the column solely for dates and
never store the time component. A trigger could be assigned to prevent the
non-integer portion of the date value from being stored when updating or
inserting.
Generally, however, the programmer is forced
to work with an existing database. In this case, you should examine the way in
which the date/time values are being used before you assume the designer did
his or her job correctly.
The simplest way to do that is to submit a
query using a search condition similar to the following, where DT is the
date/time column in question:
WHERE CAST(FLOOR(CAST(DT AS float))AS datetime) = 0 OR
DT - CAST(FLOOR(CAST(DT AS float))AS datetime)
= 0
Note: The FLOOR function returns
the largest integer that is less than or equal to the specified value. In this
expression, FLOOR is applied to the floating-point representation of the DT column.
This simply strips off the fractional portion of the number.
The first expression returns the date
(integer) portion of the value, while the second returns the time portion. If
this query returns no rows, it’s likely that the column has been used
consistently to store both dates and times, since the date is never 0 and the
time is never 0.
Keep in mind, of course, that if the above
query returns rows, it doesn’t necessarily imply that the column has been
used inconsistently. If the time happens to be exactly midnight or the date
happens to be January 1, 1900, then it’ll show up in the result set. In that
case, you can test for columns with time-only or date-only data by using
either of these two queries:
WHERE TOnly <> Tonly - (CAST(FLOOR(CAST(TOnly
AS float))AS datetime))
WHERE DOnly <> CAST(FLOOR(CAST(DOnly AS
float))AS datetime)
Here, TOnly and DOnly are date/time columns
that you expect contain only times or dates, respectively. If the query
returns rows, then those rows don’t contain the type of data you expected.
Determining what kind of data are stored in
the date/time columns of each table is important for intelligent querying. If
the columns are used consistently, then your job is easier. However, even if
the columns are used inconsistently, you’ll at least know which query
pitfalls to watch out for as you code your queries.
Performance
considerations in querying
A search based on an indexed column completes
faster than a search based on a non-indexed column. So date/time columns that
are searched frequently should be indexed. Be aware, though, that if you then
use a function in the search condition, the index can’t be used in the same
way, which slows performance. For searches that are executed thousands of
times a day on a production database, this can cause significant performance
problems. For this reason, you should avoid using functions in such search
conditions whenever possible. As you’ll see in the examples that follow,
this sometimes results in solutions that are less flexible than those that use
functions.
In addition, keep in mind that some
applications require that you search for portions of a date/time column. The
portion could be date only, time only, or even a smaller portion, such as a
year or hour. In that case, it may improve performance to split a single
date/time column into two or more separate columns, and then index those that
are searched most often.
How
to search by date
Frequently. you’ll need to search a
date/time column for a specific date, regardless of time. If the data in the
column have been used consistently with the time component set to zero,
that’s no problem. You just search for the date you’re looking for.
But consider the following table, called
DateSample:
ID DateVal
-- -----------------------
1 2001-02-28 10:00:00.000
2 2002-02-28 13:58:32.823
3 2002-02-29 00:00:00.000
4 2002-02-28 00:00:00.000
As you can see, the DateVal column is used
inconsistently. The third and fourth values indicate that the column might
have been intended to store dates only, but the first two values indicate that
this wasn’t enforced.
As a result, if you use the following query
to retrieve rows with the date February 28, 2002:
SELECT * FROM DateSample
WHERE DateVal = '2002-02-28'
the result set includes only row 4 instead of
both rows 2 and 4. That’s because the date literal is implicitly cast as a
datetime value which, in this case, has a zero time component. Since this
doesn’t exactly match the value in row 2, that row isn’t returned.
How can you get around the time component? If
the query is run often, you should base the search on a range of values, as
in:
SELECT * FROM DateSample
WHERE DateVal BETWEEN '2002-02-28' AND '2002-02-28 23:59:59.997'
Remember that the BETWEEN clause retrieves
values that are equal to the upper and lower limits, so you can’t code the
upper limit as just '2002-02-29'. If you do, then you’ll incorrectly
retrieve row 3. Another way to get the same result is to use comparison
operators:
SELECT * FROM DateSample
WHERE DateVal >= '2002-02-28' AND DateVal < '2002-02-29'
If the query is run infrequently (to produce
a report only once a month, for instance), you can code an expression in the
WHERE clause that strips the date/time value of its fractional component. For
example, this query:
SELECT * FROM DateSample
WHERE CAST(FLOOR(CAST(DateVal AS float)) AS datetime) = '2002-02-28'
returns both rows 2 and 4. In addition, there
are many other expressions that you can use to accomplish this same result (my
SQL book, Murach’s SQL for SQL Server, covers a couple of others).
By the way, if you wished to retrieve rows
with the day February 28, regardless of year, you could code the following
query:
SELECT * FROM DateSample
WHERE MONTH(DateVal) = 2 AND DAY(DateVal) = 28
which retrieves rows 1, 2, and 4. Since there
isn’t a way to accomplish this without using one or more functions, however,
this query shouldn’t be run frequently against a production database. If you
need to perform this kind of search on a query that runs often, you should
change the design of the database, if possible. Then, you can create a
separate, indexed column to store the portion of the date/time value that you
need to search.
How
to search by time
Searching a column for a specific time,
regardless of date, is similar to searching for date-only values. If the
column consistently stores just the time portion, then searching the data is
simplified. However, unlike date values, the time value is represented by an
approximate numeric. So even when the date portion can be ignored, you must
still consider rounding errors.
To illustrate time-only searches, consider
following table, called TimeSample:
ID TimeVal
-- -----------------------
1 2002-02-28 10:00:00.000
2 1900-01-01 13:58:32.823
3 1900-01-01 09:59:59.997
4 1900-01-01 10:00:00.000
Here, the TimeVal column is used
inconsistently, sometimes storing time only and sometimes storing both date
and time. So if you use the following query to retrieve rows with the time
10:00AM:
SELECT * FROM TimeSample
WHERE TimeVal = '10:00:00'
you only get row 4. Row 1 isn’t retrieved
because the date literal is implicitly cast as a datetime value with a zero
date component, which doesn’t match the date component of row 1. In
addition, row 3 isn’t retrieved because this value is close to, but not
equal to, 10:00AM.
To ignore the date component of a column, you
can code an expression that strips the date/time value of its integer
component, such as:
SELECT * FROM TimeSample
WHERE TimeVal - CAST(FLOOR(CAST(TimeVal AS float)) AS datetime) = '10:00'
which returns rows 1 and 4. Unfortunately,
there’s no way to accomplish this without using one or more functions. For
this reason, it’s critical to store time-only data correctly in the first
place. If you need to do this kind of search often, you should, if possible,
change the database design.
To search for time values that are
approximately equal is simply a matter of coding a range of values. If the
time-only data are stored consistently without the date component, you can use
a query like this:
SELECT * FROM TimeSample
WHERE TimeVal BETWEEN '09:59' AND '10:01'
or
SELECT * FROM TimeSample
WHERE TimeVal > '09:59' AND TimeVal < '10:01'
Both of these queries return rows 3 and 4. Of
course, you have to decide what literal values to use for the range of
approximation that you prefer.
If the time-only data are stored
inconsistently, then you need to accommodate both non-zero date components and
a range of time values. For instance, you can use a query like this:
SELECT * FROM TimeSample
WHERE TimeVal - CAST(FLOOR(CAST(TimeVal AS float)) AS datetime) > '09:59'
AND TimeVal - CAST(FLOOR(CAST(TimeVal AS float)) AS datetime) <
'10:01'
which returns rows 1, 3, and 4. Again,
though, there’s no way to accomplish this without the use of a function. So
you may need to change the database design, if you can.
One other way to approximate time values is
to use the smalldatetime data type rather than the datetime data type in the
original table. Since smalldatetime always rounds the time portion to the
nearest minute, times in the range from 09:59:29.999 to 10:00:29.998 are
stored as 10:00. If approximation to the nearest whole minute is sufficient
for your application, then using smalldatetime will prevent the need to search
for a range of values.
Bryan Syverson, author of Murach’s
SQL for SQL Server, has worked with SQL as an application programmer,
a SQL programmer, and a database developer in a variety of environments, including
the health care industry, management consultancies, and not-for-profit organizations.
In the process, he’s realized that most programmers could do their jobs more
effectively if they knew more about SQL. So his goal in Murach’s SQL is
to provide that knowledge in an easy-to-access style for both beginning and experienced
SQL users.
|