About myself
My technical
skills
My favorites
My picture album
Shortcut keys
My code library
VB resources
SQLServer
resources
SQLServer books
Replication FAQ
Scripting
resources
ASP resources
Search my site
Sign my
guestbook
Contact
information
SQL Server Articles New
Featured Book:
|
|
NEW!!! Subscribe to my
newsletter:
Want to keep in touch with the
latest in SQL Server world? Email vyaskn@hotmail.com with
'subscribe' in the subject line |
How to perform case sensitive searches in SQL Server?
Note: Information & code samples from this article are tested on SQL Server 2005 RTM (Yukon) and found to be working. Will update the article in case of any compatibility issues. |
This is another one of those questions, that gets asked at least once a day in the Microsoft SQL Server programming newsgroups. Over the years, different programmers have come up with different solutions, and I'll be presenting some of those in this article.
A default SQL Server installation is case insensitive, which means that SQL Server will not differentiate between upper and lower case characters/letters. That is, "Oracle" is the same as "ORACLE" or "oracle". Let's see what controls this case sensitiveness of SQL Server. In SQL Server 6.5 and 7.0, you can select a 'sort order', while installing SQL Server. Sort order defines how the characters supported by the code page are compared and evaluated. The default sort order is 'case insensitive'. If you want SQL Server to perform case sensitive searches, you need to select binary sort order while installing SQL Server (6.5 or 7.0). In these versions (SQL Server 6.5 and 7.0), sort order is defined at the server level, and cannot be manipulated at the database level. If you want to change the sort order of an existing SQL Server 6.5 or 7.0 instance, you must rebuild the master database using rebuildm utility. Not a pleasant thing to do.
Terminology has changed, and now a sort order is called 'collation' in SQL Server 2000. Also ANSI SQL:99 uses the word 'collation'. By default SQL Server 2000 gets installed with case insensitive collation. You must change the collation of the server while installing, if you want case sensitiveness. But the good news is, SQL Server 2000 lets you specify collation at the database as well as column level. That means, you can have a default SQL Server installation with case insensitive collation, but a specific database or a set of columns can have case sensitive collation.
Lets see why one would want to perform case sensitive searches. Classic example is, password comparisons. People use a combination of upper and lower case (mixed case) characters in their passwords, just to make the passwords difficult to guess. But there is no point in doing that, if the database disregards the case. To keep the users happy, and their passwords secure, programmers prefer case sensitive comparisons in this case.
Here is another example: Some people tend to get the MOST out of their database. Yes, from a performance point of view, a binary sort order or case sensitive collation will perform faster, but only at the expense of the usability. Think about a user searching your table for the phrase "fish and chips", and not being able to find a match, just because, that phrase is actually stored in the database as "Fish and Chips". If this happens to be a database serving an online catalog, you will lose some sales as well. One way of avoiding this problem would be, to store everything in lower case, and covert user input to lower case before comparing with the table data. Too much hassle, isn't it? I wouldn't recommend this approach. Not worth the performance you gain.
Here are some methods you can employ for case sensitive searching in SQL Server:
All the examples in this article, are based on the following table and data. Requirement is to implement case sensitive search for CustID and CustPassword columns:
CREATE TABLE dbo.Customers
(
CustID char(8) NOT NULL PRIMARY KEY CLUSTERED,
CustPassword varchar(15) NOT NULL,
CustName varchar(25) NOT NULL,
CustEmail varchar(60) NOT NULL,
CustAddress varchar(100) NOT NULL
)
GO
INSERT INTO dbo.Customers VALUES('USA00001', 'TheUnbreakable', 'John', 'John.Smith@SomeDom.com', '10932,Bigge Rd., Menlo Park, CA')
INSERT INTO dbo.Customers VALUES('USA00002', 'ABCxyz012789', 'Heather', 'HGraham@SomeDom.com', '18 Broadway Av. San Francisco, CA')
INSERT INTO dbo.Customers VALUES('EUR00001', 'MagicNSparkle', 'Vyas', 'VyasKN@SomeDom.com', '18 Grove Mead, Herts, AL09 7JK, UK')
INSERT INTO dbo.Customers VALUES('ASA00001', 'aAbBcCXyZ', 'Franklin', 'FMayer@SomeDom.com', '22 Main Street, Singapore')
INSERT INTO dbo.Customers VALUES('OZ000001', 'DOWNundah', 'Steve', 'SWaugh@SomeDom.com', '321, Cricket Way, Melbourne, Australia')
--NOTE: I omitted the VALUES list in the above INSERT statements, for simplicity. --This is not a best practice, and you should always include a VALUES list in your INSERT statements.
Consider the following example, that searches the Customers table for a given Customer ID and password. On a server/database with case insensitive collation, this query will disregard case, and will find a row, as long as the values match, no matter what case is used in the input. Since the @CustID 'usa00001' matches a CustID of 'USA00001' and @CustPassword 'theunbreakable' matches the CustPassword of 'TheUnbreakable', the following query prints 'Customer Found!':
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CustID = @CustID
AND CustPassword = @CustPassword
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
Now our goal is to write queries that perform case sensitive comparisons. I will introduce you to some popular techniques, and show you how to make those methods efficient. It is important to note that, if and when you migrate the following queries onto a case sensitive database, it is better to get rid of the additional bits from the WHERE clause, and compare normally.
Method 1: Converting data to binary type before comparison (Works in both SQL Server 7.0 and 2000)
|
When you convert a character to binary or varbinary datatype, that character's ASCII value gets represented in binary. Since, 'A' and 'a' have different ASCII values, when you convert them to binary, the binary representations of these values don't match, and hence the case sensitive behavior.
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CAST(CustID AS varbinary(8)) = CAST(@CustID AS varbinary(8))
AND CAST(CustPassword AS varbinary(15)) = CAST(@CustPassword AS varbinary(15))
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
The above example will print 'Invalid Customer ID or Password!', because the input provided is in all lower case, while the data in the table is stored in mixed case. Everything looks good. But if you observe the execution plan for this query (By pressing Ctrl + K in Query Analyzer, or by running SET SHOWPLAN_TEXT ON command), you will see an 'index scan'. An index scan is no good from performance point of view, as it means, scanning the whole index -- which is as bad as a table scan.
If you modify the above query as shown below, you will see an 'index seek', which is faster. You will realize this performance difference easily, on a huge table.
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CAST(CustID AS varbinary(8)) = CAST(@CustID AS varbinary(8))
AND CAST(CustPassword AS varbinary(15)) = CAST(@CustPassword AS varbinary(15))
AND CustID = @CustID
AND CustPassword = @CustPassword
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
Curious, why the first query scanned the index? Here's why: When a column used in the WHERE clause is enclosed within a function (in this case CAST), the Query Optimizer (QP) cannot predict the outcome of the function in advance, and hence it has to scan the whole index and see if there's a match. So, to avoid the problem we added "AND CustID = @CustID AND CustPassword = @CustPassword" to the WHERE clause, and Optimiser made use of the clustered index.
|
Method 2: Using the COLLATE clause to dictate the case sensitiveness of the query (Works only in SQL Server 2000)
|
The COLLATE clause allows us to specify a particular collation for an expression. In the following examples, we will use COLLATE to make our search case sensitive. We need to specify a case sensitive collation along with COLLATE. The following example uses the collation SQL_Latin1_General_CP1_CS_AS. If you are working with non-English data, choose an appropriate collation.
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CustID = @CustID COLLATE SQL_Latin1_General_CP1_CS_AS
AND CustPassword = @CustPassword COLLATE SQL_Latin1_General_CP1_CS_AS
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
The above example will print 'Invalid Customer ID or Password!', because the input provided is in all lower case, while the data in the table is stored in mixed case. But again, the execution plan shows an index scan. We can turn this into an index seek, by following the same trick, as in Method 1: So, let's rewrite the query:
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CustID = @CustID COLLATE SQL_Latin1_General_CP1_CS_AS
AND CustPassword = @CustPassword COLLATE SQL_Latin1_General_CP1_CS_AS
AND CustID = @CustID
AND CustPassword = @CustPassword
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
|
Method 3: Using BINARY_CHECKSUM function (Works only in SQL Server 2000)
|
I've seen many people using BINARY_CHECKSUM function to perform case sensitive searches. BINARY_CHECKSUM() function accepts input, and returns a checksum value for that input. Though this is a useful function for tracking changes and verifying integrity of data, I don't think it is the appropriate for making case sensitive searches. It works for simple comparisons though. Here's an example:
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE BINARY_CHECKSUM(CustID) = BINARY_CHECKSUM(@CustID)
AND BINARY_CHECKSUM(CustPassword) = BINARY_CHECKSUM(@CustPassword)
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
The above example will print 'Invalid Customer ID or Password!' and you will see an index scan in the execution plan. The following example will convert that into an index seek.
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE BINARY_CHECKSUM(CustID) = BINARY_CHECKSUM(@CustID)
AND BINARY_CHECKSUM(CustPassword) = BINARY_CHECKSUM(@CustPassword)
AND CustID = @CustID
AND CustPassword = @CustPassword
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
As you can see, this technique works for comparing smaller strings. But the following script will prove that BINARY_CHECKSUM can return the same checksum value for different input values. This is very bad, especially when it comes to validating user names and passwords. Someone can bypass the authentication/authorization by specifying a password, which is not correct, but produces the same checksum value as the correct password. Try this script and find out why I don't recommend this approach:
SET NOCOUNT ON
DECLARE @i varchar(500)
CREATE TABLE #t (CharValue varchar(500), BinaryChecksum int)
SET @i = 'A'
WHILE @i <> REPLICATE('A', 500)
BEGIN
INSERT #t SELECT @i, BINARY_CHECKSUM(@i)
SET @i = @i + 'A'
END
SELECT CharValue, COUNT(*) AS 'Times Repeated' FROM #t GROUP BY CharValue
SELECT BinaryChecksum, COUNT(*) AS 'Times Repeated' FROM #t GROUP BY BinaryChecksum
SELECT BINARY_CHECKSUM('A') AS [Checksum value for 'A'],
BINARY_CHECKSUM('AAAAAAAAAAAAAAAAA') AS [Checksum value for 'AAAAAAAAAAAAAAAAA']
DROP TABLE #t
|
Method 4: Changing the collation of the column permanently, so that all comparisons are case sensitive by default (Works only in SQL Server 2000)
|
SQL Server 2000 lets you specify collation at the column level also. So, you could make your CustID and CustPassword columns case sensitive by default. This saves a lot of effort, as you don't have to employ any special techniques in your queries, to get case sensitive behavior. Here's an example:
/* Since we have a primary key constraint defined on the CustID column, we cannot alter it directly.
First we need to drop the constraint. So we need the constraint name. The following procedure will
give you the primary key constraint name */
EXEC sp_help 'dbo.Customers'
GO
/* Substitute your primary key constraint name into the following ALTER TABLE command, to drop the constraint */
ALTER TABLE dbo.Customers DROP CONSTRAINT PK__Customers__71F07EBE
GO
/* Change the collation of the CustID column */
ALTER TABLE dbo.Customers ALTER COLUMN CustID char(8) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
GO
/* Add the primary key constraint back in */
ALTER TABLE dbo.Customers ADD CONSTRAINT PK_Customers PRIMARY KEY(CustID)
GO
/* Change the collation of the CustPassword column as well */
ALTER TABLE dbo.Customers ALTER COLUMN CustPassword varchar(15) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
GO
/* Now try this query and observe the case sensitive behavior */
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CustID = @CustID
AND CustPassword = @CustPassword
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
|
Method 5: Using computed columns (Works in both SQL Server 7.0 and 2000)
|
You could add a computed column to your table, that exhibits case sensitivity. A computed column is a virtual column that derives its value from the existing columns. SQL Server 2000 allows you to create an index on a computed column to make searches on these columns faster. We will exploit that feature in this example:
/* Adding two computed columns -- one for CustID and the other for CustPassword */
ALTER TABLE dbo.Customers ADD CustID_CS AS (CAST(CustID AS varbinary(8))),
CustPassword_CS AS (CustPassword COLLATE SQL_Latin1_General_CP1_CS_AS)
GO
/* Create a nonclustered index on the computed column CustID_CS */
CREATE NONCLUSTERED INDEX NC_NI_Customers_CustID ON dbo.Customers(CustID_CS)
GO
/* The following example will make use of the index on the computed column to perform case sensitive searches efficiently */
DECLARE @CustID char(8), @CustPassword varchar(15)
SET @CustID = 'usa00001'
SET @CustPassword = 'theunbreakable'
IF EXISTS
(
SELECT 1
FROM dbo.Customers
WHERE CustID_CS = CAST(@CustID AS varbinary(8))
AND CustPassword_CS = @CustPassword
)
BEGIN
PRINT 'Customer Found!'
END
ELSE
BEGIN
PRINT 'Invalid Customer ID or Password!'
END
GO
|
Method 6: Make use of client side languages like VB or VBScript for case sensitive comparisons (Works in both SQL Server 7.0 and 2000)
|
Instead of trying perform case sensitive comparisons on the backend, try taking advantage of your client side applications. This may not be appropriate for all situations, but I'm mentioning it here for the sake of completeness. The following code can be used in a Visual Basic application, to perform case sensitive searches:
Dim cn As New ADODB.Connection, rs As New ADODB.Recordset
Dim CustID As String, CustPassword As String
CustID = "usa00001"
CustPassword = "theunbreakable"
cn.Open "Server=(local);Integrated Security=SSPI;Database=tempdb;Provider=SQLOLEDB"
Set rs = cn.Execute("SELECT CustID, CustPassword FROM dbo.Customers WHERE CustId = '" + CustID + "' AND CustPassword = '" + CustPassword + "'")
If CustID = rs("CustID") And CustPassword = rs("CustPassword") Then
MsgBox "Customer Found!"
Else
MsgBox "Invalid Customer ID or Password!"
End If
Set rs = Nothing
Set cn = Nothing
|
Hope you find the above techniques useful. Here is some additional information that might help you deal with collations:
The following command shows you your SQL Server's default collation:
SELECT SERVERPROPERTY('Collation') AS 'Server Level Collation'
To see your default database collation:
SELECT DATABASEPROPERTYEX('Pubs', 'Collation') AS 'Database Level Collation'
To see column level collations of Customers table:
EXEC sp_help 'dbo.Customers'
To see server level collation settings in SQL Server 2000 as well as the previous versions:
EXEC sp_helpsort
To a listing of all available collations in SQL Server 2000:
SELECT * FROM ::fn_helpcollations()
For further information about specific collations:
SELECT COLLATIONPROPERTY('German_PhoneBook_CI_AS', 'CodePage')
SELECT COLLATIONPROPERTY('French_CI_AS', 'LCID')
SELECT COLLATIONPROPERTY('Latin1_General_CI_AS', 'ComparisonStyle')
|