Tuesday 13 March 2012

NULL values - NOT IN, or not?

I've recently come across a piece of behaviour which surprised me around the IN operator, or more specifically the NOT IN variant of it.

So, the usual behaviour of this is to give us everything which is in one list but not another:
/* Example Script 1 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
INSERT #myb(aa) VALUES (6)
/* Run the select, 5 rows returned; no rows excluded */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb
This returns the five results:
aa
-----------
1
2
3
4
5
(5 row(s) affected)

 So far, so good.

So, what if the table #myb (the comparison table) contains no values?

Well, that works well too:

/* Example Script 2 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT
#mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
/* Run the select, 5 rows returned; no rows excluded */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb
Which gives us :
aa
-----------
1
2
3
4
5
(5 row(s) affected)
This is all well and good. But what if the table DOES contain a row, but the value in the field in that row is null?

/* Example Script 3 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
INSERT #myb(aa) VALUES (NULL)
/* Run the select */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb
Which gives us:

aa
-----------
(0 row(s) affected)
 
This is unexpected. It's actually occurring because the comparison between the null in the record, and the values in the table results in UNKNOWN, which isn't TRUE or FALSE.

As BOL puts it
Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.
This seems to me to be an understatement!

Now, this behaviour isn't so bad if the only record in the set is null - that seems unlikely. But what if you have only one bad record in a series of otherwise good ones?
/* Example Script 4 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
/* Insert 95 integer records */
DECLARE @i INT
SET @i=6
WHILE @i < 100
BEGIN
INSERT
#myb(aa) VALUES (@i)
SET @i +=1
END
/* Insert one null record */
INSERT #myb(aa) VALUES (NULL)
/* Run the select, 5 rows returned; no rows excluded */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb
Again, we get no results:

aa
-----------
(0 row(s) affected)

You can try simply commenting in or out the line which inserts the null value into #myb, and prove the difference.

Clearly, this means that if you're using NOT IN, and there is any possibility that the values you are comparing with could be NULL, then you should consider this. Alternatively, set the columns you are referencing as NOT NULL to avoid this.

What happens if we put a null value in the main table (#mya in the above examples)? The null record is never returned, whether there is a null in the comparison (#myb) table or not. And it's worth noting that this last point is true for IN and NOT IN.

So, what happens if we're not using a table but rather a set of values?

/* Example Script 5 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5),(null)
/* Get results */
SELECT * FROM #mya WHERE aa NOT IN (1,2,null)
SELECT * FROM #mya WHERE aa IN (1,2,null)
/* Clearup */
DROP TABLE #mya
We get :

aa
-----------
(0 row(s) affected)
 
aa
-----------
1
2
(2 row(s) affected)

So this isn't quite the same behaviour we saw with tables; here, the not in null means that everything evaluates as false (as before), but the in is explicitly looking for true matches - so with the exception of the comparison between null and null, we get the expected 2 records.

It's worth noting that the behaviour experienced can change when the ANSI_NULLS setting is changed, but that this is mandated ON in a future SQL server release (per the note in BOL)

In summary, use caution when looking at IN with NULLable fields or values.

Further reading : http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values