In SQL, if you need to compare two columns that are both nullable, your compare statements can quickly get out of hand (and a lot less readable along the way). While fixing a bug in my code, I found out about a way to keep the comparison short and readable using the EXCEPT (or INTERSECT) operator.

This week I had a slap-on-the-forehead day, where I discovered a mistake in my SQL code, and it was not the first time I made that same mistake… (Donkeys, rocks, anyone?) But I learned something new while fixing it.

The bug: an application I developed, that had an MS SQL Server database back-end, was not always updating values changed by the user correctly. The database update was performed by a stored procedure that executed a MERGE INTO statement, which I use quite often as I have found it a fairly pretty way of doing something like an UPSERT which does not exist in MSSQL. The statement was something like this:

MERGE INTO MyTable Tgt
USING (SELECT @Key as Key, @ValueA as ValueA, @ValueB as ValueB, @ValueC as ValueC) Src
ON (Tgt.Key = Src.Key)
WHEN MATCHED AND (
    Tgt.ValueA <> Src.ValueA OR
    Tgt.ValueB <> Src.ValueB OR
    Tgt.ValueC <> Src.ValueC
  )
  THEN UPDATE SET Tgt.A = Src.A, Tgt.ValueB = Src.ValueB, Tgt.ValueC = Src.ValueC
WHEN NOT MATCHED BY TARGET
  THEN INSERT (KeyA, ValueA, ValueB, ValueC)
  VALUES (KeyA, ValueA, ValueB, ValueC)

Pretty standard, right? Now if I tell you that ValueA, ValueB and ValueC are nullable columns, you probably already spotted my mistake. The statement has no trouble changing a column from one value to another, but is failing to update a column from null to a value or vice versa.

Clarification, in short: if either operand of the equasion ValueA<>ValueA is null, the result is not true or false but unknown. This causes the whole and clause to turn unknown and because unknown is not true, the update set statement is not executed. Read up on three-valued logic here, if you like.

To fix that, I’d have to alter the matched clause, like this:

WHEN MATCHED AND (
  (Tgt.ValueA <> Src.ValueA) OR (Tgt.ValueA IS NULL AND Src.ValueA IS NOT NULL) OR (Tgt.ValueA IS NOT NULL AND Src.ValueA IS NULL) OR
  (Tgt.ValueB <> Src.ValueB) OR (Tgt.ValueB IS NULL AND Src.ValueB IS NOT NULL) OR (Tgt.ValueB IS NOT NULL AND Src.ValueB IS NULL) OR
  (Tgt.ValueC <> Src.ValueC) OR (Tgt.ValueC IS NULL AND Src.ValueC IS NOT NULL) OR (Tgt.ValueC IS NOT NULL AND Src.ValueC IS NULL)
)

Wow. I only have three columns and already this statement is getting out of hand. Some much for fairly pretty.

There must be a shorter way to do this. There are quite a few discussions on the matter on stackoverflow. Some refer to the use of IsNull(Tgt.ValueA, xxx) with a very unlikely value for xxx, but while the real-world chances are slim, I consider it a bit ugly. Especially in my situation, where the use of null had a really specific intention for the columns involved. But one of the other suggestions pointed to a somewhat shorter way to say the same:

ISNULL(NULLIF(Tgt.ValueA, Src.ValueA), NULLIF(Src.ValueA, Tgt.ValueA)) IS NOT NULL

While that’s a nice trick and nicely explained by the author, I felt the intention of the statement was not really clear anymore. In the same thread, the approved answer mentioned the use of the EXCEPT or INTERSECT operator. However, from this answer, it was not immediately clear to me how to use that in a merge condition. I found a better example in another thread. As it turns out, you just have to write this:

EXISTS (SELECT Tgt.Value EXCEPT SELECT Src.Value) 

Yields exactly the same result. So I could shorten my statement to

WHEN MATCHED AND (
  EXISTS (SELECT Tgt.ValueA EXCEPT SELECT Src.ValueA)
  EXISTS (SELECT Tgt.ValueB EXCEPT SELECT Src.ValueB)
  EXISTS (SELECT Tgt.ValueC EXCEPT SELECT Src.ValueC)

or, even  shorter:

WHEN MATCHED AND (
  EXISTS (SELECT Tgt.ValueA, Tgt.ValueB, Tgt.Value C EXCEPT SELECT Src.ValueA, Src.ValueB, Src.ValueC)

Now we are getting somewhere! It is correct (which is the most important bit of course) and I feel the intention of the statement, which is to return true if the columns in Tgt and Src differ, is still clear. And if you have a composite index on the columns, it is even quite efficient.

Why does that work? It is because the EXCEPT (and it’s nephew INTERSECT) operator is somewhat of an oddball where comparisons to null are concerned. For these operators are exceptional (pun intended) in the fact that, unlike other operators, they behave like they consider a null “equal” to another null. This is a defendable design choice, because as you may recall they are used to get the difference c.q. intersection of two rowsets. For this (and only this) a row A having a null value in column X, needs to be considered the same row as a row B having a null value in column X (all other columns being equal). The operator is never claiming that null equals null, only that rows holding a null in the same spot are, for the intention of the operator.

If you like to get into the technical details, you should really read this blog post.

Shiftkey Avatar

Published by

Plaats een reactie