HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlMinor

Compare each column in a row and return an error if any of them is different

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
erroreachcolumnreturnanydifferentandcomparethemrow

Problem

I'm attempting to find any identifiers with any values that are different in the same row.

+------------+--------+--------+--------+--------+
| Identifier | Value1 | Value2 | Value3 | Value4 |
| f001       | a1     | a2     | a2     | a2     |
| f002       | a4     | a4     | a4     | a5     |
| f003       | a2     | a2     | a2     |        |
| f004       | a1     | a1     | a1     | a1     |
| f002       | a9     |        |        |        |
+------------+--------+--------+--------+--------+


For example,

The first identifier should return "MisMatch",

The second identifier should return "MisMatch",

The third identifier should return "NoIssue",

The fourth identifier should return "Mismatch",

The fifth identifier should return "NoIssue",

Any help would be great, I'm stuck at this point.

Some lines have hundreds of columns, while others only have one. I'm hoping to be able to locate any line that contains any mismatch.

I'm using SQL Server 2008 R2.

Solution

I see this question is tagged both Excel and SQL Server.

In Excel you could use the approach from here

The formula in cell F2 above is

=IF(SUM(IF(FREQUENCY(IF(LEN(B2:E2)>0,MATCH(B2:E2,B2:E2,0),""), IF(LEN(B2:E2)>0,MATCH(B2:E2,B2:E2,0),""))>0,1))>1,"MisMatch","Match")


Pay attention to the note in the linked article


The formulas in this example must be entered as array formulas. Select
each cell that contains a formula, press F2, and then press
CTRL+SHIFT+ENTER.

Curly braces will then appear around the formula in the formula bar

Another (SQL Server) way (credit to Geoff Patterson for improvements) would be

SELECT T.Identifier,
       CASE
         WHEN min_val <> max_val
           THEN 'MisMatch'
         ELSE 'NoIssue'
       END
FROM   @T T
       CROSS APPLY (SELECT MIN(Val),
                      MAX(Val)
                    FROM   (VALUES (Value1),
                                   (Value2),
                                   (Value3),
                                   (Value4)) V(Val)) V(min_val, max_val)


It should work well for the "hundreds of columns" mentioned.

Though it introduces some extra complexity into the plan compared to just a simple compute scalar.

Code Snippets

=IF(SUM(IF(FREQUENCY(IF(LEN(B2:E2)>0,MATCH(B2:E2,B2:E2,0),""), IF(LEN(B2:E2)>0,MATCH(B2:E2,B2:E2,0),""))>0,1))>1,"MisMatch","Match")
SELECT T.Identifier,
       CASE
         WHEN min_val <> max_val
           THEN 'MisMatch'
         ELSE 'NoIssue'
       END
FROM   @T T
       CROSS APPLY (SELECT MIN(Val),
                      MAX(Val)
                    FROM   (VALUES (Value1),
                                   (Value2),
                                   (Value3),
                                   (Value4)) V(Val)) V(min_val, max_val)

Context

StackExchange Database Administrators Q#115806, answer score: 7

Revisions (0)

No revisions yet.