HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Excluding particular words from a string on a WHERE clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
wordsexcludingwhereclauseparticularfromstring

Problem

I have a table that stores invoices and I'm trying to get stats out of it, unfortunately the table is poorly build and some crucial information is all mixed up on a nvarchar field, information such as whether the invoice has been cancelled or if part of the charge is exempt resides on this field in a very daliesque string that gets parsed by the frontend. 3453.234;exempt;Invoice Total...

So I want to create a query that would exclude a few words. My problem is how to accomplish a query that can exclude records depending of a list of keywords (cancelled, exempt), so if any of this words is in the field the amount would not be taken in account.

Solution

Its not too difficult to build a query that does it. First create a query that joins on a table using LIKE with wildcards. Then exclude everything from this query.

To see what I mean, see this simple example: http://sqlfiddle.com/#!6/619fb/2

Or alternately I have reproduced an example here:

--Create tables for comparing data
CREATE TABLE Invoice
(
  InvoiceID INT NOT NULL IDENTITY,
  InvoiceData varchar(200) NOT NULL,
);

CREATE TABLE BadWords
(
  BadWordID INT NOT NULL IDENTITY,
  BadWord varchar(10) NOT NULL
);

--Insert data
INSERT INTO Invoice (InvoiceData)
VALUES ('This is some invoice data'), ('it is about'), ('something interesting that'),
('you should look at'), ('because its got invoice information');

INSERT INTO BadWords (BadWord)
VALUES ('this'),('invoice');

--Test query:
SELECT *
FROM Invoice
WHERE InvoiceID NOT IN (
  SELECT InvoiceID
  FROM Invoice i
  INNER JOIN BadWords b ON i.InvoiceData LIKE '%' + b.BadWord + '%'
)


Performance may be an issue with this if you have hundreds of thousands of rows. Without more information though it would be difficult to give you a high performing query (for example using a NOT CONTAINS free text query may give better performance).

Code Snippets

--Create tables for comparing data
CREATE TABLE Invoice
(
  InvoiceID INT NOT NULL IDENTITY,
  InvoiceData varchar(200) NOT NULL,
);

CREATE TABLE BadWords
(
  BadWordID INT NOT NULL IDENTITY,
  BadWord varchar(10) NOT NULL
);

--Insert data
INSERT INTO Invoice (InvoiceData)
VALUES ('This is some invoice data'), ('it is about'), ('something interesting that'),
('you should look at'), ('because its got invoice information');

INSERT INTO BadWords (BadWord)
VALUES ('this'),('invoice');

--Test query:
SELECT *
FROM Invoice
WHERE InvoiceID NOT IN (
  SELECT InvoiceID
  FROM Invoice i
  INNER JOIN BadWords b ON i.InvoiceData LIKE '%' + b.BadWord + '%'
)

Context

StackExchange Database Administrators Q#95926, answer score: 7

Revisions (0)

No revisions yet.