HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Validating an HTML input string

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
inputstringhtmlvalidating

Problem

A partial solution is to get all the Notes (VARCHAR) where there is an imbalance in `. But how can I write a query for that?

For example:

Input: Hello World
Output: Valid HTML

Input:
Output: Invalid HTML

Input:
Output: Valid HTML (Marking it Valid is OK for my use case)
But hey isn't that a valid HTML? It could be


Are there any better ways to check for "invalid" HTML where "invalid" is defined as HTML content with
` at the end.

Solution

If all you want to do is confirm that the input string consists of a sequence of ` elements:

  • Split the string on the



  • The first element must be blank



  • Subsequent elements must contain exactly one > character



The input string is valid if every element passes the tests above.

Example

String splitter

I am using the SQL#.String_Split4k streaming table-valued function from the free edition of the SQLCLR SQLsharp library by Solomon Rutzky.

If the input string might be longer than 4000 characters, the (less efficient) SQL#.String_Split function would be used instead (it handles strings up to 2GB).

Any efficient string splitter (T-SQL or otherwise) that can return empty elements correctly with a sequence number will work.

The example below shows the sort of output SQL#.String_Split4k returns.

The first parameter is the string to split. The second parameter is the delimiter. The third parameter (SplitOption) is set to 1, which means Keep Empty Elements.

SELECT
    SS.SplitNum,
    SS.SplitVal
FROM SQL#.String_Split4k(N'', N'<', 1) AS SS;


Output:

╔══════════╦══════════╗
║ SplitNum ║ SplitVal ║
╠══════════╬══════════╣
║ 1 ║ ║
║ 2 ║ a> ║
║ 3 ║ b> ║
║ 4 ║ c> ║
╚══════════╩══════════╝

Table and test data

CREATE TABLE #T
(
    row_id integer IDENTITY PRIMARY KEY,
    string varchar(50) NOT NULL
);

INSERT #T 
    (string)
VALUES 
    ('Hello World'),
    (''),
    ('');


Solution

SELECT
    T.row_id,
    string = (SELECT TOP (1) T2.string FROM #T AS T2 WHERE T2.row_id = T.row_id),
    is_valid =
        MIN
        (
            CASE
                -- First element must be blank
                WHEN SS.SplitNum = 1 AND SS.SplitVal = N'' THEN 1
                WHEN SS.SplitNum = 1 THEN 0
                -- Other elements must contain exactly one >
                WHEN SS.SplitVal NOT LIKE N'%>%' THEN 0
                WHEN SS.SplitVal LIKE N'%>%>%' THEN 0
                -- Otherwise valid
                ELSE 1
            END
        )
FROM #T AS T
CROSS APPLY SQL#.String_Split4k(T.string, '<', 1) AS SS
GROUP BY T.row_id
ORDER BY T.row_id;


Output

╔════════╦════════════════════════════╦══════════╗
║ row_id ║ string ║ is_valid ║
╠════════╬════════════════════════════╬══════════╣
║ 1 ║ Hello World ║ 1 ║
║ 2 ║ ║ 0 ║
║ 3 ║ ║ 1 ║
╚════════╩════════════════════════════╩══════════╝

Code Snippets

SELECT
    SS.SplitNum,
    SS.SplitVal
FROM SQL#.String_Split4k(N'<a><b><c>', N'<', 1) AS SS;
CREATE TABLE #T
(
    row_id integer IDENTITY PRIMARY KEY,
    string varchar(50) NOT NULL
);

INSERT #T 
    (string)
VALUES 
    ('<html>Hello World</html>'),
    ('<html><Hello World</html>'),
    ('<html><Hello World></html>');
SELECT
    T.row_id,
    string = (SELECT TOP (1) T2.string FROM #T AS T2 WHERE T2.row_id = T.row_id),
    is_valid =
        MIN
        (
            CASE
                -- First element must be blank
                WHEN SS.SplitNum = 1 AND SS.SplitVal = N'' THEN 1
                WHEN SS.SplitNum = 1 THEN 0
                -- Other elements must contain exactly one >
                WHEN SS.SplitVal NOT LIKE N'%>%' THEN 0
                WHEN SS.SplitVal LIKE N'%>%>%' THEN 0
                -- Otherwise valid
                ELSE 1
            END
        )
FROM #T AS T
CROSS APPLY SQL#.String_Split4k(T.string, '<', 1) AS SS
GROUP BY T.row_id
ORDER BY T.row_id;

Context

StackExchange Database Administrators Q#158924, answer score: 8

Revisions (0)

No revisions yet.