HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Scrubbing Names via SQL Query/Batch

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
scrubbingsqlquerybatchnamesvia

Problem

We have sensitive information (names of people) and we're looking to put into place a script to "scrub" of the names. Has anybody come up with a good algorithm to do this, something other than just making all of the last names "TEST" or something?

Thanks.

Solution

I had to do this for NPI (non-public information) reasons at a previous employer. Ended up scrambling fields in a script, since a first name / last name pair isn't NPI unless it's tied to something else.

Here's a rough version of the script. Season to taste. Also, this assumes you have first and last names in separate fields.

CREATE TABLE #RandomData
(
    DataID int not null,
    ID int not null,
    FirstName varchar(50),
    LastName varchar(50)
)

INSERT INTO #RandomData (ID, DataID)
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()), DataId FROM DataTable

UPDATE r SET r.FirstName = x.FirstName
FROM #RandomData r
INNER JOIN (
    SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS ID, FirstName FROM DataTable
) x
    ON x.ID = r.ID

UPDATE r SET r.LastName = x.LastName
FROM #RandomData r
INNER JOIN (
    SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS ID, LastName FROM DataTable
) x
    ON x.ID = r.ID

CREATE CLUSTERED INDEX PK_RandomData ON #RandomData
(
    DataID asc
)

ALTER TABLE DataTable DISABLE TRIGGER ALL

UPDATE dt 
    SET dt.FirstName = r.FirstName, 
    dt.LastName = r.LastName
FROM DataTable dt
INNER JOIN #RandomData r
    ON dt.DataId = r.DataId

ALTER TABLE DataTable ENABLE TRIGGER ALL

DROP TABLE #RandomData

Code Snippets

CREATE TABLE #RandomData
(
    DataID int not null,
    ID int not null,
    FirstName varchar(50),
    LastName varchar(50)
)

INSERT INTO #RandomData (ID, DataID)
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()), DataId FROM DataTable

UPDATE r SET r.FirstName = x.FirstName
FROM #RandomData r
INNER JOIN (
    SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS ID, FirstName FROM DataTable
) x
    ON x.ID = r.ID

UPDATE r SET r.LastName = x.LastName
FROM #RandomData r
INNER JOIN (
    SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS ID, LastName FROM DataTable
) x
    ON x.ID = r.ID

CREATE CLUSTERED INDEX PK_RandomData ON #RandomData
(
    DataID asc
)

ALTER TABLE DataTable DISABLE TRIGGER ALL

UPDATE dt 
    SET dt.FirstName = r.FirstName, 
    dt.LastName = r.LastName
FROM DataTable dt
INNER JOIN #RandomData r
    ON dt.DataId = r.DataId

ALTER TABLE DataTable ENABLE TRIGGER ALL

DROP TABLE #RandomData

Context

StackExchange Database Administrators Q#11719, answer score: 5

Revisions (0)

No revisions yet.