patternMinor
Use utl_match.jaro_winkler_similarity to select distinct rows using Oracle
Viewed 0 times
distinctjaro_winkler_similarityrowsoracleutl_matchusingselectuse
Problem
I have a table with log messages that many times contain repeated information with slightly different values.
For example:
Unable to make thread 19043 a realtime process
Unable to make thread 20763 a realtime process
Unable to make thread 22179 a realtime process
FEED_XYZ Secondary Instrument Lines not configured
FEED_ZZZ Secondary Instrument Lines not configured
(...)
I'd like to have just one line from each group above:
Unable to make thread 19043 a realtime process
FEED_XYZ Secondary Instrument Lines not configured
Is there a way to use a select query to retrieve distinct rows with
Something like:
I know that I can make a pipelined function or some other PL/SQL routine, but I just would like to double-check if there's some easier way of doing this.
For example:
Unable to make thread 19043 a realtime process
Unable to make thread 20763 a realtime process
Unable to make thread 22179 a realtime process
FEED_XYZ Secondary Instrument Lines not configured
FEED_ZZZ Secondary Instrument Lines not configured
(...)
I'd like to have just one line from each group above:
Unable to make thread 19043 a realtime process
FEED_XYZ Secondary Instrument Lines not configured
Is there a way to use a select query to retrieve distinct rows with
utl_match.jaro_winkler_similarity?Something like:
select log_message
from logs_table
where "utl_match.jaro_winkler_similarity between rows < 80"I know that I can make a pipelined function or some other PL/SQL routine, but I just would like to double-check if there's some easier way of doing this.
Solution
I'm not overly familiar with the
Result (screenshot from test DB at apex.oracle.com):
If you just need any value from each group (as opposed to the "first" value in each sorted group), you could use
However, unless there is a specific reason why you need to use
Or these (e.g. if there is a need for more fine-grained filtering):
UTL_MATCH.JARO_WINKLER_SIMILARITY function, but using it in a UNION query, in conjunction with the ROW_NUMBER analytic function, will give you the result you desire:WITH
temp AS
(
SELECT log_message, ROW_NUMBER() OVER (ORDER BY log_message) rn
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
UNION
SELECT log_message, ROW_NUMBER() OVER (ORDER BY log_message) rn
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
)
SELECT log_message
FROM temp
WHERE rn = 1
;Result (screenshot from test DB at apex.oracle.com):
If you just need any value from each group (as opposed to the "first" value in each sorted group), you could use
ROWNUM criteria instead of the ROW_NUMBER functions, which would return a similar result:WITH
temp AS
(
SELECT log_message
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
AND ROWNUM = 1
UNION
SELECT log_message
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
AND ROWNUM = 1
)
SELECT log_message
FROM temp
;However, unless there is a specific reason why you need to use
UTL_MATCH.JARO_WINKLER_SIMILARITY, there's a chance you may be falling victim to the XY Problem. You could achieve identical results by replacing the two individual WHERE-clause lines (in either query above) with these:WHERE log_message LIKE 'Unable to make thread %'
WHERE log_message LIKE 'FEED\_%' ESCAPE '\'Or these (e.g. if there is a need for more fine-grained filtering):
WHERE REGEXP_LIKE(log_message, 'Unable\sto\smake\sthread\s\d+\sa\srealtime\sprocess')
WHERE REGEXP_LIKE(log_message, 'FEED_[A-Z]{3}\sSecondary\sInstrument\sLines\snot\sconfigured')Code Snippets
WITH
temp AS
(
SELECT log_message, ROW_NUMBER() OVER (ORDER BY log_message) rn
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
UNION
SELECT log_message, ROW_NUMBER() OVER (ORDER BY log_message) rn
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
)
SELECT log_message
FROM temp
WHERE rn = 1
;WITH
temp AS
(
SELECT log_message
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
AND ROWNUM = 1
UNION
SELECT log_message
FROM logs_table
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
AND ROWNUM = 1
)
SELECT log_message
FROM temp
;WHERE log_message LIKE 'Unable to make thread %'
WHERE log_message LIKE 'FEED\_%' ESCAPE '\'WHERE REGEXP_LIKE(log_message, 'Unable\sto\smake\sthread\s\d+\sa\srealtime\sprocess')
WHERE REGEXP_LIKE(log_message, 'FEED_[A-Z]{3}\sSecondary\sInstrument\sLines\snot\sconfigured')Context
StackExchange Database Administrators Q#158190, answer score: 2
Revisions (0)
No revisions yet.