HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Fastest way to get the length of the intersection of two text arrays in PostgreSQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlthearrayslengthtextwaytwogetfastestintersection

Problem

This is what I am currently using:

CREATE FUNCTION array_intersect(a1 anyarray, a2 anyarray)
RETURNS anyarray AS
$ 
  SELECT ARRAY(
    SELECT unnest($1)
    INTERSECT SELECT unnest($2)
    ORDER BY 1
  );
$ LANGUAGE sql
IMMUTABLE STRICT;

--get the length:
select array_length ( array_intersect(array[...], array[...]), 1);


Is there a faster way?

Solution

-
You say you're looking for "Jaccard Similarity between two arrays"

You may consider MadLib which provides this as dist_jaccard()


Jaccard distance between two varchar vectors treated as sets.

-
You may want to try it without STRICT. That may be slowing it down.

-
You may want to use INTERSECT ALL Especially if you know you don't have dupes in $1.

-
As Craig Ringer said in the comments you may want out intarray and consider using &.

To give a more accurate answer here,

  • What are the datatypes?



  • How many elements are we talking about on each array?



  • How much exclusion does the INTERSECT provide?

Context

StackExchange Database Administrators Q#58329, answer score: 2

Revisions (0)

No revisions yet.