Compute intersections of all combinations of vectors in a list of vectors in R

Submitted by: @import:stackexchange-codereview

Problem

My goal is to compute the intersections of several vectors (sets of identifiers, gene-names to be specific). I start with a list of vectors and run the function below, which loops through 1:n where n is the number of sets and then uses combn to generate all combinations of my sets taken m at a time.

I paste together a name and then reduce by intersection over my sets yielding a named list of character vectors holding the elements in common between each combination of sets.

My question is, of course, is there a better way to accomplish this?

## Compute the intersection of all combinations of the
## elements in the list of vectors l. Might be useful
## for generating Venn/Euler diagrams.
## There might be better ways to do this!
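overlap <- function(l) {
  results <- list()
  # combinations of m elements of list l
  for (m in seq(along=l)) {
    # generate and iterate through combinations of length m
    for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
      # make name by concatenating the names of the elements
      # of l that we're intersecting
      name <- do.call(paste, c(as.list(names(l)[indices]), sep="_"))
      # Reduce(intersect, list(c(1,2,3,3))) => c(1,2,3,3)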
      # Reduce(intersect, list(c(1,2,3,3)), init=l[[indices[1]]]) => c(1,2,3)
      results[[name]] <- Reduce(intersect, l[indices], init=l[[indices[1]]])
    }
  }
  results
}

overlap( list(foo=c('a','b','c','d','e','e'),
              bar=c('a','c','e','f','g'),
              bat=c('a','b','c','d','g')))

Solution

Your approach seems reasonable, but there are some simplifications you can make.

First, your construction of name is needlessly complex. This works just as well:

name <- paste(names(l)[indices], collapse="_")
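
As a small illustration of why collapse (rather than sep) is the argument that does the work here, the sketch below uses a made-up list l and index vector indices; neither comes from the question.

```r
# Hypothetical inputs, chosen only for illustration
l <- list(foo = 1:3, bar = 2:4, bat = 3:5)
indices <- c(1, 3)

# sep only separates multiple arguments; a single vector is returned unchanged
with_sep <- paste(names(l)[indices], sep = "_")

# collapse joins the elements of one vector into a single string
name <- paste(names(l)[indices], collapse = "_")

print(with_sep)  # "foo" "bat"
print(name)      # "foo_bat"
```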


Second, you can call unique on each element of l at the outset. This eliminates the need to pass an init value to Reduce, which saves one call to intersect per combination. It also shortens the arguments to intersect, since duplicates have already been removed.

l <- lapply(l, unique)
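
The single-set case is where init mattered, so here is a minimal sketch of it. The vector x and the variable names are only for illustration.

```r
x <- c(1, 2, 3, 3)

# With a one-element list and no init, Reduce returns that element as is,
# so the duplicate survives.
with_dups <- Reduce(intersect, list(x))

# Removing duplicates up front gives the same result that init used to produce.
deduped <- Reduce(intersect, list(unique(x)))

print(with_dups)  # 1 2 3 3
print(deduped)    # 1 2 3
```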


Together, these two changes give the function

overlap <- function(l) {
  results <- list()
  # Remove duplicates within each entry of l
  l <- lapply(l, unique)

  # combinations of m elements of list l
  for (m in seq(along=l)) {

    # generate and iterate through combinations of length m
    for (indices in combn(seq(length(l)), m, simplify=FALSE)) {

      # make name by concatenating the names of the elements
      # of l that we're intersecting
      name <- paste(names(l)[indices], collapse="_")

      results[[name]] <- Reduce(intersect, l[indices])
    }
  }
  results
}
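
To check the behaviour, the function can be run on the sample input from the question. The definition is repeated here only so that the snippet runs on its own; the variable name res is arbitrary.

```r
overlap <- function(l) {
  results <- list()
  # Remove duplicates within each entry of l
  l <- lapply(l, unique)
  for (m in seq(along=l)) {
    for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
      name <- paste(names(l)[indices], collapse="_")
      results[[name]] <- Reduce(intersect, l[indices])
    }
  }
  results
}

res <- overlap(list(foo=c('a','b','c','d','e','e'),
                    bar=c('a','c','e','f','g'),
                    bat=c('a','b','c','d','g')))

print(names(res))
# "foo" "bar" "bat" "foo_bar" "foo_bat" "bar_bat" "foo_bar_bat"
print(res[["foo"]])          # "a" "b" "c" "d" "e"  (duplicate 'e' removed)
print(res[["foo_bar"]])      # "a" "c" "e"
print(res[["foo_bar_bat"]])  # "a" "c"
```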


Further duplicate work can be eliminated by recognizing that the higher-order intersections, as you compute them now, repeat the lower-order ones: foo_bar_bat first intersects foo and bar and then intersects that result with bat, even though the intersection of foo and bar has already been computed. The "first-order" results are just the arguments passed through unique (as in the previous version).

overlap <- function(l) {
  results <- lapply(l, unique)

  # combinations of m elements of list l
  for (m in seq(along=l)[-1]) {

    # generate and iterate through combinations of length m
    for (indices in combn(seq(length(l)), m, simplify=FALSE)) {

      # make name by concatenating the names of the elements
      # of l that we're intersecting
      name_1 <- paste(names(l)[indices[-m]], collapse="_")
      name_2 <- names(l)[indices[m]]
      name <- paste(name_1, name_2, sep="_")

      results[[name]] <- intersect(results[[name_1]], results[[name_2]])

    }
  }
  results
}
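
The saving is easiest to see on a single three-way combination. The sketch below uses the question's sample sets with duplicates already removed; the variable names are illustrative only.

```r
foo <- c('a','b','c','d','e')
bar <- c('a','c','e','f','g')
bat <- c('a','b','c','d','g')

# Computed once while handling the combinations of size 2
foo_bar <- intersect(foo, bar)

# Starting from scratch needs two calls to intersect ...
from_scratch <- Reduce(intersect, list(foo, bar, bat))

# ... whereas reusing the stored pairwise result needs only one
reused <- intersect(foo_bar, bat)

print(from_scratch)                     # "a" "c"
print(identical(from_scratch, reused))  # TRUE
```

Note that this reuse relies on results being looked up by name, so it assumes the list names are unique.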


If you really want to eliminate more duplicate calculations, you can assign names(l) outside both loops and length(l) outside the outer loop.
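
A minimal sketch of that last suggestion follows. The function name overlap_hoisted and the local variables nms and n are made up for this example, and whether the change is measurable in practice has not been tested here.

```r
overlap_hoisted <- function(l) {
  nms <- names(l)   # looked up once instead of on every iteration
  n <- length(l)    # likewise computed once
  results <- lapply(l, unique)

  # combinations of m elements of list l, starting at pairs
  for (m in seq_len(n)[-1]) {
    for (indices in combn(seq_len(n), m, simplify=FALSE)) {
      # name of the already computed (m-1)-way intersection
      name_1 <- paste(nms[indices[-m]], collapse="_")
      # name of the one set still to be intersected with it
      name_2 <- nms[indices[m]]
      name <- paste(name_1, name_2, sep="_")
      results[[name]] <- intersect(results[[name_1]], results[[name_2]])
    }
  }
  results
}

res_hoisted <- overlap_hoisted(list(foo=c('a','b','c','d','e','e'),
                                    bar=c('a','c','e','f','g'),
                                    bat=c('a','b','c','d','g')))
print(res_hoisted[["bar_bat"]])      # "a" "c" "g"
print(res_hoisted[["foo_bar_bat"]])  # "a" "c"
```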

Context

StackExchange Code Review Q#17905, answer score: 4
