Compute intersections of all combinations of vectors in a list of vectors in R
Problem
My goal is to compute the intersections of several vectors (sets of identifiers, gene-names to be specific). I start with a list of vectors and run the function below, which loops through 1:n where n is the number of sets and then uses combn to generate all combinations of my sets taken m at a time.
I paste together a name and then reduce by intersection over my sets yielding a named list of character vectors holding the elements in common between each combination of sets.
My question is, of course, is there a better way to accomplish this?
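For readers less familiar with the two building blocks described above, here is a minimal sketch of them in isolation. The `sets` data and the variable names are made up for illustration and are not taken from the question.

```r
# Three small example sets (hypothetical data, not from the question)
sets <- list(a = c("x", "y", "z"),
             b = c("y", "z", "w"),
             c = c("z", "w", "v"))

# With simplify = FALSE, combn() returns a list of index vectors,
# here one vector per pair of sets: (1, 2), (1, 3), (2, 3)
pairs <- combn(seq_along(sets), 2, simplify = FALSE)

# Intersect the sets selected by the first index vector (a and b)
ab <- Reduce(intersect, sets[pairs[[1]]])
print(ab)

# With a single set, Reduce() never calls intersect(),
# so duplicates in that set survive unchanged
single <- Reduce(intersect, list(c(1, 2, 3, 3)))
print(single)
```

The last call shows why duplicates need separate handling when only one set is selected: `Reduce` returns a one-element list's content as-is.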
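
    ## Compute the intersection of all combinations of the
    ## elements in the list of vectors l. Might be useful
    ## for generating Venn/Euler diagrams.
    ## There might be better ways to do this!
    overlap <- function(l) {
      results <- list()
      # combinations of m elements of list l
      for (m in seq(along=l)) {
        # generate and iterate through combinations of length m
        for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
          # NOTE: the original name-building expression was lost in extraction;
          # this do.call(paste, ...) form is only a stand-in for it
          name <- do.call(paste, c(as.list(names(l)[indices]), sep="_"))
          # Reduce(intersect, list(c(1,2,3,3))) => c(1,2,3,3)
          # Reduce(intersect, list(c(1,2,3,3)), init=l[[indices[1]]]) => c(1,2,3)
          results[[name]] <- Reduce(intersect, l[indices], init=l[[indices[1]]])
        }
      }
      results
    }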

    overlap(list(foo=c('a','b','c','d','e','e'),
                 bar=c('a','c','e','f','g'),
                 bat=c('a','b','c','d','g')))

Solution
Your approach seems reasonable, but there are some simplifications you can make.
First, your construction of name is needlessly complex. This works just as well:

    name <- paste(names(l)[indices], collapse="_")

Second, you can call unique on each element of l at the outset, which eliminates the need to specify an init value to Reduce (and thus saves one call to intersect for every combination). It also shortens the arguments to intersect, since duplicates have already been eliminated.

    l <- lapply(l, unique)

These two changes give the function

    overlap <- function(l) {
      results <- list()
      # Remove duplicates within each entry of l
      l <- lapply(l, unique)
      # combinations of m elements of list l
      for (m in seq(along=l)) {
        # generate and iterate through combinations of length m
        for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
          # make name by concatenating the names of the elements
          # of l that we're intersecting
          name <- paste(names(l)[indices], collapse="_")
          results[[name]] <- Reduce(intersect, l[indices])
        }
      }
      results
    }

Eliminating further duplicate work means recognizing that the higher-order intersections, as you compute them now, repeat the lower-order ones: foo_bar_bat first intersects foo and bar and then intersects that with bat, but the intersection of foo and bar was already determined. The "first"-order results are just the arguments passed through unique (as in the previous version).

    overlap <- function(l) {
      results <- lapply(l, unique)
      # combinations of m elements of list l
      for (m in seq(along=l)[-1]) {
        # generate and iterate through combinations of length m
        for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
          # make name by concatenating the names of the elements
          # of l that we're intersecting
          name_1 <- paste(names(l)[indices[-m]], collapse="_")
          name_2 <- names(l)[indices[m]]
          name <- paste(name_1, name_2, sep="_")
          results[[name]] <- intersect(results[[name_1]], results[[name_2]])
        }
      }
      results
    }

If you really want to eliminate more duplicate calculations, you can evaluate names(l) once outside both loops and length(l) once outside the outer loop, storing each in a variable.

Code Snippets
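The last suggestion in the answer (evaluating names(l) and length(l) outside the loops) could look like the following sketch. The function name `overlap_hoisted` and the variables `nms` and `n` are assumptions chosen for illustration; they do not appear in the original answer.

```r
overlap_hoisted <- function(l) {
  # Look these up once instead of on every iteration
  # (nms and n are illustrative names, not from the original answer)
  nms <- names(l)
  n <- length(l)
  # First-order results are just the de-duplicated inputs
  results <- lapply(l, unique)
  for (m in seq_len(n)[-1]) {
    for (indices in combn(seq_len(n), m, simplify = FALSE)) {
      # Reuse the stored intersection of the first m - 1 sets
      name_1 <- paste(nms[indices[-m]], collapse = "_")
      name_2 <- nms[indices[m]]
      name <- paste(name_1, name_2, sep = "_")
      results[[name]] <- intersect(results[[name_1]], results[[name_2]])
    }
  }
  results
}

# The example input from the question
res <- overlap_hoisted(list(foo = c('a','b','c','d','e','e'),
                            bar = c('a','c','e','f','g'),
                            bat = c('a','b','c','d','g')))
print(names(res))
print(res[["foo_bar_bat"]])  # "a" "c"
```

For three short vectors the saving is negligible; hoisting only starts to matter as the number of sets grows, because the number of combinations doubles with each additional set.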

    name <- paste(names(l)[indices], collapse="_")

    l <- lapply(l, unique)

    overlap <- function(l) {
      results <- list()
      # Remove duplicates within each entry of l
      l <- lapply(l, unique)
      # combinations of m elements of list l
      for (m in seq(along=l)) {
        # generate and iterate through combinations of length m
        for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
          # make name by concatenating the names of the elements
          # of l that we're intersecting
          name <- paste(names(l)[indices], collapse="_")
          results[[name]] <- Reduce(intersect, l[indices])
        }
      }
      results
    }

    overlap <- function(l) {
      results <- lapply(l, unique)
      # combinations of m elements of list l
      for (m in seq(along=l)[-1]) {
        # generate and iterate through combinations of length m
        for (indices in combn(seq(length(l)), m, simplify=FALSE)) {
          # make name by concatenating the names of the elements
          # of l that we're intersecting
          name_1 <- paste(names(l)[indices[-m]], collapse="_")
          name_2 <- names(l)[indices[m]]
          name <- paste(name_1, name_2, sep="_")
          results[[name]] <- intersect(results[[name_1]], results[[name_2]])
        }
      }
      results
    }

Context
StackExchange Code Review Q#17905, answer score: 4