Matplotlib-venn and keeping lists of the entries

Submitted by: @import:stackexchange-codereview

Problem

Having come upon the wonderful little matplotlib-venn module and used it for a bit, I'm wondering if there's a nicer way of doing things than what I have done so far. I know that you can use the following lines for a very simple Venn diagram:

union = set1.union(set2).union(set3)
indicators = ['%d%d%d' % (a in set1, a in set2, a in set3) for a in union]
subsets = Counter(indicators)


... but I also want to have lists of the entries in each combination of the three sets.

```
import numpy as np
from matplotlib_venn import venn3, venn3_circles
from matplotlib import pyplot as plt
import pandas as pd

# Read data
data = pd.read_excel(input_file, sheet_name=sheet)

# Create three sets of the lists to be compared
set_1 = set(data[compare[0]].dropna())
set_2 = set(data[compare[1]].dropna())
set_3 = set(data[compare[2]].dropna())

# Create the union of all three sets
union = set_1.union(set_2).union(set_3)

# Gather names of all elements and list them in groups
lists = [[], [], [], [], [], [], []]
for gene in union:
    if (gene in set_1) and (gene not in set_2) and (gene not in set_3):
        lists[0].append(gene)
    elif (gene in set_1) and (gene in set_2) and (gene not in set_3):
        lists[1].append(gene)
    elif (gene in set_1) and (gene not in set_2) and (gene in set_3):
        lists[2].append(gene)
    elif (gene in set_1) and (gene in set_2) and (gene in set_3):
        lists[3].append(gene)
    elif (gene not in set_1) and (gene in set_2) and (gene not in set_3):
        lists[4].append(gene)
    elif (gene not in set_1) and (gene in set_2) and (gene in set_3):
        lists[5].append(gene)
    elif (gene not in set_1) and (gene not in set_2) and (gene in set_3):
        lists[6].append(gene)

# Write gene lists to file
ew = pd.ExcelWriter('../Gene lists/Venn lists/' + compare[0] + ' & '
+ compare[1] + ' & ' + compare[2] + ' gene lists.xlsx')

pd.DataFrame(lists[0], columns=[compare[0]]) \
.to_excel(ew
```

Solution

Perhaps something like the following?

values_to_sets = {a : (a in set1, a in set2, a in set3) for a in union}
sets_to_values = {}
for a, s in values_to_sets.items():
    if s not in sets_to_values:
        sets_to_values[s] = []
    sets_to_values[s].append(a)
print(sets_to_values)


This first maps each item to a tuple of booleans indicating which sets the item belongs to. The mapping is then inverted, so that each tuple maps to the list of items belonging to exactly that combination of sets.
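
As a rough, self-contained illustration of that grouping, the sketch below builds the inverted mapping in a single pass with `collections.defaultdict`. The three small sets are made-up sample data standing in for the Excel columns, and `region_sizes` is a hypothetical helper name. Its `'101'`-style keys mirror the indicator strings from the question.

```python
from collections import defaultdict

# Made-up sample data standing in for the three columns read from Excel.
set1 = {"A", "B", "C", "D"}
set2 = {"B", "C", "E"}
set3 = {"C", "D", "E", "F"}
union = set1 | set2 | set3

# Group every item under its membership tuple. For example,
# (True, False, True) means "in set1 and set3, but not in set2".
sets_to_values = defaultdict(list)
for item in sorted(union):
    key = (item in set1, item in set2, item in set3)
    sets_to_values[key].append(item)

# Region sizes keyed by '100', '110', ... like the question's indicators.
region_sizes = {
    "".join("1" if flag else "0" for flag in key): len(items)
    for key, items in sets_to_values.items()
}

only_set1 = sets_to_values[(True, False, False)]
in_all_three = sets_to_values[(True, True, True)]
```

Combinations that contain no items simply never appear as keys, so code that needs all seven regions should look them up with `.get(key, [])`.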

You could even expand this to an arbitrary number of sets:

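sets = [set1, set2, set3, set4]
union = set().union(*sets)  # the union must cover every set in the list
# tuple(...) so the membership pattern can later be used as a dict key
values_to_sets = {a : tuple(a in s for s in sets) for a in union}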
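
One possible way to package the arbitrary-number-of-sets version is sketched below. The function name `group_by_membership` and the integer sample sets are assumptions made for illustration only. The membership pattern is built with `tuple(...)`, because a bare generator expression is only hashable by identity and so would never group two items under the same key.

```python
from collections import defaultdict


def group_by_membership(sets):
    """Map each membership tuple to the sorted list of items that have it."""
    union = set().union(*sets)
    groups = defaultdict(list)
    for item in sorted(union):
        # One boolean per input set, in the same order as `sets`.
        pattern = tuple(item in s for s in sets)
        groups[pattern].append(item)
    return dict(groups)


# Made-up sample data: four overlapping sets of integers.
sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {5, 6}]
groups = group_by_membership(sets)
```

Returning a plain `dict` rather than the `defaultdict` means that looking up an empty combination raises `KeyError` instead of silently inserting a new key.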

Context

StackExchange Code Review Q#64635, answer score: 4
