snippetpythonMinor
Python command-line program to convert genomic data file
Viewed 0 times
filelineconvertprogrampythongenomiccommanddata
Problem
Background:
I have written this code to transforme a .csv file exported from a software called Geneious containing SNPs and concatenate them into a DNA sequence.
So basically take fields from .csv file to make strings.
The code itself is just a bunch of functions that perform small tasks, some functions call others and in the end the result is printed to a file. I used argparse because this is going to be a command line tool, and is useful to have obligatory arguments and default values for the others.
I am inexperienced in coding and have noone to review my code. I feel that needing to call each argument for each function is really awkward.
My questions:
Is this the best structure? Is creating a "chain" of functions like this the Best practice?
Code
```
import argparse
import collections
import csv
def cleaning(file_as_list, snp, names):
"""From input file get the SNPS."""
with open(file_as_list, 'r') as input_file:
reader = csv.reader(input_file)
file = list(reader)
have_SNP = [x for x in file if x[snp] == '1']
for i in range(len(have_SNP)):
mult_names = have_SNP[i][names].replace(':', ',').replace(', ', ',')
sep_names = mult_names.split(',')
only_names = [x for x in sep_names if ' ' not in x]
have_SNP[i][names] = only_names
return have_SNP
def reference_dic(file_as_list, snp, names, col_ref, pos):
"""Creates the dict with all positions and reference nucleotides."""
have_SNP = cleaning(file_as_list, snp, names)
ref_dic = {}
for i in have_SNP:
ref_dic[int(i[pos].replace(',', ''))] = i[col_ref]
return ref_dic
def pos_list(file_as_list, snp, names, col_ref, pos):
"""Creates a list with all the ehxisting positions in reference."""
ref_dic = reference_dic(file_as_list, snp, names, col_ref, pos)
list_pos = []
for key in ref_dic:
list_pos.append(key)
sorted_pos_lis = sorted(list_pos)
return sorted_pos_lis
def genomes_list(file_as_list,
I have written this code to transforme a .csv file exported from a software called Geneious containing SNPs and concatenate them into a DNA sequence.
So basically take fields from .csv file to make strings.
The code itself is just a bunch of functions that perform small tasks, some functions call others and in the end the result is printed to a file. I used argparse because this is going to be a command line tool, and is useful to have obligatory arguments and default values for the others.
I am inexperienced in coding and have noone to review my code. I feel that needing to call each argument for each function is really awkward.
My questions:
Is this the best structure? Is creating a "chain" of functions like this the Best practice?
Code
```
import argparse
import collections
import csv
def cleaning(file_as_list, snp, names):
"""From input file get the SNPS."""
with open(file_as_list, 'r') as input_file:
reader = csv.reader(input_file)
file = list(reader)
have_SNP = [x for x in file if x[snp] == '1']
for i in range(len(have_SNP)):
mult_names = have_SNP[i][names].replace(':', ',').replace(', ', ',')
sep_names = mult_names.split(',')
only_names = [x for x in sep_names if ' ' not in x]
have_SNP[i][names] = only_names
return have_SNP
def reference_dic(file_as_list, snp, names, col_ref, pos):
"""Creates the dict with all positions and reference nucleotides."""
have_SNP = cleaning(file_as_list, snp, names)
ref_dic = {}
for i in have_SNP:
ref_dic[int(i[pos].replace(',', ''))] = i[col_ref]
return ref_dic
def pos_list(file_as_list, snp, names, col_ref, pos):
"""Creates a list with all the ehxisting positions in reference."""
ref_dic = reference_dic(file_as_list, snp, names, col_ref, pos)
list_pos = []
for key in ref_dic:
list_pos.append(key)
sorted_pos_lis = sorted(list_pos)
return sorted_pos_lis
def genomes_list(file_as_list,
Solution
This sounds like a good use case to use a class where the arguments you currently pass through the chain of functions would be class attributes, for instance, if we would have a single
Then, your functions will become instance methods where you'll access the instance attributes via
Think of a class as a way to group related things logically, providing a shared access to the common variables and methods.
There are also some other things to improve:
-
you can make use of
FYI, since this is a controversial topic in general:
Converter class, we may have it initialized this way:class Converter:
def __init__(self, filename, snp, names, col_ref, pos, col_genome):
self.filename = filename
self.snp = snp
self.names = names
self.col_ref = col_ref
self.pos = pos
self.col_genome = col_genomeThen, your functions will become instance methods where you'll access the instance attributes via
self. instead of taking an argument.Think of a class as a way to group related things logically, providing a shared access to the common variables and methods.
There are also some other things to improve:
- instead of converting your arguments to
int, you can define them withtype=int
- you can make use of dictionary and list comprehensions in multiple places
-
you can make use of
str.join() - for example, when defining reference_snps_list:reference_snps_list = "".join(str(ref_dic[i]) for i in pos_no_dup)- you can use the special
argparse.FileTypefor the input file argument
FYI, since this is a controversial topic in general:
- Start Writing More Classes
- Stop Writing Classes
Code Snippets
class Converter:
def __init__(self, filename, snp, names, col_ref, pos, col_genome):
self.filename = filename
self.snp = snp
self.names = names
self.col_ref = col_ref
self.pos = pos
self.col_genome = col_genomereference_snps_list = "".join(str(ref_dic[i]) for i in pos_no_dup)Context
StackExchange Code Review Q#159911, answer score: 2
Revisions (0)
No revisions yet.