HiveBrain v1.2.0
Get Started
← Back to all entries
snippetpythonMinor

Python command-line program to convert genomic data file

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filelineconvertprogrampythongenomiccommanddata

Problem

Background:

I have written this code to transforme a .csv file exported from a software called Geneious containing SNPs and concatenate them into a DNA sequence.

So basically take fields from .csv file to make strings.

The code itself is just a bunch of functions that perform small tasks, some functions call others and in the end the result is printed to a file. I used argparse because this is going to be a command line tool, and is useful to have obligatory arguments and default values for the others.

I am inexperienced in coding and have noone to review my code. I feel that needing to call each argument for each function is really awkward.
My questions:

Is this the best structure? Is creating a "chain" of functions like this the Best practice?
Code

```
import argparse
import collections
import csv

def cleaning(file_as_list, snp, names):
"""From input file get the SNPS."""
with open(file_as_list, 'r') as input_file:
reader = csv.reader(input_file)
file = list(reader)
have_SNP = [x for x in file if x[snp] == '1']
for i in range(len(have_SNP)):
mult_names = have_SNP[i][names].replace(':', ',').replace(', ', ',')
sep_names = mult_names.split(',')
only_names = [x for x in sep_names if ' ' not in x]
have_SNP[i][names] = only_names
return have_SNP

def reference_dic(file_as_list, snp, names, col_ref, pos):
"""Creates the dict with all positions and reference nucleotides."""
have_SNP = cleaning(file_as_list, snp, names)
ref_dic = {}
for i in have_SNP:
ref_dic[int(i[pos].replace(',', ''))] = i[col_ref]
return ref_dic

def pos_list(file_as_list, snp, names, col_ref, pos):
"""Creates a list with all the ehxisting positions in reference."""
ref_dic = reference_dic(file_as_list, snp, names, col_ref, pos)
list_pos = []
for key in ref_dic:
list_pos.append(key)
sorted_pos_lis = sorted(list_pos)
return sorted_pos_lis

def genomes_list(file_as_list,

Solution

This sounds like a good use case to use a class where the arguments you currently pass through the chain of functions would be class attributes, for instance, if we would have a single Converter class, we may have it initialized this way:

class Converter:
    def __init__(self, filename, snp, names, col_ref, pos, col_genome):
        self.filename = filename
        self.snp = snp
        self.names = names
        self.col_ref = col_ref
        self.pos = pos
        self.col_genome = col_genome


Then, your functions will become instance methods where you'll access the instance attributes via self. instead of taking an argument.

Think of a class as a way to group related things logically, providing a shared access to the common variables and methods.

There are also some other things to improve:

  • instead of converting your arguments to int, you can define them with type=int



  • you can make use of dictionary and list comprehensions in multiple places



-
you can make use of str.join() - for example, when defining reference_snps_list:

reference_snps_list = "".join(str(ref_dic[i]) for i in pos_no_dup)


  • you can use the special argparse.FileType for the input file argument



FYI, since this is a controversial topic in general:

  • Start Writing More Classes



  • Stop Writing Classes

Code Snippets

class Converter:
    def __init__(self, filename, snp, names, col_ref, pos, col_genome):
        self.filename = filename
        self.snp = snp
        self.names = names
        self.col_ref = col_ref
        self.pos = pos
        self.col_genome = col_genome
reference_snps_list = "".join(str(ref_dic[i]) for i in pos_no_dup)

Context

StackExchange Code Review Q#159911, answer score: 2

Revisions (0)

No revisions yet.