Loading a Protein Data Bank file into a numpy matrix
Problem
Here is my code:
import numpy as np

def read_Coordinates_Atoms2(fileName, only_CA = True):
    '''
    in : PDB file
    out : matrix with coordinates of atoms
    '''
    with open(fileName, 'r') as infile:
        for line in infile :
            if only_CA == True :
                if line.startswith('ATOM') and line[13:15] == 'CA':
                    try: # matrix fill-up
                        CoordAtoms = np.vstack([CoordAtoms, [float(line[30:38]), float(line[38:46]), float(line[46:54])]]) # np.append
                    except NameError: # matrix declaration
                        CoordAtoms = np.array([[line[30:38],line[38:46], line[46:54]]], float)
            else :
                if line.startswith('ATOM'):
                    try: # matrix fill-up
                        CoordAtoms = np.vstack([CoordAtoms, [float(line[30:38]), float(line[38:46]), float(line[46:54])]]) # np.append
                    except NameError: # matrix declaration
                        CoordAtoms = np.array([[line[30:38],line[38:46], line[46:54]]], float)
    return CoordAtoms

Is there a more efficient way to do this? I mean, a way where I don't have to write the same lines twice? I think the code should look more like this:

def foo(file, condition2 = True):
    if condition1 and condition2 :
        # do lots of instructions
    elif condition1 :
        # do the same lots of instructions (but different output)

Solution
Since both of your blocks are identical, you can merge them using boolean logic.
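
One way to convince yourself that such a merge is safe is to compare the nested form against a single combined condition for every combination of inputs. The sketch below does that with plain booleans. The names `nested_keep` and `merged_keep`, and the parameters `is_atom`, `is_ca` and `only_ca`, are hypothetical stand-ins for the tests in your code, not something taken from it.

```python
from itertools import product

def nested_keep(is_atom, is_ca, only_ca):
    """Mirror the nested structure: one branch per value of only_ca."""
    if only_ca:
        return is_atom and is_ca
    return is_atom

def merged_keep(is_atom, is_ca, only_ca):
    """Single condition: the CA test only matters when only_ca is set."""
    return is_atom and (not only_ca or is_ca)

# Exhaustive check over all 8 combinations of the three booleans.
for combo in product([False, True], repeat=3):
    assert nested_keep(*combo) == merged_keep(*combo)
```

Because there are only three boolean inputs, the exhaustive loop covers every case, so the two forms select exactly the same lines.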

First, in each case you test line.startswith('ATOM'), so put that first.

Second, either only_CA is True and you also need 'CA' at line[13:15], or only_CA is False. In other words, you keep the line if either only_CA is False or 'CA' is at line[13:15].

This lets you rewrite your for loop as:

for line in infile:
    if line.startswith('ATOM') and (not only_CA or line[13:15] == 'CA'):
        try: # matrix fill-up
            CoordAtoms = np.vstack([CoordAtoms, [float(line[30:38]), float(line[38:46]), float(line[46:54])]]) # np.append
        except NameError: # matrix declaration
            CoordAtoms = np.array([[line[30:38],line[38:46], line[46:54]]], float)

You can also extract the line parsing, as it is somewhat repeated:

for line in infile:
    if line.startswith('ATOM') and (not only_CA or line[13:15] == 'CA'):
        data = [line[30:38], line[38:46], line[46:54]]
        try: # matrix fill-up
            CoordAtoms = np.vstack([CoordAtoms, [float(x) for x in data]]) # np.append
        except NameError: # matrix declaration
            CoordAtoms = np.array([data], float)

But you can also simplify the whole thing by converting your data to float before the try and feeding np.array data of the correct type:

for line in infile:
    if line.startswith('ATOM') and (not only_CA or line[13:15] == 'CA'):
        data = [float(line[begin:end]) for begin, end in ((30, 38), (38, 46), (46, 54))]
        try: # matrix fill-up
            CoordAtoms = np.vstack([CoordAtoms, [data]]) # np.append
        except NameError: # matrix declaration
            CoordAtoms = np.array([data])

Code Snippets
for line in infile:
    if line.startswith('ATOM') and (not only_CA or line[13:15] == 'CA'):
        try: # matrix fill-up
            CoordAtoms = np.vstack([CoordAtoms, [float(line[30:38]), float(line[38:46]), float(line[46:54])]]) # np.append
        except NameError: # matrix declaration
            CoordAtoms = np.array([[line[30:38],line[38:46], line[46:54]]], float)

for line in infile:
    if line.startswith('ATOM') and (not only_CA or line[13:15] == 'CA'):
        data = [line[30:38], line[38:46], line[46:54]]
        try: # matrix fill-up
            CoordAtoms = np.vstack([CoordAtoms, [float(x) for x in data]]) # np.append
        except NameError: # matrix declaration
            CoordAtoms = np.array([data], float)

for line in infile:
    if line.startswith('ATOM') and (not only_CA or line[13:15] == 'CA'):
        data = [float(line[begin:end]) for begin, end in ((30, 38), (38, 46), (46, 54))]
        try: # matrix fill-up
            CoordAtoms = np.vstack([CoordAtoms, [data]]) # np.append
        except NameError: # matrix declaration
            CoordAtoms = np.array([data])

Context
StackExchange Code Review Q#144429, answer score: 4
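
For reference, the final loop from the Solution can be placed back into a complete function. The sketch below is one possible arrangement and not the answer author's code: the function name `read_ca_coordinates`, the list accumulator `rows`, and the closing `reshape(-1, 3)` are assumptions introduced here. It deliberately swaps the try/except NameError pattern for a plain list that is converted to an array once, so the array is not rebuilt by np.vstack on every matching line.

```python
import numpy as np

def read_ca_coordinates(file_name, only_ca=True):
    """Return an (n, 3) float array with x, y, z of the selected ATOM records."""
    rows = []  # hypothetical accumulator, replaces the try/except declaration
    with open(file_name, 'r') as infile:
        for line in infile:
            # Same merged condition as in the Solution.
            if line.startswith('ATOM') and (not only_ca or line[13:15] == 'CA'):
                rows.append([float(line[begin:end])
                             for begin, end in ((30, 38), (38, 46), (46, 54))])
    # reshape keeps the result two-dimensional even when no line matched.
    return np.array(rows, dtype=float).reshape(-1, 3)
```

With this arrangement a file without any matching ATOM line yields an empty array of shape (0, 3) rather than an error at the return statement.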