Reading from a .txt file to a pandas dataframe
Problem
Having a text file './inputs/dist.txt' as:

    1 1 2.92
    1 2 70.75
    1 3 60.90
    2 1 71.34
    2 2 5.23
    2 3 38.56
    3 1 61.24
    3 2 38.68
    3 3 4.49

I'm reading the text file to store it in a dataframe by doing:

    from pandas import DataFrame
    import pandas as pd
    import os

    def get_file_name(path):
        return os.path.basename(path).split(".")[0].strip().lower()

    name = get_file_name('./inputs/dist.txt')
    with open('./inputs/dist.txt') as f:
        df = DataFrame(0.0, index=[1,2,3], columns=[1,2,3])
        for line in f:
            data = line.strip().split()
            row,column,value = [int(i) if i.isdigit() else float(i) for i in data]
            # note: set_value was removed in pandas 1.0; df.at[row, column] = value is the current equivalent
            df.set_value(row,column,value)
    # m is defined elsewhere (not shown in the post), presumably a dict
    m[name] = df

and I end up with a dataframe of the data. I have to read much bigger files that follow this format. Is there a faster way to do this to improve the runtime?
Solution
When opening very large files, the first concern is memory availability: if the data does not fit in RAM, the system starts swapping to a slower device (i.e. disk).
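One way to act on the memory concern is to let pandas parse the file in chunks. The following is a minimal sketch, assuming the three-column, whitespace-separated layout from the question. The helper name `read_distances`, the column names `row`, `col`, `value` and the chunk size are illustrative choices, not part of the original post. It uses `pandas.read_csv` with `sep=r"\s+"` and `chunksize`, then pivots the long table into the row-by-column layout the question builds by hand.

```python
import io

import pandas as pd


def read_distances(source, chunksize=100_000):
    """Read 'row col value' triples in chunks and return a row-by-column matrix.

    `source` may be a path or a file-like object. The column names and the
    default chunk size are illustrative assumptions.
    """
    chunks = pd.read_csv(
        source,
        sep=r"\s+",              # split on any run of whitespace
        header=None,             # the file has no header line
        names=["row", "col", "value"],
        dtype={"row": "int64", "col": "int64", "value": "float64"},
        chunksize=chunksize,     # iterate over DataFrames of at most this many rows
    )
    long_df = pd.concat(chunks, ignore_index=True)
    # Reshape the long (row, col, value) table into a 2-D matrix.
    return long_df.pivot(index="row", columns="col", values="value")


# Small in-memory stand-in for './inputs/dist.txt'.
sample = io.StringIO("1 1 2.92\n1 2 70.75\n2 1 71.34\n2 2 5.23\n")
matrix = read_distances(sample, chunksize=2)
print(matrix.loc[1, 2])  # 70.75
```

Note that concatenating every chunk still requires the whole table to fit in memory. Chunking pays off mainly when each chunk can be filtered or reduced before it is kept.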
Pandas ships with built-in reader functions. For example, pandas.read_table is a good way to read a tabular data file, optionally in chunks.

In this specific case:

    import pandas
    # newer pandas (2.2+) deprecates delim_whitespace in favour of sep=r'\s+'
    df = pandas.read_table('./inputs/dist.txt', delim_whitespace=True, names=('A', 'B', 'C'))

will create a DataFrame object with columns named A, B and C, holding data of type int64, int64 and float64 respectively.

You can also force the dtype by passing the dtype argument to read_table. For example, forcing the second column to be float64:

    import numpy as np
    import pandas
    df = pandas.read_table('./inputs/dist.txt', delim_whitespace=True, names=('A', 'B', 'C'),
                           dtype={'A': np.int64, 'B': np.float64, 'C': np.float64})

Context
StackExchange Code Review Q#152194, answer score: 7