Read daily files and concatenate them

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

Edit - here is my modified code: http://jsfiddle.net/#&togetherjs=GzytydCsRh

Can someone take a look and give me some feedback? It still seems a bit long, but this is the first time I have used functions.

I am reading a bunch of CSV files and using glob to concatenate them all together into separate dataframes. I eventually join them together and basically create a single large file which I use to connect to a dashboard. I am not too familiar with Python, but I have used pandas and sklearn often.

As you can see, I am basically just reading the last 60 (or more) days' worth of data (the last 60 files) and creating a dataframe for each. This works, but I am wondering if there is a more Pythonic/better way? I watched a PyData video (about not being restricted by PEP 8 and making sure your code is Pythonic) which was interesting.

(FYI - the reason I need to read 60 days' worth of files is that customers can fill out a survey about a call which happened a long time ago. For example, a customer fills out a survey today about a call that happened in July, and I need to know about that call: how long it lasted, what the topic was, etc.)

```
import pandas as pd
import numpy as np
from pandas import *
import datetime as dt
import os
from glob import glob
os.chdir(r'C:\\Users\Documents\FTP\\')
loc = r'C:\\Users\Documents\\'
rosterloc = r'\\mand\\'
splitsname = r'Splits.csv'
fcrname = r'global_disp_'
npsname = r'survey_'
ahtname = r'callbycall_'
rostername = 'Daily_Roster.csv'
vasname = r'vas_report_'
ext ='.csv'
startdate = dt.date.today() - Timedelta('60 day')
enddate = dt.date.today()
daterange = Timestamp(enddate) - Timestamp(startdate)
daterange = (daterange / np.timedelta64(1, 'D')).astype(int)

data = []
frames = []
calls = []
bracket = []
try:
    for date_range in (Timestamp(startdate) + dt.timedelta(n) for n in range(daterange)):
        aht = pd.read_csv(ahtname+date_range.strftime('%Y_%m_%d')+ext)
        calls.append(aht)
except IOError:
    print('File does not exist:', ahtname+da
```

Solution

Use a class, or at least some functions, to make your code more readable and understandable

My very first reaction on looking at your code is... blech. I don't want to read that giant blob.

- Why not make a class to bundle a bunch of functions together, such as a function readAndConcatAHT? Actually, many of these for loops do exactly the same thing for slightly differently named files. Why not write a function that takes a filename prefix and runs the loop, like so:

def readAndConcatFile(filename, daterange):
    # startdate, ext and data are the module-level names from the question
    for date_range in (pd.Timestamp(startdate) + dt.timedelta(n) for n in range(daterange)):
        path = filename + date_range.strftime('%m_%d_%Y') + ext
        try:
            data.append(pd.read_csv(path, parse_dates=['call_time']))
        except IOError:
            # report the file that was actually requested, then try the next day
            print('File does not exist:', path)

This would really clean up your code even if you elected not to write any other functions or a class. I think it's also fairer to your reader and to yourself to respect DRY (don't repeat yourself) rather than make readers check for themselves that you are doing the same thing over and over with slightly different file names.

I'll add that I think a class would be nice, because in your __init__ or in some processing method you could string together a series of calls to readAndConcatFile to standardize your read/write process for these CSV files. This will, again, make your code more extensible and more readable.
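As a rough sketch of the class idea, assuming one CSV per day named prefix + date + extension as in the question: the class name DailyCsvReader, its folder argument, and the read method are hypothetical names chosen for illustration, not part of the original script.

```python
import datetime as dt
from pathlib import Path

import pandas as pd


class DailyCsvReader:
    """Read one CSV per day for a file prefix and concatenate them (hypothetical helper)."""

    def __init__(self, folder, end_date, days, ext=".csv"):
        self.folder = Path(folder)
        self.ext = ext
        # Dates to look up, oldest first, ending the day before end_date.
        # This mirrors the range(daterange) loop in the question.
        self.dates = [end_date - dt.timedelta(days=n) for n in range(days, 0, -1)]
        self.missing = []  # paths that could not be found

    def read(self, prefix, date_format, **read_csv_kwargs):
        frames = []
        for date in self.dates:
            path = self.folder / f"{prefix}{date.strftime(date_format)}{self.ext}"
            try:
                frames.append(pd.read_csv(path, **read_csv_kwargs))
            except FileNotFoundError:
                # Remember the gap and carry on with the next day.
                self.missing.append(path)
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With something like this, each kind of file becomes one call, for example reader.read('global_disp_', '%m_%d_%Y', parse_dates=['call_time']), and reader.missing can be reported once at the end.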

Avoid redundant import statements and stick with standards

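Almost everyone uses import pandas as pd. I wouldn't recommend doing it any other way, and a wholesale from pandas import * is never a good idea. You already have import pandas as pd, so drop the star import and write pd.Timestamp and pd.Timedelta explicitly.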

Don't import glob unless you are actually using it. Where do you use glob after importing it?

Use special features only when you need them

- Do you actually need raw strings? Names such as r'Splits.csv' contain no backslashes, so the r prefix does nothing there. In the path strings you combine the r prefix with doubled backslashes, which leaves both backslashes in the string; use one style or the other.

- Similarly, why use os.chdir when you could specify the files by absolute path? Here again you're using an option you don't really need, and changing the working directory could have unintended side effects later.
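One way to avoid os.chdir is to build each full path from a base directory. The sketch below uses pathlib; DATA_DIR and daily_path are hypothetical names, and the folder shown is only a guess based on the os.chdir call in the question.

```python
import datetime as dt
from pathlib import Path

# Assumed location of the daily files; adjust to the real folder.
DATA_DIR = Path("C:/Users/Documents/FTP")


def daily_path(prefix, date, date_format, base_dir=DATA_DIR, ext=".csv"):
    """Build the full path of one daily file without changing the working directory."""
    return Path(base_dir) / f"{prefix}{date.strftime(date_format)}{ext}"
```

pathlib accepts forward slashes on Windows, so no raw strings or doubled backslashes are needed here.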

Use more defined constants

- It's not a good idea to hard-code Timedelta('60 day') like that. Define a constant such as DAY_RANGE = 60 and use it wherever you would write 60, so the look-back period can be changed in one place. Alternatively, you could make the day range an input parameter to your script so that non-programmer users can also run it for their desired look-back period.

- Similarly, you can save your date formats as string constants at the top of your file: format1 = '%m_%d_%Y' and format2 = '%Y_%m_%d'. Again, this makes it easier to see what's going on and easier to make changes in the future: you change one string at the top of the file rather than every strftime call. It won't make any given line of code shorter, but it will make the code easier to maintain.
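The two suggestions above could be combined roughly as follows. This is only a sketch: the constant names, the --days option, and the helper functions parse_args and lookback_dates are assumptions made for illustration.

```python
import argparse
import datetime as dt

DAY_RANGE = 60            # default look-back period in days
MDY_FORMAT = "%m_%d_%Y"   # month_day_year file names
YMD_FORMAT = "%Y_%m_%d"   # year_month_day file names


def parse_args(argv=None):
    """Let the caller override the look-back period from the command line."""
    parser = argparse.ArgumentParser(description="Concatenate daily CSV files.")
    parser.add_argument("--days", type=int, default=DAY_RANGE,
                        help="number of days to look back")
    return parser.parse_args(argv)


def lookback_dates(days, today=None):
    """Return the `days` dates before `today`, oldest first."""
    today = today or dt.date.today()
    return [today - dt.timedelta(days=n) for n in range(days, 0, -1)]
```

A script would then call lookback_dates(parse_args().days) once and reuse the resulting list for every kind of file.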

More sophisticated error handling

Error handling is not something I do enough of myself, but I wonder if you can do better here. I'm going to assume that a missing ahtname file for a given date means, for example, that the fcrname file for that date is missing too. If that's the case, once you establish that a date is missing for one kind of file, why not drop that date from all later loops? That is easy if you keep the dates in a list rather than recomputing them from the integer daterange: remove the date that caused the IOError. Then you wouldn't get repeated error messages that all tell you the same thing.
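A minimal sketch of that idea, under the same assumption that a date missing for one kind of file is missing for the others. The function name split_available and its parameters are hypothetical; it checks for the files up front instead of catching IOError, which is a swapped-in technique that gives the same list of surviving dates.

```python
from pathlib import Path


def split_available(dates, folder, prefix, date_format, ext=".csv"):
    """Split dates into (available, missing) depending on whether the daily file exists."""
    available, missing = [], []
    for date in dates:
        path = Path(folder) / f"{prefix}{date.strftime(date_format)}{ext}"
        (available if path.is_file() else missing).append(date)
    return available, missing
```

Run it once for the first kind of file, print the missing dates a single time, and pass only the available dates to the loops for the remaining kinds of file.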

What I liked

It's good practice to use generator expressions where you can, so I liked seeing code like for date_range in (Timestamp(startdate) + dt.timedelta(n) for n in range(daterange)):.

Context

StackExchange Code Review Q#104050, answer score: 5
