Merging two files into one .CSV

Submitted by: @import:stackexchange-codereview

Problem

I'm relatively new to Python scripting. I made a script that takes two raw data files and merges them into one CSV file, but it takes a long time to complete. Are there any logic problems in this code?

input_fileVms = open( 'vms.csv', 'rb')

if site == 'eg':
    output_merge = open('mergedFileEG.csv', 'wb')
elif site== 'fm':
    output_merge = open('mergedFileFM.csv', 'wb')

dataVms = csv.reader(input_fileVms)
writerMerge = csv.writer(output_merge,quoting=csv.QUOTE_ALL)

for lineVM in dataVms:
    input_fileUsers = open( 'users.csv', 'rb')
    dataUsers = csv.reader(input_fileUsers)
    new_line = strip(lineVM)
    OSFile = open('OS.csv', 'rb')
    OS = csv.reader(OSFile)
    i = 0
    for user in dataUsers:
        if i = 4:

                idLine = str(user[1])

                if idLine in new_line:

                    for OS in OS:
                        system = OS[0]
                        if system in new_line:

                            if len(user) < 5:
                                writerMerge.writerow((str(new_line[2] + ',' + user[2] + "," + "Missing"+ ',' +OS[1])).split(','))
                            else:
                                writerMerge.writerow((str(new_line[2] + ',' + user[2] + "," +user[4]+ ',' +OS[1])).split(','))

            input_fileUsers.close()

Solution

You should never do this:

for OS in OS:


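The loop variable OS has the same name as the object you are iterating over. The first pass through the loop rebinds OS to a row, so the csv.reader is no longer reachable under that name, and any later "for OS in OS" walks the fields of the last row instead of the file. It happens to work in simple cases, but it is fragile and makes the code very hard to read. Use distinct names for the reader and for the current row.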
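To make the rebinding concrete, here is a minimal sketch. It uses an in-memory stand-in for OS.csv (the sample rows and the names os_file, leftover, second_pass, os_reader and rows are made up for illustration) and assumes Python 3.

```python
import csv
import io

# Hypothetical stand-in for OS.csv so the sketch runs without any files.
os_file = io.StringIO("win,Windows\nlin,Linux\n")
OS = csv.reader(os_file)

for OS in OS:      # each iteration rebinds OS to the current row
    pass

# After the loop, OS is the last row (a list of strings), not the reader.
leftover = OS

# A second "for OS in OS" would now walk the strings of that row.
second_pass = [item for item in leftover]

# Distinct names keep the reader and the current row apart.
os_reader = csv.reader(io.StringIO("win,Windows\nlin,Linux\n"))
rows = [os_row for os_row in os_reader]
```

With separate names such as os_reader and os_row, the reader stays reachable and the intent of each loop is obvious.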

Secondly, for every line of the first file you reopen and reread the ENTIRE second file, so the second file is read as many times as there are lines in the first. That is where the time goes. Separate the loops: read the first file completely (I like using input_fileVms.readlines() to put everything in a list of strings), then read the second file completely. Once the contents of both files are in data structures, merge them in memory. Unless your files are in the 1GB range, this is what I would recommend.

If you are just concatenating the files, you don't need csv.reader() at all, because there is no reason to split each line into its individual fields.
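The read-once-then-merge approach might look like the sketch below. It assumes Python 3 and guesses at a few things the question leaves open: strip(lineVM) is taken to mean stripping whitespace from every field, the "i = 4" check is left out because its intent is unclear, and rows are built as lists rather than joined and split on commas. The function names read_rows, merge_rows and write_merged are hypothetical.

```python
import csv


def read_rows(path):
    """Read a whole CSV file once into a list of rows (lists of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def merge_rows(vm_rows, user_rows, os_rows):
    """Apply the question's matching rules to data already held in memory."""
    merged = []
    for vm in vm_rows:
        # Assumed meaning of strip(lineVM): trim whitespace from each field.
        fields = [field.strip() for field in vm]
        if len(fields) < 3:
            continue
        for user in user_rows:
            if len(user) < 3 or user[1] not in fields:
                continue
            # Short user rows get the "Missing" placeholder, as in the question.
            extra = user[4] if len(user) >= 5 else "Missing"
            for os_row in os_rows:
                if len(os_row) >= 2 and os_row[0] in fields:
                    merged.append([fields[2], user[2], extra, os_row[1]])
    return merged


def write_merged(path, rows):
    """Write the merged rows with every field quoted, as in the question."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)
```

Each input file would then be read exactly once, for example merge_rows(read_rows('vms.csv'), read_rows('users.csv'), read_rows('OS.csv')), and the result passed to write_merged. The nested loops still compare every VM with every user, but they run over lists in memory instead of reopening files.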

Context

StackExchange Code Review Q#110587, answer score: 6
