HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Store CSV in Redis

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rediscsvstore

Problem

I needed to make a small script to read a csv file and store 2 columns in Redis. Can you guys take a look over the code and let me know how I can improve my code style?

#!/usr/bin/env/ python
"""Proces csv file and store key-value pair of columns in redis"""

import csv, redis, json
from sys import argv, exit

REDIS_HOST = 'localhost'
conn = redis.Redis(REDIS_HOST)

def get_csv_data(csv_file, **columns):
    col1, col2 = columns['columns']
    with open(csv_file, encoding='utf-8') as csvf:
        csv_data = csv.reader(csvf)
        return [(i[col1], i[col2]) for i in csv_data]

def store_data(data):
    for i in data:
        if i[1] not in conn:
            conn.set(i[0], i[1])

    return data

def main():
    if len(argv) < 2:
        exit("usage: python {file} file.csv [column1, column2]".format(file=__file__))
    csv_file = argv[1] 
    columns = list(map(int,argv[2:4])) or [1, 3]
    data = get_csv_data(csv_file, columns=columns)
    print (json.dumps(store_data(data)))

if __name__ == '__main__':
    main()

Solution

Here are my suggestions:

import csv, redis, json
import sys

REDIS_HOST = 'localhost'

def read_csv_data(csv_file, ik, iv):
    with open(csv_file, encoding='utf-8') as csvf:
        csv_data = csv.reader(csvf)
        return [(r[ik], r[iv]) for r in csv_data]

def store_data(conn, data):
    for i in data:
        conn.setnx(i[0], i[1])
    return data

def main():
    if len(sys.argv) < 2:
    sys.exit(
        "Usage: %s file.csv [key_column_index, value_column_index]"
        % __file__
    )
    columns = (0, 1) if len(sys.argv) < 4 else (int(x) for x in sys.argv[2:4])
    data = read_csv_data(sys.argv[1], *columns)
    conn = redis.Redis(REDIS_HOST)
    print (json.dumps(store_data(conn, data)))

if '__main__' == __name__:
    main()


Feel free to ask questions on any of the changes. Some notes:

  • In general, I don't like from x import y. It clutters your global namespace and conflicts can become an issue.



  • In your original code, I guess you meant if i[0] not in conn, to test whether the key already exists, not the value right? That being the case, it's better to use the built in SETNX feature in Redis.



  • I changed your **kwargs to *args. Passing a dictionary containing a single tuple was a bit confusing.



  • Your list(map(...)) or was pretty clever, but I find this way more readable. Just personal preference. Tuple is preferable to list for immutable data.



  • I only initialize the connection after parsing the command line arguments and loading the CSV file. In general, only constants should be module scope.



  • I swapped __name__ and '__main__'. I prefer putting the constant term in a comparison first. That way the interpreter will catch the sometimes hard to spot typo == -> =

Code Snippets

import csv, redis, json
import sys

REDIS_HOST = 'localhost'

def read_csv_data(csv_file, ik, iv):
    with open(csv_file, encoding='utf-8') as csvf:
        csv_data = csv.reader(csvf)
        return [(r[ik], r[iv]) for r in csv_data]

def store_data(conn, data):
    for i in data:
        conn.setnx(i[0], i[1])
    return data

def main():
    if len(sys.argv) < 2:
    sys.exit(
        "Usage: %s file.csv [key_column_index, value_column_index]"
        % __file__
    )
    columns = (0, 1) if len(sys.argv) < 4 else (int(x) for x in sys.argv[2:4])
    data = read_csv_data(sys.argv[1], *columns)
    conn = redis.Redis(REDIS_HOST)
    print (json.dumps(store_data(conn, data)))

if '__main__' == __name__:
    main()

Context

StackExchange Code Review Q#68612, answer score: 8

Revisions (0)

No revisions yet.