Validating a CSV list of contacts and converting it to JSON

Submitted by: @import:stackexchange-codereview

Problem

I've written a class that takes an input file, validates the formatting of its lines, and writes the set of valid lines to an output file. Each line of the file should have a first name, last name, phone number, color, and zip code. A zip code is valid if it has exactly 5 characters; a phone number must have exactly 10 digits (in addition to dashes/parentheses in appropriate places). The accepted formats for each line of the input file are the following:

Lastname, Firstname, (703)-742-0996, Blue, 10013

Firstname Lastname, Red, 11237, 703 955 0373

Firstname, Lastname, 10013, 646 111 0101, Green


The program needs to write a JSON object with all of the valid lines from the input file in a list sorted in ascending alphabetical order by (last name, first name).

These are the test cases I ran with it as well as the JSON output. I think I've identified all of the edge cases with the tests, but I could have missed something. This code should exemplify good design choices and extensibility, and should be production quality. Should anything be added to or removed from the solution to meet these requirements?

Also, any tests that would make the code fail are welcome.

The code for the solution is below:

__main__.py

import sys
from file_formatter import FileFormatter

if __name__ == "__main__":
    formatter = FileFormatter(sys.argv[-1],"result.out")
    formatter.parse_file()


file_formatter.py

```
""" file_formatter module

The class contained in this module validates a CSV file based on a set of internally
specified accepted formats and generates a JSON file containing normalized forms of the
valid lines from the CSV file.

Example:
The class in this module can be imported and passed an initial value for the input data
file from the command line like this:

$ python example_program.py name_of_data_file.in

Classes:
FileFormatter: Takes an input file and output its valid lines to a result file.
"""

import json

class FileFormatter
```

Solution

I think most of us eventually encounter a CSV to JSON converter problem in our careers.

The last time I did something similar, I used the csvschema package (it is a bit outdated at the moment, but does the job). Defining your own "csv structure" class conveniently encapsulates your field types and validation logic. represents_int() can then be replaced with the built-in IntColumn field, and the other is_* functions with custom columns.
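To illustrate the idea of encapsulating each field's validation in its own column class, here is a minimal hand-rolled sketch. It does not use csvschema's actual API: the names `Column`, `ZipColumn`, `PhoneColumn`, `clean_row`, and `SCHEMA` are hypothetical, the zip check assumes five ASCII digits, and the normalized phone format (`703-742-0996`) is an assumption rather than something the question specifies.

```python
import re


class Column:
    """Base class: subclasses return a normalized value or raise ValueError."""

    def clean(self, raw):
        raise NotImplementedError


class ZipColumn(Column):
    """Assumes a valid zip code is exactly five ASCII digits."""

    def clean(self, raw):
        value = raw.strip()
        if not re.fullmatch(r"[0-9]{5}", value):
            raise ValueError(f"invalid zip code: {raw!r}")
        return value


class PhoneColumn(Column):
    """Accepts only the two phone layouts shown in the question."""

    PATTERNS = (
        re.compile(r"\(([0-9]{3})\)-([0-9]{3})-([0-9]{4})"),
        re.compile(r"([0-9]{3}) ([0-9]{3}) ([0-9]{4})"),
    )

    def clean(self, raw):
        value = raw.strip()
        for pattern in self.PATTERNS:
            match = pattern.fullmatch(value)
            if match:
                # Normalized form is an assumption made for this sketch.
                return "-".join(match.groups())
        raise ValueError(f"invalid phone number: {raw!r}")


def clean_row(schema, fields):
    """Apply each column in `schema` to its field; return None if any field fails."""
    if len(schema) != len(fields):
        return None
    try:
        return {name: column.clean(raw) for (name, column), raw in zip(schema, fields)}
    except ValueError:
        return None


# Hypothetical two-column schema, just to show how the pieces fit together.
SCHEMA = [("phonenumber", PhoneColumn()), ("zipcode", ZipColumn())]
```

With this layout, supporting another line format would mean declaring another schema list rather than adding more flags and branches to a single validation method.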

Or, at the very least, using the csv module might help with the tokenizing part.
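As a rough sketch of what that tokenizing step could look like with the standard library, assuming the input is plain comma-separated text like the three sample lines in the question (the `tokenize` helper and the `SAMPLE` data are made up for illustration):

```python
import csv
import io

# Sample input covering the three accepted layouts from the question.
SAMPLE = """\
Lastname, Firstname, (703)-742-0996, Blue, 10013
Firstname Lastname, Red, 11237, 703 955 0373
Firstname, Lastname, 10013, 646 111 0101, Green
"""


def tokenize(stream):
    """Yield (line_number, fields) for each non-empty row of a comma-separated stream."""
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(stream, skipinitialspace=True)
    for row in reader:
        if row:  # csv.reader yields an empty list for blank lines
            yield reader.line_num, [field.strip() for field in row]


rows = list(tokenize(io.StringIO(SAMPLE)))
for line_number, fields in rows:
    print(line_number, fields)
```

The number of fields per row (four or five here) is then enough to decide which of the accepted formats a line might match before validating the individual values.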

Some other notes about the code:

  • comma_first_case and comma_second_case don't need to be defined as False since you overwrite them later on



  • names[0] = names[0] + delim can be rewritten as names[0] += delim



  • remove the extra spaces around = when passing keyword arguments



  • add an extra space after the commas when passing multiple arguments to functions



  • instead of manually maintaining the i counter in the parse_file() function, use enumerate():

for line_number, line in enumerate(info_file):
    valid_line = self.validate_line(line)
    if valid_line:
        lines_dict[(valid_line["lastname"], valid_line["firstname"])] = valid_line
    else:
        errors.append(line_number)


  • you can use negative indexing, replacing line[len(line)-1] with line[-1]

  • separate top-level function and class definitions with two blank lines



  • you don't need to put the double underscores around your script file name
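To show how the enumerate() suggestion fits with the sorting requirement, here is a small self-contained sketch. The `validate_line` function is a simplified, hypothetical stand-in for the one in the question (it only understands "Lastname, Firstname"), the `entries`/`errors` output keys are assumptions, and `start=1` is used on the assumption that 1-based line numbers are wanted in the error list.

```python
import json


def validate_line(line):
    """Hypothetical stand-in for the question's validate_line(): dict or None."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 2 or not all(parts):
        return None
    return {"lastname": parts[0], "firstname": parts[1]}


def parse_lines(lines):
    """Collect valid entries keyed by (lastname, firstname) plus failing line numbers."""
    entries = {}
    errors = []
    # start=1 reports 1-based line numbers; drop it to count from 0 instead.
    for line_number, line in enumerate(lines, start=1):
        valid_line = validate_line(line)
        if valid_line:
            entries[(valid_line["lastname"], valid_line["firstname"])] = valid_line
        else:
            errors.append(line_number)
    # Sorting the tuple keys orders entries by last name, then first name.
    ordered = [entries[key] for key in sorted(entries)]
    return {"entries": ordered, "errors": errors}


result = parse_lines(["Smith, Jane", "not a contact", "Adams, Bob"])
print(json.dumps(result, indent=2, sort_keys=True))
```

Because the dictionary keys are (lastname, firstname) tuples, a plain sorted() call is enough to produce the required ordering without a custom key function.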



And, overall, really good job documenting the code. Note that from now on, whenever the code changes, you need to keep the documentation in sync with it.

Context

StackExchange Code Review Q#159868, answer score: 2
