HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

JSON templates in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
jsontemplatespython

Problem

I am looking for a general review of my project here. I have written some Python, but not enough to feel confident in my ability to produce idiomatic code.

```
#!/usr/bin/env python

"""mockjson.py: Library for mocking JSON objects from a template."""

__author__ = "James McMahon"
__copyright__ = "Copyright 2012, James McMahon"
__license__ = "MIT"

try:
import simplejson as json
except ImportError:
import json
import random
import re
import string
import sys

from datetime import datetime, timedelta

_male_first_name = ("James", "John", "Robert", "Michael", "William", "David",
"Richard", "Charles", "Joseph", "Thomas", "Christopher", "Daniel",
"Paul", "Mark", "Donald", "George", "Kenneth", "Steven", "Edward",
"Brian", "Ronald", "Anthony", "Kevin", "Jason", "Matthew", "Gary",
"Timothy", "Jose", "Larry", "Jeffrey", "Frank", "Scott", "Eric")
_female_first_name = ("Mary", "Patricia", "Linda", "Barbara", "Elizabeth",
"Jennifer", "Maria", "Susan", "Margaret", "Dorothy", "Lisa", "Nancy",
"Karen", "Betty", "Helen", "Sandra", "Donna", "Carol", "Ruth", "Sharon",
"Michelle", "Laura", "Sarah", "Kimberly", "Deborah", "Jessica",
"Shirley", "Cynthia", "Angela", "Melissa", "Brenda", "Amy", "Anna")
_last_name = ("Smith", "Johnson", "Williams", "Brown", "Jones", "Miller",
"Davis", "Garcia", "Rodriguez", "Wilson", "Martinez", "Anderson",
"Taylor", "Thomas", "Hernandez", "Moore", "Martin", "Jackson",
"Thompson", "White", "Lopez", "Lee", "Gonzalez", "Harris", "Clark",
"Lewis", "Robinson", "Walker", "Perez", "Hall", "Young", "Allen")
_lorem = tuple("""lorem ipsum dolor sit amet consectetur adipisicing elit
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
Ut enim ad minim veniam quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur Excepteur sint occ

Solution


  1. Introduction



This review grew to be very long, so I'll say up front that you shouldn't take the length of this to heart: your code is not bad, especially if you are new to Python. There's always a lot of things to say about a piece of code of this length, and idiomatic Python has a bunch of features (sets, generators, comprehensions, iterators) that will be unfamiliar to users of some other languages. So take everything I have to say with a pinch of salt (except for item #1 under "General comments", which is really the only big problem here).

  1. General comments



-
By far the most important problem is your documentation. What is your library supposed to do? How do it use it? It is impossible to review code without understanding its purpose.

As a user I expect to be able to write help(mockjson) to get a description of a module and help(mockjson.generate_json) to get a description of a function, but there's nothing here. Your online documentation is also unhelpful.

After some hunting around I eventually found Mennon van Slooten's mockJSON documentation. I'm going to proceed on the basis that your purpose is to reimplement this in Python.

-
There's no test suite. I know it's hard for you to write test cases because of the randomness. You probably want to take a hint from Mennon van Slooten and provide a way for to change the random choices to a deterministic sequence of choices.

-
Your collections of example data (_male_first_name etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.

-
Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.

-
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using split. For example,

_female_first_names = u"""
    Mary Patricia Linda Barbara Elizabeth
    Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์
    """.split()


-
The function _random_item is already in the Python library as random.choice.

-
The string "0123456789" is already in the Python library as string.digits.

-
It's not clear why _random_data needs to strip all initial @ signs from its argument before trying to look it up. You only call this function in a context where you could easily strip the @. It also fails to raise an error if the key is missing (so that mistakes like @NUBMER are not caught).

-
Your definitions of data will look nicer (fewer quotation marks) if you use the dict constructor:

data = dict(
    NUMBER = lambda: _random_item("0123456789"),
    LETTER_UPPER = lambda: _random_item(string.uppercase),
    LETTER_LOWER = lambda: _random_item(string.lowercase),
    ...


-
Since most of the values in data are of the form lambda: _random_item(...), why not special-case them to save on boilerplate? You could write something like:

def _random_data(key):
    """
    Construct a random data item for `key` and return it.
    Raise KeyError if `key` is unknown.
    """
    constructor = data[key]
    if isinstance(constructor, types.FunctionType):
        return constructor()
    else:
        return random.choice(constructor)


and then:

data = dict(
    NUMBER = string.digits,
    LETTER_UPPER = string.uppercase,
    LETTER_LOWER = string.lowercase,
    ...


-
Building a string by repeatedly extending it with += is a well-known anti-pattern in Python. This is because extending a string is inefficient in Python: a new string gets allocated and the old string copied across each time. This means that any algorithm that builds a string by repeated extension runs like O(n2). It is nearly always better to generate the items to be assembled into the string and then join them to produce the result. This technique also avoids fencepost errors like spurious extra spaces at the start or end. So your could write:

def _lorem_ipsum():
    """
    Return a random paragraph of placeholder text.
    """
    length = random.randrange(2, len(_lorem) / 2)
    return ' '.join(random.choice(_lorem) for _ in xrange(length))


-
generate_json takes an optional argument name which is always ignored. You can omit this.

  1. Comments on generate_json_object



-
The name could be improved: generate has a special meaning in Python (referring to the output of a generator), and the json part is misleading since there's nothing JSON-specific about this function (it operates on Python objects, not on JSON representations). Since instantiate is often used for filling in a template, I think I would use a name like instantiate_object. (And instantiate_json for generate_json.)

-
The var

Code Snippets

_female_first_names = u"""
    Mary Patricia Linda Barbara Elizabeth
    Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์
    """.split()
data = dict(
    NUMBER = lambda: _random_item("0123456789"),
    LETTER_UPPER = lambda: _random_item(string.uppercase),
    LETTER_LOWER = lambda: _random_item(string.lowercase),
    ...
def _random_data(key):
    """
    Construct a random data item for `key` and return it.
    Raise KeyError if `key` is unknown.
    """
    constructor = data[key]
    if isinstance(constructor, types.FunctionType):
        return constructor()
    else:
        return random.choice(constructor)
data = dict(
    NUMBER = string.digits,
    LETTER_UPPER = string.uppercase,
    LETTER_LOWER = string.lowercase,
    ...
def _lorem_ipsum():
    """
    Return a random paragraph of placeholder text.
    """
    length = random.randrange(2, len(_lorem) / 2)
    return ' '.join(random.choice(_lorem) for _ in xrange(length))

Context

StackExchange Code Review Q#15280, answer score: 14

Revisions (0)

No revisions yet.