
Script to generate documents based on conditionals and CSV data

Submitted by: @import:stackexchange-codereview

Problem

I wrote this script a while back to generate press releases. It consumes a CSV file in which each row holds the data for one country, and it fills those values into pre-written text. In its current state, it's basically a Mad Libs implementation with some hard-coded conditionals. I'd like to refactor it to make it a bit smarter. I had the following ideas:

  • Create separate methods for static text paragraphs, paragraphs with conditionals, and paragraphs with complex conditionals. How would I do this without constructing methods that take too many complicated arguments?
  • Use NLTK to handle plurals and verb tenses.
  • Put this entire script into a Django app so that I don't have to make the tweaks myself every time they want new press releases.

An abbreviated version of the script is below. I also welcome any changes to make it more Pythonic.

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
import os
import sys
import csv
from docx import Document

def generate_documents():
    # Python 2: the csv module wants the file opened in binary mode
    with open(sys.argv[1], 'rb') as csvdata:
        reader = csv.DictReader(csvdata)
        # Each run writes its documents into a new timestamped folder
        timestamp = "output/" + datetime.datetime.now().strftime('%d-%m-%Y_%H-%M-%S')
        os.makedirs(timestamp)
        os.chdir(timestamp)
        for row in reader:
            document = Document()
            name = row['country'].decode('cp1252') + ".docx"

            embargo = u"EMBARGOED FOR RELEASE UNTIL {0}, DECEMBER {1} at {2} ({3})".format(row['weekday'], row['day'], row['time'], row['capital'].decode('cp1252').upper())

            headline = "Colored Walls Gain International Popularity"

            subheadline = u" {0} walls become common in {1}, citizens think this development is {2}".format(row['color'], row['country'].decode('cp1252'), row['adjective'])

            if row['country'].decode('cp1252') == "Andorra" or row['country'].decode('cp1252') == "Canada" or row['country'].decode('cp1252') == "Zimbabwe" or row['country'].decode('cp1252') == "Qatar" or r
```

Solution

Instead of decoding each field of the CSV from CP1252 individually, you should open the entire file with that encoding.
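A minimal sketch of that advice, assuming Python 3. The question's `.decode()` calls are Python 2, where the `csv` module does not support Unicode input, so this does not carry over directly. Here `open()` is given `encoding='cp1252'`, and every field that `csv.DictReader` yields is already a `str`. The helper name `read_rows` and the sample file are illustrative only.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return every CSV row as a dict of already-decoded str values.

    Assumes Python 3: open() decodes the whole file as CP1252 once,
    so no per-field .decode('cp1252') calls are needed afterwards.
    """
    # newline='' is what the csv module documentation recommends for files it reads.
    with open(path, newline='', encoding='cp1252') as csvfile:
        return list(csv.DictReader(csvfile))


# Demonstration with a throwaway CP1252-encoded file (made-up sample data).
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write('country,capital\r\nCuraçao,Willemstad\r\n'.encode('cp1252'))

rows = read_rows(tmp.name)
os.remove(tmp.name)
print(rows[0]['country'] + '.docx')  # Curaçao.docx
```

Decoding at the file boundary also means expressions like `row['country'] + ".docx"` work without repeating the encoding everywhere.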

I think that you should approach this more as a templating problem than as a document-generation problem. Templating is a challenge that has been solved many times before, though mainly for generating HTML rather than OOXML. Authoring OOXML directly, on the other hand, looks like it would be a pain. You would probably be better off using a templating solution (Django has one built in) to produce semantically correct HTML as an intermediate format. Then you can convert the HTML into DOCX using one of the many tools available for that, whether written in Python or not.
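A rough sketch of the templating half, with one swap stated plainly: it uses the standard library's `string.Template` instead of Django's template language, so it runs without a Django project. `string.Template` has no conditionals, so the country check stays in Python here; with Django's templates it could move into the template itself as an `{% if %}` tag. The names `RELEASE`, `SPECIAL_CASE` and `render_release`, the sample row, and the placeholder paragraph are all hypothetical, and the country set only contains the names visible in the question's truncated conditional.

```python
from html import escape
from string import Template

# Hypothetical HTML template built from the strings in the question's script.
RELEASE = Template(
    "<p>EMBARGOED FOR RELEASE UNTIL $weekday, DECEMBER $day at $time ($capital)</p>\n"
    "<h1>Colored Walls Gain International Popularity</h1>\n"
    "<h2>$color walls become common in $country, "
    "citizens think this development is $adjective</h2>\n"
    "$extra"
)

# Only the country names visible in the (truncated) question are listed here.
SPECIAL_CASE = {"Andorra", "Canada", "Zimbabwe", "Qatar"}


def render_release(row):
    """Return the HTML press release for one CSV row (a dict of str)."""
    # Escape every value so that characters like & or < cannot break the HTML.
    fields = {key: escape(value) for key, value in row.items()}
    # Upper-case before escaping so entity names such as &amp; are left intact.
    fields["capital"] = escape(row["capital"].upper())
    # Placeholder text: the body of the original conditional is not shown.
    if row["country"] in SPECIAL_CASE:
        fields["extra"] = "<p>[special-case paragraph]</p>\n"
    else:
        fields["extra"] = ""
    return RELEASE.substitute(fields)


# Made-up sample row with the column names the question's script reads.
sample = {"country": "Canada", "capital": "Ottawa", "weekday": "Monday",
          "day": "5", "time": "9 AM", "color": "Blue", "adjective": "exciting"}
html = render_release(sample)
```

Keeping the prose in a template and the data in the CSV means a new batch of press releases mostly requires editing the template, not the script.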

Basically, that strategy would decompose one programming headache into two solved problems.
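For the second of those two problems, one possible tool (an assumption; the answer does not name one) is the pandoc command-line converter, which can read HTML and write DOCX. The sketch below only shells out to it; the function names are hypothetical, and pandoc must be installed separately.

```python
import shutil
import subprocess


def html_to_docx_command(html_path, docx_path):
    """Build a pandoc command line that converts one HTML file to DOCX."""
    return ["pandoc", "--from", "html", "--to", "docx",
            "--output", docx_path, html_path]


def html_to_docx(html_path, docx_path):
    """Run pandoc, raising if it is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True turns a non-zero pandoc exit status into CalledProcessError.
    subprocess.run(html_to_docx_command(html_path, docx_path), check=True)
```

For example, `html_to_docx("Canada.html", "Canada.docx")` would write one document per rendered release.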

Context

StackExchange Code Review Q#80203, answer score: 4
