Script to generate documents based on conditionals and CSV data
Problem
I wrote this script a while back to generate press releases. It consumes a CSV file in which each row holds the data for one country, and fills those values into pre-written text. In its current state, it's basically a Mad Libs implementation with some hard-coded conditionals. I'd like to refactor it to make it a bit smarter. I had the following ideas:
- Create separate methods for static text paragraphs, paragraphs with conditionals, and paragraphs with complex conditionals. How would I do this without constructing methods that take too many complicated arguments?
- Use the NLTK for parsing plurals and verb tenses.
- Put this entire script into a Django app so that I don't have to do the tweaks myself every time they want new press releases.
An abbreviated version of the script is below. I also welcome any changes to make it more Pythonic.
```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
import os
import sys
import csv
from docx import Document
def generate_documents():
    with open(sys.argv[1], 'rb') as csvdata:
        reader = csv.DictReader(csvdata)
        timestamp = "output/" + datetime.datetime.now().strftime('%d-%m-%Y_%H-%M-%S')
        os.makedirs(timestamp)
        os.chdir(timestamp)
        for row in reader:
            document = Document()
            name = row['country'].decode('cp1252') + ".docx"
            embargo = u"EMBARGOED FOR RELEASE UNTIL {0}, DECEMBER {1} at {2} ({3})".format(row['weekday'], row['day'], row['time'], row['capital'].decode('cp1252').upper())
            headline = "Colored Walls Gain International Popularity"
            subheadline = u" {0} walls become common in {1}, citizens think this development is {2}".format(row['color'], row['country'].decode('cp1252'), row['adjective'])
            if row['country'].decode('cp1252') == "Andorra" or row['country'].decode('cp1252') == "Canada" or row['country'].decode('cp1252') == "Zimbabwe" or row['country'].decode('cp1252') == "Qatar" or r
```
Solution
Instead of decoding each field within the CSV as CP1252, you should open the entire file that way.
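As a rough illustration of decoding once at the file level, here is a minimal sketch. It assumes Python 3, where `open()` accepts an `encoding` argument (the script in the question is Python 2, where this would need a different approach). The helper name `read_rows` and the sample data are made up for the demonstration.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="cp1252"):
    """Return the CSV rows as dicts whose values are already decoded text."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, newline="", encoding=encoding) as csvdata:
        return list(csv.DictReader(csvdata))


# Demonstration with a throwaway CP1252-encoded file (sample data is invented).
sample = "country,capital\nCuraçao,Willemstad\n".encode("cp1252")
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
rows = read_rows(tmp.name)
os.remove(tmp.name)
print(rows[0]["country"])
```

With the decoding done in one place, none of the `row[...]` lookups would need their own `.decode('cp1252')` call.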
I think that you should approach this more as a templating problem than a document-generation problem. Templating is a challenge that has been solved many times before, though mainly for generating HTML rather than OOXML, and authoring OOXML directly looks like it would be a pain. Perhaps you would be better off using a templating solution (Django has one built in) to produce semantically correct HTML as an intermediate format. Then you can convert the HTML into DOCX using one of the many tools available for that, whether Python-based or not.
Basically, that strategy would decompose one programming headache into two solved problems.
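To make the first half of that idea concrete, here is a small sketch of rendering one row to HTML. To stay dependency-free it uses the standard library's `string.Template` rather than Django's template engine, so it only approximates what a real template language offers. The names `PAGE`, `SPECIAL_COUNTRIES`, and `render_release` are hypothetical, the paragraph texts are placeholders, and the country set contains only the four names visible in the question's truncated condition.

```python
from html import escape
from string import Template

# Only the four countries visible in the truncated condition; the real list is longer.
SPECIAL_COUNTRIES = {"Andorra", "Canada", "Zimbabwe", "Qatar"}

# Hypothetical HTML skeleton; a real release would have more sections.
PAGE = Template(
    "<h1>$headline</h1>\n"
    "<h2>$color walls become common in $country</h2>\n"
    "<p>$body</p>\n"
)


def render_release(row):
    """Render one CSV row (a dict of text values) to an HTML fragment."""
    # Set membership replaces the long chain of == comparisons.
    # The paragraph texts are placeholders: the source does not show
    # what the original condition selects.
    if row["country"] in SPECIAL_COUNTRIES:
        body = "Placeholder paragraph for the special-cased countries."
    else:
        body = "Placeholder paragraph for every other country."
    # Escape the data so characters such as & cannot break the markup.
    values = {key: escape(value) for key, value in row.items()}
    return PAGE.substitute(
        values,
        headline="Colored Walls Gain International Popularity",
        body=body,
    )


html_fragment = render_release({"country": "Canada", "color": "Red"})
print(html_fragment)
```

The resulting HTML would then be handed to whichever HTML-to-DOCX converter is chosen; that step is not shown here.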
Context
StackExchange Code Review Q#80203, answer score: 4