Process PowerPoint XML
Problem
I run a tiny open source project to help create speech aids for disabled people (the GitHub is here).
One of the things that is useful is for people to design speech setups in PowerPoint, and then have the PowerPoint file processed to extract the images and the other information. The script below processes the PowerPoint file using the python-pptx library.
I'm a pretty poor Python programmer - any hints for making things pretty or generally better would be very much appreciated.
```
#!/usr/bin/python
"Extracting Utterances from CommuniKate pagesets designed in PowerPoint"
#Todo - make the class a relevent thing
#Make the images export more effectively
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE
from pptx.enum.shapes import MSO_SHAPE_TYPE
import io
import os
from PIL import Image
import uuid

COL_TABLE = {152400: 0, 1503659: 1, 1600200: 1, 2861846: 2,
             2819400: 2, 2854919: 2, 2854925: 2, 4170660: 3,
             4191000: 3, 5542260: 4, 5769114: 4, 5562600: 4, 5769125: 4}
ROW_TABLE = {0: 0, 152400: 0, 152401: 0, 1981200: 1, 3771900: 2, 5562600: 3,
             5610125: 3, 6095999: 3, 7314625: 4, 7340121: 4, 7340600: 4}

# Note: This may not be robust to internationalisation.
alpha = "abcdefghijklmnopqrstuvwxyz1234567890_"

# dictionary of icons,
# key = (row, col)
# value = list of one or more PICTURE shapes.
images = {}

def resizeImage(image, scaleFactor):
    oldSize = image.size
    newSize = (scaleFactor*oldSize[0],
               scaleFactor*oldSize[1])
    return image.resize(newSize, Image.ANTIALIAS)

# Helper for testing - generate unique chars.
def getShortUuid():
    u = str(uuid.uuid1())
    u = u.split("-")[0]
    return u

def remove_punctuation(s):
    s_sans_punct = ""
    for letter in s:
        if letter.lower() in alpha:
            s_sans_punct += letter
    return s_sans_punct
# from http://openbookproject.net/thinkcs/python/english3e/strin
```
Solution
I only have some general thoughts about your script, since I haven't worked with PowerPoint in Python.
First, you're typing out all the letters and digits by hand, which is unnecessary. Import the `string` module to get ready-made strings containing the characters you need.
```
import string
alpha = string.ascii_lowercase + string.digits + '_'
```
The other advantage is that you can include non-ASCII characters, which will help with localisation. In Python 2 you can get these characters with `string.lowercase` (it was removed in Python 3). In your case that might make no difference, because its value depends on the locale, but this is what I get (in Ireland):
```
>>> string.lowercase + string.digits + '_'
"abcdefghijklmnopqrstuvwxyzƒšœžªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ0123456789_"
```
Note: This will be important to remember if you're using `run.text.encode('ascii', 'ignore')` later, as you're only accounting for ASCII there too.
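To make the ASCII-only point concrete, here is a minimal sketch in Python 3 syntax. The sample label is made up for illustration. It suggests that both a filter against the ASCII alphabet and `encode('ascii', 'ignore')` silently drop accented letters:

```python
import string

# ASCII-only alphabet, built from the string module.
alpha = string.ascii_lowercase + string.digits + "_"

# Hypothetical button label containing one non-ASCII letter (e with acute).
label = "Caf\u00e9_1"

# Filtering against the ASCII alphabet drops the accented letter.
filtered = "".join(c for c in label if c.lower() in alpha)

# encode(..., "ignore") discards anything that cannot be represented in ASCII.
encoded = label.encode("ascii", "ignore").decode("ascii")

print(filtered)  # Caf_1
print(encoded)   # Caf_1
```

If the pagesets may contain labels in other languages, both places would need to change together.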
`remove_punctuation` is unnecessarily long too. You can shorten it to a single line using a generator expression (essentially shorthand for a `for` loop) and `str.join`, which is a handy way to build one string from a sequence of strings.
```
def remove_punctuation(s):
    return ''.join(c for c in s if c.lower() in alpha)
```
Your whitespace in general could be better. This comment is too far from the definition it refers to; I thought it addressed the previous one.
```
# Returns the closest key in the dictionary, for numerical keys.
def get_closest_key(dict, inKey):
```
Try to use whitespace so that related things sit together, and leave room between separate parts of the code. I personally find it best to have comments appear after the function definition. Even better, make it a docstring, which is a programmatically accessible string that explains the function to a user.
I also recommend reading the PEP 8 style guide.
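As a rough illustration of the docstring suggestion, the sketch below moves the comment into a docstring. The function body is only my guess at what "closest key" means, since the original implementation is not shown here, and the parameter names `table` and `in_key` are my own (chosen to avoid shadowing the built-in `dict`):

```python
# Excerpt of COL_TABLE from the question, used only as sample data.
COL_TABLE = {152400: 0, 1503659: 1, 1600200: 1, 2861846: 2}


def get_closest_key(table, in_key):
    """Return the key of `table` that is numerically closest to `in_key`."""
    # Assumed behaviour: pick the key with the smallest absolute difference.
    return min(table, key=lambda k: abs(k - in_key))


# The docstring is available at runtime, e.g. via help() or __doc__.
print(get_closest_key.__doc__)
print(get_closest_key(COL_TABLE, 1600000))  # 1600200
```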
Your class's `__str__` method can use `str.format` instead. It's a bit clearer and easier to use, and it doesn't need explicit type conversions. Also, you shouldn't use `\"` when you could just wrap the string in single quotes and then use `"` freely.
```
def __str__(self):
    return 'utterance[{}][{}]="{}";'.format(
        self.column, self.row, self.text)
```
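For a quick check of what that format string produces, here is a small self-contained sketch. The class name `Utterance` and its constructor are hypothetical stand-ins, since the class from the question is not shown here. Only the `__str__` body comes from the suggestion above:

```python
class Utterance:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, column, row, text):
        self.column = column
        self.row = row
        self.text = text

    def __str__(self):
        # Single quotes outside, so the inner double quotes need no escaping.
        return 'utterance[{}][{}]="{}";'.format(
            self.column, self.row, self.text)


print(Utterance(1, 2, "hello"))  # utterance[1][2]="hello";
```

Note that `format` converts the integer column and row to text on its own, so no explicit `str()` calls are needed.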
Context
StackExchange Code Review Q#101803, answer score: 4