
Process PowerPoint XML

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·



Problem

I run a tiny open-source project that helps create speech aids for disabled people (the GitHub repository is here).

One useful feature is letting people design speech setups in PowerPoint and then processing the PowerPoint file to extract the images and other information. The script below processes the file using the python-pptx library.

I'm a pretty poor Python programmer - any hints for making things pretty or generally better would be very much appreciated.

```
#!/usr/bin/python
"Extracting Utterances from CommuniKate pagesets designed in PowerPoint"
# Todo - make the class a relevant thing
# Make the images export more effectively

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE
from pptx.enum.shapes import MSO_SHAPE_TYPE

import io
import os
from PIL import Image

import uuid

COL_TABLE = {152400: 0, 1503659: 1, 1600200: 1, 2861846: 2,
             2819400: 2, 2854919: 2, 2854925: 2, 4170660: 3,
             4191000: 3, 5542260: 4, 5769114: 4, 5562600: 4, 5769125: 4}
ROW_TABLE = {0: 0, 152400: 0, 152401: 0, 1981200: 1, 3771900: 2, 5562600: 3,
             5610125: 3, 6095999: 3, 7314625: 4, 7340121: 4, 7340600: 4}

# Note: This may not be robust to internationalisation.
alpha = "abcdefghijklmnopqrstuvwxyz1234567890_"

# dictionary of icons,
# key = (row, col)
# value = list of one or more PICTURE shapes.
images = {}

def resizeImage(image, scaleFactor):
    oldSize = image.size
    # Pillow expects integer dimensions. Image.ANTIALIAS was removed in
    # Pillow 10; Image.LANCZOS is the same filter under its current name.
    newSize = (int(scaleFactor * oldSize[0]),
               int(scaleFactor * oldSize[1]))
    return image.resize(newSize, Image.LANCZOS)

# Helper for testing - generate unique chars.

def getShortUuid():
    u = str(uuid.uuid1())
    u = u.split("-")[0]
    return u

def remove_punctuation(s):
    s_sans_punct = ""
    for letter in s:
        if letter.lower() in alpha:
            s_sans_punct += letter
    return s_sans_punct
# from http://openbookproject.net/thinkcs/python/english3e/strin
```

Solution

I only have some general thoughts about your script, since I haven't worked with PowerPoint in Python.

First, you're manually typing all the letters and numbers unnecessarily. Import the string module to automatically get access to strings containing all the characters you need.

```
import string

alpha = string.ascii_lowercase + string.digits + '_'
```
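A quick sanity check of the swap, as a sketch: the hand-typed string from the question and the `string`-module version contain the same characters, although the digits come in a different order (`string.digits` is `'0123456789'`). Since `alpha` is only used for membership tests with `in`, the order doesn't matter. The name `HAND_TYPED` is just for illustration.

```python
import string

# The question's hand-typed constant, kept here only for comparison.
HAND_TYPED = "abcdefghijklmnopqrstuvwxyz1234567890_"

alpha = string.ascii_lowercase + string.digits + '_'

print(alpha)                          # abcdefghijklmnopqrstuvwxyz0123456789_
print(set(alpha) == set(HAND_TYPED))  # True: same characters
print(alpha == HAND_TYPED)            # False: '0' comes first in string.digits
```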

Another advantage is that you can include non-ASCII characters, which will help localisation. In Python 2 you can get these characters with string.lowercase (it no longer exists in Python 3). In your case that might make no difference, because its contents depend on the current locale, but this is what I get (in Ireland):

```
>>> print string.lowercase + string.digits + '_'
abcdefghijklmnopqrstuvwxyzƒšœžªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ0123456789_
```

Note: keep this in mind if you're using run.text.encode('ascii', 'ignore') later, since that also only accounts for ASCII.
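If localisation matters and you're on Python 3, where `string.lowercase` no longer exists, one alternative (a different technique from the `alpha` string above) is to rely on `str.isalnum()`, which is Unicode-aware. This is a sketch with a hypothetical function name; be aware that `isalnum()` also accepts characters such as superscript digits, so check it against your real inputs.

```python
def remove_punctuation_unicode(s):
    """Keep letters and digits from any script, plus underscores.

    Hypothetical variant: str.isalnum() is Unicode-aware in Python 3,
    so accented letters such as 'é' are kept instead of dropped.
    """
    return ''.join(c for c in s if c.isalnum() or c == '_')


print(remove_punctuation_unicode("Café, s'il vous plaît!"))  # Cafésilvousplaît
print(remove_punctuation_unicode("snake_case 42?"))          # snake_case42
```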

Your remove_punctuation function is also unnecessarily long. You can shorten it to a single line using a generator expression (essentially shorthand for a for loop) and the str.join method, which concatenates an iterable of strings into a single string.

```
def remove_punctuation(s):
    # .lower() keeps capital letters, matching the original loop.
    return ''.join(c for c in s if c.lower() in alpha)
```
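A quick way to convince yourself the rewrite is safe is to run both versions side by side. This sketch copies the question's loop under the hypothetical name `remove_punctuation_loop`. Note the `c.lower()` call in the one-liner: the original loop lowercases each character before the membership test, so leaving it out would silently strip capital letters.

```python
import string

alpha = string.ascii_lowercase + string.digits + '_'


def remove_punctuation_loop(s):
    """The question's original loop version."""
    s_sans_punct = ""
    for letter in s:
        if letter.lower() in alpha:
            s_sans_punct += letter
    return s_sans_punct


def remove_punctuation(s):
    """One-line version: a generator expression fed to str.join."""
    return ''.join(c for c in s if c.lower() in alpha)


sample = "Hello, World! It's 2 o'clock."
print(remove_punctuation(sample))  # HelloWorldIts2oclock
print(remove_punctuation(sample) == remove_punctuation_loop(sample))  # True

# Without .lower(), capitals are dropped:
print(''.join(c for c in sample if c in alpha))  # elloorldts2oclock
```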

Your use of whitespace could be better in general. This comment is too far from the definition it refers to; I thought it described the previous function.

```
# Returns the closest key in the dictionary, for numerical keys.

def get_closest_key(dict, inKey):
    ...
```

Try to use whitespace so that related things stay together and separate parts of the code have room between them. Also, I personally find it best to put comments after the function definition line. Even better, make the comment a docstring: a string literal placed as the first statement of the function, which tools such as help() can read programmatically to explain the function to a user.
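As an illustration of the docstring suggestion: the body of get_closest_key isn't shown in the excerpt, so the `min`-based implementation below is an assumption, not the asker's code. Renaming the `dict` parameter to `d` also stops it from shadowing the built-in `dict` type, and `in_key` follows PEP 8 naming.

```python
# Excerpt of the question's COL_TABLE (shape position -> grid column).
COL_TABLE = {152400: 0, 1503659: 1, 1600200: 1, 2861846: 2}


def get_closest_key(d, in_key):
    """Return the key in ``d`` that is numerically closest to ``in_key``.

    Assumes every key in ``d`` is a number. On a tie, the key that
    comes first in iteration order wins.
    """
    return min(d, key=lambda k: abs(k - in_key))


# The docstring is available at runtime, e.g. via help() or __doc__.
print(get_closest_key.__doc__.splitlines()[0])
print(COL_TABLE[get_closest_key(COL_TABLE, 1550000)])  # 1
```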

I also recommend reading the PEP 8 style guide.
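For example, PEP 8 recommends lowercase_with_underscores names for functions, so the question's getShortUuid and resizeImage would become get_short_uuid and resize_image. Here is a sketch of the renamed UUID helper with a docstring; its behaviour is unchanged.

```python
import uuid


def get_short_uuid():
    """Return the first 8 hex characters of a time-based UUID.

    Handy for throwaway names in tests. This is only the low 32 bits of
    the UUID's timestamp, so it is not guaranteed to be unique.
    """
    return str(uuid.uuid1()).split("-")[0]


print(get_short_uuid())
```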

Your class's __str__ method can be rewritten to use the str.format method instead. It's a bit clearer and easier to use, and it doesn't need explicit type conversions. Also, there's no need to escape double quotes with \" when you can wrap the string in single quotes and use " inside it freely.

```
def __str__(self):
    return 'utterance[{}][{}]="{}";'.format(
        self.column, self.row, self.text)
```
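To see the method in context, here is a minimal stand-in class. The question's actual class isn't shown in the excerpt, so the name `Utterance` and its constructor are assumptions; the `column`, `row` and `text` attributes come from the snippet above.

```python
class Utterance:
    """Hypothetical stand-in for the question's utterance class."""

    def __init__(self, column, row, text):
        self.column = column
        self.row = row
        self.text = text

    def __str__(self):
        # Single quotes outside, so the inner double quotes need no escaping.
        return 'utterance[{}][{}]="{}";'.format(
            self.column, self.row, self.text)


print(Utterance(2, 3, "Hello"))  # utterance[2][3]="Hello";
```

On Python 3.6 and later, an f-string such as `f'utterance[{self.column}][{self.row}]="{self.text}";'` produces the same result.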


Context

StackExchange Code Review Q#101803, answer score: 4
