Sorting a numbered table of contents and the contents associated with it

Submitted by: @import:stackexchange-codereview

Problem

Given a numbered table of contents with headers and the content of each section, I wanted to sort them by the numeric value of the section number only.

The first challenge: section numbers can be tricky to sort. For example, sorted(["1.1.1", "1.1.2", "1.1.10"]) returns ["1.1.1", "1.1.10", "1.1.2"], which is alphabetically correct but not what I want.
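To make the difference concrete, here is a minimal sketch comparing the default string sort with a sort keyed on the integer components. The helper name numeric_key is hypothetical and does not appear in the original post.

```python
def numeric_key(number):
    """Turn '1.1.10' into [1, 1, 10] so the parts compare as integers."""
    return [int(part) for part in number.split('.')]

numbers = ["1.1.1", "1.1.2", "1.1.10"]

# Default sort compares character by character, so "10" lands before "2".
lexicographic = sorted(numbers)
# Keyed sort compares lists of ints, so 2 lands before 10.
numeric = sorted(numbers, key=numeric_key)

print(lexicographic)  # ['1.1.1', '1.1.10', '1.1.2']
print(numeric)        # ['1.1.1', '1.1.2', '1.1.10']
```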

Thankfully, this problem is already solved here.

Using similar logic, I want to build an OrderedDict whose keys are the headers (each header is the section number concatenated with its title) and whose values are the contents of each section.

I came up with the following solution, but I want some feedback, because it seems convoluted (although it seems to be doing what I want):

from collections import OrderedDict
import re

headers = ['4.2.10 Context 4', '4.2.11 Context 5', '4.2.0 Context 1', '4.2.1 Context 2', '4.2.2 Context 3']
sections = ['C4', 'C5', 'C1', 'C2', 'C3']

def section_sort(t):
    section = t[0]
    numbering_pattern = re.compile('\d.\d[.\d]*')

    if numbering_pattern.match(section.split(' ')[0]):
        s_nbr = section.split(' ')[0]
        return [int(_) for _ in s_nbr.split('.')]

contents = OrderedDict(sorted(zip(headers, sections), key=section_sort))

for k, v in contents.items():
    print('{header}\n\t{section}'.format(header=k, section=v))


Output:

4.2.0 Context 1
	C1
4.2.1 Context 2
	C2
4.2.2 Context 3
	C3
4.2.10 Context 4
	C4
4.2.11 Context 5
	C5


What's your take on this?

Solution

Just a performance tip. The function section_sort is called once for every item being sorted, and each call runs re.compile again. You also call section.split() twice in the function to get the same value. Compile the pattern once at module level and store the split result in a variable. The code would look like this:

from collections import OrderedDict
import re

headers = ['4.2.10 Context 4', '4.2.11 Context 5', '4.2.0 Context 1', '4.2.1 Context 2', '4.2.2 Context 3']
sections = ['C4', 'C5', 'C1', 'C2', 'C3']
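# Compiled once at import time rather than on every key call.
numbering_pattern = re.compile(r'\d.\d[.\d]*')

def section_sort(t):
    section = t[0]

    # Split once and reuse the section number.
    s_nbr = section.split(' ')[0]
    if numbering_pattern.match(s_nbr):
        return [int(_) for _ in s_nbr.split('.')]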

contents = OrderedDict(sorted(zip(headers, sections), key=section_sort))

for k, v in contents.items():
    print('{header}\n\t{section}'.format(header=k, section=v))
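One further point, offered as a sketch rather than as part of the answer above: section_sort returns None when a header does not start with a section number, and on Python 3 the sort would then raise a TypeError because None cannot be compared with a list. The pattern also uses an unescaped '.', which matches any character. The variant below assumes that unnumbered headers should sort after numbered ones and that a single-component number such as "4" is acceptable; the names SECTION_NUMBER and section_key, and the 'Appendix' entry, are hypothetical.

```python
import re
from collections import OrderedDict

# Raw string with escaped dots; used with fullmatch so only
# "4", "4.2", "4.2.10", ... count as section numbers.
SECTION_NUMBER = re.compile(r'\d+(?:\.\d+)*')

def section_key(item):
    """Sort key for (header, content) pairs."""
    header = item[0]
    number = header.split(' ', 1)[0]
    if SECTION_NUMBER.fullmatch(number):
        # Leading 0 places numbered headers before unnumbered ones.
        return (0, [int(part) for part in number.split('.')])
    # Assumption: headers without a number go last, in their input order.
    return (1, [])

headers = ['4.2.10 Context 4', 'Appendix', '4.2.0 Context 1', '4.2.2 Context 3']
sections = ['C4', 'CA', 'C1', 'C3']

contents = OrderedDict(sorted(zip(headers, sections), key=section_key))
ordered_headers = list(contents)
print(ordered_headers)
```

On Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict is only needed on older versions or when its extra methods are used.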

Context

StackExchange Code Review Q#154012, answer score: 5
