Sorting a numbered table of contents and the contents associated with it
Problem
Given a numbered table of contents with headers and the content of each section, I wanted to sort them correctly according to only the numeric value.
The first challenge: section numbers can be tricky to sort (e.g. sorted(["1.1.1", "1.1.2", "1.1.10"]) results in ["1.1.1", "1.1.10", "1.1.2"], which is alphabetically correct but not what I want). Thankfully, this problem is already solved here.
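To illustrate the difference, here is a minimal sketch of sorting with a numeric key. The helper name `numeric_key` is hypothetical and is not taken from the linked answer; it assumes every part of the section number is a plain integer.

```python
# Hypothetical helper for illustration only.
def numeric_key(section_number):
    """Turn '1.1.10' into [1, 1, 10] so the parts compare as integers."""
    return [int(part) for part in section_number.split('.')]

numbers = ["1.1.10", "1.1.2", "1.1.1"]

# Default string sort compares character by character.
lexicographic = sorted(numbers)
# Sorting on lists of integers compares each part by value.
numeric = sorted(numbers, key=numeric_key)

print(lexicographic)  # ['1.1.1', '1.1.10', '1.1.2']
print(numeric)        # ['1.1.1', '1.1.2', '1.1.10']
```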
Using similar logic, I want to make an OrderedDict whose keys are the headers (where each header is a string concatenation of the section number and its title) and whose values are the contents of each section.
I came up with the following solution, but I want some feedback, because it seems convoluted (although it seems to be doing what I want):
from collections import OrderedDict
import re

headers = ['4.2.10 Context 4', '4.2.11 Context 5', '4.2.0 Context 1', '4.2.1 Context 2', '4.2.2 Context 3']
sections = ['C4', 'C5', 'C1', 'C2', 'C3']

def section_sort(t):
    section = t[0]
    numbering_pattern = re.compile('\d.\d[.\d]*')
    if numbering_pattern.match(section.split(' ')[0]):
        s_nbr = section.split(' ')[0]
        return [int(_) for _ in s_nbr.split('.')]

contents = OrderedDict(sorted(zip(headers, sections), key=section_sort))
for k, v in contents.items():
    print('{header}\n\t{section}'.format(header=k, section=v))

Output:
4.2.0 Context 1
	C1
4.2.1 Context 2
	C2
4.2.2 Context 3
	C3
4.2.10 Context 4
	C4
4.2.11 Context 5
	C5
What's your take on this?
Solution
Just a performance tip. The function section_sort is called multiple times, and you are compiling the regex on each call. You also call section.split() twice in the function to get the same value. You can store both in variables. The code would look like this:
from collections import OrderedDict
import re

headers = ['4.2.10 Context 4', '4.2.11 Context 5', '4.2.0 Context 1', '4.2.1 Context 2', '4.2.2 Context 3']
sections = ['C4', 'C5', 'C1', 'C2', 'C3']

# Compile the pattern once at module level instead of on every call.
numbering_pattern = re.compile(r'\d.\d[.\d]*')

def section_sort(t):
    section = t[0]
    # Split once and reuse the result.
    s_nbr = section.split(' ')[0]
    if numbering_pattern.match(s_nbr):
        return [int(_) for _ in s_nbr.split('.')]

contents = OrderedDict(sorted(zip(headers, sections), key=section_sort))
for k, v in contents.items():
    print('{header}\n\t{section}'.format(header=k, section=v))
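One thing the key function does not cover is a header that has no section number: section_sort then returns None, which in Python 3 cannot be compared with the integer lists returned for the other headers. A possible sketch of a more defensive key follows. The stricter pattern, the name `section_key`, the 'Appendix' header, and the choice to place unnumbered headers last are all assumptions for illustration, not part of the original code.

```python
from collections import OrderedDict
import re

# Assumption: a section number is one or more dot-separated integers.
NUMBERING = re.compile(r'\d+(?:\.\d+)*')

def section_key(item):
    """Sort key for (header, content) pairs; hypothetical helper."""
    header = item[0]
    first_token = header.split(' ', 1)[0]
    if NUMBERING.fullmatch(first_token):
        # Group 0: numbered headers, ordered by their integer parts.
        return (0, [int(part) for part in first_token.split('.')])
    # Group 1: unnumbered headers go last (an assumed policy);
    # the stable sort keeps their original relative order.
    return (1, [])

# 'Appendix' is made-up sample data to exercise the fallback branch.
headers = ['4.2.10 Context 4', 'Appendix', '4.2.0 Context 1', '4.2.2 Context 3']
sections = ['C4', 'CA', 'C1', 'C3']

contents = OrderedDict(sorted(zip(headers, sections), key=section_key))
for header, section in contents.items():
    print('{header}\n\t{section}'.format(header=header, section=section))
```

Returning a tuple keeps every key comparable: the first element decides the group, and the integer list is only compared between numbered headers.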
Context
StackExchange Code Review Q#154012, answer score: 5