Counting the number of days worked for all committers to a git repo

Problem

I wrote this script to collect evidence of the number of days worked for the purpose of claiming some government tax credits.

I'm looking for ways to clean it up. I'm especially wondering whether there is a cleaner way to deduplicate a list than list(set(my_list)), and maybe a better way to do:

d = dict(zip(commiters, [0 for x in xrange(len(commiters))]))


import os
from pprint import pprint

lines = os.popen('git log --all').read().split('\n')

author_lines = filter(lambda str: str.startswith('Author'), lines)
date_lines = filter(lambda str: str.startswith('Date'), lines)
author_lines = map(lambda str: str[8:], author_lines)
date_lines = map(lambda str: str[8:18].strip(), date_lines)

lines = zip(author_lines, date_lines)

lines = sorted(list(set(lines)), key = lambda tup: tup[0])

commiters = list(set(map(lambda tup: tup[0], lines)))

d = dict(zip(commiters, [0 for x in xrange(len(commiters))]))

for item in lines:
    d[item[0]] += 1

pprint(d)

Solution

For this part:

author_lines = filter(lambda str: str.startswith('Author'), lines)
date_lines = filter(lambda str: str.startswith('Date'), lines)
author_lines = map(lambda str: str[8:], author_lines)
date_lines = map(lambda str: str[8:18].strip(), date_lines)


This might be clearer. I have nothing against map or filter, but a list comprehension combines them nicely when you need both:

author_lines = [line[8:] for line in lines if line.startswith('Author')]
date_lines = [line[8:18].strip() for line in lines if line.startswith('Date')]
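
To check the slicing, here is a small sketch that runs the two comprehensions on a made-up log excerpt. The commit hashes, names and addresses are invented for illustration; the layout assumes git's default log format, where the "Author: " and "Date:   " prefixes are both 8 characters wide. Unlike map and filter on Python 3, the comprehensions give real lists on every Python version.

```python
# Made-up excerpt in git's default "git log" layout (all values are invented).
sample_log = """\
commit 1111111
Author: Alice <alice@example.com>
Date:   Mon Jul 2 10:00:00 2012 -0400

    First change

commit 2222222
Author: Bob <bob@example.com>
Date:   Tue Jul 3 09:30:00 2012 -0400

    Second change
"""

lines = sample_log.split('\n')

# Filter and transform in one step; commit message lines are indented,
# so they never match startswith('Author') or startswith('Date').
author_lines = [line[8:] for line in lines if line.startswith('Author')]
date_lines = [line[8:18].strip() for line in lines if line.startswith('Date')]

print(author_lines)  # ['Alice <alice@example.com>', 'Bob <bob@example.com>']
print(date_lines)    # ['Mon Jul 2', 'Tue Jul 3']
```

Note that the 8:18 slice keeps the weekday, month and day but drops the year.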


This:

lines = sorted(list(set(lines)), key = lambda tup: tup[0])


Can become:

lines = sorted(set(lines), key = lambda tup: tup[0])


for slightly less repetition (sorted accepts any iterable and always returns a list).

And are you sure the key is even necessary? Tuples sort element by element just fine. The only reason to sort by the first element alone is to preserve the original order of lines with the same author (sorted is stable) rather than ordering them by the date line, and since the input here is a set, that original order is arbitrary anyway.

... Actually, why are you even sorting this at all? I don't see anything in the rest of the code that will work any differently whether it's sorted or not.
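
A quick sketch of both points, using invented (author, date) pairs: plain tuple sorting already orders by author and then by date, and the per-author counts come out the same whether or not the pairs are sorted first.

```python
from collections import Counter

# Invented (author, date) pairs, already deduplicated by being in a set.
pairs = {("Bob", "Tue Jul 3"), ("Alice", "Tue Jul 3"), ("Alice", "Mon Jul 2")}

# Sorting on the first element only: authors are ordered, but the order of
# one author's dates depends on the set's iteration order.
by_author = sorted(pairs, key=lambda tup: tup[0])

# Plain tuple sorting: ordered by author, then by the date string.
by_tuple = sorted(pairs)

# Counting does not care about order at all.
counts_unsorted = Counter(author for author, _ in pairs)
counts_sorted = Counter(author for author, _ in by_tuple)

print(by_tuple)
print(counts_unsorted == counts_sorted)  # True
```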

For this:

commiters = list(set(map(lambda tup: tup[0], lines)))


Why are you zipping author_lines and date_lines and then unzipping again? Just do:

commiters = set(author_lines)


or am I missing something?

And this:

d = dict(zip(commiters, [0 for x in xrange(len(commiters))]))
for item in lines:
    d[item[0]] += 1


You're just getting commit counts, right? Use Counter:

import collections
d = collections.Counter(author_line for author_line, _ in lines)


Or, if your Python version doesn't have collections.Counter (it was added in 2.7):

import collections
d = collections.defaultdict(int)
for author_line, _ in lines:
    d[author_line] += 1
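
As a sanity check, here is a sketch that runs both variants on invented data and compares them. It also shows dict.fromkeys, which I would suggest as the most direct replacement for the dict(zip(...)) line in the question if you want to keep a plain dict. The pairs are made up; in the real script they would come from set(zip(author_lines, date_lines)).

```python
import collections

# Invented, already deduplicated (author, date) pairs.
lines = {("Alice", "Mon Jul 2"), ("Alice", "Tue Jul 3"), ("Bob", "Tue Jul 3")}

# Variant 1: Counter does the tallying in one call.
counter = collections.Counter(author for author, _ in lines)

# Variant 2: defaultdict; int() returns 0, so missing keys start at zero.
by_default = collections.defaultdict(int)
for author, _ in lines:
    by_default[author] += 1

# Variant 3: a plain dict pre-filled with zeros, replacing dict(zip(...)).
commiters = set(author for author, _ in lines)
by_fromkeys = dict.fromkeys(commiters, 0)
for author, _ in lines:
    by_fromkeys[author] += 1

print(dict(counter) == dict(by_default) == by_fromkeys)  # True
```

dict.fromkeys is safe here because 0 is immutable; with a mutable default such as a list, every key would share the same object.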


... Wait, are you even using date_lines anywhere? If not, what are they there for?
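
On a second look at the question's code, the dates do matter indirectly: set(zip(author_lines, date_lines)) drops repeated (author, date) pairs, so the final tally is distinct dates per author rather than commits. Putting the suggestions together, one possible sketch looks like this. The function names days_worked and git_days_worked are hypothetical, and asking git for "name|YYYY-MM-DD" lines via --format and --date=short is my substitution for slicing the default log output, not something from the original script.

```python
import collections
import subprocess


def days_worked(log_text):
    """Count distinct commit dates per author from 'name|YYYY-MM-DD' lines."""
    pairs = set()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last '|' so author names containing '|' still work.
        author, _, day = line.rpartition('|')
        pairs.add((author, day))  # the set drops repeated (author, day) pairs
    return collections.Counter(author for author, _ in pairs)


def git_days_worked(repo_path='.'):
    """Run git in repo_path and tally days worked per author name."""
    out = subprocess.check_output(
        ['git', 'log', '--all', '--date=short', '--format=%an|%ad'],
        cwd=repo_path,
    )
    return days_worked(out.decode('utf-8', 'replace'))


# Demonstration on invented data, so no repository is needed.
sample = "Alice|2012-07-02\nAlice|2012-07-02\nAlice|2012-07-03\nBob|2012-07-03\n"
counts = days_worked(sample)
print(sorted(counts.items()))  # [('Alice', 2), ('Bob', 1)]
```

Calling git_days_worked() inside a repository would give the real numbers. Because --date=short includes the year, the same calendar day in different years counts separately, which the 8:18 slice of the default format does not guarantee.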

Context

StackExchange Code Review Q#13298, answer score: 2
