Team git commit cleaner

Submitted by: @import:stackexchange-codereview

Problem

I cleaned up a large repository for my team with the Python code below. The goal was to check, for every developer on the team, whether any commit carries one of their bad email addresses, and if so, to replace the name and email with the correct ones. I use git filter-branch with a for loop in Bash.

Because Bash does not support nested arrays, I wrote a Python script to handle all the developers on my team.

Any ideas on how I can optimize this code? git filter-branch takes a long time.

# coding=utf-8
import subprocess
import os

def generate_command(dev):
    emails_string = ""
    for email in dev["emails"]:
        emails_string += '"%s" ' % email
    return """git filter-branch -f --env-filter 'OLD_EMAILS=(%s)
CORRECT_NAME="%s"
CORRECT_EMAIL="%s"
for email in ${OLD_EMAILS[@]};
do
        if [ "$GIT_COMMITTER_EMAIL" = "$email" ]
        then
                export GIT_COMMITTER_NAME="$CORRECT_NAME"
                export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
        fi
        if [ "$GIT_AUTHOR_EMAIL" = "$email" ]
        then
                export GIT_AUTHOR_NAME="$CORRECT_NAME"
                export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
        fi
done' --tag-name-filter cat -- --branches --tags""" % (emails_string.strip(),
                                                       dev["author_name"],
                                                       dev["author_email"])

developers = [
    {
        "emails": ["bad_email_author1@mycompany.com", "bad_email2_author1@mycompany.com"],
        "author_name": "first dev",
        "author_email": "good_email_author1@mycompany.com"
    },
    {
        "emails": ["bad_email_author2@mycompany.com", "bad_email2_author2@mycompany.com"],
        "author_name": "second dev",
        "author_email": "good_email_author2@mycompany.com"
    }
]

if __name__ == '__main__':
    for developer in developers:
        subprocess.call(generate_command(developer), shell=True)

Solution

First reaction: wow, this is scary: a Python script generating Bash, which in turn runs more Bash inside it. But I see the --env-filter technique comes straight from an example in the docs.

I would have written this in pure Bash, using a helper function that takes as parameters:

  • author name
  • author email
  • one or more bad email addresses

And then, for each bad email address, call git filter-branch like you did, but all in pure Bash.
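
The suggestion above is for Bash. Purely as an illustration of the same helper shape (author name, author email, one or more bad addresses, one git filter-branch call per bad address), here is a rough sketch in Python, the language of the question. The names build_commands and ENV_FILTER are hypothetical, and quoting with shlex.quote is my own assumption rather than something the original script does. The sketch only builds the commands; it does not run them.

```python
import shlex

# Shell snippet passed to --env-filter; placeholders are filled per bad email.
# It needs no Bash array because each command handles exactly one address.
ENV_FILTER = """
if [ "$GIT_COMMITTER_EMAIL" = {old} ]; then
    export GIT_COMMITTER_NAME={name}
    export GIT_COMMITTER_EMAIL={new}
fi
if [ "$GIT_AUTHOR_EMAIL" = {old} ]; then
    export GIT_AUTHOR_NAME={name}
    export GIT_AUTHOR_EMAIL={new}
fi
"""


def build_commands(author_name, author_email, *bad_emails):
    """Return one git filter-branch argument list per bad email (hypothetical helper)."""
    commands = []
    for bad_email in bad_emails:
        env_filter = ENV_FILTER.format(
            old=shlex.quote(bad_email),      # quote values embedded in the shell snippet
            name=shlex.quote(author_name),
            new=shlex.quote(author_email),
        )
        commands.append([
            "git", "filter-branch", "-f",
            "--env-filter", env_filter,
            "--tag-name-filter", "cat",
            "--", "--branches", "--tags",
        ])
    return commands


commands = build_commands(
    "first dev",
    "good_email_author1@mycompany.com",
    "bad_email_author1@mycompany.com",
    "bad_email2_author1@mycompany.com",
)
print(len(commands))  # one command per bad email address
```

Each argument list could then be handed to subprocess.call without shell=True, so only the --env-filter body is interpreted by a shell. Note that one call per bad address means one full history rewrite per address, so this shape favours simplicity over speed.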

As far as the Python part is concerned, this can be done better:

emails_string = ""
for email in dev["emails"]:
    emails_string += '"%s" ' % email


Using a list comprehension:

emails_string = " ".join(['"%s"' % email for email in dev["emails"]])


With this, you no longer need to call .strip() on emails_string when you generate the command string, because join only puts the separator between items and leaves no trailing space.
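
To check that the two forms agree, here is a small self-contained comparison. The dev dictionary is just the first entry from the question's developers list; the variable names loop_result and joined are made up for this example.

```python
dev = {
    "emails": [
        "bad_email_author1@mycompany.com",
        "bad_email2_author1@mycompany.com",
    ],
}

# Original approach: concatenate in a loop, which leaves a trailing space.
loop_result = ""
for email in dev["emails"]:
    loop_result += '"%s" ' % email

# join-based approach: the separator only goes between items.
joined = " ".join('"%s"' % email for email in dev["emails"])

# The loop version matches only after stripping the trailing space.
assert loop_result.endswith(" ")
assert joined == loop_result.strip()
print(joined)
```

The generator expression passed to join works the same as the bracketed list comprehension shown above; either spelling is fine.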

Context

StackExchange Code Review Q#73346, answer score: 2
