Script to checkout multiple repositories to a certain commit hash

Submitted by: @import:stackexchange-codereview

Problem

I am writing a script that checks out a git repo at a certain commit hash, does something, and switches back to master. The purpose of this script is to collect students' homework solutions from Bitbucket. Note that all the repos are under the same Bitbucket account. There is a master Bitbucket account that is the admin of all these repos, and students have write access to their respective repos. The students must adhere to the following directory structure in their repos:

-assignments
 |- assignment-1
 |- assignment-2
 .
 .
 .
 |- assignment-X


The directories inside these contain the homework. Once the teacher has set the deadline, the students must commit their code before it. The script will inspect the git log, find the last commit made before the deadline, switch to that revision, and rsync the solutions to a local directory.

So, this script will:

  • First, get the list of Bitbucket repo names from a file (students-info.json)
  • For each repo, check whether it already exists locally. If it does, do a git pull to get the latest commits
  • If not, do a git clone
  • Find the last commit made before the deadline
  • Switch to that revision
  • rsync the required assignment directory to solutions-directory/assignment-x-deadline/student-id
  • Switch back to the master branch

I am looking for any tips, suggestions, general code improvements, bugs, anything.

Here is my code:

```
#!/bin/python

"""
This script will take assignment solutions from each student repository. Based
on the timestamp given, it finds out the last commit made before timestamp
(i.e. deadline) and it checks out that revision, rsyncs the solution folder
of the required assignment with the solutions-repo and resets to HEAD.

The timestamp should be of the format 'Month Date H:M:S Year'

eg. Dec 19 22:31:01 2013

Input : List of students ids, assignment-id, timestamp

Example usage: To take out solutions of assignment 11 whose deadline was
Dec 19 22:31:01 2013
```

Solution

Your idea of inspecting the timestamps of the commits is conceptually flawed. Git is a distributed version control system, with no central server or any other means of notarizing timestamps. The timestamps are determined solely by the system clock on the machine on which the commit was created, and that clock can be trivially rolled back. Therefore, the only foolproof approach is to clone/pull all of the repositories at the time of the deadline.

Then, there is the question of which branch you want to inspect. Do you want to consider only the master branch? If so, it would be a good idea to specify the master branch when running git log. Keep in mind that if you consider all commits that were created before the deadline, you may end up taking a commit that was rolled back by the student. In other words, if the student makes a commit, then changes her mind (using git reset --hard HEAD^), you may be misconstruing the discarded version as the submission, simply because it has a later timestamp. For that reason, I hope that you only inspect commits along an agreed-upon branch or tag, rather than everything that might happen to exist in the repository.

In get_commit_hash(), you use the %ad pretty-printing format to obtain commit_timestamp. That's a misnomer, as %ad gets the authorship timestamp, not the commit timestamp. I believe you should be more interested in the commit timestamp. (Authorship times aren't even necessarily monotonic as you progress through the commit chain, since commits can be rearranged using git rebase.)
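
To see the difference for yourself, a rough sketch like the following lists both timestamps side by side. The names `CommitDates`, `parse_dates_line` and `read_commit_dates` are made up for illustration and are not part of your script; `%aI` and `%cI` are the strict ISO 8601 variants of the author and committer date placeholders.

```python
import subprocess
from typing import List, NamedTuple


class CommitDates(NamedTuple):
    commit_hash: str
    author_date: str     # %aI: when the change was originally authored
    committer_date: str  # %cI: when the commit object itself was created


def parse_dates_line(line: str) -> CommitDates:
    """Parse one 'hash|author-date|committer-date' line of git log output."""
    commit_hash, author_date, committer_date = line.strip().split("|")
    return CommitDates(commit_hash, author_date, committer_date)


def read_commit_dates(repo_path: str, branch: str = "master") -> List[CommitDates]:
    """List both timestamps for every commit reachable from `branch`."""
    out = subprocess.run(
        ["git", "log", "--pretty=%H|%aI|%cI", branch],
        cwd=repo_path, check=True, capture_output=True, text=True,
    ).stdout
    return [parse_dates_line(line) for line in out.splitlines() if line]
```

After a rebase or an amended commit, the two dates on a given commit will typically differ, which is easy to spot in this output.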

Assuming that you still want to go through with your original plan, you're working too hard. This should get you the hash of the latest commit on the master branch with a commit date in 2013:

git log -n 1 --until='2013-12-31 23:59:59' --pretty=%H master
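
From Python, the command above could be wrapped roughly like this. It is only a sketch: `build_log_command` and `last_commit_before` are hypothetical helper names, and the code assumes `git` is on the PATH and that `repo_path` is an existing clone.

```python
import subprocess
from typing import List, Optional


def build_log_command(deadline: str, branch: str = "master") -> List[str]:
    """Build the argument list for the git log invocation shown above."""
    return ["git", "log", "-n", "1", f"--until={deadline}", "--pretty=%H", branch]


def last_commit_before(repo_path: str, deadline: str,
                       branch: str = "master") -> Optional[str]:
    """Return the hash of the newest commit on `branch` dated before `deadline`.

    Returns None when no commit on the branch is old enough.
    """
    result = subprocess.run(
        build_log_command(deadline, branch),
        cwd=repo_path, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip() or None
```

Passing the arguments as a list rather than a shell string means the quoting around the date is no longer your problem.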


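Better yet, read what gitrevisions(1) says about "the value of the ref at a point in time", and skip all that analysis. Note that this syntax consults your local reflog, so it tells you where your own copy of master pointed at that moment rather than examining commit timestamps; it only helps if the repository was already cloned and kept up to date around the deadline.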

git checkout 'master@{2013-12-31 23:59:59}'
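
Whichever revision you end up checking out, the "switch back to master" step is easy to forget when something fails halfway. One possible way to guarantee it is a context manager; this is a sketch under my own assumptions, with `run_git` and `checked_out` being hypothetical names, and the `git` parameter existing only so that the runner can be swapped out in tests.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List


def run_git(args: List[str], repo_path: str) -> None:
    """Run a git subcommand inside repo_path, raising if it fails."""
    subprocess.run(["git", *args], cwd=repo_path, check=True)


@contextmanager
def checked_out(repo_path: str, revision: str, branch: str = "master",
                git: Callable[[List[str], str], None] = run_git) -> Iterator[None]:
    """Check out `revision`, then always return to `branch` afterwards."""
    git(["checkout", revision], repo_path)
    try:
        yield
    finally:
        # Runs even if the body raised, so the repo is not left detached.
        git(["checkout", branch], repo_path)
```

The rsync step would then go inside a `with checked_out(repo, revision):` block.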


By the way, I strongly recommend that you abandon your date format in favour of ISO 8601.

Credit: https://xkcd.com/1179/
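
If you have existing deadlines written in the old 'Month Date H:M:S Year' format, converting them is a one-liner with the standard library. A minimal sketch, where `LEGACY_FORMAT` and `to_iso8601` are names I made up:

```python
from datetime import datetime

# Matches timestamps such as 'Dec 19 22:31:01 2013'.
# Note: %b parses month abbreviations according to the current locale.
LEGACY_FORMAT = "%b %d %H:%M:%S %Y"


def to_iso8601(legacy_timestamp: str) -> str:
    """Convert a 'Month Date H:M:S Year' timestamp to ISO 8601."""
    return datetime.strptime(legacy_timestamp, LEGACY_FORMAT).isoformat()


print(to_iso8601("Dec 19 22:31:01 2013"))  # 2013-12-19T22:31:01
```

The result carries no time zone, so it will be interpreted as local time wherever it is used.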

Context

StackExchange Code Review Q#41492, answer score: 3
