Merging Git Repos for Archival Purposes

Andrew Bolster

Senior R&D Manager (Data Science) at Synopsys Software Integrity Group and Treasurer @ Bsides Belfast and NI OpenGovernment Network

TL;DR

I needed to combine several git repos into one big repo, with each original repo in its own folder, while ideally preserving the histories of all those repos for archaeological purposes.

There are many reasons why someone would want to do this, and my specific use case isn’t relevant. Good luck.

Why so complicated?

  • ‘Hidden’ files (dotfiles) suck
  • Shell Wildcards suck
  • Wildcards with selective exclusions (i.e. .git) suck
  • File names with spaces suck
  • Trailing Slashes suck
  • Rewriting History sucks
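To make a few of those complaints concrete, here is a minimal sketch run in a throwaway directory. The file names are hypothetical and only exist for the demonstration, and `xargs -d` is a GNU extension, so this assumes a GNU userland.

```shell
# Scratch directory with hypothetical file names; nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch .hidden "name with spaces.txt" plain.txt
mkdir .git

# A bare * skips dotfiles, so .hidden and .git are silently left out.
set -- *
star_count=$#

# Adding a dotfile pattern picks up .hidden, but also .git, which then
# has to be excluded by hand.
set -- * .[!.]*
with_dotfiles_count=$#

# Default xargs splits on whitespace: one spaced file name becomes several words.
split_count=$(( $(printf '%s\n' "name with spaces.txt" | xargs -n1 echo | wc -l) ))

# Newline-delimited xargs (-d is a GNU extension) keeps the name in one piece.
whole_count=$(( $(printf '%s\n' "name with spaces.txt" | xargs -d '\n' -n1 echo | wc -l) ))

echo "star=$star_count with_dotfiles=$with_dotfiles_count split=$split_count whole=$whole_count"
```

On a GNU system this should report 2 matches for the bare wildcard, 4 once dotfiles are included, 3 words from the whitespace-split name and 1 from the newline-delimited one. That is why the script below feeds `git ls-files` into `xargs -d '\n'` instead of leaning on shell wildcards.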

Raw version, because I don’t trust Gists, embeds and such…

#!/bin/bash

usage() {
	cat << EOF
This script imports a git repo (accessible from https://\$origin/\$user/\$repo) and all its history as a subdirectory of a destination (available locally at \$dest).
It is designed for non-production, archival processes and may destroy everything you've ever loved because you looked at it funny. You have been warned.
The structure of the destination will end up something like this:
~/src
- \$dest
  - origins
    - \$origin
      - \$user
        - \$repo
Required Arguments:
	-u|--user: The user that owns the repo to be imported
	-r|--repo: The name of the repository to be imported
	-d|--dest: The local name of the destination repository (assumed to be under ~/src)
	-o|--origin: The git server that is the origin of the repo to be imported
EOF
}
# All four flags are required, and each takes a value: 8 arguments in total.
if [ "$#" -lt 8 ]; then
    usage
    exit 1
fi

while [[ "$#" -gt 0 ]]; do
    case "$1" in
        -u|--user) user="$2"; shift ;;
        -r|--repo) repo="$2"; shift ;;
        -d|--dest) dest="$2"; shift ;;
        -o|--origin) origin="$2"; shift ;;
        *) echo "Unknown parameter passed: $1"; usage; exit 1 ;;
    esac
    shift
done

tmp="/tmp/_${dest}_tmp"

echo "Importing $origin/$user/$repo into $dest"

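# ${repo:?} aborts if $repo is empty, so this can never expand to ~/src/ itself.
rm -rf ~/src/"${repo:?}"
cd ~/src || exit 1
git clone "https://$origin/$user/$repo" || exit 1
# Bail out rather than rewrite history in the wrong directory.
cd "$repo" || exit 1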
git filter-branch \
	--tree-filter "mkdir -p $tmp/origin; git ls-files | cpio -pdumB $tmp/origin; git ls-files | xargs  -d '\n' rm -r; find . -type d -empty -delete; mkdir -p origins/$origin/$user; mv $tmp/origin origins/$origin/$user/$repo/"\
	--tag-name-filter cat --prune-empty -- --all
if [ $? -eq 0 ]; then
	## WAIT PATIENTLY
	cd "../$dest" || exit 1
	git remote add "$repo" "../$repo"
	git fetch "$repo" --tags
	git merge --allow-unrelated-histories "$repo/master" # You're on your own if you want a different branch (or multiple branches)...
	git remote remove "$repo"
else
	echo "failed for $user/$repo"
fi
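
If you want to see what the final merge step does without pointing the script at anything you care about, here is a minimal sandbox sketch. Everything in it is hypothetical (the `archive` and `widget` repo names, the `example.com/alice` path, the demo identity), and it skips the `filter-branch` rewrite entirely by creating the `widget` repo with its files already under `origins/`. It assumes git 2.9 or newer for `--allow-unrelated-histories`.

```shell
# Hypothetical identity, used only inside this sandbox.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

# make_repo NAME FILE: create a repo on branch master holding one committed file.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    mkdir -p "$1/$(dirname "$2")"
    echo "content of $1" > "$1/$2"
    git -C "$1" add -A
    git -C "$1" commit -q -m "initial commit in $1"
}

# The destination archive, plus a stand-in for a repo whose history
# has already been rewritten into its origins/ subfolder.
make_repo archive README.md
make_repo widget origins/example.com/alice/widget/main.c

# Same sequence as the tail of the script: add, fetch, merge, clean up.
cd archive || exit 1
git remote add widget ../widget
git fetch -q widget --tags
git merge -q --allow-unrelated-histories -m "Import widget" widget/master
git remote remove widget

commit_count=$(( $(git rev-list --count HEAD) ))
echo "commits reachable from master: $commit_count"
git ls-files
```

The archive should end up with both initial commits plus one merge commit, and `git ls-files` should list `README.md` alongside the imported file under `origins/`. For a real import the invocation would look something like `./import.sh -u alice -r widget -d archive -o example.com`, where `import.sh` is just an assumed file name for the script above.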

Published: June 23 2020
