Fixing Up a Git Repository With Broken Alternates

When I'm working, I occasionally make use of Git's alternates feature. If you're not familiar with it, the alternates facility saves time and space when you want a pristine copy of a repository. It sets up a link to a reference repository, and that repository is consulted for objects. Only if an object doesn't exist in the reference repository is it brought into the fresh repository from upstream.

So let's say you have a checkout of some repository (I'll use Rakudo as an example), and you want to create a new clone from GitHub, but you want to save bandwidth. One way to do that is to clone from the local repository on disk; the two repositories will even share disk space via hard links if they're on the same filesystem. When cloning from a file-based repository, however, your origin will point to that file-based repository and will be behind the remote copy if the file-based one is, which I didn't want. In addition, the sharing stops after the clone, unlike with alternates. Instead, we can activate the alternates facility using clone's --reference option:

  $ cd /tmp
  $ git clone github:rakudo/rakudo # let's make a copy I feel ok blowing away
  $ git clone --reference /tmp/rakudo/.git github:rakudo/rakudo rakudo-jvm

So now we have a fresh clone of Rakudo in /tmp/rakudo-jvm that borrows its Git objects from /tmp/rakudo. An example of when I would do this is if I'm running MoarVM tests on Rakudo and I want to build the JVM version to test some stuff on that, or if I'm running a server program from a branch at work that someone may be playing with and I don't want to kick them off in order to fix a bug on master.

The problem is that sometimes I leave these extra repositories lying around, rename or move them, and eventually accidentally delete the original. If that happens, you get a bunch of angry output from Git when you try to do...well, anything:

  $ rm -rf /tmp/rakudo/
  $ cd /tmp/rakudo-jvm
  $ git status
  error: object directory /tmp/rakudo/.git/objects does not exist; check .git/objects/info/alternates.
  On branch nom
  Your branch is up-to-date with 'origin/nom'.
  nothing to commit, working directory clean
  error: object directory /tmp/rakudo/.git/objects does not exist; check .git/objects/info/alternates.
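
A dangling link can also be spotted by reading the alternates file directly. The following is a minimal sketch, assuming the file lists absolute paths, one per line; check_alternates is a hypothetical helper name, not a Git command:

```shell
# Hypothetical helper: report alternates entries whose directory is missing.
#   $1 = path to the repository's working tree root
# Assumes absolute paths; relative entries are not resolved here.
check_alternates() {
    alt_file=$1/.git/objects/info/alternates
    # No alternates file means the repository borrows nothing.
    [ -f "$alt_file" ] || return 0
    rc=0
    while IFS= read -r dir || [ -n "$dir" ]; do
        # Skip blank lines and lines starting with '#'.
        case $dir in ''|'#'*) continue ;; esac
        if [ ! -d "$dir" ]; then
            printf 'missing alternate: %s\n' "$dir"
            rc=1
        fi
    done < "$alt_file"
    return $rc
}
```

Running check_alternates /tmp/rakudo-jvm after the rm above would name /tmp/rakudo/.git/objects, the same directory Git complains about.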

A simple fix would be to just clone the original repository again, but I saw this as a challenge: how could I actually fix my hopelessly broken repository? (Git actually tells you how in the documentation for git clone --shared, but I'd already done the work when I discovered that =/)

So the first thing I did was think about how the feature works; the alternates feature uses a file called .git/objects/info/alternates to find its alternate sources. I looked for references to alternates in Git's documentation, and came across clone's --dissociate option. After the clone is complete, it copies the objects borrowed from the alternate into the new repository, so that the clone doesn't need the original anymore. This is exactly what I wanted, just for an existing repository! I didn't see --dissociate for other Git commands, so I dug into clone's implementation, and discovered this:

static void dissociate_from_references(void)
{
    static const char* argv[] = { "repack", "-a", "-d", NULL };
    char *alternates = git_pathdup("objects/info/alternates");

    if (!access(alternates, F_OK)) {
        if (run_command_v_opt(argv, RUN_GIT_CMD|RUN_COMMAND_NO_STDIN))
            die(_("cannot repack to clean up"));
        if (unlink(alternates) && errno != ENOENT)
            die_errno(_("cannot unlink temporary alternates file"));
    }
    free(alternates);
}

So all git-clone does is run git repack -a -d and then delete the alternates file! With that in mind, I simply created a fresh clone from the upstream repository, wrote the path $new_repo/.git/objects into the broken repository's .git/objects/info/alternates file, ran git repack -a -d, and removed .git/objects/info/alternates.
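
Spelled out as commands, the repair looks roughly like the sketch below. fix_broken_alternates is a hypothetical helper name, and the temporary bare clone used as the donor is an assumption made for tidiness; an ordinary clone works the same way, with its object directory at $new_repo/.git/objects instead:

```shell
# Hypothetical helper: repair a clone whose alternate object store is gone.
#   $1 = path to the broken repository's working tree root
#   $2 = URL or path of the upstream repository to fetch objects from
fix_broken_alternates() {
    broken=$1
    upstream=$2
    # Temporary bare clone standing in for the deleted reference repository.
    donor=$(mktemp -d) || return 1
    git clone --quiet --bare "$upstream" "$donor/donor.git" || return 1
    # Point the broken repository at the donor's object directory.
    printf '%s\n' "$donor/donor.git/objects" \
        > "$broken/.git/objects/info/alternates" || return 1
    # The same step clone's dissociate code runs: pack everything reachable,
    # borrowed objects included, into the repository's own object store.
    git -C "$broken" repack -a -d || return 1
    # The link is no longer needed, and neither is the donor.
    rm -f "$broken/.git/objects/info/alternates"
    rm -rf "$donor"
}
```

For the example in this post, that would be fix_broken_alternates /tmp/rakudo-jvm github:rakudo/rakudo. Running git fsck in the repaired repository afterwards is a reasonable sanity check.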

Another way of saving space and time is to use Git's (relatively) new worktree feature. Once I get used to using that, I'll probably transition to it instead.
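
For comparison, a worktree version of the opening example might look like the sketch below. add_scratch_worktree is a hypothetical helper name and the branch name jvm-testing is made up for illustration; the paths are the ones used earlier in this post:

```shell
# Hypothetical helper: add a second working tree that shares the object
# store of an existing repository.
#   $1 = existing repository
#   $2 = absolute path for the new working tree
#   $3 = name of the new branch to create for it
#   $4 = commit or branch to start from
add_scratch_worktree() {
    git -C "$1" worktree add -b "$3" "$2" "$4"
}

# e.g. add_scratch_worktree /tmp/rakudo /tmp/rakudo-jvm jvm-testing nom
```

A new branch is created here because a branch that is already checked out in one working tree can't be checked out in a second one by default.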

Published on 2016-01-03