The Case of the Moving Merge Base
A while ago, my friend and coworker Matt pinged me with a Git question. He was looking into an issue with a GitHub Actions workflow that only runs on files that changed in a given pull request, and a user reported that it was running against way too many files - so many, in fact, that the workflow was failing because it was exceeding the maximum command line length!
Now, the set of files changed by a pull request is determined by the merge base between the commits at the tips of the source and target branches - it's basically their most common ancestor. Matt showed me the logs from the workflow, which was logging the commit ID for the merge base - but when we ran the exact same git merge-base command locally, we got a different commit back! The only real difference between our local repositories and the ones used by our workflows is that the latter use shallow clones and fetches, so we figured that that had something to do with it - and sure enough, when we increased the fetch depth for the action, we saw the correct commit ID for the merge base in the logs.
So "problem solved" - but needless to say, we were both still very confused, and I decided to dig into it further.
Deepening Our Understanding
My big shock was that you can get a different result for a merge base as you deepen - I had assumed that either the correct one would be found, or none at all. The way merge base works, conceptually, is that given commits L and R, it finds a commit C that L and R share as a common ancestor, with the added condition that there's no other commit that both meets these criteria and also has C as an ancestor (eg. there is no "closer" commit).
So what we're looking for is a situation where:
- We fetch up to depth
D - There's a commit
B(forBad merge base) that's withinDsteps of our target commit - There's a commit
G(forGood merge base) that's not withinDsteps of our target commit Bis an ancestor ofG
This sounds...kind of impossible at first - after all, how could B be closer than G if B is G's ancestor - but I suspected that merges might be the culprit here. I started wondering how fetch depth interacts with merges, so here's a pop quiz for you: if I have a merge commit with 5 commits on each side's branch, does git fetch --depth=5:
- ...fetch the merge commit and the first two ancestors on each side, in a sort of breadth-first fashion?
- ...fetch the merge commit and the first four ancestors on one of the left or right, in a sort of depth-first fashion?
- ...fetch the merge commit and the first four ancestors on both sides, considering they are within 5 steps of what we're fetching?
The answer is: the last one! It fetches all commits that are within 5 steps of the target commit. And therein lies the trick - if there's a short path to B from both sides of the merge commit, but the path to G from one of the sides of the merge commit is long, that results in a situation where the merge base can change as you deepen!
This is kind of hard to wrap your head around with just words, so here is a diagram. It's a little weird to look at at first, but we'll go through it step-by-step - I just wanted to provide the full context to start:
Now that we have the full commit graph for reference, let's walk through what we see as we deepen. At depths 1 and 2, we just see the merge commit M and then its immediate parents L and R - nothing interesting. So let's look at the commit graph at depth 3:
--depth=3
Now we're starting to get into "interesting" territory! We see our commit on the left side of the merge is itself a merge, and we see our "good" merge base (G) makes its first appearance. But if we did git merge-base L R, it wouldn't show up, because in our current view of the commit graph, we don't know that G is an ancestor of L. In fact, we don't see any common ancestors between L and R yet, so git merge-base L R would give us nothing.
--depth=4
Here we arrive at the situation where git merge-base L R gives us our "bad merge" commit B! This is because while G is visible, Git still doesn't know that it's reachable from L, due to shallowness. What's notable to me is that G's other child - P - is in the commit graph - we just don't know who its parent(s) is/are yet.
--depth=5
...and here we are, back at the full commit graph. git merge-base L R gives the correct answer, because now G is reachable from L via P.
One of our solutions was to iteratively deepen until the calculated merge base stops changing (which actually wouldn't work reliably - the merge base can stay the same for multiple deepens before changing again) - but that would incur at least one extra fetch per workflow run, which isn't great.
Solving The Problem
Now that we know what's happening - what do we do about it?
Well, we can just accept that this happens sometimes - I looked at 10,000 pull requests in our monorepo at work, and checked how many had a shortest distance from either side of the merge to the merge base that was greater than our fetch depth. After doing this, I found the "wrong merge base calculation" behavior happening about 1.3% of the time. And it's only a problem if it seriously influences the list of changed files - after all, we only noticed because a particular pull request caused a downstream action to exceed the argv size limit! If users rebase their branches often, that should reduce this problem's frequency.
We can also just bump the fetch depth - in my experiments, the "deepest" correct merge base was a little more than 200 commits from either side of the merge. So changing the fetch depth to something like 250 largely solves this problem, and it doesn't take a ton more time than a fetch depth of 25. Speaking with my pragmatist hat on, that's the approach I would advocate.
Buuuuut those answers just don't feel satisfying, do they? So if you're still here (thanks!), there is a way we could fix file change calculation to deal with this behavior.
Going Even Deeper (Until We Don't Have To)
Remember when I said
P- is in the commit graph - we just don't know who its parent(s) is/are yet.
Well, therein lies the key to the "proper" fix - if git merge-base A B gives you a merge base MB and none of the commits between MB..A and MB..B are on the shallowness boundary, you have the correct merge base! If there is at least one commit that's on the shallowness boundary, you need to deepen - it doesn't necessarily mean you'll find a better merge base, but you could.
Now - how do you tell if a commit is on the shallowness boundary? A scrappy way is to just look for "(grafted)" in the output of git log - so for the above example, you'd check if git log --oneline L R --not MB | grep --quiet --fixed-strings '(grafted)' exits non-zero.
I hope you enjoyed deepening your knowledge of git merge-base with me today!