Git Squashing: An Illustrated Guide
GitHub’s new ‘squash and merge’ button has given much easier access to this Git power feature. But what is squashing, really, and what is it useful for?
Imagine you’re working on a small feature, and toward that end you check in nine commits. That might be represented something like this:
Occasionally implementations are this smooth, with a relatively direct path of sequential commits leading to a more or less complete feature.
But that’s not always how development goes — sometimes implementations take a slightly less linear path. This is why Git allows cleanup of these false or messy paths.
Trimming false paths the simple way: removing and reverting commits
But development is a highly experimental process, so sometimes it goes a little bit more like this:
Here the final feature is made up of [c1, c2, c3, c8, and c9], and c4 through c7 are a false path. In a perfect world where this is a ‘pure’ false path, there are a couple of ways to handle this. If you’re lucky and these commits are experimentation that happened purely on your local machine, you can simply discard them by resetting to c3 as a starting point and then building from there.
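A toy sketch of that local reset, in Python. It assumes a branch can be treated as a plain ordered list of commit names, which is an illustration and not how Git stores history; `reset_hard` is a hypothetical helper. In Git itself the equivalent step would be `git reset --hard <commit>`.

```python
from __future__ import annotations

# Toy model: a branch is just an ordered list of commit names.
# This is an illustration, not Git's internal representation.


def reset_hard(branch: list[str], target: str) -> list[str]:
    """Return the branch as it looks after resetting to `target`.

    Every commit after `target` is dropped from the branch, which is
    why this is only safe for work nobody else has pulled.
    """
    if target not in branch:
        raise ValueError(f"{target} is not on this branch")
    return branch[: branch.index(target) + 1]


local = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
cleaned = reset_hard(local, "c3")   # discard the false path c4-c7
finished = cleaned + ["c8", "c9"]   # keep building from c3
print(finished)  # -> ['c1', 'c2', 'c3', 'c8', 'c9']
```

Note that c4-c7 leave no trace on the resulting branch, which is exactly what makes a reset awkward once those commits have been shared.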
But let’s say all of the problematic commits [c4-c7] have been pushed to a remote host and now need to be undone. The proper way to do this is not a hard reset but a revert of these commits. While this may seem like a subtle difference, since both are functionally a form of undoing past work, Git treats reversion as a ‘forward’ type of action. Reverting, in effect, creates an exact negative diff of the reverted commits that is applied on top of them, like this:
This is an important distinction for two reasons. First, if other developers have updated their copies of the code and have a record of c4-c7 on their machines, reversion gives them a clean path to move forward: a diff that can be applied on top of their work, even if it touches the same code. Second, because reversion adds to the history rather than rewriting it, a record of c4-c7 stays in the version history, which may prove to be useful context for future developers.
Where resets are ‘destructive’ and historically revisionist (erasing the record of work done), reversions are not. This makes reversion a friendlier and less risky way to undo changes, particularly changes that are likely to have been shared with others.
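The ‘negative diff’ idea can be sketched with a toy model. The Python below assumes a commit is nothing more than a map of path to (old content, new content); the `Commit` class, the `apply` and `revert` helpers, and the file names are all hypothetical, and real Git diffs work on lines rather than whole files. In Git itself the command is `git revert`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Commit:
    name: str
    # path -> (old_content, new_content); None means the file is absent
    diff: dict


def apply(tree: dict, commit: Commit) -> dict:
    """Apply a commit's diff to a tree (path -> content) and return a new tree."""
    result = dict(tree)
    for path, (old, new) in commit.diff.items():
        if result.get(path) != old:
            raise ValueError(f"{commit.name} does not apply cleanly to {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


def revert(commit: Commit) -> Commit:
    """Build a new commit whose diff is the exact inverse of `commit`."""
    inverse = {path: (new, old) for path, (old, new) in commit.diff.items()}
    return Commit(f"revert-{commit.name}", inverse)


tree_at_c3 = {"feature.py": "v3"}
false_path = [
    Commit("c4", {"feature.py": ("v3", "v4")}),
    Commit("c5", {"scratch.py": (None, "experiment")}),
    Commit("c6", {"feature.py": ("v4", "v6")}),
    Commit("c7", {"scratch.py": ("experiment", None)}),
]

history = ["c1", "c2", "c3"]
tree = tree_at_c3
for commit in false_path:
    tree = apply(tree, commit)
    history.append(commit.name)

# Revert newest first so each inverse diff applies cleanly.
for commit in reversed(false_path):
    undo = revert(commit)
    tree = apply(tree, undo)
    history.append(undo.name)

print(tree == tree_at_c3)  # -> True: the code is back to its c3 state
print(history)             # c4-c7 and their reverts are all still on record
```

The history only ever grows: the code returns to where it was at c3, but anyone who already has c4-c7 can simply apply the reverts on top.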
Squashing, like resets, is also an inherently destructive act, but this can be useful when the goal is to actively destroy distracting parts of the implementation history for the benefit of other developers.
Squashing to compress experimental work
Squashing comes into play in the case of pseudo-false paths, where there’s lots of experimentation happening and some of it works out. This development work isn’t entirely a dead end, but isn’t entirely usable either. Often there’s a sort of ‘cleanup’ commit that trims out all the leftovers: code remnants that crept in during experimentation, but didn’t end up being used.
That might be represented something like this:
This is a great example of where squashing can help compress this little leg of development and yield a more condensed version history for the feature, something like this:
Here we’ve squashed out some of the experimentation noise, but still retained a nicely granular version history for the feature. In the event that a bug is introduced, there are still various bisect and cherry-pick points available, and a good amount of context should any future forensics be needed.
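As a rough sketch of what such a squash does to the history, the Python below models each commit as a (name, snapshot) pair and replaces a run of commits with a single commit carrying the last snapshot of the run. The commit names, file names, and the `squash` helper are hypothetical, chosen only for illustration. In Git itself this kind of cleanup is usually done with an interactive rebase (`git rebase -i`) or `git merge --squash`.

```python
from __future__ import annotations


def squash(history: list, first: str, last: str, new_name: str) -> list:
    """Replace the commits from `first` to `last` (inclusive) with one commit.

    The new commit keeps the snapshot of `last`, so the code at the end
    of the squashed run is unchanged; only the intermediate steps go away.
    """
    names = [name for name, _ in history]
    start, end = names.index(first), names.index(last)
    if start > end:
        raise ValueError("`first` must come before `last`")
    _, final_snapshot = history[end]
    return history[:start] + [(new_name, final_snapshot)] + history[end + 1:]


history = [
    ("c3", {"feature.py": "working"}),
    ("c4", {"feature.py": "working", "try_a.py": "idea A"}),
    ("c5", {"feature.py": "working", "try_a.py": "idea A", "try_b.py": "idea B"}),
    ("c6", {"feature.py": "better", "try_a.py": "idea A", "try_b.py": "idea B"}),
    ("c7", {"feature.py": "better"}),  # cleanup commit trims the leftovers
    ("c8", {"feature.py": "better", "tests.py": "tests"}),
]

squashed = squash(history, "c4", "c7", "c4-c7")
print([name for name, _ in squashed])     # -> ['c3', 'c4-c7', 'c8']
print(squashed[-1][1] == history[-1][1])  # -> True: final code is identical
```

The experimental files never appear anywhere in the squashed history, while c3 and c8 remain as separate points to bisect or cherry-pick from.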
Finding this ‘Goldilocks zone’ is the key to healthy squashing. While compressing a day’s worth of development into a single commit might make it easier to review, it comes at a cost: the cleaner history is also a riskier one that gives up some of Git’s inherent advantages, namely the ability to move fast with lots of safety points in the event something goes awry.
What does a squashed commit look like?
There are a couple of important things to note about squashed commits, specifically what information gets preserved and what gets discarded. Both the original contributor information and the time stamps are discarded and replaced by the contributor doing the squash and the time of the squash. (The details vary by tool: an interactive rebase, for instance, keeps the author of the first commit in the squashed group.) The original commit messages will, by default, be concatenated into a new commit message, so the whole process looks something like this:
The most important thing to note here is that squashing inherently distorts both the work timeline and any notion of proper attribution. Any use of squashing where the ‘squasher’ is someone other than the original contributor(s) is particularly problematic, as there will be no accurate record of who originally wrote those pieces of the codebase. This has the unfortunate side effect of making ‘git blame’ rather misleading, as it will report the squasher as having authored that code.
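A minimal sketch of that metadata loss, again as a toy model in Python. It follows the behavior described in this section (author and time replaced, messages concatenated); the `Commit` class, the `squash` helper, and the names, dates, and messages are all made up for illustration, and the exact metadata a real squash records depends on the tool used.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    author: str
    timestamp: datetime
    message: str


def squash(commits: list[Commit], squasher: str, when: datetime) -> Commit:
    """Collapse several commits into one, as modelled in this section.

    The original authors and timestamps are not carried over; only the
    commit messages survive, joined into a single message.
    """
    combined = "\n\n".join(c.message for c in commits)
    return Commit(author=squasher, timestamp=when, message=combined)


# Hypothetical contributors and dates.
originals = [
    Commit("alice", datetime(2016, 5, 2, 9, 0), "Try caching approach"),
    Commit("bob", datetime(2016, 5, 2, 14, 30), "Swap cache for memoization"),
    Commit("alice", datetime(2016, 5, 3, 10, 15), "Remove leftover cache code"),
]

result = squash(originals, squasher="carol", when=datetime(2016, 5, 4, 16, 0))
print(result.author)                 # -> carol
print(result.timestamp.isoformat())  # -> 2016-05-04T16:00:00
print(result.message)
```

In this model, nothing in the resulting commit points back to alice or bob, so any blame-style lookup against it can only ever name carol.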
Because of this, it’s a good idea to limit squashing to cases like the one described above — smaller tactical squashes of one’s own work, where the goal is to clear pieces of the work record which, if left in, would do more harm than good.
Travis Kimmel is the CEO and co-founder of GitPrime, the leader in data-driven productivity reporting for software engineering teams. He is experienced in building high-performing teams and empowering people to do the best work of their careers. Follow @traviskimmel on Twitter.