Synopsis
This guide will show you how to import a Git repository's commit history into another repository while keeping the commits GPG-signed. It also works if the target repository has diverged from the source repository.
How to Merge Divergent Git Histories
Introducing the Problem
Last June, I was hired by a precision farming startup to fulfill their IT-related needs and work on their emerging web-based management platform. The platform had originally been developed and delivered by an external firm that adopted a typical REST-SPA architecture and maintained it in two separate Git repositories: one for the client and another for the API.
Over time, I made significant improvements to streamline the development process and bring the code up to standard. One such improvement was migrating to Next.js to leverage server-side rendering and reduce API development time, an endeavor that proved successful. However, as the frontend became increasingly "server-sided," I noticed growing code and dependency duplication between the app and the API. This duplication led me to merge the two codebases into a single monorepo, which was another success.
I made that decision in the middle of a feature development phase and was content at the time with simply moving both codebases into a new repository and starting a fresh commit history. However, not having file history readily accessible from my CLI has been a significant inconvenience. Now that the feature is delivered, it’s time to resolve that issue.
First, here's a short overview of how one might have ended up in this situation:
mkdir monorepo
cd monorepo
cp -r ../frontend .; rm -rf frontend/.git
cp -r ../backend .; rm -rf backend/.git
git init
# (edit, add, commit, push, repeat ...)
The Solution
In this guide, we’ll import a single repository history into another one, but you can repeat the process as needed for each history you wish to import.
First, clone working copies of the source and target repositories (in our case, the frontend and the monorepo). We work on clones because we’re about to rewrite history, and the untouched originals are our only easy way back if something goes wrong.
# Create working copies of our repositories
git clone ./monorepo monorepo-temp
git clone ./frontend frontend-temp
Next, since I’ve placed the frontend under the "frontend" directory in the monorepo, I’ll rewrite the entire history of the frontend repository so that all commits occur within that directory. This will allow us to seamlessly merge the histories later on. To do this, you'll need git-filter-repo installed on your system, as it is not a standard Git command.
# Re-write the source repo's history
cd frontend-temp
git filter-repo --to-subdirectory-filter frontend --force
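Before going further, it can be worth sanity-checking the rewrite. Here is a minimal sketch, assuming a POSIX shell: the helper name paths_outside_dir is hypothetical, and it simply lists every path touched anywhere in the history and prints those that are not under the given directory. Run inside frontend-temp, it should print nothing and succeed.

```shell
# Hypothetical helper: print every path in the history that is NOT under "$1/".
# Returns 0 (and prints nothing) when all touched paths live under that directory.
paths_outside_dir() {
  # --name-only lists the files each commit touched; the empty format drops commit headers
  git log --all --name-only --pretty=format: \
    | sort -u \
    | grep -v -e '^$' -e "^$1/" \
    && return 1
  return 0
}
```

Usage, in frontend-temp: `paths_outside_dir frontend && echo "every commit now lives under frontend/"`.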
Note that rewriting the history this way discards all GPG signatures: every rewritten commit is a new object, so the old signatures no longer apply. To fix this, you can offer every developer who made a signed commit in that repository the chance to re-sign their own commits with the following command. This workaround is far from bulletproof. First, it's slow, impractical, and annoying; second, it could make developers sign commits they didn't originally sign (including potentially "forged" ones). However, since I'm the only developer on the team and the first one to sign his commits, I’m gonna do it anyway.
# (optional) re-sign commits
git filter-branch --commit-filter 'if [ "$GIT_COMMITTER_EMAIL" = "<your-email>" ]; \
then git commit-tree -S "$@"; \
else git commit-tree "$@"; \
fi' HEAD
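To check which commits actually ended up signed, here is a small sketch. The helper names signature_report and unsigned_count are hypothetical; they only wrap git log with the standard %G? placeholder (N means no signature, G a good one, other letters cover bad, expired, revoked, or unverifiable signatures) and %ce (committer email).

```shell
# Hypothetical helper: print "<short-sha> <signature-status> <committer-email>" per commit.
signature_report() {
  git log --pretty='%h %G? %ce' "${1:-HEAD}"
}

# Hypothetical helper: count commits by committer email "$1" that are still unsigned (%G? = N).
unsigned_count() {
  git log --pretty='%G? %ce' "${2:-HEAD}" \
    | awk -v email="$1" '$1 == "N" && $2 == email { n++ } END { print n + 0 }'
}
```

For example, `unsigned_count <your-email>` should print 0 after the re-signing step if everything went well.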
Now, we’ll navigate into our monorepo clone, add the rewritten source repository as a remote, and fetch it.
# Add source repo as a remote for the target repo
cd ../monorepo-temp
git remote add frontend-temp ../frontend-temp
git fetch frontend-temp master
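The fetch above assumes the source branch is called master. If you're not sure, or want to confirm that the two histories really share no commits yet, here is a small sketch; current_branch and histories_unrelated are hypothetical helper names wrapping git symbolic-ref and git merge-base.

```shell
# Hypothetical helper: print the branch currently checked out in a local repository.
current_branch() {
  git -C "$1" symbolic-ref --short HEAD
}

# Hypothetical helper: succeed only if the two revisions share no common ancestor.
histories_unrelated() {
  git merge-base "$1" "$2" >/dev/null 2>&1
  # merge-base exits 1 when there is no common ancestor (and 128 on errors)
  [ $? -eq 1 ]
}
```

For example: `git fetch frontend-temp "$(current_branch ../frontend-temp)"`, then `histories_unrelated HEAD frontend-temp/master && echo "not linked yet"`.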
Next, it's time to link the histories. To do so, we'll replace the target's first commit with an otherwise identical commit that also has the source's latest commit as a parent. We can do that using the following Bash function:
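# Function to add one or more parents to a commit (requires bash)
add_parent() {
  local old_commit="$1"; shift
  local old_parents tree message p new_commit
  old_parents="$(git log --pretty=%P -1 "$old_commit")"
  tree="$(git log --pretty=%T -1 "$old_commit")"
  message="$(git log --pretty=%B -1 "$old_commit")"
  # Build "-p <parent>" arguments: the existing parents first, then the new ones
  local parent_args=()
  for p in $old_parents "$@"; do parent_args+=(-p "$p"); done
  # Keep the original author and dates; remove --gpg-sign (here and below) if you don't sign your commits
  new_commit="$(GIT_AUTHOR_NAME="$(git log --pretty=%an -1 "$old_commit")" \
    GIT_AUTHOR_EMAIL="$(git log --pretty=%ae -1 "$old_commit")" \
    GIT_AUTHOR_DATE="$(git log --pretty=%aI -1 "$old_commit")" \
    GIT_COMMITTER_DATE="$(git log --pretty=%cI -1 "$old_commit")" \
    git commit-tree "$tree" --gpg-sign -m "$message" "${parent_args[@]}")" || return
  # Replay the later commits onto the new one (add --rebase-merges to keep merge commits)
  git rebase --gpg-sign --committer-date-is-author-date --onto "$new_commit" "$old_commit"
}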
This function first calls git commit-tree to create a commit object that shares the same tree, message, and parents as the original commit, but adds the new parents and carries its own GPG signature. It then invokes git rebase --onto to replay every later commit on top of the new one, which effectively replaces the original commit. Here's how to use it:
# Linking the commit histories
add_parent <first-commit-sha1> frontend-temp/master
# You can also add multiple parents at once
add_parent <first-commit-sha1> frontend-temp/master backend-temp/master ...
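To fill in <first-commit-sha1>, you can ask Git for the commits on the current branch that have no parents. A sketch, with root_commits as a hypothetical name around git rev-list --max-parents=0; if your monorepo has more than one root, pick the right one by hand.

```shell
# Hypothetical helper: print the root commit(s), i.e. commits without parents,
# reachable from the given revision (HEAD by default).
root_commits() {
  git rev-list --max-parents=0 "${1:-HEAD}"
}
```

In monorepo-temp this would be used as `add_parent "$(root_commits)" frontend-temp/master`. The fetched frontend commits are not reachable from HEAD yet, so they don't show up as extra roots.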
Did you know that a commit can have more than one or even two parents? Commits with two parents are called "merge commits," and those with more than two are colloquially called "octopus merges."
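To see this for yourself, the sketch below builds a throwaway repository with three parentless commits and one commit that has all three as parents. The function name octopus_demo and the placeholder identity are hypothetical; it uses only plumbing commands (git mktree, git commit-tree) and deletes the repository when it's done.

```shell
# Hypothetical demo: create a commit with three parents in a throwaway
# repository and print how many parents it has.
octopus_demo() (
  repo="$(mktemp -d)"
  trap 'rm -rf "$repo"' EXIT
  cd "$repo" || exit 1
  git init -q
  # Placeholder identity so commit-tree works without any user config
  export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
  export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
  # Unsigned commits are fine here; the repository is deleted on exit
  mkcommit() { git -c commit.gpgsign=false commit-tree "$@"; }
  tree="$(git mktree </dev/null)"   # empty input gives the empty tree
  a="$(mkcommit "$tree" -m "root A")"
  b="$(mkcommit "$tree" -m "root B")"
  c="$(mkcommit "$tree" -m "root C")"
  m="$(mkcommit "$tree" -m "octopus" -p "$a" -p "$b" -p "$c")"
  # %P lists the parent hashes separated by spaces
  git log --pretty=%P -1 "$m" | wc -w | tr -d ' '
)
```

Running `octopus_demo` prints 3.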
Cleaning Up
Finally, you can remove the remote and the temporary source clone:
# Cleaning up
git remote remove frontend-temp
rm -rf ../frontend-temp
# monorepo-temp is now our main worktree, publish it with `git push --force-with-lease`
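Before publishing, it's worth confirming that the original goal is met: file history reachable from the CLI across the link. The helper name file_history is hypothetical and simply wraps git log --follow. Because filter-repo rewrote every imported commit to live under frontend/, the paths already match on both sides of the link; --follow additionally tracks later renames.

```shell
# Hypothetical helper: print the one-line history of a single file, following renames.
file_history() {
  git log --follow --oneline -- "$1"
}
```

For example, `file_history frontend/package.json` (an example path) should now list commits from both the old frontend repository and the monorepo.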
Conclusion
Ideally, all of this would have been done right when the source repository was first copied into the monorepo. As always with Git, there’s more than one way to solve a problem, and this method just happens to be the one I came up with for my specific use case. I have no doubt that most people following this guide will run into their own challenges and bumps along the way, things I can’t possibly foresee. But I believe these steps should take you far enough to find your way through. That’s all. Peace.