Sequential Merge Strategy in Large-Scale Repository

MalBot · January 8, 2024, 7:20am

Problem

Merging pull requests into a large-scale repository with hundreds of developers working together can be challenging. When a pull request (PR) is raised, CI checks run on top of it. As the repository and the number of contributors grow, so does the number of pull requests.

Ideally, CI should always run with the latest main (default) branch as the base.

Assuming a pull request takes 10 minutes to pass CI, on a large-scale repo it is almost certain that other pull requests are merged during those 10 minutes. When that happens, the pull request has to be updated again with the latest base branch code, and CI has to be restarted.
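To put a rough number on "almost certain", here is a small sketch. It assumes, purely for illustration, that merges into main arrive independently at a steady average rate (a Poisson process). The article does not state this model, and the merge rates below are made-up values, not measurements.

```python
import math


def prob_base_moves(merges_per_hour: float, ci_minutes: float) -> float:
    """Chance that at least one other PR merges while CI is still running.

    Assumes merges arrive as a Poisson process (an illustrative assumption).
    """
    expected_merges = merges_per_hour * ci_minutes / 60.0
    return 1.0 - math.exp(-expected_merges)


if __name__ == "__main__":
    # Hypothetical merge rates for a 10-minute CI run.
    for rate in (6, 30, 60):
        print(f"{rate} merges/hour -> {prob_base_moves(rate, 10):.1%}")
```

Under this assumption, the probability climbs quickly towards 1 as the merge rate grows, which is why re-running CI after every base update does not scale.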

Solution

Optimistic Merge Strategy: The traditional optimistic merge strategy allows developers to merge pull requests as long as they pass CI checks, even if their branch is not up to date with the latest changes in the main branch. This strategy can merge breaking code into the base branch, because each pull request is tested without taking into account the other pull requests merged around the same time.
Sequential Merge Strategy: CI runs on the combined code and pull requests are merged in sequence, which keeps the main branch green.

In this article, we will discuss the pros and cons of the Optimistic Merge Strategy and how the hybrid approach (Sequential Merge Strategy) works.

Optimistic Merge Strategy

While this approach is simple and convenient, it can introduce instability to the main branch, especially in large-scale repositories with numerous active developers.

There are multiple cases where it fails. We will discuss one basic example here:

Let’s say we have file X in the main branch:
obj={x:1}

Let’s say we have PR1, which changes file X:
obj={x:1,y:1}

We have PR2, which changes the same file X:
obj={x:1,y:2}

Individually, both pull requests pass CI, but after both are merged optimistically, the main branch can end up unstable:
obj={x:1,y:1,y:2}
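
The same failure can be reproduced in a few lines. This is a minimal sketch, not the real pipeline: file X is modelled as a list of key/value entries, and the hypothetical CI check `ci_passes` only verifies that no key is defined twice.

```python
Entries = list[tuple[str, int]]


def ci_passes(entries: Entries) -> bool:
    """Hypothetical CI check: file X must not define the same key twice."""
    keys = [key for key, _ in entries]
    return len(keys) == len(set(keys))


def apply_change(base: Entries, added: Entries) -> Entries:
    """Return a new version of file X with a pull request's entries appended."""
    return base + added


main = [("x", 1)]   # obj={x:1}
pr1 = [("y", 1)]    # PR1 results in obj={x:1,y:1}
pr2 = [("y", 2)]    # PR2 results in obj={x:1,y:2}

# Each pull request is green when tested against the main it branched from.
pr1_alone = ci_passes(apply_change(main, pr1))
pr2_alone = ci_passes(apply_change(main, pr2))

# Optimistic merging lands both without re-running CI on the combination.
merged_main = apply_change(apply_change(main, pr1), pr2)
combined = ci_passes(merged_main)

print(pr1_alone, pr2_alone, combined)  # True True False
```

Both pull requests are green in isolation, yet the combination that actually lands on main is never tested.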

To resolve this issue, we will go with the hybrid approach (Sequential Merge Strategy).

Sequential Merge Strategy

Idea

The sequential merge strategy addresses the limitations of optimistic merging by running CI checks on the combined code of the main branch and the pull requests queued ahead of the one being tested. This keeps the main branch stable and free of merge conflicts, even when multiple pull requests are being merged at the same time.

Implementation

As soon as a user tries to merge a pull request, it is added to the merge queue, a queue held in temporary memory. Once the pull request merges or fails, it is dequeued.
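
As a rough illustration of that behaviour, the queue could look like the sketch below. The class name `MergeQueue` and its methods are hypothetical. The article does not describe the real data structure beyond it being a queue in temporary memory.

```python
from collections import deque
from typing import Optional


class MergeQueue:
    """In-memory FIFO of pull request ids waiting to be merged (hypothetical)."""

    def __init__(self) -> None:
        self._items = deque()

    def enqueue(self, pr: str) -> None:
        """Called when a user tries to merge the pull request."""
        if pr not in self._items:  # ignore repeated merge attempts
            self._items.append(pr)

    def dequeue(self, pr: str) -> None:
        """Called once the pull request has merged or has failed CI."""
        self._items.remove(pr)

    def head(self) -> Optional[str]:
        """The first pull request in the queue, or None if the queue is empty."""
        return self._items[0] if self._items else None

    def snapshot(self) -> list:
        """Current queue order, oldest first."""
        return list(self._items)
```

The order of insertion matters here, because it later decides which code each pull request is tested with.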

The proposed solution is to start a CI job as soon as a pull request is added to the merge queue. That job should include all the code ahead of the pull request, even code that is not merged yet: because merging happens in sequential order, each later pull request must be tested with everything in main plus the pull requests queued before it. A simple example: there are three pull requests (PR1, PR2, and PR3) in the merge queue.

As the first step, a temporary branch is created for each pull request. For PR1, the temporary branch is named pr1-merge-temp. Since PR1 is the first pull request in the merge queue, its temporary branch is updated from the main branch, so it contains the changes from PR1 plus any code added to the base branch after the pull request was created. CI runs on this temporary branch, and the Nx affected CI plan is created with main as the base (because PR1 is first in the merge queue) and pr1-merge-temp as the head.

For PR2, another temporary branch called pr2-merge-temp is created, combining the code from PR2 with the previous temporary branch (pr1-merge-temp). CI runs on this temporary branch, and the CI plan is created with the previous temporary branch (pr1-merge-temp) as the base and pr2-merge-temp as the head.

The same process is repeated for PR3, with pr2-merge-temp as the base.
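
The chaining of bases and heads described above can be sketched as follows. The `CiPlan` type and `plan_queue` function are hypothetical names used only for illustration. The branch naming follows the pr1-merge-temp pattern from the example. If Nx is the build tool, each base/head pair would presumably be passed to the `--base` and `--head` options of `nx affected`, but that mapping is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiPlan:
    pr: str
    base: str  # branch the temporary branch is built on and compared against
    head: str  # temporary branch that CI actually runs on


def plan_queue(queue: list, main: str = "main") -> list:
    """Build one CI plan per queued pull request, each based on the previous one."""
    plans = []
    base = main  # the first pull request in the queue is based on main
    for pr in queue:
        head = f"{pr.lower()}-merge-temp"
        plans.append(CiPlan(pr=pr, base=base, head=head))
        base = head  # the next pull request builds on this temporary branch
    return plans


if __name__ == "__main__":
    for plan in plan_queue(["PR1", "PR2", "PR3"]):
        print(f"{plan.pr}: base={plan.base} head={plan.head}")
```

Because every plan only compares against the branch directly before it, each CI run covers just the changes that its own pull request adds on top of the queue.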

How to Merge

Every 30 seconds, the program checks the CI status of the first pull request in the queue only. If CI is green, the pull request is merged.

Let’s say CI passes for PR1, so PR1 can be merged into the main branch. PR2 then becomes the first item in the queue. If CI passes for PR2, it can be merged as well: CI passed on its temporary branch (base + PR1 + PR2) against the previous temporary branch (base + PR1), and PR1 is already merged.

Please keep in mind that only the top item in the queue is ever merged, which preserves the order.
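
A single polling tick could be sketched like this. The function `poll_once`, the `"green"` status string, and the `ci_status` and `merge` callbacks are all hypothetical stand-ins for whatever the real CI and Git hosting APIs provide. Failure handling is left out here on purpose.

```python
from collections import deque
from typing import Callable, Optional


def poll_once(
    queue: deque,
    ci_status: Callable[[str], Optional[str]],
    merge: Callable[[str], None],
) -> bool:
    """One polling tick: look only at the first pull request in the queue.

    Returns True if that pull request was merged on this tick.
    """
    if not queue:
        return False
    first = queue[0]
    if ci_status(first) != "green":
        return False  # later items wait, even if their own CI is already green
    merge(first)
    queue.popleft()
    return True


if __name__ == "__main__":
    statuses = {"PR1": "running", "PR2": "green"}
    merged = []
    queue = deque(["PR1", "PR2"])

    poll_once(queue, statuses.get, merged.append)  # PR1 not green yet
    statuses["PR1"] = "green"
    poll_once(queue, statuses.get, merged.append)  # merges PR1
    poll_once(queue, statuses.get, merged.append)  # merges PR2
    print(merged)
```

In a real service this function would be called in a loop with a 30-second pause between ticks, for example with `time.sleep(30)`.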

Failure Case

If one of the items fails CI (let’s assume it is PR2), it is removed from the merge queue and the process is restarted for all later pull requests, so that their temporary branches no longer contain the code from PR2.

The above graph shows why it must restart the process: the temporary branches of all later pull requests were built on top of the failed branch, and their CI is running on that combined code. Since the failed pull request has been removed from the queue, all later pull requests in the merge queue need to be updated.
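
The restart can be sketched as below. The function names `handle_failure` and `temp_name` are hypothetical, and reusing the same temporary branch names after a rebuild is an assumption. The article does not say how rebuilt branches are named.

```python
def temp_name(pr: str) -> str:
    """Temporary branch name, following the pr1-merge-temp pattern."""
    return f"{pr.lower()}-merge-temp"


def handle_failure(queue: list, failed: str, main: str = "main"):
    """Drop the failed PR and list the later PRs that need a fresh temp branch.

    Returns (new_queue, rebuilds), where rebuilds holds one
    (pr, new_base, temp_branch) tuple per pull request that was queued
    behind the failed one. Pull requests ahead of it are left untouched.
    """
    index = queue.index(failed)
    new_queue = queue[:index] + queue[index + 1:]
    rebuilds = []
    for position in range(index, len(new_queue)):
        pr = new_queue[position]
        # Base is main for the first item, otherwise the previous temp branch.
        base = main if position == 0 else temp_name(new_queue[position - 1])
        rebuilds.append((pr, base, temp_name(pr)))
    return new_queue, rebuilds


if __name__ == "__main__":
    new_queue, rebuilds = handle_failure(["PR1", "PR2", "PR3", "PR4"], "PR2")
    print(new_queue)
    for pr, base, temp in rebuilds:
        print(f"rebuild {temp} for {pr} on top of {base}")
```

In this example PR1 keeps its CI run, while PR3 is rebuilt on top of pr1-merge-temp and PR4 on top of the rebuilt pr3-merge-temp, so none of the remaining branches carries the code from PR2.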

Benefits

The sequential merge strategy offers several advantages over traditional merge strategies:

Stability: The main branch remains stable.
Predictability: Merges are predictable and follow a clear order, minimising the risk of unexpected behavior.
Reliability: CI checks are run on the combined code, ensuring that the entire codebase is tested and validated.

Conclusion

The sequential merge strategy provides a robust and efficient approach to managing pull requests in large-scale repositories. By ensuring that CI checks are run on the combined code and merging pull requests in a sequential manner, this strategy helps maintain a stable and reliable codebase, even with a large number of active developers working on the project.

Sequential Merge Strategy in Large-Scale Repository was originally published in Walmart Global Tech Blog on Medium.

Article Link: Sequential Merge Strategy in Large-Scale Repository | by Diksha Agarwal | Walmart Global Tech Blog | Jan, 2024 | Medium