
How to achieve a smooth transition from “the one big VCS repository for all products” organisation model to the “many small VCS repositories” model?

Submitted by: @import:stackexchange-devops

Problem

It is a common scenario that the codebase of a product, held in a single VCS repository, evolves to a point where it can arguably be seen as containing several products. Splitting the codebase across several VCS repositories, each dedicated to a single product, can bring several benefits (see Benefits of having a product per VCS repository over the bloat repository model below). On the technical side, splitting the codebase is a rather easy step, as most VCS support this operation. The split, however, might raise engineering issues related to automated testing, continuous delivery, service integration or monitoring (see Issues raised by the split). Organisations planning to perform such a split therefore need to know how to perform this transition as smoothly as possible, that is, without interrupting their delivery and monitoring pipeline. The first step is probably to better understand the notion of project and how to delineate the split in a monolithic codebase.

In the answers to this question, I would like to see:

- An attempt to give a working definition of what a product is, one that gives practical criteria to actually delineate products in an existing codebase.

- According to this working definition, a plan that actually performs the split. We can make the simplifying assumption that the codebase is processed by a fully automated SDLC implementing continuous integration and continuous delivery. That is, each branch is validated by an automated test suite implemented in the current codebase, and each merge to some “magic” branch generates product artefacts that are tested and deployed. (Product artefacts are e.g. source tarballs, documentation, binary software packages, Docker images, AMIs, unikernels.)

- Such a plan is satisfying if it explains how to circumvent the

Issues raised by the split

- How do automated testing procedures relate to the pre-existing monolithic repository and the split repositories?

- How auto

Solution

It is a fascinating question for which real answers may not actually exist. I appreciate that, while you tried to keep the question in the context of the VCS, it naturally scaled up by itself to infrastructure design and implementation planning.

Still, it seems many of us are working on this kind of transition, which can be exciting but at the same time frustrating and painful. I would like to share my experiences and views, trying not to be pedantic and, since I may not be such a good engineer, not to be boring either.

Design

Infrastructure and architecture should go together when writing modern software. Tightly coupled, if you want. It might sound weird to use those words, but we are certainly not talking about the code itself here: I mean they must be part of the same blueprint. When the clouds arrived and people started to write software for them, how many realized that by putting the mudballs there, they would just be the same mudballs in a different place? Maybe a few forward-thinking people could foresee that, and they are likely working in devops today. As devops is just a buzzword with so many different meanings for different people, I have seen places in which the devops team sits in every architecture meeting, and other places in which it is automation only. To achieve this kind of transformation, we have to fight our way to sit there.

Confidence

The transition must be kept isolated, in the sense that a consistent cut of history must exist that provides the transition itself and the transition only, without any other change (after several months of preparation). With what confidence would one approve it and push the red button?

I mean, the codebase must change to accommodate the new VCS structure, and it will be very difficult to keep it merged during development. (There may be strategies that ease this issue; I'll talk about a common one later that can help parallelize development a bit.)

Well, I bet the only way is behavioural testing: the same suite of behaviour tests should be run against both the old and the new codebase. We are not verifying that the application behaves as expected here, but that the transition does not alter the behaviour. Having failing tests may be a good thing, if they continue to fail!
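To make the idea of comparing behaviour rather than checking correctness a bit more concrete, here is a minimal sketch in Python. It assumes the same behaviour suite has been run against the old and the new codebase and that each run has been reduced to a map from test name to outcome. The function name `behaviour_drift`, the test names and the outcome labels are all hypothetical, made up for illustration only.

```python
from typing import Dict, List


def behaviour_drift(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return a description of every test whose outcome differs between runs.

    The outcome maps are an assumption of this sketch: test name -> "pass"
    or "fail", as parsed from whatever report the behaviour suite produces.
    """
    drift = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "missing")
        after = new.get(name, "missing")
        if before != after:
            drift.append(f"{name}: {before} -> {after}")
    return drift


# Hypothetical results: "checkout" fails before and after the split, which
# is fine; "search" changes outcome, which is what we want to catch.
old_run = {"login": "pass", "checkout": "fail", "search": "pass"}
new_run = {"login": "pass", "checkout": "fail", "search": "fail"}

for line in behaviour_drift(old_run, new_run):
    print(line)
```

A test that fails in both runs is not reported, since only a change of outcome (or a test that disappeared during the split) counts as drift.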

In fact, it is very uncommon for mudballs to be well tested; usually the code is very tightly coupled and, like most legacy code, it was likely not developed with a test-driven approach, and may not even have unit tests.

If such test code is missing altogether, it must be written first.

Strategy

Yes, the transition must be kept isolated, but at the same time integrated. I know I may sound crazy here, but I can't find other words to describe how confidence can keep up with the business. Very few companies, if any at all, would like to stop development on a big monolithic codebase to make space for this transition, and we are not going to make it happen at the toss of a coin. Hundreds of developers might be continuously contributing to the codebase (from our POV, I would use the word tampering here). While our work must address a specific snapshot to provide confidence, we have to keep ourselves rebased (not in the git sense here) to avoid falling behind forever.

The implementation strategy here can lead to different experiences. A common line of development is to wrap/adapt newer implementation branches (living in other repositories, in this case) by exposing endpoints, optionally with rearranged schemes, when they need to interact with the core. Transitioning with a strategy like this, along with refactoring, can offer a POC scenario for the VCS transition and, later on, a step-by-step approach. See it as sculpting the ball of mud. Yeah, life offers so many funnier things.
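One possible reading of the wrap/adapt idea, sketched in Python: code that already lives in its own repository talks to the core only through a thin adapter, which also rearranges the data layout. Everything here is hypothetical (`LegacyBilling`, `BillingAdapter`, `Invoice` and the field names are invented for the example), and treating "rearranged schemes" as a change of data layout is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict


class LegacyBilling:
    """Stands in for code that still lives in the monolith (hypothetical)."""

    def compute_invoice(self, customer_id: int, amount_cents: int) -> Dict[str, int]:
        # The legacy layout uses terse keys that the new code should not see.
        return {"cust": customer_id, "total": amount_cents}


@dataclass
class Invoice:
    """Layout the extracted repositories work with (hypothetical)."""

    customer_id: int
    total_cents: int


class BillingAdapter:
    """Single endpoint through which extracted code reaches the core."""

    def __init__(self, core: LegacyBilling) -> None:
        self._core = core

    def invoice(self, customer_id: int, total_cents: int) -> Invoice:
        raw = self._core.compute_invoice(customer_id, total_cents)
        # Rearrange the legacy layout into the one the new code expects.
        return Invoice(customer_id=raw["cust"], total_cents=raw["total"])


adapter = BillingAdapter(LegacyBilling())
print(adapter.invoice(42, 1999))
```

The point of the design is that the adapter is the only place that knows both layouts, so the core can be refactored, or replaced piece by piece, without the extracted repositories noticing.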

Technical Debt

Business management spheres have started to understand technical debt and take it into consideration. Nope, scratch that, not true. While it is increasingly common to gather measurements and quality data (static code analysis, code review, behavioural test results, performance) and to generate nice reports and everything, it still remains incredibly difficult to make the business accept a continuous refactoring approach, and the business value of it. "We are following an agile process, and this will not bring any enhancement to the features, will it?" Basically, by doing so they are denying the existence of technical debt.
I see this as the commonly missing necessary step to be able to start any transition from monolith to microservices architectures.

Reaggregation

After all this, you might still want to provide a single repository-like view in which you can build more than one product, for any reason, e.g. current/next release, multibrand, customer builds. Tools like Google's repo may help in this case. I have never used one myself, but I know I'll need one some day.

Integration testing

With microservices, integration testing assumes a diff

Context

StackExchange DevOps Q#171, answer score: 9
