Incremental development

Introducing significant code changes to working software is scary. Many things might go wrong, and in the best case, you’ll be able to roll back to the previous release quickly. But in the worst case, you already migrated your database and removed legacy columns, making rollback impossible 😱 Now you’re firefighting, trying to fix all issues on production ASAP.

I’ve experienced all of that in my development career. Over time, I learned to deal with it in a less disruptive way. The secret is an incremental approach to significant code changes. Instead of trying to make all necessary changes in one go, we should split them into small chunks. Every chunk should be small enough to be safe to deploy but big enough to make meaningful progress towards the end goal. Let me illustrate this with examples.

Example 1: New big feature

I will use a feature from my Bulk Price Editor app as an illustration. The app used to have only tasks for bulk editing prices. But I wanted to add a new big feature - the ability to launch sales. This was a significant code change as it affected the core of the app. It required changes to all parts of the code:

Adding new database columns and tables
Introducing new models
Making shared tables and code polymorphic
Updating most views, etc.

Trying to make these changes in one go felt very scary to me. Just imagine a huge PR with an enormous diff that changes dozens of files and all layers of the application:

How would you review it?
How would you test it?
Most importantly, how would you release it?

If something goes wrong after release (and it usually does), it will be difficult to roll back as the database has already changed significantly. If you have multiple issues, you’ll need to try to fix them all at once. It’s a very stressful thing.

So I’ve chosen another way. Instead of trying to merge this monster PR, I started splitting it into chunks:

One chunk to create new tables (without using them in code yet)
Another to modify existing columns
The next chunk to make existing code polymorphic (without even introducing the new entity at this point)

The goal in this process is to make every chunk isolated and independently deployable. They all serve the final goal of adding this new big feature, but they do it one step at a time. If something goes wrong, you still have a chance to roll back that small change and fix the problem.

When preparation for the new feature is completed and all small chunks are merged, I usually introduce a feature flag. It can be as simple as a column on the customer’s model (e.g. sales_enabled:boolean). The feature flag is an important part, as it allows you to put new code into production without affecting existing customers. The feature flag is deployed in a separate PR too. When the feature flag is in place, it’s time to finalize the feature and deploy it to production. It’s safe, as nobody can access it yet. I usually try the new feature myself first, fix any last bugs, and then start rolling it out to all customers.
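Here is a minimal sketch of how such a flag might gate the new feature. It uses plain Ruby so it runs on its own; the Customer struct is a hypothetical stand-in for the real model, and menu_items_for and the menu labels are made up for illustration. Only the sales_enabled column name comes from the text above.

```ruby
# Hypothetical stand-in for the customer model. In a Rails app this would be
# an ActiveRecord model with a sales_enabled boolean column.
Customer = Struct.new(:name, :sales_enabled, keyword_init: true) do
  # Treat anything other than an explicit true (including nil) as "off".
  def sales_enabled?
    sales_enabled == true
  end
end

# Hypothetical helper: returns the menu entries a customer can see.
# The new "Sales" entry only appears when the flag is on for that customer.
def menu_items_for(customer)
  items = ["Price tasks"]
  items << "Sales" if customer.sales_enabled?
  items
end

existing_customer = Customer.new(name: "Existing shop", sales_enabled: false)
my_test_account = Customer.new(name: "My own shop", sales_enabled: true)

p menu_items_for(existing_customer) # prints ["Price tasks"]
p menu_items_for(my_test_account)   # prints ["Price tasks", "Sales"]
```

With a check like this in place, the new code can sit in production while the flag stays off for everyone except the accounts you switch on by hand.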

The algorithm for adding new features can be summarized as follows:

Start by creating a big draft PR that implements the feature to a minimal working state. At this step, you’re just prototyping and trying to make the feature work without thinking of the release process at all.
When you have a draft, start extracting smaller PRs that prepare different parts of the app: migrating the DB, changing existing code to work with the new feature, etc. Isolate these PRs and deploy them separately.
Next, add a feature flag and release it to production.
Release the new feature’s code under the feature flag.
Test it yourself on production and roll it out to all users.

Example 2: Upgrading legacy app

Multiple times in my development practice I have faced the task of upgrading a legacy application. And by legacy I mean not the previous Rails version, but rather multiple Rails versions behind, with outdated Ruby, lots of jQuery and an outdated frontend stack. In the worst case, such an app could also have performance issues due to unoptimized DB queries.

When you’re facing a case like this, you might think it would be easier to rewrite everything on an up-to-date stack. I once participated in such a rewrite on one of my projects. Spoiler alert - the new version was never released because everyone was scared to deploy it to production as it would inevitably hurt the business. And it’s understandable. Existing legacy software might be in a bad state, but it works. It serves customers and makes money. Yes, it might have bugs. But those are known bugs. New software will have new bugs and most likely lots of compatibility issues. And the scope of changes compared to the old app will be so big that debugging any issues after deployment will be a huge problem.

Instead of a full rewrite, I prefer applying similar incremental development principles to do the upgrade. The goal in the process is to make improvements one by one, making the app better without interrupting current operations. Here’s how it might look in practice (steps are taken from one of my recent upgrades):

Update minor dependencies as much as possible. Some of them can’t be updated at this point so we should leave them to the end.
Update the Ruby version, one minor version at a time, fixing all compatibility issues and checking that it is stable on production before moving to the next version.
Upgrade the Rails version, again, one minor version at a time.
Replace legacy bundling with a modern alternative. For example, switch from webpacker to jsbundling-rails.
Add Stimulus JS and replace jQuery with Stimulus controllers where possible.
Introduce Turbo, if necessary, and remove the legacy JS framework.
Upgrade or replace your CSS framework, if necessary.
Upgrade the remaining dependencies that couldn’t be updated in the first step.

Every one of these steps involves one or more PRs. And every step should be independently deployable. By isolating changes we’re reducing risks and making it much easier to debug and fix any potential issues. And I don’t remember any major upgrade without some sort of issue after release.
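To make the "one minor version at a time" idea concrete, here is a small sketch that plans the intermediate stops for an upgrade. The upgrade_path helper is hypothetical, and the version numbers are illustrative inputs rather than the versions from my actual upgrade. It relies on Gem::Version from RubyGems for version comparison and assumes you stop at the newest patch release of each minor line.

```ruby
require "rubygems" # provides Gem::Version; loaded by default in most setups

# Hypothetical helper: picks one release per minor version between the
# current and target versions, using the newest patch of each minor line.
def upgrade_path(current, target, available)
  current_v = Gem::Version.new(current)
  target_v = Gem::Version.new(target)

  available
    .map { |v| Gem::Version.new(v) }
    .select { |v| v > current_v && v <= target_v } # only versions still ahead of us
    .group_by { |v| v.segments.first(2) }          # group by [major, minor]
    .values
    .map(&:max)                                    # newest patch in each group
    .sort
    .map(&:to_s)
end

# Illustrative input list, not taken from a real project.
available = %w[5.2.3 5.2.8 6.0.0 6.0.6 6.1.0 6.1.7 7.0.4]

p upgrade_path("5.2.3", "6.1.7", available)
# prints ["5.2.8", "6.0.6", "6.1.7"]
```

Each entry in the result would be its own deployable step: upgrade, fix compatibility issues, watch production, and only then move on to the next one.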

Summary

There’s a simple signal that tells you when it’s time to try incremental development: you have a big PR that changes so much that you’re hesitant to deploy it to production. It most likely stays open for a while, and you even resist reviewing it, as it’s just too much scope to process at once.

When you notice these symptoms, try to create a new branch and extract a minimal set of changes from the big PR that are safe to deploy and move the needle forward a bit. It can be just a feature flag, or some refactoring required for the new feature. Release the changes from the new branch, check them on production and repeat: pick the next set of changes. Keep doing that until the old big PR becomes obsolete and the full scope of necessary changes is released.

Adopting this approach helps me ship big changes to my apps with minimal risk. It won’t magically fix all the errors that you might encounter after each release, but it will make it much easier to tackle them. Since the scope of changes is limited, it’s easier to find and fix them. And you almost always have the option to roll back until all issues are resolved.

How do you manage complex product changes and upgrades? Share it with me on X (Twitter) or over email. I’d love to hear your thoughts!