Short Story of a Long Migration

How we migrated the Zalando Logistics Operating Services to Java 8

Oleksandr Volynets

Software Engineer

Posted on Apr 26, 2018

“Never touch working code!” goes the old saying. How often do you disregard this message and touch a big monolithic system? This article tells you why you should ignore common wisdom and, in fact, do it even more often.

Preface

Migrations of various kinds are a natural part of software development. Perhaps the current database no longer scales. Perhaps a new tech stack is needed because the existing one does not meet changing requirements. Or perhaps you face the hard migration from a monolithic application to a microservice architecture. There are also smaller-scale migrations, like upgrading to a newer version of a dependency, e.g. Spring or the Java Runtime Environment (JRE). This is the story of how a relatively simple task, migrating from Java 7 to Java 8, was performed on a large-scale monolithic application that is critical to the business.

Zalos as the service for Logistics Operations

Zalos (Zalando Logistics System) is a set of Java services, backend and frontend, that contains submodules to operate most functions inside the warehouses operated by Zalando. The scale of Zalos can be summarized by the following statistics:

  • more than 80,000 Git commits,
  • more than 70 active developers in 2017,
  • almost 500 Maven submodules,
  • around 13,000 Java classes with 1.3 million lines of code, plus numerous production and test resource files,
  • around 600 PostgreSQL tables and more than 3,000 stored procedures.

Zalos 2, denoted as just Zalos below, is the second generation of the system, and has grown to this size over the past five years. Patterns that were easy to adopt at the time for scaling up architectural functionality quickly became a bottleneck as the number of teams maintaining the system grew. It is deployed to all Zalando warehouses every second week, and every week there is a special procedure to create a new release branch. Each deployment takes about five hours, and branching takes about the same time. With urgent patches on top, regular deployment and maintenance operations take up a significant portion of each team’s time.

Now, what happens if the system is left unmaintained for a while? The package dependencies and Java libraries become obsolete and, as a consequence, security risks grow. Then, one day, one of the core infrastructure systems has to change the SSL certificate, and this causes downtime in all legacy systems that still run a deprecated Java version. For the logistics services, such problems could turn into a disaster, and you start thinking: “What does it take to migrate Zalos from Java 7 to Java 8?”

Migration? Easy!

With some basic Java 9 experience behind us, we rejected the option of going even further pretty fast: combining Java 9 modularity with 500 submodules did not look promising. Well, bad luck. What else do you need to keep in mind for Java 8 support? Spring? Sure. GWT? Maybe. Guava? Oh yes. Generics? This too.

This is a good time to talk about the tech stack for Zalos. It contains backend as well as frontend parts, both running Spring 3. The backend uses PostgreSQL databases via the awesome sprocwrapper library. Both backend and frontend rely on Zalando-internal parent packages to take care of dependency management. The frontend engine is GWT 2.4 with some SmartGWT widgets. And, to mention one more challenge, it uses Maven overlays with JavaScript, but more on this later.

Our first strategy was to bump as many package dependencies as we could: Spring 4, which fully supports Java 8; GWT 2.8.2, which already supports Java 9; Guava 23.0; and so on. But we were on GWT 2.4, so that would have been a jump of more than five years of development. A hard dependency on our internal Zalando dependencies ruled out the major Spring upgrade too. Guava 23 has deprecated some methods, and we would have needed to change a considerable amount of code. Again, a failure.

Let’s try another strategy then: bump as little as we can. This strategy worked much better. We only needed Spring 3.2.13 and Guava 20.0, plus a few required upgrades such as javassist and org.reflections. The matrix of compatible versions is shown in the appendix. The GWT dependency was left untouched, although this limits our client code to Java 7. A compromise, but not a blocker: there is little active development of new GWT code anyway.

Now, overlays, or in our case Dependency Hell. Overlays are a Maven feature that includes the contents of a WAR or ZIP file in your artifact, “inlining” the complete package as is, together with all of its dependencies. For example, should an overlay bring a different version of spring-core, you get two versions of spring-core in the final WAR artifact. When the application starts, it is ambiguous which version gets used for which part of the application, and various ClassNotFound exceptions pop up. Bad luck: all WAR overlays have to be republished with updated dependencies.
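
To make the overlay problem concrete, here is a minimal sketch of a check for duplicate library versions. It is not the tooling we used; the class name DuplicateJarCheck, the file names in main, and the assumption that jars follow the usual "<artifactId>-<version>.jar" naming are all illustrative. It takes the file names found in a WAR's WEB-INF/lib directory and reports every artifact that appears in more than one version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateJarCheck {

    // Assumption: jars are named "<artifactId>-<version>.jar" and the version starts with a digit.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d[\\w.\\-]*)\\.jar$");

    /** Returns artifactId -> versions for every artifact present in more than one version. */
    public static Map<String, Set<String>> findConflicts(List<String> jarNames) {
        Map<String, Set<String>> versionsByArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (!m.matches()) {
                continue; // skip names that do not follow the assumed pattern
            }
            versionsByArtifact
                    .computeIfAbsent(m.group(1), key -> new TreeSet<>())
                    .add(m.group(2));
        }
        // Keep only artifacts that occur with at least two distinct versions.
        versionsByArtifact.values().removeIf(versions -> versions.size() < 2);
        return versionsByArtifact;
    }

    public static void main(String[] args) {
        // Hypothetical WEB-INF/lib listing: one spring-core from the application, one from an overlay.
        List<String> libs = Arrays.asList(
                "spring-core-3.2.13.RELEASE.jar",
                "spring-core-3.0.5.RELEASE.jar",
                "guava-20.0.jar");
        System.out.println(findConflicts(libs));
    }
}
```

A check of this kind could run against the final WAR before deployment, so that a stale overlay shows up as a reported conflict rather than as a class loading failure at startup.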

Go-live or don’t rush?

It took just two weeks of highly-motivated and self-driven work for two people to crack the problem and run the 500-module monolith on a laptop with Java 8. It took two more weeks to deploy it to the staging environment after fixing multiple issues. After that, it took two more months to finally deploy it to the production environment. Why so long? Because we are dealing with a highly critical system that has several serious constraints. Here they are:

  1. Deployments. Deployment to production lasts up to five hours and it should not interfere with any other deployment, due to internal limitations of the deployment system. With absolute priority for production deployment there isn’t much time for experimenting with the migration. Solution? Tweaking the deployment service reduced deployment time by about one third, which gave us some freedom for experimenting on a staging environment.
  2. Development. There are still about 25 commits per day in the main branch. Breaking it would have a significant impact on feature development, and it isn’t easy to experiment with JDK versions from the feature branch. This isn’t good, but still there is a more serious constraint.
  3. Warehouse operations. They are the backbone of an e-commerce company and should not be interrupted by the migration. The risk of any bug should be carefully minimized to maintain the service liveness.

To address at least two of these constraints, we created a concrete three-step plan that let us execute the migration safely and roll back at any time:

  1. Upgrade all packages to versions compatible with both Java 7 and 8, without changing the runtime version. This ensured that there were no changes for deployment.
  2. Switch to Java 8 runtime (JRE) keeping source code in Java 7 mode. This step ensured that we can safely change the deployment settings without touching the code and dependencies.
  3. Switch to Java 8 development mode to fully support Java 8 features. No major deployment changes were done with this step.
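
Step 2 changes only the runtime, so it can help to make the running version visible at startup. The following is a small sketch of such a guard, not code from Zalos; the class name RuntimeCheck and the failure behaviour are assumptions for illustration. It reads the standard java.specification.version system property, which is "1.7" or "1.8" on older runtimes and "9", "11", etc. on newer ones.

```java
public class RuntimeCheck {

    /** Maps a specification version string to its major number: "1.7" -> 7, "1.8" -> 8, "9" -> 9. */
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Up to Java 8 the scheme was "1.x"; from Java 9 on the first number is the major version.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major < 8) {
            // Hypothetical policy: refuse to start on a runtime older than Java 8.
            throw new IllegalStateException("Java 8 or newer required, but running on " + spec);
        }
        System.out.println("Running on Java " + major);
    }
}
```

Because the sketch itself uses no Java 8 language features, it also compiles in Java 7 source mode, which matches the situation in step 2.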

In addition to the staging environment, every step was carefully tested on a so-called beta environment, which operates on production data.

Outlook

The migration was completed despite some failed attempts a few years ago. Several things have happened as a result. The service has become a little more stable and secure. The code can now be written with lambdas, method references, etc. The deployment service has been improved too. But most importantly, the legacy system got attention. Even though we had one camp of people who said, “We tried that before, why do you want to try again?” there was also the second camp with, “You are crazy but yeah, do it”. No matter what was tried before and in what manner, it is never too late to try again.
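
As a small illustration of the lambdas and method references mentioned above, the sketch below sorts a list case-insensitively, first in Java 7 style and then in Java 8 style. The class name SortExample and the sample values are made up for this example and are not taken from Zalos.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortExample {

    /** Java 7 style: copy the list and sort it with an anonymous Comparator. */
    public static List<String> sortedJava7(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
        return copy;
    }

    /** Java 8 style: the same ordering with a stream and a method reference. */
    public static List<String> sortedJava8(List<String> names) {
        return names.stream()
                .sorted(String::compareToIgnoreCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("picking", "Packing", "inbound");
        System.out.println(sortedJava7(names));
        System.out.println(sortedJava8(names));
    }
}
```

Both methods return the same result; the Java 8 version simply drops the anonymous class boilerplate.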

Keep your legacy code under careful supervision: add code quality metrics, minimize maintenance efforts, optimize release cycles. With this, you will no longer have “Legacy Nightmares” but a well-maintained piece of code.

Appendix

Here is a list of Maven dependencies and related changes that finally made everything work together:

[Dependency table not preserved in this copy.]

In addition, the following compilation and runtime settings were required:

  • source and target properties for maven-compiler-plugin set to 1.8
  • Tomcat 7, i.e. run services with “mvn tomcat7:run-war” and not “mvn tomcat:run-war”, which uses Tomcat 6 by default.

