<p>Zalando Engineering Blog (<a href="https://engineering.zalando.com/">engineering.zalando.com</a>)</p>
<h2>12 Golden Signals To Discover Anomalies And Performance Issues on Your AWS RDS Fleet</h2>
<p><em>2024-02-20, by Dmitry Kolesnikov</em></p>
<p>Automate anomaly detection for AWS RDS at scale.</p>
<p><img alt="Logo rds-health utility" src="https://engineering.zalando.com/posts/2024/02/images/rds-health-v2.png#previewimage"></p>
<p><strong>TL;DR</strong>: The database-per-service pattern in the microservices world brings overhead in operating database instances and observing their health status and anomalies. Standardisation of methodology and tooling is a key success factor at scale. We have incorporated learnings from past incidents, anomalies and empirical observations into a methodology for observing health status using 12 golden signals. The simplest way to adopt this methodology within your engineering environment is <a href="https://github.com/zalando/rds-health">rds-health</a>, an open source utility we recently released.</p>
<h3>The problem of maintaining robustness at scale</h3>
<p>Since Zalando <a href="https://engineering.zalando.com/posts/2016/10/jimmy-to-microservices-the-journey-one-year-later.html">addressed the organisation's scalability using the microservice pattern</a>, the company has experienced steady growth across multiple dimensions: the number of users, the technology landscape and the number of teams building and running systems. Today, Zalando is a leading European online fashion retailer. It is critical that our architecture is robust enough to withstand challenges and uncertainties while teams innovate and experiment with new ideas.</p>
<p><strong>Overhead of the microworld.</strong> <a href="https://engineering.zalando.com/tags/microservices2.html">Microservices</a> became a design style for us to define system architectures, purify core business concepts, evolve solutions in parallel, make things look uniform, and implement stable and consistent interfaces across systems. Our engineering teams independently design, build and operate multiple microservices. Often, microservices are implemented with a datastore following <a href="https://microservices.io/patterns/data/database-per-service.html">the database-per-service design pattern</a>, where each service deploys its own database instances. The <a href="https://opensource.zalando.com/tech-radar/">Zalando TechRadar</a> guides teams on database selection and deployment options, with AWS RDS with Postgres as one of the available options.</p>
<p><strong>Hidden costs of toil.</strong> Operating a swarm of small databases at company scale quickly gets tough. Complex anomaly detection tasks, such as diagnosing byzantine failures or issues with SQL statements, take a noticeable investment across the organisation. A combination of manual processes and ad-hoc scripts to manage the health of database instances is not an option at scale. It became increasingly time-consuming and error-prone; some teams had to allocate engineers for a sprint or even months for such activities.</p>
<p>Standardisation is one of the factors that reduces this complexity. It is well known that if teams use the same frameworks or design patterns, making changes at scale becomes easier. The same concept extends into the operations domain. We have limited fragmentation by providing stronger guidelines to our engineers on which metrics to observe from datastore components.</p>
<p>We have developed a methodology for detecting anomalies in AWS RDS workloads through 12 “golden signals”. We also decided to release an open-source command line utility, <a href="https://github.com/zalando/rds-health">rds-health</a>, to help automate and streamline the detection of anomalies and performance issues. The utility provides a consistent and repeatable way to automatically analyse database metrics, reducing the risk of errors and improving overall efficiency.</p>
<h3>12 Golden Signals</h3>
<p>Setting up and operating high-performing databases requires observability of a large variety of signals across multiple buckets: CPU, memory, disk and workload. Thanks to past incidents and empirical observations, we have reduced this complexity so that only a few signals from each bucket need to be analysed to draw a reliable conclusion about the health status of database instances. This is how we arrived at twelve golden signals.</p>
<ol>
<li>
<p><strong>C1: CPU Utilisation</strong> <code>os.cpuUtilization.total</code> - typical database workloads are bound by memory or storage, so high CPU usage is an anomaly that requires further investigation. Our experience shows that CPU utilisation above 40 - 60% on database instances eventually leads to incidents.</p>
</li>
<li>
<p><strong>C2: CPU Await</strong> <code>os.cpuUtilization.await</code> - the Linux kernel reports the time spent waiting for IO requests, from issue to completion, via the await metric. A high value indicates that a database instance is bound by the IO bandwidth of its storage. As with the previous metric, we have concluded that any value above 5 - 10% eventually leads to incidents.</p>
</li>
<li>
<p><strong>M1: Swapped In from disk</strong> <code>os.swap.in</code> - swap is an extension of RAM onto disk. The operating system swaps RAM pages to disk and back when there is not enough memory to run the workload. Any intensive swap activity indicates that the database instance is running low on memory. Considering that disk performance is an order of magnitude slower, any swap activity slows down the operating system and its applications.</p>
</li>
<li>
<p><strong>M2: Swapped Out to disk</strong> <code>os.swap.out</code> - See explanation above.</p>
</li>
<li>
<p><strong>D1: Storage Read IO</strong> <code>os.diskIO.rdsdev.readIOsPS</code> - storage IO bandwidth is an essential resource for high-performing databases. The IO bandwidth must be aligned with the overall database workload so that there is enough bandwidth to handle it. In the case of AWS RDS, the metric value should be aligned with the storage configuration deployed for the database instance. With the gp2 volume type, IOPS are provisioned by volume size: 3 IOPS per GB of storage, with a minimum of 100 IOPS. Provisioned IOPS volume types have an explicit value defined at deployment time. Note that a very low value indicates that the entire dataset is served from memory.</p>
</li>
<li>
<p><strong>D2: Storage Write IO</strong> <code>os.diskIO.rdsdev.writeIOsPS</code> - see the explanation above. Also note that a high number indicates that the workload is write-mostly and potentially bound by the IO capacity of the storage.</p>
</li>
<li>
<p><strong>D3: Storage IO Latency</strong> <code>os.diskIO.rdsdev.await</code> - the overall performance of storage is a function of its IO bandwidth and its latency. The latency metric reflects the time the storage takes to load data blocks into memory. High storage latency implies higher latency for application workloads on the database. Our empirical observations show that storage latency above 10 ms eventually leads to incidents, and latency above 5 ms impacts application SLOs. Typical storage latency for database systems should be below 4 - 5 ms.</p>
</li>
<li>
<p><strong>P1: Cache Hit Ratio</strong> <code>db.Cache.blks_hit / (db.Cache.blks_hit + db.IO.blk_read)</code> - databases read and write application data in blocks. The number of blocks read by the database from physical storage has to be aligned with the storage IO bandwidth provisioned for the database instance. The database caches these blocks in memory to optimise application performance. When clients request data, the database checks the cache first; if the relevant data is not there, it has to read it from disk and queries become slower. Any value below 80% indicates that the database has an insufficient amount of shared buffers or physical RAM: data required for the most frequently called queries does not fit into memory, and the database has to read it from disk.</p>
</li>
<li>
<p><strong>P2: Blocks Read Latency</strong> <code>db.IO.blk_read_time</code> - this metric reflects the time the database spends reading blocks from storage. High storage latency implies high latency for application workloads. We have observed an impact on SLOs when this latency grows above 10 ms.</p>
</li>
<li>
<p><strong>P3: Database Deadlocks</strong> <code>db.Concurrency.deadlocks</code> - the number of deadlocks detected in the database. Ideally, it should be 0. If the number is high, the application schema and IO logic require evaluation.</p>
</li>
<li>
<p><strong>P4: Database Transactions</strong> <code>db.Transactions.xact_commit</code> - the number of transactions executed by the database. A low number indicates that the database instance is a standby.</p>
</li>
<li>
<p><strong>P5: SQL Efficiency</strong> <code>db.SQL.tup_fetched / db.SQL.tup_returned</code> - SQL efficiency shows the percentage of rows fetched by the client versus rows returned from storage. The metric does not necessarily indicate a performance issue, but a high ratio of returned to fetched rows should trigger questions about optimising SQL queries, the schema or indexes. For example, if you run <code>select count(*) from million_row_table</code>, one million rows are returned, but only one row is fetched.</p>
</li>
</ol>
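<p>To make the thresholds above concrete, the checks can be sketched in a few lines of Go. This is an illustrative sketch only: the struct and function names are ours, not part of <code>rds-health</code>, and the limits simply restate the empirical values quoted above.</p>

```go
package main

import "fmt"

// Signals holds a subset of the golden signals discussed above.
// Field comments map each field to its Performance Insights counter.
type Signals struct {
	CPUUtilPct  float64 // os.cpuUtilization.total (C1)
	CPUAwaitPct float64 // os.cpuUtilization.await (C2)
	DiskAwaitMs float64 // os.diskIO.rdsdev.await  (D3)
	BlksHit     float64 // db.Cache.blks_hit       (P1)
	BlksRead    float64 // db.IO.blk_read          (P1)
}

// Anomalies returns a warning for every signal outside the
// empirically derived safe range quoted in the article.
func Anomalies(s Signals) []string {
	var out []string
	if s.CPUUtilPct > 60 {
		out = append(out, "C1: CPU utilisation above 60% eventually leads to incidents")
	}
	if s.CPUAwaitPct > 10 {
		out = append(out, "C2: CPU await above 10% indicates an IO-bound instance")
	}
	if s.DiskAwaitMs > 10 {
		out = append(out, "D3: storage latency above 10 ms eventually leads to incidents")
	}
	if total := s.BlksHit + s.BlksRead; total > 0 {
		if ratio := 100 * s.BlksHit / total; ratio < 80 {
			out = append(out, fmt.Sprintf("P1: cache hit ratio %.1f%% is below 80%%", ratio))
		}
	}
	return out
}

func main() {
	// An instance with high CPU and a poor cache hit ratio.
	for _, w := range Anomalies(Signals{CPUUtilPct: 72, BlksHit: 500, BlksRead: 500}) {
		fmt.Println(w)
	}
}
```

In practice these thresholds would be evaluated against time-series data rather than single samples, which is exactly the analysis the utility automates.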
<h3>Open Source Command Line Utility</h3>
<p>AWS offers a wide range of observability solutions for AWS RDS, such as AWS CloudWatch, AWS Performance Insights and others. These off-the-shelf solutions help with setting up alerts and debugging anomalies when one of the twelve golden signals is violated. What was missing was an efficient utility to holistically observe the status of the entire AWS RDS fleet in an account with “a single click of the button”.</p>
<p><img alt="Screenshot rds-health utility" src="https://engineering.zalando.com/posts/2024/02/images/rds-health-screenshot.png#center"></p>
<p>This is how the <a href="https://github.com/zalando/rds-health"><code>rds-health</code></a> utility was born. It analyses AWS RDS instances using time-series metrics collected by AWS Performance Insights. In essence, the utility is a frontend for AWS APIs that automates the analysis of the discussed golden signals across your accounts and regions. It can be easily customised to meet specific use cases, allowing users to tailor their workflows to their unique needs. Some of the key features include:</p>
<ul>
<li>Show configuration of all AWS RDS instances and clusters;</li>
<li>Check health of all AWS RDS deployments;</li>
<li>Conduct capacity planning for your AWS RDS deployments.</li>
</ul>
<p>Check out our open source project at <a href="https://github.com/zalando/rds-health">github.com/zalando/rds-health</a>. It guides you through simple installation and configuration steps, together with tutorials about its features. We look forward to hearing your feedback and suggestions for improvement. Please raise <a href="https://github.com/zalando/rds-health">an issue on the project</a>.</p>
<h3>Conclusion</h3>
<p>Our objective is to reduce complexity by limiting fragmentation within our engineering ecosystem, enabling teams with engineering and operational guidelines. The methodology discussed here for detecting anomalies in AWS RDS workloads through 12 “golden signals” is one example of how we solve for complexity at Zalando.</p>
<p>Standardisation is not only about guidelines but also about automating repetitive tasks, freeing up time for more creative and strategic work. We are happy to share our learnings and approaches for observing AWS RDS instances at scale with the open source community through this utility. Apply these learnings within your teams.</p>
<p>If you have any questions about our methodology or the open source utility <code>rds-health</code> itself, please raise <a href="https://github.com/zalando/rds-health">an issue on the project</a>. Contributions are welcomed and encouraged!</p>
<hr>
<h2>Paper Announcement: Joint Order Selection, Allocation, Batching and Picking for Large Scale Warehouses</h2>
<p><em>2024-01-29, by Julius Pätzold</em></p>
<p>Sharing our latest research paper on warehouse order batching.</p>
<p>We, as the Zalando team BART, are excited to share our latest research paper, describing the optimization problem of order batching and picking in Zalando's warehouses. In this paper (preprint available on <a href="https://arxiv.org/abs/2401.04563">arXiv</a>), we formally introduce our proposed order batching problem and provide benchmark instances, two baseline algorithms, and a solution validation tool, all made publicly available on <a href="https://github.com/zalandoresearch/batching-benchmarks/">GitHub</a>. Our goal is to provide insights to the research community on planning and optimizing the warehouse order picking process in large-scale warehouses, such as Zalando's.</p>
<h3>The Underlying Optimization Problem</h3>
<p>Zalando Tech Logistics is responsible for creating the software that manages all Zalando warehouses and their processes. Team BART, part of Zalando's Logistics Algorithms department, provides the decision-making algorithms for order batching and picking. These decisions can be broken down into four parts:</p>
<ol>
<li>Order Selection: Which customer orders are processed next?</li>
<li>Item Allocation: Which warehouse items are used to fulfill a selected order?</li>
<li>Batching: Which selected orders are picked together?</li>
<li>Picking: How are batches split up into pick tours?</li>
</ol>
<p>Traditionally, these decision problems are considered individually and solved using simplified rules. For example, order selection could be done using a first-in-first-out approach. However, our experience and analysis of batching algorithms have shown that a purely sequential approach is far from optimal. While there has been some research on these problems in the literature, there is no closed formulation, to the best of our knowledge, that encapsulates all four problems into one. And this is exactly what we aim to achieve with our paper: We combine all of the four problems into one, named Joint Order Selection, Allocation, Batching and Picking.</p>
<h3>Benchmark Instances</h3>
<p>To ensure a clear understanding of the problem statement, we provide benchmark instances for the Joint Order Selection, Allocation, Batching, and Picking Problem. These instances allow anybody interested to immediately try out their ideas for solving this problem. Additionally, we share the implementation of two baseline algorithms described in the paper.</p>
<h3>Outlook</h3>
<p>We aim to stimulate academic discussion around the Joint Order Selection, Allocation, Batching, and Picking Problem. We believe there are practitioners and researchers interested in this type of optimization problem. By providing benchmark instances, we hope to establish a standard definition that can be easily adapted for further research.</p>
<p>Publishing this problem formulation also allows us to share insights on how we are solving this problem at Zalando. We look forward to sharing more in our next publication. In the meantime, we welcome any feedback and collaboration from the community: feel free to share your feedback via <a href="https://github.com/zalandoresearch/batching-benchmarks/discussions">GitHub</a>.</p>
<hr>
<h2>Tale of 'metadpata': the revenge of the supertools</h2>
<p><em>2024-01-23, by Bartosz Ocytko</em></p>
<p>One day in November 2022, we brought down our shop with a single character. This post recaps the lessons we learned from this incident.</p>
<p><img alt="this is fine meme" src="https://engineering.zalando.com/posts/2024/01/images/this-is-fine.jpg#previewimage"></p>
<h2>The perfect storm</h2>
<p>In the midst of Cyber Week preparations in November 2022, a colleague DMd me with a request to quickly join a call. To my surprise, as I was anticipating a 1:1 call, I was greeted by a message indicating that 60+ others were in the call as well. It turned out that I was just about to join an incident response call for what later became known internally as the "metadpata" incident.</p>
<p>In the call, a group of colleagues was trying to put the jigsaw pieces together, analyzing why a large number of DNS entries across our AWS accounts were suddenly removed, causing our shop to effectively go offline for our customers. Additionally, all of us except for the cloud infrastructure team were locked out of AWS accounts and internal tools due to the missing DNS entries, making the incident response difficult. In short – the classic DNS incident that you may be familiar with from other write-ups. Some helpful and lucky souls hastily started to copy their cached DNS entries before they expired. It was an all-hands-on-deck situation, with everyone focused on the single goal of restoring service for our customers ASAP. What followed in the incident call was a controlled disaster recovery, with colleagues manually restoring DNS entries starting with essential tooling, followed by core infrastructure and the services powering our on-site experiences.</p>
<p>How was it possible that the DNS entries across multiple accounts suddenly disappeared? The Pull Request that triggered the event was aimed at adjusting YAML configuration for our infrastructure. However, apart from changing configuration for a test account, it also contained a "p" character in one of the configuration fields called "metadata" transforming it into "metadpata". Yet, why was this single character so powerful and destructive?</p>
<h2>Enter supertools</h2>
<p>We coined the term <em>supertools</em> while working on the Post Mortem for the incident. These are applications or scripts that can execute large-scale changes across the infrastructure. Initially well-intentioned daemons automating the creation of resources and the various stages of their lifecycle, they also perform cleanup operations that result in the removal of resources. The latter operation, typically used for cleaning up resources that are to be decommissioned, easily becomes subject to cost optimization. As part of cost-saving measures, the pace of executing deletion operations had been sped up.</p>
<p>The tool processing the configuration with the unfortunate typo is responsible for setting up AWS accounts. It is a background job that parses the configuration and computes the operations to be executed on each affected account. It uses the <code>metadata</code> object to calculate the accounts to work on. The typo caused the configuration to be interpreted as "no accounts", which in turn was treated as equivalent to the situation where all accounts are to be decommissioned. The deletion process was triggered and managed to delete hosted zones containing DNS entries, which set off the incident. Luckily, the deletion process ran into an error while performing the deletion operations, reducing the scope of the incident and the disaster recovery required.</p>
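<p>A simple safety net against this failure mode is to treat an empty, or implausibly large, deletion set as a configuration error rather than an instruction. The sketch below is hypothetical (the function and names are ours, not the actual tool's), but it captures the guard:</p>

```go
package main

import (
	"errors"
	"fmt"
)

// planDecommission decides which accounts to delete, given the full fleet
// and the account set parsed from configuration. The guard matters more
// than the names: zero configured accounts is rejected as a likely typo,
// and a deletion set above 10% of the fleet is rejected as implausible.
func planDecommission(allAccounts, configured []string) ([]string, error) {
	if len(configured) == 0 {
		return nil, errors.New("refusing to proceed: configuration yields zero accounts (possible typo?)")
	}
	keep := make(map[string]bool, len(configured))
	for _, a := range configured {
		keep[a] = true
	}
	var doomed []string
	for _, a := range allAccounts {
		if !keep[a] {
			doomed = append(doomed, a)
		}
	}
	// Blast-radius limit: never delete more than 10% of the fleet in one run.
	if len(doomed)*10 > len(allAccounts) {
		return nil, fmt.Errorf("refusing to delete %d of %d accounts: exceeds blast-radius limit",
			len(doomed), len(allAccounts))
	}
	return doomed, nil
}

func main() {
	// A "metadpata"-style typo produces an empty configured set,
	// which is rejected instead of deleting everything.
	_, err := planDecommission([]string{"a", "b", "c"}, nil)
	fmt.Println(err)
}
```
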
<h2>Incident response</h2>
<p>While our incident response culture is well established, this incident tested it to its full extent. In an all hands on deck situation, the cloud infrastructure team was focused on disaster recovery, organized via an incident call. Through an incident chat room, our colleagues reported the impact they still observed and the progress of recovery in their clusters. The Incident Commanders focused on determining the approach and priority of the recovery efforts, as well as on facilitating the communication between the chat room and the incident call. Throughout the incident response we switched Incident Commanders according to their areas of expertise, which kept the response focused and efficient.</p>
<h2>Post Mortem</h2>
<p>Through great collaboration across teams to recover the needed DNS entries and restore service for our customers, we were back online in a few hours. As the first incident of its kind, with large-scale impact for our customers, it got high attention across the organization. Predictably, this overloaded Google Docs, which limits the number of concurrent editors, for those working on the Post Mortem document. To reduce the likelihood of this happening again, we've changed all links to Post Mortem documents shared with big audiences to use the <code>/preview</code> URL by default.</p>
<p>Being close to the start of Cyber Week, the focus for the team was to complete the Post Mortem analysis and decide upon immediate actions to prevent a similar incident from happening. This included pausing changes to the configuration, a review of all supertools in place, and temporary deactivation of the relevant deletion processes. We also wrote a 1-pager summary of the incident and shared it proactively with the whole organization, keeping everyone informed about the short- and mid-term action items agreed during the Incident Review.</p>
<h2>Infrastructure changes</h2>
<p>An important and often vigorously discussed part of Post Mortems are the action items aimed at preventing recurrence of the incident. In our case, we analyzed how infrastructure changes are reviewed and rolled out a number of improvements with the aim of improving the validation and reducing the blast radius of infrastructure changes that go wrong. We will focus on the most impactful changes that were implemented.</p>
<h3>Account lifecycle management changes</h3>
<p>We have introduced a new step in the account decommissioning process that simulates deletion using Network ACLs. We also remove the delegation for the DNS zone assigned to the account to ensure that related CNAMEs will not resolve anymore. The account is left in this state for one week before proceeding further with the real decommissioning. This acts as a final "scream test" to make sure there are no more dependencies on this account.</p>
<p>Having assessed the trade-offs and risks of deleting resources, we have additionally decided to be more careful with deletions that have low cost-savings potential compared to the impact a wrong deletion could have. These changes are now done manually and take longer to complete, an acceptable trade-off we're willing to make to reduce the risk. To mitigate the potential cost increase, we monitor account costs for the previous 7 days; if they exceed a certain threshold, we look into deleting the resources manually.</p>
<h3>Change validation</h3>
<p>We've introduced a series of validation steps, for example stringent checks for the presence of mandatory keys and the preview of all stack templates using <a href="https://github.com/aws-cloudformation/cfn-lint">AWS CloudFormation Linter</a> before they get deployed.</p>
<p>Also, we have set up jsonschema validation for all our configuration files. All these checks run both locally (thanks to pre-commit hooks) and in the CI/CD pipelines. We also did some small quality of life improvements to enable autocompletion and schema validation in our local IDEs, which mitigates the possibility of typos and errors and is <a href="https://developers.redhat.com/blog/2020/11/25/how-to-configure-yaml-schema-to-make-editing-files-easier#yaml_schema">simple to set up</a>:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># yaml-language-server: $schema=schema/config_schema.json</span>
<span class="l l-Scalar l-Scalar-Plain">(your config)</span>
</code></pre></div>
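<p>The mandatory-key checks mentioned above can be illustrated with a small, standard-library-only Go sketch. The key names are hypothetical; the point is that a misspelled key such as <code>metadpata</code> is reported before any automation acts on the file:</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkMandatoryKeys parses a configuration document and reports every
// required top-level key that is missing. A misspelled key such as
// "metadpata" simply does not count as "metadata" and is flagged.
func checkMandatoryKeys(doc []byte, required []string) ([]string, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(doc, &cfg); err != nil {
		return nil, err
	}
	var missing []string
	for _, k := range required {
		if _, ok := cfg[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing, nil
}

func main() {
	doc := []byte(`{"metadpata": {"accounts": ["123456789012"]}}`)
	missing, err := checkMandatoryKeys(doc, []string{"metadata"})
	if err != nil {
		panic(err)
	}
	// The typo is caught in CI before any deletion process runs.
	fmt.Println("missing keys:", missing)
}
```

A full jsonschema validation additionally catches wrong types and unknown keys; this sketch covers only the presence check.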
<p>Additionally, for the creation and decommissioning of critical resources, we have introduced several automated quality checks which ensure that the change corresponds to the user request and the pull request description. These checks also require additional approval from the respective account or cost center owners and validation from the respective managers. The checks are implemented as a GitHub bot that comments on the Pull Request and blocks the merge until all checks pass.</p>
<h3>Change previews</h3>
<p>We have implemented automated previews in the Pull Request comments. This feature leverages the <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets.html">AWS CloudFormation "ChangeSet" feature</a>. When an updated CloudFormation stack template is provided to the "CreateChangeSet" endpoint, CloudFormation generates a JSON preview of the changes, which can then be executed or rejected. We read this ChangeSet from each account in our AWS Organization and merge them to create a human-readable preview of changes in a PR comment. After the preview is created, the ChangeSet is dropped.</p>
<p><img alt="Preview of changes in Pull Requests" src="https://engineering.zalando.com/posts/2024/01/images/cf-preview-gh.png#center"></p>
<figcaption style="text-align:center">Preview of changes in Pull Requests</figcaption>
<p><br/></p>
<h3>Phased rollout</h3>
<p>Our Kubernetes cluster rollout already included a phased rollout to different groups of clusters. This idea was extended to our AWS infrastructure. The rollout process adopted by our tooling now includes gradual rollout to different release channels, each associated with a few AWS account categories (e.g. playground, test, infra). All changes must go through all release channels before getting to production. This approach allows us to gradually deploy changes to different accounts, ensuring a more controlled propagation that catches errors early on with a limited blast radius. The trade-off here is of course that the rollout takes a longer time.</p>
<h2>Summary</h2>
<p>Supertools never sleep (unless you program them otherwise!). They're powerful yet often misjudged in review processes, as they're expected to only trigger actions within the scope of expected changes. As our story shows, this is highly dependent on the implementation, and it's important to build additional safety nets into the processes and tooling. We hope that the examples of changes we've implemented in our infrastructure will help you reflect on and improve mechanisms in your own context.</p>
<hr>
<p><em>We're hiring! If you enjoy solving complex Engineering problems as we do, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Applied%20Science&filters%5Bcategories%5D%5B1%5D=Product%20Design%2C%20User%20Research%20%26%20UX%20Writing&filters%5Bcategories%5D%5B2%5D=Product%20Management%20%28Technology%29&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B9%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B10%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B11%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B12%5D=Applied%20Science%20%26%20Research&search=software%20engineering">joining our teams at Zalando</a>.</em></p>
<hr>
<h2>Using modules for Testcontainers with Golang</h2>
<p><em>2023-12-19, by Fabien Pozzobon</em></p>
<p>In this post, we explain how to use modules for Testcontainers with Golang and how to fix common issues.</p>
<p><img alt="Testcontainers with Go" src="https://engineering.zalando.com/posts/2023/12/images/go-test-containers.jpg#previewimage"></p>
<h2>Introduction</h2>
<p><a href="https://github.com/testcontainers/testcontainers-go">Testcontainers for Go</a> enables developers to easily run tests against containerized dependencies. In our previous articles, you can find <a href="https://engineering.zalando.com/posts/2021/02/integration-tests-with-testcontainers.html">an introduction to integration tests with Testcontainers</a> and <a href="https://engineering.zalando.com/posts/2022/04/functional-tests-with-testcontainers.html">an exploration of how to write functional tests with Testcontainers</a> (in Java).</p>
<p>This blog post is a deep dive into how to use modules with Testcontainers for Golang, and into a common issue you may encounter.</p>
<h3>What do we use it for?</h3>
<p>Services often use external dependencies like datastores or queues.
It is possible to mock these dependencies, but if you want to run, for example, integration tests, it is better to verify against the real dependency (or something close enough).</p>
<p>Starting a container with the image of the dependency is a convenient way to verify that the application works as expected.
With Testcontainers, starting the container is done programmatically, so you can define it as part of your tests. The machine running the tests (developer, CI/CD) requires a container runtime (e.g. Docker, Podman...).</p>
<h2>Basic implementation</h2>
<p>Testcontainers for Go is very easy to use, <a href="https://golang.testcontainers.org/quickstart/#3-spin-up-redis">the quick start example</a> is:</p>
<div class="highlight"><pre><span></span><code><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">TODO</span><span class="p">()</span>
<span class="nx">req</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">testcontainers</span><span class="p">.</span><span class="nx">ContainerRequest</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Image</span><span class="p">:</span><span class="w"> </span><span class="s">"redis:latest"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">ExposedPorts</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"6379/tcp"</span><span class="p">},</span>
<span class="w"> </span><span class="nx">WaitingFor</span><span class="p">:</span><span class="w"> </span><span class="nx">wait</span><span class="p">.</span><span class="nx">ForLog</span><span class="p">(</span><span class="s">"Ready to accept connections"</span><span class="p">),</span>
<span class="p">}</span>
<span class="nx">redisC</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">testcontainers</span><span class="p">.</span><span class="nx">GenericContainer</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">testcontainers</span><span class="p">.</span><span class="nx">GenericContainerRequest</span><span class="p">{</span>
<span class="w"> </span><span class="nx">ContainerRequest</span><span class="p">:</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Started</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="p">})</span>
<span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">defer</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">redisC</span><span class="p">.</span><span class="nx">Terminate</span><span class="p">(</span><span class="nx">ctx</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}()</span>
</code></pre></div>
<p>If we dive into the code above, we notice that:</p>
<ol>
<li><code>testcontainers.ContainerRequest</code> initialises a request struct with the container image, exposed port and waiting strategy</li>
<li><code>testcontainers.GenericContainer</code> starts the container, returning the created container and an error</li>
<li><code>redisC.Terminate</code> terminates the container via <code>defer</code> once the test is done</li>
</ol>
<h2>Implementing our own internal library</h2>
<p>The example in the previous section has some minor inconveniences:</p>
<ol>
<li><code>wait.ForLog("Ready to accept connections")</code> waits for the container to start by scanning its logs, which breaks easily (for example, if the image changes its log output)</li>
<li><code>ExposedPorts: []string{"6379/tcp"}</code> requires prior knowledge of the port Redis exposes</li>
</ol>
<p>There might also be additional environment variables and other parameters needed to run a Redis container, which requires deeper knowledge.
As such, we decided to create an internal library that initialises containers with sensible default parameters to ease test implementation.
To remain flexible, we used the <a href="https://golang.cafe/blog/golang-functional-options-pattern.html">Functional Options Pattern</a> so that consumers can still customise the container to their needs.</p>
<p>Example of implementation for Redis:</p>
<div class="highlight"><pre><span></span><code><span class="kd">func</span><span class="w"> </span><span class="nx">defaultPreset</span><span class="p">()</span><span class="w"> </span><span class="p">[]</span><span class="nx">container</span><span class="p">.</span><span class="nx">Option</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[]</span><span class="nx">container</span><span class="p">.</span><span class="nx">Option</span><span class="p">{</span>
<span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">WithPort</span><span class="p">(</span><span class="s">"6379/tcp"</span><span class="p">),</span>
<span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">WithGetURL</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">port</span><span class="w"> </span><span class="nx">nat</span><span class="p">.</span><span class="nx">Port</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"localhost:"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">port</span><span class="p">.</span><span class="nx">Port</span><span class="p">()</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">WithImage</span><span class="p">(</span><span class="s">"redis"</span><span class="p">),</span>
<span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">WithWaitingStrategy</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">container</span><span class="p">.</span><span class="nx">Container</span><span class="p">)</span><span class="w"> </span><span class="nx">wait</span><span class="p">.</span><span class="nx">Strategy</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">wait</span><span class="p">.</span><span class="nx">ForAll</span><span class="p">(</span>
<span class="w"> </span><span class="nx">wait</span><span class="p">.</span><span class="nx">NewHostPortStrategy</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">Port</span><span class="p">),</span>
<span class="w"> </span><span class="nx">wait</span><span class="p">.</span><span class="nx">ForLog</span><span class="p">(</span><span class="s">"Ready to accept connections"</span><span class="p">))</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="c1">// New - create a new container able to run redis</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">New</span><span class="p">(</span><span class="nx">options</span><span class="w"> </span><span class="o">...</span><span class="nx">container</span><span class="p">.</span><span class="nx">Option</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">container</span><span class="p">.</span><span class="nx">Container</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">Container</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">defaultPreset</span><span class="p">(),</span><span class="w"> </span><span class="nx">options</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">o</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">o</span><span class="p">(</span><span class="o">&</span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="c1">// Start - start a Redis container and return a container.CreatedContainer</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">Start</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="o">...</span><span class="nx">container</span><span class="p">.</span><span class="nx">Option</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">container</span><span class="p">.</span><span class="nx">CreatedContainer</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">p</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">New</span><span class="p">(</span><span class="nx">options</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">CreatedContainer</span><span class="p">{},</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">Start</span><span class="p">(</span><span class="nx">ctx</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>Usage of the library for Redis:</p>
<div class="highlight"><pre><span></span><code><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">TODO</span><span class="p">()</span>
<span class="nx">cc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">redis</span><span class="p">.</span><span class="nx">Start</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">container</span><span class="p">.</span><span class="nx">WithVersion</span><span class="p">(</span><span class="s">"latest"</span><span class="p">))</span>
<span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">defer</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cc</span><span class="p">.</span><span class="nx">Stop</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}()</span>
</code></pre></div>
<p>With this internal library, developers could easily add tests for Redis without the need to figure out the waiting strategy, exposed port, etc.
In case of an incompatibility, the library could be updated to fix the issue centrally for all consumers.</p>
<h2>Common issue - Garbage collector (Ryuk / Reaper)</h2>
<p>Testcontainers goes the extra mile to ensure that containers are removed once the test is done, using a <a href="https://golang.testcontainers.org/features/garbage_collector/#garbage-collector">Garbage Collector</a> (Ryuk, also known as the Reaper), which is an additional container started as a "sidecar".
This sidecar is responsible for stopping the containers under test even if your test crashes (which would prevent the <code>defer</code> from running).</p>
<p>When using Docker, this works without problems, but with other container runtimes (like Podman) you will often get an error like: <code>Error response from daemon: container create: statfs /var/run/docker.sock: permission denied: creating reaper failed: failed to create container</code>.</p>
<p>One way to "fix" this is to deactivate the Garbage Collector with the environment variable <code>TESTCONTAINERS_RYUK_DISABLED=true</code>.</p>
<p>Another way is to run the Podman machine in rootful mode and add:</p>
<div class="highlight"><pre><span></span><code><span class="nb">export</span><span class="w"> </span><span class="nv">TESTCONTAINERS_RYUK_CONTAINER_PRIVILEGED</span><span class="o">=</span>true<span class="p">;</span><span class="w"> </span><span class="c1"># needed to run Reaper (alternative disable it TESTCONTAINERS_RYUK_DISABLED=true)</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE</span><span class="o">=</span>/var/run/docker.sock<span class="p">;</span><span class="w"> </span><span class="c1"># needed to apply the bind with statfs</span>
</code></pre></div>
<p>In our internal library, we took the approach of disabling the Garbage Collector by default, as developers had issues running it locally.</p>
<h2>Moving to modules</h2>
<p>Once our internal library was stable enough, we decided it was time to give back to the community by contributing to Testcontainers.
But surprise... <a href="https://golang.testcontainers.org/modules/">modules</a> had just been introduced in Testcontainers.
Modules do exactly what our internal library was built for, so we migrated all our services to them and discontinued the internal library.
From the migration, we learned that with modules we can use the standard Testcontainers library out of the box, which reduces the maintenance cost of our services.
The main remaining challenge was fine-tuning the environment variables on developer machines (to make the Garbage Collector work), which we did using a Makefile.</p>
<p>Adapted example from <a href="https://golang.testcontainers.org/modules/redis/#usage-example">testcontainers documentation</a>:</p>
<div class="highlight"><pre><span></span><code><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">TODO</span><span class="p">()</span>
<span class="nx">redisContainer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">redis</span><span class="p">.</span><span class="nx">RunContainer</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span>
<span class="w"> </span><span class="nx">testcontainers</span><span class="p">.</span><span class="nx">WithImage</span><span class="p">(</span><span class="s">"docker.io/redis:latest"</span><span class="p">),</span>
<span class="p">)</span>
<span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">defer</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">redisContainer</span><span class="p">.</span><span class="nx">Terminate</span><span class="p">(</span><span class="nx">ctx</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}()</span>
</code></pre></div>
<h2>Conclusion</h2>
<p>Testcontainers for Golang is a great library to support testing, and it's even better now that modules have been introduced. Some small impediments with the Garbage Collector remain, but they can be fixed easily as described in this post.</p>
<p>I hope this post convinces you, if you haven't already, to adopt Testcontainers; I highly recommend it to improve the testability of your applications.</p>Migrating From Elasticsearch 7.17 to Elasticsearch 8.x: Pitfalls and Learnings2023-11-20T00:00:00+01:002023-11-20T00:00:00+01:00Maryna Cherniavskatag:engineering.zalando.com,2023-11-20:/posts/2023/11/migrating-from-elasticsearch-7-to-8-learnings.html<p>With Elasticsearch, moving from one major version to another is a big jump. Usually, it is updated in gradual increments, minor to minor version. It is difficult to make a big move. There's no official step-by-step and usually, it just doesn't happen. So, how did we approach it? Read on to find out.</p><h2>What this article is about</h2>
<ul>
<li>What kind of changes we had to make to the codebase</li>
<li>How we did the actual upgrade</li>
<li>What challenges we faced</li>
<li>How we did the data transfer</li>
<li>How the data was kept in sync</li>
</ul>
<h2>What this article is not</h2>
<ul>
<li>A step-by-step guide on how to upgrade Elasticsearch (read on to find out why).</li>
</ul>
<h2>Who we are</h2>
<p>We are a team from the Search & Browse department, the department in Zalando that is responsible for all things search (read: relevance, personalisation, sorting, filters, full-text search, ... in short, everything that forms the search experience). The search applications use Elasticsearch as their main datastore, so we are also the ones responsible for its well-being.</p>
<h2>Why upgrade</h2>
<p>We have been using Elasticsearch for a long time. It was upgraded more or less on a regular basis, but we were always a bit behind the latest version (Elastic has a regular release schedule; the releases are all scheduled well in advance). We were on version 7.17 for a while, and while we were pretty happy with it, we still had a few reasons to upgrade to 8.x.</p>
<p>First, we wanted to use the new features that were introduced in 8.0. Namely, <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#approximate-knn">the approximate kNN (k nearest neighbors) - or ANN-search</a>. The vector search was already used in Search & Browse, but it was <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#exact-knn">the exact kNN search</a>, the brute-force and less performant one. What Elastic says about the approximate vs exact kNN search is this:</p>
<blockquote>
<p>In most cases, you’ll want to use approximate kNN. Approximate kNN offers lower latency at the cost of slower indexing and imperfect accuracy.</p>
<p>Exact, brute-force kNN guarantees accurate results but doesn’t scale well with large datasets. With this approach, a <code>script_score</code> query must scan each matching document to compute the vector function, which can result in slow search speeds. However, you can improve latency by using a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html">query</a> to limit the number of matching documents passed to the function. If you filter your data to a small subset of documents, you can get good search performance using this approach.</p>
</blockquote>
<p>There is also a <a href="https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0">great article about ANN on the Elastic blog by Julie Tibshirani</a> - read it, you won't regret it.</p>
<p>Second, we also wanted to be on the latest version for performance and security reasons, because obviously, every new release has a lot of security fixes and performance improvements.</p>
<h2>Why it's difficult to upgrade</h2>
<p><img alt="You don't just upgrade Elasticsearch" src="https://engineering.zalando.com/posts/2023/11/images/youcantjustupdate.jpg"></p>
<figcaption style="text-align:center">Boromir telling you that you don't just upgrade Elasticsearch</figcaption>
<p>Usually, Elasticsearch is updated in gradual increments, minor to minor version, and it's difficult, not to mention dangerous, to make such a big move as going from one major version to another. Also, the documentation on the official website, while ample, is pretty disorganized, and there's no complete step-by-step for such an endeavor. And even if you were to gather all the information from the docs, it's still not enough. You need to know what to do with your data, how to keep it in sync, and how to make sure that the new version is working as expected.</p>
<p>At Zalando, the size of the data is massive. We have millions of articles in each country, and while the <a href="https://en.zalando.de/women/">gender root page for women</a> in Germany will show you 450k items, that's not the full picture: this number is just how many items at most get scanned to show you the first page. The actual number of items is much higher. And we currently have 28 domains (country + language combos), each with its own catalog. In short, we have a lot of data, and we need to make sure that none of it is lost or corrupted during the upgrade.</p>
<h2>How we approached the upgrade</h2>
<p>Another reason why one can't just go and upgrade Elasticsearch is because, well, it's not an island.</p>
<p>What I mean is, it's not some independent entity that has a value all by itself. It's our datastore, and it's used by a lot of our services. So before one goes and upgrades this massive thing, one should think of possible breaking changes in the product. And also, one should think about how it changes the actual usage of Elasticsearch.</p>
<p>The main search application in Zalando, the one that deals directly with Elasticsearch queries, is called Origami.
From the description on its (internal) repository page:</p>
<blockquote>
<p>Origami is the Zalando Core Search API. It provides a powerful information retrieval language and engine that integrates several microservice components built by the Search Department. In the landscape of Zalando Search and Browse platform, Origami is the connector - coordinating all search intelligence to serve correct search results to customers.</p>
<p>Origami builds on top of Elasticsearch and our internal/Zalando-specific suite of APIs. These APIs will facilitate composing/serving search and discovery, navigation, and analytics functionalities.</p>
</blockquote>
<p>The application is written in Scala and uses the <a href="https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html">Java High Level REST Client, which was deprecated in Elasticsearch 7.15.0</a> and replaced by the <a href="https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/7.17/introduction.html">Elasticsearch Java API Client</a>, so first of all we had to update the codebase to use the new client.</p>
<h3>Updating the codebase</h3>
<p>However, updating the codebase was also not a one-step task. (This just goes deeper into the rabbit hole, doesn't it?)</p>
<p>Origami has 443k lines of code in 846 files. Of course, many of these files are configs, tests and test resources, so the actual number of Scala files is much lower. But it's still a lot of code, and much of it deals with Elasticsearch.</p>
<p>Upgrading the Elasticsearch API to be able to work with version 8.x also represented a choice. We could either use the official <a href="https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/8.6/migrate-hlrc.html">Elasticsearch Java API Client</a>, or we could use the <a href="https://github.com/sksamuel/elastic4s">Elasticsearch Scala client</a> which seemed to be quite popular and had a lot of contributors (and stars) on GitHub. Both options were available and viable. Both had their pros and cons.</p>
<p>With the Elasticsearch Java API, the advantages would be:</p>
<ul>
<li>The library is officially supported and its versions match the Elasticsearch releases;</li>
<li>There is a ready-made DSL for all the REST APIs;</li>
<li>It’s open source and the code is available on GitHub. The license is Apache License 2.0.</li>
</ul>
<p>However:</p>
<ul>
<li>It’s in Java. This means that the lambda types, collection types, etc. are not directly interoperable, and special transformations have to be done in our code;</li>
<li>We’d miss out on other Scala advantages like built-in immutability, null safety and so on.</li>
</ul>
<p>The unofficial Scala client is advertised as:</p>
<ul>
<li>Providing a type-safe, concise DSL;</li>
<li>Integrating with standard Scala futures or other effects libraries;</li>
<li>Using Scala collections library over Java collections;</li>
<li>Returning <code>Option</code> where the Java methods would return <code>null</code>;</li>
<li>Using Scala <code>Durations</code> instead of strings/longs for time values;</li>
<li>Supporting typeclasses for indexing, updating, and search backed by Jackson, Circe, Json4s, PlayJson and Spray Json implementations;</li>
<li>Supporting Java and Scala HTTP clients such as Akka-Http;</li>
<li>Providing reactive-streams implementation;</li>
<li>Providing a testkit subproject ideal for tests.</li>
</ul>
<p>The disadvantages, however, could not be ignored:</p>
<ul>
<li>It’s not official and the releases are not closely following Elastic’s release schedule. At the time we were looking at it, Elasticsearch was already at v8.7 and this library’s last version was 8.5.4. (It could work with Elasticsearch up to version 8.6 though);</li>
<li>Because it did not implement all the new features, there was no DSL for kNN search. kNN search was still available by sending a raw JSON query, but that was not a pretty option.</li>
</ul>
<p>In the end, we decided to go with the Elasticsearch Java API Client. The main reason was that it was officially supported and its releases closely followed the Elasticsearch releases, so it wouldn't just disappear into thin air in the unlikely case that its creator suddenly decided to quit. It also had a DSL for all the REST APIs. The absence of a kNN search DSL in the Scala library was particularly disappointing, because approximate kNN search was one of the main reasons we wanted to upgrade in the first place.</p>
<p>So, the choice was made.</p>
<p>But.</p>
<p>As I said before, this was a large application.</p>
<p>How does one make sure that no existing functionality is going to break when upgrading the API? How does one make sure that all the existing queries are still going to work?</p>
<p><strong>Obviously, you write a test.</strong></p>
<h3>Writing a test</h3>
<p>There was one more decision that we made while selecting a migration strategy, and that was to start with <a href="https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/migrate-hlrc.html#_compatibility_mode_using_a_7_17_client_with_elasticsearch_8_x">compatibility mode</a>. This meant that we would use the Elasticsearch High Level Rest Client from version 7.x, but in the compatibility mode, so that it would instruct Elasticsearch 8.x to behave like the old client. This way we would be able to upgrade the Elasticsearch cluster first, and then upgrade the client gradually. With this approach, we would avoid rewriting too much code at once. And afterward, we would be able to use one of the <a href="https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/migrate-hlrc.html#_transition_strategies">transition strategies, recommended by Elasticsearch, to gradually upgrade the client</a>.</p>
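<p>Under the hood, compatibility mode works at the HTTP level: the 7.x client marks every request as version-7 compatible via media-type parameters, and an 8.x cluster then accepts the request and answers in the 7.x response format. The request headers look roughly like this (illustrative fragment):</p>

```text
Accept: application/vnd.elasticsearch+json; compatible-with=7
Content-Type: application/vnd.elasticsearch+json; compatible-with=7
```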
<p>This approach was also a good fit, since we assumed that we might have a time during the transition phase when the application would have to deal with both Elasticsearch 7.x and Elasticsearch 8.x. Because our Elasticsearch was a multi-cluster deployment, it would be practically impossible to upgrade in one go. We would have to start with less mission-critical clusters, and then gradually move to the more important ones. So, we would definitely have to deal with both versions of Elasticsearch for some time.</p>
<p>So how to write such a test?</p>
<p>This is where <a href="https://testcontainers.com/">Testcontainers</a> shine. Basically, we had a helper class looking like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">object</span><span class="w"> </span><span class="nc">ESContainers</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="nc">Version7179</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"7.17.9"</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="nc">Version86</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"8.6.2"</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="nc">Version88</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"8.8.2"</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="nc">VersionDefault</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">Version7179</span>
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">initAndStartESContainer</span><span class="p">(</span><span class="n">version</span><span class="p">:</span><span class="w"> </span><span class="nc">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">VersionDefault</span><span class="p">):</span><span class="w"> </span><span class="nc">ElasticsearchEndPoint</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">container</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">ElasticsearchContainer</span><span class="p">(</span><span class="s">s"docker.elastic.co/elasticsearch/elasticsearch:</span><span class="si">$</span><span class="n">version</span><span class="s">"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">withReuse</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">withCreateContainerCmdModifier</span><span class="p">(</span><span class="n">cmd</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">cmd</span><span class="p">.</span><span class="n">getHostConfig</span><span class="p">.</span><span class="n">withCapAdd</span><span class="p">(</span><span class="nc">Capability</span><span class="p">.</span><span class="nc">SYS_CHROOT</span><span class="p">))</span>
<span class="w"> </span><span class="n">container</span><span class="p">.</span><span class="n">start</span><span class="p">()</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">hostAndPort</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">container</span><span class="p">.</span><span class="n">getHttpHostAddress</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">":"</span><span class="p">)</span>
<span class="w"> </span><span class="nc">ElasticsearchEndPoint</span><span class="p">(</span><span class="n">hostAndPort</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="n">hostAndPort</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="n">toInt</span><span class="p">,</span><span class="w"> </span><span class="n">container</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Then, in a test, we would just do this to start Elasticsearch with the version we needed:</p>
<div class="highlight"><pre><span></span><code><span class="k">private</span><span class="w"> </span><span class="k">lazy</span><span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">endpoint</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">ESContainers</span><span class="p">.</span><span class="n">initAndStartESContainer</span><span class="p">(</span><span class="nc">Version88</span><span class="p">)</span>
</code></pre></div>
<p>Since at some point we'd have to deal with both versions of the API, we had to test three combinations:</p>
<ul>
<li>Elasticsearch 7.x with Elasticsearch 8.x API;</li>
<li>Elasticsearch 8.x with Elasticsearch 8.x API;</li>
<li>Elasticsearch 8.x with Elasticsearch 7.x API.</li>
</ul>
<p>And with each combination, we needed to make sure that the common types of actions performed by the application continued to work as expected.</p>
<p>So this is exactly what we did. We wrote three test classes:</p>
<ul>
<li><code>NewClientWithOldElasticTest</code></li>
<li><code>OldClientWithNewElasticTest</code></li>
<li><code>NewClientWithNewElasticTest</code></li>
</ul>
<p><strong>Why is there no <code>OldClientWithOldElasticTest</code>? Because we already knew that it was working: it was what the application was already running.</strong></p>
<p>Each class was checking that the application was able to do the following:</p>
<ul>
<li>Create an index;</li>
<li>Create a document;</li>
<li>Create kNN vector mappings;</li>
<li>Index kNN vector data;</li>
<li>Search for a document with a kNN query;</li>
<li>Delete an index;</li>
<li>Close the client.</li>
</ul>
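<p>For illustration, here is a rough sketch (as Python dicts; the <code>embedding</code> field name, dimensions and similarity are assumptions, not our actual schema) of what the kNN-related request bodies look like in ES 8.x:</p>

```python
# Sketch of the request bodies behind the kNN-related checks. All field
# names and parameters here are illustrative assumptions.

# Mapping for an indexed dense_vector field, as supported in ES 8.x
knn_mapping = {
    "properties": {
        "embedding": {
            "type": "dense_vector",
            "dims": 3,
            "index": True,
            "similarity": "cosine",
        }
    }
}

# Top-level kNN search clause (ES 8.x search API)
knn_query = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1, 0.2, 0.3],
        "k": 5,
        "num_candidates": 50,
    }
}
```

<p>In the actual tests these bodies were of course produced through the Scala client rather than hand-written.</p>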
<p>The tests did not cover all the queries that we ran, only the common types. But even with this simplified approach we were able to discover a few issues that required changes to the codebase.</p>
<h3>Issues discovered and fixes applied</h3>
<ul>
<li>Elasticsearch 8 removed the <code>_type</code> field from search responses (it had been deprecated in 7.x), so we had to remove it from all the test case resources that represented example JSONs for the expected response.</li>
<li>Elasticsearch 8 didn't allow <code>null</code> in the <code>is_write_index</code> parameter when creating an alias for the index. Therefore, code was added to set this flag explicitly.</li>
<li>Range query based on date/epoch_second <a href="https://discuss.elastic.co/t/date-range-not-working-as-expected-between-elasticsearch-7-17-and-elasticsearch-8-6/328825">didn't work with upper/lower bounds specified as numbers</a>. (According to the Elastic team, it was a feature and would not be fixed). Due to that, the range boundaries had to be stringified before being passed to Elasticsearch.</li>
<li>In Elasticsearch 8, the cluster setting <code>action.destructive_requires_name</code> now defaults to <code>true</code> instead of <code>false</code>. Since our e2e tests were dropping all test indexes by wildcard before starting, they all started crashing. So, a change was introduced to update this setting on the cluster to allow the test suites to run this action. The method doing this was only used in test suites, because for a real production cluster it's pretty unsafe.</li>
</ul>
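<p>Two of these fixes are easy to sketch in isolation (in Python for brevity; the field names and the helper are hypothetical, not our actual code):</p>

```python
# Hypothetical helper mirroring the epoch_second fix: ES 8 rejected numeric
# gte/lte bounds on our date range queries, so we stringify them first.
def stringify_range_bounds(range_clause):
    return {
        field: {
            key: str(value) if isinstance(value, (int, float)) else value
            for key, value in bounds.items()
        }
        for field, bounds in range_clause.items()
    }

query = {
    "range": stringify_range_bounds(
        {"created": {"gte": 1700000000, "lte": 1700003600, "format": "epoch_second"}}
    )
}

# Settings body for the wildcard-deletion fix: relax
# action.destructive_requires_name via PUT _cluster/settings.
# Only ever applied to test clusters; unsafe for production.
allow_wildcard_delete = {
    "persistent": {"action.destructive_requires_name": False}
}
```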
<p>Moreover, when we started to switch the other, more detailed integration tests to Elasticsearch 8, we found an issue that was a little more involved. Some of those tests started to fail with the following error:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"query_shard_exception"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reason"</span><span class="p">:</span><span class="w"> </span><span class="s2">"it is mandatory to set the [nested] context on the nested sort field: [trace.origami.timestamp]."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"index_uuid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"_xvEa8gNSFyCDm0aFXqYhg"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"article_1"</span>
<span class="p">}</span>
</code></pre></div>
<p>That seemed to refer to the sort clause that we had in the e2e test suite:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"sort"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"trace.origami.timestamp"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"order"</span><span class="p">:</span><span class="w"> </span><span class="s2">"desc"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
<p>The page about sorting on a nested field for ES 8.8 (current at the time) <a href="https://www.elastic.co/guide/en/elasticsearch/reference/8.8/sort-search-results.html#nested-sorting">says that a path should be specified in a "nested.path" clause of the sort</a>. However, the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sort-search-results.html#nested-sorting">same page for ES 7.17 states exactly the same thing</a>, yet in 7.17 the query still runs fine without that clause.</p>
<p>So something changed between the versions: the query started erroring out in ES 8 where it had worked fine in ES 7, despite the docs for both versions stating that the parameter is non-optional (<a href="https://discuss.elastic.co/t/nested-sorting-differs-between-es7-and-es8/337904/2">the thread I created on the ES discussion board suggests there was a bug and it was fixed</a>). So, we had to add the <code>nested.path</code> clause to the sort clauses in the queries that sorted on nested fields, meaning that the sort clause from the example above would now look like this.</p>
<div class="highlight"><pre><span></span><code><span class="nt">"sort"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"trace.origami.timestamp"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"order"</span><span class="p">:</span><span class="w"> </span><span class="s2">"desc"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"nested"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"trace"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
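<p>The fix itself can be sketched as a small helper (hypothetical Python, not our actual code) that walks the sort clauses and injects the <code>nested</code> context for fields living under a known nested path:</p>

```python
# Hypothetical helper mirroring our fix (not the actual code): given the
# sort clauses and the set of nested paths known from the mapping, inject
# the "nested": {"path": ...} context that ES 8 requires when sorting on a
# field inside a nested object.
def add_nested_context(sort_clauses, nested_paths):
    fixed = []
    for clause in sort_clauses:
        patched = {}
        for field, opts in clause.items():
            # longest nested path the field lives under, if any
            path = next(
                (p for p in sorted(nested_paths, key=len, reverse=True)
                 if field.startswith(p + ".")),
                None,
            )
            opts = dict(opts)
            if path is not None and "nested" not in opts:
                opts["nested"] = {"path": path}
            patched[field] = opts
        fixed.append(patched)
    return fixed

sort = [{"trace.origami.timestamp": {"order": "desc"}}]
fixed = add_nested_context(sort, {"trace"})
```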
<h2>Deprecating Elasticsearch settings in preparation for 8.x migration</h2>
<p>Summary of changes:</p>
<ul>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-threadpool.html#fixed-auto-queue-size">Remove fixed_auto_queue_size thread pool</a>. It’s replaced with the normal fixed thread pool configuration.</li>
<li>Replace deprecated <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/breaking-changes-7.1.html#_deprecation_of_old_transport_settings">transport.tcp.compress</a>.</li>
<li>Replace node role settings with new <code>node.roles</code> settings (see <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/breaking-changes-7.9.html#breaking_79_settings_changes">one</a> and <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-node.html#coordinating-only-node">two</a>).</li>
<li>Due to <a href="https://github.com/elastic/elasticsearch/issues/65577">an existing bug</a>, the coordinating role needs to be set as the default, which can in turn be overridden by setting the <code>node.roles</code> environment variable with specific values.</li>
<li>Remove deprecated <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/breaking-changes-7.7.html#deprecate-defer-cluster-recovery-settings">gateway.recover_after_master_nodes setting</a>.</li>
<li>Add human approval to prevent upgrading master nodes before data nodes.</li>
<li>Explicitly disable the serial GC using <code>-XX:-UseSerialGC</code> to avoid the following error message during startup:
<pre><code>Error occurred during initialization of VM
Multiple garbage collectors selected</code></pre>
This appeared even though <code>-XX:+UseZGC</code> or <code>-XX:+UseG1GC</code> was explicitly enabled; most likely an intermediate script was adding a conflicting collector flag. In ES 8.x the container can exit unsuccessfully because of this error.</li>
<li>Coordinating nodes are <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-node.html#coordinating-only-node">enabled by default by specifying an empty value</a>.</li>
<li>Data nodes will only have <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-node.html#data-node">the “data” role defined</a>.</li>
<li>Monitoring checks had to be updated because the role abbreviations changed and became stricter than before.</li>
</ul>
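<p>For reference, the GC flags from the list above would end up in the JVM options along these lines (a sketch; the exact options file and the remaining flags depend on the deployment):</p>

```
## choose G1 explicitly and disable the serial GC, so that a stray flag
## added by an intermediate script cannot select a second collector
-XX:+UseG1GC
-XX:-UseSerialGC
```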
<h2>How we did the actual upgrade</h2>
<p>Finally, it seemed that the application was prepared to work with non-homogeneous Elasticsearch versions. At last, it was time to upgrade the Elasticsearch cluster itself.</p>
<p>There is a <a href="https://www.elastic.co/guide/en/elastic-stack/8.11/upgrading-elastic-stack.html#prepare-to-upgrade">documentation page</a> with some advice about going from 7.x to 8.x, and it states that first, one should move to 7.17. From there, it is recommended to use the <a href="https://www.elastic.co/guide/en/kibana/7.17/upgrade-assistant.html">Upgrade Assistant</a> tool to help prepare for the upgrade. As an alternative, it is also recommended to use the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html">Reindex API</a> to reindex the data from the old version to the new one.</p>
<p>So in short, Elasticsearch provides two ways to upgrade:</p>
<ul>
<li>The <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/rolling-upgrades.html">rolling upgrade</a> approach;</li>
<li><a href="https://www.elastic.co/guide/en/elastic-stack/8.11/upgrade-elastic-stack-for-elastic-cloud.html#upgrading-reindex">Upgrading via reindex</a>.</li>
</ul>
<p>The first is a live upgrade: you upgrade the cluster node by node, and the cluster stays available during the upgrade. The second is upgrading via reindex: you create a new cluster and reindex the data from the old cluster into it, then switch the traffic to the new cluster and shut down the old one.</p>
<p>In general, <a href="https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elasticsearch.html">Elastic recommends</a> doing a rolling upgrade in the following way:</p>
<ul>
<li>Upgrade the data nodes first;</li>
<li>Upgrade other non-master nodes (ML-dedicated, coordinating, etc.);</li>
<li>Upgrade the master nodes.</li>
</ul>
<p>This is because the data nodes can join the cluster with the master nodes of a lower version, but older data nodes can't always join the newer cluster. So, if you upgrade the master nodes first, the data nodes might fail to join it, and the cluster will be unavailable.</p>
<p>In general, the rolling upgrade is the recommended way to upgrade, because it's less disruptive. However, in our case, it posed too many risks. First of all, we have a multi-cluster deployment, and the clusters are pretty large: we're talking about terabytes of data. It would take a long time to upgrade a cluster node by node, and during this time the cluster would be in a mixed state, with some nodes upgraded and some not, with relocating shards, and in general in a degraded state.</p>
<p>That, in itself, wouldn't be so scary. What would indeed be bad is if something were to go wrong. If we faced data loss, we'd have no choice but to restore the data from snapshots and then reset the input streams to bring the data up to date. This would take quite some time, because we'd have to do it for all the indices in the cluster, and during all this time the catalog of products would either be unavailable or contain stale or partial data.</p>
<p>So, we decided to go with the second option, the reindexing. It meant that we'd have to create a new cluster, reindex the data from the old one, and then gradually switch the traffic to the new cluster. It would take more time, but it would be far less risky and less disruptive, because once the data was in sync, moving to the new cluster would just be a matter of switching the routing. If something went wrong, the rollback would be almost instantaneous, as it would again just be the routing switched back.</p>
<p><strong>And last but not least, having both clusters running side by side would give us time to test the new cluster and make sure that it worked as expected and performed at the same level. We could first test it with shadow traffic, and then gradually increase the traffic to the new cluster while decreasing it on the old one.</strong></p>
<h3>Procedure per cluster</h3>
<p>The procedure for each of our clusters was similar and included the following steps:</p>
<ul>
<li>Deploy ES8 cluster.</li>
<li>Setup monitoring.</li>
<li>Create index templates (because if we were to index the data from the old cluster, we'd have to make sure that the new cluster has the same index templates as the old one).</li>
<li>Restore data from the latest snapshot.</li>
<li>Set up the shadow <strong>intake</strong> traffic. This meant that the data would gradually converge with the old cluster, but the queries would still be served by the old cluster. If we consider the moment the snapshot was taken as point A and the moment shadow intake was enabled on the new cluster as point B, then we had full data from the beginning up to A, and again from B onwards.</li>
<li>That left us with the gap between points A and B, so the next step was to perform the data update by resetting the data streams to the point just before the snapshot was taken.</li>
<li>Shadow query traffic. This would be performed gradually, with monitoring for errors.</li>
<li>Verify that the new cluster works as expected and compare the cluster performance with the old one.</li>
<li>Switch the live traffic to ES8 cluster (again, gradually shifting the percentages).</li>
<li>Remove old traffic and clean up old cluster resources.</li>
</ul>
<p>If these steps sound familiar, it is because they are. This is basically the Blue/Green procedure that is usually used for disaster recovery (a failover cluster) or for testing something new. The only difference is that we used it for a one-time Elasticsearch cluster upgrade and did not keep the second cluster around. (We are also looking into applying the same approach for a failover cluster, but since our deployments are very large and complicated, we're still getting there.) This Blue/Green approach was also used by the team behind <a href="https://www.zalando-lounge.de">Zalando Lounge</a>, which has a separate catalog of products, also backed by Elasticsearch, so we had some in-house experience to compare with.</p>
<h4>Routing and shadowing</h4>
<p>The whole mechanism is based on a delicate balance of routing and shadowing. We use an open-sourced solution called <a href="https://opensource.zalando.com/skipper/">Skipper</a> as an ingress controller, which gives us access to <a href="https://opensource.zalando.com/skipper/reference/filters/">filters</a>. For the routing, we're using a custom resource type called <a href="https://opensource.zalando.com/skipper/kubernetes/routegroups/">RouteGroup</a>. For example, to ensure that the intake pipeline ingests data into the new cluster, the route group configuration needs to be modified to shadow the <strong>intake</strong> traffic for the <code>/bulk</code> and <code>/_alias/{index}_write</code> endpoints. Here is a somewhat simplified example configuration for shadowing the specified endpoints:</p>
<div class="highlight"><pre><span></span><code><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">zalando.org/v1</span>
<span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">RouteGroup</span>
<span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">hosts</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cluster-name-{{{CLIENT}}}.ingress.cluster.local</span>
<span class="w"> </span><span class="nt">backends</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">backend-old</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">network</span>
<span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="s">"http://backend-old.ingress.cluster.local"</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">backend-new</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">network</span>
<span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="s">"http://backend-new.ingress.cluster.local"</span>
<span class="w"> </span><span class="nt">routes</span><span class="p">:</span>
<span class="w"> </span><span class="c1">## match to shadow /_bulk, /_alias/{index}_ad*_write to new backend with ES8</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">pathSubtree</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/</span>
<span class="w"> </span><span class="nt">pathRegexp</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">^/(_bulk|_alias/(index-name-template)_[\d]+_write)$</span>
<span class="w"> </span><span class="nt">predicates</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">HeaderRegexp("elasticsearch-index-name", "^(index-name-template)_[\d]+($|_.*)")</span>
<span class="w"> </span><span class="nt">filters</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">teeLoopback("intake_shadow")</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">preserveHost("false")</span>
<span class="w"> </span><span class="nt">backends</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">backendName</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">backend-old</span>
<span class="w"> </span><span class="c1">## shadow "intake_shadow" matched requests to new backend with ES8</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">pathSubtree</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/</span>
<span class="w"> </span><span class="nt">pathRegexp</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">^/(_bulk|_alias/(index-name-template)_[\d]+_write)$</span>
<span class="w"> </span><span class="nt">predicates</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">HeaderRegexp("elasticsearch-index-name", "^(index-name-template)_[\d]+($|_.*)")</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Tee("intake_shadow")</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Weight(2)</span><span class="w"> </span><span class="c1">## hack required to not match route with Traffic() and teeLoopback()</span>
<span class="w"> </span><span class="nt">filters</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">preserveHost("false")</span>
<span class="w"> </span><span class="nt">backends</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">backendName</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">backend-new</span>
</code></pre></div>
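<p>Regexes in routing configs are easy to get subtly wrong, so a quick sanity check of the <code>pathRegexp</code> above can be done in a few lines (Python here; note that <code>(index-name-template)</code> is a placeholder in this simplified example, not the real index name pattern):</p>

```python
import re

# Sanity check of the pathRegexp used in the route group above; it should
# match only the bulk endpoint and the versioned write-alias endpoints.
# "(index-name-template)" is a placeholder from the simplified config.
path_re = re.compile(r"^/(_bulk|_alias/(index-name-template)_[\d]+_write)$")

assert path_re.match("/_bulk")
assert path_re.match("/_alias/index-name-template_42_write")
assert not path_re.match("/_search")
assert not path_re.match("/_alias/index-name-template_write")  # no version digits
```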
<p>But that's not all. Before shadowing the intake, the mapping templates had to be created. One way to do this would be to just grab them and recreate them on the new cluster. But that would mean doing it manually, and we might also miss any updates to them that happened while the clusters were still running side by side. Since the templates are stored in our code repos and updated (based on the version) on application restart, the traffic related to template creation also had to be shadowed, so we captured this specific traffic too. A shortened snippet of the configuration:</p>
<div class="highlight"><pre><span></span><code><span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">routes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/:index/_mapping</span>
<span class="w"> </span><span class="nt">predicates</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">HeaderRegexp("elasticsearch-index-name", "^(index-name-template)_[\d]+($|_.*)")</span>
<span class="w"> </span><span class="c1">## <...></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/_template/*</span>
<span class="w"> </span><span class="nt">predicates</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">HeaderRegexp("elasticsearch-index-name", "^(index-name-template)_[\d]+($|_.*)")</span>
</code></pre></div>
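<p>It also helps to sanity-check which <code>elasticsearch-index-name</code> header values the <code>HeaderRegexp</code> predicate above actually accepts (Python sketch; <code>(index-name-template)</code> is again a placeholder from the simplified config):</p>

```python
import re

# Sanity check of the HeaderRegexp predicate: which values of the
# "elasticsearch-index-name" header does it accept? "(index-name-template)"
# is a placeholder, not the real pattern.
header_re = re.compile(r"^(index-name-template)_[\d]+($|_.*)")

assert header_re.match("index-name-template_3")        # versioned index
assert header_re.match("index-name-template_3_write")  # write alias
assert not header_re.match("other-index_3")
```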
<h4>Monitoring</h4>
<p>The whole process would make no sense if we were flying blind. Since it was a multistep procedure, we needed to see how each step was changing the data, affecting the cluster, performing compared to the old cluster, etc. So we needed to set up monitoring. It was based on creating <a href="https://docs.lightstep.com/docs/welcome-to-lightstep">Lightstep</a> streams and setting up dashboards in <a href="https://grafana.com">Grafana</a>. The dashboards showed the traffic from both clusters side by side per endpoint, along with key metrics like latency and error rate. We also monitored CPU and memory consumption via Kubernetes.</p>
<p><strong>One of the most important things was that the data be in sync, so the boards also showed index sizes and the difference between them for the old and new cluster. This way, we could see whether, say, restoring from the snapshot was indeed successful and whether the follow-up of shadow intake and stream resetting resulted in the data converging in the end.</strong></p>
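<p>The convergence check behind those boards can be sketched as a small comparison function (hypothetical Python; in practice the per-index document counts would come from each cluster's stats endpoints):</p>

```python
# Hypothetical sketch of the convergence check behind the dashboards:
# compare per-index document counts from the old and new clusters and flag
# indices whose relative difference exceeds a tolerance.
def diverging_indices(old_counts, new_counts, tolerance=0.01):
    flagged = {}
    for index, old in old_counts.items():
        new = new_counts.get(index, 0)
        if abs(old - new) / max(old, 1) > tolerance:
            flagged[index] = (old, new)
    return flagged

old = {"article_1": 1_000_000, "article_2": 500_000}
new = {"article_1": 1_000_100, "article_2": 450_000}
```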
<h4>Alerting</h4>
<p>And last but not least, before each new cluster went live, we had to update the alerts and checks that were set up on the corresponding old cluster. We had to make sure that the alerts pointed to the new cluster, that the checks still worked as expected, and that the alerts would not fire during the upgrade.</p>
<h4>Backing up the data</h4>
<p>And of course, as soon as the new cluster went live serving queries and the data on the old cluster stopped being updated (or preferably before that), we set up the snapshotting. We had to make sure that the data was backed up, using the same policies that the previous cluster was using.</p>
<h2>Challenges we faced</h2>
<p>The process of upgrading the cluster was not without challenges. Some of them were expected, some were not, and some came purely from people never having performed certain procedures before, or from something slipping someone's attention.</p>
<p>One such thing resulted in duplicates being shown in the product catalog country-wide: a routing error while switching a country index from the old cluster to the new one meant that one extra index was created automatically (and erroneously), and for some time two different indices with duplicate content existed behind the same alias. But that was quickly fixed, and the duplicates were removed by simply dropping the mistakenly created index. (And hey, it's better to show a product twice than not to show it at all, right?)</p>
<p>In general, the whole process was an amazing learning experience, and the whole team is now better prepared for the next upgrade and feels more confident tackling Elasticsearch in general. So, while assuredly sh*t still can and will happen, what matters is how you deal with it and what you learn from it.</p>
<p>For example, the difficulty team members experienced while restoring the data was a good indicator that our existing procedure for restoring from a snapshot was extremely fussy and error-prone, which led us to look for alternative solutions, like Kibana-based workflows, to make the process more straightforward and more obvious. Historically, we used custom scripts and our CI pipeline for that, but now we're aiming to get our engineers better acquainted with Kibana. The scripts are still the default way, but we're getting there.</p>
<h2>Success!</h2>
<p>As always after a big project, we had a retrospective, and the team was pretty happy with the results. The upgrade was successful, and the new cluster was performing at the same level as the old one. The new features were working as expected, and the new cluster was stable. The monitoring was set up, and the dashboards were showing the data in sync. The alerts were firing as expected, and the checks were working. So all in all, it was a success.</p>
<p>But you know what?</p>
<p>Products keep upgrading. Progress is the only constant thing in the world. So, we're already looking into the next upgrade, and we're already thinking about how to make it even better.</p>
<p>And we will keep evolving, because that's what we do.</p>
<p><strong>We're Zalando. We dress code.</strong></p>
<p>(See what I did there? Even though I can't take any credit for it: this is a slogan that we once had on our company hoodies!)</p>
<h2>Helpful links</h2>
<ul>
<li><a href="https://www.elastic.co/guide/en/kibana/7.17/upgrade-assistant.html">Elasticsearch upgrade assistant</a></li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html">Elasticsearch reindex API</a></li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.17/rolling-upgrades.html">Elasticsearch rolling upgrades</a></li>
<li><a href="https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elasticsearch.html">Elasticsearch upgrade guide</a></li>
<li><a href="https://www.elastic.co/guide/en/cloud/current/ec-snapshot-restore.html">Restoring the Elasticsearch data from snapshot</a></li>
</ul>
<hr>
<p><em>We're hiring! If you enjoy solving complex Engineering problems as we do, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Applied%20Science&filters%5Bcategories%5D%5B1%5D=Product%20Design%2C%20User%20Research%20%26%20UX%20Writing&filters%5Bcategories%5D%5B2%5D=Product%20Management%20%28Technology%29&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B9%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B10%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B11%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B12%5D=Applied%20Science%20%26%20Research&search=software%20engineering">joining our teams at Zalando</a>.</em></p>
<p><br /></p>Mastering Testing Efficiency in Spring Boot: Optimization Strategies and Best Practices2023-11-14T00:00:00+01:002023-11-14T00:00:00+01:00Hassan Elseoudytag:engineering.zalando.com,2023-11-14:/posts/2023/11/mastering-testing-efficiency-in-spring-boot-optimization-strategies-and-best-practices.html<p>Unlock the secrets to supercharging your Spring Boot tests! Explore how we utilized specific techniques, resulting in a 60% reduction in test runtime!</p><h2>Introduction 🚀</h2>
<p>Hey there, fellow engineers! Let's dive into the exciting world of Spring Boot testing with JUnit. It is incredibly powerful, providing a realistic environment for testing our code. However, if we don't optimize our tests, they can be slow and negatively affect lead time to changes for our teams.</p>
<p>This blog post will teach you how to optimize your Spring Boot tests, making them faster, more efficient, and more reliable.</p>
<p>Imagine an application whose tests take 10 minutes to execute. That's a lot of time! Let's roll up our sleeves and see how we can whiz through those tests in no time! 🕒✨</p>
<h2>Understanding Test Slicing in Spring</h2>
<p>Test slicing in Spring allows testing specific parts of an application, focusing only on relevant components, rather than loading the entire context. It is achieved by annotations like <code>@WebMvcTest</code>, <code>@DataJpaTest</code>, or <code>@JsonTest</code>. These annotations are a targeted approach to limit the context loading to a specific layer or technology. For instance, <code>@WebMvcTest</code> primarily loads the Web layer, while <code>@DataJpaTest</code> initializes the Data JPA layer for more concise and efficient testing. This selective loading approach is a cornerstone in optimizing test efficiency.</p>
<p>There are more annotations that can be used to slice the context; see the official Spring <a href="https://docs.spring.io/spring-boot/docs/current/reference/html/test-auto-configuration.html#appendix.test-auto-configuration.slices">documentation on Test Slices</a>.</p>
<h2>Test Slicing: Using @DataJpaTest as a replacement for @SpringBootTest 🧩</h2>
<p>Let's take a look at an example (code below). The test first deletes all the data (shipments and containers, each shipment can have multiple containers) from the target tables, and then saves a new shipment. Next, it creates a thread pool with 50 threads, where each thread calls the <code>svc.createOrUpdateContainer</code> method.</p>
<p>The test will wait until all the threads are finished, then it will check that the database has only one container.</p>
<p>It's all about checking concurrency issues: the test involves a swarm of threads and clocks in at about 16 seconds on my machine. That's a massive chunk of time for a single service check, right?</p>
<div class="highlight"><pre><span></span><code><span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span>
<span class="nd">@SpringBootTest</span>
<span class="kd">abstract</span><span class="w"> </span><span class="kd">class</span><span class="w"> </span><span class="nc">BaseIT</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">shipmentRepo</span><span class="p">:</span><span class="w"> </span><span class="n">ShipmentRepository</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">containerRepo</span><span class="p">:</span><span class="w"> </span><span class="n">ContainerRepository</span>
<span class="p">}</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">ContainerServiceTest</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">BaseIT</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">svc</span><span class="p">:</span><span class="w"> </span><span class="n">ContainerService</span>
<span class="w"> </span><span class="nd">@BeforeEach</span>
<span class="w"> </span><span class="kd">fun</span><span class="w"> </span><span class="nf">setup</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">shipmentRepo</span><span class="p">.</span><span class="na">deleteAll</span><span class="p">()</span>
<span class="w"> </span><span class="n">containerRepo</span><span class="p">.</span><span class="na">deleteAll</span><span class="p">()</span>
<span class="w"> </span><span class="n">shipmentRepo</span><span class="p">.</span><span class="na">save</span><span class="p">(</span><span class="n">shipment</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">fun</span><span class="w"> </span><span class="nf">testConcurrentUpdatesForContainer</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="nv">executor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Executors</span><span class="p">.</span><span class="na">newFixedThreadPool</span><span class="p">(</span><span class="m">50</span><span class="p">)</span>
<span class="w"> </span><span class="n">repeat</span><span class="p">(</span><span class="m">50</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="na">execute</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">svc</span><span class="p">.</span><span class="na">createOrUpdateContainer</span><span class="p">(</span><span class="s">"</span><span class="si">${</span><span class="n">shipment</span><span class="p">.</span><span class="na">id</span><span class="si">}${</span><span class="n">svc</span><span class="p">.</span><span class="na">DEFAULT_CONTAINER</span><span class="si">}</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">Patch</span><span class="p">(</span><span class="s">"NEW_LABEL"</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="na">shutdown</span><span class="p">()</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">executor</span><span class="p">.</span><span class="na">awaitTermination</span><span class="p">(</span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">TimeUnit</span><span class="p">.</span><span class="na">MILLISECONDS</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// busy waiting for executor to terminate</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">containerRepo</span><span class="p">.</span><span class="na">find</span><span class="p">(</span><span class="n">shipment</span><span class="p">)).</span><span class="na">hasSize</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The first problem we have is the class declaration:</p>
<div class="highlight"><pre><span></span><code><span class="kd">class</span><span class="w"> </span><span class="nc">ContainerServiceTest</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">BaseIT</span><span class="p">()</span>
</code></pre></div>
<p>The issue starts with the <code>BaseIT</code> class using <code>@SpringBootTest</code>. This causes the Spring context for the entire application to be loaded (and reloaded whenever we invalidate the context caching mechanism; we'll get to that later!). When the application is large enough, a huge number of beans are loaded, a costly operation for tests with specific objectives.</p>
<p>But no, we don't want to load everything. All we need to load is the <code>ContainerService</code> bean and JPA repositories. We can switch to <code>@DataJpaTest</code>. This annotation only loads the JPA part of the application, which is what we need for this test. Let's try it out!</p>
<div class="highlight"><pre><span></span><code><span class="nd">@DataJpaTest</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">ContainerServiceTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">svc</span><span class="p">:</span><span class="w"> </span><span class="n">ContainerService</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">shipmentRepo</span><span class="p">:</span><span class="w"> </span><span class="n">ShipmentRepository</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">containerRepo</span><span class="p">:</span><span class="w"> </span><span class="n">ContainerRepository</span>
<span class="p">}</span>
</code></pre></div>
<p>Upon execution, an exception is thrown:</p>
<div class="highlight"><pre><span></span><code>org.springframework.beans.factory.BeanCreationException: Failed to replace DataSource with an embedded database for tests. If you want an embedded database please put a supported one on the classpath or tune the replace attribute of @AutoConfigureTestDatabase.
</code></pre></div>
<p><code>@DataJpaTest</code> is meta-annotated with <code>@AutoConfigureTestDatabase</code>, which by default replaces the configured <code>DataSource</code> with an embedded in-memory database (such as H2) found on the classpath. In our case, however, the H2 dependency is not on the classpath, hence the error.</p>
<p>We don't want to use H2 for our tests anyway, so we can tell <code>@AutoConfigureTestDatabase</code> not to replace our configured database. We then have to configure and load our own database, which is done here by importing a <code>@Configuration</code> class called <code>EmbeddedDataSourceConfig</code> (it simply creates a <code>@Bean</code> of type <code>DataSource</code>).</p>
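<p><em>For illustration only</em>: such a configuration class might look like the sketch below. The class body is an assumption on our part; the <code>PGSimpleDataSource</code> and its connection details are hypothetical, and any <code>@Configuration</code> class exposing a <code>DataSource</code> bean would work once H2 replacement is disabled.</p>

```java
import javax.sql.DataSource;

import org.postgresql.ds.PGSimpleDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical sketch of EmbeddedDataSourceConfig. The real class in our
// codebase may differ, e.g. by starting an embedded Postgres for the tests.
@Configuration
public class EmbeddedDataSourceConfig {

    @Bean
    public DataSource dataSource() {
        // Illustrative connection details; point this at whatever test
        // database your build starts (embedded or containerized).
        PGSimpleDataSource ds = new PGSimpleDataSource();
        ds.setURL("jdbc:postgresql://localhost:5432/test");
        ds.setUser("test");
        ds.setPassword("test");
        return ds;
    }
}
```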
<div class="highlight"><pre><span></span><code><span class="nd">@DataJpaTest</span>
<span class="nd">@AutoConfigureTestDatabase</span><span class="p">(</span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AutoConfigureTestDatabase</span><span class="p">.</span><span class="na">Replace</span><span class="p">.</span><span class="na">NONE</span><span class="p">)</span>
<span class="nd">@Import</span><span class="p">(</span><span class="n">EmbeddedDataSourceConfig</span><span class="o">::</span><span class="n">class</span><span class="p">)</span><span class="w"> </span><span class="c1">// Import the embedded database configuration if needed.</span>
<span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span><span class="w"> </span><span class="c1">// Use the test profile to load a different configuration for tests.</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">ContainerServiceTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// test code</span>
<span class="p">}</span>
</code></pre></div>
<p>Let's try to run the test again. Now, it fails with this error:</p>
<div class="highlight"><pre><span></span><code>org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'ContainerServiceTest': Unsatisfied dependency expressed through field 'svc'
</code></pre></div>
<p>You already know the trick: we need to load the <code>ContainerService</code> bean into the Spring context!</p>
<div class="highlight"><pre><span></span><code><span class="nd">@DataJpaTest</span>
<span class="nd">@AutoConfigureTestDatabase</span><span class="p">(</span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AutoConfigureTestDatabase</span><span class="p">.</span><span class="na">Replace</span><span class="p">.</span><span class="na">NONE</span><span class="p">)</span>
<span class="nd">@Import</span><span class="p">(</span><span class="n">ContainerService</span><span class="o">::</span><span class="n">class</span><span class="p">,</span><span class="w"> </span><span class="n">EmbeddedDataSourceConfig</span><span class="o">::</span><span class="n">class</span><span class="p">)</span>
<span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">ContainerServiceTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// test code</span>
<span class="p">}</span>
</code></pre></div>
<p>Uh-oh! The Spring context loads successfully, but the test fails with the following error:</p>
<div class="highlight"><pre><span></span><code>java.lang.AssertionError:
Expected size:<1> but was:<0> in:
<[]>
</code></pre></div>
<p>If you look at <code>@DataJpaTest</code>, you will notice that it is meta-annotated with <code>@Transactional</code>. By default, each test method therefore runs inside a single transaction that is rolled back at the end. The setup changes (deleting data from the target tables and saving the shipment) are never committed, so they are not visible to the separate transactions created by the worker threads.</p>
<p>Since we want the setup changes to be committed independently of the surrounding test transaction opened by <code>@DataJpaTest</code>, we run them in a new transaction using <code>PROPAGATION_REQUIRES_NEW</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nd">@DataJpaTest</span>
<span class="nd">@AutoConfigureTestDatabase</span><span class="p">(</span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AutoConfigureTestDatabase</span><span class="p">.</span><span class="na">Replace</span><span class="p">.</span><span class="na">NONE</span><span class="p">)</span>
<span class="nd">@Import</span><span class="p">(</span><span class="n">ContainerService</span><span class="o">::</span><span class="n">class</span><span class="p">,</span><span class="w"> </span><span class="n">EmbeddedDataSourceConfig</span><span class="o">::</span><span class="n">class</span><span class="p">)</span>
<span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">ContainerServiceTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">transactionTemplate</span><span class="p">:</span><span class="w"> </span><span class="n">TransactionTemplate</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">svc</span><span class="p">:</span><span class="w"> </span><span class="n">ContainerService</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">shipmentRepo</span><span class="p">:</span><span class="w"> </span><span class="n">ShipmentRepository</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">lateinit</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nv">containerRepo</span><span class="p">:</span><span class="w"> </span><span class="n">ContainerRepository</span>
<span class="w"> </span><span class="nd">@BeforeEach</span>
<span class="w"> </span><span class="kd">fun</span><span class="w"> </span><span class="nf">setup</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">transactionTemplate</span><span class="p">.</span><span class="na">propagationBehavior</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TransactionTemplate</span><span class="p">.</span><span class="na">PROPAGATION_REQUIRES_NEW</span>
<span class="w"> </span><span class="n">transactionTemplate</span><span class="p">.</span><span class="na">execute</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">shipmentRepo</span><span class="p">.</span><span class="na">deleteAll</span><span class="p">()</span>
<span class="w"> </span><span class="n">containerRepo</span><span class="p">.</span><span class="na">deleteAll</span><span class="p">()</span>
<span class="w"> </span><span class="n">shipmentRepo</span><span class="p">.</span><span class="na">save</span><span class="p">(</span><span class="n">shipment</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>🎉 The test passes, completing in just 8 seconds (context load + run): twice as fast as before!</p>
<h2>Test Slicing: @JsonTest Precision in Validating JSON Serialization/Deserialization 💡</h2>
<p>Consider this test snippet:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">EventDeserializationIT</span><span class="w"> </span><span class="kd">extends</span><span class="w"> </span><span class="n">BaseIT</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">RESOURCE_PATH</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"event-example.json"</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">ObjectMapper</span><span class="w"> </span><span class="n">objectMapper</span><span class="p">;</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">Event</span><span class="w"> </span><span class="n">dto</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">testDeserialization</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">Exception</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">json</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Resources</span><span class="p">.</span><span class="na">toString</span><span class="p">(</span><span class="n">Resources</span><span class="p">.</span><span class="na">getResource</span><span class="p">(</span><span class="n">RESOURCE_PATH</span><span class="p">),</span><span class="w"> </span><span class="n">UTF_8</span><span class="p">);</span>
<span class="w"> </span><span class="n">dto</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">objectMapper</span><span class="p">.</span><span class="na">reader</span><span class="p">().</span><span class="na">forType</span><span class="p">(</span><span class="n">Event</span><span class="p">.</span><span class="na">class</span><span class="p">).</span><span class="na">readValue</span><span class="p">(</span><span class="n">json</span><span class="p">);</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">dto</span><span class="p">.</span><span class="na">getData</span><span class="p">().</span><span class="na">getNewTour</span><span class="p">().</span><span class="na">getFromLocation</span><span class="p">()).</span><span class="na">isNotNull</span><span class="p">();</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">dto</span><span class="p">.</span><span class="na">getData</span><span class="p">().</span><span class="na">getNewTour</span><span class="p">().</span><span class="na">getToLocation</span><span class="p">()).</span><span class="na">isNotNull</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The objective of this test is to ensure proper deserialization. We can use the <code>@JsonTest</code> annotation to import only the beans we need; there is no need to extend any base class, since all we require is the <code>ObjectMapper</code>. Using this annotation applies only the configuration relevant to JSON tests (i.e. <code>@JsonComponent</code> beans and Jackson modules).</p>
<div class="highlight"><pre><span></span><code><span class="nd">@JsonTest</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">EventDeserializationTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">ObjectMapper</span><span class="w"> </span><span class="n">objectMapper</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="p">}</span>
</code></pre></div>
<h2>Test Slicing: @WebMvcTest for REST APIs 🌐</h2>
<p>Using <code>@WebMvcTest</code>, we can test REST APIs without firing up the server (e.g. the embedded Tomcat) or loading the whole application context. It's all about targeting specific controllers. Fast and efficient, just like that!</p>
<div class="highlight"><pre><span></span><code><span class="nd">@WebMvcTest</span><span class="p">(</span><span class="n">ShipmentServiceController</span><span class="p">.</span><span class="na">class</span><span class="p">)</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">ShipmentServiceControllerTests</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">MockMvc</span><span class="w"> </span><span class="n">mvc</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">ShipmentService</span><span class="w"> </span><span class="n">service</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">getShipmentShouldReturnShipmentDetails</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="n">given</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="na">service</span><span class="p">.</span><span class="na">schedule</span><span class="p">(</span><span class="n">any</span><span class="p">())).</span><span class="na">willReturn</span><span class="p">(</span><span class="n">LocalDate</span><span class="p">.</span><span class="na">now</span><span class="p">());</span>
<span class="w">        </span><span class="k">this</span><span class="p">.</span><span class="na">mvc</span><span class="p">.</span><span class="na">perform</span><span class="p">(</span><span class="n">get</span><span class="p">(</span><span class="s">"/shipments/12345"</span><span class="p">).</span><span class="na">accept</span><span class="p">(</span><span class="n">MediaType</span><span class="p">.</span><span class="na">APPLICATION_JSON</span><span class="p">))</span>
<span class="w">            </span><span class="p">.</span><span class="na">andExpect</span><span class="p">(</span><span class="n">status</span><span class="p">().</span><span class="na">isOk</span><span class="p">())</span>
<span class="w">            </span><span class="p">.</span><span class="na">andExpect</span><span class="p">(</span><span class="n">jsonPath</span><span class="p">(</span><span class="s">"$.number"</span><span class="p">).</span><span class="na">value</span><span class="p">(</span><span class="s">"12345"</span><span class="p">));</span>
<span class="w">        </span><span class="c1">// ...</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h2>Taming Mock/Spy Beans and Context Caching Dilemmas 🔍</h2>
<p>Let's delve into the intricacies of the Spring Test context caching mechanism!</p>
<p>When your tests involve Spring Test features (e.g., <code>@SpringBootTest</code>, <code>@WebMvcTest</code>, <code>@DataJpaTest</code>), they require a running Spring Context. Starting a Spring Context for your test requires a considerable amount of time, especially if the entire context is populated using <code>@SpringBootTest</code>, resulting in increased test execution overhead and longer build times if each test starts its own context.</p>
<p>Fortunately, Spring Test provides a mechanism to cache a started application context and reuse it for subsequent tests with similar context requirements.</p>
<p>The cache is essentially a bounded map: by default it holds up to 32 contexts and evicts the least recently used one when full. The map key is computed from several parameters, including the beans loaded into the context.</p>
<p>The cache key consists of:</p>
<ul>
<li>locations (from <code>@ContextConfiguration</code>)</li>
<li>classes (from <code>@ContextConfiguration</code>)</li>
<li>contextInitializerClasses (from <code>@ContextConfiguration</code>)</li>
<li>contextCustomizers (from <code>ContextCustomizerFactory</code>) – this includes <code>@DynamicPropertySource</code> methods as well as various features from Spring Boot’s testing support such as <code>@MockBean</code> and <code>@SpyBean</code>.</li>
<li>contextLoader (from <code>@ContextConfiguration</code>)</li>
<li>parent (from <code>@ContextHierarchy</code>)</li>
<li>activeProfiles (from <code>@ActiveProfiles</code>)</li>
<li>propertySourceLocations (from <code>@TestPropertySource</code>)</li>
<li>propertySourceProperties (from <code>@TestPropertySource</code>)</li>
<li>resourceBasePath (from <code>@WebAppConfiguration</code>)</li>
</ul>
<p>For example, if <code>TestClassA</code> specifies <code>{"app-config.xml", "test-config.xml"}</code> for the locations (or value) attribute of <code>@ContextConfiguration</code>, the TestContext framework loads the corresponding ApplicationContext and stores it in a static context cache under a key that is based solely on those locations. So, if <code>TestClassB</code> also defines <code>{"app-config.xml", "test-config.xml"}</code> for its locations (either explicitly or implicitly through inheritance) and does not define different attributes for any of the other attributes listed above, then the same ApplicationContext is shared by both test classes. This means that the setup cost for loading an application context is incurred only once (per test suite), and subsequent test execution is much faster.</p>
<p>If different tests declare different attributes, for example a different <code>@ContextConfiguration</code>, <code>@TestPropertySource</code>, <code>@MockBean</code> or <code>@SpyBean</code>, the caching key changes, and every context that does not yet exist in the cache must be loaded from scratch.</p>
<p>And if there are many distinct contexts, older entries are evicted from the cache, so subsequent tests that could have reused those contexts have to reload them, adding yet more test time.</p>
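<p>Conceptually, the behaviour is similar to the following self-contained sketch (this is an illustration, not Spring's actual implementation): a bounded LRU map keyed by the merged configuration attributes, where any differing attribute, such as an extra <code>@MockBean</code>, produces a new key and therefore a fresh context load.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ContextCacheSketch {
    // Illustrative composite key: two test classes with equal attributes
    // produce equal keys and therefore share one cached context.
    record ContextKey(List<String> configClasses, List<String> activeProfiles,
                      Set<String> mockedBeans) {}

    // Bounded LRU map: once capacity is exceeded, the least recently used
    // context is evicted and must be rebuilt on its next use.
    static <K, V> Map<K, V> lruCache(int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<ContextKey, String> cache = lruCache(2);
        ContextKey a = new ContextKey(List.of("AppConfig"), List.of("test"), Set.of());
        ContextKey b = new ContextKey(List.of("AppConfig"), List.of("test"), Set.of());
        cache.put(a, "context-1");
        // Records with equal components are equal: cache hit, no reload.
        System.out.println(cache.containsKey(b));
        // A @MockBean changes the key, forcing a brand-new context.
        ContextKey c = new ContextKey(List.of("AppConfig"), List.of("test"),
                Set.of("DependencyA"));
        System.out.println(cache.containsKey(c));
    }
}
```

This is exactly why consolidating mock beans pays off: it keeps the composite key identical across test classes, so the expensive "context load" happens once.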
<p>One efficiency optimization is consolidating mock beans in a common parent class. All subclasses then share the same cache key, so the context stays unchanged and is loaded once instead of being rebuilt for every test class.</p>
<p>Example before and after:</p>
<div class="highlight"><pre><span></span><code><span class="nd">@SpringBootTest</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestClass1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">DependencyA</span><span class="w"> </span><span class="n">dependencyA</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="p">}</span>
<span class="nd">@SpringBootTest</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestClass2</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">DependencyB</span><span class="w"> </span><span class="n">dependencyB</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="p">}</span>
<span class="nd">@SpringBootTest</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestClass3</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">DependencyC</span><span class="w"> </span><span class="n">dependencyC</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="p">}</span>
</code></pre></div>
<p>If we run the above example, the context is loaded three times, which is not efficient at all.
Let's try to optimize it.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@SpringBootTest</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">abstract</span><span class="w"> </span><span class="kd">class</span> <span class="nc">BaseTestClass</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">DependencyA</span><span class="w"> </span><span class="n">dependencyA</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">DependencyB</span><span class="w"> </span><span class="n">dependencyB</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">DependencyC</span><span class="w"> </span><span class="n">dependencyC</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Extend the BaseTestClass for each test class</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestClass1</span><span class="w"> </span><span class="kd">extends</span><span class="w"> </span><span class="n">BaseTestClass</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">testSomething1</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestClass2</span><span class="w"> </span><span class="kd">extends</span><span class="w"> </span><span class="n">BaseTestClass</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">testSomething2</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TestClass3</span><span class="w"> </span><span class="kd">extends</span><span class="w"> </span><span class="n">BaseTestClass</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">testSomething3</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Test implementation</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Now, the context will be reloaded only once, which is more efficient!</p>
<p>Or even better: you can avoid class inheritance by using the <code>@Import</code> annotation to import a configuration class that contains the mock beans.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@TestConfiguration</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">Config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="n">private</span><span class="w"> </span><span class="n">DependencyA</span><span class="w"> </span><span class="n">dependencyA</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="n">private</span><span class="w"> </span><span class="n">DependencyB</span><span class="w"> </span><span class="n">dependencyB</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@MockBean</span>
<span class="w"> </span><span class="n">private</span><span class="w"> </span><span class="n">DependencyC</span><span class="w"> </span><span class="n">dependencyC</span><span class="p">;</span>
<span class="p">}</span>
<span class="nd">@Import</span><span class="p">(</span><span class="n">Config</span><span class="p">.</span><span class="na">class</span><span class="p">)</span>
<span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span>
<span class="kd">class</span><span class="w"> </span><span class="nc">TestClass1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Test code</span>
<span class="p">}</span>
</code></pre></div>
<h2>Think twice before using @DirtiesContext ❗</h2>
<p>Applying <code>@DirtiesContext</code> to a test class marks the application context as dirty, so Spring Test closes it after the tests execute and removes it from the cache instead of reusing it. Think carefully before reaching for this annotation: every subsequent test class that needs the same configuration pays the full context startup cost again.</p>
<p>Although some use it to reset IDs in the database, better alternatives exist. For instance, annotating the test with <code>@Transactional</code> rolls the transaction back after each test without discarding the context.</p>
<h2>Parallel Execution of Tests 🏎️</h2>
<p>By default, JUnit Jupiter tests run sequentially in a single thread. However, parallel test execution, introduced as an opt-in feature in JUnit 5.3, can speed things up considerably. 🚀</p>
<p>To initiate parallel test execution, follow these steps:</p>
<ol>
<li>
<p>Create a <code>junit-platform.properties</code> file in <code>src/test/resources</code>.</p>
</li>
<li>
<p>Add the following configuration to the file:
<code>junit.jupiter.execution.parallel.enabled = true</code></p>
</li>
<li>
<p>Annotate every class you want to run in parallel with <code>@Execution(CONCURRENT)</code>.</p>
</li>
</ol>
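<p>Putting steps 1 and 2 together, a minimal <code>junit-platform.properties</code> could look like the following. The extra lines below <code>enabled</code> are illustrative rather than from the original setup; <code>same_thread</code> is already JUnit's default mode, which is what makes the per-class <code>@Execution(CONCURRENT)</code> opt-in from step 3 work:</p>

```properties
# src/test/resources/junit-platform.properties
junit.jupiter.execution.parallel.enabled = true
# Keep the default mode sequential so only classes annotated with
# @Execution(CONCURRENT) run in parallel (same_thread is the default).
junit.jupiter.execution.parallel.mode.default = same_thread
# Size the worker pool dynamically from the number of available cores.
junit.jupiter.execution.parallel.config.strategy = dynamic
```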
<p>Keep in mind that certain tests might not be compatible with parallel execution due to their nature. For such cases, you should not add <code>@Execution(CONCURRENT)</code>. See <a href="https://junit.org/junit5/docs/snapshot/user-guide/#writing-tests-parallel-execution">JUnit: writing tests – parallel execution</a> for more explanation on the different execution modes.</p>
<h2>Results 📊</h2>
<p>Applying all the optimizations mentioned above made a big difference in our CI/CD pipeline. Our tests now take only <strong>4 minutes and 15 seconds</strong>, compared to the previous <strong>10 minutes and 7 seconds</strong> - a roughly <strong>58</strong>% improvement! 🌟</p>
<h2>Conclusion 🎬</h2>
<p>In this adventure of optimizing Spring Boot tests, we've harnessed a collection of strategies to bolster test efficiency and speed. Let's summarize the tactics we've implemented:</p>
<ul>
<li><strong>Test Slicing:</strong> Leveraging <code>@WebMvcTest</code>, <code>@DataJpaTest</code>, and <code>@JsonTest</code> to focus tests on specific layers or components. See <a href="https://docs.spring.io/spring-boot/docs/current/reference/html/features.html#features.testing.spring-boot-applications">Testing Spring Boot Applications</a> for more details.</li>
<li><strong>Context Caching Dilemmas:</strong> Overcoming challenges related to dirty ApplicationContext caches by optimizing the use of mock and spy beans. See <a href="https://docs.spring.io/spring-framework/reference/testing/testcontext-framework/ctx-management/caching.html">Spring Test Context Caching</a>.</li>
<li><strong>Parallel Test Execution:</strong> Enabling parallel test execution to significantly reduce test suite execution time. See <a href="https://junit.org/junit5/docs/current/user-guide/#writing-tests-parallel-execution">JUnit 5 User Guide on Parallel Execution</a>.</li>
</ul>
<p>These strategies collectively transform testing into a faster, more reliable, and efficient process. Each tactic, used alone or combined, contributes significantly to optimized testing practices, empowering engineers to deliver higher-quality software with enhanced efficiency.</p>Patching the PostgreSQL JDBC Driver2023-11-09T00:00:00+01:002023-11-09T00:00:00+01:00Declan Murphytag:engineering.zalando.com,2023-11-09:/posts/2023/11/patching-pgjdbc.html<p>Contributing to the PostgreSQL JDBC Driver to address the issue of runaway WAL growth in Logical Replication</p><h1>Introduction</h1>
<p>This blog post describes a recent contribution from Zalando to the Postgres JDBC driver to address <a href="https://github.com/pgjdbc/pgjdbc/issues/1490">a long-standing issue</a> with the driver’s integration with Postgres’ logical replication that resulted in runaway Write-Ahead Log (WAL) growth. We will describe the issue, how it affected us at Zalando, and detail the upstream fix in the JDBC driver, which resolves the issue for Debezium and all other clients of the Postgres JDBC driver.</p>
<h2>Postgres Logical Replication at Zalando</h2>
<p>Builders at Zalando have access to a low-code solution that allows them to declare event streams that source from Postgres databases. Each event stream declaration provisions a micro application, powered by <a href="https://debezium.io/">Debezium Engine</a>, that uses Postgres Logical Replication to publish table-level change events as they occur. Capable of publishing events to a variety of different technologies, with arbitrary event transformations via AWS Lambda, these event streams form a core part of the Zalando infrastructure offering. At the time of writing, there are hundreds of these Postgres-sourced event streams out in the wild at Zalando.</p>
<p>One common problem with Logical Replication is excessive growth of the WAL. At times, WAL growth could reach the point where it consumed all of the available disk space on the database node, resulting in the node being demoted to read-only - an undesirable outcome in a production setting indeed! The issue is most prevalent where a streamed table receives little to no write traffic, yet as soon as a write is made, the excessive WAL growth disappears instantly. In recent years, as the popularity of Postgres-sourced event streams has grown in Zalando, we have seen this issue occur more and more often.</p>
<p>So what is happening at a low level during this event-streaming process? How does Postgres reliably ensure that all data change events are emitted and captured by an interested client? The answers to these questions were crucial to understanding the problem and finding its solution.</p>
<p>To explain the issue and how we solved it, we first must explain a little bit about the internals of Postgres replication. In Postgres, the Write Ahead Log (WAL) is a strictly ordered sequence of events that have occurred in the database. These WAL events are the source of truth for the database, and streaming and replaying WAL events is how both Physical and Logical Replication work. Physical replication is used for database replication. Logical Replication, which is the subject of this blog, allows clients to subscribe to data change WAL events. In both cases, replication clients track their progress through the WAL by checkpointing their location, known as the Log Sequence Number (LSN), directly on the primary database. WAL events stored on the primary database can only be discarded after all replication clients, both physical and logical, confirm that they have been processed. If one client fails to confirm that it has processed a WAL event, then the primary node will retain that WAL event and all subsequent WAL events until confirmation occurs.</p>
<p>Simple, right?</p>
<p>Well, the happy path is quite simple, yes. However as you may imagine, this blog post concerns a path that is anything but happy.</p>
<h2>The Problem</h2>
<p>Before we go on, allow me to paint a simplified picture of our architecture which was experiencing issues with this process:</p>
<p><img alt="A Postgres database with logical replication set up on two of its three tables" src="https://engineering.zalando.com/posts/2023/11/images/logical-replication.png#center"></p>
<figcaption style="text-align:center">A Postgres database with logical replication set up on two of its three tables</figcaption>
<p><br/></p>
<p>We have a database with multiple tables, denoted here by their different colors: blue (1), pink (2), purple (3), etc. Additionally, we are listening to changes made to the blue and pink tables specifically. The changes are being streamed via Logical Replication to a blue client and a pink client respectively. In our case, these clients are our Postgres-sourced event streaming applications which use <a href="https://github.com/debezium/debezium">Debezium</a> and <a href="https://github.com/pgjdbc/pgjdbc">PgJDBC</a> under the hood to bridge the gap between Postgres byte-array messages and Java by providing a user-friendly API to interact with.</p>
<p>The key thing to note here is that changes from all tables go into the same WAL. The WAL exists at the server level and we cannot break it down into a table-level or schema-level concept. All changes for all tables in all schemas in all databases on that server go into the same WAL.</p>
<p>In order to track the individual progress of the blue and pink replication, the database server uses a construct called a replication slot. A replication slot should be created for each client - so in this case we have blue (upper, denoted <code>1</code>) and pink (lower, denoted <code>2</code>) replication slots - and each slot contains information about the progress of its client through the WAL. It does this by storing the LSN of the last flushed WAL, among other pieces of information, but let’s keep it simple.</p>
<p>If we zoom into the WAL, we could illustrate it simplistically as follows:</p>
<p><img alt="Each client has a replication slot, tracking its progress through the WAL." src="https://engineering.zalando.com/posts/2023/11/images/replication-slots-1.png#center"></p>
<figcaption style="text-align:center">Each client has a replication slot, tracking its progress through the WAL.</figcaption>
<p><br/></p>
<p>Here, I have illustrated LSNs as decimal numbers for clarity. In reality, they are expressed as hexadecimal combinations of page numbers and positions.</p>
<p>As write operations occur on any of the tables in the database, those write operations are written to the WAL - the next available log position being <code>#7</code>. If a write occurs on e.g. the blue table, a message will be sent to the blue client with this information and once the client confirms receipt of change <code>#7</code>, the blue replication slot will be advanced to <code>#7</code>. However WAL with LSN <code>#7</code> can’t be recycled and its disk space freed up just yet, since the pink replication slot is still only on <code>#6</code>.</p>
<p><img alt="As changes occur in the blue table, the blue client's replication slot advances, but the pink slot has no reason to move" src="https://engineering.zalando.com/posts/2023/11/images/replication-slots-2.png#center"></p>
<figcaption style="text-align:center">As changes occur in the blue table, the blue client's replication slot advances, but the pink slot has no reason to move</figcaption>
<p><br/></p>
<p>If the blue table were to continue receiving writes, but without a write operation occurring on the pink table, the pink replication slot would never have a chance to advance, and all of the blue WAL events would be left sitting around, taking up space.</p>
<p><img alt="This will continue with WAL growing dangerously large, risking using all of the disk space of the entire server" src="https://engineering.zalando.com/posts/2023/11/images/replication-slots-3.png#center"></p>
<figcaption style="text-align:center">This will continue with WAL growing dangerously large, risking using all of the disk space of the entire server</figcaption>
<p><br/></p>
<p>However once a write occurs in the pink table, this change will be written to the next available WAL position, say <code>#14</code>, the pink client will confirm receipt and the pink replication slot will advance to position <code>#14</code>. Now we have the below state:</p>
<p><img alt="As soon as a write occurs in the pink table, the pink replication slot will advance and the WAL events can be deleted up to position #13, as they are no longer needed by any slot" src="https://engineering.zalando.com/posts/2023/11/images/replication-slots-4.png#center"></p>
<figcaption style="text-align:center">As soon as a write occurs in the pink table, the pink replication slot will advance and the WAL events can be deleted up to position #13, as they are no longer needed by any slot</figcaption>
<p><br/></p>
<p>This was the heart of the issue. The pink client is not interested in these WAL events, however until the pink client confirms a later LSN in its replication slot, Postgres cannot delete these WAL events. This will continue ad infinitum until the disk space is entirely used up by old WAL events that cannot be deleted until a write occurs in the pink table.</p>
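<p>The retention rule at the heart of the issue can be sketched as a toy function. This is illustrative only, not Postgres internals, and it uses the decimal LSNs from the illustrations above (real LSNs are hexadecimal page/offset pairs):</p>

```javascript
// WAL can only be recycled up to the minimum LSN confirmed across ALL
// replication slots, so a single lagging slot pins everything after it.
function oldestRetainedLsn(slots) {
  const minConfirmed = Math.min(...slots.map((s) => s.confirmedLsn));
  return minConfirmed + 1; // everything after the slowest slot is kept
}

// Blue has confirmed #13, pink is still stuck on #6:
const slots = [
  { name: "blue", confirmedLsn: 13 },
  { name: "pink", confirmedLsn: 6 },
];
console.log(oldestRetainedLsn(slots)); // 7: WAL events #7..#13 are retained
```

Only once the pink slot confirms a later LSN (say <code>#14</code>) does the minimum advance and the retained WAL become eligible for deletion.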
<h2>Mitigation Strategies</h2>
<p>Many blog posts have been written about this bug, phenomenon, behavior, call it what you will. Hacky solutions abound. The most popular by far was creating scheduled jobs that write dummy data to the pink table in order to force its slot to advance. This solution had been used in Zalando in the past, but it’s a kludge that doesn’t address the real issue and imposes a permanent extra workload whenever Postgres logical replication is set up.</p>
<p>Even Gunnar Morling, the ex-Debezium Lead, has <a href="https://www.morling.dev/blog/insatiable-postgres-replication-slot/">written</a> about the topic.</p>
<p>Byron Wolfman, in a blog post, alludes to the pure solution before abandoning the prospect in favour of the same kludge. The following quote is an extract from his <a href="https://wolfman.dev/posts/pg-logical-heartbeats/">post</a> on the topic:</p>
<p><img alt="Excerpt from a blog post which details both the pure solution of advancing the cursor as well as the “fake writes” hack" src="https://engineering.zalando.com/posts/2023/11/images/wolfman-blog-post-excerpt.png#center"></p>
<figcaption style="text-align:center">Excerpt from a blog post which details both the pure solution of advancing the cursor as well as the “fake writes” hack</figcaption>
<p><br/></p>
<p>This was indeed the solution in its purest form. In our case with a Java application as the end-consumer, the first port-of-call for messages from Postgres was PgJDBC, the Java Driver for Postgres. If we could solve the issue at this level, then it would be abstracted away from - and solved for - all Java applications, Debezium included.</p>
<h2>Our Solution</h2>
<p>The key was to note that while Postgres only sends Replication messages in case of a write operation, it is sending KeepAlive messages on a regular basis in order to maintain the connection between it and, in this case, PgJDBC. This KeepAlive message contains very little data: some identifiers, a timestamp, a single bit denoting if a reply is required, but most crucially, the KeepAlive message contains the current WAL LSN of the database server. Historically, PgJDBC would not respond to a KeepAlive message and nothing would change on the server-side as a result of a KeepAlive message being sent. This needed to change.</p>
<p><img alt="The original flow of messages between the database server and the PGJDBC driver. Only replication messages received confirmations from the driver" src="https://engineering.zalando.com/posts/2023/11/images/message-flow-original.png#center"></p>
<figcaption style="text-align:center">The original flow of messages between the database server and the PgJDBC driver. Only replication messages received confirmations from the driver.</figcaption>
<p><br/></p>
<p>The fix involved updating the client to keep track of the LSN of the last Replication message received from the server and the LSN of the latest message confirmed by the client. If these two LSNs are the same, and the client then receives a KeepAlive message with a higher LSN, the client can infer that it has flushed all relevant changes and that only irrelevant changes are happening on the database. The client can then safely confirm receipt of this change back to the server, advancing its replication slot position and allowing the Postgres server to delete those irrelevant WAL events. This approach is conservative enough to allow confirmation of LSNs while guaranteeing that no relevant events can be skipped.</p>
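<p>The decision rule can be sketched as follows. This is an illustration of the logic described above, not the actual PgJDBC code, and the function and parameter names are ours:</p>

```javascript
// lastReceivedLsn: LSN of the last Replication (data) message received.
// lastConfirmedLsn: LSN the client last confirmed back to the server.
// keepAliveLsn: current server WAL LSN carried by a KeepAlive message.
function lsnToConfirmOnKeepAlive(lastReceivedLsn, lastConfirmedLsn, keepAliveLsn) {
  // Only safe when every relevant change seen so far is already confirmed;
  // otherwise advancing could skip a pending event.
  if (lastReceivedLsn === lastConfirmedLsn && keepAliveLsn > lastConfirmedLsn) {
    return keepAliveLsn; // advance the slot past irrelevant WAL
  }
  return lastConfirmedLsn; // conservative: do not move
}

// All relevant changes flushed (42 === 42), server is ahead at 57:
console.log(lsnToConfirmOnKeepAlive(42, 42, 57)); // 57
// A relevant change at 42 is not yet confirmed: stay at 40.
console.log(lsnToConfirmOnKeepAlive(42, 40, 57)); // 40
```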
<p><img alt="The updated flow of messages now includes confirmation responses for each KeepAlive message as well, allowing all replicas to constantly confirm receipt of WAL changes" src="https://engineering.zalando.com/posts/2023/11/images/message-flow-updated.png#center"></p>
<figcaption style="text-align:center">The updated flow of messages now includes confirmation responses for each KeepAlive message as well, allowing all replicas to constantly confirm receipt of WAL changes</figcaption>
<p><br/></p>
<p>The fix was implemented, tested, and submitted to PgJDBC in <a href="https://github.com/pgjdbc/pgjdbc/pull/2941">a pull request</a>. Merged on August 31st 2023, the fix is scheduled to be released in version 42.7.0 of PgJDBC.</p>
<h2>Rollout</h2>
<p>Our Debezium-powered streaming applications support backwards compatibility with functionality that has been removed from newer versions of Debezium. To maintain this backwards compatibility, our applications do not use the latest version of Debezium and, by extension, do not use the latest version of PgJDBC, which Debezium pulls in as a transitive dependency. To take advantage of the fix anyway, we modified our build scripts to optionally override the version of the transitive PgJDBC dependency, and used this option to build not one but two Docker images for our applications: one unchanged, and another with a locally built version of PgJDBC, 42.6.1-patched, that contained our fix. We rolled the modified image out to our test environment while keeping the unchanged image in production. This way we could safely verify that our event-streaming applications continued to behave as intended, and monitor the behaviour to confirm that the issue of WAL growth had been addressed.</p>
<p>To verify the issue had indeed disappeared, we monitored a graph of the total WAL Size over the course of a few days on a low-activity database. Before the implementation of the fix, it would be common to see the following graph of total WAL size, indicating the presence of the issue over 36 hours:</p>
<p><img alt="Graph of WAL before the fix" src="https://engineering.zalando.com/posts/2023/11/images/wal-growth-before.png#center"></p>
<figcaption style="text-align:center">Runaway WAL growth before the fix</figcaption>
<p><br/></p>
<p>That same database after the fix now has a WAL Size graph that looks like the below, over the same time range and with no other changes to the persistence layer, service layer or activity:</p>
<p><img alt="Graph of WAL after the fix" src="https://engineering.zalando.com/posts/2023/11/images/wal-growth-after.png#center"></p>
<figcaption style="text-align:center">WAL growth (or lack thereof!) after the fix</figcaption>
<p><br/></p>
<p>As the fix was designed to be conservative enough when confirming LSNs to guarantee that an event would never be skipped or missed, this evidence was sufficient for us to confidently roll out the newer Docker images to our production clusters, solving the issue of runaway WAL growth for hundreds of Postgres-sourced event streams across Zalando. No more hacks required :)</p>
<p>In GraphQL, if you've used the syntax that starts with an <code>@</code>, for example, <code>@foo</code>, then you've used GraphQL directives. Directives provide a way to extend the language features of GraphQL using a supported syntax. Certain directives are built into GraphQL, like <code>@skip</code>, <code>@include</code>, <code>@deprecated</code>, and <code>@specifiedBy</code>, and are supported by all GraphQL engines.</p>
<p>If we look closer, we can see that two of these directives (<code>@skip</code> and <code>@include</code>) are used only in the queries, and the other two (<code>@deprecated</code> and <code>@specifiedBy</code>) are used only in the schema. This is because GraphQL directives are defined for two different categories of locations - <code>TypeSystem</code> and <code>ExecutableDefinition</code>. The <code>TypeSystem</code> directives are defined for the schema, and the <code>ExecutableDefinition</code> directives are defined for the queries. We will discuss this in detail in the next section.</p>
<p>The query directives are generally useful for clients to express certain types of metadata for the query. The schema directives are generally useful for declaratively specifying common server-side behaviors, for example, authorization requirements, marking sensitive data, etc.</p>
<h2>Part 1: Schema directives at Zalando</h2>
<p>The schema directives refer to the directives defined for the <code>TypeSystem</code> locations. The type system directives are available for the locations listed below. Consider <code>@foo</code> a directive for the location mentioned in the 1st column.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@foo</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="err">LOCATION_IN_FIRST_COLUMN</span>
</code></pre></div>
<!--
Because the line containing
union X @foo = A | B
treats `|` as table separator and messes up the table formatting
-->
<!-- prettier-ignore -->
<table>
<thead>
<tr>
<th>Directive Location</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>SCHEMA</td>
<td><code>schema @foo { query: Query }</code></td>
</tr>
<tr>
<td>SCALAR</td>
<td><code>scalar x @foo</code></td>
</tr>
<tr>
<td>OBJECT</td>
<td><code>type Product @foo { }</code></td>
</tr>
<tr>
<td>FIELD_DEFINITION</td>
<td><code>type X { field: String @foo }</code></td>
</tr>
<tr>
<td>ARGUMENT_DEFINITION</td>
<td><code>type X { field(arg: Int @foo): String }</code></td>
</tr>
<tr>
<td>INTERFACE</td>
<td><code>interface X @foo {}</code></td>
</tr>
<tr>
<td>UNION</td>
<td><code>union X @foo = A | B</code></td>
</tr>
<tr>
<td>ENUM</td>
<td><code>enum X @foo { A B }</code></td>
</tr>
<tr>
<td>ENUM_VALUE</td>
<td><code>enum X { A @foo B }</code></td>
</tr>
<tr>
<td>INPUT_OBJECT</td>
<td><code>input X @foo { }</code></td>
</tr>
<tr>
<td>INPUT_FIELD_DEFINITION</td>
<td><code>input X { field: String @foo }</code></td>
</tr>
</tbody>
</table>
<p><a href="https://the-guild.dev/about-us">The guild - https://the-guild.dev</a> has a great <a href="https://the-guild.dev/graphql/tools/docs/schema-directives">article</a> and a mechanism for implementing schema directives via their <a href="https://the-guild.dev/graphql/tools">graphql-tools</a> packages. I highly recommend reading it and using graphql-tools for implementing schema directives.</p>
<p>The gist is that you define a directive in the schema and implement it in the resolver layer as a function that takes the original resolver as an argument and returns a new resolver, which wraps the directive logic around the original one.</p>
<p>You can think of schema directives as some function call injected to your resolver function in a declarative way. Consider the following illustration to understand where the directive function can be invoked in the context of a resolver.</p>
<div class="highlight"><pre><span></span><code><span class="cm">/**</span>
<span class="cm"> * Illustration of schema directives execution in</span>
<span class="cm"> * the query execution pipeline</span>
<span class="cm"> */</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">resolvers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Query</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">async</span><span class="w"> </span><span class="nx">product</span><span class="p">(</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// schema directives</span>
<span class="w"> </span><span class="nx">schemaDirectivesExecutions</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// resolver logic</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">product</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">getProduct</span><span class="p">(</span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// schema directives</span>
<span class="w"> </span><span class="nx">schemaDirectivesExecutions</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">product</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">};</span>
</code></pre></div>
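<p>The same idea as a concrete wrapper (illustrative names throughout; in practice the rewiring is done via graphql-tools rather than by hand, and <code>checkDirective</code> stands in for whatever logic the directive declares):</p>

```javascript
// Wrap a resolver so the directive's logic runs before it.
function applyDirective(resolve, checkDirective) {
  return async function wrappedResolver(parent, args, context, info) {
    checkDirective(context); // e.g. throw if a requirement is not met
    return resolve(parent, args, context, info);
  };
}

// Hypothetical usage: guard a resolver behind an authentication check.
const getProduct = async (parent, args) => ({ id: args.id });
const requireAuth = (context) => {
  if (!context.authenticated) throw new Error("Not authenticated");
};
const guardedGetProduct = applyDirective(getProduct, requireAuth);
```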
<h3><code>@isAuthenticated</code></h3>
<p>At Zalando, we use SSO for customer authentication and <a href="https://auth0.com/blog/what-is-step-up-authentication-when-to-use-it/">step-up authentication</a>. Our GraphQL server handles publicly available data like the product data, and also handles confidential data like customer-related data.</p>
<p>The queries can contain customer fields along with product fields and other non-customer data. Here, we need to ensure that the customer is authenticated and has the correct authenticity levels (<a href="https://developer.okta.com/docs/guides/step-up-authentication/main/">ACR Value</a>) whenever a field or mutation containing customer information is used in the query. So, we need a way to control this granularly for different data points in the schema. The directive <code>@isAuthenticated</code> is used for this purpose.</p>
<p>The directive is defined in the schema as follows -</p>
<div class="highlight"><pre><span></span><code><span class="k">scalar</span><span class="w"> </span><span class="err">ACRValue</span><span class="w"> </span><span class="err">@specifiedBy(url:</span><span class="w"> </span><span class="err">"https://example.com/zalando-acr-value")</span>
<span class="k">directive</span><span class="w"> </span><span class="err">@isAuthenticated(</span>
<span class="w"> </span><span class="err">"""</span>
<span class="w"> </span><span class="err">The</span><span class="w"> </span><span class="err">ACR</span><span class="w"> </span><span class="err">value</span><span class="p">,</span><span class="w"> </span><span class="err">which</span><span class="w"> </span><span class="err">indicates</span><span class="w"> </span><span class="err">the</span><span class="w"> </span><span class="err">level</span><span class="w"> </span><span class="err">of</span><span class="w"> </span><span class="err">authenticity</span>
<span class="w"> </span><span class="err">expected</span><span class="w"> </span><span class="err">to</span><span class="w"> </span><span class="err">perform</span><span class="w"> </span><span class="err">the</span><span class="w"> </span><span class="err">operation.</span>
<span class="w"> </span><span class="err">Optional.</span><span class="w"> </span><span class="err">If</span><span class="w"> </span><span class="err">not</span><span class="w"> </span><span class="err">provided</span><span class="p">,</span><span class="w"> </span><span class="err">the</span><span class="w"> </span><span class="err">default</span><span class="w"> </span><span class="err">behavior</span><span class="w"> </span><span class="err">is</span><span class="w"> </span><span class="err">to</span><span class="w"> </span><span class="err">simply</span>
<span class="w"> </span><span class="err">validate</span><span class="w"> </span><span class="err">a</span><span class="w"> </span><span class="err">user</span><span class="w"> </span><span class="err">is</span><span class="w"> </span><span class="err">authenticated</span><span class="w"> </span><span class="err">and</span><span class="w"> </span><span class="err">has</span><span class="w"> </span><span class="err">no</span><span class="w"> </span><span class="err">ACR</span><span class="w"> </span><span class="err">requirements.</span>
<span class="w"> </span><span class="err">"""</span>
<span class="w"> </span><span class="err">acrValue:</span><span class="w"> </span><span class="err">ACRValue</span>
<span class="err">)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
</code></pre></div>
<p>For example, it is used in query and mutation definitions as follows -</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">customer</span><span class="p">:</span><span class="w"> </span><span class="n">Customer</span><span class="w"> </span><span class="nd">@isAuthenticated</span>
<span class="err">}</span>
<span class="err">type</span><span class="w"> </span><span class="err">Mutation</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="err">updateCustomerInfo</span><span class="p">(</span>
<span class="w"> </span><span class="n">email</span><span class="p">:</span><span class="w"> </span><span class="no">String</span>
<span class="w"> </span><span class="n">phoneNumber</span><span class="p">:</span><span class="w"> </span><span class="no">String</span>
<span class="w"> </span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">UpdateCustomerInfoResult</span><span class="w"> </span><span class="nd">@isAuthenticated</span><span class="p">(</span><span class="n">acrValue</span><span class="p">:</span><span class="w"> </span><span class="no">HIGH</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
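<p>A directive like this is typically enforced by wrapping the field resolver. Below is a minimal sketch of that idea, assuming a hypothetical <code>ctx.auth</code> object and a simple ACR ordering - not Zalando's actual implementation:</p>

```javascript
// Sketch: wrap a field resolver so it checks authentication and an
// optional minimum ACR level before the original resolver runs.
// `ctx.auth`, the error messages, and the ACR ordering are assumptions.
const ACR_ORDER = { LOW: 1, MEDIUM: 2, HIGH: 3 };

function withIsAuthenticated(resolve, requiredAcr) {
  return function (parent, args, ctx, info) {
    if (!ctx.auth || !ctx.auth.isAuthenticated) {
      throw new Error("UNAUTHENTICATED");
    }
    if (requiredAcr && ACR_ORDER[ctx.auth.acrValue] < ACR_ORDER[requiredAcr]) {
      throw new Error("INSUFFICIENT_ACR");
    }
    return resolve(parent, args, ctx, info);
  };
}

// Example: guard a resolver that requires a HIGH authenticity level,
// mirroring `@isAuthenticated(acrValue: HIGH)` in the schema above.
const updateCustomerInfo = withIsAuthenticated(
  (_parent, args) => ({ email: args.email }),
  "HIGH"
);
```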
<h3><code>@sensitive</code></h3>
<p>We expose sensitive customer data - such as the email address, name, phone number, and address - via our GraphQL API to render the customer profile page. We also use observability and monitoring tools like logging and tracing, and we do not want such sensitive customer data in the logs and traces. So, we need a way to control logging so that the logs contain enough information to debug issues but no sensitive customer data. The directive <code>@sensitive</code> is used for this purpose.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@sensitive(</span>
<span class="w"> </span><span class="err">"An</span><span class="w"> </span><span class="err">optional</span><span class="w"> </span><span class="err">reason</span><span class="w"> </span><span class="err">why</span><span class="w"> </span><span class="err">the</span><span class="w"> </span><span class="err">field</span><span class="w"> </span><span class="err">is</span><span class="w"> </span><span class="err">marked</span><span class="w"> </span><span class="err">as</span><span class="w"> </span><span class="err">sensitive"</span>
<span class="w"> </span><span class="err">reason:</span><span class="w"> </span><span class="err">String</span>
<span class="err">)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">ARGUMENT_DEFINITION</span>
</code></pre></div>
<p>For example, it is used in a mutation definition as follows -</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Mutation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">updateCustomerInfo</span><span class="p">(</span>
<span class="w"> </span><span class="n">email</span><span class="p">:</span><span class="w"> </span><span class="no">String</span><span class="w"> </span><span class="err">@</span><span class="n">sensitive</span><span class="err">(</span><span class="n">reason</span><span class="p">:</span><span class="w"> </span><span class="s">"Customer email address"</span><span class="p">)</span>
<span class="w"> </span><span class="nl">phoneNumber</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="nd">@sensitive</span><span class="p">(</span><span class="n">reason</span><span class="p">:</span><span class="w"> </span><span class="s">"Customer phone number"</span><span class="p">)</span>
<span class="w"> </span><span class="err">):</span><span class="w"> </span><span class="n">UpdateCustomerInfoResult</span>
<span class="p">}</span>
</code></pre></div>
<p>Adding <code>@sensitive</code> to the correct arguments in the schema proactively is a manual step that is easy to forget. So, we also rely on a schema linter that automatically fails when a field/argument name contains sensitive keywords like <code>password</code>, <code>email</code>, <code>phone</code>, <code>bank</code>, <code>bic</code>, <code>account</code>, <code>owner</code>, <code>order</code>, <code>token</code>, <code>voucher</code>, <code>customer</code>, etc. This way, we ensure we do not forget to add <code>@sensitive</code> to the correct fields/arguments.</p>
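<p>Such a lint rule can be sketched in a few lines. The input shape below (argument names plus attached directive names) is a simplified stand-in for a real schema AST:</p>

```javascript
// Sketch of a schema lint rule: flag argument names that look sensitive
// but lack @sensitive. The keyword list mirrors the post; the input
// shape is a simplified stand-in for a parsed schema.
const SENSITIVE_KEYWORDS = [
  "password", "email", "phone", "bank", "bic",
  "account", "owner", "order", "token", "voucher", "customer",
];

function lintSensitiveArgs(args) {
  const violations = [];
  for (const arg of args) {
    const name = arg.name.toLowerCase();
    const looksSensitive = SENSITIVE_KEYWORDS.some((k) => name.includes(k));
    const isMarked = (arg.directives || []).includes("sensitive");
    if (looksSensitive && !isMarked) violations.push(arg.name);
  }
  return violations;
}
```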
<p>Implementing this directive is also quite simple and does not require any resolver logic. It can be implemented in Node.js as follows (the implementation is shortened to fit into a post) -</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span><span class="w"> </span><span class="nx">getSensitiveVariables</span><span class="p">(</span><span class="nx">schema</span><span class="p">,</span><span class="w"> </span><span class="nb">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">sensitiveVariables</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s2">"graphql"</span><span class="p">).</span><span class="nx">validate</span><span class="p">(</span><span class="nx">schema</span><span class="p">,</span><span class="w"> </span><span class="nb">document</span><span class="p">,</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">(</span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">({</span>
<span class="w"> </span><span class="nx">Variable</span><span class="p">(</span><span class="nx">node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">isSensitive</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span>
<span class="w"> </span><span class="p">.</span><span class="nx">getArgument</span><span class="p">()</span>
<span class="w"> </span><span class="o">?</span><span class="p">.</span><span class="nx">astNode</span><span class="o">?</span><span class="p">.</span><span class="nx">directives</span><span class="o">?</span><span class="p">.</span><span class="nx">some</span><span class="p">(</span>
<span class="w"> </span><span class="p">(</span><span class="nx">directive</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">directive</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s2">"sensitive"</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">isSensitive</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">sensitiveVariables</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="p">]);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">sensitiveVariables</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
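<p>The variable names collected above can then be used to scrub the request before it reaches logs or traces. A minimal sketch, with an assumed redaction placeholder:</p>

```javascript
// Sketch: given the names returned by getSensitiveVariables, redact
// those entries from the query variables before logging them.
// The "[REDACTED]" placeholder is an assumption for illustration.
function redactVariables(variables, sensitiveNames) {
  const redacted = {};
  for (const [key, value] of Object.entries(variables)) {
    redacted[key] = sensitiveNames.includes(key) ? "[REDACTED]" : value;
  }
  return redacted;
}
```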
<h3><code>@requireExplicitEndpoint</code></h3>
<p>With GraphQL, all varieties of HTTP requests fit into one single pattern - <code>POST /graphql</code>. As a result, techniques and tools available for REST APIs - like rate limiting, bot protection, caching, and other security practices - fail to work out of the box. So, we need a way to expose different schema sections via different HTTP endpoints. The directive <code>@requireExplicitEndpoint</code> is used for this purpose.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@requireExplicitEndpoint(endpoints:</span><span class="w"> </span><span class="err">[String!]!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
</code></pre></div>
<p>In implementing this directive, we override the resolver for the respective field where it is used. We can access the request parameters (like pathname) by running GraphQL over HTTP. We then match the pathname with the list of endpoints provided in the directive and return an error if there is no match.</p>
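<p>The resolver override described above can be sketched as follows; the <code>ctx.request.pathname</code> shape and the error message are assumptions:</p>

```javascript
// Sketch: override the resolver for a field carrying
// @requireExplicitEndpoint and reject the request when its pathname
// is not in the directive's endpoint list.
function withRequireExplicitEndpoint(resolve, endpoints) {
  return function (parent, args, ctx, info) {
    if (!endpoints.includes(ctx.request.pathname)) {
      throw new Error("FIELD_NOT_AVAILABLE_ON_THIS_ENDPOINT");
    }
    return resolve(parent, args, ctx, info);
  };
}

// Hypothetical wiring for the mutation shown below.
const updateDeliveryAddress = withRequireExplicitEndpoint(
  () => ({ id: "1234" }),
  ["/customer-addresses"]
);
```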
<p>This directive allows us to define custom routes for different schema sections and prevents the client from accessing the entire schema via a single HTTP endpoint, <code>POST /graphql</code>. For example, let's see how we can define this directive for the <code>updateDeliveryAddress</code> mutation.</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Mutation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">updateDeliveryAddress</span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="no">ID</span><span class="err">!</span>
<span class="w"> </span><span class="n">newAddress</span><span class="p">:</span><span class="w"> </span><span class="no">CustomerAddress</span><span class="err">!</span>
<span class="w"> </span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">UpdateDeliveryAddressResult</span>
<span class="w"> </span><span class="nd">@requireExplicitEndpoint</span><span class="p">(</span><span class="n">endpoints</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s">"/customer-addresses"</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div>
<p>So, a mutation query like the following will fail with an error when executed via the <code>/graphql</code> endpoint -</p>
<div class="highlight"><pre><span></span><code><span class="c"># POST /graphql</span>
<span class="k">mutation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">updateDeliveryAddress</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"1234"</span><span class="p">,</span><span class="w"> </span><span class="n">newAddress</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="s">"Boopathi"</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h3><code>@draft</code>, <code>@allowedFor</code></h3>
<p>We use persisted queries and define different schema stability levels for different sections of the schema. We have a separate blog post explaining the details of <a href="https://engineering.zalando.com/posts/2022/02/graphql-persisted-queries-and-schema-stability.html">how Zalando uses persisted queries</a> and how we think about schema stability and granular control.</p>
<p>The <code>@draft</code> and <code>@allowedFor</code> directives are used for this purpose. They prevent clients from persisting a query that is not yet stable.</p>
<div class="highlight"><pre><span></span><code><span class="c"># Draft</span>
<span class="k">directive</span><span class="w"> </span><span class="err">@draft</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
<span class="c"># Restricted usage: Only for the specified components</span>
<span class="k">directive</span><span class="w"> </span><span class="err">@component(name:</span><span class="w"> </span><span class="err">String!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">QUERY</span>
<span class="k">directive</span><span class="w"> </span><span class="err">@allowedFor(componentNames:</span><span class="w"> </span><span class="err">[String!]!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
</code></pre></div>
<h3><code>@final</code></h3>
<p>Enums in GraphQL are tricky to evolve. Adding a new value to an enum is not considered a breaking change, but it is still a "dangerous" change: the client might not have a handler for the new value. It is easy to update the client code for web applications, but for native mobile apps already shipped to the app store, updating the client code is impossible. Though we practice defensive coding to handle unknown values, we still need a way to control the evolution of enums in a safe manner. The directive <code>@final</code> is used for this purpose.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@final</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">ENUM</span>
</code></pre></div>
<p>This directive needs no implementation at all - it has no runtime behavior. It is only used in our GraphQL linter, which executes at build time and prevents additions of new values to enums marked as final. When we want to make a dangerous change, we remove the <code>@final</code> directive in a first pull request and reason about whether old apps would break by making this "dangerous" change. After extending the enum, we add the directive back in a separate pull request. This process is cumbersome, but that is on purpose: making dangerous changes must be more complicated, and it is a trade-off we are willing to make.</p>
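<p>The build-time check can be sketched as a diff between the old and new schema. The enum representation below is a simplified stand-in for a real schema diff:</p>

```javascript
// Sketch of the build-time lint: given old and new enum definitions,
// fail when an enum marked @final gained values. The input shape
// (name -> { final, values }) is an illustrative simplification.
function checkFinalEnums(oldEnums, newEnums) {
  const errors = [];
  for (const [name, oldDef] of Object.entries(oldEnums)) {
    const newDef = newEnums[name];
    if (!oldDef.final || !newDef) continue;
    const added = newDef.values.filter((v) => !oldDef.values.includes(v));
    if (added.length > 0) {
      errors.push(`enum ${name} is @final but gained: ${added.join(", ")}`);
    }
  }
  return errors;
}
```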
<p>The ideal situation would be for all enums to be treated as final by default, so that this directive is never required in the first place. Until then, your use case might warrant such a directive to keep schema evolution smooth.</p>
<h3><code>@extensibleEnum</code></h3>
<p>While we are on enums, here is another directive use case for them: many enums serve primarily one-off use cases, and extending them is the common case. Creating a native enum in these cases is tricky, and extending it has dangerous consequences. At Zalando, we have RESTful API guidelines, and one of the recommendations is to use <a href="https://opensource.zalando.com/restful-api-guidelines/#112">x-extensible-enum</a> to represent all enums, so that the enums can evolve and the client is aware, right from the name, that they are extensible. We use the directive <code>@extensibleEnum</code> for the same purpose in GraphQL. The type of the field would be <code>String</code>, and the directive provides the list of allowed values.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@extensibleEnum(values:</span><span class="w"> </span><span class="err">[String!]!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
</code></pre></div>
<p>For example, it is used in a type definition as follows -</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">CustomerConsent</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">status</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="err">!</span><span class="w"> </span><span class="nd">@extensibleEnum</span><span class="p">(</span><span class="n">values</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s">"GRANTED"</span><span class="p">,</span><span class="w"> </span><span class="s">"REJECTED"</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div>
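<p>One way a client can defensively consume such a field is to collapse any value outside the published list into a fallback, so new server values do not break rendering. This is purely illustrative and not behavior prescribed by the directive:</p>

```javascript
// Sketch: defensive client-side parsing of an extensible enum value.
// The "UNKNOWN" fallback is a hypothetical convention.
function parseExtensibleEnum(value, allowedValues) {
  return allowedValues.includes(value) ? value : "UNKNOWN";
}
```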
<p>With <code>@extensibleEnum</code>, we found that contributors to the schema are more likely to think about the evolution of schema. We also noticed that contributors are more likely to use this directive for defining enums than the GraphQL native enum, as this directive is more explicit about the extensibility of the enum.</p>
<h3><code>@resolveEntityId</code></h3>
<p>Our GraphQL schema defines certain types as Entities, in the sense of the <a href="https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model">Entity-Relationship model</a>. We define entities abstractly as the basic building blocks for designing the customer experience. For example, product, customer, and brand are entities. An entity definition has some properties -</p>
<ul>
<li>it follows a specific template/pattern of resolvers that is mostly the same for all entities</li>
<li>it is of a specific type name as defined in the schema</li>
<li>it has a unique ID of a specific pattern (for example, <code>entity:product:1234</code> for <code>type Product</code>)</li>
<li>it has a set of fields that are common to all entities</li>
</ul>
<p>To solve these cases holistically, we use the directive <code>@resolveEntityId</code> defined against each entity definition in the schema.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@resolveEntityId(</span>
<span class="w"> </span><span class="err">"An</span><span class="w"> </span><span class="err">optional</span><span class="w"> </span><span class="err">override</span><span class="w"> </span><span class="err">name</span><span class="w"> </span><span class="err">for</span><span class="w"> </span><span class="err">the</span><span class="w"> </span><span class="err">entity</span><span class="w"> </span><span class="err">name</span><span class="w"> </span><span class="err">in</span><span class="w"> </span><span class="err">its</span><span class="w"> </span><span class="err">ID"</span>
<span class="w"> </span><span class="err">override:</span><span class="w"> </span><span class="err">String</span>
<span class="err">)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">OBJECT</span>
</code></pre></div>
<p>The usage is as follows -</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Product</span><span class="w"> </span><span class="k">implements</span><span class="w"> </span><span class="err">Entity</span><span class="w"> </span><span class="err">@resolveEntityId</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="p">}</span>
</code></pre></div>
<p>The implementation of this directive is two-fold. First, we generate TypeScript code based on the <code>resolveEntityId</code> directive; this code generation gives us the boilerplate for the entity ID type definitions and resolvers - for example, the <code>__typename</code> resolvers. The second part is the runtime, where an <code>id</code> resolver is added to wrap the entity IDs - for the product example, <code>entity:product:1234</code> is the full entity ID, and <code>1234</code> is called the SKU of the product.</p>
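<p>The runtime half can be sketched as an <code>id</code> resolver factory. The <code>entity:&lt;name&gt;:&lt;raw&gt;</code> pattern comes from the example above; the wiring itself is a guess at one possible implementation:</p>

```javascript
// Sketch: build an `id` resolver for an entity type that wraps the raw
// identifier (e.g. a product SKU) into the full entity ID. The
// `override` argument mirrors the directive's optional argument.
function makeEntityIdResolver(typeName, override) {
  const entityName = (override || typeName).toLowerCase();
  return function (parent) {
    return `entity:${entityName}:${parent.id}`;
  };
}

// Hypothetical use for `type Product @resolveEntityId`.
const resolveProductId = makeEntityIdResolver("Product");
```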
<h2>Part 2: Query directives at Zalando</h2>
<p>Query directives are directives defined for the <code>ExecutableDefinition</code> locations. Executable directives are available for the locations listed below; consider <code>@foo</code> a directive for the location mentioned in the first column.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@foo</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="err">LOCATION_IN_FIRST_COLUMN</span>
</code></pre></div>
<table>
<thead>
<tr>
<th style="text-align: left;">Directive Location</th>
<th style="text-align: left;">Example</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">QUERY</td>
<td style="text-align: left;"><code>query name @foo {}</code></td>
</tr>
<tr>
<td style="text-align: left;">MUTATION</td>
<td style="text-align: left;"><code>mutation name @foo {}</code></td>
</tr>
<tr>
<td style="text-align: left;">SUBSCRIPTION</td>
<td style="text-align: left;"><code>subscription name @foo {}</code></td>
</tr>
<tr>
<td style="text-align: left;">FIELD</td>
<td style="text-align: left;"><code>query { product @foo {} }</code></td>
</tr>
<tr>
<td style="text-align: left;">FRAGMENT_DEFINITION</td>
<td style="text-align: left;"><code>fragment x on Query @foo { }</code></td>
</tr>
<tr>
<td style="text-align: left;">FRAGMENT_SPREAD</td>
<td style="text-align: left;"><code>query { ...x @foo }</code></td>
</tr>
<tr>
<td style="text-align: left;">INLINE_FRAGMENT</td>
<td style="text-align: left;"><code>query { ... @foo { } }</code></td>
</tr>
<tr>
<td style="text-align: left;">VARIABLE_DEFINITION</td>
<td style="text-align: left;"><code>query ($id: ID @foo) { }</code></td>
</tr>
</tbody>
</table>
<p>Unlike with schema directives, <a href="https://the-guild.dev/graphql/tools">graphql-tools</a> does not support attaching resolver functions to query directives in the same way. They also make an excellent point: query directives are good for annotating the query with metadata, not for resolver logic. Accordingly, most of our use cases attach metadata at the query level, with one case adding behavior for observability and monitoring.</p>
<p>For query metadata, the implementation is as simple as going through the parsed GraphQL document (<a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST - Abstract Syntax Tree</a>) and extracting the metadata from the query directives. We use a two-step approach for the use case that adds behavior to a field - specifically the <code>@omitErrorTag</code> directive (discussed below). In the first step before execution, we extract the field paths of the fields that have this directive. In the second step, after execution, we match the error paths and omit the error tag for those extracted paths.</p>
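<p>The two-step approach for <code>@omitErrorTag</code> can be sketched with plain objects standing in for the parsed AST and the execution errors:</p>

```javascript
// Step 1 (before execution): walk the selection set and collect the
// paths of fields carrying @omitErrorTag. The node shape
// ({ name, directives, selections }) is a simplified stand-in for a
// real GraphQL AST.
function collectOmitErrorTagPaths(selections, prefix = []) {
  const paths = [];
  for (const sel of selections) {
    const path = [...prefix, sel.name];
    if ((sel.directives || []).includes("omitErrorTag")) {
      paths.push(path.join("."));
    }
    if (sel.selections) {
      paths.push(...collectOmitErrorTagPaths(sel.selections, path));
    }
  }
  return paths;
}

// Step 2 (after execution): tag the span as an error only if some
// error path is NOT covered by an omitted path.
function shouldTagSpanAsError(errors, omittedPaths) {
  return errors.some((err) => !omittedPaths.includes(err.path.join(".")));
}
```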
<h3><code>@component</code></h3>
<p>The <code>@component</code> directive defines a component name by the client for the query. This directive is used in our observability and monitoring tools and for schema stability - restricted usage in production. See our blog post <a href="https://engineering.zalando.com/posts/2022/02/graphql-persisted-queries-and-schema-stability.html">GraphQL persisted queries and schema stability</a> for more details.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@component(name:</span><span class="w"> </span><span class="err">String!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">QUERY</span>
</code></pre></div>
<h3><code>@tracingTag</code></h3>
<p>The <code>@tracingTag</code> directive defines an <a href="https://opentelemetry.io/">OpenTelemetry</a> tracing tag for the query. Using this directive on a query adds a specific client-defined tag to our tracing spans. The clients can then follow the traces and filter by this tag to find the traces for a particular query. This directive is useful for debugging, troubleshooting, monitoring a specific set of queries, etc.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@tracingTag(value:</span><span class="w"> </span><span class="err">String!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">QUERY</span><span class="w"> </span><span class="err">|</span><span class="w"> </span><span class="k">MUTATION</span><span class="w"> </span><span class="err">|</span><span class="w"> </span><span class="k">SUBSCRIPTION</span>
</code></pre></div>
<h3><code>@omitErrorTag</code></h3>
<p>The <code>@omitErrorTag</code> directive, applied to a particular field in the query, omits marking the tracing span as an error when that field fails. It lets the client declare that some field errors are noncritical and should not be reported for alerting, so the 24x7 on-call team can focus on the critical errors and not be distracted by the noise.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@omitErrorTag</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD</span>
</code></pre></div>
<h3><code>@maxCountInBatch</code></h3>
<p>The <code>@maxCountInBatch</code> directive is used at the query level to declare the maximum number of queries that can be batched together in a single request. This directive is client-controlled, i.e. it is only available during <a href="https://engineering.zalando.com/posts/2022/02/graphql-persisted-queries-and-schema-stability.html">build/persist time</a>. At runtime, the directive is used to prevent overfetching of data and bot abuse of the GraphQL API.</p>
<p>Our GraphQL server allows batching multiple queries in a single request. With persisted queries, we only send the id of the query, and the client cannot send a raw query in production. So, the system design allows safe usage of <code>maxCountInBatch</code> controlled by the clients.</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@maxCountInBatch(value:</span><span class="w"> </span><span class="err">Int!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">QUERY</span>
</code></pre></div>
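<p>Enforcement at request time might look like the sketch below; the batch shape and the server's default limit are assumptions, not Zalando's actual values:</p>

```javascript
// Sketch: a batch is accepted only if its size does not exceed the
// smallest limit declared by any persisted query in it (or a
// hypothetical server default when no query declares one).
const DEFAULT_MAX_BATCH = 10; // assumed server default

function validateBatch(queries) {
  const limit = Math.min(
    DEFAULT_MAX_BATCH,
    ...queries.map((q) => q.maxCountInBatch ?? Infinity)
  );
  return queries.length <= limit;
}
```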
<h3>Example usage of all of the above query directives</h3>
<div class="highlight"><pre><span></span><code><span class="k">query</span><span class="w"> </span><span class="nf">product_card</span><span class="p">(</span><span class="nv">$id</span><span class="p">:</span><span class="w"> </span><span class="nb">ID</span><span class="p">!)</span>
<span class="c">#</span>
<span class="c"># component directive</span>
<span class="err">@</span><span class="nf">component</span><span class="p">(</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">"</span><span class="nc">web</span><span class="err">-product-card"</span><span class="p">)</span>
<span class="c">#</span>
<span class="c"># tracing tag directive to add a tag to the tracing span</span>
<span class="err">@</span><span class="nf">tracingTag</span><span class="p">(</span><span class="err">value</span><span class="p">:</span><span class="w"> </span><span class="err">"</span><span class="nc">slo</span><span class="err">-1s"</span><span class="p">)</span>
<span class="c">#</span>
<span class="c"># maxCountInBatch directive to limit the number of queries in a batch request</span>
<span class="err">@</span><span class="nf">maxCountInBatch</span><span class="p">(</span><span class="err">value</span><span class="p">:</span><span class="w"> </span><span class="err">50)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nc">product</span><span class="err">(id</span><span class="p">:</span><span class="w"> </span><span class="err">$</span><span class="nc">id</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="n">name</span>
<span class="w"> </span><span class="n">brand</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="n">name</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c"># omitErrorTag directive to omit marking the tracing</span>
<span class="w"> </span><span class="c"># span as an error if inWishlist field errors</span>
<span class="w"> </span><span class="n">inWishlist</span><span class="w"> </span><span class="nd">@omitErrorTag</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<h2>Conclusion</h2>
<p>Query directives allow clients to define metadata and, on rare occasions, behavior. Schema directives, on the other hand, allow the server to define behavior, validation, and resolution logic in a declarative manner. Schema directives carry the added advantage that servers can make breaking changes to them, because these directives are never consumed by the client; the client only experiences the resulting behavior. When designing a directive, it's important to consider its properties, use cases, trade-offs, and where the control should lie.</p>
<p>The use cases outlined in this blog post represent some of the ways we use GraphQL directives at Zalando. There are numerous other cases that we'll cover in future blog posts. I hope this piece provides a good starting point for you to explore GraphQL directives and their practical applications.</p>
<hr>
<p><em>If you would like to work on similar challenges, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=95c8de231us&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B9%5D=Applied%20Science%20%26%20Research&filters%5Bcategories%5D%5B10%5D=Product%20Design%20%26%20User%20Experience&filters%5Bcategories%5D%5B11%5D=Product%20Management&search=software%20engineer">joining our engineering teams</a>.</em></p>
<hr>
<h2>Further reading</h2>
<ul>
<li><a href="https://the-guild.dev/graphql/tools/docs/schema-directives">Schema Directives - GraphQL Tools</a></li>
<li><a href="https://engineering.zalando.com/posts/2022/02/graphql-persisted-queries-and-schema-stability.html">GraphQL persisted queries and Schema stability</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/04/modeling-errors-in-graphql.html">Modeling Errors in GraphQL</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/03/optimize-graphql-server-with-lookaheads.html">Optimize GraphQL Server with Lookaheads</a></li>
</ul>My First Year as an Engineering Manager at Zalando2023-09-26T00:00:00+02:002023-09-26T00:00:00+02:00Kaan Bobactag:engineering.zalando.com,2023-09-26:/posts/2023/09/my-first-year-as-an-engineering-manager-at-zalando.html<p>Reflecting on my first year as an Engineering Manager at Zalando.</p><h3>Starting a New Journey</h3>
<p>Moving forward in your career is always an exciting adventure, even if it comes with challenges. For me, the biggest challenge was becoming an engineering manager in a foreign country. Stepping into a new country as an expat, with a culture I wasn't all that familiar with, was a completely fresh start.
When I said yes to my new journey, I started researching Zalando to learn more.</p>
<p>My first stop was the Zalando Engineering Blog - a real treasure for someone like me who was curious about the engineering culture and practices at what would be my new company. Reading post after post, I was amazed by everything - the interesting engineering topics, challenges, solutions, and approaches.
Since I love reading and writing blog posts, I even dreamt of contributing here someday. Now, looking at today and thinking about my first year, I see that I've gained lots of experiences and learnings that I can put into words. While one post won't cover all the details, I believe I can create a short but nice summary of my journey so far. So, let's begin.</p>
<h3>First Impressions</h3>
<p>On my first day, as I stepped into the office, one thing truly resonated with me. A phrase was inscribed on the floor: <em>"Always put yourself in the customer’s shoes"</em>. This is one of Zalando's founding mindsets, as I would learn over the next few days. It also marked the first of many reminders that would constantly keep me aware of how important customers are to Zalando.</p>
<p>As I walked around and met various people, I noticed the impressively international working environment, rich in multicultural diversity.
From day one, and with each passing day since, I've come to believe that this is Zalando's greatest wealth. On a personal note, having colleagues from all corners of the world, sharing lunches and coffee breaks with them, and learning from their diverse experiences are great benefits that simply cannot be found in contracts.</p>
<h3>Onboarding</h3>
<p>As I settled in, my onboarding journey kicked off right away. Zalando provides an excellent <a href="https://engineering.zalando.com/posts/2021/04/making-the-remote-onboarding-a-success.html">onboarding program</a> for newbies. It covers not only technical topics but also goes into Zalando's culture, with a lot of inspiring meetings. This also creates an opportunity to connect with colleagues from different departments that you may not have had a chance to interact with otherwise.</p>
<p>Besides Zalando's onboarding, it was important for me to really understand how my department and team contribute to the company. So, I focused on what we do and how our work helps Zalando's success. My department is Pricing Platform, and our main scope is pricing and discounting tools and algorithms.
The more I learnt, the more I was amazed by how much data science, engineering, and analytics are involved in something as simple as a 20% discount on the website.
For me, the real test is whether I can successfully explain the project details to my dad, who doesn't know much about tech beyond using a smartphone. If he gets it, then I'm pretty sure I truly understand what we do in our department. When I told my dad about my department's job, I started with, <em>"Dad, you will not believe how that simple discount you see on the webpage is calculated"</em>.</p>
<h3>Cyber Week</h3>
<p>My first big challenge was Cyber Week. Since I joined Zalando just a month before Cyber Week, everyone was talking about it. Coming from a country that doesn't have Cyber Week, I initially thought (I'll admit it shamelessly) that Zalando was having a week of cyber security tests, which actually sounded pretty cool. But then, when I understood what Cyber Week was really about, I realised how important it was for Zalando.</p>
<p>The <a href="https://engineering.zalando.com/posts/2020/10/how-zalando-prepares-for-cyber-week.html">readiness for Cyber Week</a> and all the preparations that go into it completely impressed me. The structured game plan, <a href="https://engineering.zalando.com/posts/2023/01/how-we-manage-our-1200-incident-playbooks.html">playbooks</a>, situation rooms, incident processes – they were all new concepts to me, and I was amazed by what operational excellence can look like.
There’s no way I can cover all the details of Cyber Week in this post, but there's one thing I have to mention. During the final minutes of Black Friday, there's a tradition of virtually gathering with the shift crew and watching the order monitoring spike up like a hockey stick, marking the peak order rate of Black Friday. That moment made a strong impact on me, showing how our small contributions as software engineers play a role in those big successes.</p>
<h3>Growing Together</h3>
<p>While I've mostly focused on the technical and operational aspects of Zalando, I can't skip the people part, of course. Zalando has an amazing culture when it comes to managing and developing people. They provide different ways to grow, with clear expectations. One thing that really surprised me was that Zalando offers both management and technical expert paths for software engineers. For example, after becoming a Senior Software Engineer, you can choose to either become an <a href="https://engineering.zalando.com/posts/2023/01/how-you-can-have-impact-as-an-engineering-manager.html">Engineering Manager</a> or a <a href="https://engineering.zalando.com/posts/2022/02/principal-engineering-at-zalando.html">Principal Engineer</a>. This is quite unique; I hadn't encountered it in my previous roles. It’s not about getting pushed into management; instead, you have the opportunity to advance based on your skills and aspirations, at the same level as management roles.</p>
<h3>Feedback Culture</h3>
<p>Talking about <a href="https://engineering.zalando.com/posts/2022/07/growth-engineering-at-zalando.html">career growth</a>, I shouldn't forget to mention performance evaluation. This is a vital aspect of any organization's success. Zalando recognizes this importance and has implemented effective practices to ensure that performance management is done right. Performance evaluation at Zalando starts with collecting feedback, the most important part of the process, in my opinion. The company provides an ideal environment for sharing and receiving feedback. You can receive feedback from your peers, team members, and stakeholders, essentially from the people you interact with daily. This culture of openness to feedback has been invaluable in helping me understand where we can improve as a team and how I can grow as a leader beyond my current capabilities.</p>
<p>Moreover, in my role as a leader, I know the importance of giving constructive feedback and facilitating performance evaluations for my team members. Zalando has several effective practices in place to support leaders in this regard. We receive support from experienced leaders, seek guidance from our peers in different departments, and collaborate with P&O (People and Operations) business partners. Throughout the year, we also have access to various training sessions, coaching sessions, and leaders' enablement programs. This comprehensive support to leaders makes sharing constructive feedback, which ultimately helps everyone reach their full potential, a seamless and rewarding part of the job.</p>
<h3>It Is Not All About Work</h3>
<p>While I've mostly shared the business aspect of Zalando, I must acknowledge that Zalando also knows how to have a good time. There are a lot of communities with various interests: running, fishing, beach volleyball, board games, and more technical topics like the Python or Linux guilds.</p>
<p>The company also places great importance on continuous improvement, which is, of course, a crucial aspect of a software engineer's work. Departments organize hack weeks; for instance, our department had an Innovation Sprint where individuals pitched initiatives using cutting-edge technologies like generative AI. Every month, Tech Academy hosts a Coffee Bytes event, a casual coffee meet-up with no set agenda, allowing members of the tech community to connect and make friends. Considering all these examples, despite the importance of business and customers, having fun is equally important at Zalando. I realised this right from the beginning when I saw one of the t-shirts with the slogan <em>"Zalando, we dress code"</em>.</p>
<h3>What's Next?</h3>
<p>Finishing up this look back, my first year as an Engineering Manager at Zalando has been a really good journey with lots of learning, growing, and experiencing new things. The diverse and dynamic environment, along with the focus on people and having fun, has been like magic. Thinking about what's next, I'm looking forward to continuing to add my small touch to Zalando's great work, enjoying the mix of tough challenges, teamwork, and moments that make us laugh. Here's to more times of growing, trying new things, and maybe getting a few more awesome sneakers along the way!</p>
<hr>
<p><em>Do you like growing engineering talent and building high performing engineering teams? Consider joining Zalando as an <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B2%5D=Applied%20Science&filters%5Bcategories%5D%5B3%5D=Product%20Design%2C%20User%20Research%2C%20Content%20Design&filters%5Bcategories%5D%5B4%5D=Product%20Management%20%28Technology%29&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B9%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B10%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B11%5D=Software%20Engineering%20-%20Principal%20Engineering&search=%22engineering%20manager%22">Engineering Manager</a>.</em></p>Sunrise: Zalando's developer platform based on Backstage2023-08-03T00:00:00+02:002023-08-03T00:00:00+02:00Lacey Nageltag:engineering.zalando.com,2023-08-03:/posts/2023/08/sunrise-zalandos-developer-platform-based-on-backstage.html<p>Lessons learned from adopting Backstage as Developer Platform at Zalando.</p><h2>Introduction</h2>
<p>Since 2021, Zalando has invested in building a developer portal called Sunrise, aimed at becoming the starting point for Builders at Zalando. The portal is based on Spotify's <a href="https://github.com/backstage/backstage">Backstage platform</a> with additional extensions built internally. Sunrise enables everyone at Zalando to view and discover information about teams, applications, APIs, events, CI/CD pipelines, infrastructure accounts and costs, and much more. In this post, we explore how adopting Backstage impacted the daily life of Software Engineers at Zalando and get insights from Lacey and Arthur, who led the efforts on the Product and Engineering side.</p>
<p><img alt="Sunrise: application view" src="https://engineering.zalando.com/posts/2023/08/img/sunrise-application-view.png"></p>
<figcaption style="text-align:center">Fig 1. Sunrise: detailed information about applications</figcaption>
<h3>Lacey, what's your role in creating Sunrise?</h3>
<p><em>Lacey:</em> Funny story, I actually ran a vision workshop with the team responsible for the Developer Portal at Zalando before I became a member of the department! As the official product manager, I helped solidify the vision with a platform mindset and an experience strategically focused on interoperability and usability. I worked with engineering stakeholders and the engineering manager to devise a strategy and roadmap to give us the best chance at efficient implementation, good adoption, and improved satisfaction from users so that more platform and infra teams would want to contribute. And of course, I'm probably the loudest promoter of our platform's & contributors' solutions 😅</p>
<h3>Arthur, how about you? How are you involved here?</h3>
<p><em>Arthur:</em> Hello! I've actually started to be involved with Sunrise as an early adopter and active user first, before moving internally to the team in May 2022. Since then, I've been leading the engineering team, driving the delivery of new features, coordinating support and maintenance on the platform, contributing to the product vision and ensuring our alignment with the organizational strategy, all the while managing our amazing team of 4 software engineers.</p>
<h3>Why did Zalando choose Backstage for its developer portal? Was any similar solution in place before?</h3>
<p><em>Lacey:</em> Before Sunrise, we had over 100 disconnected interfaces & resources, plus "the Developer Console" which centralized links to resources mostly for the <em>Code</em> through <em>Deploy</em> steps of the Developer Journey. After recognizing that we'd need to evolve into a platform to achieve our vision, we considered several options (including building everything ourselves), and Spotify happened to reach out while we were still in the discovery & design phase. What made it a great fit then, was that we had extremely limited resources and skills (both engineering & design) on the team at the time, so we recognized that having an out-of-the-box solution for a design-system and plugins like the basic Software Catalog would be necessary for us to deliver something fast enough to justify the strategic investment & potential risk of failure.</p>
<h3>I hear that our Engineers are really excited about Sunrise. Why and what features are they most excited about?</h3>
<p><em>Lacey:</em> From pretty much the beginning, the topic of interoperability has been prevalent as it's what enables us to eliminate friction from the day to day tasks Builders need to perform. Users really celebrated a deeper integration that two contributing teams collaborated on to make the experience of deploying data pipelines more seamless, and features that make org structure and reporting lines more transparent have also had very quick and wide adoption. We also have some very popular Platform features that enable all our users (regardless of whether they actually own services or not) to see personalized content by default and further customize personalization settings. The day to day features that people actually use the most are the action-oriented-easy-access links on the homepage, the CI/CD interface, Search, and the Application catalog, which includes integrations to tooling and resources across the SDLC.</p>
<h3>How do you measure adoption of the platform and along each part of the SDLC?</h3>
<p><em>Lacey:</em> Since our vision for Sunrise was to make it the "daily" starting point for Builders, we monitor the share of Builders using the platform on weekdays, and weekly as our primary success metrics for adoption. Since not all features actually need to be used daily (for example, every single person won't be registering a new application every working day), we let contributors determine what makes sense for their integrations and we provide them with a centralized dashboard and support with Analytics to make it easier to understand usage. In the future, we hope to map adoption of features to more tangible improvements in operational performance.</p>
<h3>What features were added on top of Backstage's open-source project?</h3>
<p><em>Lacey:</em> That's actually a pretty big question. For our earliest release, we added a personalized homepage with easy-access links to things engineers use often like open PRs and recently deployed pipelines, and added a support overview that they were used to from previous tooling, and our CI/CD platform that is internally built. Since then, we've integrated 27 other tools & services through 30 front-end plugins ranging from our internal <a href="https://engineering.zalando.com/posts/2022/04/zalando-machine-learning-platform.html">machine learning platform</a>, through widgets that make users aware of base image vulnerabilities or delivery performance insights, to a personalized dashboard covering all aspects of critical business events, like Cyber Week. Some of those plugins were contributed back to the open source community, such as the interface for our <a href="https://github.com/zalando/backstage-plugin-api-linter">API Linter, Zally</a>. Our platform features personalization – especially for users who don't own components themselves, but who have some accountability for them – increased adoption amongst <a href="https://engineering.zalando.com/posts/2022/02/principal-engineering-at-zalando.html">principal engineers</a> and leadership, and has helped contributors to Sunrise provide similar reporting-like features that they never had before with very little effort that in turn drive more regular use within engineering teams.</p>
<h3>Which team operates the platform? Any challenges that you had to overcome to support Zalando's user base?</h3>
<p><em>Arthur:</em> Our team is called Builder Portal, and has been operating and evolving the platform since its inception. Our biggest technical challenge at Zalando's scale has been managing the various pre-existing sources of data and determining how to sync them with <a href="https://backstage.io/docs/features/software-catalog/">Backstage's Catalog</a> system. We currently have over 40k registered entities (between applications, teams, and users) which we sync daily with the respective source of truth services. In terms of adoption, the biggest challenge from the get-go was to make sure that the experience is approachable and consistent for all users, regardless of which part of the development journey they are working on. Builders can be very opinionated in their ways of working, so making sure that our decisions are well thought out and will ultimately support them in working productively and happily can be challenging sometimes, but it's also very rewarding. And hey – we're Builders ourselves too, so we also enjoy using Sunrise while maintaining it.</p>
<p><em>Lacey:</em> A lot of what we see impacting adoption of new features is that people have built habits – and incredibly long bookmark lists – to make up for deficiencies of the fragmented tooling. What turned out to be most impactful for solving this problem is ensuring that we redirect users from old features to the new ones in Sunrise shortly after making them generally available and then <em>completely shut down</em> the old tooling.</p>
<h3>Backstage is open-source. How does Zalando and your team approach upstream contributions? Can you name some notable examples?</h3>
<p><em>Arthur:</em> Whenever we find some limitation in Backstage in comparison to what we want a feature to look like, we reflect on whether this is something that could impact other adopters of the platform or whether it's a Sunrise-specific problem. If it's the former, we reach out via a GitHub issue (e.g. <a href="https://github.com/backstage/backstage/issues/17481">bug report</a> and <a href="https://github.com/backstage/backstage/issues/9805">feature request</a>). If we know how to solve it, we also contribute a pull request (e.g. respective <a href="https://github.com/backstage/backstage/pull/17485">bugfix</a> and <a href="https://github.com/backstage/backstage/pull/10041">new feature</a>). We also keep an eye out for opportunities to share in-house plugins with the community. As mentioned by Lacey earlier, last year we open-sourced our <a href="https://github.com/zalando/backstage-plugin-api-linter">API Linter plugin</a>.</p>
<p><img alt="Backstage Plugin: API linter using Zally under the hood" src="https://engineering.zalando.com/posts/2023/08/img/backstage-plugin-api-linter.png"></p>
<figcaption style="text-align:center">Fig 2. Sunrise: open-sourced API linter plugin</figcaption>
<h3>How about the internal features? How easy has it been to get contributions from outside of your team?</h3>
<p><em>Arthur:</em> We have at least ten other plugins (the number grows sporadically) owned and maintained by other teams in Zalando, including our own Continuous Delivery and Machine Learning Platform teams. There's always an initial barrier of entry (as with any other application and framework) for contributors to understand the domain-specific language of Backstage, as well as the standards we have implemented on the platform, especially since many platform teams don't have a lot of front-end engineers available to work on the user interface of their plugins. We invest a lot in creating standard components and documenting our patterns so contributors can spend less time figuring out which button to use and more time improving the overall experience for their users.</p>
<h3>You recently reached a major milestone – 2,000 PRs merged to the repository and Sunrise replacing multiple internal tools and the prior generation of the developer portal. What's the next big milestone that you look forward to?</h3>
<p><em>Lacey:</em> Creating comprehensive visibility into <em>everything</em> running in production and mapping the relationships between entities – automatically where possible – so that we can centrally support global improvements to the operational health of systems and teams. The <a href="https://engineering.zalando.com/posts/2023/04/how-sboms-change-the-dependency-game.html">SBOM work</a> you mentioned in your recent post is a big part of that, but we are also working on surfacing the relationships between entities like data pipelines and applications, as well as the relationships of applications and their components to business problems through a standardized and semi-automated documentation of Domains. Having that oversight will enable us to shift left not only security and compliance, but also productivity, reliability, and cost efficiency by providing insights about the current balance of operational health in relationship to business metrics relevant to our high-level Domains. It will give Builders easier access to the information they need to involve the right stakeholders and make decisions about what kind of work to invest in and when. To put it shortly: we're all a bit happier, more secure, and more efficient when working with transparency and less uncertainty.</p>
<h3>Any tips that you'd give to teams who are also adopting Backstage as the foundation for their developer portal?</h3>
<p><em>Lacey:</em> Haha, the list is long because I've learned a lot over the life of this initiative. I'd sum it up as:</p>
<ul>
<li>Having a <strong>clear, inspirational vision</strong> that includes (and delineates) the needs of both users and contributors – and that you <em>constantly</em> communicate – will be key for motivating contributors and for reaching the critical mass of user journeys needed for users to feel the benefit of your platform.</li>
<li>To drive adoption and impact, look for opportunities to <strong>personalize content</strong> to make it easier to recognise and understand, and invest in <strong>increasing the interoperability</strong> along the journeys your users take to complete tasks between both fully integrated interfaces and features, as well as external tooling – and don't forget to shut down old tooling!</li>
<li>Whether you're using an open source plugin or building something yourself from scratch, <strong>investing in great UX research and design is <em>critical</em></strong> for building an experience that will remain cohesive as it grows – that's important so that your users are enabled to actually find the things you build, and are happy to use them.</li>
</ul>
<p><em>Arthur:</em> My tip is to <strong>leverage the power of open source</strong>! The Backstage Community is ever-growing and provides a lot of interesting, well-maintained plugins for you to make use of, so don't shy away from engaging with it. The framework itself is also constantly evolving and growing its scope, and with some big adopters already leveraging it (including us!), you're sure to see a lot of examples of interesting use cases that will support your teams to be more productive.</p>
<p><em>Bartosz:</em> Thanks for the conversation and for walking us through our approach to building a Developer Platform!</p>
<hr>
<p><em>We're hiring! If you're passionate about developer platforms, or if you would like to use one on a daily basis, join one of our <a href="https://jobs.zalando.com/en/tech/jobs/">Engineering teams</a>!</em></p>
<p><em>If you would like to know more about Sunrise, check out Henning's talk <a href="https://youtu.be/4EGTa8u-7Ws?t=479">Cloud native developer experience at Zalando</a> or the <a href="https://platformengineering.org/talks-library/sunrise-zalandos-internal-developer-platform">related post</a>.</em></p>All you need to know about timeouts2023-07-26T00:00:00+02:002023-07-26T00:00:00+02:00Anton Ilinchiktag:engineering.zalando.com,2023-07-26:/posts/2023/07/all-you-need-to-know-about-timeouts.html<p>How to set a reasonable timeout for your microservices to achieve maximum performance and resilience.</p><p>Nobody likes to wait. We at Zalando are not an exception. We don't like our customers to wait too long for delivery, we don't like them to wait during checkout, and we don't like microservices that take too long to respond.
In this post we're going to talk about how to set a reasonable timeout for your microservices to achieve maximum performance and resilience.</p>
<h2>Why set a timeout</h2>
<p>Before we start, let’s answer the simple question: "Why timeout?". A successful response, even if it takes time, is better than a timeout error. Hmm… not always, it depends!</p>
<p>First of all, if your server does not respond or takes too long to respond, nobody will wait for it. Instead of challenging the patience of your users, follow the fail-fast principle. Let your clients retry or handle an error on their side. When possible return a fallback value.</p>
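<p>As a minimal sketch of this fail-fast-with-fallback idea (the service call and fallback value below are illustrative, not from any real API), a Java <code>CompletableFuture</code> can bound the wait and substitute a fallback:</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FallbackExample {
    // Hypothetical slow downstream call, simulated with a sleep.
    public static CompletableFuture<String> fetchRecommendations() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.SECONDS.sleep(5); // downstream service is too slow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "personalised recommendations";
        });
    }

    public static void main(String[] args) {
        String result = fetchRecommendations()
                .orTimeout(200, TimeUnit.MILLISECONDS) // fail fast after 200ms
                .exceptionally(ex -> "bestsellers")    // fallback value on timeout
                .join();
        System.out.println(result); // prints "bestsellers": the call exceeded its budget
    }
}
```

<p>The caller gets a degraded but useful answer after 200ms instead of waiting five seconds for the "perfect" one.</p>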
<p>Another important aspect is resource utilisation. While a client is waiting for a response, various resources are being utilised: threads, HTTP connections, database connections, etc.
Even if the client has closed the connection, without a proper timeout configuration the request is still being processed on your side, which means those resources stay busy.</p>
<p><img alt="Client closed connection" src="https://engineering.zalando.com/posts/2023/07/images/client_closed_connection.png#center"></p>
<p>Remember, <strong>when you increase timeouts you potentially decrease the throughput of your application!</strong></p>
<p>Using an infinite or very high timeout is a bad strategy. You won't see the problem for a while, until one of your downstream services gets stuck and your thread pool gets exhausted.
Unfortunately, many libraries set default timeouts too high or even infinite. They aim to attract as many users as possible and try to make their library work in most situations. But for production services this is not acceptable; it can even be dangerous.
For example, for the native Java HttpClient the default connection/request timeout is infinite, which is unlikely to be within your SLA :)</p>
<p><strong>The default timeout is your enemy, always set timeouts explicitly!</strong></p>
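<p>For instance, with the JDK's <code>java.net.http.HttpClient</code> (Java 11+) both timeouts stay infinite unless you set them explicitly on the builders; a minimal sketch (the URL and budget values are placeholders, not recommendations):</p>

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class ExplicitTimeouts {
    public static void main(String[] args) {
        // Connection timeout: budget for establishing the TCP connection.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(100))
                .build();

        // Request timeout: overall budget for the whole exchange.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/"))
                .timeout(Duration.ofMillis(800))
                .build();

        // client.send(request, HttpResponse.BodyHandlers.ofString()) would now
        // fail with HttpConnectTimeoutException / HttpTimeoutException instead
        // of blocking a thread indefinitely.
        System.out.println(client.connectTimeout().orElseThrow());
    }
}
```

<p>Note that the two settings live on different builders: the connection budget belongs to the client, while the overall request budget is set per request.</p>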
<h2>Connection timeout vs. request timeout</h2>
<p>The distinction between connection timeout and request timeout can cause confusion.
First, let's have a look at what a connection timeout is.</p>
<p>If you google or ask ChatGPT you’ll get something like this:</p>
<p><em>A connection timeout refers to the maximum amount of time a client is willing to wait while attempting to establish a connection with a server. It measures the time it takes for a client to successfully establish a network connection with a server. If the connection is not established within the specified timeout period, the connection attempt is considered unsuccessful, and an error is typically returned to the client.</em></p>
<p>What does it mean to establish a connection?
TCP uses a three-way handshake to establish a reliable connection. The connection is full duplex, and both sides synchronize (SYN) and acknowledge (ACK) each other. The exchange of these four flags is performed in three steps—SYN, SYN-ACK, and ACK.</p>
<p><img alt="tcp three-way handshake" src="https://engineering.zalando.com/posts/2023/07/images/handshake.png#center"></p>
<p>A connection timeout should be sufficient to complete this handshake; the actual transmission of packets is then gated by the quality of the connection.</p>
<p>In simple words, the value for the connection timeout should be derived from the quality of the network between services.
If a remote service is running in the same datacenter or the same cloud region, connection time should be low. Conversely, if you're working on a mobile application, the connection time to a remote service might be quite high.</p>
<p>To give you some reference points: the round-trip time (RTT) over fiber is ~42ms from New York to San Francisco and ~160ms from New York to Sydney.
You can also look at the <a href="https://clients.amazonworkspaces.com/Health.html">Connection Health Check by Amazon</a>. This is what I get from my local machine: RTT 28ms to the recommended AWS Region.</p>
<p><img alt="connection health check" src="https://engineering.zalando.com/posts/2023/07/images/connection_health_check.png#center"></p>
<h3>When does a connection timeout occur?</h3>
<p>A connection timeout occurs only upon starting the TCP connection. This usually happens if the remote machine does not answer. This means that the server has been shut down, you used the wrong IP/DNS name, the wrong port or the network connection to the server is down. Another frequent condition is when a given endpoint simply drops packets without a response. The remote endpoint's firewall or security settings may be configured to drop certain types of packets or traffic from specific sources.</p>
<h3>Connection timeout best practices</h3>
<p>A common practice for microservices is to set a connection timeout equal to or slightly lower than the timeout for the operation. This approach may not be ideal, since the two processes are very different:
whereas establishing a connection is relatively quick, an operation can take hundreds or thousands of milliseconds!</p>
<p>You can set a connection timeout that is some multiple of your expected RTT. <strong>Connection timeout = RTT * 3 is commonly used as a conservative approach</strong>, but you can adjust it based on your specific needs.</p>
<p>In general, the connection timeout for a microservice should be set low enough so that it can quickly detect an unreachable service, but high enough to allow the service to start up or recover from a short-lived problem.</p>
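<p>The rule of thumb above can be sketched as follows (the 10ms floor is our own assumption, added so that a very fast local network does not yield an unrealistically tight timeout):</p>

```java
public class ConnectTimeoutRule {
    // Connection timeout = RTT * 3, with an assumed lower bound so that a
    // very good network does not produce a timeout tighter than is sensible.
    public static long connectTimeoutMillis(long measuredRttMillis) {
        return Math.max(10, measuredRttMillis * 3);
    }
}
```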
<h3>Request Timeout</h3>
<p>A request timeout, on the other hand, pertains to the maximum duration a client is willing to wait for a response from the server after a successful connection has been established. It measures the time it takes for the server to process the client's request and provide a response.</p>
<h2>Setting optimal request timeout</h2>
<p>Imagine you are going to integrate your microservice with a new API.</p>
<p>The first step would be to look at the SLAs provided by the microservice or API you are calling.
Unfortunately, not all services provide SLAs, and even if they do, you should not trust them blindly.
The SLA value is only good as a starting point for measuring real latency.</p>
<p>If possible, run an integration with the new API in shadow mode and collect metrics. This code should run parallel to the existing production integration, but without affecting the production system (run it in a separate thread-pool, mirror traffic, etc).</p>
<p>After collecting latency metrics such as p50, p99 and p99.9, you can define an acceptable rate of false timeouts. Let's say you go with a false timeout rate of 0.1%; that means the max timeout you can set is the corresponding latency percentile of the downstream service, p99.9.</p>
<p>At this step you have a max timeout value you can set but you have a trade-off:</p>
<ul>
<li>set timeout to the max value</li>
<li>decrease timeout and enable retry</li>
</ul>
<p>Based on the test results you need to choose the timeout strategy. We'll cover retries a little bit later.</p>
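<p>The percentile-to-timeout step can be sketched like this, using a simple nearest-rank percentile (class and method names are illustrative):</p>

```java
import java.util.Arrays;

public class TimeoutFromMetrics {
    // Nearest-rank percentile over a sample of observed latencies (in ms).
    public static long percentile(long[] latenciesMillis, double p) {
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    // With an acceptable false-timeout rate of 0.1%, the max timeout is p99.9.
    public static long timeoutFor(long[] latenciesMillis, double falseTimeoutRatePercent) {
        return percentile(latenciesMillis, 100.0 - falseTimeoutRatePercent);
    }
}
```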
<p>The next challenge you will face is a chain of calls.
Imagine your service has an SLA of 1000ms and it sequentially calls the Order Service with p99.9 = 700ms and then the Payment Service with p99.9 = 700ms. How do you configure the timeouts without breaching your SLA?</p>
<p><img alt="Chain of calls" src="https://engineering.zalando.com/posts/2023/07/images/chain_of_calls.png#center"></p>
<p><strong>Option 1: Share your time budget</strong>
One option is to share your time budget (your SLA) between the services and set the timeouts accordingly: 500ms for the Order Service and 500ms for the Payment Service.
In this case, you have a guarantee that you will not breach your SLA, but you might get some false positive timeouts.</p>
<p><strong>Option 2: Introduce a TimeLimiter for your API</strong>
Since the different services will not all respond with the maximum delay at the same time, you can wrap the chained calls in a time limiter and set the maximum acceptable timeout for both services. In this case you could create a time limiter of 1 second and set a timeout of 700ms for the downstream services.</p>
<p>In Java, you could use <code>CompletableFuture</code> and several methods, among which are <code>orTimeout</code> and <code>completeOnTimeout</code>, that provide built-in support for dealing with timeouts.</p>
<div class="highlight"><pre><span></span><code>CompletableFuture
    .supplyAsync(() -> orderService.placeOrder(...))
    .thenApply(order -> paymentService.updateBalance(order))
    .orTimeout(1, TimeUnit.SECONDS);
</code></pre></div>
<p>There is also a nice TimeLimiter module provided by the <a href="https://resilience4j.readme.io/docs/timeout">Resilience4j library</a>.</p>
<h2>Retry or not retry</h2>
<p>The idea is simple: consider enabling retries only when there is a chance of success.</p>
<p><strong>Temporary failures:</strong> Retry is suitable for temporary failures that are expected to be resolved after a short period, such as network glitches, server timeouts, or database connection issues. Retry can also route around a bad node. Given a large enough deployment (e.g. 100 pods), a single pod might have a substantial performance regression, but if requests are load balanced in a sufficiently random way, retrying is faster than waiting for a response from the bad node.</p>
<ul>
<li>Retry on timeout errors and 5xx errors</li>
<li>Do not retry on 4xx errors</li>
</ul>
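<p>This decision can be sketched as a small predicate (a simplification; whether a timed-out call should be retried at all also depends on idempotency, discussed next):</p>

```java
public class RetryPolicy {
    // Timeouts and 5xx responses may succeed on another attempt;
    // 4xx responses will not, because the request itself is wrong.
    public static boolean shouldRetry(int statusCode, boolean timedOut) {
        if (timedOut) return true;          // server state unknown: retry only if idempotent
        if (statusCode >= 500) return true; // transient server-side failure
        return false;                       // 4xx and successful responses: never retry
    }
}
```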
<p><strong>Idempotent operations:</strong> If the operation being performed is idempotent, meaning that executing it multiple times has the same result as executing it once, retries are generally safe.</p>
<p><strong>Non-idempotent operations</strong> can cause unintended side effects if retried multiple times. Examples include operations that modify data, perform financial transactions, or have irreversible consequences. Retrying such operations can lead to data inconsistency or duplicate actions.</p>
<p>Even if you think an operation is idempotent, if possible, ask the service owner whether it is a good idea to enable retries.</p>
<p>To safely retry requests without accidentally performing the same operation twice, consider supporting an additional <em>Idempotency-Key</em> header in your API. When creating or updating an object, use an idempotency key. Then, if a connection error occurs, you can safely repeat the request without the risk of creating a second object or performing the update twice. You can read more about this idempotency pattern in <a href="https://stripe.com/docs/api/idempotent_requests">Idempotent Requests by Stripe</a> and <a href="https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/">Making retries safe with idempotent APIs by Amazon</a>.</p>
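<p>As a sketch of the client side of this pattern (the endpoint URL is hypothetical, and the server must also implement deduplication based on the key):</p>

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

public class IdempotentRequests {
    // Attach a client-generated idempotency key so the server can deduplicate
    // retried POSTs: the same key means "this is the same logical operation".
    public static HttpRequest createOrder(String url, String body, String idempotencyKey) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Idempotency-Key", idempotencyKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Generate the key once per logical operation, before the first attempt,
    // and reuse it on every retry of that operation.
    public static String newKey() {
        return UUID.randomUUID().toString();
    }
}
```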
<p><strong>Circuit breaker:</strong> always consider implementing a circuit breaker when enabling retries. When failures are rare, retries are not a problem; but when a downstream service is already degraded, retries increase the load and can make matters significantly worse.</p>
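<p>To illustrate the idea, here is a deliberately minimal count-based breaker; a production implementation (e.g. Resilience4j's) also has a half-open state and time-based recovery, which are omitted here:</p>

```java
public class CircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    public CircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    // Open (reject calls) after `failureThreshold` consecutive failures,
    // so retries stop hammering an already-degraded downstream service.
    public boolean allowRequest() {
        return consecutiveFailures < failureThreshold;
    }

    public void recordSuccess() { consecutiveFailures = 0; }

    public void recordFailure() { consecutiveFailures++; }
}
```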
<p><strong>Exponential backoff:</strong> Implementing exponential backoff can be an effective retry strategy. It involves increasing the delay between each retry attempt exponentially, reducing the load on the failing service and preventing overwhelming it with repeated requests. Here is a fantastic blog on how <a href="https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/">AWS SDKs support exponential backoff and jitter</a> as a part of their retry behaviour.</p>
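<p>The "full jitter" variant described in that post can be sketched as: sleep = random between 0 and min(cap, base * 2^attempt).</p>

```java
import java.util.Random;

public class Backoff {
    // Exponential ceiling for a given attempt, capped at `capMillis`.
    public static long expBackoffMillis(int attempt, long baseMillis, long capMillis) {
        return Math.min(capMillis, baseMillis * (1L << attempt));
    }

    // "Full jitter": sleep a uniformly random duration below the ceiling,
    // which spreads retries out and avoids synchronized retry storms.
    public static long withFullJitter(int attempt, long baseMillis, long capMillis, Random random) {
        long ceiling = expBackoffMillis(attempt, baseMillis, capMillis);
        return (long) (random.nextDouble() * ceiling);
    }
}
```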
<p><strong>Time-sensitive operations:</strong> Retries may not be appropriate for time-critical operations. The trade-off here is between decreasing the timeout and enabling retries, or keeping the max acceptable timeout value. Retries might not work well where p99.9 is close to p50.</p>
<p>Look at the graphs: in the first one, timeouts happen occasionally and there is a big difference between p99 and p50, which is a good case for enabling retries.</p>
<p><img alt="Retry is applicable" src="https://engineering.zalando.com/posts/2023/07/images/retry_applicable.png#center"></p>
<p>On the second graph, timeouts happen periodically and <strong>p99 is close to p50; do not enable retries</strong>.
<img alt="Retry is not applicable" src="https://engineering.zalando.com/posts/2023/07/images/retry_is_not_applicable.png#center"></p>
<h2>Recap</h2>
<ul>
<li>set timeout explicitly on any remote calls</li>
<li>set connection timeout = expected RTT * 3</li>
<li>set request timeout based on collected metrics and SLA</li>
<li>fail-fast or return a fallback value</li>
<li>consider wrapping chained calls into time limiter</li>
<li>retry on 5xx error and do not retry on 4xx</li>
<li>think about implementing a circuit breaker when retrying</li>
<li>be polite and ask the API owner for permission to enable retries</li>
<li>support <em>Idempotency-Key</em> header in your API</li>
</ul>
<h2>Resources</h2>
<p><a href="https://hpbn.co/primer-on-latency-and-bandwidth/#speed-of-light-and-propagation-latency">Speed of Light and Propagation Latency</a><br/>
<a href="https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter">Timeouts, retries, and backoff with jitter by AWS</a><br/>
<a href="https://cseweb.ucsd.edu/classes/sp18/cse291-c/post/schedule/p74-dean.pdf">The Tail at Scale - Dean and Barroso 2013</a><br/>
<a href="https://blog.acolyer.org/2015/01/15/the-tail-at-scale/">The Tail at Scale - Adrian Colyer 2015</a><br/>
<a href="https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/">The complete guide to Go net/http timeouts by Cloudflare</a><br/>
<a href="https://www.linkedin.com/pulse/handling-timeouts-microservice-architecture-arpit-bhayani/">Handling timeouts in a microservice architecture</a><br/>
<a href="https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/">Making retries safe with idempotent APIs by AWS</a><br/>
<a href="https://stripe.com/docs/api/idempotent_requests">Idempotent Requests by Stripe</a><br/></p>Rendering Engine Tales: Road to Concurrent React2023-07-11T00:00:00+02:002023-07-11T00:00:00+02:00Rene Eichhorntag:engineering.zalando.com,2023-07-11:/posts/2023/07/rendering-engine-tales-road-to-concurrent-react.html<p>Integrating React's Concurrent features into Zalando's web framework. In this post we go over our solution design, early benchmarks, and some useful tips about common hydration mismatch errors.</p><p><img alt="Outfit Page" src="https://engineering.zalando.com/posts/2023/07/images/rengine-outfit-page.png#previewimage"></p>
<p>Welcome back to our web platform blog series! It's been a while since we <a href="https://engineering.zalando.com/posts/2021/09/micro-frontends-part2.html">last talked about</a> our approach to large-scale front-end development at Zalando. We are excited now to reconnect and share with you some substantial enhancements we've made to the streaming and rendering architecture of our Rendering Engine framework.</p>
<p>The first post of this new series will recap how Rendering Engine works, its relationship with Concurrent React, and our journey with it including design and implementation challenges as well as successes gained so far. <br/>
Additionally, it covers the main hydration mismatch errors we faced during this upgrade, our solutions and recommendations for avoiding them, and some extra tips and tricks for debugging this type of issue.</p>
<h2>Intro</h2>
<p>"Rendering Engine" is the web framework maintained by and currently used at Zalando to render the <a href="https://en.zalando.de/">Fashion Store website</a>, and is designed for building any web application with similar needs.</p>
<p>You might know Rendering Engine (<strong>RE</strong>) from our previous blog posts about Micro Frontends at Zalando and our journey through them from Project Mosaic with its <a href="https://engineering.zalando.com/posts/2018/12/front-end-micro-services.html">fragments</a> and <a href="https://github.com/zalando/tailor">Tailor</a>, to <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Interface Framework</a> (<a href="https://engineering.zalando.com/posts/2021/09/micro-frontends-part2.html">part 2</a>).</p>
<p>In a nutshell, <strong>RE</strong> is a web framework best suited for creating a website that:</p>
<ul>
<li>Uses React to render the UI</li>
<li>Inherently implements universal rendering (server side / client side) with high emphasis on server rendering and page load performance</li>
<li>Its page content, layout and UI steering are largely driven by the backend in a nestable approach</li>
<li>The backend can be a recommendation engine, a CMS-like system able to define the shape and content of pages, or any other similar system.</li>
</ul>
<p>The building blocks of RE's language for defining what to render are <strong>Entities</strong>.
Each <strong>Entity</strong> is a block of content that has a specific identity from a business-logic perspective, and can have other Entities nested inside. For example, in the context of a fashion store, an Entity could be a Product, a Collection of products, an Outfit, etc. When organized in tree-like structures, Entities can define the full layout and contents of pages.
Defining each Entity from the backend is done by specifying a <strong><em>type</em></strong>, <strong><em>id</em></strong>, and optional extra data in the form of <strong><em>hints</em></strong>. We'll skip how RE handles defining layouts from the backend for the time being.</p>
<p>So while Entities (defined by the backend) describe "<em>what to render</em>", specifying "<em>how to render</em>" is the responsibility of what we call a <strong>Renderer</strong> (on the client). <br/>
Each <strong>Renderer</strong> is a self-contained TypeScript module powered by multiple RE features provided during server- and client-side rendering.
Each Renderer is responsible for rendering a specific type of Entity, while each Entity type can be represented by multiple Renderers depending on the extra hints data.</p>
<p>This assignment mapping is defined via something called <strong>Rendering Rules</strong>. These configurations, which include "selectors" for matching the incoming Entity definitions from the backend, are passed to RE and support nested and per-page rules.</p>
<p>There are a handful of other features built into this framework including monitoring, experimentation, tracking, a different rendering output for server driven mobile apps, etc. but for now this introduction should do.</p>
<h2>React 18's Concurrent Rendering</h2>
<h4 style="opacity: 0.7">(and how it fits Rendering Engine like a glove)</h4>
<p>Performance has always been one of the key focus areas of Rendering Engine from its beginnings. Aside from being built with performance in mind and going through many micro improvements over the years, it also comes with some performance features built inside, including but not limited to streaming, lazy-loading, partial streaming and partial hydration (yes, almost the same concept as in Concurrent React!).</p>
<p>Although these performance related features have proven to be very important in the success of the Fashion Store website, their code's maintenance, improvements and required education as well as knowledge sharing come with a cost.</p>
<p>But more importantly, we anticipated having React's built-in support for these features would most probably bring even more performance boosts to the table.</p>
<p>Additionally, React's concurrent rendering APIs seamlessly integrate with the architecture of RE because its Renderers serve as ideal candidates for being encapsulated within a Suspense boundary. This enables them to function as individual blocks that can be server-rendered, streamed, hydrated, and client-rendered "concurrently". Especially since many of them have already been using Rendering Engine's own partial hydration/streaming features!</p>
<p>As a result, we have been very excited about the concurrent React 18 for quite a while and as soon as the opportunity arrived, we started the migration and refactoring of Rendering Engine's core functionalities to use the concurrent features.</p>
<p>Needless to say, this migration task has also had its challenges and costs! So now that we have finished some important milestones and are close to completion, we thought it is a good chance to start sharing our challenges, successes and learnings with you.</p>
<h2>Design challenges with Concurrent Rendering</h2>
<p>Rendering Engine at its core includes logic for handling the resolution of server's specified Entity definitions or layout into the corresponding Renderers, fetching their data as well as handling all the other aforementioned features like experimentation, tracking, etc. And only after that, it hands over the UI rendering responsibilities to React. <br/>
These happen gradually (and if needed, recursively) in a way that makes sure that Renderers remain independent while getting their data and rendering/streaming their final html, which makes way for performance gains.</p>
<p>So initially, with React 18 we thought of moving as much of this logic as possible (from data fetching to experimentation, tracking, etc.) to the React concurrent APIs such as Suspense and <code>useTransition</code>, through custom hooks, which is often referred to as the "Render-As-You-Fetch" pattern, with the aim of reducing complexity and required effort, among other things.</p>
<p>But after a trial phase and implementing a proof of concept, we faced some issues, the main ones being:</p>
<ul>
<li>In cases where keeping the correct order of the content during streaming/hydration is important, the closest available solution would be to use the <code>SuspenseList</code> API. But it still seems to be <a href="https://github.com/facebook/react/issues/22771#issuecomment-969451702">experimental, with some limitations</a>.</li>
<li>The <a href="https://github.com/facebook/react/issues/25082"><code>useTransition</code> API not considering nested suspense boundaries</a>, causing bad UX in some scenarios.</li>
<li>By utilizing hooks to initiate requests or other async operations, the timing of fetch operations becomes coupled with the order of rendering, which may not be optimal for performance.</li>
<li>Progressive hydration and streaming, necessitate the availability of all the data required for client-side rendering as early as possible. This implies that, in addition to the HTML generated by components, it is crucial to stream their data to prevent redundant requests from being made by the client.<ul>
<li>During the trial phase, the streaming and caching layer to support this issue wasn't yet handled by React. And as of now, the <a href="https://github.com/facebook/react/pull/25502">latest supporting feature</a> is still not final.</li>
</ul>
</li>
</ul>
<h3>Chosen technical design</h3>
<p>Due to the limitations mentioned above, we finally decided to go with a mixed solution.</p>
<p>In this approach, the concurrent streaming, hydration, rendering and basically all the Concurrent benefits are still achieved via fully utilizing React: by wrapping every Renderer in a Suspense boundary, and handling changes through concurrent APIs. <br/>
But at the same time, we created an "Application State" layer which encapsulates the main logic and Renderers data outside of React components/hooks in a central place, which dictates to the Suspense boundaries their state.</p>
<p>This way, the full power of orchestrating when to suspend a component (Renderer) depending on its place in the tree, handling the order of the suspended components, and deciding how to manage a transition considering the nested Suspense boundaries, would all be available and customizable in this Application State layer. <br/>
<em>We will share the details of the technical solution for ordered streaming/hydration in another post</em>.</p>
<p>In other words, every time RE finds the matching Renderer and resolves all its corresponding data for an Entity definition (through the "resolveEntity" step), the output is written to the Application State layer. In the meantime, React is rendering the Renderer components, which are wrapped with Suspense. <br/>
To access data from the Application State, the suspendable Renderers use the "Connector hook". <br/>
The Connector hook reads from the application state which either returns the data that was asked for, or creates a promise that will be resolved once the data has been written. The promise is then used to suspend the component and React will automatically re-render once the Promise has been resolved. <br/>
<em>Imagine Redux's <code>useSelector</code> hook, but instead of immediately returning selected data you get a Promise that only resolves once a reducer has made the data available.</em></p>
<p><img alt="Rendering Engine architecture using Concurrent React" src="https://engineering.zalando.com/posts/2023/07/images/rengine-concurrent-react.jpg"></p>
<h2>Benefits gained from Concurrent Rendering</h2>
<p>As we are still going through the changes and final steps of the full-fledged concurrent mode described above, the full benefits of it are yet to be observed.</p>
<p>To date, we have achieved some performance improvements mainly by using the new streaming and hydration root APIs.</p>
<h3>Performance improvements from <code>renderToPipeableStream</code> and <code>hydrateRoot</code> APIs</h3>
<p>As one of the milestones, after pure version upgrade and handling breaking changes, we solely changed RE's internal streaming and hydration code to use the new React 18 APIs instead. i.e. <code>renderToPipeableStream</code> instead of <code>renderToNodeStream</code>, and <code>hydrateRoot</code> instead of <code>hydrate</code>. <br/>
We rolled out this change through an A/B test covering all pages of our e-commerce website, and in the end we observed these mild performance (and business metric) improvements:</p>
<p><strong>Overall</strong></p>
<ul>
<li><a href="https://web.dev/inp/">INP</a>: <span style="color: #61bd6d"><strong>-5.69%</strong></span></li>
<li><a href="https://web.dev/fid/">FID</a>: <span style="color: #61bd6d"><strong>-8.81%</strong></span></li>
<li><a href="https://web.dev/lcp/">LCP</a>: <span style="color: #61bd6d"><strong>-2.43%</strong></span></li>
<li><a href="https://web.dev/fcp/">FCP</a>: <span style="color: #61bd6d"><strong>-0.23%</strong></span></li>
<li><strong>Bounce rate</strong>: <span style="color: #61bd6d"><strong>-0.24%</strong></span></li>
</ul>
<p><strong>Per page:</strong>
(some of the frequently visited pages)</p>
<table>
<thead>
<tr>
<th style="text-align: center;"><strong>Metric</strong></th>
<th style="text-align: center;"><strong>Home page</strong></th>
<th style="text-align: center;"><strong>Catalog page</strong><br/><em>(list of products and search)</em></th>
<th style="text-align: center;"><strong>Product Details page</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;"><strong>INP</strong></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-2.92%</strong></span></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-6.76%</strong></span></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-6.09%</strong></span></td>
</tr>
<tr>
<td style="text-align: center;"><strong>FID</strong></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-2.98%</strong></span></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-17.11%</strong></span></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-6.06%</strong></span></td>
</tr>
<tr>
<td style="text-align: center;"><strong>Exit Rate</strong></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-0.43%</strong></span></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-0.06%</strong></span></td>
<td style="text-align: center;"><span style="color: #61bd6d"><strong>-0.06%</strong></span></td>
</tr>
</tbody>
</table>
<p>Needless to say, this shows great promise, and we are now even more excited about the results of the next steps.</p>
<h2>Technical challenges: Rise of the Hydration Mismatch errors!</h2>
<p>As also stated in <a href="https://github.com/reactjs/rfcs/blob/ba9bd5744cb922184ec9390515910cd104a30c6e/text/0215-server-errors-in-react-18.md#hydration-mismatches">some documentations around React 18</a>, because the new React APIs are way more sensitive towards existing hydration mismatch issues, after the migration to the new streaming and hydration APIs, we started receiving a lot more hydration error logs (via Sentry) for Zalando Fashion Store. <br/>
So during this migration, we've been finding and fixing these issues to prevent negative user impact as much as possible. And after fixing dozens of different types of issues deep inside hundreds of Renderers, we were able to considerably reduce the number of hydration mismatch errors occurring in the wild. That being said, there are still some more errors to fix, which are harder to reproduce and find due to the dynamic nature of the page content in Fashion Store. <br/>
Nevertheless, below you can find the most common issues we found so far, and how we were able to fix them.</p>
<p>After that, we also briefly share some tips and tricks about the debugging process. Because - as you may also know if you have faced these errors in your projects - debugging them is not always a straightforward task, and to be honest, React's error logs (especially coming from the production environment) aren't very helpful!</p>
<h3>Main types of issues we faced, and suggested solutions</h3>
<p>Before going through details of each type, in some cases we realized that based on product requirements, one might actually not need to render some content on SSR (Server Side Rendering) and only the CSR (Client Side Rendering) would be enough. <br/>
Hence the obvious fix might be to just skip rendering on SSR and only show the content once the app is mounted on the user's browser.</p>
<p>To do that, we can rely on React hooks and lifecycle methods to ensure the app/component has been mounted on the browser. For example:</p>
<p><strong>Instead of</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">dataThatDiffersBetweenClientAndServer</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">props</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span><span class="p">{</span><span class="nx">dataThatDiffersBetweenClientAndServer</span><span class="p">}</span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p><strong>Do</strong></p>
<div class="highlight"><pre><span></span><code><span class="c1">//...</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">isMounted</span><span class="p">,</span><span class="w"> </span><span class="nx">setIsMounted</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">React</span><span class="p">.</span><span class="nx">useState</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="nx">React</span><span class="p">.</span><span class="nx">useEffect</span><span class="p">(()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">setIsMounted</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">[]);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">dataThatDiffersBetweenClientAndServer</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">props</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w">    </span><span class="o"><</span><span class="nx">div</span><span class="o">></span><span class="p">{</span><span class="nx">isMounted</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="nx">dataThatDiffersBetweenClientAndServer</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">}</span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p>There are similar cases where due to the basic differences between the SSR and the CSR, like some data only being available on client side, one might need to render different content or elements on the two. For example, based on the exact specifications of the user's device, you want to display an app download banner.</p>
<p>For these scenarios, the suggestion would again be to simply wait until the initial hydration phase is finished on the client side, and then render the different content.</p>
<p><strong>Note</strong>: in such cases, be mindful of layout shifts that can happen as a result of some element popping into the view.</p>
<p>With that out of the way, let's dive into the list of issues.</p>
<h4>1. Timers</h4>
<p>This is a common and somewhat expected source of hydration mismatch issues simply because if you're calculating and rendering the distance between two specific points in time (usually from past/future to now), it will result in slightly different values when calculated on SSR compared to a few moments later on CSR.</p>
<p>As also mentioned in <a href="https://react.dev/reference/react-dom/client/hydrateRoot#suppressing-unavoidable-hydration-mismatch-errors">React docs</a>, in such cases where the mismatch is unavoidable, the suggestion is to simply tell React that the difference is expected and that React should ignore the mismatch during hydration. The way to do this is by passing the prop <code>suppressHydrationWarning={true}</code> to the element that contains such a mismatch. Keep in mind that this prop only works one level deep, so you have to pass it to the closest element wrapping the mismatching text. For example:</p>
<p><strong>Instead of</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">timeDistance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">targetDate</span><span class="p">.</span><span class="nx">getTime</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">Date</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span><span class="p">{</span><span class="nx">timeDistance</span><span class="p">}</span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p><strong>Do</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">timeDistance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">targetDate</span><span class="p">.</span><span class="nx">getTime</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">Date</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="w"> </span><span class="nx">suppressHydrationWarning</span><span class="o">=</span><span class="p">{</span><span class="kc">true</span><span class="p">}</span><span class="o">></span><span class="p">{</span><span class="nx">timeDistance</span><span class="p">}</span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<h4>2. Localization of dates and different time-zones</h4>
<p>Converting date values from raw formats (e.g. ISO 8601 <code>2023-01-01T20:00:00.000Z</code>) to human-readable strings can be a tricky cause of hydration mismatch errors: if the timezone used for the conversion differs between the server and the client, the resulting values can differ as well.</p>
<p>For example, if no timezone is specified when using the localization APIs (e.g. <code>Intl.DateTimeFormat</code> or <code>Date.prototype.toLocaleString</code>), the host's timezone is used. If the SSR server runs in a different timezone than the user's device, this leads to different localized date values.</p>
<p>It's hard to decide on the best solution in these cases, especially because it is currently not possible to determine the user's exact local timezone during SSR from the HTTP headers of the initial request. <br/>
On top of that, the question of which timezone to use for displaying dates is ultimately a product decision.</p>
<p>But if a specific universal timezone is agreed upon and provided (for example, the timezone matching the website's domain), then passing that timezone to the conversion APIs in both the client and server code fixes the issue. Meaning:</p>
<p><strong>Instead of</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span>
<span class="w"> </span><span class="p">{</span><span class="nx">someDate</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">(</span><span class="nx">locale</span><span class="p">)}</span>
<span class="w"> </span><span class="p">{</span><span class="ow">new</span><span class="w"> </span><span class="nb">Intl</span><span class="p">.</span><span class="nx">DateTimeFormat</span><span class="p">(</span><span class="nx">locale</span><span class="p">).</span><span class="nx">format</span><span class="p">(</span><span class="nx">someDate</span><span class="p">)}</span>
<span class="w"> </span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p><strong>Do</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span>
<span class="w"> </span><span class="p">{</span><span class="nx">someDate</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">(</span><span class="nx">locale</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">timeZone</span><span class="o">:</span><span class="w"> </span><span class="nx">universalTimezone</span><span class="w"> </span><span class="p">})}</span>
<span class="w"> </span><span class="p">{</span><span class="ow">new</span><span class="w"> </span><span class="nb">Intl</span><span class="p">.</span><span class="nx">DateTimeFormat</span><span class="p">(</span><span class="nx">locale</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">timeZone</span><span class="o">:</span><span class="w"> </span><span class="nx">universalTimezone</span><span class="w"> </span><span class="p">}).</span><span class="nx">format</span><span class="p">(</span><span class="nx">someDate</span><span class="p">)}</span>
<span class="w"> </span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p>That being said, depending on the situation and product requirements, an alternative approach would be to just move the conversion to the backend so that the client simply receives dates in the localized format - which has passed through timezone transformation (and localisation).</p>
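<p>Sketched minimally, this backend-driven alternative could look like the following (the payload shape and field names are purely illustrative, not an actual API):</p>

```javascript
// Hedged sketch: the backend performs the timezone conversion and the
// localisation, and the client renders the ready-made string verbatim,
// so SSR and CSR cannot disagree. Field names are illustrative.
const apiResponse = {
  iso: "2023-01-01T20:00:00.000Z",       // raw value, for machines
  display: "1. Jänner 2023, 21:00 MEZ",  // preformatted, for humans
};

// Both the server and the client render the same preformatted string:
function renderDate(payload) {
  return payload.display;
}

console.log(renderDate(apiResponse)); // "1. Jänner 2023, 21:00 MEZ"
```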
<h4>3. Localization of numbers</h4>
<h5 style="opacity: 0.7">(and a Safari bug for "de-AT" locale!)</h5>
<p>Similar to dates and timezones, when converting raw numbers to localized human-readable strings (e.g. <code>12345</code> to <code>"12,345"</code>), if no locale is specified, the host's locale is used, which can lead to different results. So it's important to always pass these APIs a universal locale that is consistent between server and client rendering:</p>
<p><strong>Instead of</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span>
<span class="w"> </span><span class="p">{</span><span class="nx">someNumber</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">()}</span>
<span class="w"> </span><span class="p">{</span><span class="ow">new</span><span class="w"> </span><span class="nb">Intl</span><span class="p">.</span><span class="nx">NumberFormat</span><span class="p">().</span><span class="nx">format</span><span class="p">(</span><span class="nx">someNumber</span><span class="p">)}</span>
<span class="w"> </span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p><strong>Do</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span>
<span class="w"> </span><span class="p">{</span><span class="nx">someNumber</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">(</span><span class="nx">universalLocale</span><span class="p">)}</span>
<span class="w"> </span><span class="p">{</span><span class="ow">new</span><span class="w"> </span><span class="nb">Intl</span><span class="p">.</span><span class="nx">NumberFormat</span><span class="p">(</span><span class="nx">universalLocale</span><span class="p">).</span><span class="nx">format</span><span class="p">(</span><span class="nx">someNumber</span><span class="p">)}</span>
<span class="w"> </span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p>But in very specific cases, we observed that the localisation APIs behave differently between SSR and CSR, which again leads to different values, and thus hydration mismatches!</p>
<p>We particularly encountered this issue with the Safari browser: for the de-AT locale, the localisation APIs (like <code>Intl.NumberFormat</code> or <code>toLocaleString</code>) generate values like <code>"2.345"</code>, while other browsers including Chrome and Firefox, as well as Node.js, generate values like <code>"2 345"</code> for the same locale!</p>
<p>So an alternative approach in these cases would be to receive the final localized values from the backend and show them to the user without further modification, thus eliminating the mismatches.</p>
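<p>The Safari discrepancy can be observed directly (illustrative snippet; the exact separator you get depends on the engine and its ICU data):</p>

```javascript
// The grouping separator Intl produces for "de-AT" is engine-dependent
// (Safari: "2.345"; Chrome, Firefox and Node.js: "2 345"), so comparing
// formatted strings across SSR and CSR is unreliable for this locale.
const formatted = new Intl.NumberFormat("de-AT").format(2345);

// The digits themselves are stable; only the separator varies:
const digitsOnly = formatted.replace(/\D/g, "");
console.log(digitsOnly); // "2345"
```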
<h4>4. Invalid HTML nesting</h4>
<p>This issue might be a new cause of hydration mismatch in React 18, which happens as a result of incorrect HTML, like nesting a <code>&lt;div&gt;</code> inside a <code>&lt;p&gt;</code> or a <code>&lt;button&gt;</code> inside a <code>&lt;button&gt;</code>. We couldn't find clear documentation from React explaining why HTML validity issues lead to hydration mismatch errors (aside from community discussions <a href="https://github.com/facebook/react/issues/24519">like here</a>). But regardless, to avoid them, adding markup validation steps (like <a href="https://github.com/MananTank/eslint-plugin-validate-jsx-nesting">this eslint plugin</a>) could be helpful.</p>
<p>Either way, the obvious goal in such cases is to nest semantically correct HTML elements. For example:</p>
<p><strong>Instead of</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span>
<span class="w"> </span><span class="o"><</span><span class="nx">p</span><span class="o">><</span><span class="nx">div</span><span class="o">></span><span class="nx">Some</span><span class="w"> </span><span class="nx">text</span><span class="o"><</span><span class="err">/div></p></span>
<span class="w"> </span><span class="o"><</span><span class="nx">button</span><span class="o">><</span><span class="nx">button</span><span class="o">></span><span class="nx">Button</span><span class="w"> </span><span class="nx">text</span><span class="o"><</span><span class="err">/button></button></span>
<span class="w"> </span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<p><strong>Do</strong></p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">//...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">div</span><span class="o">></span>
<span class="w"> </span><span class="o"><</span><span class="nx">p</span><span class="o">><</span><span class="nx">span</span><span class="o">></span><span class="nx">Some</span><span class="w"> </span><span class="nx">text</span><span class="o"><</span><span class="err">/span></p></span>
<span class="w"> </span><span class="o"><</span><span class="nx">button</span><span class="o">><</span><span class="nx">span</span><span class="o">></span><span class="nx">Button</span><span class="w"> </span><span class="nx">text</span><span class="o"><</span><span class="err">/span></button></span>
<span class="w"> </span><span class="o"><</span><span class="err">/div></span>
<span class="w"> </span><span class="p">);</span>
</code></pre></div>
<h3>Some debugging tips & tricks</h3>
<p>Soon after the new hydration mismatch logs started arriving in our error tracking system (Sentry), it became clear that the most important first step in debugging them is determining whether we can reproduce them. <br/>
Due to the nature of React hydration errors in the production bundle, there is not much detail you can get from the error messages in Sentry. Including the <a href="https://github.com/facebook/react/blob/v18.2.0/packages/react-reconciler/src/ReactInternalTypes.js#L254"><code>componentStack</code></a> from <code>hydrateRoot</code>’s <code>onRecoverableError</code> callback in the logs comes in quite handy (especially after cleaning the stack a bit to make it more readable), but because the production bundle is minified and uglified, you will still have to use the provided line/column numbers together with sourcemaps to find the closest components.</p>
<p>On top of that, if a website has dynamic content served to each user like Zalando Fashion Store, it may be even harder to reproduce the exact page (with the same content) that was receiving a specific error.</p>
<p>Another issue we encountered was that the <code>onRecoverableError</code> callback is usually called multiple times by React for a single hydration mismatch problem, polluting our Sentry logs and making the debugging process harder. <br/>
This seems to be due to <a href="https://github.com/facebook/react/blob/fc929cf4ead35f99c4e9612a95e8a0bb8f5df25d/packages/react-reconciler/src/ReactFiberHydrationContext.js#L447">the way hydration phase works</a>, in which React compares a list of available server rendered DOM nodes with a list of client rendered React elements ("fibers") and tries to match them together and basically hydrate the nodes. And when matching and hydration fails for a specific node instance and errors are logged, it <a href="https://github.com/facebook/react/blob/fc929cf4ead35f99c4e9612a95e8a0bb8f5df25d/packages/react-reconciler/src/ReactFiberHydrationContext.js#L474">tries to hydrate the next one</a>. What we observed here was that (at least in some cases) because of the previous mismatching node/fiber, the order of the lists becomes broken, and that leads to all the next ones failing as well. And that means a lot of other hydration mismatch error logs which aren't necessarily correct. <br/>
To mitigate this in the production environment, we modified our error tracking code to only send the first hydration error log to Sentry. We also found this to be very helpful to keep in mind during development debugging.</p>
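<p>A minimal sketch of that mitigation (assuming a Sentry-like <code>reportToSentry</code> function; the wrapper name is ours) is a callback that forwards only the first recoverable error:</p>

```javascript
// Hedged sketch: forward only the first recoverable hydration error to
// the error tracker, since later ones are often cascades of the first
// broken node. reportToSentry is a placeholder for your reporting call.
function createFirstErrorOnlyReporter(reportToSentry) {
  let alreadyReported = false;
  return function onRecoverableError(error, errorInfo) {
    if (alreadyReported) return;
    alreadyReported = true;
    reportToSentry(error, errorInfo && errorInfo.componentStack);
  };
}

// Illustrative wiring with React 18's hydrateRoot:
// hydrateRoot(container, app, {
//   onRecoverableError: createFirstErrorOnlyReporter(sendToSentry),
// });
```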
<p>When reproducing the error locally is possible, we found these steps helpful:</p>
<ul>
<li>Work on the first error log, and after it's fixed, check if any other one remains.</li>
<li>Based on the log and the <code>componentStack</code>, find the closest component(s) causing the issue.</li>
<li>In some cases the cause of the issue is obvious in the specified component's source code - for example the issue number 4 mentioned above (Invalid HTML nesting).<ul>
<li>With HTML nesting issues, the log usually contains the text <code>validateDOMNesting(...)</code>.</li>
</ul>
</li>
<li>In other cases where the cause is not obvious, what we found helpful was to check the React dev bundle (<code>react-dom/umd/react-dom.development.js</code>) and place breakpoints in the functions that log the hydration errors (usually <code>checkForUnmatchedText</code> or <code>throwOnHydrationMismatch</code>).<ul>
<li>Then, by loading the page, try to find the exact React fiber that causes the issue, and based on that, find the component/element. Don't be afraid to go higher in the stack and use more breakpoints!</li>
<li>In some cases we realized that the fiber is the same element that caused the issue, but in others, it's more confusing as the fiber is something that was rendered <strong>after</strong> a mismatching (usually missing) node instance that was the actual cause of the issue.</li>
<li>Here it also helps to check different variables like <code>fiber</code>, <code>nextInstance</code>, <code>current</code>, etc. including their received props.</li>
</ul>
</li>
</ul>
<h2>Conclusion</h2>
<p>The migration to React 18 and its concurrent features was of particular importance for our Rendering Engine framework due to its unique architecture. And despite the challenges, the results have been promising so far, especially since we observed improvements in the Fashion Store website’s Core Web Vitals and bounce rate.</p>
<p>Additionally, the upgrade shined a light on the hidden hydration mismatch issues scattered in different components, which led us to not only fix many of them, but also collect and internally document them along with recommendations and debugging tips for further reference.</p>
<h2>Next Steps</h2>
<p>We are planning to share more detailed posts in the future about the architecture and technical specs of Rendering Engine - especially in light of the Concurrent features. <br/>
Additionally, we aim to share the effects of the new features and the final architecture on Zalando Fashion Store's performance.</p>
<p>Next up, we're excited to start using React Server Components, which have shown great promise so far. Stay tuned!</p>
<h1>Riptide HTTP Client tutorial</h1>
<p>2023-06-29 · Olga Semernitskaia</p>
<p>Riptide: learning the fundamentals of the open source Zalando HTTP client</p><p><img alt="Riptide logo - big ocean wave" src="https://engineering.zalando.com/posts/2023/06/images/wave.jpg#center"></p>
<h2>Overview</h2>
<p><a href="https://github.com/zalando/riptide">Riptide</a> is a Zalando open source Java HTTP client
that implements declarative client-side response routing.
It allows dispatching HTTP responses very easily to different handler methods based on various characteristics of the response,
including status code, status family, and content type.
The way this works is similar to server-side request routing, where any request that reaches a web application
is usually routed to the correct handler based on the combination of URI (including query and path parameters), method,
Accept and Content-Type headers.
With Riptide, you can define handler methods on the client side based on the response characteristics.
See <a href="https://github.com/zalando/riptide/blob/main/docs/concepts.md">the concept document</a> for more details. Riptide is part of the core Java/Kotlin stack and is used in production by hundreds of applications at Zalando.</p>
<p>In this tutorial, we'll explore the fundamentals of the Riptide HTTP client. We'll learn how to initialize it and examine various use cases:
sending simple GET and POST requests, and processing different responses.</p>
<h2>Maven Dependencies</h2>
<p>First, we need to add the library as a dependency into the <code>pom.xml</code> file:</p>
<div class="highlight"><pre><span></span><code><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.zalando<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>riptide-core<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>${riptide.version}<span class="nt"></version></span>
<span class="nt"></dependency></span>
</code></pre></div>
<p>Check <a href="https://mvnrepository.com/artifact/org.zalando/riptide">Maven Central page</a>
to see the latest version of the library.</p>
<h2>Client Initialization</h2>
<p>To send HTTP requests, we need to build an <code>Http</code> object, then we can use it for all our HTTP requests for
the specified base URL:</p>
<div class="highlight"><pre><span></span><code><span class="n">Http</span><span class="p">.</span><span class="na">builder</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">executor</span><span class="p">(</span><span class="n">executor</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">requestFactory</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">SimpleClientHttpRequestFactory</span><span class="p">())</span>
<span class="w"> </span><span class="p">.</span><span class="na">baseUrl</span><span class="p">(</span><span class="n">getBaseUrl</span><span class="p">(</span><span class="n">server</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="na">build</span><span class="p">();</span>
</code></pre></div>
<h2>Sending Requests</h2>
<p>Sending requests using Riptide is pretty straightforward:
you need to use an appropriate method from the created <code>Http</code> object depending on the HTTP request method.
Additionally, you can provide a request body, query params, content type, and request headers.</p>
<h3>GET Request</h3>
<p>Here is an example of sending a simple GET request:</p>
<div class="highlight"><pre><span></span><code><span class="n">http</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"/products"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">header</span><span class="p">(</span><span class="s">"X-Foo"</span><span class="p">,</span><span class="w"> </span><span class="s">"bar"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">call</span><span class="p">(</span><span class="n">pass</span><span class="p">())</span>
<span class="w"> </span><span class="p">.</span><span class="na">join</span><span class="p">();</span>
</code></pre></div>
<h3>POST Request</h3>
<p>POST requests can also be sent easily:</p>
<div class="highlight"><pre><span></span><code><span class="n">http</span><span class="p">.</span><span class="na">post</span><span class="p">(</span><span class="s">"/products"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">header</span><span class="p">(</span><span class="s">"X-Foo"</span><span class="p">,</span><span class="w"> </span><span class="s">"bar"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">contentType</span><span class="p">(</span><span class="n">MediaType</span><span class="p">.</span><span class="na">APPLICATION_JSON</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">body</span><span class="p">(</span><span class="s">"str_1"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">call</span><span class="p">(</span><span class="n">pass</span><span class="p">())</span>
<span class="w"> </span><span class="p">.</span><span class="na">join</span><span class="p">();</span>
</code></pre></div>
<p>In the next sections, we will explain the meanings of the <code>call</code>, <code>pass</code>, and <code>join</code> methods from the code snippets above.</p>
<h2>Response Routing</h2>
<p>One of the main features of the Riptide HTTP client is declarative response routing.
We can use the <code>dispatch</code> method to specify processing logic (routes) for different response types.
The <code>dispatch</code> method accepts a <code>Navigator</code> object as its first parameter; this parameter specifies which response attribute
will be used for the routing logic.</p>
<p>Riptide has several default <code>Navigator</code> implementations:</p>
<table>
<thead>
<tr>
<th>Navigator</th>
<th>Response characteristic</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Navigators.series()</code></td>
<td>Class of status code</td>
</tr>
<tr>
<td><code>Navigators.status()</code></td>
<td>Status</td>
</tr>
<tr>
<td><code>Navigators.statusCode()</code></td>
<td>Status code</td>
</tr>
<tr>
<td><code>Navigators.reasonPhrase()</code></td>
<td>Reason Phrase</td>
</tr>
<tr>
<td><code>Navigators.contentType()</code></td>
<td>Content-Type header</td>
</tr>
</tbody>
</table>
<h3>Simple Routing</h3>
<p>Let's see how we can use response routing:</p>
<div class="highlight"><pre><span></span><code><span class="n">http</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"/products/{id}"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">dispatch</span><span class="p">(</span><span class="n">status</span><span class="p">(),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">OK</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">Product</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">log</span><span class="p">.</span><span class="na">info</span><span class="p">(</span><span class="s">"Product: "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">product</span><span class="p">)),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">NOT_FOUND</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">response</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">log</span><span class="p">.</span><span class="na">warn</span><span class="p">(</span><span class="s">"Product not found"</span><span class="p">)),</span>
<span class="w"> </span><span class="n">anyStatus</span><span class="p">().</span><span class="na">call</span><span class="p">(</span><span class="n">pass</span><span class="p">()))</span>
<span class="w"> </span><span class="p">.</span><span class="na">join</span><span class="p">();</span>
</code></pre></div>
<p>In this example, we demonstrate retrieving a product by its ID and handling the responses.
We use the <code>Navigators.status()</code> static method to route our responses based on their statuses.
We then describe processing logic for different statuses:</p>
<ul>
<li><code>OK</code> - we use a version of the <code>call</code> method that deserializes the response body
into the specified type (<code>Product</code> in our case). This deserialized object is then used as a parameter
for a consumer, which is passed as a second argument to the <code>call</code> method.
In our example, the consumer simply logs the <code>Product</code> object.</li>
<li><code>NOT_FOUND</code> - we assume that we won't receive a <code>Product</code> response, so we use
another version of the <code>call</code> method with a single argument: a consumer accepting <code>org.springframework.http.client.ClientHttpResponse</code>.
In this scenario, we decide to log a warning message.</li>
<li>All other statuses we intend to process in the same way. To achieve this we use the <code>Bindings.anyStatus()</code> static function,
allowing us to describe the processing logic for all remaining statuses. In our case, we have decided that no action
is required for such statuses, so we utilize the <code>PassRoute.pass()</code> static method, which returns a do-nothing handler.</li>
</ul>
<p>In Riptide all requests are sent using an <code>Executor</code> (configured in the <code>executor</code> method in the <strong>Client initialization</strong> section).
Because of this, responses are always processed in separate threads and the
<code>dispatch</code> method returns <code>CompletableFuture&lt;ClientHttpResponse&gt;</code>. To make the invoking thread wait
for the response to be processed, we use the <code>join()</code> method in our example.</p>
<h3>Nested Routing</h3>
<p>We can have nested (multi-level) routing for our responses. For example, the first level of routing can be based
on the response <code>series</code>, and the second level - on specific status codes:</p>
<div class="highlight"><pre><span></span><code><span class="n">http</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"/products/{id}"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">dispatch</span><span class="p">(</span><span class="n">series</span><span class="p">(),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">SUCCESSFUL</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">Product</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">log</span><span class="p">.</span><span class="na">info</span><span class="p">(</span><span class="s">"Product: "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">product</span><span class="p">)),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">CLIENT_ERROR</span><span class="p">).</span><span class="na">dispatch</span><span class="p">(</span>
<span class="w"> </span><span class="n">status</span><span class="p">(),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">NOT_FOUND</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">response</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">log</span><span class="p">.</span><span class="na">warn</span><span class="p">(</span><span class="s">"Product not found"</span><span class="p">)),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">TOO_MANY_REQUESTS</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">response</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span><span class="k">throw</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">RuntimeException</span><span class="p">(</span><span class="s">"Too many reservation requests"</span><span class="p">);}),</span>
<span class="w"> </span><span class="n">anyStatus</span><span class="p">().</span><span class="na">call</span><span class="p">(</span><span class="n">pass</span><span class="p">())),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">SERVER_ERROR</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">response</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span><span class="k">throw</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">RuntimeException</span><span class="p">(</span><span class="s">"Server error"</span><span class="p">);}),</span>
<span class="w"> </span><span class="n">anySeries</span><span class="p">().</span><span class="na">call</span><span class="p">(</span><span class="n">pass</span><span class="p">()))</span>
<span class="w"> </span><span class="p">.</span><span class="na">join</span><span class="p">();</span>
</code></pre></div>
<p>In the example above, we implement nested routing. First, we dispatch our responses based on the <code>series</code> using the
static method <code>Navigators.series()</code>, and then we dispatch <code>CLIENT_ERROR</code> responses based on their specific statuses.
For other series such as <code>SUCCESSFUL</code>, we utilize a single handler per series without any nested routing.</p>
<p>Similar to the previous example, we use the <code>PassRoute.pass()</code> static method to skip actions for certain cases.
Additionally, we use <code>Bindings.anyStatus()</code> and <code>Bindings.anySeries()</code> methods to define default behavior
for all series or statuses that are not explicitly described. Furthermore, in this example, we've chosen to throw
exceptions for specific cases; these exceptions can then be caught and processed in the invoking code -
see the <code>TOO_MANY_REQUESTS</code> status and <code>SERVER_ERROR</code> series routes.</p>
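<p>Since <code>dispatch</code> returns a future, exceptions thrown inside a route surface in the invoking code wrapped in a <code>CompletionException</code> once <code>join()</code> is called. A minimal stdlib-only sketch of handling this (plain <code>CompletableFuture</code>, no Riptide types; class name and message are illustrative):</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class DispatchErrorHandling {

    // Stand-in for a dispatch whose route throws, as in the
    // TOO_MANY_REQUESTS example above (not the actual Riptide call).
    static CompletableFuture<Void> dispatch() {
        return CompletableFuture.runAsync(() -> {
            throw new RuntimeException("Too many reservation requests");
        });
    }

    public static String callAndHandle() {
        try {
            dispatch().join();
            return "ok";
        } catch (CompletionException e) {
            // join() wraps the route's exception; the original is the cause
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(callAndHandle());
    }
}
```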
<h2>Returning Response Objects</h2>
<p>In some cases we need to return a response object from a REST endpoint invocation - we can use the <code>riptide-capture</code> module to do so.</p>
<p>Let's take a look at a simple example:</p>
<div class="highlight"><pre><span></span><code><span class="n">ClientHttpResponse</span><span class="w"> </span><span class="n">clientHttpResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">http</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"/products/{id}"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">dispatch</span><span class="p">(</span><span class="n">status</span><span class="p">(),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">OK</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">Product</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">log</span><span class="p">.</span><span class="na">info</span><span class="p">(</span><span class="s">"Product: {}"</span><span class="p">,</span><span class="w"> </span><span class="n">product</span><span class="p">)),</span>
<span class="w"> </span><span class="n">anyStatus</span><span class="p">().</span><span class="na">call</span><span class="p">(</span><span class="n">response</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span><span class="k">throw</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">RuntimeException</span><span class="p">(</span><span class="s">"Invalid status"</span><span class="p">);}))</span>
<span class="w"> </span><span class="p">.</span><span class="na">join</span><span class="p">();</span>
</code></pre></div>
<p>As mentioned earlier, when we invoke the <code>dispatch</code> method, it returns a <code>CompletableFuture<ClientHttpResponse></code>.
If we then invoke the <code>join()</code> method and wait for the result, we get an object of type <code>ClientHttpResponse</code>.
However, with the assistance of the <code>riptide-capture</code> module, we can return a deserialized object from
the response body instead. In our example, the deserialized object has a type <code>Product</code>.</p>
<p>First, we need to add a dependency for the <code>riptide-capture</code> module:</p>
<div class="highlight"><pre><span></span><code><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.zalando<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>riptide-capture<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>${riptide.version}<span class="nt"></version></span>
<span class="nt"></dependency></span>
</code></pre></div>
<p>Now let's rewrite the previous example using the <code>Capture</code> class. This class allows us to extract a value of
a specified type from the response body:</p>
<div class="highlight"><pre><span></span><code><span class="n">Capture</span><span class="o"><</span><span class="n">Product</span><span class="o">></span><span class="w"> </span><span class="n">capture</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Capture</span><span class="p">.</span><span class="na">empty</span><span class="p">();</span>
<span class="n">Product</span><span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">http</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"/products/{id}"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">dispatch</span><span class="p">(</span><span class="n">status</span><span class="p">(),</span>
<span class="w"> </span><span class="n">on</span><span class="p">(</span><span class="n">OK</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">Product</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="n">capture</span><span class="p">),</span>
<span class="w"> </span><span class="n">anyStatus</span><span class="p">().</span><span class="na">call</span><span class="p">(</span><span class="n">response</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span><span class="k">throw</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">RuntimeException</span><span class="p">(</span><span class="s">"Invalid status"</span><span class="p">);}))</span>
<span class="w"> </span><span class="p">.</span><span class="na">thenApply</span><span class="p">(</span><span class="n">capture</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">join</span><span class="p">();</span>
</code></pre></div>
<p>In this example, we pass the <code>capture</code> object to the route for the <code>OK</code> status. The purpose of the <code>capture</code> object
is to deserialize the response body into a <code>Product</code> object and store it for future use.
Then we invoke the <code>thenApply(capture)</code> method to retrieve the stored <code>Product</code> object. The <code>thenApply(capture)</code> method
returns a <code>CompletableFuture<Product></code>, so we can again use the <code>join()</code> method
to get a <code>Product</code> object, as we did in the previous examples.
See also <a href="https://github.com/zalando/riptide/tree/main/riptide-capture">the riptide-capture module page</a> for more details.</p>
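<p>Conceptually, a capture is a single-assignment holder that plays both roles in the pipeline: the route hands it the deserialized body (as a consumer), and <code>thenApply</code> reads it back out (as a function). A simplified stdlib-only sketch of this idea, not the actual <code>riptide-capture</code> implementation:</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified stand-in for riptide-capture's Capture: the route stores the
// deserialized body here, and thenApply(capture) reads it back out.
public class MiniCapture<T> implements Consumer<T>, Function<Object, T> {

    private final AtomicReference<T> value = new AtomicReference<>();

    @Override
    public void accept(T body) {
        value.set(body); // invoked by the matching route with the deserialized body
    }

    @Override
    public T apply(Object ignoredResponse) {
        return value.get(); // invoked via thenApply once the dispatch completes
    }

    public static void main(String[] args) {
        MiniCapture<String> capture = new MiniCapture<>();
        // Simulate a dispatch whose OK route passes the body to the capture
        String product = CompletableFuture
                .runAsync(() -> capture.accept("Product{id=100}"))
                .thenApply(capture)
                .join();
        System.out.println(product);
    }
}
```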
<h2>Conclusion</h2>
<p>In this article, we've demonstrated the fundamental use cases of the Riptide HTTP client.
You can find the code snippets with complete imports on <a href="https://github.com/zalando-incubator/riptide-demo/tree/main/src/test/java/org/zalando/fundamentals">GitHub</a>.</p>
<p>In future articles, we'll explore Riptide plugins, which provide additional logic for your REST client,
such as retries, authorization, and metrics publishing. Additionally, we'll look at the Riptide Spring Boot starter,
which simplifies <code>Http</code> object initialization.</p>
<hr>
<p><em>We're hiring! Join one of our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&search=software%20engineer&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering">Software Engineering</a> teams at Zalando.</em></p>Context Based Experience in Zalando2023-06-26T00:00:00+02:002023-06-26T00:00:00+02:00Shlomi Israeltag:engineering.zalando.com,2023-06-26:/posts/2023/06/context-based-experience-in-zalando.html<p>Using context-aware decisions to provide partner-tailored experiences, and how we achieved this for our selective distribution brands</p><p>In 2022 we developed a unique partner experience that speaks to dedicated requirements from selective distribution brands and retailers around visual representation, brand storytelling and protecting brand equity. Our solution provides dedicated brand exposure across the experience and at the same time respects special requirements to secure brand equity. In order to achieve consistency with other articles, a general context-aware mechanism needed to be implemented.</p>
<p>We derived a plan to create distinction and elevation in the experience. The criteria for enabling an experience are based on explicit customer intent. For instance, searching for the retailer name or one of its brands will enable the elevated experience. Viewing their product details page will also enable it. These intentions are identified by our backend systems with specific business domain rules, i.e. the Search backend will have different rules from the Product backend.</p>
<p>Until then, the Fashion Store had been based solely on domain-specific data. These new rules, defined on customer intent and context, introduced new challenges at Zalando and required a new solution. For instance, the same product can behave differently depending on that context. While viewing the catalog without any intent for a brand-distinctive experience, for the sake of consistency, all products, including ones belonging to other distribution brands, have a gray background, even though the brand-elevated experience may dictate, for example, a white background.</p>
<p>In order to achieve this, we needed to identify what to apply for each use case (the brand's requirements) and when to apply it (which rules should be checked in order to understand the customer's context or intent).</p>
<p>Brand requirements can be a complicated matter. We identified some which were global on the merchant level; for instance, let's say one of the distribution brands is required to have different packshot images, with white backgrounds, whilst we typically use gray backgrounds in Zalando. Other requirements are brand-specific. Some brands are only to be shown in the product catalog when the brand or its products are explicitly requested via specific search queries or catalog filters.</p>
<p>In order to support different kinds of requirements, we use the concept of <em>experiences</em>. Experiences are simply a collection of policies that we need to apply, and a list of selection rules.</p>
<p>For example, a policy may be the theme configuration that needs to be applied, or whether we are allowed to show the product under certain conditions. The selection rules define the criteria that enable the experience, e.g. selection by brand codes. This means that selecting a specific brand in the brand filter will change the experience to the one that has been configured for that brand.</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"XP_ID"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"XP_NAME"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"policies"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"THEME"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"THEME_NAME"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"theme_config1"</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span>
<span class="w"> </span><span class="err">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PRODUCT__FLAGS__HIDE_SALE"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">"selection_metadata"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"experience_brands"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"brand_code"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"BRANDNAME"</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>Selection rules can be another complicated matter. For instance, how do we decide which experience to choose when two brands belong to different experiences? The key is thinking through the right use cases to support the business needs whilst keeping the solution simple. Our approach for some of these cases is to define <em>Fallback</em> experiences that catch them.</p>
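<p>To illustrate one possible shape of such rules (all names and types here are hypothetical, not our actual implementation), a resolver could match the request's brand codes against each experience's selection metadata and fall back to a default whenever no single experience matches unambiguously:</p>

```java
import java.util.List;
import java.util.Set;

public class ExperienceResolver {

    // Hypothetical experience: a name plus the brand codes that select it
    public record Experience(String name, Set<String> brandCodes) {}

    private final List<Experience> experiences;
    private final Experience fallback;

    public ExperienceResolver(List<Experience> experiences, Experience fallback) {
        this.experiences = experiences;
        this.fallback = fallback;
    }

    // Experiences whose brand codes intersect the requested brands; if the
    // brands span several experiences (or none), use the fallback experience.
    public Experience resolve(Set<String> requestedBrands) {
        List<Experience> matches = experiences.stream()
                .filter(xp -> xp.brandCodes().stream().anyMatch(requestedBrands::contains))
                .toList();
        return matches.size() == 1 ? matches.get(0) : fallback;
    }

    public static void main(String[] args) {
        ExperienceResolver resolver = new ExperienceResolver(
                List.of(new Experience("XP_A", Set.of("BRAND_A")),
                        new Experience("XP_B", Set.of("BRAND_B"))),
                new Experience("DEFAULT", Set.of()));
        // One unambiguous match selects that experience...
        System.out.println(resolver.resolve(Set.of("BRAND_A")).name());
        // ...while brands from two experiences fall back to the default
        System.out.println(resolver.resolve(Set.of("BRAND_A", "BRAND_B")).name());
    }
}
```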
<p>As mentioned in other <a href="/tags/microservices.html">posts</a> here in Zalando Engineering Blog, Zalando has many microservices, and even our <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Frontend’s architecture</a> is based on micro frontends. We defined the general data structure to understand the experience, but how can we orchestrate it across Zalando's ecosystem?</p>
<p>To see how, we need to break the flow down into two steps. The first one is the <em>Experience Resolution</em> step. This starts very early <a href="https://engineering.zalando.com/posts/2021/09/micro-frontends-part2.html">during the root entity resolution</a>.</p>
<p>Let's say that a customer browses a catalog page. This sends a request to the Rendering Engine, which resolves the root entity by sending a request to the Fashion Store API (GraphQL), which then queries the Catalog backend system. The catalog has its own business logic to understand the customer’s intent and will find the best matching experience, using its <code>selection_metadata</code>.</p>
<p>The resolved experience name is then stored in the Rendering Engine request state.</p>
<p><img alt="Root Entity Experience Resolution" src="https://engineering.zalando.com/posts/2023/06/images/root-entity-resolution.png#center"></p>
<figcaption style="text-align:center">Fig 1. Root Entity Experience Resolution</figcaption>
<p><br/>At this point we have only resolved the root entity. We don’t yet know which renderers (micro-frontends) are required. During this process, we start the second step, where each of them queries the Fashion Store API independently, only this time the query uses the previously resolved experience. In the catalog, we have product cards, whose data is populated by a different backend, the Product backend. As we have already resolved the experience, the Product backend can now determine which policies are required. For Zalando’s experience it will select the gray background images with the watermark, instead of the white ones.</p>
<p><img alt="Child Renderers with stored experience" src="https://engineering.zalando.com/posts/2023/06/images/child-renderer-resolution.png#center"></p>
<figcaption style="text-align:center">Fig 2. Child Renderers are reusing previous resolved experience</figcaption>
<p><br/>Using this new mechanism, we successfully introduced new concepts at Zalando. It has opened the door to many new possibilities that we can leverage to further enhance the customer experience.</p>
<hr>
<p><em>We're hiring! If you're passionate about similar challenges, join one of our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&search=react%20typescript">Frontend teams</a>!</em></p>How Software Bill of Materials change the dependency game2023-04-13T00:00:00+02:002023-04-13T00:00:00+02:00Bartosz Ocytkotag:engineering.zalando.com,2023-04-13:/posts/2023/04/how-sboms-change-the-dependency-game.html<p>In this post, we explain what questions and insights Software Bill of Materials (SBOMs) provide across thousands of microservices</p><h2>Dependency hygiene</h2>
<p>Dependency updates are a tedious task when maintaining thousands of microservices.
Some teams use tools like <a href="https://github.com/dependabot">dependabot</a> or <a href="https://github.com/scala-steward-org/scala-steward">scala-steward</a> that create pull requests in repositories when new library versions are available. Other teams update dependencies regularly in bulk, supported by build system plugins (e.g. <a href="https://www.mojohaus.org/versions-maven-plugin/">maven-versions-plugin</a>, <a href="https://github.com/ben-manes/gradle-versions-plugin">gradle-versions-plugin</a>). Playing the catch-up game and getting visibility only through incoming pull requests is far from great, though, and we can do better here.</p>
<h2>On the importance of dependency data and hygiene</h2>
<p>What's needed for dependency management is the ability to get a complete picture of the dependencies in use and to analyze trends over time. This granular data allows teams to step up their game.</p>
<p>Critical vulnerabilities in commonly used libraries (e.g. log4j, spring, commons-text) require an ability to find all affected applications in minutes. Only this way can the impact of a vulnerability be assessed and mitigated quickly. Some projects, like openssl, preannounce security updates allowing for more preparation time.</p>
<p>Similarly, upgrades to major versions of libraries and changes in the licensing of open-source libraries (for example, Akka) create the need to understand the library footprint to assess the need for action or migration costs. Bugs in libraries tend to eventually trigger production incidents, and it's necessary to have a way to find all affected teams, track the progress of patches across all applications, and identify reasons why teams struggle to keep up.</p>
<p>At Zalando, we use <strong>Software Bill of Materials</strong> (aka. SBOMs) to help answer various questions about application dependencies. We publish a curated data set containing dependency data from the SBOM for every application we deploy, based on its Container image. The data set is available in our data lake and thus can be easily queried and visualized by any engineer.</p>
<h2>What are SBOMs?</h2>
<p>The Software Bill of Materials contains information about the packages and libraries used by an application. It can be generated for an application based on its source code or extracted from a Docker container. The SBOM includes packages used by the operating system as well as the application and its dependencies. For each entry, the name, version, and license are tracked. Common formats like <a href="https://cyclonedx.org/specification/overview/">CycloneDX</a> or <a href="https://github.com/spdx/spdx-spec/blob/v2.2/schemas/spdx-schema.json">SPDX</a> help with portability and integration into various tooling. For example, <a href="https://github.com/anchore/syft">syft</a> can generate an SBOM file that can be further parsed with <a href="https://github.com/anchore/grype">grype</a> to periodically scan the application's SBOMs for vulnerabilities. On top, GitHub recently introduced an <a href="https://github.blog/2023-03-28-introducing-self-service-sboms/">on-demand SBOM generation</a> feature.</p>
<p>The SBOM needs to be generated with every software change, for example as part of the CI/CD pipeline. Some countries recommend or even mandate the use of SBOMs in certain scenarios in order to manage cyber security and software supply chain risks (see <a href="https://media.defense.gov/2022/Sep/01/2003068942/-1/-1/0/ESF_SECURING_THE_SOFTWARE_SUPPLY_CHAIN_DEVELOPERS.PDF">Securing the Software Supply Chain: Recommended Practices Guide for Developers</a>).</p>
<h2>What questions can the SBOM help to answer?</h2>
<p>In the context of dependency management, SBOMs collected for all applications help us answer a variety of questions:</p>
<ul>
<li>Which applications use dependency X (in version Y)?</li>
<li>How many distinct versions of dependency X do we use across all applications?</li>
<li>Does the dependency hygiene differ per language?</li>
<li>How quickly after release, are new versions of libraries adopted? Does adoption differ for versions that have known security vulnerabilities?</li>
<li>When adopting a new Docker base image, what are its contents?</li>
<li>Which application has dependencies licensed under license X?</li>
<li>Which distinct licenses are being used by application dependencies?</li>
</ul>
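<p>Once the per-application SBOM entries are available, most of these questions reduce to simple filters and aggregations. A sketch in plain Java (the flattened <code>Entry</code> shape is hypothetical; in practice we query the curated data set in the data lake):</p>

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SbomQueries {

    // Hypothetical flattened SBOM row: one dependency of one application
    public record Entry(String application, String dependency, String version, String license) {}

    // Which applications use dependency X (optionally restricted to version Y)?
    public static Set<String> applicationsUsing(List<Entry> entries, String dep, String version) {
        return entries.stream()
                .filter(e -> e.dependency().equals(dep))
                .filter(e -> version == null || e.version().equals(version))
                .map(Entry::application)
                .collect(Collectors.toSet());
    }

    // How many distinct versions of dependency X are in use across all applications?
    public static long distinctVersions(List<Entry> entries, String dep) {
        return entries.stream()
                .filter(e -> e.dependency().equals(dep))
                .map(Entry::version)
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("app-a", "log4j-core", "2.14.0", "Apache-2.0"),
                new Entry("app-b", "log4j-core", "2.17.1", "Apache-2.0"),
                new Entry("app-c", "commons-text", "1.9", "Apache-2.0"));
        System.out.println(applicationsUsing(entries, "log4j-core", null));
        System.out.println(distinctVersions(entries, "log4j-core"));
    }
}
```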
<p>From Docker image metadata, we can infer the owning team and thus target communication when reaching out to teams. For large-scale patch actions (like the famous log4j upgrade), we prepare change sets for different types of build files and automate the Pull Request creation across all repositories. This allows for central tracking of the patch progress and requires minimal support from the team for the deployment.</p>
<p>Another insight from analyzing the SBOM data concerned our usage of the AWS SDK. We noticed that some applications were using the full SDK (200MB+ in Java) instead of its individual modules. Addressing this finding helped reduce build times and significantly lower the resulting Docker image size.</p>
<h2>Show me real data!</h2>
<p>Our diverse application footprint across languages allows us to compare the number of libraries typical applications have.
Looking at the data, the number of dependencies grows exponentially. Here is an example for Python:</p>
<p><img alt="Number of dependencies in Python applications" src="https://engineering.zalando.com/posts/2023/04/images/sbom-python-dependencies-per-application.png#center"></p>
<figcaption style="text-align:center">Fig 1. Number of dependencies in Python applications</figcaption>
<p><br/>Looking across languages, we have two outliers with the largest number of dependencies.
For Python it's jupyter (2.5x the next biggest app) and for Java it's tableau (3.14x the next biggest app).</p>
<p>To compare how hungry each language ecosystem is for dependencies, we can plot the percentiles for the number of dependencies per application. Python wins the race with the lowest number of dependencies, followed by golang (ca. 1.4-2x when compared to Python). Next in line is Java (covering Java, Kotlin and Scala, as the SBOM scanner detects java-archives) with 2-3x more dependencies than golang, and lastly JavaScript (incl. TypeScript) with 5-10x more dependencies than Java.</p>
<p><img alt="Number of dependencies per language" src="https://engineering.zalando.com/posts/2023/04/images/dependencies-per-language.png#center"></p>
<figcaption style="text-align:center">Fig 2. Number of dependencies per language</figcaption>
<h3>Another popular library used across Java and Kotlin projects</h3>
<p>This example highlights the challenge with long-term maintenance of a large application footprint. As the frequency of changes to an application reduces, it's more difficult for teams to plan dependency updates for those applications, unless there are security issues to address. The following graph looks at the usage of an internal library with three data snapshots.</p>
<p><img alt="Usage of an internal library plotted over time" src="https://engineering.zalando.com/posts/2023/04/images/internal-library-usage.png#center"></p>
<figcaption style="text-align:center">Fig 3. Usage of an internal library</figcaption>
<p><br/>We can see that versions 0.22.0+ exhibit expected behavior by being replaced with the next available version. On the other hand, usage of version 0.21.0 constantly increases, even though three newer versions are available in Q4. This situation requires further inspection. It is likely that new applications are created by using the same application template, which misses the dependency update.</p>
<h2>SBOM Data quality</h2>
<p>The SBOM data quality varies. For the JVM languages, we observed differing package names and group ids being detected. This increases the complexity of correlating library use across languages. Further, some SBOMs did not show any java-archive entries, because the team's build process flattened all dependencies into an uber-jar and the metadata required for library detection was lost. Hence, we recommend caution when using SBOM tools and double-checking that SBOM generation works correctly for all applications.</p>
<h2>Summary and future outlook</h2>
<p>In addition to smaller findings like the one with AWS SDK, the value of SBOMs has already been proven with the very low time it takes us to analyze the impact of the Akka license change or CVEs.</p>
<p>We look to dive deeper into our SBOM data as we collect more historical data. Aside from observing trends in library usage and adoption, we hope to be able to correlate dependency data with dependency hygiene practices, deployment frequency, change failure rates, and lead times for each application. For our shared libraries, we aim to understand how to help reduce the burden of dependency updates, acknowledging that plugin adoption alone is insufficient to maintain a healthy dependency posture.</p>
<p>If you're not using SBOMs for dependency analysis yet, you're missing out on a great tool helping you to create more transparency. We're curious to read your stories and insights on SBOMs.</p>
<hr>
<p><em>If you're as passionate about Software Engineering as we are, take a look at <a href="https://jobs.zalando.com/en/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Principal%20Engineering">open positions in our Engineering teams</a>.</em></p>Gender Equity in IT Panel by Zalando Women in Tech Employee Resource Group2023-04-12T00:00:00+02:002023-04-12T00:00:00+02:00Anja Bergnertag:engineering.zalando.com,2023-04-12:/posts/2023/04/gender-equity-in-it-panel-women-in-tech.html<p>Three Women in Tech leaders discuss Gender Equity in IT on a discussion panel organized by our Women in Tech Employee Resource Group.</p><p><img alt="Our panelists on stage, from left to right: Ana Peleteiro Ramallo (host), Tian Su, Joyce Chen" src="https://engineering.zalando.com/posts/2023/04/images/gender-equity-1.jpeg#center"></p>
<p>As part of their week-long International Women's Day event series, the Zalando Women's Network and the Zalando Women in Tech Employee Resource Groups recently held an event to discuss the challenges that women in tech face in the workplace and to share ideas about how to overcome them. We welcomed women in tech leadership to the panel, who shared their experiences and insights into the world of work: Joyce Chen, VP Engineering Beauty; Tian Su, VP Customers, and host Ana Peleteiro Ramallo, Director of Applied Science.</p>
<p>Joyce Chen shared her past experience of being the first woman engineer in an all-men engineering group. She acknowledged that unconscious bias education has made progress over the last 10 years, and that she now has the language to describe what she went through. However, she also noted that the ratio of women to men in engineering, particularly in leadership positions, is still not good enough. To overcome this, Joyce shared the importance of mentoring, sponsorship, and reskilling.</p>
<p>Joyce also acknowledged that she often feels like she needs to work harder to prove her worth in a field dominated by men. She highlighted that this is a common feeling among women, and it stems from historic biases that still exist today.
<em>"To overcome this feeling: network, seek mentorship, believe in yourself, and empower yourself to achieve greatness."</em></p>
<p>Tian Su highlighted, <em>"Men have historically been in leadership positions and therefore shaped society's perception of what good leadership looks like. This is why leadership is often seen through masculine traits. By bringing diversity into leadership, we can get different leadership styles, which can be beneficial for everyone."</em> Tian also discussed the challenge, at a former company, of being the only mother on her team, which meant that she was not always able to attend social and training events after work. However, when she shared this with her former team, they realised that they hadn't considered this at all! They took the time and care to understand her situation, and they improved.</p>
<p>Ana Peleteiro Ramallo explained, <em>"The way we think we need to behave at work is shaped by the leadership styles we see around us. It's important to bring clarity and your own perspective to your manager in order to help them understand your point of view"</em>.</p>
<p><img alt="Our panelists on stage, from left to right: Ana Peleteiro Ramallo (host), Tian Su, Joyce Chen" src="https://engineering.zalando.com/posts/2023/04/images/gender-equity-2.jpeg#center"></p>
<p>The panelists also discussed the importance of role models, allies, and mentoring in helping women to succeed in the workplace. Joyce stressed the need for sponsorship and support, and encouraged allies to speak up and amplify women's voices. Tian noted that her husband is her biggest ally, and that intentional outreach from colleagues who are men can also make a difference. Ana emphasized the importance of finding allies who understand you and are willing to listen.</p>
<p>The event then opened to a Q&A session, and the panel was asked how to build resilience and overcome unconscious bias. Ana stressed the importance of communicating your perspectives and raising your voice when necessary, while Tian suggested taking conversations to a 1:1 setting to create a safe and open environment. Joyce emphasized the need for transparency and training, starting from the interview stage.</p>
<p>Overall, the event was a great opportunity to share ideas and support women in the workplace. By continuing to have these conversations and advocating for change, we can work towards a more equitable and inclusive future for all. Thanks to the Zalando Women's Network and the Women in Tech Employee Resource Groups for organizing this session, and the panellists for sharing their experiences and thoughts with us!</p>
<hr>
<p><em>Find out more about our <a href="https://zln.do/402RHq3">tech teams</a>.</em></p>Applied Methods from Mathematical Optimization and Machine Learning in E-commerce2023-02-21T00:00:00+01:002023-02-21T00:00:00+01:00Amin Joratitag:engineering.zalando.com,2023-02-21:/posts/2023/02/gor-workshop.html<p>Report from a workshop hosted by Zalando in October 2022</p><p>Last year, Zalando hosted the 106th meeting of the <a href="https://www.gor-ev.de/">Gesellschaft für Operations Research e.V. (German Society of Operations Research)</a> working group on <a href="https://www.gor-ev.de/arbeitsgruppen/praxis-der-mathematischen-optimierung/praxis-der-mathematischen-optimierung-meetings">Practice of Mathematical Optimization</a>. The workshop took place October 6-7, 2022 at the Zalando Headquarters in Berlin.</p>
<h2>Applied Methods from Mathematical Optimization and Machine Learning</h2>
<p>Techniques from the field of mathematical optimization on the one hand and from machine learning on the other have been crucial components in delivering solutions to customers in the e-commerce industry. Serving over 50 million customers and delivering a quarter billion orders last year, Zalando is one of the largest online retail stores in Europe.
Operating at such a large scale gives rise to a plethora of technical problems within these two fields that our applied scientists tackle across various teams. Thus, Zalando was uniquely positioned to host this workshop at the confluence of these two scientific fields, titled "Applied Methods from Mathematical Optimization and Machine Learning in E-commerce".
The workshop included a number of talks by representatives from industry and academia from all over Germany. The presentations included applications ranging from forecasting to network design, pricing, logistics, scheduling, and vehicle routing, among others. See <a href="http://www.gor-ev.de/wp-content/uploads/2022/10/PMO106-invitation.pdf">the full program</a> of the workshop for more details.</p>
<p>The event took place in hybrid mode with streaming available for virtual attendees and presenters.
The majority of participants, around sixty, attended the event in person.
They took advantage of the various networking opportunities during coffee breaks, the conference dinner and a tour of the historic east-side gallery, the largest remaining section of the Berlin wall, right across from the workshop venue at Zalando headquarters in Berlin.</p>
<p><img alt="Group Picture" src="https://engineering.zalando.com/posts/2023/02/images/gor-workshop.jpg#center"></p>
<p>Applied Scientists from Zalando presented two different use-cases at the confluence of optimization and ML in the workshop. The pricing team gave a talk about challenges in large scale article discounting, while the logistics team made a presentation about stock distribution and its challenges.</p>
<h2>Pricing</h2>
<p>The pricing team is responsible for the science behind offering attractive prices to customers.
Their talk about <a href="https://github.com/zalando/public-presentations/blob/master/files/2022-10-06-GOR-PRT_presentation.pdf">Challenges in Large Scale Article Discounting</a> gave a glimpse into the multitude of challenges connected to discounting across the entirety of Zalando's assortment.</p>
<p>Even with proven machinery that recommends millions of discounts under given business targets, many pitfalls have to be circumvented. We discussed the following complications and potential treatments.</p>
<h3>Forecasting Challenges</h3>
<p>The demand for niche articles, typically with just a few sales per month, is hard to predict accurately. Moreover, articles available in many sizes, e.g. jeans with many length and width combinations, can behave like multiple separate articles: each customer considers only their own size, which creates demand only for certain sizes. On top of that, some costs, such as shipping and returns, are a mixed calculation based on the collection of articles handled together.</p>
<h3>Optimization Challenges</h3>
<p>An optimization model has to respect the business setup in its decisions. Several constraints force the model to follow business decisions; for example, it has to sell to customers during a sales period even if it would be more profitable to hold items back for future sales. Without these constraints, the model could propose taking an article offline for a certain period, or prefer to sell more strongly in countries where shipping costs are lower. On the technical side, some optimization problems can become infeasible through incompatible business targets and then require adjustment recommendations.</p>
<h3>Processes and Measuring</h3>
<p>Further considerations stem from the connected processes around pricing. Matching competitors' prices and incorporating sales events and warehouse capacities are crucial in order to recommend profitable discounts. Ultimately, the impact has to be measured via A/B testing. When it comes to pricing, such tests have to be set up carefully to rule out discriminating between customers through different prices, and to enable gathering valuable insights.</p>
<h2>Logistics</h2>
<p>The logistics team delivered a talk titled <a href="https://github.com/zalando/public-presentations/blob/master/files/2022-10-07-GOR-Alea-Kea-Waffle_presentation.pdf">Mathematical Optimization Meets Machine Learning to Optimize Stock Distribution</a>.
Zalando operates a network of interconnected warehouses and return centers serving its customer base across Europe. In order to serve our customers best, we need to make our stock available where and when they desire it. This requires listening to our customers' demands and distributing stock across our network and within each facility accordingly. In this talk, we outlined the challenges at the core of this stock distribution problem and dived deep into some technical aspects.</p>
<h3>Demand Forecasting</h3>
<p>We model demand prediction as a time series forecasting problem at the individual article level for each of the markets we are active in for any given day. We produce probabilistic forecasts for each such problem using a deep recurrent neural network. Challenges abound in demand forecasting for the fashion industry where articles have fast turnover due to seasonality, the fast moving nature of fashion, and the diversity of trends in our vast customer base. This probabilistic demand forecast is used as input to solve two major optimization problems: (i) Item Network Distribution Problem: how best to distribute our stock across our facilities, and (ii) In-warehouse Item Relocation Problem: how best to position our articles within each facility.</p>
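A common way to train such probabilistic forecasters is quantile regression with the pinball loss; below is a minimal sketch of the loss itself (a generic technique for illustration, not Zalando's actual model or training code):

```python
def pinball_loss(y_true: float, y_pred: float, quantile: float) -> float:
    """Pinball (quantile) loss: under-prediction is weighted by `quantile`,
    over-prediction by (1 - quantile), so minimising it yields that quantile."""
    diff = y_true - y_pred
    return quantile * diff if diff >= 0 else (quantile - 1) * diff

# For the 0.9 quantile, under-predicting demand is penalised 9x more
# than over-predicting by the same amount.
print(round(pinball_loss(10.0, 8.0, 0.9), 6))   # 1.8
print(round(pinball_loss(10.0, 12.0, 0.9), 6))  # 0.2
```

Training a network against this loss at several quantiles produces the kind of probabilistic forecast described above, which downstream optimization can then consume.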
<h3>Item Network Distribution</h3>
<p>In the item network distribution problem, items are moved between warehouses: We need to ensure that for each country, the warehouses serving that country have the article assortment and stock quantities that best fulfill the country's expected demand. Our objectives are to maximize sales and minimize delivery times and costs. We discussed the algorithm currently used to make distribution decisions and presented some results.</p>
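As a toy illustration of the problem, not the algorithm presented in the talk, stock could be split across warehouses in proportion to the forecast demand each one serves:

```python
def allocate_proportionally(stock: int, demand: dict[str, float]) -> dict[str, int]:
    """Split `stock` units across warehouses in proportion to forecast demand,
    giving leftover units (from rounding down) to the highest-demand sites first."""
    total = sum(demand.values())
    alloc = {w: int(stock * d / total) for w, d in demand.items()}
    leftover = stock - sum(alloc.values())
    for w in sorted(demand, key=demand.get, reverse=True)[:leftover]:
        alloc[w] += 1
    return alloc

# Hypothetical forecasts; warehouse names are made up.
print(allocate_proportionally(10, {"berlin": 5.0, "paris": 3.0, "milan": 2.0}))
# {'berlin': 5, 'paris': 3, 'milan': 2}
```

The real problem additionally involves delivery times, costs and capacity constraints, which is why it calls for proper optimization methods rather than a one-line heuristic like this.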
<h3>In-warehouse Item Relocation</h3>
<p>The in-warehouse item relocation problem is defined at the warehouse level. A warehouse contains various storage areas with different capacities and speed for collecting one item. Given a constant stream of incoming and outgoing items, we can relocate items between storage areas to achieve a distribution that is optimal for the demand reduced to a warehouse. We presented a formalization of the problem and prospective approaches to solve it.</p>How we manage our 1200 incident playbooks2023-01-31T00:00:00+01:002023-01-31T00:00:00+01:00Bartosz Ocytkotag:engineering.zalando.com,2023-01-31:/posts/2023/01/how-we-manage-our-1200-incident-playbooks.html<p>We consolidated our incident playbooks in September 2019. 1200 playbooks later...</p><p>At Zalando, we use Incident Playbooks to support our on-call teams with emergency procedures that can be used to mitigate incidents.
In this post, we describe how we structured incident playbooks, and how we manage these across 100+ on-call teams.</p>
<h3>Incident Playbooks - where are we now?</h3>
<p>We consolidated our incident playbooks as part of preparation for <a href="https://engineering.zalando.com/posts/2020/10/how-zalando-prepares-for-cyber-week.html">Cyber Week</a> in 2019. Fast forward to 2023 and we have over 1200 playbooks that our teams have authored.
Given the 850+ applications in scope for on-call coverage across 100+ on-call teams, that's 1.41 playbooks per application and ca. 12 playbooks per on-call team. The diagram below shows how our playbook collection has increased over the years. It's easy to see how Cyber Week preparations in Q3 of each year result in significant increases in the playbook collection.</p>
<p><img alt="Count of incident playbooks over time" src="https://engineering.zalando.com/posts/2023/01/images/incident-playbooks.png"></p>
<figcaption style="text-align:center">Count of incident Playbooks over time</figcaption>
<p>As expected, most applications have just a few playbooks. Below, you can see the number of applications per playbook count.</p>
<p><img alt="Number of applications per playbook count" src="https://engineering.zalando.com/posts/2023/01/images/playbook-count-distribution.png"></p>
<figcaption style="text-align:center">Number of applications per playbook count</figcaption>
<h3>What are incident playbooks?</h3>
<p>Our Incident Playbooks cover emergency procedures to initiate in case a certain set of conditions is met, for example when one of our systems is overloaded and the existing resiliency measures (e.g. circuit breakers) are insufficient to mitigate the observed customer impact. In such cases there are often measures we can take, though they will degrade the customer experience.
These emergency procedures are pre-approved by the respective Business Owner of the underlying functionality, allowing for quicker incident response without the need for explicit decision making while critical issues are ongoing.</p>
<p>Further, playbooks make incident response less stressful for colleagues on on-call rotations. Each on-call member takes the time to become familiar with the procedures and understands the toolbox they have available during incidents. New playbooks are reviewed by the on-call team, shared as part of on-call handover or operational reviews, and practiced in game days, or as part of preparation for big events.</p>
<p>The procedures document the <em>conditions</em> (e.g. increased error rates), <em>business impact</em> (e.g. conversion rate decrease), <em>operational impact</em> (e.g. reduction of DB load), <em>mean time to recover</em>, and the <em>steps</em> to execute. This structure allows all stakeholders involved in incident response to clearly understand the executed actions and target state of the system to expect. Lastly, by having playbooks in a single location, our Incident Responders and Incident Commanders have easy access to all available emergency procedures in a consistent format. This simplifies collaboration across teams during outages.</p>
<p>More often than not, our playbooks cover the whole system (a few microservices) rather than covering its individual components through separate procedures. When the bigger system context is considered, there are more options available to mitigate issues.</p>
<p>When we started in 2019, we first focused on a collection of procedures that were already known, but not consistently documented.
Next, as part of the Cyber Week preparations we wanted to explore and strengthen the mechanisms we have in place to mitigate overload or capacity issues across the different touchpoints of the customer (e.g. product listing pages) and partner journeys (e.g. processing of price updates).</p>
<p>Let's consider two examples:</p>
<h4>1) Product Listing Pages (aka. catalog)</h4>
<p>Our <a href="https://en.zalando.de/womens-clothing/">catalog pages</a> integrate multiple data sources, such as teasers, sponsored products, and outfits.
Fetching data from all sources comes at increased costs compared to a simple article grid. Therefore, we have a set of playbooks that disable the different data sources in order to reduce the load on the backends providing the APIs and the underlying Elasticsearch cluster. The playbooks are sorted in such a way that we apply the playbooks with the least business impact first. In one of our evening Cyber Week shifts, we encountered performance degradation resulting in increased latencies, which was hard to diagnose. While one part of the team was busy troubleshooting the issue, another part of the team executed several of the prepared playbooks in sequence in order to mitigate the customer impact.</p>
<p>Example playbook for catalog:</p>
<ul>
<li><strong>Title</strong>: Disable calls for outfits in the Catalog’s article grid</li>
<li><strong>Trigger</strong>: High latency for fetching outfits for the article grid or High CPU usage for Elasticsearch's outfit queries</li>
<li><strong>Mean time to recover:</strong> 3 minutes after updating configuration</li>
<li><strong>Operational Health Impact</strong>: No more outfit calls from Catalog, reduced request rates to Elasticsearch by x%.</li>
<li><strong>Business Impact</strong>: Outfits won't be shown as part of the catalog pages.</li>
</ul>
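Stored as markdown with front matter, such a playbook might look roughly like the following (a hypothetical sketch; the field names, application name and configuration key are illustrative, not Zalando's actual schema):

```markdown
---
application: catalog-api
expiry_date: 2024-09-01
---

# Disable calls for outfits in the Catalog's article grid

**Trigger**: High latency for fetching outfits, or high CPU usage for
Elasticsearch's outfit queries (link to alert/dashboard).

**Mean time to recover**: 3 minutes after updating configuration.

## Steps

1. Set `outfits.enabled: false` in the catalog configuration.
2. Confirm on the dashboard that request rates to Elasticsearch drop.
```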
<h4>2) Monitoring system</h4>
<p>Our monitoring system <a href="https://opensource.zalando.com/zmon/">ZMON</a> had a component ingesting metrics data and storing it in the KairosDB time series database (TSDB), backed by Cassandra. Pre-scaling of the Zalando platform for the Cyber Week peak workload caused a multi-factor increase in metrics pushed by the individual application instances, resulting in ingestion delays due to Cassandra cluster overload. To mitigate similar incidents, we developed a tiering system with three criticality tiers for the metrics, so that in case of overload of the TSDB, we could still ingest the most important metrics necessary to plot the essential dashboards required to monitor the Cyber Week event. This playbook is still in place today, even though we changed our metrics storage.</p>
<p>Example playbook for ZMON:</p>
<ul>
<li><strong>Title</strong>: Drop non-critical metrics due to TSDB overload</li>
<li><strong>Trigger</strong>: Metrics Ingestion SLO is at risk of being breached (link to alert/dashboard)</li>
<li><strong>Mean time to recover:</strong> 2 minutes after updating configuration</li>
<li><strong>Operational Health Impact</strong>: Loss of tier-3 and tier-2 metrics. Only tier-1 metrics are processed, leading to 40% load reduction on the metrics TSDB.</li>
<li><strong>Business Impact</strong>: None</li>
</ul>
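The tier-based dropping this playbook describes can be sketched as a small ingestion filter (illustrative code only; the metric names are made up and ZMON's actual implementation differs):

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    tier: int  # 1 = most critical, 3 = least critical
    value: float

def filter_for_ingestion(metrics: list[Metric], overloaded: bool) -> list[Metric]:
    """Keep only tier-1 metrics while the TSDB is overloaded; otherwise keep all."""
    if not overloaded:
        return list(metrics)
    return [m for m in metrics if m.tier == 1]

batch = [
    Metric("http_5xx_rate", tier=1, value=0.02),  # hypothetical metric names
    Metric("jvm_gc_time", tier=2, value=31.0),
    Metric("cache_size", tier=3, value=12e6),
]
print([m.name for m in filter_for_ingestion(batch, overloaded=True)])
# ['http_5xx_rate']
```

The switch itself would be flipped via configuration, which is what gives the short mean time to recover listed above.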
<h3>How do we author playbooks?</h3>
<p>We use a documentation site built with <a href="https://www.mkdocs.org/">mkdocs</a> to host a description of the incident process and all playbooks. We generate the playbook directory structure based on our OpsGenie on-call teams, so there is always a skeleton available for every team to contribute their playbooks to. When we started in 2019, we had a team of 3 reviewers who were committed throughout the year to explaining the purpose of the playbooks during reviews and aligning them to a common standard. With sufficient examples and knowledge spread across the organization, we switched to using <a href="https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners">CODEOWNERS</a> to delegate the reviews to representatives of the departments, skilled in operational excellence.</p>
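Generating such a skeleton from an on-call roster could look roughly like this (an illustrative sketch; the team names, paths and roster source are made up):

```python
import tempfile
from pathlib import Path

def generate_skeleton(root: Path, teams: list[str]) -> None:
    """Create a playbook directory with a stub index page for every on-call team."""
    for team in teams:
        team_dir = root / "playbooks" / team
        team_dir.mkdir(parents=True, exist_ok=True)
        index = team_dir / "index.md"
        if not index.exists():  # never overwrite a team's existing pages
            index.write_text(f"# {team} playbooks\n\nNo playbooks yet.\n")

# In reality the team list would come from the on-call management system's API.
root = Path(tempfile.mkdtemp())
generate_skeleton(root, ["team-catalog", "team-monitoring"])
print(sorted(p.name for p in (root / "playbooks").iterdir()))
# ['team-catalog', 'team-monitoring']
```

Running this on every build keeps the docs site in sync with the current set of on-call teams.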
<p>To remind new contributors about our playbook guidelines, we use a pull request template with a few check boxes as a means of self-verification of playbook completeness. The first line of the template contains a TODO with a nudge for a 1-line summary of the changes. This proved to be an easy way of providing reviewers with more context about the performed changes.</p>
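Such a template might look like the following (hypothetical; the actual checklist items are not public):

```markdown
TODO: replace this line with a 1-line summary of your change.

#### Playbook checklist

- [ ] Trigger conditions are stated and link to an alert or dashboard
- [ ] Business and operational impact are described
- [ ] Mean time to recover is estimated
- [ ] The steps have been reviewed by the on-call team
```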
<h3>Integrating playbook data with application reviews</h3>
<p>Aside from the information about triggers and impact for playbooks, we also collect additional metadata allowing us to integrate playbooks with our application review process:</p>
<ul>
<li>Application – links playbooks to the involved applications</li>
<li>Expiry date – allows to nudge teams to re-review playbooks that will expire soon</li>
</ul>
<p>To keep integration simple, along with the documentation, we also generate a JSON file with playbook metadata.
During the application review process, it is indicated per application (from a certain criticality tier onward) whether there are any playbooks defined for it and whether any of these have expired.</p>
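Generating that metadata file could be sketched like this (an illustrative, stdlib-only sketch; the field names follow the two bullets above, but the real schema and tooling are not public):

```python
import json
from datetime import date

def parse_front_matter(markdown: str) -> dict:
    """Tiny 'key: value' front matter parser for lines between '---' fences."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def build_metadata(playbooks: dict[str, str], today: date) -> list[dict]:
    """Collect per-playbook metadata (application, expiry) into one structure."""
    entries = []
    for path, text in playbooks.items():
        meta = parse_front_matter(text)
        expiry = meta.get("expiry_date", "")
        entries.append({
            "path": path,
            "application": meta.get("application", ""),
            "expired": bool(expiry) and date.fromisoformat(expiry) < today,
        })
    return entries

doc = "---\napplication: catalog-api\nexpiry_date: 2022-06-01\n---\n# Disable outfits\n"
entries = build_metadata({"catalog/outfits.md": doc}, today=date(2023, 1, 31))
print(json.dumps(entries))
# [{"path": "catalog/outfits.md", "application": "catalog-api", "expired": true}]
```

The review tooling only needs to read this one JSON file rather than parse the markdown sources itself.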
<p>With time, we made it mandatory for applications of certain criticality to have an assigned playbook.
This partially increased the scope of the playbooks beyond the key emergency procedures while at the same time providing training to our engineers in the authoring of playbooks and thinking about the overload and failure scenarios that can occur.</p>
<h3>Summary</h3>
<p>When we initially created the incident playbooks site, maintaining playbooks as markdown files was considered a good means of ensuring consistency, but one of a rather temporary nature: to be consistent with our UI-driven application review workflow, we intended to eventually manage playbooks in the same way. Managing structured data in markdown is not ideal, despite the ability to use front matter for metadata. However, managing playbooks in a code repository gives us an easy means of cross-team review using pull requests. This key advantage keeps us from moving to a UI-driven workflow, where such collaboration would be limited.</p>
<p>We can certainly recommend that every team think about the failure scenarios their systems can experience, for example as part of production readiness reviews or game days. Had we not done so, several key incidents would have had a markedly larger impact on our customer experience.</p>
<p>Imagining how to react to such scenarios by putting the system into a degraded state, trading off availability over customer experience, can spark interesting conversations about resilience mechanisms that can be built into the software. These conversations drive engineers to make changes to their design to fundamentally improve availability, or at least, to ensure their software facilitates easier intervention.</p>
<p>If used often enough, playbooks should ideally be automated.</p>
<hr>
<p><em>If you're as passionate about Software Engineering as we are, take a look at <a href="https://jobs.zalando.com/en/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Principal%20Engineering">open positions in our Engineering teams</a>.</em></p>How You Can Have Impact As An Engineering Manager2023-01-26T00:00:00+01:002023-01-26T00:00:00+01:00Gary Raffertytag:engineering.zalando.com,2023-01-26:/posts/2023/01/how-you-can-have-impact-as-an-engineering-manager.html<p>How Engineering Managers create impact and shape organisational culture</p><p><em>If you are a good leader,</em><br />
<em>Who talks little,</em><br />
<em>They will say.</em><br />
<em>When your work is done,</em><br />
<em>And your aim fulfilled,</em><br />
<em>“We did it ourselves”</em></p>
<p>- Lao-Tse</p>
<p>Last year, I <a href="https://engineering.zalando.com/posts/2022/07/growth-engineering-at-zalando.html">shared</a> how Zalando enables and supports the continued growth of our Software Engineers. The piece was written from a leadership perspective. A natural sequel to that would describe how our leaders are empowered. Specifically, I would like to provide my own perspective on how Engineering Managers can create impact and shape organisational culture.</p>
<h1>Team Structures</h1>
<p>To provide some context, Engineering Managers use the distinction between the <strong>“Team You Lead”</strong> and the <strong>“Team You Are On”</strong>. For the former, an Engineering Manager is responsible for a single delivery team of Software Engineers or Applied Scientists. This is the team that they are leading. The latter refers to the Engineering Manager’s own team (their peer group that forms a department, and is led by a Head of Engineering).</p>
<h2>The Team You Lead</h2>
<p>I use the team you lead as the starting point to describe Engineering Management, because this, in my opinion, is the bread and butter of the role. Forming and leading a high-performing delivery team is no small feat. The team of individuals must collectively progress through the four stages of forming (purpose and raison d’etre), storming (sharing feedback, ideation, and defining roles within the group), norming (establishing ways of working and responsibilities), and performing (peak delivery and problem-solving). Take a look at Patrick Lencioni’s <a href="https://www.amazon.co.uk/Five-Dysfunctions-Team-Leadership-Lencioni/dp/0787960756">Five Dysfunctions of a Team</a> (or read the <a href="https://www.amazon.co.uk/Five-Dysfunctions-Team-Illustrated-Leadership/dp/0470823380/">Manga Edition</a> for a more illustrated journey) to peek into the complex problems that leaders need to resolve in order to keep their team healthy.</p>
<p>Engineering Managers are accountable for driving the delivery of projects from start to finish - encompassing the entire lifecycle of what the team builds, how they structure step-changes to systems, how they can monitor and measure the performance of said systems for operational excellence, and all the other ingredients that go into delivering effective software.</p>
<h2>The Team You Are On</h2>
<p>Beyond the team that they lead, I mentioned that Engineering Managers have another team, and this is their peer group. No two organisations are identical, but typically, multiple teams are grouped to form a department, which is fulfilling a part of the larger group strategy. This for me, is where the magic happens for Engineering Management, and it is where I encourage my direct reports to make the biggest impact.</p>
<p>Andy Grove <a href="https://www.amazon.co.uk/High-Output-Management-Andrew-Grove/dp/0679762884">defined</a> a Manager’s output as the output of her/his organisation, plus the output of neighbouring organisations under her/his influence. To put that in context, this is the output of the Team You Lead, plus the output of the teams of your peer group. For the sake of this post, I make the assumption that these teams are interacting, and I do this because “<em>A system is never the sum of its parts; it’s the product of their interaction</em>”.</p>
<h1>Interaction is Culture</h1>
<p>So, if the yield of a system is the product of how the parts interact, you might be wondering how Managers influence this.</p>
<p><em><strong>Culture has entered the chat...</strong></em></p>
<p>Culture is how work happens between people and between teams, which sounds simple, but culture is complex, and takes considerable time and effort to instil.</p>
<p>I recently read a great description of culture, which hypothesised that culture is composed of behaviour, processes, and practices. Let’s take a look at each, and home in on the Manager’s role within.</p>
<h2>Behaviour</h2>
<p>A well known <a href="https://rework.withgoogle.com/print/guides/5721312655835136/">study</a> of engineering team effectiveness from Google, named Project Aristotle, identified the common elements of their best teams, and at the top of that list, was Psychological Safety. Psychological Safety “...refers to an individual’s perception of the consequences of taking an interpersonal risk”. If we strip this down to bare metal, it is referring to how comfortable, and encouraged, team members are to speak up, to give their opinions, and to support one another.</p>
<p>Engineering Management is not about dictating what our engineers do, nor is it about having all the answers to the hard questions. Similarly, engineers are not blindly following instructions, nor are they viewed as code labourers. Instead, Engineering Management is about creating an environment that sets clear expectations and goals, encourages voices and opinions, destigmatizes failure, encourages diverse thinking, and supports the individual growth of each team member.</p>
<p>To accomplish this, Engineering Managers are provided with the autonomy to support their teams and to enable success as they know best. They should be guided by Our Founding Mindset (OFM), but be led by their own experience and know-how.</p>
<p>Achieving this within the Team You Lead is one thing, but the key is achieving this across the wider scope of the teams within your influence. This requires customer-first thinking, working backwards from the organisational goals, and ensuring that all teams have enough information and support to achieve their target. In other words, putting purpose over ego, and doing what’s right for the organisation and the customer.</p>
<h2>Processes</h2>
<p>A successful organisation is driven by autonomous and empowered teams. Peek inside each of these teams and you will find a diverse collective of talented, ambitious, and driven individuals. We are actively shaping the Zalando of the future by hiring great people with high potential. Our Engineering Managers are responsible for contributing to, and defining, the processes that will enable these teams of individuals to succeed.</p>
<p>Processes at Zalando are constantly evolving; responding to the ever-changing landscape in which we operate. In order to successfully equip an organisation with the necessary processes for momentum, decision making, and enablement, our Engineering Managers are required to collaborate with other leaders across multiple disciplines and job families, such as Principal Engineering, Product Management, Technical Program Management, and Design.</p>
<p>Perhaps they might be collaborating with Talent Acquisition Partners to refine the candidate experience during the hiring process or creating a Mentorship program. In other cases, they might be contributing to a cross-functional working group to define KPIs to measure progress relative to the Group Strategy. Perhaps they might be supporting the <a href="https://engineering.zalando.com/posts/2020/10/how-zalando-prepares-for-cyber-week.html">Cyber Week preparations</a>. You get the idea. These are just four examples that my cohort of Managers have been working on recently, however, they all share the running theme of intrapreneurial spirit - embodying our “Act Like an Owner” founding mindset. Making things happen throughout the organisation that ultimately become a tail-wind for impact.</p>
<h2>Practices</h2>
<p>If the purpose of processes is to shape the environment such that group thinking and empowered decision making is supported, then practice is the more granular day to day activities that sit atop the processes. These practices help Engineers to get things done.</p>
<p>As before, if we take the team you lead as the base, the Engineering Manager is responsible for working with their team to define fruitful ways of working that embrace best practices and foster collaboration. This will take time, especially for a newer team, but through trial and error, you will find that sweet spot.</p>
<p>When we home in on practices beyond the team, we see wider collaboration across disciplines to get things done across the department.</p>
<p>Practices, in my opinion, are the catalyst for helping Engineering Managers to understand how to scale themselves, by delegating and supporting the individuals on their team to step up and take on more responsibility. If we take a look at Communities of Practice, Operational Review Meetings, or Guilds, we typically see Engineers taking more of a leading role in establishing these practices, but in order to do this, our Engineering Managers are playing more of a supporting role. We are identifying opportunities and matching those to individual goals and aspirations. We are setting those individuals up for success by coaching, providing feedback, utilising training and development budgets, and stepping back to let them drive.</p>
<p>As individuals are growing into these responsibilities, it is important to nurture experimentation, to celebrate successes and failures, and most importantly, to provide the context (the why) of how these practices are related to the bigger picture.</p>
<h1>Conclusion</h1>
<p>Engineering Managers are responsible for steering and enabling a high-performing team of engineers, but their scope of influence and impact extends far beyond the realms of the team. Managers help to shape the behaviours, the processes, and the practices of the organisation to yield, and foster, a culture of innovation, delivery, empowerment and drive. This culture is what enables organisations to succeed in our non-linear world.</p>
<p>The Harvard Business Review recently published a <a href="https://hbr.org/2022/12/to-retain-your-best-employees-invest-in-your-best-managers">terrific article</a>, stating that in order to retain your best employees, you need to invest in your best managers. This article resonates with my own view that the success of an Engineering Organisation is greatly supported by our Engineering Managers - the ones who are close enough to the metal to implement culture, yet elevated enough to encompass a broad scope of influence, and provided with enough autonomy to innovate for the organisation.</p>
<p>I would like to finish this article off with an extract from our Role Expectations for the Management track:</p>
<p><em>“Great managers come in all shapes and sizes. There is no ‘checklist’ for leadership … No leader can do everything - some will exceed in certain capabilities while others will exceed in a different combination - this is OK and intended”.</em></p>
<hr>
<p><em>If you're interested to work with us, take a look at our <a href="https://jobs.zalando.com/en/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Leadership">open positions for Engineering Managers</a>!</em></p>More Editorial Content, please.2022-09-29T00:00:00+02:002022-09-29T00:00:00+02:00George Evanstag:engineering.zalando.com,2022-09-29:/posts/2022/09/editorial-content.html<p>Building a CMS for the Zalando Fashion Store</p><p><img alt="Zalando and Editorial Content Logo" src="https://engineering.zalando.com/posts/2022/09/images/editorial-content-logo.png#previewimage"></p>
<p>At Zalando, serving engaging content across the user journey has become increasingly important for multiple teams within the company. This required a scalable, feature-rich and easy-to-use solution, that was flexible enough to adapt to the ever-changing requirements for rich content.</p>
<p>In this post, George and Daniel describe the product that was built to serve this purpose - its problem space, the solution design process, the technological context and how the product evolved to include new use-cases, such as the Zalando Sustainability topic.</p>
<h2>Problem Space: The need for a flexible content solution</h2>
<p>The Zalando Fashion Store is first and foremost a platform to help our customers find the products they want, and it employs various strategies to personalise the experience for each customer. Zalando also aims to inform and inspire, and many of our internal teams and brand partners sought to do this by telling stories, via <em>editorial</em> content.</p>
<p>This is where "editorial landing pages" come in as static, self-contained web pages on the Zalando site containing a range of content. Landing pages are often tied in to products and brands, but not always with conversion as the primary focus. They include <a href="https://en.zalando.de/campaigns/nike-my-kinda-play-w/">awareness campaigns from key brands</a>, inspiration for a clothing category like <a href="https://en.zalando.de/campaigns/outdoor-w/">outdoor</a>, or informative pages for key Zalando initiatives like <a href="https://en.zalando.de/pre-owned-w">Pre-owned</a>, or <a href="https://en.zalando.de/about-sustainability/">sustainability in fashion</a>.</p>
<p>When George's team first started working on the topic of landing pages for Zalando Marketing Services (ZMS) campaigns, legacy tooling for the creation & management of such pages was already in place. However, it had many limitations affecting scalability, and it was based on Zalando's "Mosaic" system architecture, which was being phased out in favour of the newer <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Interface Framework</a>. So the team decided to build a new tool on top of this new architecture, replacing the old one and overcoming its feature and scalability shortcomings.</p>
<h3>Core Requirements</h3>
<p>The shortcomings and pain-points of the previous tool became the basis of the requirements for what the team would build:</p>
<ul>
<li><strong>Ease of Use / Scalability</strong> - The previous solution required significant engineering effort to set up each page before Content Managers could upload the content. This was inefficient and a clear bottleneck to scalability. Therefore, the new tool should allow Content Managers to create pages, upload and publish content with no engineering involvement.</li>
<li><strong>Content Flexibility</strong> - With the previous tooling, once a page was set up, the layout could not be changed without resetting it, which would cause any content uploaded to be lost, creating a lot of repeated work. The new tool should allow the flexibility to change the layout, add and remove content, whilst preserving existing content.</li>
<li><strong>Parity with the Zalando App</strong> - In the previous tooling, web and Zalando app pages were entirely separate: they used different content formats, looked quite different, and content for each had to be uploaded separately. This created a lot of duplicate work, both in asset creation and upload. The new tooling should allow for a single source of content, and mirror its appearance across web & app.</li>
<li><strong>Localisation</strong> - Zalando operates in 25 different markets, requiring content for a given page to be localised into several languages. The previous process for this was cumbersome and confusing, effectively repeating the content upload for each language. Our goal was to streamline this into an efficient, user-friendly process.</li>
<li><strong>Extensibility</strong> - Creating new, engaging experiences was a key part of the ZMS use case, so we wanted a setup that would facilitate the development of new content formats. After the initial rollout, other teams also showed an interest in this capability, so creating a streamlined contribution model became a priority.</li>
<li><strong>Interface Framework</strong> - The tool should integrate with Zalando's new architecture and design system, to leverage its capabilities and scale with it.</li>
</ul>
<h2>Solution Design</h2>
<p>The first decision to make was whether to build a new CMS from scratch, or use an existing, third-party solution. We needed something flexible enough to adapt to our precise requirements, but we were also conscious that trying to reinvent the wheel by building our own CMS could grow into a project with limitless scope that we would never finish.</p>
<p>After researching many third-party CMS solutions we decided to go with <a href="https://www.contentful.com/">Contentful</a>, a headless CMS - 'headless' since it is agnostic about the 'how' of presenting content to the end user. Instead, it focuses on making the content management process as easy and intuitive as possible. The content is delivered via an API to the presentational layer, e.g. directly to an app, a static site generator such as Next.js, or any other consumer channel - such as Zalando's microservice-based architecture in our case. What won us over was how flexible and scalable it is in terms of what content could be served, as well as the ease with which the CMS UI could be extended with custom apps. It also had strong multi-language support out of the box, and enabled collaboration in bigger teams.</p>
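<p>To make the "headless" delivery model concrete, here is a minimal sketch of how a client might address Contentful's Content Delivery API over plain HTTP. The space ID, content type and field names below are hypothetical; a real integration would typically use one of Contentful's official SDKs rather than building URLs by hand.</p>

```python
from urllib.parse import urlencode

# Contentful's Content Delivery API is a read-only, CDN-backed HTTP API.
CDA_HOST = "https://cdn.contentful.com"

def entries_url(space_id: str, environment: str, **query: str) -> str:
    """Build a Content Delivery API URL for fetching entries.

    Query parameters such as `content_type`, `locale` and `fields.*`
    matchers are passed through as documented by the API.
    """
    base = f"{CDA_HOST}/spaces/{space_id}/environments/{environment}/entries"
    return f"{base}?{urlencode(query)}" if query else base

# Hypothetical lookup of a landing page by its URL path, localised for Germany:
url = entries_url(
    "my-space",                  # hypothetical space ID
    "master",
    content_type="landingPage",  # hypothetical content type ID
    locale="de-DE",
    **{"fields.urlPath": "/campaigns/outdoor-w"},
)
```

<p>However the request is built, the point is the same: the CMS returns raw JSON entries, and every presentational decision is left to the consumer.</p>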
<h3>System Architecture Context</h3>
<p>Let's have a closer look at the technology context into which our solution needed to fit and how a request to a landing page would be processed, finding its way from the content consumer all the way to Contentful:</p>
<ul>
<li>
<p>There are two main consumer platforms: web and app. Our <a href="https://github.com/zalando/skipper">Skipper</a> routing service takes care of matching the request URL with the correct internal service endpoints and HTTP header enrichment.</p>
</li>
<li>
<p>Both platforms are serving a requested landing page via our <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Rendering Engine</a>, which fetches data for each UI element via a GraphQL query using our GraphQL aggregator, the <a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">Fashion Store API (FSA)</a>.</p>
</li>
<li>To enable data fetching for our landing pages, George's team built a data proxy service. This sits between FSA and Contentful's API, and handles content mapping & caching. This approach also ensures resilience, and that the aggregation layer directly calls only Zalando-operated APIs.</li>
<li>To integrate additional content from Zalando services into the Contentful CMS, a simple content aggregator was built.</li>
</ul>
<p><img alt="System architecture context relevant for the Landing Page stack" src="https://engineering.zalando.com/posts/2022/09/images/system-architecture.png#center"></p>
<figcaption style="text-align:center">System architecture context relevant for the Landing Page stack</figcaption>
<p><br/></p>
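<p>The mapping & caching role of the proxy can be illustrated with a small sketch. This is not the actual service - the cache policy and names are invented - but it shows the idea: the aggregation layer talks to a Zalando-operated component that shields the upstream CMS from repeated reads.</p>

```python
import time
from typing import Any, Callable

class CachingProxy:
    """Cache upstream CMS responses for a fixed TTL.

    `fetch` stands in for a call to the CMS API; a real proxy would
    also map the CMS payload into the FSA schema and handle errors.
    """

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 60.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        hit = self._cache.get(key)
        if hit is not None and time.monotonic() - hit[0] < self._ttl:
            return hit[1]                 # fresh cache hit: no upstream call
        value = self._fetch(key)          # miss or stale: fetch upstream
        self._cache[key] = (time.monotonic(), value)
        return value
```

<p>Serving slightly stale content in exchange for resilience is usually an easy trade-off for editorial pages, which change far less often than they are read.</p>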
<h3>Content Data Model</h3>
<p>The actual content of a landing page is managed within Contentful as "entries", each entry type having its own data schema definition, validation rules and a content-upload UI for the content editors.</p>
<p>The main entry is the landing page itself. It has basic fields like the page title, the URL path and SEO related metadata. It also has a reference list to sub-entries or "modules" - preset content formats such as banners, text blocks, product carousels etc., or more bespoke formats such as a list of sustainability certifications with background information. These can be composed using a drag-and-drop UI to build a landing page layout, and then the necessary content can be uploaded for each one. They can be rearranged or edited at any time, without the need to re-upload existing content.</p>
<p><img alt="Contentful modules screenshot" src="https://engineering.zalando.com/posts/2022/09/images/contentful-lp-modules-screenshot.png#center"></p>
<figcaption style="text-align:center">Landing page modules as arranged in Contentful</figcaption>
<p><br/></p>
<p>When a landing page request reaches FSA, it in turn calls the Contentful proxy service, which returns the data for the page and each of its modules. These are mapped to corresponding 'renderers' in the Rendering Engine, which render the UI components.</p>
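<p>The mapping from modules to renderers can be pictured as a simple registry keyed by module type. Module and renderer names here are invented for illustration; the real Rendering Engine resolves renderers to UI components rather than returning strings.</p>

```python
from typing import Callable

# Registry from module type (as delivered by the proxy) to a renderer.
RENDERERS: dict[str, Callable[[dict], str]] = {}

def renderer(module_type: str):
    """Decorator registering a renderer for one module type."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        RENDERERS[module_type] = fn
        return fn
    return register

@renderer("banner")
def render_banner(fields: dict) -> str:
    return f"<banner src={fields['image']!r}>"

@renderer("textBlock")
def render_text_block(fields: dict) -> str:
    return f"<p>{fields['text']}</p>"

def render_page(modules: list[dict]) -> str:
    # Unknown module types are skipped, so new formats can roll out
    # without breaking pages rendered by older clients.
    return "".join(
        RENDERERS[m["type"]](m["fields"])
        for m in modules
        if m["type"] in RENDERERS
    )
```

<p>A registry like this is also what makes the contribution model below possible: a new module format only needs to register its renderer, leaving existing pages untouched.</p>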
<h3>A sustainable solution: extensibility and contributions from other teams</h3>
<p>The Sustainability Team was one of the first interested parties to reach out to George's team early on in the implementation phase. They were seeking a way to display information on the various aspects of Sustainability in fashion in an engaging way. Although this content typically exists more permanently than the short-lived marketing campaigns for which the landing pages system was primarily intended, the overlap of the problem space and requirements was significant enough to make for a beneficial collaboration.</p>
<p>Extensions and adaptations were needed, however, both regarding orthogonal aspects (like SEO support or a content review and approval workflow) and specific presentational features.
In particular, the addition of the latter in the form of self-contained new modules demonstrated that the new system is flexible enough to enable contributions from other teams.
Among the additional modules added by the Sustainability team was one showing details of the sustainability certificates Zalando supports on the product level.</p>
<p>Let's use this module to make the stack as described in the previous section a bit more tangible.</p>
<h4>The Sustainability Certificate module</h4>
<p>The purpose of the certificate module is to present a list of sustainability related certificates to our customers.
A sustainability certificate acts as proof of sustainability related claims about a product. It can either be a third-party certificate like Fairtrade or GOTS, or one of the criteria Zalando provides, e.g. 'Made with 70-100% recycled materials'.
On a landing page, each certificate needs to be shown with three content pieces:</p>
<ul>
<li>logo</li>
<li>title</li>
<li>description text</li>
</ul>
<p>Additionally, the whole certificate module has two headlines and an introduction text block.</p>
<p><img alt="Sustainability Landing Page - Certificate Module" src="https://engineering.zalando.com/posts/2022/09/images/lp-cert-module-screenshot.png#center"></p>
<figcaption style="text-align:center">The Certificate Module on a Sustainability related Landing Page</figcaption>
<p><br/></p>
<p>One interesting aspect of the module is that it gets its content not solely from Contentful, but partially from another Zalando service already delivering data for another customer touch point: the Sustainability accordion of the Product Detail Page.</p>
<p><img alt="Certificate on Product Landing Page" src="https://engineering.zalando.com/posts/2022/09/images/pdp-cert-screenshot.png#center"></p>
<figcaption style="text-align:center">A Sustainability Certificate on a Product Landing Page</figcaption>
<p><br/></p>
<p>Using a single source for sustainability information is valuable not only for making the life of our Content Editors easier (especially when considering the number of supported languages), but also because it's important to show accurate and up-to-date information about Sustainability claims across the whole customer journey.</p>
<p>For that reason, the Contentful data model of the module looks like this:</p>
<ul>
<li>Title</li>
<li>Subtitle</li>
<li>Overall intro description text block</li>
<li>List of certificate IDs (the list and order of certificates to show can vary from landing page to landing page)</li>
</ul>
<p>These fields are delivered via the Contentful proxy to the Fashion Store API (FSA), where the certificate IDs are enriched with the values for the logo URL, title, and description in the same way as is done for requests from the Product Detail Page. The certificates are delivered to the clients by FSA in the field <code>entities</code>, which is part of the <code>Collection</code> type in the GraphQL schema.</p>
<p>This ensures that the certificate detail information on Landing Pages and on the Product Detail Pages are always in sync.</p>
<p><strong>query</strong>:</p>
<div class="highlight"><pre><span></span><code><span class="k">query</span><span class="w"> </span><span class="nf">collection_certificates</span><span class="p">(</span><span class="nv">$id</span><span class="p">:</span><span class="w"> </span><span class="nb">ID</span><span class="p">!,</span><span class="w"> </span><span class="nv">$first</span><span class="p">:</span><span class="w"> </span><span class="nb">Int</span><span class="p">!)</span>
<span class="err">@</span><span class="nf">component</span><span class="p">(</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">"</span><span class="nc">re</span><span class="err">-collection_certificates"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">collection</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="nv">$id</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="n">title</span>
<span class="w"> </span><span class="n">subtitle</span>
<span class="w"> </span><span class="n">description</span>
<span class="w"> </span><span class="n">entities</span><span class="p">(</span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="nv">$first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">certificates</span><span class="p">:</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">__typename</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="nc">SustainabilityCertificate</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="n">title</span>
<span class="w"> </span><span class="n">description</span>
<span class="w"> </span><span class="n">logo</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">uri</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p><strong>response</strong>:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"data"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"collection"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ern:collection:fwd:component:xyz"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Background check"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"subtitle"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Sustainability criteria you can trust"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Look for certificates like these to see..."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"entities"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"certificates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"__typename"</span><span class="p">:</span><span class="w"> </span><span class="s2">"SustainabilityCertificate"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ern:sustcertificate::xyz"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"GOTS - organic"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Global Organic Textile Standard (GOTS) is..."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"logo"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"uri"</span><span class="p">:</span><span class="w"> </span><span class="s2">"[...]/sustainability/logos/gots-2.png"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="err">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
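<p>The enrichment step behind this response can be sketched as a join between the Contentful module fields and a certificate lookup keyed by ID. The store below is a stub standing in for the service that also backs the Product Detail Page; field names follow the sample above.</p>

```python
def enrich_collection(module: dict, certificate_store: dict) -> dict:
    """Resolve certificate IDs from the CMS into full certificate details.

    `module` mirrors the Contentful data model (title, subtitle,
    description, list of certificate IDs); the order of IDs is preserved.
    """
    certificates = [
        {"__typename": "SustainabilityCertificate", **certificate_store[cid]}
        for cid in module["certificateIds"]
        if cid in certificate_store  # unknown IDs are silently dropped
    ]
    return {
        "id": module["id"],
        "title": module["title"],
        "subtitle": module["subtitle"],
        "description": module["description"],
        "entities": {"certificates": certificates},
    }
```

<p>Because the lookup is shared, a corrected certificate description propagates to every landing page and every Product Detail Page at once.</p>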
<p>When implementing this module, we had to touch the following components of the Landing Page stack:</p>
<ul>
<li><strong>Contentful</strong>, to add the new data model</li>
<li><strong>Contentful proxy</strong>, to map the new Contentful model to the <code>Collection</code> type of the GraphQL schema in the Fashion Store API</li>
<li><strong>UI components</strong> for app and web platforms</li>
</ul>
<p>Overall, the implementation of the additional modules by the Sustainability team was a successful example of <a href="https://www.oreilly.com/library/view/adopting-innersource/9781492041863/ch01.html">inner sourcing</a>.</p>
<h3>Impact of the Content Management Tool</h3>
<p>Once the new tool was rolled out, it had a substantial impact on the efficiency of landing page content management:</p>
<ul>
<li>The average landing page time-to-go-live - from page creation through content upload to publishing - was reduced from 2 days to 4 hours.</li>
<li>In the previous set-up, we had to impose a 2-week lead time from page briefing to go-live, to allow for content upload issues, QA, etc. With the new solution, this lead time has been removed entirely.</li>
<li>The new tool requires no engineering involvement in the creation & publishing of landing pages - non-technical stakeholders can complete the process end-to-end themselves.</li>
<li>The same is true for changes to the layout of existing landing pages. Previously these required engineering involvement and re-uploading <em>all</em> the content; now they can be achieved by simply reordering the modules, or adding and removing modules as needed.</li>
<li>The landing pages and all modules on them are mirrored across web and app, from a single point of upload, rather than two distinct pages, cutting the briefing and upload workload in half.</li>
<li>Since rollout, we've seen an 82% increase in the number of landing pages published YoY.</li>
</ul>
<h3>Conclusion and Next Steps</h3>
<p>In conclusion, if we assess the impact of the new tool against the original requirements, we think it’s fair to call the project a success. We implemented a tool that allows non-technical stakeholders to create and manage landing pages end-to-end, with greatly reduced effort, and that takes advantage of Zalando’s new Interface Framework.</p>
<p>Perhaps the most promising achievement relates to one of the key aims of the tooling: facilitating the addition of new features and iterations to continuously improve the landing pages offering. Since the rollout, many such features have been added: new content formats like the aforementioned Sustainability Certificates module, process improvements like an adaptive streaming video solution which allows us to deliver longer video content with seamless playback, and image editing capabilities within the CMS to streamline content upload.</p>
<p>The ability to add these improvements gives us confidence that the tooling will remain adaptable enough to serve our ever-changing needs in the long term.</p>
<hr>
<p><em>If you would like to work on similar problems and help us improve how rich editorial content is delivered to our customers, please consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Product%20Design%20%26%20User%20Research&filters%5Bcategories%5D%5B1%5D=Applied%20Science&filters%5Bcategories%5D%5B2%5D=Software%20Engineering&filters%5Bcategories%5D%5B3%5D=Product%20Management%20%28Technology%29&filters%5Bentities%5D%5B0%5D=zms">Engineering Teams</a></em> at Zalando Marketing Services (ZMS).</p>Growth Engineering at Zalando2022-07-26T00:00:00+02:002022-07-26T00:00:00+02:00Gary Raffertytag:engineering.zalando.com,2022-07-26:/posts/2022/07/growth-engineering-at-zalando.html<p>How we enable growth for engineers at Zalando</p><p>We recently closed out our annual performance review for employees. Naturally, this
period is for us to focus on how we are performing, what we aspire to achieve, and
how we can progress towards those goals, with the support of our leads.</p>
<p>As a leader, I’ve spent a great deal of time working with Software Engineers on their
development, and helping them to drive their career progression. These conversations
and discussions are usually driven by the engineer, with managers playing a guiding and
supporting role, and typically consist of self-reflection, ideation, motivation, and
the culmination of a development plan.</p>
<p>I thought that it might be helpful to share some notes on a few of the ways that we enable
growth for Engineers at Zalando.</p>
<h2>Role Expectations</h2>
<p>A standard progression for an engineer is from Junior to Mid to Senior. Unfortunately,
aside from the title, we (and I include myself from my own engineering days) are not always
completely clear on what the differences are between the levels. In order to progress as a
Software Engineer, it is imperative that we understand the expectations at each level.</p>
<p>At Zalando, all of our engineers are provided with a copy of our Software Engineering
Role Expectations. This document very clearly defines the expectations per grade across a
wide range of functional areas, such as <strong>Scope</strong>, <strong>Delivery & Impact</strong> and <strong>Community Contributions</strong>.</p>
<p>Moreover, the expectations clearly describe the requirements for advancing to the next grade.
A common activity for engineers reviewing their performance is to look at the functional areas of
their current grade and the grade above, and, with the help of their lead, to perform a RAG
(red/amber/green) assessment of their performance. This will usually shine a spotlight on areas for growth, and also
highlight strengths that should be doubled down on.</p>
<p>A concrete role expectations document is something that I would have greatly benefited from
whilst coming up as an engineer.</p>
<p><strong>Alice</strong>: <em>"Would you tell me, please, which way I ought to go from here?"</em></p>
<p><strong>The Cheshire Cat</strong>: <em>"That depends a good deal on where you want to get to."</em></p>
<p><strong>Alice</strong>: <em>"I don’t much care where."</em></p>
<p><strong>The Cheshire Cat</strong>: <em>"Then it doesn’t much matter which way you go."</em></p>
<p><strong>Alice</strong>: <em>"...so long as I get somewhere."</em></p>
<p><strong>The Cheshire Cat</strong>: <em>"Oh, you’re sure to do that, if only you walk long enough."</em></p>
<h2>Performance Reviews</h2>
<p>I mentioned in the introduction that we have recently concluded our most recent performance review.
Performance reviews of some shape and form are relatively standard practice across the industry,
but no two systems are the same.</p>
<p>Our reviews are held annually, with a half-yearly check-in*. The reviews provide an opportunity for
employees to receive rounded feedback, which incorporates inputs from their peers, stakeholders, and lead.
In addition, it requires self-assessment. The self-assessment is particularly important.
We are all responsible for owning our careers.</p>
<p>The performance reviews serve to:</p>
<ol>
<li>Recognise and celebrate their contributions over the last period.</li>
<li>Identify their strengths and the areas that they shine in.</li>
<li>Highlight any development areas or blindspots.</li>
<li>Calibrate these elements relative to the aforementioned role expectations.</li>
<li>Develop a goal and milestones to work towards over the course of the next review period.</li>
</ol>
<p>I personally cherish the development areas, and love to hear where I can push myself more,
and course correct any bad habits or issues (we all have them).</p>
<p>*Growth and progression is a constant and ongoing collaboration between you and your lead, but the
actual timelines for the official review periods are annually and half-yearly.</p>
<h2>Continuous Feedback</h2>
<p>When I started out my career in engineering, one of the exciting aspects was the tight feedback loop.
Using the REPL or compiler, I could quickly validate my solution. Tight feedback loops allow us to
quickly course correct when something is wrong, but also provide a nourishing hit of endorphins when things
go well.
We apply the same tight feedback loop to the delivery of continuous feedback at Zalando.</p>
<p>One of our values is <a href="https://jobs.zalando.com/en/our-founding-mindset/">High challenge, high support</a>, which states that</p>
<p><em>Feedback is a gift. We give and receive honest and timely feedback. At the same time, we provide each
other with support, and we care about the person beyond their role.</em></p>
<p>The use of the word timely is critical here. The best time to provide feedback, especially critical,
is when the action is fresh in the mind. This is when context is plentiful and crystal clear.
My lead never waited until our next 1:1 to provide me with feedback, and this is something that I have continued.</p>
<h2>Mentoring <em>(noun)</em></h2>
<p><em>the practice of helping and advising a less experienced person over a period of time,
especially as part of a formal programme in a company, university, etc.</em></p>
<p>Mentoring is everywhere in Zalando. We have many official mentoring programmes (some are company wide,
others are nurtured within departments), and we also have many unofficial mentoring relationships.
During my tenure, I have benefitted from being a mentor, and a mentee.</p>
<p>Typically, for early stage engineers, seeking out an experienced mentor is a great way to broaden
their network, to gain experience, and to accelerate their growth. Your mentor will likely be from a
different team or business unit, so they can offer a more diverse approach to problem solving and development.</p>
<p>For our more tenured engineers, and especially those who are progressing towards Senior Engineering,
mentoring a less experienced engineer* helps to prepare you for the seniority expectations such as
coaching, guiding, providing feedback, and paving the way for a new generation.</p>
<p>*I have witnessed some success stories where engineers have mentored non-engineers and helped
them to secure their first engineering role.</p>
<h2>Personal Development Budget</h2>
<p>We provide our engineers with a healthy personal development budget, which can be used for learning materials,
educational resources, training and certifications, and the like.
Every person is unique, and whilst you might prefer to upskill using sites like Coursera, I might prefer to
read a book on a particular topic, or to join a local study group.</p>
<p>Personal development is certainly not limited to technical skills, and should also include soft-skills, and
other attributes that shape a well-rounded career. A personal example. I recently sought to improve my public
speaking skills and took an eight week online course on Presentation Skills. The course was aimed at individuals
who often need to speak to groups, and who find it uncomfortable. To my surprise, the cohort consisted of quite
a few engineering leaders.</p>
<p>Courses and activities like these can be cost-prohibitive to some, and having the investment of your company
to support you is a huge boost to your development.</p>
<h2>Missing it? Make it Happen!</h2>
<p>Another one of our values is <a href="https://jobs.zalando.com/en/our-founding-mindset/">Act like an owner</a>, which states that</p>
<p><em>“Ownership” is about being responsible to our customers, partners and colleagues, not about being entitled.
We own our destiny and are not stopped by circumstances: Zalando is what you make of it.</em></p>
<p>We are all encouraged to take ownership of our careers and development. One such example of this is the large
number of communities and groups that were founded and run by engineers. In my particular department,
I have seen people create and run React meetups, Book Clubs, Podcasts, Show & Tells, Hackathons, etc. At one point
in time, these forums did not exist - an engineer wanted to attend one, and so they took ownership and created it.</p>
<p>Founding and organising such initiatives is no small feat, and you can be sure that the creators developed many
skills along the way.</p>
<p>Organisations are ever evolving, and don’t come equipped with everything that you would like. If there’s something
that you want, then go and make it happen.</p>
<h2>Support, Support, Support.</h2>
<p>I have been incredibly fortunate to work with leaders and peers who support my growth and development. They have
provided me with open and honest feedback on what I am doing well, and of course, what I am doing not so well.</p>
<p>Growing within an organisation with such a deeply woven culture of supporting one another is surprisingly easy.
Our engineers’ growth and engagement is a top priority for our leadership cohort, and they have our full support
for unlocking their potential.
Support isn’t sugar-coated, and sometimes that means having difficult conversations, but we do this to set you up for success.</p>
<hr>
<p><em>Do you like growing engineering talent and building high performing engineering teams? Consider joining Zalando as an <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B2%5D=Applied%20Science&filters%5Bcategories%5D%5B3%5D=Product%20Design%2C%20User%20Research%2C%20Content%20Design&filters%5Bcategories%5D%5B4%5D=Product%20Management%20%28Technology%29&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B9%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B10%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B11%5D=Software%20Engineering%20-%20Principal%20Engineering&search=%22engineering%20manager%22">Engineering Manager</a>.</em></p>An Introduction to the Zalando Design System2022-07-21T00:00:00+02:002022-07-21T00:00:00+02:00Andrea Morettitag:engineering.zalando.com,2022-07-21:/posts/2022/07/an-introduction-to-the-zalando-design-system.html<p>A high level overview of the elements composing our Design System and a brief history of how we got from an idea to full adoption.</p><h1>Yet Another "What is a Design System?"</h1>
<p>There is a lot of literature and countless blog posts around the very definition of the concept of design systems. In this post, we'd like to look at it from an engineering perspective and describe the journey from the initial idea to the complete adoption here at Zalando.</p>
<p>You can also find more information about the creation process from a design point of view in <a href="https://medium.com/zalando-design/the-label-part-1-redesigning-our-visual-identity-a468cad9d6f2">this blog post</a>.</p>
<p>At its core, a Design System is a collection of specifications describing a set of design primitives, reusable components, and guidelines that ensure consistency and visual identity.
Given such a broad definition, there are no fixed rules when it comes to technical implementation, but some patterns have started to emerge in the industry.</p>
<h2>Implementation-less Design System</h2>
<p>How a Design System is implemented into a reusable library is highly influenced by the specific business use case, the technologies and frameworks used, the platforms to support, as well as team structures and company-wide processes.
In a very large company with many different products and a diverse panorama of tech stacks, providing a single solution that suits every context may become extremely difficult, if not impossible. On the other hand, visual consistency and brand identity are likely to still be a requirement.</p>
<p>A radical, but common, approach in these use cases is not providing an implementation at all.
The Design System is defined via a strict set of platform and technology agnostic definitions. Different teams/products/departments can implement their own library using the best tool for the job as long as the specifications are respected.</p>
<h2>Design Tokens</h2>
<p>Relying exclusively on a set of specifications offers more flexibility. However, as more and more implementations are developed, the problem of guaranteeing that they are in sync with the latest specs becomes increasingly hard.</p>
<p>A step toward increasing consistency without sacrificing flexibility is to provide a set of core variables and assets to be used across implementations.
Those variables, called tokens, represent all the shared values that will help us maintain consistency across our system.
Some practical examples are color palettes, spacing, typography, and assets like logos, icons, etc.</p>
<p>Design Tokens are usually maintained in a centralised place and via some tooling they are converted into different formats to be consumed by a vast array of different platforms.
Every independent implementation will use the latest version of those tokens as the only source of truth for the core variables and assets used.
With such a setup, we can quickly roll out changes to Design System core elements across an arbitrary number of implementations.</p>
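<p>As a minimal sketch of this setup, the conversion step can be imagined as a small build script that reads the token definitions and emits a platform-specific format. The token names, values, and output format below are invented for illustration; they are not Zalando's actual tokens or tooling.</p>

```python
# Illustrative token set; the real tokens live in a centralised repository.
tokens = {
    "color.primary": "#1a1a1a",
    "spacing.m": "16px",
    "fontFamily.sansSerif": "'Helvetica Neue', sans-serif",
}

def to_css_variables(tokens):
    """Render design tokens as CSS custom properties for web consumers."""
    lines = [":root {"]
    for name, value in sorted(tokens.items()):
        # Dots in token paths become dashes in CSS variable names.
        lines.append(f"  --{name.replace('.', '-')}: {value};")
    lines.append("}")
    return "\n".join(lines)

print(to_css_variables(tokens))
```

<p>The same token source could feed other emitters (e.g. for mobile platforms), which is what makes the centralised token repository a single source of truth.</p>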
<h2>The Single Component Library</h2>
<p>The term "Design System" is often used as a synonym for a component library.
While it is true that one practical implementation of a Design System is such a library, overloading the term can turn out to be counter-productive.
A lot of emphasis is given to the technicalities of how the different components are developed in a specific architecture, glossing over the Design System's core goals, which are to enforce visual consistency and identity while reducing maintenance costs.
These fundamental aspects are instead often relegated to vague concepts of default styles or custom themes.</p>
<p>The confusion of those terms is easy to understand: in many cases the one single component library is the main contact point between the Design System as a concept and its practical consumers.
Referring to this contact point with the “design system” term is an understandable shortcut.
Regardless of the terminology, we are dealing with very different concepts. For example, a Design System can exist without a component library, the same way a component library can be abstract enough to not enforce any visual identity.</p>
<h1>The Zalando Implementation for the Web Platform</h1>
<p>Our design system was initially conceived and developed roughly at the same time as its web platform implementation, which gave us the opportunity to gradually adopt certain technical decisions with a very tight feedback loop during a <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">major visual and architecture redesign</a>.
In retrospect, that was both an advantage and a disadvantage: starting from scratch gave us the freedom to make the choices based on suitable use cases without the constraints of a legacy live system.
On the other hand, the lack of a complete set of specifications led to many changing requirements that naturally caused a certain amount of refactors and duplicated work.</p>
<p>Overall it was an extremely interesting challenge and I would like to share some of the learnings and decisions we encountered on the way.
As a first step, we identified some of the functional requirements we could foresee based on past experience and current business needs.</p>
<h3>Team Autonomy</h3>
<p>A high level of autonomy has consistently been reinforced by Zalando, even after years of change and growth.
Different teams, especially on the customer-facing side, own specific parts of the experience and expect to independently develop new features without being blocked by overly centralised teams and architectures.</p>
<h3>Speed</h3>
<p>In every sense of the word, we knew that speed would be a requirement:
from the performance of the components to the ability to quickly iterate on existing implementations, provide new features, and avoid, as much as possible, becoming a bottleneck for other teams.</p>
<h3>Consistency</h3>
<p>One of the key metrics to evaluate the success of a Design System is the consistency and identity of the final customer-facing product.
From a technical perspective, there are always some trade-offs between consistency, speed, and flexibility.
While it can be complex, if not impossible, to maximize all of them, we tried to incentivize the "consistent way" by making it the easiest and fastest option whenever possible.
We still had to consider possible escape hatches for certain edge cases, but we wanted the most obvious and simple option to be the one providing the highest level of consistency.</p>
<h3>Consider Other Platforms</h3>
<p>While our main focus was to support the web platform, we decided from the beginning to identify opportunities to maintain a certain level of code sharing across platforms.
Some variables could be shared across all platforms, part of the CSS used on the website may be used for emails, some teams may want to use a different JS framework.
Those are some of the possible use cases we thought could arise at some point. While we didn’t want to over-engineer our solution based on these uncertain requirements, we tried to keep a loosely coupled architecture that would allow some of these scenarios to be addressed more easily in the future.</p>
<h2>Extended Atomic Metaphor</h2>
<p>Our web component library follows an approach loosely based on the concept of <a href="https://bradfrost.com/blog/post/atomic-web-design/">Atomic Design</a>.
The basic idea is to have different abstractions that can be built based on each other, from the most simple to the most complex. In the same way, complex living organisms are composed of simpler molecules which in turn are composed of simpler atoms and so on.
A layered approach is a natural fit for many complex and continuously evolving systems. In particular, the different speeds at which layers of varying complexity change in nature tend to be mirrored in artificial constructs like a Design System and many other complex systems.
A very interesting read that I strongly recommend on the topic is <a href="https://jods.mitpress.mit.edu/pub/issue3-brand/release/2">Pace Layering: How Complex Systems Learn and Keep Learning</a>.
For our web architecture we ended up with these different layers:</p>
<p><img alt="Ownership" src="https://engineering.zalando.com/posts/2022/07/images/atomic.png#center"></p>
<dl>
<dt><strong>Design Tokens</strong></dt>
<dd>A centralised source of truth for variables and assets that define the core of the Design System. Some examples are: colour palette, spacing, typography, fonts, icons, etc.</dd>
<dt><strong>Electrons</strong></dt>
<dd>A subset of the CSS grammar that only allows properties and values that are consistent with the specifications of the Design System. e.g. <code>paddingTop_m</code>, <code>fontFamily_sansSerif</code>, etc.</dd>
<dt><strong>Atoms</strong></dt>
<dd>A composition of electrons and/or other atoms that serve a single generic purpose and cannot be divided further without losing its functionality. E.g. the collection of electrons needed to create a button. In our implementation this is the last layer that directly uses CSS.</dd>
<dt><strong>Molecules</strong></dt>
<dd>A composition of atoms and/or other molecules forming a single generic component. An example could be the React implementation of different button types ready to be used as a package. At this level there should not be any business logic and emphasis is given to reusability and consistency with design specifications. For example, most of these molecules will also be available as components in the shared designer library.</dd>
<dt><strong>Organisms</strong></dt>
<dd>A composition of molecules, atoms, and/or other organisms to fulfill a specific business use case. They are not part of the core component library and are owned by different teams owning the specific feature they enable.</dd>
</dl>
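<p>To make the electron layer more concrete, here is a hedged sketch of how token-backed utility classes such as <code>paddingTop_m</code> could be generated. The class-name scheme comes from the examples above; the token values and the generation approach are assumptions for illustration only.</p>

```python
# Illustrative spacing tokens; the real values come from the Design Tokens layer.
spacing_tokens = {"s": "8px", "m": "16px", "l": "24px"}

def electron_classes(name, css_property, tokens):
    """Generate one utility class per token value. Because only token-backed
    values exist, arbitrary CSS cannot leak into the atoms built on top."""
    return "\n".join(
        f".{name}_{size} {{ {css_property}: {value}; }}"
        for size, value in tokens.items()
    )

print(electron_classes("paddingTop", "padding-top", spacing_tokens))
```

<p>Constraining the electron layer to a generated subset of CSS is what makes the "consistent way" also the easiest way: a value outside the token set simply has no class name.</p>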
<p>Consistent with the natural-world analogy, elements belonging to the simpler layers, like electrons and atoms, tend to be stable and only very rarely receive updates, for example during a major redesign every few years.
On the other hand as the complexity of the layer increases, changes happen more and more frequently.
Based on this expected behaviour, we shaped our architecture in order to optimise for:</p>
<ol>
<li>very frequent changes in organisms</li>
<li>occasional changes in molecules</li>
<li>very rare changes in atoms and electrons.</li>
</ol>
<p>We were also able to use these assumptions as a technical leverage to maximise other dimensions like bundle size, enforced visual consistency, testing, and documentation.</p>
<h2>Contributions and Ownership</h2>
<p>In terms of tangible entities, the Zalando Design System is composed of different parts with different ownership and contribution processes in place; <a href="https://medium.com/zalando-design/zalandos-design-system-contribution-model-73ab36f8591e">this article</a> covers the details of our "contribution model" in more depth.
Here, we will focus on the parts affecting the web platform, but a similar structure can be encountered for mobile app development as well.</p>
<p><img alt="Ownership" src="https://engineering.zalando.com/posts/2022/07/images/ZDS.png#center"></p>
<dl>
<dt><strong>Design Tokens repository</strong></dt>
<dd>Owned by the larger Design System team, including designers as well as web and app engineers.</dd>
<dt><strong>Figma component library</strong></dt>
<dd>Includes a visual representation of the Design System specifications as well as a centralised component library that can be used by designers in many different teams to create screens and requirements for arbitrary features.</dd>
<dt><strong>Web component library</strong></dt>
<dd>Structured as a monorepo, it exports a single npm package for each atom, molecule and organism as well as a single highly optimised CSS bundle. The central Design System team has the ownership of the CSS layer, the atoms, the molecules, and some generic organisms.</dd>
</dl>
<p>Using GitHub code owners, different teams own specific organisms and are responsible for maintaining any business logic required.
Pull requests on code owned folders are usually faster to approve and merge as we ensure that changes on a code owned component will not affect other exported packages.</p>
<p>The only way to use CSS in organisms and molecules is via atoms; this ensures a certain amount of consistency and makes it easy to spot possible deviations from the Design System specifications.
Using a single, predictable CSS bundle and a set of React hooks and patterns, we encourage consistency and composability over one-off implementations. In return we get a very scalable library where an unlimited number of organisms will always result in the same CSS bundle size and will not affect each other's JS bundle size.</p>
<h2>Challenges and Pain Points</h2>
<p>Creating a Design System from scratch and driving its adoption in a large company was definitely challenging, from gathering the requirements from many hidden use cases to getting enough traction to refactor complex applications; it has been a journey where communication and coordination have played a major role.
Finding a technical solution able to grow and scale as fast as our requirements was also a challenge.</p>
<p>While the system has been running relatively smoothly for more than 2 years and the adoption rate is close to 100%, there are some long-lasting pain points and possible areas for future improvements.</p>
<h3>Fragmented Ownerships</h3>
<p>Finding the right owner for specific common components like product cards, carousels, banners, etc. is extremely difficult from an organisational point of view.
Even when an owner is found, it is hard to prevent some conflicts and overlaps of responsibilities.
For example, multiple variations of the same components start to appear with different ownerships, features that require coordination across certain premises need the involvement of different teams, and the discoverability of what is currently available becomes a crucial requirement.</p>
<h3>Coupling with Deployments</h3>
<p>In software engineering, it is usually considered a best practice to group things that change together.
Currently, a new version of the component library and a new version of the live customer-facing application are handled by different pipelines and the codebases live in different repositories.
Although having independent releases and a platform-agnostic pipeline may be convenient, we cannot ignore the reality of having one main consumer. In this case, a solution involving a larger monorepo may help with the bottleneck problem created by the need to keep versions in sync.</p>
<h2>Conclusions</h2>
<p>A Design System tends to behave like most complex systems.
Different layers evolve and stabilize at different paces, with a slow-changing core and fast iterations on the edges.
The biology metaphor fits these behaviors quite well and was popularized by atomic design.
Porting those complexity layers into a technical implementation was not always straightforward, but overall a good decision.</p>
<p>Code can be observed with the same curiosity we have when looking at nature.
Identifying the boundaries between different layers and their relationships is the key to controlling the complexity involved.
While, to some extent, exceptions will always exist, knowing what parts of the system are stable, which ones are changing fast, and how they affect each other is a powerful tool.
The architecture and processes around a Design System can be shaped around these characteristics in order to optimize for fast iterations on the edge layers and stability on the core ones.
Embracing the chaotic nature of changes while learning and observing the larger patterns at play is the key to achieving long-term stability and a healthy evolution process.</p>International Women in Engineering Day (June 23rd)2022-06-23T00:00:00+02:002022-06-23T00:00:00+02:00Anja Bergnertag:engineering.zalando.com,2022-06-23:/posts/2022/06/international-women-in-engineering-day-23-june.html<p>We’re celebrating International Women in Engineering Day by talking to three senior Zalando Women in Tech.</p><p>What were the biggest learnings in your career so far? And what advice would you give your younger self today? How do you get ahead in your career? We’re celebrating <strong><a href="https://www.inwed.org.uk/about/">International Women in Engineering Day</a></strong> by talking to three senior Zalando Women in Tech: <a href="https://www.linkedin.com/in/mahak-swami-5a404029/">Mahak Swami</a>, Engineering Manager; <a href="https://www.linkedin.com/in/florianegramlich/">Floriane Gramlich</a>, Director of Product Payments; and <a href="https://www.linkedin.com/in/anapeleteiro/">Ana Peleteiro Ramallo</a>, Head of Applied Science. We caught up with them during the Women in Tech Global Conference 2022 — let’s find out their advice!</p>
<h4>What’s the best thing about your job?</h4>
<p><img alt="Photo of Mahak" src="https://engineering.zalando.com/posts/2022/06/images/mahak.jpg#right"></p>
<p><strong>Mahak:</strong> In my team, we build products for the Zalando mobile app. The best thing is the technical challenges: working on them and solving them.</p>
<p><strong>Floriane:</strong> I have an incredible team who I love to work with – it’s fun, but it’s also inspiring. Also, I work in payments, which is all about customer convenience: Ultimately, if I don’t do my job right, then people can’t pay, so I love that I’m making a difference.</p>
<p><strong>Ana:</strong> The best thing about my job is that I get to work on super-interesting topics, and with really amazing and interesting colleagues.</p>
<h4>Looking back at your career, what’s your tip for fostering a more inclusive environment?</h4>
<p><strong>Mahak:</strong> It’s really important that everyone’s opinions are considered when you’re solving a problem. An engineer could bring equally important input to the design, and vice versa. Everybody needs to bring their own values to the table, so that we can find the best solutions to the problem.</p>
<p><strong>Floriane:</strong> Being yourself is super-important. That means accepting who you are, and not trying to imitate somebody else. Because, if you can’t be true to yourself, how can you be true to others?</p>
<p><strong>Ana:</strong> The first thing is to make people aware when there is not an inclusive environment. Many times people want to be inclusive, and don’t realise there’s a problem.</p>
<h4>What’s the best professional advice you’ve ever received?</h4>
<p><img alt="Photo of Floriane" src="https://engineering.zalando.com/posts/2022/06/images/floriane.jpg#right"></p>
<p><strong>Mahak:</strong> The best advice I’ve had was around executive presence: To speak about my work and represent it just as well as I was doing it. A lot of women don’t advocate for the work they’re doing. That’s one thing I’d definitely push for.</p>
<p><strong>Floriane:</strong> So, the worst advice I ever received was, ‘Don’t be too ambitious’. I was told that a LOT in previous companies, in almost every performance talk. It’s terrible advice and I wonder if a man would be told the same thing. Now, it’s really important to me to be that multiplier for my teams, I say: Be ambitious!</p>
<p><strong>Ana:</strong> The best professional advice I ever got was, ‘If you want something, just go and get it’. Because many times we doubt ourselves, but it’s about wanting to get something and having a plan for how to get it.</p>
<h4>What advice would you give your younger self?</h4>
<p><strong>Mahak:</strong> Try out as many things as you can in your career. It’s very important to figure yourself out. Don’t be afraid to find out what clicks for you as a professional.</p>
<p><strong>Floriane:</strong> Know what you want. Say what you want. Do what you want. And stand true to that. It’s super-important to invest in self-reflection quite early. You need to really understand who you are.</p>
<p><strong>Ana:</strong> What I learned is to be really proactive and never stop learning. Continuous learning will help you to grow.</p>
<h4>What other tips would you give to women starting their career in STEM?</h4>
<p><img alt="Photo of Ana" src="https://engineering.zalando.com/posts/2022/06/images/ana.jpg#right"></p>
<p><strong>Mahak:</strong> In general, women have this perception of tech: that it isn’t a place for them, and perhaps it’s difficult to get into. But that’s not the case. Tech is very logical, a lot of fun and now very inclusive too. When I started my career, I was often the first and only woman on the team. But now that’s not the case. You will have company and you will have fun – try it!</p>
<p><strong>Floriane:</strong> Be curious and don’t let other people tell you what you can or can’t do. On a more practical level, look for role models (there are lots out there), find yourself a mentor, build your network, and really learn from others. Getting advice from outside your usual zone is very powerful.</p>
<p><strong>Ana:</strong> Never allow anyone to tell you what you can or can’t do. You’re the only one who knows your goals and what you want to achieve. Also, there are no things for girls or things for boys – there’s only things you like. So, if there’s something you like, go ahead and enjoy it.</p>
<p>Learn more about <a href="https://www.inwed.org.uk/about/">International Women in Engineering Day</a> and for more inspiration, check out our three Zalando speakers at the recent <a href="https://www.youtube.com/results?search_query=womentech+network+zalando">Women in Tech Global Conference</a>.</p>
<p><em>Curious about Zalando? Learn more about <a href="https://jobs.zalando.com/en/tech">how our teams rewrite the rules of fashion</a>, by building a sustainable fashion platform that is inclusive by design.</em></p>Accelerate testing in Apache Airflow through DAG versioning2022-06-10T00:00:00+02:002022-06-10T00:00:00+02:00Hilmi Yildirimtag:engineering.zalando.com,2022-06-10:/posts/2022/06/accelerate-apache-airflow-testing-through-dag-versioning.html<p>In this blog post we present a way to version your Airflow DAGs on a single server through isolated pipeline and data environments to enable more convenient simulation and testing.</p><h1>Introduction</h1>
<p>In the Performance Marketing department, we run paid advertisement campaigns for Zalando. To do so,
we build services that allow us to manage campaigns, optimize and distribute content,
and measure the performance of the campaigns at scale.</p>
<p>Talking about measurement, one of the core systems we’ve built and continuously extended over
the years is our so-called marketing ROI (return on investment) pipeline. The ROI pipeline is
a batch based data- and machine learning pipeline powered by Databricks Spark and orchestrated
by Apache Airflow. It consists of various sub-pipelines (components), some of which are built
using our Python SDK <a href="https://www.linkedin.com/pulse/building-ml-workflows-zalando-zflow-s%C3%A1nchez-fern%C3%A1ndez/">zFlow</a>. Examples of such components are our input data preparation,
marketing attribution model, and an incremental profit forecast for our campaigns.
These components are owned and developed by different cross-functional
teams (applied science, engineering, product) within Performance Marketing.
You can read more about the way we measure campaign effectiveness from a functional perspective in our previous <a href="https://engineering.zalando.com/posts/2019/02/effectiveness-online-marketing.html">blog post</a>.</p>
<h1>Problem Statement</h1>
<p>A recurring problem we faced during the development relates to the nature of the marketing
ROI, which lacks a ground truth<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. While we often have assumptions about what
impact a change to the input data or to our components will have on the ROI, we need to run the new
version of the ROI pipeline end-to-end to confirm them. Since different
teams are working on different components of the ROI pipeline in parallel, evaluating the
impact of a change on the final ROI in isolation is required to work effectively
(i.e. teams not blocking each other). The following section explains the problem in more depth.</p>
<p>As mentioned earlier, we are using Airflow to orchestrate the overall pipeline. The Airflow
code is stored in a github repository. We have two servers, production and test. When a pull
request is opened, the Airflow pipeline is deployed to the test server. On merge to the main
branch, we deploy to the production server. In this setup, we have two so-called pipeline
environments, a production (live) and a test environment. The live pipeline uses the live
data environment while the test pipeline uses the test data environment. As our data layer,
we’re mainly using AWS S3 with data organized as Spark tables. A set of Spark
tables represents a data environment. Only one version of an Airflow DAG such as our marketing
ROI pipeline can exist in each environment. When multiple features are developed at the same time,
they have to share the test environment which oftentimes leads to conflicts since testing in
isolation is not possible. Alternatively, the features can be tested sequentially which leads
to delays. To solve the problem, we implemented a mechanism to enable a flexible number of
Airflow environments. Moreover, we also developed a script to spin up new data environments.</p>
<p>Figure 1 depicts the relationship between a pipeline and data environments.</p>
<p><img alt="Environments" src="https://engineering.zalando.com/posts/2022/06/images/overview_environments.jpg#center"></p>
<figcaption style="text-align:center">Figure 1: Environments</figcaption>
<h2>Pipeline Environment</h2>
<p>A pipeline environment is a version of a pipeline (set of Airflow DAGs) deployed to an Airflow server on which it can
run end-to-end. Each environment contains all DAGs necessary to produce the required output
(e.g. marketing ROI in our case), so multiple environments can co-exist on one server and can be used independently.</p>
<h2>Data Environment</h2>
<p>A data environment is a set of Spark/Hive databases, tables and views. A pipeline environment uses a single
data environment for reading and writing data.</p>
<h1>Airflow Environments</h1>
<p>Our main objective was to create, as soon as a pull request is
opened, a new Airflow environment on which the developed version of the pipeline can be tested in isolation.
The most straightforward way would be to create a new Airflow server for every pull request, but that
would be time-consuming and costly. For example, Amazon Managed Workflows for Apache Airflow (MWAA)
needs up to 30 minutes to create a new Airflow server and you have to pay for additional resources.
With our solution, a new environment is created on the existing test server once a pull request is
opened, resulting in multiple environments on the same Airflow server. The creation of a
new environment takes less than one minute.</p>
<p>Figure 2 shows what this could look like on the test server. We have 2 Airflow DAGs
<code>qu.test_dag</code> and <code>qu.test_dag_2</code> with three different environments: <code>feature1</code>, <code>feature2</code>
and <code>feature3</code>. "qu" is the name of an internal team at Zalando; DAG ids always carry the team name as a prefix.
This means that the same DAGs were adapted and deployed through three separate pull requests.</p>
<p><img alt="Airflow Environments" src="https://engineering.zalando.com/posts/2022/06/images/airflow_environments.jpg#center"></p>
<figcaption style="text-align:center">Figure 2: Airflow Environments</figcaption>
<p><br/></p>
<p>When the corresponding pull request is closed, the environment will be deleted automatically.
How did we implement this, given that the concept of environments does not exist in Airflow?
We adjusted the source code of the Airflow library and developed a cron job
that deletes stale environments. The following sections explain the necessary modifications.</p>
<h2>Deploying Airflow code as a zip file</h2>
<p>The Airflow code is deployed as a single zip archive using the
<a href="https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#packaging-dags">Packaging DAGs</a>
feature. This feature prevents dependency conflicts because every deployment only uses
the dependencies which are defined in the same zip file.
The zip file has the name of the branch from which we are deploying. For example, when
we deploy the Airflow code from branch feature1, the zip file is called <code>feature1.zip</code>.</p>
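<p>As a rough sketch of this packaging step (the file layout and contents are illustrative, not our actual deployment code), the branch-named archive can be produced with the standard library:</p>

```python
import zipfile

def package_dags(branch, dag_sources):
    """Write DAG sources into '<branch>.zip', following the naming
    convention described above (e.g. feature1.zip for branch feature1)."""
    archive = f"{branch}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for arcname, source in dag_sources.items():
            zf.writestr(arcname, source)
    return archive

print(package_dags("feature1", {"qu/main/file.py": "# dag definition"}))  # -> feature1.zip
```

<p>Because each archive carries its own dependencies, two branches can pin different versions of the same library without conflicting on the shared server.</p>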
<h2>Use correct Jinja Paths</h2>
<p>A problem that occurs when using a zip file is that Jinja templates for files no longer
work. Jinja detects the absolute path of the template correctly, but the file cannot
be read because it is inside a zip archive. For this reason, we also deploy the unpacked archive contents
in a different location. Inside the <code>dag.py</code> file (see Figure 3, lines 13-19) we add the
location of the unpackaged files to the template search path. As a result, jinja now
searches for templates inside the unpackaged folder.</p>
<h2>Renaming Dag Ids</h2>
<p>On one Airflow server, it’s not possible to create multiple DAGs with the same id.
Therefore, we have to rename the DAG ids for every deployment. To do so, we adapted
the <code>dag.py</code> file of the Airflow library (see Figure 3), which contains the DAG class. Inside
the init method, we check the file path of the Python file that is initializing the DAG.
The path contains the name of the zip file, e.g. <code>feature1.zip</code>. This way we can differentiate
the environments. We modify the original DAG id and inject the environment name
(see Figure 3, lines 3-11). Furthermore, we add the environment name as a tag to enable
filtering on environments.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="n">DAG</span>():
…
<span class="n">def</span> <span class="n">__init__</span>(...):
<span class="c1"># /usr/local/airflow/dags/feature1.zip/qu/main/file.py</span>
<span class="n">file_path</span> = <span class="n">get_path_of_file_which_initialized_dag</span>()
<span class="c1">#feature1</span>
<span class="n">feature_name</span> = <span class="n">get_zip_file_name</span>(<span class="n">file_path</span>)
        <span class="n">dag_id</span> = f"{<span class="n">team_name</span>}.{<span class="n">feature_name</span>}.{<span class="n">dag_id</span>.<span class="nb">split</span>(f'{<span class="n">team_name</span>}.')[<span class="mi">1</span>]}"
<span class="n">tags</span>.<span class="nb">append</span>(<span class="n">feature_name</span>)
<span class="c1"># /usr/local/airflow/features/feature1/</span>
<span class="n">feature_dir_path</span> = <span class="n">get_feature_dir_path</span>(<span class="n">file_path</span>)
<span class="n">template_searchpath</span>.<span class="nb">add</span>(<span class="n">feature_dir_path</span>)
<span class="c1"># /usr/local/airflow/features/feature1/qu/main/</span>
<span class="n">feature_file_path</span> = <span class="n">get_feature_file_dir_path</span>(<span class="n">file_path</span>)
<span class="n">template_searchpath</span>.<span class="nb">add</span>(<span class="n">feature_file_path</span>)
…
</code></pre></div>
<figcaption style="text-align:center">Figure 3: Pseudo Code of adapted dag.py</figcaption>
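<p>Stripped of Airflow internals, the renaming logic of Figure 3 can be sketched as a pair of plain functions. The paths and team name mirror the examples above; the real patch lives inside the DAG class's init method.</p>

```python
import os

def environment_from_path(file_path):
    """Return the zip (environment) name from a DAG file path, or None,
    e.g. '/usr/local/airflow/dags/feature1.zip/qu/main/file.py' -> 'feature1'."""
    for part in file_path.split(os.sep):
        if part.endswith(".zip"):
            return part[: -len(".zip")]
    return None

def rename_dag_id(dag_id, team_name, file_path):
    """Inject the environment name after the team prefix, as in Figure 3."""
    env = environment_from_path(file_path)
    if env is None:  # not deployed from a zip: keep the original id
        return dag_id
    suffix = dag_id.split(f"{team_name}.", 1)[1]
    return f"{team_name}.{env}.{suffix}"

path = "/usr/local/airflow/dags/feature1.zip/qu/main/file.py"
print(rename_dag_id("qu.test_dag", "qu", path))  # -> qu.feature1.test_dag
```

<p>With ids rewritten this way, <code>qu.test_dag</code> deployed from three branches appears as three distinct DAGs on one server, matching Figure 2.</p>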
<h2>Environment Cleanup</h2>
<p>We have developed a cron job that checks the status of pull requests. Once a pull request is
closed, the corresponding environment is deleted on the Airflow server. The job deletes the
zip file and the folder which contains the unpackaged files. Then, it queries the Airflow
metastore for all associated DAGs and deletes them via Airflow cli.</p>
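<p>The core decision made by the cron job can be sketched as a pure function over branch names. The GitHub querying and the Airflow CLI calls are omitted, and the branch names are illustrative.</p>

```python
def environments_to_delete(deployed_envs, open_pr_branches):
    """Deployed environments whose branch no longer has an open pull request
    are due for deletion (zip file, unpacked folder, and DAG metadata)."""
    return sorted(set(deployed_envs) - set(open_pr_branches))

deployed = ["feature1", "feature2", "feature3"]
open_prs = ["feature1", "feature3"]
print(environments_to_delete(deployed, open_prs))  # -> ['feature2']
```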
<h1>Data Environments</h1>
<p>Every Airflow environment also requires a data environment, otherwise conflicts on the data
layer could occur during parallel feature development. Our data is mainly organized as Spark
databases stored on S3. A data environment is a set of Spark databases with a corresponding
suffix, e.g. all databases of the live environment have the suffix <code>_live</code>. The DDLs of our
databases and tables are stored in a git repository. We developed a script which uses the
DDLs to create a new data environment (see Figure 4). The databases have the environment
name as a suffix, e.g. <code>db_attribution_feature1</code>.</p>
<p><img alt="Data Environments" src="https://engineering.zalando.com/posts/2022/06/images/data_environments.jpg#center"></p>
<figcaption style="text-align:center">Figure 4: Create new Data Environment</figcaption>
<p><br/></p>
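As a sketch of what that script does, the environment suffix can be applied by rewriting database names in the stored DDLs; the helper and the DDL shape are illustrative assumptions, not the actual script:

```python
import re

# Illustrative sketch: rewrite database names in a stored DDL so a new
# environment gets suffixed databases (db_attribution -> db_attribution_feature1).
def ddl_for_environment(ddl: str, databases: list[str], env: str) -> str:
    for db in databases:
        # Whole-word replacement so similarly prefixed names are untouched.
        ddl = re.sub(rf"\b{re.escape(db)}\b", f"{db}_{env}", ddl)
    return ddl

print(ddl_for_environment("CREATE DATABASE db_attribution",
                          ["db_attribution"], "feature1"))
# CREATE DATABASE db_attribution_feature1
```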
<p>A new data environment is initially empty, i.e. the databases do not contain any data.
We could copy the data, but that costs time and money. A more elegant approach is the table
environment feature, which we implemented with the data environment script. Instead of copying
data, the script creates a view pointing to the respective test data (see Figure 5).
Table environments are defined in a configuration file which is automatically created
via the table environment script. The script uses information about input and output
tables of all tasks which are predefined as yaml files. An example table environment
configuration is <code>db_attribution.m_events:TEST</code>, resulting in the creation of the following
example view.</p>
<div class="highlight"><pre><span></span><code>CREATE VIEW db_attribution_feature1.m_events AS
SELECT * FROM db_attribution_test.m_events
</code></pre></div>
<figcaption style="text-align:center">Figure 5: Creating a view instead of copying data</figcaption>
<p><br/></p>
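The mapping from a table environment entry to the view in Figure 5 can be sketched as follows; the helper is hypothetical, following the entry format shown in the text:

```python
# Illustrative sketch: turn a table environment entry such as
# "db_attribution.m_events:TEST" into the view DDL for an environment.
def view_ddl(entry: str, env: str) -> str:
    table, source_env = entry.split(":")
    db, name = table.split(".")
    return (
        f"CREATE VIEW {db}_{env}.{name} AS "
        f"SELECT * FROM {db}_{source_env.lower()}.{name}"
    )

print(view_ddl("db_attribution.m_events:TEST", "feature1"))
# CREATE VIEW db_attribution_feature1.m_events AS SELECT * FROM db_attribution_test.m_events
```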
<p>A view is only created if the table is not used as output by one of the respective tasks.
In some cases you need initial data for tables which are used as output. Therefore, the
table environment script creates a configuration stub for these tables like this:</p>
<div class="highlight"><pre><span></span><code>db_attribution.m_events:
partitions:
<span class="k">-</span> date BETWEEN "x" AND "y"
</code></pre></div>
<p>If you define the partition ranges and execute the data environment script, it creates the
table and copies the data for you.</p>
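Filling such a stub could translate into a copy statement along these lines; this is a hypothetical helper, not the actual script:

```python
# Illustrative sketch: build the statement that materialises an output table
# in the environment, copying only the configured partition ranges.
def copy_ddl(table: str, env: str, source_env: str,
             partition_filters: list[str]) -> str:
    db, name = table.split(".")
    where = " AND ".join(partition_filters)
    return (
        f"CREATE TABLE {db}_{env}.{name} AS "
        f"SELECT * FROM {db}_{source_env}.{name} WHERE {where}"
    )

# The partition placeholders "x" and "y" mirror the stub in the text.
print(copy_ddl("db_attribution.m_events", "feature1", "test",
               ['date BETWEEN "x" AND "y"']))
```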
<h1>Summary</h1>
<p>In this blog post we presented how we enabled versioning of our performance marketing pipeline,
which is based on Apache Airflow. Versioning is necessary to enable more convenient simulation
and testing. We modified the Airflow DAG class and used the Packaging DAGs feature of Apache Airflow
to make it possible to have multiple versions of the same DAGs on a single server. This allows us
to deploy a git branch of Airflow DAGs directly to a single Airflow server, where they
can run isolated from other versions. The deployment takes less than 1 minute, compared to up to
30 minutes when creating a new Airflow server for each deployment. To enable isolation on the data
level, we implemented a script which spins up a new Data Environment consisting of Spark/Hive
tables on S3. As a result, every pipeline version can use a dedicated Data Environment.</p>
<hr>
<p><em>Are you interested to join us to work on such problems and many more business related challenges?
Apply as a <a href="https://jobs.zalando.com/en/jobs/3932595/?gh_jid=3932595">Senior Data/ML Engineer at Zalando’s performance marketing</a> department.</em></p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>This is simplified; ultimately we consider the results of our a/b tests as ground truth.
Yet, a/b tests are only run in certain periods of the year and are also used to correct our marketing
attribution results in-between a/b test periods. Here, due to internal and external factors
such as spend changes or campaign efficiency changes, the ground truth could in fact have changed
as well. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Operation-Based SLOs2022-04-28T00:00:00+02:002022-04-28T00:00:00+02:00Pedro Alvestag:engineering.zalando.com,2022-04-28:/posts/2022/04/operation-based-slos.html<p>Zalando developed a new type of SLOs to monitor the critical aspects of its business which is based on Operations. This blog post describes how that framework works, and how it contributes to healthier on-call rotations.</p><p><img alt="Zalando's 2019 Cyber Week Situation Room" src="https://engineering.zalando.com/posts/2022/04/images/preview-image.jpg#previewimage"></p>
<p>Anyone who has been following the topic of Site Reliability Engineering (SRE)
has likely heard of <a href="https://sre.google/sre-book/service-level-objectives/">Service Level Objectives (SLOs)</a>,
and Service Level Indicators (SLIs). SLIs and SLOs are at the core of the SRE
practices. They are fundamental to establish the balance between building new
features on a product, shipping fast, or working on the reliability of that
product. But they are not easy to get right. Zalando has gone through different
iterations of defining SLOs, and we’re now in the process of maturing our latest
iteration of SLO tooling. With this iteration, we are addressing fragmentation
problems that are inherent to service based SLOs in highly distributed
applications. Instead of defining reliability goals for each microservice, we
are working with SLOs on Critical Business Operations that are directly related
to the user experience (e.g. <em>"View Catalog"</em>, <em>"Add Item to Cart"</em>), rather
than a specific application (Catalog Service, Cart Service). In this blog post
we’re going to present our Operation Based SLOs, how we define them, the tooling
around them, how they are part of our development process, and also how they
contributed to a healthier on-call.</p>
<h2>The first iterations of defining SLOs</h2>
<p>To understand where we are right now, it’s important to understand how we got
here. When <a href="https://engineering.zalando.com/posts/2021/09/sre-journey-part1.html">we introduced SRE in Zalando back in 2016</a>
we also introduced SLOs. At the time, we went with service based SLOs. Each
microservice would have SLOs on whatever SLIs service owners defined (usually
availability and latency), and they would get a weekly report of those SLOs,
through a <a href="https://github.com/zalando-zmon/service-level-reporting">custom tool</a>
that was tightly coupled with our homebrew monitoring system.</p>
<p><img alt="Service Level Reporting tool" src="https://engineering.zalando.com/posts/2022/04/images/slr-report.png#center"></p>
<figcaption style="text-align:center">Service Level Reporting tool</figcaption>
<p><br/></p>
<p>As these were new concepts in the company, we ran multiple workshops across the
company for Engineers and Product Managers to train them on the basics and to
kick-start the definition of SLOs across all engineering teams.
Product Managers and Product Owners started to get unexpected questions from
other peers and engineers:</p>
<ul>
<li>"What is the desired level of service you wish to provide to your customers?"</li>
<li>"How fast should your product be?"</li>
<li>"When is the customer experience degraded to an unacceptable level?"</li>
</ul>
<p>The last one was particularly relevant for services that have different levels
of graceful degradation. Say the service cannot respond in the ideal way; it
uses its first fallback strategy that is still "good enough" so we consider it a
success. But what if that first fallback also fails? We can use a second
fallback just so we don’t return an error, but maybe that is no longer a
response of acceptable quality. Even though the response was successful from the
client’s perspective, we still count it as an error.
What was particularly interesting about this thought process was that it created
a break from defining availability exclusively based on HTTP status codes (where
failure is anything in the 5xx range). It’s good to keep this reasoning in mind,
as it will be useful further down.</p>
<p>SLOs saw increasing adoption across the company, with many services having
SLOs defined and collected. This, however, did not mean that they were living up
to their full potential, as they were still not used to help balance feature
development and improving reliability. In a microservice architecture, a product
is implemented by multiple services. Some of those services contribute to
multiple products. As such, Product Managers had a hard time connecting the
myriad of SLOs and their own expectations for the products they are responsible
for. Because SLOs are on a microservice level, the closest manager would be on
the team level. Taking into consideration the previous point that a product is
implemented by multiple services, aligning the individual SLOs for a single
product would mean costly cross-team alignment. Raising the SLO discussion to a
higher management level would also be challenging, as microservices are too fine
grained for a Head or Director to be reviewing. <strong>The learning at this stage
was that the boundaries of Products did not match individual microservices.</strong></p>
<p><img alt="Service landscape and products" src="https://engineering.zalando.com/posts/2022/04/images/service-landscape-and-products.png#center"></p>
<figcaption style="text-align:center">In this service landscape we see that products can share individual services</figcaption>
<p><br/></p>
<p>We later tried to add additional structure to the existing SLOs. One of the
challenges we had with service based SLOs was the sheer amount of services that
had to be measured and monitored for their SLOs. Realistically speaking, they
could not all have the same level of importance. To ensure teams focused on what
mattered the most, a system of Tier classifications was developed - Tier 1 being
most critical and Tier 3 being least critical. With each service properly
classified, teams knew what they should be keeping a close eye on. Having the
Tier definition also allowed us to set canonical SLOs according to an
application's tier classification. Our tooling evolved to keep up with these
changes.</p>
<p>To summarise, our experience with service based SLOs struggled to overcome the
following challenges:</p>
<ol>
<li><strong>High number of microservices.</strong> The more there are, the more SLOs teams have to monitor, review, and fine tune.</li>
<li><strong>Mapping microservice SLOs to products and their expectations.</strong> When products use different services to provide the end-user functionality and with some services supporting several products, SLOs easily conflict with each other.</li>
<li><strong>SLOs on a fine grained level made it challenging for management to align on them.</strong> When dealing with SLOs on such a granular level as micro services, Management support beyond the team level is difficult to get. And within the team level, it requires costly cross-team alignment.</li>
</ol>
<h2>Symptom Based Alerting</h2>
<p>In our role as SREs we were in frequent contact with different teams, helping
them with PostMortem investigation, or reviewing their monitoring (what metrics
were collected and paging alerts that were set up).
While teams were quick to collect many different metrics, figuring out what to
alert on was a more challenging task. The default was to alert on signals that
<em>could</em> indicate a wider system failure ("Load average is high", "Cassandra node
is down"). Knowing the right thresholds to alert on was another challenge. Too
strict, and you’re being paged all the time with false positives. Too relaxed,
and you’re missing out on potential customer impacting incidents. Even figuring
out whether the alert always translates to customer impact was also tricky at
times. All of this led us to push for a different alerting strategy: <strong>Symptom
Based Alerting</strong>.</p>
<p>You can find more details about Symptom Based Alerting in the <a href="https://github.com/zalando/public-presentations/blob/master/files/2019-05-16_alerting_monitoring_and_all_that_jazz.pdf">slides of one of
the talks</a>
we did on this topic. But the main message of that talk is that there are some
parallels between SLOs and Symptom Based Alerts. Namely, about <em>what</em> makes a
good SLO, or a symptom worth alerting, and <em>how</em> many SLOs and alerts you should
have.
Both SLOs and Symptom based alerts should be focused on key customer experiences<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup><sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup><sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>
by defining alerts and SLOs on signals that represent those experiences. Those
signals are stronger when they are measured closer to the customer, so we should
measure them on the edge services.
There are benefits to keeping both alerts and SLOs at a low number<sup id="fnref2:2"><a class="footnote-ref" href="#fn:2">2</a></sup><sup id="fnref2:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.
Focusing on the customer experience, rather than all the services and other
components that make up that experience helps ensure that. By alerting on
symptoms, rather than potential causes for issues, we can also identify issues
in a more comprehensive way<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup>, as anything that may negatively affect the
customer experience will be noticed by the signal at the edge.</p>
<p>Let's see how this works in practice by taking the following SLO as an example:
<em>"Catalog Service has 99.9% availability"</em>. Let's assume Catalog Service is an
edge service responsible for providing our customers with the catalog information,
its categories, and the articles included in each category. If that service is
not available, customers cannot browse the Catalog. Because it is an edge
service it can fail due to issues in any of the downstream services. That, in
turn, would negatively affect the availability SLO. Any breach of the SLO means
that the customer experience is also affected.
Due to the connection between the SLO's performance and the customer experience
we come to the conclusion that the degradation of the SLI <em>"Catalog Service
availability"</em> is a symptom of a degraded customer experience. The SLO sets a
threshold after which that degradation is no longer acceptable, and immediate
action is required. Or in other words, <strong>we should page when our SLO is missed,
or in danger of being missed.</strong></p>
<p>From this we derived the following formula:</p>
<p><center>
<em>Service Level Objective = Symptom + Target</em>
</center>
<br/></p>
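A minimal sketch of this formula in code, with illustrative names and numbers:

```python
# Sketch of "Service Level Objective = Symptom + Target": the symptom is an
# SLI measured at the edge, and the target turns it into an alert threshold.
# All values and names here are illustrative.
def availability(successes: int, total: int) -> float:
    """The SLI: share of successful operations, as a percentage."""
    return 100.0 * successes / total

def slo_breached(sli_value: float, target: float) -> bool:
    """Page when the measured symptom falls below the agreed target."""
    return sli_value < target

sli = availability(successes=9_985, total=10_000)   # 99.85%
print(slo_breached(sli, target=99.9))  # True -> the SLO implies the page
```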
<p>Essentially, we wanted to capture high level signals (or symptoms) that
represented customer interactions. These signals could be captured at the edge
services that communicate with our customers. If those signals degraded, then
the customer experience degraded. Regardless of whatever it was that caused that
degradation. If we couple that with an SLO, then, following the formula above,
we get our alert threshold implicitly.</p>
<p>There is an additional feedback loop between SLOs and symptom based alerts when
you couple them like that:</p>
<ul>
<li>If you get too many pages, then the respective SLO should be reviewed, even if temporarily.</li>
<li>If you get too few pages, then maybe you can raise the SLO, as you are overdelivering.</li>
<li>If you have a customer experience that is not covered by an alert, then you likely also identified a new SLO</li>
</ul>
<p>The problem with setting up alerts at those edge services, however, was that it
would always fall to the team owning those services to receive the paging
alerts and perform the initial triage to figure out what was going on.</p>
<p>While the concept seemed solid, and made a lot of sense, we were still missing
one key ingredient: <strong>how could we measure and page based on these symptoms,
without burning out the team at the edge given they'd be paged all the time?</strong></p>
<h2>Introducing Operation Based SLOs</h2>
<p>When rolling out <strong>Distributed Tracing</strong> in the company, one of the challenges
we faced was where to begin with the service instrumentation work to showcase
its value early on.
Our first instinct was to instrument the Tier 1 services (the most critical
ones). We decided against this approach because we wanted to observe requests
end-to-end, and instrumenting services by their criticality would not give us
the coverage across system boundaries we were aiming for. Also, it is relevant
to highlight that Tracing is an observability mechanism that is <strong>operation
based</strong>, so we thought that going with a service based approach would be
counter-intuitive. We then decided to instrument a complete customer operation
from start to finish. But the question then became: "Which operation(s)?".</p>
<p>Earlier, for our <a href="https://engineering.zalando.com/tags/cyber-week.html">Cyber Week</a>
load testing efforts, SREs and other experienced engineers compiled a list of
"User Functions". These were customer interactions that were critical to the
customer-facing side of our business. Zalando is an e-commerce fashion store, so
operations like <em>"Place Order"</em> or <em>"Add to Cart"</em> are key to the success of the
customer experience, and to the success of the business. The criticality
argument was also valid to guide our instrumentation efforts, so that is what we
used to decide which operations to instrument. This list became a major
influence on the work we did from then on.</p>
<p>One of the key benefits we quickly got from Distributed Tracing was that it
allowed us to get a comprehensive look at any given operation. From looking at a
trace we could easily understand what were the key latency contributors, or
where did an error originate in the call chain. As these quick insights started
becoming commonplace during incident handling, we started wondering if we could
automate this triage step.</p>
<p>That train of thought led us to the development of an alert handler called
<strong>Adaptive Paging</strong> (you can see the <a href="https://www.usenix.org/conference/srecon19emea/presentation/mineiro">SRECon talk</a>
to learn more details about Adaptive Paging). When this alert handler is
triggered, it reads the tracing data to determine where the error comes from
across the entire distributed system, and pages the team that is closest to the
problem. <strong>Essentially, by taking Adaptive Paging, and having it monitor an edge
operation, we achieved a viable and sustainable implementation of Symptom Based
Alerting</strong>.</p>
<p><img alt="Adaptive Paging" src="https://engineering.zalando.com/posts/2022/04/images/adaptive-paging.jpg#center"></p>
<figcaption style="text-align:center">Adaptive Paging will traverse the Trace and identify the team to be paged</figcaption>
<p><br/></p>
<p>But rather than going around promoting Adaptive Paging as another tool that
engineers could use to be alerted, we were a bit more selective. A single
Adaptive Paging alert monitoring an edge operation can cover all the services
in the call chain, spanning multiple teams. There is no need for every individual
team to monitor their own operations when a single alert serves the same
purpose (while being less noisy, and easier to manage). And figuring out what to
alert on was rather straightforward thanks to our list of "User Functions". We
renamed it to <strong>Critical Business Operations (CBO)</strong>, to be able to encompass
more than strictly user operations, and once again followed that list to
identify the signals we wanted to monitor. Alerts need a threshold to work, though.
Picking alert thresholds was always a challenging task. If we are talking about
an alert handler that can page any number of teams across several departments,
this becomes an even more sensitive topic that requires stronger governance.</p>
<p>Our list of CBOs was a customer centric list of symptoms that could "capture
more problems more comprehensively and robustly". And SLOs should represent the
"most critical aspects of the user experience". Basically, all we needed was a
target (which would be our alert threshold) and we would also have SLOs. <strong>CBOs
then became an implementation of Operation Based SLOs.</strong></p>
<p>Let’s take as an example <em>"Place Order"</em>. This operation is clearly critical to
our business, which is why it was one of the first to make the Critical Business
Operations list. As there are many teams and departments owning services that
are contributing to this operation, the ownership for the SLO is critical. We
chose the senior manager owning the customer experience of the Checkout and
Sales Order systems to define and be accountable for the SLO of the <em>"Place
Order"</em> operation. This also ensured that SLO had management support.
We repeated this process for the remaining CBOs. We identified the senior
managers responsible for each of the CBOs (Directors, VPs and above) and
discussed the SLOs for those operations. With each discussion we would end up
with: a CBO with an SLO signed off by senior management; and a new alert on that
same CBO that would be sure to page only on situations where customers were
truly affected.</p>
<p><strong>Our Operation Based SLOs tackled the issues we had with the service based
approach:</strong></p>
<table>
<thead>
<tr>
<th style="text-align: center;">Service Based SLOs</th>
<th style="text-align: center;">Operation Based SLOs</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">High number of SLOs.</td>
<td style="text-align: center;">A short list of SLOs, easier to maintain as changes in service landscape have no implications on the SLO definition.</td>
</tr>
<tr>
<td style="text-align: center;">Difficult mapping from services to products.</td>
<td style="text-align: center;">SLOs are now agnostic of the services implementing the Critical Business Operations.</td>
</tr>
<tr>
<td style="text-align: center;">SLOs on a fine grained level made it challenging for management to align on them.</td>
<td style="text-align: center;">Products have owners. We also changed the approach from bottom-up, to top-down to bring additional transparency to that ownership.</td>
</tr>
</tbody>
</table>
<p>There were additional benefits that came with this new strategy:</p>
<ul>
<li><strong>Longevity of the SLOs</strong> → "View Product Details" is something that has always existed in the company’s history, but as a feature it has gone through different services and architectures implementing it.</li>
<li><strong>Using SLOs to balance feature development with reliability</strong> → Before, the lack of ownership meant that teams were not clear when to stop feature development work to improve reliability should the availability decline. Now they had a clear message from the VP or Director that the SLO was a target that had to be met.</li>
<li><strong>Out-of-the-box alerts</strong> → Our Adaptive Paging alert handler was designed to cover CBOs. As soon as a CBO has an SLO, it can have an alert with its thresholds derived from the SLO.</li>
<li><strong>Transport agnostic measurements</strong> → Availability SLOs no longer need to be about 5xx rate, or using additional elaborate metrics. OpenTracing’s error tag makes it a lot easier for engineers to signal an operation as conceptually failed. This enables the graceful degradation scenario mentioned earlier.</li>
<li><strong>Understanding impact during an incident</strong> → 50% error rate in Service Foo is not easily translatable to customer or business impact, without deep understanding of the service landscape. A 50% error rate on “Add to cart” is much clearer to communicate and derive urgency of needing to be addressed immediately.</li>
</ul>
<p>SRE continued the rollout of CBOs by working closely with the senior management
of several departments agreeing on SLOs that would be guarded by our Adaptive
Paging alert handler. With this we also continued the adoption of Symptom Based
Alerting. As more and more CBOs were defined, we needed to improve the reporting
capabilities of our tooling, and developed a <strong>new Service Level Management</strong>
tool that catered to this operation based approach.</p>
<p><img alt="New SLO tool" src="https://engineering.zalando.com/posts/2022/04/images/slo-tool.png#center"></p>
<figcaption style="text-align:center">Our Service Level Management Tool (operation based - not actual data)</figcaption>
<p><br/></p>
<p>As the coverage of CBOs and their respective alerts took off, we started getting
reports that the alerts were too sensitive. Particularly, there were multiple
occasions of short lived error spikes that resulted in pages to on-call
responders. To prevent these situations, engineers started adding complex rules
to the alerts on a trial and error basis (usually using time of day, throughput,
duration of the error condition).
SRE was aiming to create alerts that required little effort from engineers to
set up, needed no fine tuning, and would not change as components and
architecture evolved. We were not there yet, but we soon
evolved our Adaptive Paging alert handler to use the <a href="https://sre.google/workbook/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts">Multi Window Multi Burn Rate</a>
strategy which uses burn rates to define alert thresholds. <strong>The Error Budget
became much more relevant with this change.</strong> The alerts went from being
triggered whenever the error rate breached the SLO, to having the decision of
whether a page should go out or not based on the rate we are burning the error
budget for an operation. This not only prevented on-call responders from being
paged by short lived error spikes, but also meant we could pick up on slowly
burning error conditions.
Because the Error Budget is derived from the SLO, it is still the SLO that made
it possible to derive the alert threshold automatically. Together with the
adaptability of Multi Window Multi Burn Rate which made it unnecessary to fine
tune alerts, this meant engineering teams required no effort to set up and manage
these alerts.
We also made sure that the Error Budget was visible in our new Service Level
Management tool.</p>
<p><img alt="Error budget view" src="https://engineering.zalando.com/posts/2022/04/images/error-budget.png#center"></p>
<figcaption style="text-align:center">Error Budget over three 28 day periods</figcaption>
<p><br/></p>
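The multi-window, multi-burn-rate decision described above can be sketched as follows; the window pairs and burn-rate factors are the SRE Workbook's example values, not Zalando's actual configuration:

```python
# Sketch of a multi-window, multi-burn-rate check in the spirit of the SRE
# Workbook strategy referenced above. Illustrative only.
def burn_rate(error_rate: float, slo_target: float) -> float:
    # How fast the error budget is being consumed: 1.0 burns it exactly
    # over the SLO period; higher values burn it faster.
    return error_rate / (1.0 - slo_target)

def should_page(rates: dict, slo_target: float) -> bool:
    # Page only if both the long and the short window burn too fast,
    # so short-lived error spikes do not wake anyone up.
    windows = [("1h", "5m", 14.4), ("6h", "30m", 6.0)]
    return any(
        burn_rate(rates[long_w], slo_target) >= factor
        and burn_rate(rates[short_w], slo_target) >= factor
        for long_w, short_w, factor in windows
    )

rates = {"1h": 0.02, "5m": 0.02, "6h": 0.004, "30m": 0.004}
print(should_page(rates, slo_target=0.999))  # True: burn rate ~20 >= 14.4
```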
<h2>Putting this model to the test</h2>
<p>Everything we described so far seems to make perfect sense. And as we explained
it to several teams, no one seemed to make any argument against it. But still,
we were not seeing the initiative gaining the momentum we expected. Even teams
that did adopt CBOs, weren’t disabling their cause based alerts. Something was
missing. We needed the data to support our claims of a better process that would
reduce false positive alerts, while ensuring wide coverage of critical systems.
That’s what we set out to do, by <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food"><em>dogfooding</em></a>
the process within the department.</p>
<p>For 3 months we put the whole flow to the test within the SRE department. We
defined and measured CBOs for our department, with their SLO targets (at the
same time demonstrating that this approach wasn’t exclusively for the use of
end-user or external customer systems). Because SRE owns the Observability
Platform our CBOs included operations like <em>"Ingest Metrics"</em>, or <em>"Query Traces"</em>.
Those CBOs were monitored by Adaptive Paging alerts. Within our weekly
operational review meeting we would look at the alerts and incidents created in
the previous week, and gradually identify which cause based alerts could be
safely disabled or not. All of this had the support of senior management,
granting engineers the confidence to take these steps.</p>
<p>By the end of that quarter we reduced the False Positive Rate for alerts within
the department from 56% to 0%. We also reduced the alert workload from 2 to 0.14
alerts per day. And we did this without missing any relevant user-facing
incidents. In the process we disabled over 30 alerts from all the teams in the
department. Those alerts were either prone to False Positives, or already
covered by the symptom based alerts.</p>
<p>One thing the on-call team did bring up was that shifts had become too calm.
They risked losing their on-call ‘muscle’. We tackled this with regular
<a href="https://sre.google/sre-book/accelerating-sre-on-call/#xref_training_disaster-rpg">"Wheel of Misfortune"</a>
sessions, to keep knowledge fresh, and review incident documentation and tooling.</p>
<h2>What's next?</h2>
<p>We are not done yet with our goal of rolling out Operation Based SLOs. There are
still more Critical Business Operations that we can onboard, for one. And as we
onboard those operations, teams can start turning off their cause based alerts
that lead to false positives.</p>
<p>And there are additional evolutions we can add to our product.</p>
<h3>Alerting on latency targets</h3>
<p>Right now, CBOs only set Availability targets. We also want CBO owners to define
latency targets. After all, our customers not only care that the experience
works, but also that it is fast.
While we already have the latency measurements, and could, technically, trigger
alerts when that latency breaches the SLO, it is challenging to use our current
Adaptive Paging algorithm to track the source of the latency increase. We don’t
want to burden the team owning the edge component with every latency alert, so
we are holding off on those alerts until a proper solution is found.</p>
<h3>Event based systems</h3>
<p>So far we’ve been focusing on direct end-customer experiences, which are served
mostly by RPC systems. There is a good chunk of our business that relies on
event based systems, and that we also want to cater for with our CBO framework.
This is quite the undertaking, as monitoring of event based systems is not as
well established as traditional HTTP APIs. Also, Distributed Tracing, the
telemetry pillar behind our current monitoring and alerting of CBOs, was not
designed with an event based architecture in mind. And the loss of the causality
property reduces the usefulness of our Adaptive Paging algorithm.</p>
<h3>Non-edge customer operations</h3>
<p>We always tried to measure customer experience as close to the edge as possible.
There are, however, some operations that are deeper in the call chain, but would
still benefit from closer monitoring. To prevent an uncontrolled growth of CBOs,
well defined criteria needs to be in place to properly identify and onboard
these operations.</p>
<h2>Closing notes</h2>
<p>Operation Based SLOs granted us quite a few advantages over Service Based SLOs.
Through this type of SLOs we were also able to implement Symptom Based Alerting,
with clear benefits for the on-call health of our engineers. And we were even
able to demonstrate the effectiveness of this new approach with numbers, after
trialling it within the SRE department.</p>
<p>But the purpose of this post is not to present a new and better type of SLOs. We
see operation based SLOs and service based SLOs as different implementations of
SLOs. Depending on your organization, and/or architecture, one implementation or
the other may work better for you. Or maybe a combination of the two.</p>
<p>Here at Zalando we are still learning as the adoption of this framework grows in
the organization. We'll keep sharing our experience when there are significant
changes through future blog posts. Until then we hope this inspired you to give
operation based SLOs a try, or that it inspires the development of a different
implementation of SLOs.</p>
<hr>
<p><em>Want to work on a one-of-a-kind product? Then <a href="https://jobs.zalando.com/en/jobs/4009171-senior-software-engineer-alerting-team-all-genders">join us at SRE</a>
to help develop this new framework.</em></p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p><a href="https://cloud.google.com/blog/products/gcp/building-good-slos-cre-life-lessons">Google Cloud Platform Blog, Building good SLOs - CRE life lessons</a> <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://prometheus.io/docs/practices/alerting/">Prometheus Best Practices</a> <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a><a class="footnote-backref" href="#fnref2:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://sre.google/sre-book/service-level-objectives/">SRE Book, Chapter 4 - Service Level Objectives</a> <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a><a class="footnote-backref" href="#fnref2:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p><a href="https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit">Rob Ewaschuk, "My Philosophy on Alerting"</a> <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Zalando's Machine Learning Platform2022-04-19T00:00:00+02:002022-04-19T00:00:00+02:00Krzysztof Szafranektag:engineering.zalando.com,2022-04-19:/posts/2022/04/zalando-machine-learning-platform.html<p>Architecture and tooling behind machine learning at Zalando</p><p>To optimize the fashion experience for 46 million of our customers, Zalando embraces the opportunities provided by machine learning (ML). For example, we use recommender systems so you can easily find your favorite shoes or that great new shirt. We want these items to fit you perfectly, so a different set of algorithms is at work to give you the best size recommendations. Our demand forecasts will ensure that everything is in stock, even when you decide to make a purchase in the middle of a Black Friday shopping spree.</p>
<p>As we grow our business, we look for innovative ideas to improve user experience, become more sustainable, and optimize existing processes. What does it take to develop such an idea into a mature piece of software operating at Zalando's scale? Let's look at it from the point of view of a machine learning practitioner, such as an applied scientist or a software engineer.</p>
<h2>Experimenting with Ideas</h2>
<p>Jupyter notebooks are a frequently used tool for creative exploration of data. Zalando provides its ML practitioners with access to a hosted version of JupyterHub, an experimentation platform where they can use Jupyter notebooks, R Studio, and other tools they may need to query available data, visualize results, and validate hypotheses. Internally we call this environment Datalab. It is available via a web browser, comes with web-based shell access and common data science libraries.</p>
<p>Because Datalab provides pre-configured access to various data sources within Zalando, such as S3, BigQuery, MicroStrategy, and others, its users don't have to worry about setting up the necessary tools and clients on their own laptops. Instead, they're ready to start experimenting in less than a minute.</p>
<p>While Datalab is well suited for prototyping and getting quick feedback, it's not always enough, especially when big data is involved. Apache Spark is much better suited for that purpose, and Zalando users can access it via Databricks. It's a well-known tool within the data science community, suitable for both experimentation via notebooks and for running large-scale data processing jobs in Spark clusters.</p>
<p>Some experiments require extra processing power, e.g. when they involve computer vision or training of large models. For these purposes, our applied scientists have access to a high-performance computing cluster (HPC) equipped with powerful GPU nodes. Using the HPC is as easy as connecting to it via SSH.</p>
<h2>ML Pipelines in Production</h2>
<p>One of the most frequently discussed problems in machine learning is crossing the gap between experimentation and production, or, in cruder terms, between a notebook and a machine learning pipeline.</p>
<p>Jupyter notebooks don't scale well to the requirements typical of running ML in a large-scale production environment. These requirements include secure and privacy-respecting access to large datasets, reproducibility, high performance, scalability, documentation, and observability (logging, monitoring, debugging). A machine learning pipeline is a sequence of steps that can meet these additional requirements; it describes how the data will be extracted and processed, what hardware infrastructure is required, and how to train and deploy the model. Additionally, ML pipelines at Zalando should follow software engineering best practices: the code needs to be stored in git, and it must be clean, readable, and reviewed by at least two people.
An ML pipeline can be visualized as a graph, like the one shown below.</p>
<p><img alt="Example ML pipeline" src="https://engineering.zalando.com/posts/2022/04/images/pipeline.png#center"></p>
<p>But how does one implement such a pipeline? In early 2019 we at Zalando decided to use AWS Step Functions for orchestrating machine learning pipelines. Step Functions is a platform for building and executing workflows consisting of multiple steps that may call various other services, such as AWS Lambda, S3 and Amazon SageMaker. These calls can be used to perform all steps comprising an ML pipeline, from data processing (e.g. by invoking Databricks API), to running training and batch processing jobs in Amazon SageMaker and creating SageMaker endpoints for real-time inference. The fact that Zalando already used AWS as its main cloud provider, and the flexibility provided by integrations with other services made Step Functions a good fit for our machine learning needs.</p>
<p>A Step Functions workflow is a state machine that can either be created visually using an editor provided by AWS or deployed as a JSON or YAML file known as a CloudFormation (CF) template. CloudFormation is another AWS service that implements the concept of infrastructure as code, and allows developers to specify needed AWS resources in a text file. We can thus use a CF template to describe Lambda functions and security policies used by the Step Functions workflow that is our ML pipeline. After the template is deployed to AWS, CloudFormation will create all resources listed in the file.</p>
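<p>To make the state machine concept concrete, here is a minimal sketch of what an Amazon States Language definition for a three-step pipeline might look like, built as a Python dictionary. All state names and <code>Resource</code> ARNs are illustrative placeholders, not Zalando's actual configuration:</p>

```python
import json

# Hedged sketch: a minimal Amazon States Language definition for a
# three-step ML pipeline. State names and Resource ARNs are placeholders.
state_machine = {
    "Comment": "Minimal ML pipeline sketch",
    "StartAt": "DataProcessing",
    "States": {
        "DataProcessing": {
            "Type": "Task",
            # e.g. a Lambda function that triggers a Databricks job
            "Resource": "arn:aws:lambda:eu-central-1:123456789012:function:data-processing",
            "Next": "Training",
        },
        "Training": {
            "Type": "Task",
            # Step Functions service integration for SageMaker training
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Next": "BatchInference",
        },
        "BatchInference": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
            "End": True,
        },
    },
}

# This JSON is what ends up embedded in the CloudFormation template.
definition = json.dumps(state_machine, indent=2)
```

<p>A definition along these lines is what ultimately gets embedded in the CloudFormation template describing the state machine.</p>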
<p>CloudFormation templates are highly expressive and allow developers to describe even minute details. Unfortunately, CF files can become verbose and are tedious to edit manually. We addressed this problem by creating zflow, a Python tool for building machine learning pipelines. Since its creation, zflow has been used to create hundreds of pipelines at Zalando.</p>
<p>A pipeline in a zflow script is a Python object with a series of stages attached to it. zflow provides a number of custom functions for configuring ML tasks, for example training, batch transform, and hyperparameter tuning. It also offers flow control so stages can be run conditionally or in parallel. Together these functions form a Domain Specific Language (DSL) for describing pipelines in a concise and readable form. Because zflow code is annotated with type hints, users can spot mistakes early on, and the available warnings go beyond simple syntax checks available for JSON and YAML templates.</p>
<p>The code listing below demonstrates an example zflow pipeline, with some configuration options omitted for brevity. It shows how three stages are created and added to a pipeline in the desired order. The pipeline is then added to a stack (a group of CloudFormation resources). The last line specifies where the resulting template should be saved.</p>
<div class="highlight"><pre><span></span><code><span class="n">data_processing</span> <span class="o">=</span> <span class="n">databricks_job</span><span class="p">(</span><span class="s2">"data_processing_job"</span><span class="p">)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">training_job</span><span class="p">(</span><span class="s2">"training_job"</span><span class="p">)</span>
<span class="n">batch_inference</span> <span class="o">=</span> <span class="n">batch_transform_job</span><span class="p">(</span><span class="s2">"batch_transform_job"</span><span class="p">)</span>
<span class="n">pipeline</span> <span class="o">=</span> <span class="n">PipelineBuilder</span><span class="p">(</span><span class="s2">"example-pipeline"</span><span class="p">)</span>
<span class="n">pipeline</span> \
<span class="o">.</span><span class="n">add_stage</span><span class="p">(</span><span class="n">data_processing</span><span class="p">)</span> \
<span class="o">.</span><span class="n">add_stage</span><span class="p">(</span><span class="n">training</span><span class="p">)</span> \
<span class="o">.</span><span class="n">add_stage</span><span class="p">(</span><span class="n">batch_inference</span><span class="p">)</span>
<span class="n">stack</span> <span class="o">=</span> <span class="n">StackBuilder</span><span class="p">(</span><span class="s2">"example-stack"</span><span class="p">)</span>
<span class="n">stack</span><span class="o">.</span><span class="n">add_pipeline</span><span class="p">(</span><span class="n">pipeline</span><span class="p">)</span>
<span class="n">stack</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span><span class="n">output_location</span><span class="o">=</span><span class="s2">"zflow_pipeline.yaml"</span><span class="p">)</span>
</code></pre></div>
<p>When a pipeline script is executed, zflow uses AWS CDK to generate a CloudFormation template file. The file contains all the information needed to create the necessary AWS resources. All that is needed now is to commit and push the generated template to the git repository and let Zalando Continuous Delivery Platform (CDP) deploy it to AWS. When that is done, our pipeline will appear in the AWS Console as a Step Functions state machine. It can then be executed, either via a scheduler (as in our example), manually in the Console, or programmatically via an API call.</p>
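<p>As a hedged illustration of the programmatic option, the sketch below only assembles the arguments for the Step Functions <code>StartExecution</code> API in pure Python; the actual boto3 call, which requires AWS credentials, is left as a comment, and the state machine ARN and input fields are placeholders:</p>

```python
import json

def build_execution_request(state_machine_arn: str, pipeline_input: dict) -> dict:
    """Assemble the keyword arguments for the Step Functions StartExecution API."""
    return {
        "stateMachineArn": state_machine_arn,
        # StartExecution expects the input as a JSON-encoded string.
        "input": json.dumps(pipeline_input),
    }

# Placeholder ARN and input; not a real pipeline.
request = build_execution_request(
    "arn:aws:states:eu-central-1:123456789012:stateMachine:example-pipeline",
    {"run_date": "2022-04-19"},
)

# With boto3 installed and valid AWS credentials, this would trigger the pipeline:
# import boto3
# boto3.client("stepfunctions").start_execution(**request)
```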
<p>With zflow, a pipeline can be coded in a concise way, tested, then versioned in a git repository, deployed, run, and scaled as needed. To ensure that it works as expected, we can track its executions using a custom web interface. Pipeline tracking is a part of the internal Zalando developer portal running on top of <a href="https://backstage.io/">Backstage</a>, an open-source platform for building such portals. Here is a screenshot of a series of pipeline executions in the ML portal.</p>
<p><img alt="ML portal in Backstage" src="https://engineering.zalando.com/posts/2022/04/images/backstage.jpg#center"></p>
<p>This ML web interface provides a detailed, real-time view of pipeline execution. Pipeline authors can monitor how metrics evolve across multiple runs of training pipelines and can view these changes on a graph. They can also view model cards for models created by the pipelines. These are just a few features of the ML portal, and the tool is actively developed to improve the process of experimenting with notebooks and deploying the pipelines in production.</p>
<p>The detailed journey of a pipeline is shown in the diagram below.</p>
<p><img alt="Lifecycle of an ML pipeline at Zalando" src="https://engineering.zalando.com/posts/2022/04/images/zflow-diagram.png#center"></p>
<p>Admittedly, that's a lot to take in! Let's summarize the steps and tools we discussed so far:</p>
<ol>
<li>We use JupyterHub, Databricks, and a high-performance computing cluster for ML experimentation.</li>
<li>We describe our ML pipelines in Python scripts with zflow DSL. Pipelines can use various resources, such as Databricks jobs for big data processing and Amazon SageMaker endpoints for real-time inference.</li>
<li>When we run the pipeline script, zflow will internally call AWS CDK to generate a CloudFormation template.</li>
<li>We commit and push the template to a git repository, and Zalando Continuous Delivery Platform will then upload it to AWS CloudFormation.</li>
<li>CloudFormation will create all the resources specified in the template, most notably: a Step Functions workflow. Our pipeline is now ready to run.</li>
<li>A web portal built with Backstage provides a visual overview of running pipelines, together with additional information relevant to ML practitioners.</li>
</ol>
<p>zflow and the dedicated web UI abstract away most of the complexity of building production pipelines with AWS tooling, such as CDK and CloudFormation, so ML practitioners can focus on their domain rather than the infrastructure. While zflow takes full advantage of AWS, it also allows us to integrate other tools used within the company and to quickly respond to our specific needs.</p>
<h2>The Organization</h2>
<p>Tooling is just one side of using any technology. Another aspect is the organizational structure that allows experts to work and collaborate effectively. While applying ML within the company, Zalando uses a distributed setup with additional resources in place to support reusing tools and practices across the organization. Most expertise is spread across over a hundred product teams working in their specific business domains. These teams have dedicated software engineers and applied scientists who in their daily work use both third-party products (e.g. AWS, Databricks) and internal tools (zflow, the ML web portal).</p>
<p>Our experts are assisted by a few central teams which operate and develop some of the aforementioned tools. For example, a dedicated team provides support and improvements for our JupyterHub installation and the HPC cluster. Two teams actively develop zflow and monitoring tools for pipelines. Another group, consisting of ML consultants, works closely with product teams, offering training, architectural advice, and pair programming. A separate research team actively explores and disseminates the state of the art in algorithmics, deep learning, and other branches of AI.</p>
<p>On top of that, our data science community provides platforms to exchange best practices from internal teams, academia, and the rest of the industry through expert talks, workshops, reading groups, and an annual internal conference.</p>
<h2>Exciting Times</h2>
<p>Teams at Zalando tackle many of the difficult problems in the space of <a href="https://engineering.zalando.com/tags/machine-learning.html">machine learning and MLOps</a>, such as reducing the time needed to validate and implement new ideas at scale and improving model observability. We constantly look for new ways to use technology to be faster, more efficient, and innovative in meeting all fashion-related needs of our customers. Best news: we would like to work with you on these exciting ML challenges! Simply search for “machine learning” on <a href="https://jobs.zalando.com/en/jobs/?gh_src=gk03hq&filters%5Boffices%5D%5B0%5D=Berlin&search=%22machine%20learning%22">Zalando job board</a> and reach out to us!</p>Functional tests with Testcontainers2022-04-12T00:00:00+02:002022-04-12T00:00:00+02:00Marek Hudymatag:engineering.zalando.com,2022-04-12:/posts/2022/04/functional-tests-with-testcontainers.html<p>We explore how to write functional tests using Testcontainers.org library in Java-based backend applications.</p><p>In this article, I will show how teams at <a href="https://zms.zalando.com/">Zalando Marketing Services</a> are using functional tests. We will follow the idea of functional tests: the main concept and the attributes of a good functional test. Then, we will discuss an example based on the TestContainers library used in the Spring environment.</p>
<p>You can find an introduction to the <a href="https://www.testcontainers.org/">TestContainers library</a> in my previous article <a href="https://engineering.zalando.com/posts/2021/02/integration-tests-with-testcontainers.html">Integration tests with Testcontainers</a>, because that is out of the scope of this one.</p>
<h2>Definition of functional test</h2>
<p>There are many definitions of functional testing. For example, the definition found on <a href="https://en.wikipedia.org/wiki/Functional_testing">Wikipedia</a> is:</p>
<blockquote>
<p>Functional testing is a quality assurance (QA) process and a type of black-box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output, and internal program structure is rarely considered (unlike white-box testing). Functional testing is conducted to evaluate the compliance of a system or component with specified functional requirements. Functional testing usually describes what the system does.</p>
</blockquote>
<p>Functional tests answer the fundamental question: <code>Do the features work as intended?</code>
Functional tests do not answer the question of <strong>how</strong> the system works internally, but rather <strong>what</strong> the result should be.</p>
<h2>Non-functional vs. functional testing</h2>
<p>What is the key difference between <strong>non-functional software testing</strong> and <strong>functional testing</strong>?</p>
<p>The answer is relatively simple: non-functional testing is concerned with <strong>how</strong>, and functional testing is concerned with <strong>what</strong>.
Functional testing verifies what the system should do; non-functional testing verifies how well the system does it, covering qualities such as performance, reliability, and usability.</p>
<p>Another comparison you might see when discussing this is black-box testing vs white-box testing. Black-box testing looks at the functionality of the software <strong>without</strong> looking at the <strong>internal structures</strong>. White-box testing is aware of the internal structures.</p>
<h2>Concept</h2>
<p><a href="https://www.testcontainers.org/">Testcontainers.org</a> is a JVM library that allows users to run, manage, and control docker containers from Java code.
Zalando uses it mainly for integration and functional tests.</p>
<p>The main purpose of functional tests with the Testcontainers library is to set up a black-box test using an environment as close to production as possible.
To achieve this:</p>
<ul>
<li><strong>package and run your service in a docker container</strong>;</li>
<li><strong>run all its dependencies</strong>, like: database, queues, streams, <strong>as separate docker containers</strong>;</li>
<li><strong>make your service connect to locally run dependencies</strong>;</li>
<li><strong>make your testing code independent of implementation</strong>.</li>
</ul>
<p>The structure of the invocation can look like this:</p>
<p><img alt="Functional tests communicate with your service running as Docker images." src="https://engineering.zalando.com/posts/2022/04/images/concept.png"></p>
<p>Your entire production code needs to be packaged and run as a docker image.
If your service needs to communicate with a database, you need to run the database as a docker image as well.
Your functional tests exercise your code running as a docker image, so your testing code does not have any direct connection to the production code.</p>
<p>You also need to remember that a proper test pyramid looks like this (sorted from the highest to the lowest number of tests):</p>
<ul>
<li>unit tests</li>
<li>component tests</li>
<li>integration tests</li>
<li>functional tests</li>
<li>system tests</li>
</ul>
<p>It is very nice to have functional tests, but they cannot dominate your testing structure.</p>
<h2>Packaging your application into a docker container</h2>
<p>Packaging your application into a docker image is pretty simple. In the root of your repository, just define a Dockerfile like:</p>
<div class="highlight"><pre><span></span><code><span class="k">FROM</span><span class="w"> </span><span class="s">openjdk:17-alpine</span>
<span class="k">COPY</span><span class="w"> </span>service/target/application-exec.jar<span class="w"> </span>application.jar
<span class="k">EXPOSE</span><span class="w"> </span><span class="s">8080</span>
<span class="k">ENTRYPOINT</span><span class="w"> </span>java<span class="w"> </span><span class="si">${</span><span class="nv">ADDITIONAL_JAVA_OPTIONS</span><span class="si">}</span><span class="w"> </span>-jar<span class="w"> </span>application.jar
</code></pre></div>
<p>As an alternative solution, I would suggest using <a href="https://github.com/GoogleContainerTools/jib">Jib</a>.</p>
<h2>Code separation</h2>
<p>I recommend organizing code into a multi-module maven project with two modules: service and functional-tests.
The functional-tests module cannot have any dependency on the service module.</p>
<div class="highlight"><pre><span></span><code><span class="n">.</span>
<span class="n">├── service</span>
<span class="n">│ └── pom.xml</span>
<span class="n">├── functional-tests</span>
<span class="n">│ └── pom.xml</span>
<span class="n">├── Dockerfile</span>
<span class="n">└── pom.xml</span>
</code></pre></div>
<p>Because we don’t have access to the service code, we cannot use any DTO objects, database repositories, etc.</p>
<ul>
<li>We should operate on the simplest possible interfaces. For example, if we call a REST endpoint, send plain JSON and read JSON. Don’t create any internal DTOs. It would place you in the position of a real client of your service.</li>
<li>I recommend using only official interfaces to create resources, e.g. creating entities via the REST interface. We could insert an entity directly into the database and merely retrieve it in the test, but then it would not be a black-box test. If the storage of the service changes in the future, we would need to change our tests.</li>
</ul>
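<p>To illustrate "plain JSON in, plain JSON out" in a self-contained way, here is a language-agnostic sketch in Python (the article's examples use Java, but the idea is the same). The stub HTTP server stands in for the service container, and the endpoint and payload are made up for the example:</p>

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class EntityHandler(BaseHTTPRequestHandler):
    """Stand-in for the service container; echoes the entity back with an id."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        body["id"] = 1  # pretend the service assigned an id
        payload = json.dumps(body).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), EntityHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The test speaks plain JSON over HTTP; no DTO classes are shared with the service.
request = Request(
    f"http://127.0.0.1:{server.server_port}/accounts",
    data=json.dumps({"name": "Jane"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(request) as response:
    created = json.loads(response.read())

assert created == {"name": "Jane", "id": 1}
server.shutdown()
```

<p>Note that the test shares no DTO classes with the "service": it only speaks HTTP and JSON, exactly like a real client would.</p>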
<h2>AbstractFunctionalTests</h2>
<p>All functional tests extend the <code>AbstractFunctionalTest</code> class, which starts all the needed docker containers.
In this example, I run my microservice together with the database it connects to.</p>
<div class="highlight"><pre><span></span><code><span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">AbstractFunctionalTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">HTTP_PORT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">8080</span><span class="p">;</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">DEBUG_PORT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">5005</span><span class="p">;</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Logger</span><span class="w"> </span><span class="n">LOGGER</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">LoggerFactory</span><span class="p">.</span><span class="na">getLogger</span><span class="p">(</span><span class="s">"Docker-Container"</span><span class="p">);</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Network</span><span class="w"> </span><span class="n">network</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Network</span><span class="p">.</span><span class="na">newNetwork</span><span class="p">();</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">PostgreSQLContainer</span><span class="w"> </span><span class="n">postgreSQLContainer</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="p">(</span><span class="n">PostgreSQLContainer</span><span class="p">)</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">PostgreSQLContainer</span><span class="p">(</span><span class="s">"postgres:14.2"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withUsername</span><span class="p">(</span><span class="s">"username"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withPassword</span><span class="p">(</span><span class="s">"password"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withDatabaseName</span><span class="p">(</span><span class="s">"databaseName"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withNetwork</span><span class="p">(</span><span class="n">network</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withNetworkAliases</span><span class="p">(</span><span class="s">"postgres"</span><span class="p">);</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">GenericContainer</span><span class="o"><?></span><span class="w"> </span><span class="n">backendContainer</span><span class="p">;</span>
<span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">postgreSQLContainer</span><span class="p">.</span><span class="na">start</span><span class="p">();</span>
<span class="w"> </span><span class="n">backendContainer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ofNullable</span><span class="p">(</span><span class="n">System</span><span class="p">.</span><span class="na">getenv</span><span class="p">(</span><span class="s">"CONTAINER_VERSION"</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="na">map</span><span class="p">(</span><span class="n">version</span><span class="w"> </span><span class="o">-></span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ServiceContainer</span><span class="p">(</span><span class="s">"docker-repository/application"</span><span class="p">,</span><span class="w"> </span><span class="n">version</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="na">orElseGet</span><span class="p">(()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ServiceContainer</span><span class="p">(</span><span class="s">"."</span><span class="p">,</span><span class="w"> </span><span class="n">Paths</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"../"</span><span class="p">)))</span>
<span class="w"> </span><span class="p">.</span><span class="na">withExposedPorts</span><span class="p">(</span><span class="n">HTTP_PORT</span><span class="p">,</span><span class="w"> </span><span class="n">DEBUG_PORT</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withFixedExposedPort</span><span class="p">(</span><span class="n">DEBUG_PORT</span><span class="p">,</span><span class="w"> </span><span class="n">DEBUG_PORT</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withEnv</span><span class="p">(</span><span class="s">"SPRING_PROFILES_ACTIVE"</span><span class="p">,</span><span class="w"> </span><span class="s">"functional"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withEnv</span><span class="p">(</span><span class="s">"ADDITIONAL_JAVA_OPTIONS"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"-agentlib:jdwp=transport=dt_socket,"</span>
<span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"server=y,suspend=n,address=0.0.0.0:"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">DEBUG_PORT</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withNetwork</span><span class="p">(</span><span class="n">network</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withCreateContainerCmdModifier</span><span class="p">(</span><span class="n">cmd</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">cmd</span><span class="p">.</span><span class="na">withName</span><span class="p">(</span><span class="s">"application"</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="na">withLogConsumer</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Slf4jLogConsumer</span><span class="p">(</span><span class="n">LOGGER</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withPrefix</span><span class="p">(</span><span class="s">"Service"</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="na">waitingFor</span><span class="p">(</span><span class="n">Wait</span><span class="p">.</span><span class="na">forHttp</span><span class="p">(</span><span class="s">"/actuator/health"</span><span class="p">).</span><span class="na">forPort</span><span class="p">(</span><span class="n">HTTP_PORT</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withStartupTimeout</span><span class="p">(</span><span class="n">Duration</span><span class="p">.</span><span class="na">ofMinutes</span><span class="p">(</span><span class="mi">2</span><span class="p">)));</span>
<span class="w"> </span><span class="n">backendContainer</span><span class="p">.</span><span class="na">start</span><span class="p">();</span>
<span class="w"> </span><span class="n">Runtime</span><span class="p">.</span><span class="na">getRuntime</span><span class="p">().</span><span class="na">addShutdownHook</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Thread</span><span class="p">(()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">backendContainer</span><span class="p">.</span><span class="na">stop</span><span class="p">();</span>
<span class="w"> </span><span class="n">postgreSQLContainer</span><span class="p">.</span><span class="na">stop</span><span class="p">();</span>
<span class="w"> </span><span class="p">}));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>As an alternative solution, I would suggest creating a <a href="https://junit.org/junit5/docs/current/user-guide/#extensions">JUnit 5 extension</a>.
In this case, we would use an annotation instead of inheritance, with the same logic.</p>
<h2>Logging</h2>
<p>When running the docker image with our service, it is critical to add logging. Without it, you lose visibility of errors. Don't forget to add a logger to the container code:</p>
<div class="highlight"><pre><span></span><code><span class="p">.</span><span class="na">withLogConsumer</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Slf4jLogConsumer</span><span class="p">(</span><span class="n">LOGGER</span><span class="p">).</span><span class="na">withPrefix</span><span class="p">(</span><span class="s">"Service"</span><span class="p">))</span>
</code></pre></div>
<h2>Stopping images</h2>
<p>One of the biggest advantages of the TestContainers library is that a <strong>Ryuk</strong> container stops all other containers when the initial JVM process terminates.
It protects us from unwanted zombie containers (and networks, volumes) in the system. But if you run Docker images from multiple Maven modules, the Ryuk cleanup can be too slow and the build can crash. That’s why I additionally register a <code>shutdownHook</code> that stops all Docker containers when test execution finishes.</p>
<div class="highlight"><pre><span></span><code><span class="n">Runtime</span><span class="p">.</span><span class="na">getRuntime</span><span class="p">().</span><span class="na">addShutdownHook</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Thread</span><span class="p">(()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">backendContainer</span><span class="p">.</span><span class="na">stop</span><span class="p">();</span>
<span class="w"> </span><span class="n">postgreSQLContainer</span><span class="p">.</span><span class="na">stop</span><span class="p">();</span>
<span class="p">}));</span>
</code></pre></div>
<h2>Example of a functional test</h2>
<p>An example functional test can look like the one below. The test method uses a number of helper methods to simplify it.
Helper methods are key to keeping the code readable.</p>
<div class="highlight"><pre><span></span><code><span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">AccountFunctionalTest</span><span class="w"> </span><span class="kd">extends</span><span class="w"> </span><span class="n">AbstractFunctionalTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">shouldUpdateAccount</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">JSONException</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// given</span>
<span class="w"> </span><span class="n">createAccount</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// when</span>
<span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">updateAccount</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// then</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="na">getStatusCodeValue</span><span class="p">())</span>
<span class="w"> </span><span class="p">.</span><span class="na">isEqualTo</span><span class="p">(</span><span class="n">HttpStatus</span><span class="p">.</span><span class="na">NO_CONTENT</span><span class="p">.</span><span class="na">value</span><span class="p">());</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">actual</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getAccount</span><span class="p">(</span><span class="s">"00000000-0000-0000-0000-000000000001"</span><span class="p">);</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">expected</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readFromResources</span><span class="p">(</span><span class="s">"get_account_dto.json"</span><span class="p">);</span>
<span class="w"> </span><span class="n">JSONAssert</span><span class="p">.</span><span class="na">assertEquals</span><span class="p">(</span><span class="n">expected</span><span class="p">,</span><span class="w"> </span><span class="n">actual</span><span class="p">,</span><span class="w"> </span><span class="n">JSONCompareMode</span><span class="p">.</span><span class="na">LENIENT</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">createAccount</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">json</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readFromResources</span><span class="p">(</span><span class="s">"create_account_dto.json"</span><span class="p">);</span>
<span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getTestRestTemplate</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">exchange</span><span class="p">(</span><span class="s">"/accounts"</span><span class="p">,</span>
<span class="w"> </span><span class="n">HttpMethod</span><span class="p">.</span><span class="na">POST</span><span class="p">,</span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">HttpEntity</span><span class="o"><></span><span class="p">(</span><span class="n">json</span><span class="p">,</span><span class="w"> </span><span class="n">getPostHeaders</span><span class="p">()),</span>
<span class="w"> </span><span class="n">String</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="na">getStatusCodeValue</span><span class="p">())</span>
<span class="w"> </span><span class="p">.</span><span class="na">isEqualTo</span><span class="p">(</span><span class="n">HttpStatus</span><span class="p">.</span><span class="na">CREATED</span><span class="p">.</span><span class="na">value</span><span class="p">());</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="nf">updateAccount</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">getTestRestTemplate</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">exchange</span><span class="p">(</span><span class="s">"/accounts/00000000-0000-0000-0000-000000000001"</span><span class="p">,</span>
<span class="w"> </span><span class="n">HttpMethod</span><span class="p">.</span><span class="na">PATCH</span><span class="p">,</span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">HttpEntity</span><span class="o"><></span><span class="p">(</span><span class="n">readFromResources</span><span class="p">(</span><span class="s">"patch_account_dto.json"</span><span class="p">),</span>
<span class="w"> </span><span class="n">getPatchHeaders</span><span class="p">(</span><span class="n">etag</span><span class="p">)),</span>
<span class="w"> </span><span class="n">String</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="nf">getEtag</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getTestRestTemplate</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">getForEntity</span><span class="p">(</span><span class="s">"/accounts/{id}"</span><span class="p">,</span><span class="w"> </span><span class="n">String</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">response</span><span class="p">.</span><span class="na">getHeaders</span><span class="p">().</span><span class="na">getETag</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="nf">getAccount</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getTestRestTemplate</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">getForEntity</span><span class="p">(</span><span class="s">"/accounts/{id}"</span><span class="p">,</span><span class="w"> </span><span class="n">String</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">response</span><span class="p">.</span><span class="na">getBody</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">HttpHeaders</span><span class="w"> </span><span class="nf">getPostHeaders</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">HttpHeaders</span><span class="w"> </span><span class="n">headers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">HttpHeaders</span><span class="p">();</span>
<span class="w"> </span><span class="n">headers</span><span class="p">.</span><span class="na">setContentType</span><span class="p">(</span><span class="n">MediaType</span><span class="p">.</span><span class="na">APPLICATION_JSON</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">headers</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">HttpHeaders</span><span class="w"> </span><span class="nf">getPatchHeaders</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">etag</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">HttpHeaders</span><span class="w"> </span><span class="n">headers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">HttpHeaders</span><span class="p">();</span>
<span class="w"> </span><span class="n">headers</span><span class="p">.</span><span class="na">setContentType</span><span class="p">(</span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">MediaType</span><span class="p">(</span><span class="s">"application"</span><span class="p">,</span><span class="w"> </span><span class="s">"merge-patch+json"</span><span class="p">));</span>
<span class="w"> </span><span class="n">headers</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">HttpHeaders</span><span class="p">.</span><span class="na">ETAG</span><span class="p">,</span><span class="w"> </span><span class="n">etag</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">headers</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h2>Advantages of functional tests</h2>
<p>The biggest advantages of functional tests are:</p>
<ul>
<li>They force engineers to think about the <a href="https://opensource.zalando.com/restful-api-guidelines/#api-first">API First principle</a>.</li>
<li>They test the service as a black box: with good functional test coverage, you can refactor deeply without changing the tests.</li>
<li>They verify that the application is correctly packaged as a Docker image, so another layer of the application is tested.</li>
<li>They give developers a lot of confidence that the code does what it should do, which is especially useful during code refactoring.</li>
</ul>
<h2>Disadvantages of functional tests</h2>
<ul>
<li>Writing functional tests can be time-consuming, and when something doesn’t work as expected, debugging becomes much harder. On the other hand, well-written helper classes can speed this process up considerably.</li>
<li>Because functional tests run the service and its dependencies (database, queues, etc.) as Docker images, each image has to be started at least once, which is usually slow. For example, PostgreSQL as a Docker image needs around 4 seconds to start on my machine; Localstack, which emulates AWS components, can take much longer, even 20 seconds.</li>
<li>In an ideal world, we would start fresh containers for each test, but that would be far too slow, so we start them once for all tests. Poorly written functional tests can therefore interfere with each other. It is critical that tests use different object identifiers and leave a clean state behind.</li>
</ul>
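<p>The isolation advice above - each test should use its own object identifiers when containers are shared across tests - can be sketched with a small helper. This is a hypothetical illustration (the <code>TestIds</code> class and its method name are not from the original project):</p>

```java
import java.util.UUID;

public class TestIds {

    // Each test derives its own identifiers instead of sharing hard-coded
    // ones, so tests running against the same long-lived containers
    // do not interfere with each other's data.
    static String uniqueAccountId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String first = uniqueAccountId();
        String second = uniqueAccountId();
        System.out.println(!first.equals(second)); // distinct IDs per call
    }
}
```

A fixed ID (as in the example test above) is fine for a single test, but once several tests create accounts against the same database container, generated IDs avoid collisions.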
<h2>Summary</h2>
<p>Unit tests force developers to think about methods. Functional tests do the same for applications and components.</p>
<p>I find functional tests to be an interesting concept, and the TestContainers library makes it practical in the Java world.
They can be fairly expensive to implement, but they give you strong confidence that the system still works during deep refactoring.</p>
<p>Functional tests implemented this way are not for everybody. I would suggest using them in systems where microservice contracts are not changing very fast.
Despite the high cost of development, they give us a very high level of confidence that the delivered applications work as intended.</p>
<h2>Code example</h2>
<p>You can find examples of usages in my <a href="https://gitlab.com/marek_hudyma/application-style">GitLab project</a>.</p>
<hr>
<p><em>If you would like to help us improve our tests and thus help shipping high-quality features for our customers, please consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Product%20Design%20%26%20User%20Research&filters%5Bcategories%5D%5B1%5D=Applied%20Science&filters%5Bcategories%5D%5B2%5D=Software%20Engineering&filters%5Bcategories%5D%5B3%5D=Product%20Management%20%28Technology%29&filters%5Bentities%5D%5B0%5D=zms">Engineering Teams</a></em> at Zalando Marketing Services (ZMS).</p>GraphQL persisted queries and Schema stability2022-02-17T00:00:00+01:002022-02-17T00:00:00+01:00Boopathi Rajaa Nedunchezhiyantag:engineering.zalando.com,2022-02-17:/posts/2022/02/graphql-persisted-queries-and-schema-stability.html<p>Learn how Zalando uses persisted queries, and how we define and think about different levels of stability of our GraphQL schema.</p><h2>Persisted Queries</h2>
<p>Persisted Queries in GraphQL are like stored procedures in databases. To learn about Apollo's approach to automated persisted queries, please follow their documentation <a href="https://www.apollographql.com/docs/apollo-server/performance/apq/">here</a>. At Zalando, we took a different approach - <strong>to disable GraphQL in production</strong>. It might sound counterintuitive at first - we have a GraphQL service, but we disable GraphQL in production - why?</p>
<p>Let us go over how the system works and explain how it helps us maintain a stable schema.</p>
<h3>Part 1: Build time persistence</h3>
<p>At development time for the web and apps, developers enjoy the full power of GraphQL - automatic code and type generation, combining queries from multiple parts of the application, aggregating those queries into one optimized batched request, and so on.</p>
<p>When code in the UI layers (web and app) is merged to the main deployment branch, the build includes one extra step - persisting the queries to the GraphQL service. The GraphQL service generates an ID for each query (the ID is simply a hash of the query, normalized in terms of formatting and operation selection) and returns it to the UI layers to bundle with the built files.</p>
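<p>The ID-generation step can be sketched as follows. This is a hypothetical illustration: the post only says the ID is a hash of the normalized query, so the specific normalization rule (whitespace collapsing) and hash function (SHA-256) chosen here are assumptions, not Zalando's actual implementation:</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class QueryIdGenerator {

    // Derive a stable ID from a query by collapsing whitespace and hashing,
    // so formatting differences do not produce different IDs.
    static String persistedQueryId(String query) {
        String normalized = query.trim().replaceAll("\\s+", " ");
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = persistedQueryId("query productCard($id: ID!) { product(id: $id) { name } }");
        String b = persistedQueryId("query productCard($id: ID!) {\n  product(id: $id) { name }\n}");
        System.out.println(a.equals(b)); // formatting differences normalize to the same ID
    }
}
```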
<p>When a query is used in production, the GraphQL service does not accept raw GraphQL queries; it only accepts the IDs of persisted queries. So, instead of the request looking like this:</p>
<div class="highlight"><pre><span></span><code><span class="err">POST /graphql</span>
<span class="err">{</span>
<span class="err"> "query": "query productCard($id: ID!) { product(id: $id) { name } }",</span>
<span class="err"> "variables": {</span>
<span class="err"> "id": "12345"</span>
<span class="err"> }</span>
<span class="err">}</span>
</code></pre></div>
<p>it would look like this - with <code>id</code> instead of <code>query</code>:</p>
<div class="highlight"><pre><span></span><code><span class="err">POST /graphql</span>
<span class="err">{</span>
<span class="err"> "id": "a1b2c3",</span>
<span class="err"> "variables": {</span>
<span class="err"> "id": "12345"</span>
<span class="err"> }</span>
<span class="err">}</span>
</code></pre></div>
<h3>Part 2: Inspecting the persisted queries database</h3>
<p>Now that we have a database of queries, we can perform certain inspections on these persisted queries. Because we do not allow non-persisted queries in production, we know at any time which parts of the schema are used in production and which are not.</p>
<p>We leverage these persisted queries for better monitoring and alerting on each individual query separately. We can also tell whether a field can safely take a breaking change because it is no longer, or was never, used in production.</p>
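<p>The "is this field still used?" inspection can be sketched like this. This is a hypothetical illustration over an in-memory store (ID to query text); a real implementation would walk the parsed query AST rather than match text:</p>

```java
import java.util.Map;
import java.util.regex.Pattern;

public class SchemaUsageCheck {

    // Return true if the field name occurs in any persisted query.
    // Word-boundary matching avoids matching "name" inside "surname".
    static boolean isFieldUsed(Map<String, String> persistedQueries, String fieldName) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(fieldName) + "\\b");
        return persistedQueries.values().stream().anyMatch(q -> p.matcher(q).find());
    }

    public static void main(String[] args) {
        Map<String, String> store = Map.of(
                "a1b2c3", "query productCard($id: ID!) { product(id: $id) { name } }");
        System.out.println(isFieldUsed(store, "name"));  // used in production
        System.out.println(isFieldUsed(store, "price")); // never used: a breaking change is safe
    }
}
```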
<h2>Schema Stability</h2>
<p>As mentioned previously, our GraphQL schema covers a wide variety of use-cases, and different parts of the schema can have different levels of stability as new product features are added.</p>
<p>Every API's dream is a non-breaking model that evolves well. In a changing product landscape, however, it is usually impossible to design everything that well up front. Moreover, the time we would spend deliberating over a model to get the best possible design may not be justified by the time actually available to implement it end-to-end.</p>
<p>The schema is a collaboration of the UI engineers and the GraphQL server maintainers. It should be possible for the UI engineers to prototype something fast and break it later. But once the schema is merged to the main deployment branch, the GraphQL server maintainers do not wish to have breaking changes. How do we solve this conflict in a neat way?</p>
<p>One option is to use branch deployments to satisfy this constraint, so that the main branch stays clean. Though this looks simple and easy to understand, mixing branches across various projects soon becomes a nightmare in practice. At Zalando, we have microservices and the <a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">GraphQL layer is an aggregator</a> over multiple other services. So maintaining multiple feature branches across 3-5 projects for 1 or 2 product features isn't going to help any developer or team move smoothly. The complexity increases non-linearly as we mix different features that must work together.</p>
<h3>Draft status</h3>
<p>In the previous section, we learned about the power of persisted queries controlled by the GraphQL layer - we know exactly what part of the schema is used in production. So, our solution to schema stability starts by leveraging how we handle persisted queries - by marking certain parts of the schema as <strong>not ready for production</strong> and preventing them from getting into the persisted queries database.</p>
<p>For this we use <a href="https://graphql.org/learn/queries/#directives">GraphQL directives</a>:</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@draft</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
</code></pre></div>
<p>The above directive helps annotate certain fields in the schema as draft. At persist time, we validate whether a query contains a field marked as such and, if so, refuse to persist it.</p>
<div class="highlight"><pre><span></span><code><span class="k">export</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">draftRule</span><span class="p">(</span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Field</span><span class="p">(</span><span class="nx">node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parentType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">getParentType</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parentType</span><span class="p">.</span><span class="nx">getFields</span><span class="p">()[</span><span class="nx">node</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">isDraft</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">field</span><span class="p">.</span><span class="nx">astNode</span><span class="p">.</span><span class="nx">directives</span><span class="p">.</span><span class="nx">some</span><span class="p">(</span>
<span class="w"> </span><span class="p">(</span><span class="nx">directive</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">directive</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s2">"draft"</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">isDraft</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">reportError</span><span class="p">(</span><span class="ow">new</span><span class="w"> </span><span class="nx">GraphQLError</span><span class="p">(</span><span class="sb">`Cannot persist draft field`</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</code></pre></div>
<p>This is an example implementation of the rule which you can pass to the <a href="https://graphql.org/learn/validation/">GraphQL validation</a>. The usage in the schema would look like:</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Product</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">fancyNewField</span><span class="p">:</span><span class="w"> </span><span class="n">FancyNewType</span><span class="w"> </span><span class="nd">@draft</span>
<span class="err">}</span>
<span class="err">type</span><span class="w"> </span><span class="err">FancyNewType</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="err">testField:</span><span class="w"> </span><span class="err">String</span>
<span class="err">}</span>
</code></pre></div>
<p>In the above definition of a Product, when we add the new field <code>fancyNewField</code>, we begin by giving it draft status. Any attempt to persist a query using it will fail.</p>
<p>This brings us new opportunities and guarantees:</p>
<ol>
<li>The field cannot be used in production</li>
<li>We can break it at will, since we allow ONLY persisted queries in production</li>
<li>We can merge it to the main branch (and even deploy it)</li>
</ol>
<p>The draft status, combined with how our persisted queries work, improves the workflow. We can develop multiple features faster, experiment with them across different codebases, and still have the safety of production usage only after we have stabilized the schema (by removing the draft annotation) and tested it end-to-end.</p>
<h3>Experimenting in Production</h3>
<p>The draft status allows us to deny persisting queries which we know are not ready for production usage. When they are ready, we want to carry certain experiments forward into production while still being unsure about the stability of the schema. This is tricky, but it is often a valid use-case: certain product features go into production as an experiment and may then change form or structure a little.</p>
<p>One obvious option is to remove the draft annotation. But then we would not restrict who can persist the field: some other part of the UI might start persisting those experimental fields, and we might not notice until we inspect the queries. We certainly cannot break the schema once it is in production. So, how do we ensure that an experimental field is used only by the components that are part of the experiment?</p>
<p>Here, we introduce two new directives which act as access control for fields in production. The <code>@component</code> directive, and <code>@allowedFor</code> directive:</p>
<div class="highlight"><pre><span></span><code><span class="k">directive</span><span class="w"> </span><span class="err">@component(name:</span><span class="w"> </span><span class="err">String!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">QUERY</span>
<span class="k">directive</span><span class="w"> </span><span class="err">@allowedFor(componentNames:</span><span class="w"> </span><span class="err">[String!]!)</span><span class="w"> </span><span class="err">on</span><span class="w"> </span><span class="k">FIELD_DEFINITION</span>
</code></pre></div>
<p>These two directives complement each other: one is used in the query and the other in the schema (here, on a field definition). We ask query authors to tag their queries with a component name using <code>@component</code>, and we match those names against the <code>@allowedFor</code> directive at persist time.</p>
<p><strong>Note</strong>: Instead of component name, you can also use the operation name of the query itself.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Product</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">fancyProp</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="nd">@allowedFor</span><span class="p">(</span><span class="n">componentNames</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s">"web-product-card"</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div>
<p>and a query product card:</p>
<div class="highlight"><pre><span></span><code><span class="k">query</span><span class="w"> </span><span class="nf">productCard</span><span class="w"> </span><span class="err">@</span><span class="nf">component</span><span class="p">(</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">"</span><span class="nc">web</span><span class="err">-product-card"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">product</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">fancyProp</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>This query would be allowed, while any other query using the field <code>fancyProp</code> would fail to persist.</p>
<p>The <code>@component</code> and <code>@allowedFor</code> directives allow us to take an experimental feature to production by restricting its usage to one UI component. This makes breaking changes easier to handle, as we have a guarantee that only that part of the UI needs to be updated when we make a minor breaking change.</p>
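<p>The persist-time access check these directives imply can be sketched as follows. This is a hypothetical illustration: the extraction of directive arguments from the real schema and query ASTs is elided, and the rule that an absent allow-list means "unrestricted" is an assumption:</p>

```java
import java.util.List;

public class AllowedForCheck {

    // A field annotated @allowedFor(componentNames: [...]) may only be
    // persisted by a query tagged @component(name: ...) with a matching name.
    // An empty allow-list is treated here as unrestricted (assumption).
    static boolean mayPersist(String queryComponent, List<String> allowedComponents) {
        return allowedComponents.isEmpty() || allowedComponents.contains(queryComponent);
    }

    public static void main(String[] args) {
        List<String> allowed = List.of("web-product-card");
        System.out.println(mayPersist("web-product-card", allowed)); // true: persist allowed
        System.out.println(mayPersist("web-checkout", allowed));     // false: persist rejected
    }
}
```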
<h2>Conclusion</h2>
<p>When we first extend the GraphQL schema, we start with the <code>draft</code> annotation. Then we promote new fields to restricted production usage with the <code>allowedFor</code> annotation. After we have finally stabilized the schema, we remove all of these annotations and have a non-breaking contract in the form of persisted queries.</p>
<p>This is just the starting point of our exploration into saving developer time while ensuring the stability of the GraphQL schema. It helps us evolve the schema rather than having to re-model it every single time.</p>
<p>Depending on how you want to evolve the schema, and how you prefer to handle breaking changes, you can use these concepts and save precious time - by thinking about schema evolution in a non-destructive manner.</p>
<hr>
<p><em>If you would like to work on similar challenges, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=95c8de231us&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B9%5D=Applied%20Science%20%26%20Research&filters%5Bcategories%5D%5B10%5D=Product%20Design%20%26%20User%20Experience&filters%5Bcategories%5D%5B11%5D=Product%20Management&search=software%20engineer">joining our engineering teams</a>.</em></p>
<hr>
<h2>Related posts</h2>
<ul>
<li><a href="https://engineering.zalando.com/posts/2023/10/understanding-graphql-directives-practical-use-cases-zalando.html">Understanding GraphQL Directives: Practical Use-Cases at Zalando</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/04/modeling-errors-in-graphql.html">Modeling Errors in GraphQL</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/03/optimize-graphql-server-with-lookaheads.html">Optimize GraphQL Server with Lookaheads</a></li>
</ul>Principal Engineering at Zalando2022-02-10T00:00:00+01:002022-02-10T00:00:00+01:00Bartosz Ocytkotag:engineering.zalando.com,2022-02-10:/posts/2022/02/principal-engineering-at-zalando.html<p>Learn how we leverage Principal Engineers to solve engineering challenges across Zalando.</p><p><img alt="Photo by Ian Schneider on Unsplash" src="https://engineering.zalando.com/posts/2022/02/images/career-path.jpg#previewimage"></p>
<p>In many companies, Senior Engineers who do not pursue Engineering Management end up at a dead end in terms of their career progression. At Zalando, we have had a career path for individual contributors since 2016. Senior Software Engineers can choose one of three possible career paths:</p>
<ul>
<li>Engineering Management</li>
<li>Principal Engineering</li>
<li>Technical Program Management</li>
</ul>
<p>In this post, we detail how we leverage our senior individual contributors (Principal Engineers) throughout the company. In the last two years, we have observed an increasing number of companies emphasizing the value of career development for individual contributors. At this level, the roles vary widely across companies, hence the importance of exchanging different approaches to structuring this role.</p>
<h2>Principal Engineering</h2>
<p>Beyond the Senior Software Engineer level, Engineers have increasingly varying profiles depending on their career journey and unique expertise. <em>Depth-focused</em> Principal Engineers are experts in their unique field (or more than one), whereas <em>breadth-focused</em> Principal Engineers have an expert view across many domains and aspects of the software development life cycle, with the ability to leverage the unique expertise of others or, when needed, dive deep themselves.</p>
<p>Up until 2021, we knew of no literature speaking in detail about individual contributors above the senior level in tech companies. While software companies traditionally defined the role of an (Enterprise) Architect, the industry moved away from centralized architecture teams with hands-off individuals, as these were detached from the software development process and the necessary feedback loops to continuously adjust their approaches.
More often than not, delivery teams are empowered with technical decision making and conduct architectural design adhering to guardrails set by the department and the company (in our case, the <a href="https://engineering.zalando.com/tags/tech-radar.html">Tech Radar</a>). Principal Engineers support the team in the architectural design and help to maintain architectural integrity in the scope of the department and beyond.</p>
<p>In March 2021, the book <a href="https://staffeng.com/book">Staff Engineer: Leadership beyond the management track</a> was published and added some common vocabulary about technical leadership and strategies for leading without formal authority. In addition, <a href="https://staffeng.com/guides/staff-archetypes">four archetypes</a> are listed and provide classification for the types of tasks Principal Engineers are most commonly working on. It is important to note that individuals may transition between these archetypes throughout their career depending on their strengths or the organizational needs:</p>
<ul>
<li><strong>Tech Lead:</strong> leads critical technical initiatives across the department and beyond. Partners with more than one team to support teams and individuals with delivery and coaching. Usually, Principal Engineers transitioning from a Senior Engineer role in a single team to a Principal Engineer acting across teams will go through this path. Initially, delivery includes high focus on coding alongside the team for high-impact and critical projects.</li>
<li><strong>Architect:</strong> manages technical direction, quality, and approach within an area or project. Navigates different levels of leadership to address mid to long-term challenges.</li>
<li><strong>Solver:</strong> digs deep into an area or problem, captures findings, aligns a set of recommendations. May apply both to short-term and long-term engagements and include driving the implementation of the recommended solutions.</li>
<li><strong>Right Hand:</strong> extends an executive's attention and borrows their scope and authority to address certain problem areas.</li>
</ul>
<h2>Principal Engineers at Zalando</h2>
<p>Principal Engineers<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> at Zalando are senior individual contributors and role models for our Engineers. While they have no people management responsibilities, they are part of the leadership team. Principal Engineers report to a Manager of Managers (e.g. Head of Engineering) and assume the scope of the person they report to. Typically, this means they have 2-5 engineering teams that they support. Overall, Principal Engineers constitute around 4% of our total Engineering population.</p>
<p>At Zalando, Principal Engineers are responsible for the architecture of the systems built within the department they're part of. They enable others and facilitate the design process across teams. They proactively initiate and execute process and technical improvements (e.g. scaling, technical debt reduction) across the department and beyond. Principal Engineers play a leading role in the full product development lifecycle. They consult Product and Engineering Management on projects early on, ensuring that technical considerations are factored into the project's scoping and planning processes.</p>
<p>Our (usually <em>breadth-focused</em>) Principal Engineers are leading the technical design for mid to large scale projects that their department is part of. This involves trade-off discussion, scope definition and negotiation with Product Designers and Product Managers as well as advice on structuring the projects into iterations optimized for reducing delivery risk, dependencies on teams, or ensuring quick time to market. Principal Engineers facilitate design discussions with the involved teams, delegate design or experimentation of well-defined parts of the design to other Engineers. They outline key design decisions and trade-offs and seek feedback through peer-reviews. To understand how their designs perform in production, they guide teams throughout the execution time of the project and support launch readiness through production readiness reviews and project launch coordination.</p>
<p>At Zalando, we peer-review technical designs on different organizational levels, depending on their scope and complexity. During peer-reviews for Zalando group-wide projects requiring contributions from multiple business units, Principal Engineers support the project teams in finding the best solution for realizing the project's goals. Additionally, they provide teams with a different perspective on the suggested solutions and discuss trade-offs related to dependencies, relation to other pending or ongoing projects, and risks and challenges anticipated during project delivery. In this way, we ensure consistency of technical solutions, promote standardized solutions and practices, connect teams who solved similar problems with one another, and seek to incorporate learnings from other projects into future designs.</p>
<p>Focus on operational excellence is key to delivering high-value customer experiences. Principal Engineers play a crucial role in scaling knowledge and raising the bar. They coach teams on resilience patterns and observability, and facilitate weekly operational meetings where the operational performance of the system and past incidents are reviewed. They peer-review post-mortem documents and runbooks that the teams prepare as part of the incident response. Finally, they collaborate on alignment and implementation of cross-team action items.</p>
<p><em>Depth-focused</em> Principal Engineers are most frequently part of platform or infrastructure teams. When compared with their peers, these individuals are also spending the highest share of their time writing code. They are thought-leaders influencing the long-term product roadmap. Through their network and collaborations with other Engineers across the company (e.g. via language guilds), they look for opportunities to scale the adoption of existing infrastructure solutions or initiate new ones, with the focus on making our teams or systems more efficient (e.g. shared libraries, application templates, operational guidance or patterns). Lastly, they contribute to setting Engineering Standards and support others in technology selection, evaluation, and adoption as part of our <a href="https://engineering.zalando.com/tags/tech-radar.html">Tech Radar process</a>.</p>
<p>Principal Engineers also make important contributions beyond core engineering tasks. They are bar raisers during the interview process, mentor other Engineers, and play a key role in our engineering communities. This way, they have opportunities to coach other engineers, role model our culture, and help identify and develop promising talent.</p>
<h2>Principal Engineering Community</h2>
<p>Principal Engineers form a company-wide community of experts, who support one another in their challenges and journey at Zalando. They self-organize both company-wide and per business unit in order to discuss and drive technical topics that they or their leadership consider as important to meet the business growth and operational excellence of Zalando's technical systems. The Community provides expertise around know-how, patterns, solutions, and the approach to rollout of these in teams. Further, Principal Engineers support one another in order to continuously upskill themselves and others, through mentorship, coaching, or pairing up on tasks.</p>
<p>Engineering-wide initiatives driven by the community are documented in a task list, which, in addition to providing transparency on the community's efforts, serves as an opportunity to (i) highlight tasks that any Engineer at Zalando can contribute to, and (ii) let anyone request support on an engineering topic. Similar task lists exist at a smaller scope and provide ways to involve the Engineering talent from those organizations.</p>
<h2>Helping Principal Engineers with their new role</h2>
<p>The majority of our Principal Engineers have been promoted from within Zalando. Some of our senior individual contributors have switched career tracks from Engineering Management back to individual contributors. As the principal engineering role is tailored to our specific needs and organizational structure, it was important for us to set up newcomers to the role for success.</p>
<p>A few Principal Engineers teamed up and compiled a guide to beginning the journey of a Principal Engineer and how to structure the first 100 days in this role. This guide has proven to be helpful for our Principal Engineers, their Managers, and for colleagues who are planning their own career development towards the individual contributor track. In addition to the guide, our more seasoned Principal Engineers provide mentorship to other Principal Engineers.</p>
<p>We also realize that the role of a Principal Engineer may not be a fitting career opportunity for every Senior Engineer. Principal Engineering is not just a label for the best Senior Engineers. In the end, it's a technical leadership role with a strong emphasis on cross-team coordination and communication skills, requiring the ability to lead without authority. The initiatives an individual drives tend to have a much longer time horizon before the impact becomes visible, and are often realized through the hands of others. This delayed gratification can negatively affect motivation, especially for individuals who, as problem-solvers with deep expertise, source their energy from solving large-scale problems with fast iteration cycles (e.g. as part of incident response). At Zalando, we leverage stretch assignments as development opportunities that allow our colleagues to try out aspects of the Principal Engineer role and verify whether it's a good fit, while letting them easily step back to their prior activities otherwise.</p>
<h2>Managing Principal Engineers</h2>
<p>Some of our Engineering Managers have neither worked with nor managed Principal Engineers before.
This can lead to situations where the potential of these individuals is under-leveraged. Individual contributors at this level require a degree of flexibility and a share of their time to explore the potential of addressing the problem areas they have identified. They also need the necessary sponsorship and support in change management for solutions that are introduced within the department and beyond.</p>
<p>To address this challenge at scale, we compiled guidance for our managers on how to support and effectively work with Principal Engineers. This guide includes a short checklist allowing organizational leaders to easily verify whether they have structured the ways of working and expectations towards the Principal Engineers in the right way. This includes ensuring that the Principal Engineer is part of leadership rounds providing the right context about the department's priorities and upcoming projects, creating the necessary connections between key stakeholders and Heads of Product, and also includes examples of initiatives that Principal Engineers have driven at Zalando.</p>
<h2>Summary</h2>
<p>In this post we have provided insights into the key aspects of the role of a Principal Engineer at Zalando. While this is not an extensive description of the challenges and intricacies of the role, we hope that the information shared in this post will shed some light on the opportunities that the individual contributor path provides. Likewise, we will be happy if it serves as an inspiration for you to consider putting stronger focus on the individual contributor career path in your company.</p>
<hr>
<p>If you found the post relevant to your career ambitions, we'd be happy to get to know you! Join us at Zalando as a <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B9%5D=Applied%20Science&filters%5Bcategories%5D%5B10%5D=Product%20Design%2C%20User%20Research%20%26%20UX%20Writing&filters%5Bcategories%5D%5B11%5D=Product%20Management%20%28Technology%29&search=%22Principal%20Software%22">Principal Engineer</a> and help us shape the role.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>There is no consistency in the industry for naming Senior+ roles. Some companies use (i) Senior, Staff, Senior Staff, Principal (e.g. Spotify), whereas others go for (ii) Senior, Principal, Senior Principal, ..., Distinguished Engineer (e.g. Amazon). We chose a naming scheme based on the second model. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Releasing Connexion to the Community2022-02-07T00:00:00+01:002022-02-07T00:00:00+01:00Henning Jacobstag:engineering.zalando.com,2022-02-07:/posts/2022/02/releasing-connexion-python-framework-to-the-oss-community.html<p>After 6 years and 3.9k GitHub stars, we are releasing Connexion, our API-first Python framework, to the Open Source community.</p><blockquote>
<p><a href="https://github.com/zalando/connexion/">Connexion</a> is a Python framework that automagically handles HTTP requests based on <a href="https://www.openapis.org/">OpenAPI specification</a> (formerly known as Swagger Spec) of your API described in <a href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#format">YAML format</a>. Connexion allows you to write an OpenAPI specification, then maps the endpoints to your Python functions; this makes it unique, as many tools generate the specification based on your Python code. You can describe your REST API in as much detail as you want; then Connexion guarantees that it will work as you specified.</p>
</blockquote>
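<p>To make the API-first flow concrete, here is an illustrative sketch (the spec fragment and module names are made up for this example, not taken from a real service): the OpenAPI operation names a handler via <code>operationId</code>, and Connexion routes validated requests to that Python function.</p>

```python
# hello.py -- a handler Connexion could wire to an OpenAPI operation via
#   operationId: hello.get_greeting
#
# An illustrative spec fragment (written first, in API-first style):
#
#   paths:
#     /greeting/{name}:
#       get:
#         operationId: hello.get_greeting
#         parameters:
#           - name: name
#             in: path
#             required: true
#             schema:
#               type: string
#         responses:
#           "200":
#             description: a greeting

def get_greeting(name):
    # Connexion passes validated path/query parameters as keyword
    # arguments and serializes the returned dict as a JSON response.
    return {"greeting": f"Hello, {name}!"}

# Wiring it up would look roughly like this (not executed here):
#   import connexion
#   app = connexion.App(__name__)
#   app.add_api("openapi.yaml")
#   app.run(port=8080)
```

<p>The key point of the spec-first approach is visible even in this sketch: the function implements an operation the spec already defines, rather than the spec being generated from the code.</p>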
<p>After 6 years and 3.9k GitHub stars, Zalando is now releasing Connexion to the community. What does this mean? Connexion's repository will move from Zalando's GitHub organization to the <a href="https://github.com/spec-first">new community-owned "spec-first" organization</a>. This repository transfer highlights changes in Connexion's maintainer structure. Connexion's license (Apache 2.0) and <a href="https://pypi.org/project/connexion/">release package on PyPI</a> will not change.</p>
<p><img alt="Connexion on GitHub.com" src="https://engineering.zalando.com/posts/2022/02/images/connexion-github.png#center"></p>
<p>Connexion was a huge enabler for Zalando to move towards <a href="https://opensource.zalando.com/restful-api-guidelines/#api-first">API-first</a> in 2015, i.e. to write the API specification before implementing the backend code. While Python is a first class citizen in Zalando's tech landscape (see our <a href="https://opensource.zalando.com/tech-radar/">Tech Radar</a>), Zalando's customer-facing production software is usually implemented in modern JVM languages such as <a href="https://engineering.zalando.com/tags/kotlin.html">Kotlin</a>, <a href="https://engineering.zalando.com/tags/java.html">Java</a>, or <a href="https://engineering.zalando.com/tags/scala.html">Scala</a>. Maintenance of Connexion stalled with core developers changing focus and nobody new stepping up within Zalando. Thankfully, <a href="https://blog.ml6.eu/why-we-decided-to-help-maintain-connexion-c9f449877083">ML6 took over</a> most of the regular maintenance from Zalando. We are very glad to have found new active maintainers. Special thanks go to my colleague <a href="https://github.com/jmcs">João</a> as the original author, <a href="https://github.com/rafaelcaricio">Rafael</a> for his significant contributions, <a href="https://github.com/RobbeSneyders">Robbe</a> and <a href="https://github.com/Ruwann">Ruwan</a> from ML6 for taking over, and to <a href="https://github.com/dtkav">Daniel</a> for donating the "spec-first" organization. The "spec-first" organization will serve as a company-neutral new home for this awesome open source project. The project is what it is today because of its community. Big thanks to all <a href="https://github.com/zalando/connexion/graphs/contributors">165 contributors</a> and to the numerous users of Connexion out there!</p>
<p>Moving Connexion out of Zalando's GitHub organization won't affect how the project is used within Zalando. With JVM-based languages powering most of Zalando's Fashion Store, Connexion is used for low-traffic services and tools in various departments. For example, Connexion powers parts of our internal Continuous Delivery Platform, serves metadata for our internal realtime business monitoring platform, exposes APIs for our inhouse machine learning platform, and is used in our pricing department. Connexion has gained some popularity among Zalando's data science community as <a href="https://engineering.zalando.com/tags/python.html">Python</a> is the most commonly used language for data scientists.</p>
<p>Personally, I'm very happy to see Connexion graduate and have it released to a new community-owned home. I will follow its path into the future and try to be helpful when time allows.</p>
<p>If you are interested in learning more about Connexion, check out <a href="https://connexion.readthedocs.io/">the documentation</a>.</p>Utilizing Amazon DynamoDB and AWS Lambda for Asynchronous Event Publication2022-02-03T00:00:00+01:002022-02-03T00:00:00+01:00Matthias Michael Döpmanntag:engineering.zalando.com,2022-02-03:/posts/2022/02/transactional-outbox-with-aws-lambda-and-dynamodb.html<p>We demonstrate an implementation of the Transactional Outbox pattern put into practice on AWS with Amazon DynamoDB, AWS DynamoDB Streams and AWS Lambda.</p><p>In our Microservices Architecture, services communicate both asynchronously via events and synchronously via REST calls.
Frequently, a synchronous REST call modifies data in a data store and emits an event based on the changes made.
Publishing data change events can be decoupled from performing the changes in the data store in order to increase the resilience of the application.</p>
<p>We will show how this is achieved with the <a href="https://microservices.io/patterns/data/transactional-outbox.html">Transactional Outbox</a> pattern, presenting a cloud native approach utilizing Amazon DynamoDB, AWS DynamoDB Streams and AWS Lambda.</p>
<h2>Problem Statement</h2>
<p>In Zalando Payments we have a service, called Order Store, that stores payment related data for a given order in a DynamoDB table.
Updating this data happens via a synchronous REST call.
Changes to the stored payment information need to be propagated to other services too, which is realized by sending events to <a href="https://github.com/zalando/nakadi">Nakadi</a>, Zalando's message bus.</p>
<p><img alt="coupled" src="https://engineering.zalando.com/posts/2022/02/images/coupled_diagram.png"></p>
<p>Initially, the service created/updated data in DynamoDB and then sent events to Nakadi to inform other services about the change in payment information.
This meant the service had two downstream dependencies to complete the request, namely the database and the message bus.
As the availability of a service is the product of the availabilities of its dependencies, the more dependencies a service has, the lower its own availability.
Let's assume DynamoDB and the message bus have availabilities of 99.9% each.
Thus, the maximum availability for the service is <code>99.9% * 99.9% = 99.8%</code>.</p>
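<p>The same arithmetic generalizes to any number of serial dependencies; a quick sketch:</p>

```python
from math import prod

def serial_availability(availabilities):
    """Availability of a service that needs ALL of its dependencies
    to respond: the product of the individual availabilities."""
    return prod(availabilities)

# Two dependencies at 99.9% each, as in the example above:
print(round(serial_availability([0.999, 0.999]) * 100, 2))  # 99.8

# A single dependency keeps the ceiling at that dependency's availability:
print(round(serial_availability([0.999]) * 100, 2))         # 99.9
```

<p>Every dependency removed from the synchronous request path raises this ceiling, which is exactly the motivation for decoupling event publication from the data store write.</p>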
<p>Since we aim for the highest availability possible, reducing the dependencies to DynamoDB alone results in a higher availability of the service.
After explaining the Transactional Outbox pattern, we will present a concrete solution, the technologies it comprises, and how we decoupled the process.</p>
<h2>Transactional Outbox</h2>
<p>Let us look at the underlying concept of how to decouple data update and event publication.
The pattern we are describing here is known as Transactional Outbox.
Our goal is for a service that is called synchronously via a REST API to create, delete or update a data store entry and also propagate the change to other services via messaging.
However, publishing the message is decoupled from updating the data store.</p>
<p><img alt="transactional-outbox-drawing" src="https://engineering.zalando.com/posts/2022/02/images/outbox_diagram.png"></p>
<p>In this drawing we provide the setup of the environment.
Our flow consists of 4 steps, where the starting point is a synchronous call that triggers further actions.</p>
<h3>Change Entry and Populate Outbox</h3>
<p>After the call is received, the service triggers a change for an entry in the data store.
This is denoted with <code>1</code>.
The actions that trigger a change consist of Create, Update or Delete, as a Read operation would not alter any data.
Modifying data in the data store is transactional and once it is successfully completed, the service already returns a success response code to its caller.</p>
<p>As part of the transaction in the data store, the actual data change is written to an outbox.
This is depicted in step <code>1.5</code>.
The outbox can be thought of as an append-only log.
Each data change operation in the data store will produce an entry in the outbox.</p>
<h3>Consume Outbox and Publish Event</h3>
<p>The transaction in the data store was successful and the data entry got updated or created.
Thus, a new entry in the outbox exists.
A so-called message relay reads that entry from the outbox.
To become aware of the new entry, the message relay is notified by the outbox and consumes the entry upon notification.
This is depicted with number <code>2</code>.
<p>Upon consumption, the message relay extracts the data, transforms it to an event and publishes it, marked in the diagram with <code>2.5</code>.
Only after successful publication is the entry marked as consumed.</p>
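<p>An in-memory sketch of the pattern's two halves, with purely illustrative names: the write path changes the entry and appends to the outbox in one atomic step, and the relay later publishes pending entries, marking them consumed only on success.</p>

```python
# Toy in-memory Transactional Outbox; in a real system the data store and
# outbox writes happen inside one database transaction.
store = {}    # the "data store": key -> entry
outbox = []   # append-only log of pending changes

def update_entry(key, data):
    """Steps 1 and 1.5: change the entry and populate the outbox together."""
    store[key] = data
    outbox.append({"key": key, "data": data, "consumed": False})

def relay(publish):
    """Steps 2 and 2.5: publish pending entries; mark consumed only on success."""
    for entry in outbox:
        if entry["consumed"]:
            continue
        try:
            publish(entry)           # e.g. send an event to the message bus
            entry["consumed"] = True
        except Exception:
            pass                     # stays pending, retried on the next run

published = []
update_entry("order-1", {"status": "PAID"})
relay(published.append)
print(published[0]["key"])  # order-1
```

<p>Because the caller gets its success response as soon as the transaction commits, a message-bus outage only delays publication; it never fails the synchronous request.</p>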
<h2>Concrete Solution</h2>
<p>After describing the pattern we now want to present the concrete solution.
In order to decouple the asynchronous event emission from the synchronous process we take advantage of various cloud services AWS has to offer.</p>
<p>The following diagram shows the complete flow from a synchronous REST API call to the publication of the Nakadi event following the new approach:</p>
<p><img alt="concrete-solution-drawing" src="https://engineering.zalando.com/posts/2022/02/images/concrete.png"></p>
<h3>DynamoDB Streams</h3>
<p>Recently, DynamoDB was extended with a Change Data Capture implementation – DynamoDB Streams.
Once <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html">activated</a>, as soon as an item in the DynamoDB table is changed (added, updated or deleted) a corresponding <em>dataset</em> is sent to the stream.
In our case this dataset contains the <em>old image</em>, containing the table item before the change, and the <em>new image</em>, containing the table item after the change.
It can be configured which <em>images</em> AWS exposes to the DynamoDB stream.
With both these images we are now able to assemble a corresponding Nakadi event using AWS Lambda.</p>
<h3>AWS Lambda</h3>
<p>The trigger for our AWS Lambda is a DynamoDB Stream item.
We chose Python for our implementation as it is more lightweight compared to Java.
The lambda function will receive the item containing the <em>old</em> and <em>new image</em>.
Then it will assemble the data change event, which contains the complete item after its change as well as a patch node containing the diff.
As a last step the assembled event is published to Nakadi.</p>
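<p>A simplified sketch of such a handler: it reads the old and new images from a stream record and assembles an event body containing the complete item plus a patch node with the changed fields. The field names and event shape are illustrative; the real function also deserializes DynamoDB's typed attribute values and publishes the result to Nakadi.</p>

```python
def assemble_event(record):
    """Build a data-change event from one DynamoDB Streams record.
    Images are assumed to be already-deserialized plain dicts here."""
    old = record.get("OldImage", {})
    new = record.get("NewImage", {})
    # Patch node: every key whose value was added, removed, or changed.
    patch = {
        key: new.get(key)
        for key in set(old) | set(new)
        if old.get(key) != new.get(key)
    }
    return {"data": new, "patch": patch}

event = assemble_event({
    "OldImage": {"order_id": "o-1", "status": "PENDING"},
    "NewImage": {"order_id": "o-1", "status": "PAID"},
})
print(event["patch"])  # {'status': 'PAID'}
```

<p>A Lambda invocation receives a batch of such records, so the real handler loops over them before publishing.</p>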
<p>In case the publication to Nakadi fails, e.g. due to timeouts, the request is retried.
If all retries fail, we make use of an AWS SQS queue as fallback storage, which is explained further in the next section.
This also means that we do not guarantee that events are published in the correct order.</p>
<h3>AWS SQS & Kubernetes CronJob</h3>
<p>AWS SQS is a message queue service.
When creating a new AWS Lambda function, it already comes with an AWS SQS queue attached as a dead letter queue.
With this queue in place, it is ensured that no events are lost in case of a failed publication or, even worse, a temporary outage.
Now, whenever Nakadi event publishing fails, the event is sent to the dead letter queue.
For event publishing, retries with exponential backoff are in place to minimize the number of unpublishable events that end up in the dead letter queue.
In order to retry sending the queued events in intervals, we created a Kubernetes cronjob.
The cronjob simply runs the same Python code as the AWS Lambda and tries to publish the events to Nakadi again.
Once publication eventually succeeds, the event is removed from the SQS queue.</p>
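<p>The retry-with-exponential-backoff behaviour can be sketched like this (attempt counts and delays are illustrative; the fallback to the SQS queue is represented by re-raising to the caller):</p>

```python
import time

def publish_with_backoff(publish, event, attempts=3, base_delay=0.1):
    """Try to publish; sleep base_delay * 2**i between attempts.
    If every attempt fails, re-raise so the caller can fall back to
    the dead letter queue instead of dropping the event."""
    for i in range(attempts):
        try:
            return publish(event)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)
```

<p>The doubling delay gives a transient downstream outage time to recover, while the bounded attempt count keeps the Lambda's execution time, and therefore its cost, predictable.</p>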
<h2>Conclusion</h2>
<p>We successfully decoupled synchronous data changes from eventually consistent event publishing.
Through decreasing dependencies, we increased the resiliency of our service.
Besides improving the architecture, the team also got to work with DynamoDB Streams and AWS Lambda for the first time, a great opportunity to learn about AWS technologies.
Having implemented this pattern, we are working with our infrastructure teams to offer an implementation of this pattern to all teams at Zalando.
We already have an implementation of the Transactional Outbox for <a href="https://engineering.zalando.com/tags/postgresql.html">PostgreSQL</a>, managed centrally via a Kubernetes operator.</p>
<hr>
<p>If you would like to work on similar challenges, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bentities%5D%5B0%5D=zalando%20payments">joining our engineering teams</a> at Zalando Payments.</p>Maps with PostgreSQL and PostGIS2021-12-02T00:00:00+01:002021-12-02T00:00:00+01:00Felix Kundetag:engineering.zalando.com,2021-12-02:/posts/2021/12/maps-with-postgresql-and-postgis.html<p>Learn how to stream geodata from PostGIS to your browser</p><p><img alt="" src="https://engineering.zalando.com/posts/2021/12/images/postgis-maps-preview.png#previewimage"></p>
<p>This blog post explains to you which tools to use to serve geospatial data from a database system (PostgreSQL) to your web browser. All you need is a database server for the data, a web map application for the frontend and a small service in between to transfer user requests. I will also show you how these components can run on top of Kubernetes in a highly available cloud native fashion.</p>
<h2>PostGIS - a spatial database</h2>
<p>As a first step, the dataset in your database that you want to put on a map must include a geospatial representation: two coordinates or an address. For Zalando it might be interesting to know the demand hotspots across Europe, e.g. by joining the zip codes of shipments with administrative boundaries, which are often <a href="https://ec.europa.eu/eurostat/web/gisco/">available</a> as Open Data. The database must support geo data types and indexes to answer spatial queries. At Zalando, the open source database system PostgreSQL is used by many teams and it offers a geospatial component called <a href="https://postgis.net/">PostGIS</a>. It is used for example to allow our customers to select the nearest pickup and return points. Over the years, PostGIS has grown a strong community and is widely accepted in the industry as the de facto standard to manage geospatial data. There are many different tools and interfaces available to import data in various formats into PostGIS and access it from your favorite data science environment - be it <a href="https://jupyter-tutorial.readthedocs.io/de/latest/data-processing/postgresql/postgis/index.html">Jupyter</a>, <a href="https://www.r-bloggers.com/2019/04/interact-with-postgis-from-r/">R</a> or <a href="https://help.tableau.com/current/pro/desktop/en-gb/maps_spatial_sql.htm">Tableau</a>.</p>
<h2>Bring the map to your browser</h2>
<p>Creating a web mapping app is simple with tools like <a href="https://leafletjs.com/examples.html">Leaflet.js</a>. For the basemap we can use <a href="https://www.openstreetmap.org/">OpenStreetMap</a>, the wiki-style free alternative to commercial map providers. Adding extra layers with, say, over 100,000 polygons on top of it would slow down map navigation considerably. Splitting the data into a grid of tiles and loading only the ones covering the area currently visible on your screen is what makes a browser map fast and responsive. Until recently, a middleware was usually required to produce these tile structures. That middleware had to handle not only the grid creation, but also different zoom levels: when you zoom out, the geometry of streets, rivers, forests etc. should become coarser and be styled differently; some details should even be left out at smaller scales for the sake of readability.</p>
<p><img alt="Loading data as vector tiles" src="https://engineering.zalando.com/posts/2021/12/images/nuts3_tiles.gif#center"></p>
<figcaption style="text-align:center">Streaming spatial data from PostGIS as vector tiles into the browser map</figcaption>
<p><br/></p>
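<p>To make the effect of tiling concrete, here is a small sketch of the tile pyramid arithmetic (my own illustration, not from the original post): each zoom level quadruples the number of tiles, so every tile covers a quickly shrinking slice of the map and stays cheap to transfer and render.</p>

```python
import math

# Sketch of the standard "slippy map" tile pyramid used by OpenStreetMap
# and vector tile servers: at zoom level z the world is split into
# 2^z x 2^z tiles, so the tile count grows by a factor of 4 per level.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137  # web mercator world width in metres

def tiles_at_zoom(z: int) -> int:
    """Total number of tiles covering the world at zoom level z."""
    return (2 ** z) ** 2

def tile_width_m(z: int) -> float:
    """Width of one tile in web mercator metres at zoom level z."""
    return EARTH_CIRCUMFERENCE_M / 2 ** z

for z in (0, 6, 12):
    print(z, tiles_at_zoom(z), round(tile_width_m(z)))
```

At zoom 0 a single tile covers the whole world; by zoom 12 there are already millions of tiles, each only a few kilometres wide, which is why the browser only ever fetches the handful covering the current viewport.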
<p>The good news is that these days PostGIS can take over most of the middleware’s job and produce map tiles for you. You only need a lightweight server between the frontend and the database that takes in requests from the map and sends queries to your spatial database to produce the requested tiles. <a href="https://github.com/CrunchyData/pg_tileserv">pg_tileserv</a> is such a solution: you configure the table name that contains the spatial data and that’s it. If you want to learn more about vector tiles, I can recommend <a href="https://www.youtube.com/watch?v=t8eVmNwqh7M">this talk</a> by Paul Ramsey, one of the PostGIS authors.</p>
<h2>Running it on Kubernetes</h2>
<p>The <a href="https://github.com/zalando/postgres-operator">Postgres Operator</a>, created by my team at Zalando, provides you with an easy creation and update path for PostgreSQL servers running on top of Kubernetes. Engineers only have to write a short YAML manifest which can look like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid.zalan.do/v1</span>
<span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Postgresql</span>
<span class="nt">metadata</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo</span>
<span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">numberOfInstances</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2</span>
<span class="w"> </span><span class="nt">postgresql</span><span class="p">:</span>
<span class="w"> </span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"14"</span>
<span class="w"> </span><span class="nt">volume</span><span class="p">:</span>
<span class="w"> </span><span class="nt">size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10Gi</span>
<span class="w"> </span><span class="nt">teamId</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid</span>
<span class="w"> </span><span class="nt">preparedDatabases</span><span class="p">:</span>
<span class="w"> </span><span class="nt">map_db</span><span class="p">:</span>
<span class="w"> </span><span class="nt">defaultUsers</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
<span class="w"> </span><span class="nt">extensions</span><span class="p">:</span>
<span class="w"> </span><span class="nt">postgis</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">geo</span>
<span class="w"> </span><span class="nt">schemas</span><span class="p">:</span>
<span class="w"> </span><span class="nt">geo</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</code></pre></div>
<p>The operator will notice the new manifest and create all the necessary resources in Kubernetes - a stateful set with two database pods, services to connect to the database, secrets for authentication, etc. By specifying <code>preparedDatabases</code>, the operator will create a new database with schemas as well as a set of database roles (reader, writer, owner) with default access privileges assigned. Plus, you can list extensions to be created in a certain schema. The Postgres cluster is based on the <a href="https://github.com/zalando/spilo">Spilo</a> Docker image, which includes the PostGIS extension.</p>
<p>To import arbitrary geodata formats I can recommend <a href="https://subscription.packtpub.com/book/application_development/9781788299329/1/ch01lvl1sec14/importing-and-exporting-data-with-the-ogr2ogr-gdal-command">GDAL’s ogr2ogr</a> command-line tool. In my case I’ve imported the latest <a href="https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts">European NUTS polygons</a> of 2021 and the <a href="https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat">1km² population grid</a> of 2018 by Geostat.</p>
<p>To roll out <code>pg_tileserv</code> on Kubernetes I’m using a Deployment resource. To run it within the Zalando infrastructure I had to put the tileserver behind our OAuth2 proxy under a dedicated <code>/tileserver</code> base path, which required overwriting <code>pg_tileserv</code>’s default configuration. <code>pg_tileserv</code> is configured via <a href="https://github.com/CrunchyData/pg_tileserv/blob/master/config/pg_tileserv.toml.example">TOML</a> files, so I’ve put the configuration into a ConfigMap and mounted it into the container. Here you can see the manifest (leaving out the resources section in this example):</p>
<div class="highlight"><pre><span></span><code><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v1</span>
<span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">ConfigMap</span>
<span class="nt">metadata</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo-tileserver-config</span>
<span class="nt">data</span><span class="p">:</span>
<span class="w"> </span><span class="nt">pg_tileserv.toml</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">BasePath = "/tileserver/"</span>
<span class="w"> </span><span class="no">Debug = true</span>
<span class="nn">---</span>
<span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">apps/v1</span>
<span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Deployment</span>
<span class="nt">metadata</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo-tileserver</span>
<span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">selector</span><span class="p">:</span>
<span class="w"> </span><span class="nt">matchLabels</span><span class="p">:</span>
<span class="w"> </span><span class="nt">application</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo-tileserver</span>
<span class="w"> </span><span class="nt">template</span><span class="p">:</span>
<span class="w"> </span><span class="nt">metadata</span><span class="p">:</span>
<span class="w"> </span><span class="nt">labels</span><span class="p">:</span>
<span class="w"> </span><span class="nt">application</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo-tileserver</span>
<span class="w"> </span><span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">containers</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo-tileserver</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pramsey/pg_tileserv:latest</span>
<span class="w"> </span><span class="nt">ports</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">containerPort</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">7800</span>
<span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="s">"TCP"</span>
<span class="w"> </span><span class="nt">volumeMounts</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">configs</span>
<span class="w"> </span><span class="nt">mountPath</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/config</span>
<span class="w"> </span><span class="nt">env</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"DATABASE_URL"</span>
<span class="w"> </span><span class="nt">value</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgresql://map_db_reader_user@acid-geo:5432/map_db</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"PGPASSWORD"</span>
<span class="w"> </span><span class="nt">valueFrom</span><span class="p">:</span>
<span class="w"> </span><span class="nt">secretKeyRef</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">map_db_reader_user.acid-geo.credentials</span>
<span class="w"> </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">password</span>
<span class="w"> </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">configs</span>
<span class="w"> </span><span class="nt">configMap</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acid-geo-tileserver-config</span>
</code></pre></div>
<p>Another Deployment is needed to serve our Leaflet application, e.g. a simple Ubuntu Docker image running nginx.</p>
<h2>Dynamic mapping layers</h2>
<p>The web map requests tiles from <code>pg_tileserv</code> which sends back protobuf files. In our case, a request looks like this - with <code>geo.boundaries_europe</code> being the schema qualified table name:</p>
<div class="highlight"><pre><span></span><code><span class="si">${</span><span class="nv">BASE_URL</span><span class="si">}</span>/tileserver/geo.boundaries_europe/<span class="o">{</span>z<span class="o">}</span>/<span class="o">{</span>x<span class="o">}</span>/<span class="o">{</span>y<span class="o">}</span>.pbf
</code></pre></div>
<p>Z is the zoom level; X and Y are the column and row indices of the requested tile within the tile grid at that zoom level. Leaflet’s <a href="https://leaflet.github.io/Leaflet.VectorGrid/vectorgrid-api-docs.html#vectorgrid">VectorGrid</a> class can be used to <a href="https://blog.crunchydata.com/blog/crunchy-spatial-tile-serving">display the vector tiles</a> returned from PostGIS. For the boundaries, the result can look like the first picture above. A vector tile need not contain only the geometry: multiple thematic attributes can be included, making it possible to change the style on the fly without sending another request to the database. <code>pg_tileserv</code> will include information from all columns it finds in a spatial table.</p>
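<p>As an illustration (not part of the original post), the tile address for a given map location can be computed with the standard slippy-map formula. The URL built below assumes the <code>/tileserver</code> base path and the <code>geo.boundaries_europe</code> layer from above; the <code>example.org</code> host is a placeholder.</p>

```python
import math

# Hypothetical helper: convert a WGS84 coordinate into the {z}/{x}/{y}
# tile address used in pg_tileserv request URLs (standard slippy-map math,
# as used by OpenStreetMap and Leaflet).
def latlon_to_tile(lat, lon, z):
    n = 2 ** z                                  # tiles per axis at zoom z
    x = int((lon + 180.0) / 360.0 * n)          # column index
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)  # row index
    return x, y

def tile_url(base_url, layer, lat, lon, z):
    x, y = latlon_to_tile(lat, lon, z)
    return f"{base_url}/tileserver/{layer}/{z}/{x}/{y}.pbf"

# the tile covering Berlin at zoom level 10
print(tile_url("https://example.org", "geo.boundaries_europe", 52.52, 13.405, 10))
# → https://example.org/tileserver/geo.boundaries_europe/10/550/335.pbf
```

Leaflet performs exactly this computation for every tile in the viewport, which is why the server only ever has to answer small, cacheable requests.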
<p><code>pg_tileserv</code> can also serve vector tiles not just from a table but from an SQL function, using a query with PostGIS’ vector tile creator function <a href="https://postgis.net/docs/ST_AsMVT.html">ST_AsMVT</a>. <code>pg_tileserv</code>’s <a href="https://github.com/CrunchyData/pg_tileserv#readme">README</a> on GitHub provides some cool examples of such function layers. For example, PostGIS allows you to create a <a href="https://postgis.net/docs/ST_HexagonGrid.html">grid</a> of squares or hexagons within a defined extent, e.g. the envelope of a single tile. The grid can be intersected with another spatial data set to produce a heatmap. The following example is inspired by <code>pg_tileserv</code>'s example of <a href="https://access.crunchydata.com/documentation/pg_tileserv/1.0.3/usage/function-layers-advanced/">Advanced Function Layers</a>.</p>
<div class="highlight"><pre><span></span><code><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">geodata</span><span class="p">.</span><span class="n">population_hexagons</span><span class="p">(</span>
<span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span>
<span class="w"> </span><span class="n">step</span><span class="w"> </span><span class="nb">integer</span><span class="w"> </span><span class="k">default</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span>
<span class="k">RETURNS</span><span class="w"> </span><span class="n">bytea</span><span class="w"> </span><span class="k">AS</span>
<span class="err">$$</span>
<span class="k">WITH</span><span class="w"> </span><span class="n">bounds</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="c1">-- get web mercator tile bounds to given coordinate</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ST_TileEnvelope</span><span class="p">(</span><span class="n">z</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">geom</span>
<span class="p">),</span><span class="w"> </span><span class="n">hexes</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="c1">-- generate hexgrid within bounds and join with population grid</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">row_number</span><span class="p">()</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grid_id</span><span class="p">,</span>
<span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">j</span><span class="p">,</span>
<span class="w"> </span><span class="k">sum</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">popcount</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">popcount</span><span class="w"> </span><span class="c1">-- oversimplified, of course</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">bounds</span><span class="w"> </span><span class="n">b</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="k">LATERAL</span><span class="w"> </span><span class="n">ST_HexagonGrid</span><span class="p">(</span><span class="w"> </span><span class="c1">-- 1. hex size, 2. boundary</span>
<span class="w"> </span><span class="p">(</span><span class="n">ST_XMax</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">geom</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">ST_XMin</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">geom</span><span class="p">))</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">pow</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">step</span><span class="p">),</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">geom</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="n">h</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
<span class="w"> </span><span class="c1">-- do spatial join between our artificial grid and the Geostat grid</span>
<span class="w"> </span><span class="c1">-- the hex grid is in web mercator coordinate reference system (CRS)</span>
<span class="w">    </span><span class="c1">-- it must be transformed into the same CRS of the population grid (WGS84 - 4326)</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">geodata</span><span class="p">.</span><span class="n">population</span><span class="w"> </span><span class="n">p</span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="n">geom</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">ST_Transform</span><span class="p">(</span><span class="n">h</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="mi">4326</span><span class="p">)</span>
<span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">j</span>
<span class="p">),</span><span class="w"> </span><span class="n">mvt</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="c1">-- processing geometry for vector tiles</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ST_AsMVTGeom</span><span class="p">(</span><span class="n">h</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">geom</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">geom</span><span class="p">,</span>
<span class="w"> </span><span class="p">(</span><span class="n">h</span><span class="p">.</span><span class="n">i</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">j</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">h</span><span class="p">.</span><span class="n">grid_id</span><span class="p">::</span><span class="nb">text</span><span class="p">)::</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">grid_id</span><span class="p">,</span>
<span class="w">         </span><span class="n">h</span><span class="p">.</span><span class="n">popcount</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">hexes</span><span class="w"> </span><span class="n">h</span><span class="p">,</span><span class="w"> </span><span class="n">bounds</span><span class="w"> </span><span class="n">b</span>
<span class="p">)</span>
<span class="c1">-- baking mvt geom, grid_id and popcount into MVT encoding</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">ST_AsMVT</span><span class="p">(</span><span class="n">mvt</span><span class="p">,</span><span class="w"> </span><span class="s1">'geodata.population_hexagons'</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">mvt</span><span class="p">;</span>
<span class="err">$$</span>
<span class="k">LANGUAGE</span><span class="w"> </span><span class="s1">'sql'</span><span class="w"> </span><span class="k">STABLE</span><span class="w"> </span><span class="k">STRICT</span><span class="w"> </span><span class="n">PARALLEL</span><span class="w"> </span><span class="n">SAFE</span><span class="p">;</span>
</code></pre></div>
<p>Your function must take the Z, X and Y parameters as arguments and return Postgres' <code>bytea</code> type, which is just a BLOB for the PBFs returned from ST_AsMVT. In the first part of the query we need to get the tile envelope for the given input. Within this square we generate the grid and join it against the Geostat population grid. For each hexagon we sum up the population of every intersecting Geostat grid cell. This is quite coarse, indeed. It would be more precise to join the generated grid against a point data set, e.g. one could generate <a href="https://postgis.net/docs/ST_Centroid.html">centroids</a> for each data polygon.</p>
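<p>The size argument handed to <code>ST_HexagonGrid</code> in the function above is the tile width divided by 2<sup>step</sup>. A quick sketch of that arithmetic in Python (assumptions mine, not from the original post) shows how the hexagons shrink automatically as the user zooms in:</p>

```python
import math

# Mirror of the hexagon-size expression from the SQL function above:
#   (ST_XMax(b.geom) - ST_XMin(b.geom)) / pow(2, step)
# where the tile envelope width in web mercator is world_width / 2^z.
WEB_MERCATOR_EXTENT_M = 2 * math.pi * 6378137  # world width in metres

def hex_size_m(z, step=4):
    tile_width = WEB_MERCATOR_EXTENT_M / 2 ** z   # width of ST_TileEnvelope(z, x, y)
    return tile_width / 2 ** step                 # size argument for ST_HexagonGrid

# zooming in by 2 levels quarters the tile width, and with it the hexagon size
print(round(hex_size_m(5)), round(hex_size_m(7)))
```

With the default <code>step = 4</code> each tile is carved into a 16-wide band of hexagons regardless of zoom, which is what keeps the heatmap resolution proportional to the current scale.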
<p><img alt="Dynamic spatial layers based on zoom level" src="https://engineering.zalando.com/posts/2021/12/images/pophex_tiles.gif#center"></p>
<figcaption style="text-align:center">Dynamic hexagon grid joined against Geostat population data using an SQL function</figcaption>
<p><br/></p>
<p>Because this is all based on database queries triggered by user interactions with the map, such a heatmap can be dynamic and change while zooming in and out. As the tiles cover smaller areas at larger scales, the heatmap becomes more fine-grained. In the map legend you can see that values adapt to the zoom level and hexagon size. This improves perception: the observer is not overwhelmed when the full picture is shown, and gets better guidance to points of interest when zooming in.</p>
<hr>
<p><em>If you are as passionate about PostgreSQL or databases as we are, please consider joining our team as a <a href="https://jobs.zalando.com/en/jobs/2704994-database-engineer-postgresql/">Database</a> or <a href="https://jobs.zalando.com/en/jobs/3460249-database-engineer-postgresql/">Senior Database Engineer</a>.</em></p>
<h1>A Systematic Approach to Reducing Technical Debt</h1>
<p>2021-11-30 · Gregor Ulm</p>
<p>This article describes a systematic approach to reducing technical debt from the perspective of engineering management. It thoroughly describes the process that was set up in one of our core engineering teams and also addresses how such work can be effectively capitalized.</p>
<h2>Introduction</h2>
<p>While technical debt is a recurring issue in software engineering, the case of the Merchant Orders team within Zalando Direct was an outlier: due to the lack of a clearly defined process, technical debt more or less only ever accumulated. When I joined this team in autumn 2020 as its new engineering lead, the technical debt backlog had entries dating back to 2018. In this article, I describe the process we set up in Q1/2021 in order to regain control of our technical debt. While the situation in your own team may not be quite as dire, you may nonetheless find some aspects of this blog post useful to adopt. Our backlog of technical debt tickets used to be in excess of 70 items, with no end in sight. With the adoption of the methodology described in this article, we have already shipped more than ten features or improvements over the course of eight weeks, i.e. four sprints. For the first time in three years, i.e. ever since my team started tracking technical debt, we are reducing it.</p>
<p>This article is written from a managerial perspective and has Engineers and Engineering Managers as its target audience, though I hope that engineers of all levels find value in this article. Furthermore, I can only encourage any software engineer reading this article to approach their lead if ever-growing technical debt is an issue in their team. There is a non-zero chance that they will appreciate you raising the issue, considering that all of us are aware that technical debt is a serious problem. If you do not pay it down, you will get more technical debt on top for free, until your only option is a complete rewrite. This is quite similar to compound interest driving debtors into bankruptcy in the real world. Obviously, we would like to avoid such an outcome.</p>
<p><img alt="Excerpt from team's technical-debt backlog" src="https://engineering.zalando.com/posts/2021/11/images/techdebt_tracker_anon.png#center"></p>
<figcaption style="text-align:center">An excerpt from my team’s technical-debt backlog as of April 2021. As you can see, there are items from 2018 and 2019 on it.</figcaption>
<h2>Technical debt, Known and Unknown</h2>
<p>Using the vocabulary of the <a href="https://en.wikipedia.org/wiki/Johari_window">Johari window</a>, you can probably identify plenty of “known known” technical debt in your codebase. However, some technical debt constitutes an “unknown unknown”, i.e. technical debt we do not know that we have. In our case, we had a long backlog of known technical debt, with many dozens of entries. Given that we have over a dozen services to maintain, this is probably not even a particularly frightening number. However, there is also technical debt that you are completely unaware of. This may seem counter-intuitive, in particular if you subscribe to the notion that services can be designed perfectly in advance, once and for all. Yet, this is not a caricature, considering that you can encounter non-technical leads who hold rather similar beliefs. In some circumstances, this could even be a perfectly valid position to hold, for instance in static environments.</p>
<p>There are at least two sources of unknown technical debt. First, there are problems with your services that you simply have not yet identified. This can happen easily because once you agree on a design and subsequently carry out its implementation, you may not question any decisions the team has agreed on. This can of course mean that there are drawbacks in your design or implementation that someone with a fresh pair of eyes, for instance a new joiner, may be able to spot. Second, technology is a fast-moving field. This means that today’s cutting-edge design patterns, development processes, testing strategies, or even programming languages and paradigms may get superseded. Your current best practices replaced your previous set of best practices one by one, and there are new developments that will one day make you wonder why anybody ever thought that a hitherto valid approach was a good idea. Of course, there is also the problem that we sometimes need to deliver features quickly to seize a business opportunity, which may lead to sub-optimal design and implementation decisions.</p>
<p>Not all change is positive, however. As much as we engineers may pride ourselves on our objectivity, our industry is also driven by fads. This is such a big issue that a company like Gartner makes money by selling their analyses about where on the “<a href="https://www.gartner.com/en/research/methodologies/gartner-hype-cycle">hype cycle</a>” certain technologies are. Sometimes, we also regress as an industry, for instance by adopting technologies that are popular but less powerful. Yet, if they are being pushed by corporations with an annual marketing budget of many hundreds of millions of dollars, they can get a lot of traction in industry. Any of your services might look very different if it were rewritten today. As a practical consequence, I think you should take the time to re-review your existing services and look for improvements, but, if possible, with a very critical view toward buzzwords du jour. Even <a href="https://en.wikipedia.org/wiki/TeX">TeX</a>, one of the arguably most mature software products in the world, receives fixes to this very day; its first version was released more than four decades ago. Taking this into account, it is probably not an entirely implausible assumption that your services could be improved as well. On a related note, Zalando has <a href="https://engineering.zalando.com/posts/2020/07/technology-choices-at-zalando-tech-radar-update.html">formal processes in place for selecting technologies as well as adopting new technologies</a>. This is certainly helpful for engineering leaders, yet it cannot address the problem that some technologies fall out of favor over time due to shortcomings.</p>
<p>As we create software solutions in a highly dynamic environment where both customer requirements and technologies can change, a semi-regular review of any of your services may uncover areas of improvement. All of that should be categorized as (hitherto unknown) technical debt. A very welcome consequence of such an exercise is that your engineers will gain greater familiarity with their services. This is particularly valuable if your services need to be reliable anytime. Preferably, each engineer on your on-call rotation should have very detailed knowledge of your services, so thoroughly studying the source code of your existing service will be very helpful to them.</p>
<h2>Motivating your Engineers</h2>
<p>In management theory, a popular concept is <a href="https://en.wikipedia.org/wiki/Theory_X_and_Theory_Y">Theory X/Theory Y</a>. The two theories form a pair. According to Theory X, people only work because they need money and, if they could get away with it, they would prefer not to work at all. In contrast, Theory Y posits that people are intrinsically motivated, care about their work, and want to advance in their career. Reality is probably somewhere in between. However, as a leader, the problem is how to get people to want to work on technical debt. In our case, the problem was that the backlog had tickets on it that were three years old, which seems to imply a lack of motivation to work on such tickets.</p>
<p>As leaders we can of course simply tell people what to work on (Theory X). The problem, however, is that people tend to be more productive if they work on tickets they really want to work on (Theory Y). Furthermore, my experience as an engineer was that work on technical debt can be both fulfilling and a source of new opportunities. Consequently, I use a Theory Y approach with my team, stressing the benefits of this kind of work. Please note that this is not in any way a cynical approach. A good part of my growth as an engineer was due to resolving hairy technical problems, oftentimes with a focus on performance improvements. In one of my internships I was given the task of increasing the performance of an artificial neural network, and this work later led to me getting hired in a very competitive field. I also highlighted to my team that work on technical debt can sometimes be easily quantified. An engineer’s CV certainly looks better with hard data on percentages of performance increases or space reductions. Examples are: “Reduced weekly AWS hosting fees by $500 by evaluating resource requirements” (this is an actual result of our work) or “Reduced space requirements of one of our databases by 12% by optimizing data types and removing redundant information.”</p>
<h2>The Technical-Debt Rotation</h2>
<p>My team already has several rotations in place. Thus, I set up technical debt as another rotation. I aim to give my team autonomy in their work, so my proposal was the following: all engineers take turns in the technical-debt rotation, and one iteration lasts for one week. In practice, this means that on every Monday an engineer should spend some time on identifying technical debt they want to work on. This can either be known technical debt, i.e. one or more tickets from the technical-debt tracker, or unknown technical debt. For the latter, my suggestion is to pick one of our many services, study the source code, and look for improvements. This should lead to a number of additional tickets. Preferably, an engineer identifying possible improvements of an existing service should also do the corresponding work. This is particularly the case when we only have a hypothesis that requires some work to test it.</p>
<p>I want the engineers on the technical-debt rotation to work on tickets related to technical debt before taking on any tickets from our regular backlog, which is of course taken into account during the planning meeting. In terms of the time commitment, I am rather flexible. I would like the engineer on the rotation to spend at least one day working on technical debt. However, there are situations where a bigger commitment may be warranted. This is particularly the case with larger subprojects, as detailed in the next section.</p>
<p>You may have noticed that I have not addressed the issue of urgency; clearly, not all technical debt is created equal. Pressing issues we tend to address as soon as possible. We commonly do not even classify them as technical debt but instead as necessary bug fixes or “operations” issues. Nonetheless, some of our accumulated technical debt is merely nice-to-resolve. My advice to fellow leaders would be to keep an eye on what your team is working on by tracking the technical-debt tickets your team closes. There should be a healthy mix of relative importance. If not, you will have to address this, perhaps in a separate session for backlog refinement. I would not advise you to rank all technical-debt tickets by urgency and simply assign them, however, for reasons specified in the previous section.</p>
<p>We also have a simple system in place for categorizing technical debt where we use the two metrics "complexity" and "impact", and rank both on a scale from one to five. In our case, these estimations are initially done by the engineer who adds entries to the tech-debt backlog, but they are reviewed intermittently. I think a good starting point is picking a few items that could be considered low-hanging fruit, i.e. work that pairs relatively low complexity with moderate to high impact. You may want to encourage your engineers to also tackle more complex work with a medium to high impact. You may also find that some of the technical debt is not worth resolving at the current point in time as the impact would be low to non-existent. Those you may want to save for a less busy time, for instance the code freeze before Cyber Week.</p>
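<p>As an illustration of how such a scheme can surface low-hanging fruit, tickets could be ranked by a simple impact-to-complexity ratio. This is a hypothetical heuristic sketched for illustration only; the ticket fields and example backlog below are made up, not the team's actual process:</p>

```python
def priority(ticket: dict) -> float:
    # Hypothetical heuristic: favour high impact and low complexity,
    # both rated on the 1-5 scale described in the article.
    return ticket["impact"] / ticket["complexity"]

backlog = [
    {"id": "TD-1", "impact": 4, "complexity": 1},  # low-hanging fruit
    {"id": "TD-2", "impact": 5, "complexity": 5},  # complex but worthwhile
    {"id": "TD-3", "impact": 1, "complexity": 4},  # maybe not worth it now
]
ranked = sorted(backlog, key=priority, reverse=True)
```

<p>Items at the bottom of such a ranking are the candidates to defer to a quieter period, such as a code freeze.</p>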
<h2>Capitalizing Technical Debt</h2>
<p>One of the duties of software engineering leads is to ensure that the work their team performs is properly capitalized. This means that any software we create that increases our digital assets should also be added to our financial assets. In turn, this reduces our tax liabilities. Maintenance work, however, cannot be capitalized as it is instead considered an expense. A collection of technical debt tickets could constitute a mini-project that can be capitalized, however. One example would be a migration to new infrastructure or a significant rewrite that leads to performance improvements. Admittedly, packaging technical-debt tickets into a project may be an overly idealistic scenario. Yet, it is a possible outcome. In our team’s case, we have recently identified a number of issues with our Scala code base, due to an over-reliance on object-oriented programming constructs. If we resolved them, we would have a more maintainable system; we also predict an improvement in performance as there are many instances where objects are used instead of primitive types. Similarly, you may be able to identify a group of technical-debt tickets, provided your backlog is long enough, that could constitute a small project.</p>
<h2>Results</h2>
<p>The team has been following the technical-debt rotation as described in this article for about six months. Feedback from the team has been positive. Among other things, the engineers remarked that it adds variety to their work and that they appreciate the increased autonomy. Of course, the latter will only be the case for as long as there is a large enough backlog of technical-debt tickets to choose from. At some point, hopefully, we will have reduced our backlog significantly, and then we will have to rely on the intrinsic motivation of wanting to better understand an existing system by diving deeper into implementation details, or on the satisfaction of improving the performance or design of a service.</p>
<p>From the perspective of an engineering leader, my end goal is to pay down as much technical debt as possible. In fact, the ideal size of our technical-debt backlog would be zero. This is a distant goal, but we have taken successful steps towards it. First, I wanted to reduce the rate at which the backlog grows. We achieved this within the first two weeks. If you preside over a technical-debt backlog that has only been growing for three years, it is already satisfying to see that it is no longer growing as quickly. The next step was to keep the number of tickets on the backlog steady, which we reached soon afterwards. Now we are at the point where the total number of tickets on our technical-debt backlog is, possibly for the first time ever, declining. The team is very happy about this. One year from now, I expect us to have drastically reduced our technical-debt backlog.</p>
<hr>
<p><em>If you're interested in Engineering Management, consider joining our teams as <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Blevels%5D%5B0%5D=Management%20Level&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Leadership">Engineering Manager at Zalando</a>.</em></p>Parallel Run Pattern - A Migration Technique in Microservices Architecture2021-11-04T00:00:00+01:002021-11-04T00:00:00+01:00Ali Sabzevaritag:engineering.zalando.com,2021-11-04:/posts/2021/11/parallel-run.html<p>Learn how we leveraged the parallel run pattern to decompose a high traffic monolith to smaller microservices</p><p>The business landscape in Zalando is growing every day. This continuous growth implies that we need to be able to cope with an ever-changing environment. Everyone with experience in software development knows that dealing with changes is a challenging problem. Especially, when the software is already working in production. Changing the software in production is like changing the tires on a car while it is still moving.</p>
<p>In large organisations such as Zalando, where microservices architecture is the standard, changes are even more frequent. Technologies become obsolete, organization structures change, teams split or merge, monoliths are being rewritten, and yesterday's microservices become today's monoliths. All those examples impose dramatic changes in codebases.</p>
<p>Naturally, testing is the first solution that comes to our minds when trying to minimize the regression of a change. But, in scenarios like decomposing a monolith or replacing a legacy component with a newer one, testing might not be enough. Furthermore, there are always dark corners in our systems that we have never tested or whose behavior we no longer know. Sometimes, as you may well know from your own experience, legacy systems don't even have tests one can use as a reference.</p>
<p>In this article, we will explore a design pattern called the <em>Parallel Run</em><sup id="fnref:fn1"><a class="footnote-ref" href="#fn:fn1">1</a></sup>, a strategy to make sure such dramatic changes will not break the system. We will walk you through a real-world example, describe how we managed to replace a service by taking advantage of this pattern, and show you the challenges and surprises we dealt with. At the end, we summarize the upsides and downsides of this pattern to help you choose when to implement it and when not to.</p>
<h2>Decomposing the monolith, a case study</h2>
<p>Zalando is aiming to unify the user experience across platforms<sup id="fnref:fn2"><a class="footnote-ref" href="#fn:fn2">2</a></sup>. As part of this effort we, the Returns team, were required to extract the returns logic out of a soon-to-be legacy monolithic application. Returns logic, as the name might imply, deals with everything to do with customers returning articles they've bought on the Zalando Fashion Store. This article will explore how our team used the Parallel Run pattern to transparently and safely extract the returns logic from the monolith to the new Returns microservice.</p>
<p><img alt="Decomposing the monolith" src="https://engineering.zalando.com/posts/2021/11/images/decomposing-monolith.png"></p>
<p>This new service should behave exactly like the respective part in the monolith and the customers should not notice any difference after the migration. In order to achieve this, the following complications needed to be overcome:</p>
<ul>
<li>While reading the old code is possible, we might miss some parts of the logic or misunderstand the code.</li>
<li>Some parts of the code are not tested, so running the tests over the new code (if possible) would not guarantee the exact behavior.</li>
<li>The criticality of the application precludes downtime.</li>
</ul>
<h2>Parallel Run Pattern</h2>
<p>In order to solve these problems, wouldn't it be nice if we could verify that each request handled by the new system would be handled in exactly the same way as by the system currently running in production? The parallel run pattern does exactly that.</p>
<blockquote>
<p>When using a parallel run, rather than calling either the old or the new implementation, instead we call both, allowing us to compare the results to ensure they are equivalent. Despite calling both implementations, only one is considered the source of truth at any given time. Typically, the old implementation is considered the source of truth until the ongoing verification reveals that we can trust our new implementation.</p>
<p>-- Sam Newman, Monolith to microservices</p>
</blockquote>
<h3>Implementation</h3>
<p>There are several ways of implementing this pattern. Hereafter we present how we solved it for the above use case.</p>
<p>The following diagram shows the flow for each incoming request:</p>
<!-- http://www.plantuml.com/plantuml/uml/dPB1RXGn38RlynJM7X18h40zSa0jXPweG6sFI2XDl3iMIHniXx9lJxAZ7GR1x85ZBCV_vo-vL7DYDSN1LUDSqoFAS1q9iy7sBMokIb5uTtC3psyvSoGRNspUm1r-hwZsrGt_REWtfnczLGjdnTQRsH24zcChFum8fmiWnvwWO0ms8lWftnM2BrccB70APF34DGR8BCd5U830mph2vWwjIjPM9o-iA3_8O-V__Ed-0LxvnaLgcFrXwqVqttH2ZBXhXFVOYJhEJB0pb2DHYGVA-nFkjEhwUZfFQeajphGDuLsld4o2ow6VPrr0-NX-v70OrXQ1xVeJNNcFnJ0iL-fKomb0AM4WPzXKJdiHAZnrw8lN5yEPu7Mxoz-nLFBXfudpzeVHAl4b9BIHGyll3aPq0KMFFhet8EkQoHJZxYpFUoojlp_cJ33yhfqbdgt_xyRNd8eJKivBtLCLGKvlsl_Bt_zUyNngQoTZeRnlIRTeGbxX6NpalGwNc4DDyHS0 -->
<p><img alt="Parallel run sequence diagram" src="https://engineering.zalando.com/posts/2021/11/images/parallel-run-sequence-diagram.png"></p>
<ul>
<li>(1-2) The Client makes a request that gets immediately processed and responded to by the monolith, to avoid any degradation in performance.</li>
<li>(3-4) After responding, the monolith POSTs a request to the <code>/consistency-checks</code> endpoint of the new Returns microservice, which immediately answers with 202 (Accepted), indicating the request will be handled asynchronously. This way the monolith does not have to wait, and its resources are freed.</li>
<li>(5-6-7) The Returns microservice starts processing the request in the background, by first re-issuing the same request to itself but calling the actual endpoint.</li>
<li>(8) Then the response from the Returns microservice is collected and compared with the one from the monolith.</li>
<li>(9) Finally, metrics and logs about the consistency are produced, to later verify that the expected consistency is reached and to investigate cases of inconsistency.</li>
</ul>
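<p>The asynchronous hand-off in steps 3-7 can be sketched as a queue plus a background worker: the endpoint only enqueues the payload and acknowledges with 202, while the worker replays and compares later. This is a framework-free illustration, not the actual service code; <code>handle_consistency_check</code>, <code>run_worker</code> and the <code>None</code> sentinel are all assumptions:</p>

```python
import queue
import threading

check_queue: queue.Queue = queue.Queue()

def handle_consistency_check(payload: dict) -> int:
    """Steps 3-4: accept the monolith's POST and defer the real work."""
    check_queue.put(payload)
    return 202  # Accepted: the monolith is not kept waiting

def run_worker(replay_and_compare) -> threading.Thread:
    """Steps 5-8: drain the queue in a background thread."""
    def loop():
        while True:
            payload = check_queue.get()
            if payload is None:  # sentinel used to stop the worker
                break
            replay_and_compare(payload)
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

<p>The key property is that the 202 response depends only on the enqueue, so the consistency check can never slow down or fail the customer-facing request.</p>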
<p>The async request sent to the ConsistencyChecker part of the Returns microservice contains information about the original request: the url with its query params, the method, the headers and, when present, the body. This information represents the new request to be sent to the Returns microservice. It also includes the HttpStatus, the headers, and the body of the response returned by the monolith, to be checked against the response from the Returns microservice.</p>
<p>The following is an example of the structure that we used:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"request"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"url"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"api/example?param=something"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"headers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"Content-Type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/json;charset=UTF-8"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"Accept-Language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"de-DE"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"GET"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"body"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"response"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"status"</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"headers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"Content-Type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/json;charset=UTF-8"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"transfer-encoding"</span><span class="p">:</span><span class="w"> </span><span class="s2">"chunked"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"body"</span><span class="p">:</span><span class="w"> </span><span class="s2">"json-response-body"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Each endpoint of the monolith has its own expected consistency to be reached in order to declare the migration successful. Once that threshold has been achieved, the migration can be considered safe, and we can perform the switch from the monolith to the new Returns microservice for that endpoint.</p>
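<p>Such a per-endpoint switch decision could be sketched as below. The operation ids and target percentages are made up for illustration; checks that failed due to transient errors are deliberately excluded from the ratio:</p>

```python
# Illustrative per-endpoint consistency targets, not our real thresholds.
TARGETS = {"create_return": 0.999, "get_return_label": 0.99}

def ready_to_switch(operation_id: str, matched: int, unmatched: int) -> bool:
    """True once the observed consistency reaches the endpoint's target."""
    total = matched + unmatched  # transient failures are excluded on purpose
    if total == 0:
        return False
    return matched / total >= TARGETS[operation_id]
```
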
<h3>Monitoring and Reporting</h3>
<p>In order to consider an endpoint ready, it had to reach a satisfying consistency percentage. For each request we produced the result metrics using Prometheus, and we displayed them with Grafana. Each endpoint, identified by an <code>operation_id</code>, had its own metric and its own tolerance. This was done because, as usual, fixing those last few percentages costs more than the value it brings; given that the endpoints are completely separate from one another, each endpoint had its own target percentage to be considered consistent (enough).</p>
<p><img alt="Monitoring_example" src="https://engineering.zalando.com/posts/2021/11/images/monitoring-and-reporting-grafana.png"></p>
<p><strong>Matched</strong>: counter for all the requests that matched between the monolith and the Returns microservice.</p>
<p><strong>Unmatched</strong>: counter for all the requests that did not match between the two services. Possible examples could be:</p>
<ul>
<li><em>Different HttpStatuses</em>: such as 2xx and 4xx or even 201 and 200</li>
<li><em>Different Headers set</em>: a missing header in one of the two responses or different values for the same header</li>
<li><em>Different Body responses</em>: missing fields/attributes in the responses or different values for the same field/attribute</li>
</ul>
<p><strong>Failed</strong>: counter for all the requests where the response was affected by temporary issues, for example any 5xx. In these cases, even if the responses matched it would not be valuable information, given that the request couldn't be properly fulfilled due to a transient server-side issue. On the other hand, if the responses did not match in these 5xx cases, the <em>unmatched</em> counter should be increased, because it means the overall behavior of the Returns microservice doesn't match that of the monolith and requires deeper investigation.</p>
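<p>Putting the three counters together, the classification of a single check could be sketched like this. It is a simplification: the field names follow the JSON example above, while the set of ignored headers is an illustrative assumption:</p>

```python
IGNORED_HEADERS = {"date", "transfer-encoding"}  # illustrative choice

def classify(mono: dict, micro: dict) -> str:
    """Bucket one consistency check as 'matched', 'unmatched' or 'failed'."""
    transient = mono["status"] >= 500 or micro["status"] >= 500
    same_status = mono["status"] == micro["status"]
    if transient:
        # A matching 5xx carries no information (failed); a diverging one
        # still signals a behavioural difference (unmatched).
        return "failed" if same_status else "unmatched"
    def headers(resp):
        return {k.lower(): v for k, v in resp["headers"].items()
                if k.lower() not in IGNORED_HEADERS}
    if same_status and headers(mono) == headers(micro) \
            and mono["body"] == micro["body"]:
        return "matched"
    return "unmatched"
```

<p>In the real service the result of this classification would increment the corresponding Prometheus counter, labelled with the <code>operation_id</code>.</p>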
<h3>Rollout</h3>
<p>The switch was done gradually, per endpoint, to allow the system to be tested in a fully functional way. This was achieved by using a proxy and moving the forwarding of the requests to the Returns microservice one endpoint at a time, once they were ready. In our case we used <a href="https://opensource.zalando.com/skipper/">Skipper</a>, an open-source proxy developed by Zalando.</p>
<p><img alt="Endpoints rollout" src="https://engineering.zalando.com/posts/2021/11/images/endpoints-rollout.png"></p>
<p>In this way, by limiting each switch to a single endpoint, we avoided introducing a massive set of changes in one go, and we were able to collect additional feedback from every single switch while still working on finalizing the other endpoints.</p>
<h3>Clean-up</h3>
<p>Once the migration was successfully finalized, all the code related to the parallel run logic needed to be cleaned up. The three main parts to remove were the handler performing the consistency check (use cases layer), the gateway calling localhost (gateway layer) and the domain model related to the consistency logic (entities layer). Additional clean-ups were done for configuration files, such as the feature toggle to enable/disable the consistency checker and the config for the localhost gateway, the dependency injection in the Main file, the consistency-checker api in the routes and, of course, all the tests validating the consistency check logic. Code-wise, we removed ~700 lines of code and ~1.3k lines of unit and component tests.</p>
<h3>Advantages of this approach</h3>
<ul>
<li>
<p><strong>Live data for testing:</strong> We can leverage the real production data as test cases. Therefore, given enough time, the system will be tested potentially under all the "real-life" use cases.</p>
</li>
<li>
<p><strong>Gradual rollout:</strong> The rollout is done per endpoint minimizing the amount of changes per switch.</p>
</li>
<li>
<p><strong>Incremental development:</strong> The gradual rollout also enables the possibility to approach the implementation per endpoint.</p>
</li>
<li>
<p><strong>Easy rollback:</strong> By using a proxy to do the traffic switch, rolling back just requires a change to the proxy to point the endpoint back to the previous host instead of the microservice; this avoids the need to redeploy, making the whole process faster.</p>
</li>
<li>
<p><strong>Finding bugs:</strong> Since the new microservice will be tested with real data, there might be cases where even the monolith was behaving incorrectly. This approach can make those edge cases visible.</p>
</li>
<li>
<p><strong>Load testing:</strong> In case of using a different technology for the newer service, parallel run pattern helps to understand the performance characteristics of the new service. As a result, the development team can target more realistic performance goals or SLOs before going live.</p>
</li>
</ul>
<h3>Considerations and Limitations</h3>
<p>While this approach makes the migration safer and smoother, it also has some concerns and issues that need to be taken into account.</p>
<ul>
<li>
<p><strong>Increased load:</strong> Given that requests received by the monolith are forwarded to the microservice, the load across all components increases, potentially doubling.</p>
</li>
<li>
<p><strong>Refine the comparisons:</strong> In the comparison check not everything needs to match 100%. For example, in our case we ignored some headers that were not relevant for the outcome of the request.</p>
</li>
<li>
<p><strong>GDPR:</strong> While collecting the data for the comparison, we need to take into account that sensitive information should either not be stored or be cleaned up afterwards. In the former case, analyzing inconsistencies in fields containing personal data might not be easy.</p>
</li>
<li>
<p><strong>Non-trivial comparisons:</strong> Comparing the results is not always a straightforward task. For example comparing PDFs might be complicated due to different but negligible metadata, or a change in the http frameworks might result in different default response headers, or collections could have different orderings.</p>
</li>
<li>
<p><strong>Non-idempotent endpoints:</strong> Idempotency should always be taken into account. For example, this approach can be used for POSTs that are idempotent, but not when the idempotency of the endpoint cannot be guaranteed. When doing this investigation, always consider the idempotency of each operation and its possible side effects (for example calling another POST api, updating a database, or publishing an event).</p>
</li>
<li>
<p><strong>Not a quick-win:</strong> Even if this approach leads to a smooth and safe migration, it requires quite some time and effort to be properly set up and tuned.</p>
</li>
</ul>
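<p>For the "non-trivial comparisons" concern above, a common remedy is to canonicalise both payloads before comparing them. A minimal sketch, assuming JSON bodies and that element order is not significant for the endpoint under comparison:</p>

```python
import json

def canonical(body: str):
    """Parse a JSON body and normalise it for order-insensitive comparison."""
    return _normalise(json.loads(body))

def _normalise(value):
    if isinstance(value, dict):
        return {k: _normalise(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        # Assumption: list order is negligible for the compared endpoint.
        return sorted((_normalise(v) for v in value),
                      key=lambda v: json.dumps(v, sort_keys=True))
    return value
```

<p>Two responses would then count as matched when their canonical forms are equal, so differences in key order or collection ordering no longer show up as inconsistencies.</p>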
<h2>Verdict</h2>
<blockquote>
<p>Implementing a parallel run is rarely a trivial affair, and is typically reserved for those cases where the functionality being changed is considered to be high risk. (...) the work to implement this needs to be traded off against the benefits you gain.</p>
<p>-- Sam Newman, Monolith to microservices</p>
</blockquote>
<p>The parallel run pattern is a powerful technique to overcome the complexities and stress of migration projects, but not every migration project is a good match for this pattern. Increased traffic, complexity in comparing the results, and the amount of effort required are the risks that should be considered before implementing this pattern.</p>
<p>In the end, this pattern is just a tool that should be used wisely considering constraints, use cases, and team capacity when planning for it. When it is done properly, it saves you a lot of headaches.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:fn1">
<p>Newman S. (2020). <em>Monolith to Microservices</em>. 2nd ed. O’Reilly Media, Inc. <a class="footnote-backref" href="#fnref:fn1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:fn2">
<p>You can learn more about this effort in a series of <a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">articles about GraphQL</a> in this blog. <a class="footnote-backref" href="#fnref:fn2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Tracing SRE’s journey in Zalando - Part III2021-10-15T00:00:00+02:002021-10-15T00:00:00+02:00Pedro Alvestag:engineering.zalando.com,2021-10-15:/posts/2021/10/sre-journey-part3.html<p>Follow Zalando's journey to adopt SRE in its tech organization.</p><p><em>This is the third and last part of our journey to roll out SRE in Zalando. You’ll find the previous chapters <a href="https://engineering.zalando.com/posts/2021/09/sre-journey-part1.html">here</a> and <a href="https://engineering.zalando.com/posts/2021/09/sre-journey-part2.html">here</a>. Thanks for following our story.</em></p>
<h2>2020 - From team to department</h2>
<p><em>The road so far:</em> 2016 saw an attempt at the rollout of a Site Reliability Engineering (SRE) organization that did not quite materialize but still left the seed of SRE in the company; in 2018 and 2019 we had a single SRE team working on strategic projects that improved the reliability of Zalando’s platform. The success of that last team brought with it many requests for collaboration, which had to be balanced with SRE’s own roadmap. In this chapter we’ll learn how SRE adapted in order to achieve sustainable growth.</p>
<p>In late 2019 there was a reorg in our Central Functions unit. This reorg was centered around a set of principles, chief among them “Customer Focus”, “Purpose” and “Vision”. Through that reorg <strong>SRE became a department</strong> that encompasses the original SRE Enablement team, the teams building monitoring services and infrastructure, and incident management. This was a clear investment from the company into the value SRE had repeatedly demonstrated.</p>
<p>The close collaboration those teams had had in the previous years already hinted at a common purpose between them. Through the Incident Commander role and the support for Postmortems, SRE was always in close contact with Incident Management. Distributed Tracing, where SRE invested much of its efforts, was actually owned by one of the monitoring teams. Now that everyone was under the same ‘roof’ we could further strengthen the synergies that were already in place.</p>
<p><img alt="Zalando’s SRE Logo" src="https://engineering.zalando.com/posts/2021/10/images/sre-logo.png#center"></p>
<figcaption style="text-align:center">Zalando’s SRE <s>team</s> department logo</figcaption>
<p><br/></p>
<p>In 2019 SRE had already started to dedicate time to its own products, but the creation of a department further endorsed SRE’s long term plans. With an entire department under the SRE label, however, we had to be smart about our next steps, particularly in the long term. We also had to adjust to what it meant to operate as a department. Before, with a single team, we could be (and occasionally had to be) more flexible, picking ad hoc projects. Now we had teams with a better defined purpose, and we wanted all teams working together towards a common goal. It was time to come up with a plan for implementing our new purpose: <strong>to reduce the impact of incidents while supporting all builders at Zalando to deliver innovation to their users reliably and confidently</strong>. That plan materialized into the <strong>SRE Strategy</strong>, published in 2020, which set the path for the years to come.</p>
<p>Following the same set of principles that influenced the creation of the SRE department (“Customer Focus”, “Purpose” and “Vision”), the SRE Strategy had at its core <strong>Observability</strong>.</p>
<p>How did Observability fit with those principles and bind the three teams? For the teams developing our monitoring products it’s quite obvious. But Observability is also key for SRE: we drive our work through SLOs, and it is at the base of the <a href="https://sre.google/sre-book/part-III-practices/">Service Reliability Hierarchy</a>. Finally, Incident Management is made that much more efficient with the right Observability into our systems, by identifying issues in our platform and making it easier to understand what is affecting the customer experience.</p>
<p>Our strategy set a target of <strong>standardizing Observability across Zalando</strong>. Through that standardization we could achieve a common understanding of Observability within the company, reduce the overhead of operating multiple services and make it easier to build on top of well defined signals (like we did before with OpenTracing). The concrete step towards making this possible was to develop SDKs for the major <a href="https://opensource.zalando.com/tech-radar/">programming languages in use at Zalando</a>.</p>
<p><strong>Standardization</strong> was something we grew quite fond of in the previous years. While operating as a single team, doing several projects with different teams, we were uniquely positioned to identify common pain points and inefficiencies across the company. But eventually we also realised one thing: <strong>as a single team it would be challenging to scale our enablement efforts to cover hundreds of teams in the company</strong>. Waiting for the practices we tried to establish to spread organically would also take too long. The only way to properly scale our efforts and reach our goals was to develop the tools and practices that every other team would use in their day to day work. We couldn’t do everything at once, but our new strategy gave us the starting point: Observability.</p>
<p><img alt="Service Reliability Hierarchy" src="https://engineering.zalando.com/posts/2021/10/images/service-reliability-hierarchy.jpeg#center"></p>
<figcaption style="text-align:center">Observability is also at the base of <a href="https://sre.google/sre-book/part-III-practices/">Service Reliability Hierarchy</a></figcaption>
<p><br/></p>
<p>We started collecting metrics on our performance regarding Incident Response. How many incidents were we getting? What was the Mean Time To Repair? How many were false positives? What was the impact of those incidents? Now that incident management was part of SRE, it was important to understand how the incident process was working and how it could be improved. We were already rolling out Symptom Based Alerting, so that alone would help reduce the False Positive Rate. But we took it a step further and devised a new incident process that <strong>separated Anomalies and Incidents</strong>.</p>
<p>It’s easy to map these improvements to benefits for the business and our customers, but there’s also something to be said about the <strong>health of our on-call engineers</strong>. Having an efficient incident process (and the right Observability into a team’s systems) goes a long way towards making the lives of on-call engineers better. Pager fatigue is something that should not be dismissed, and it can hurt a team through lower productivity and employee attrition.</p>
<p>Something important to highlight in this whole process is that <strong>we started by collecting the numbers</strong> to see if they would match what our observations had already been pointing to. This is a common practice that guides our initiatives. It is also why one of the first things we did after creating the department was to define the KPIs that would guide our work, make sure they were being measured, and facilitate the reporting of those KPIs.</p>
<p>SRE continued the rollout of <a href="https://engineering.zalando.com/posts/2022/04/operation-based-slos.html">Operation Based SLOs</a> by working closely with the senior management of several departments and agreeing on their respective SLOs. Those SLOs would be guarded by our <a href="https://www.usenix.org/conference/srecon19emea/presentation/mineiro"><strong>Adaptive Paging</strong></a> alert handler. With this we also continued the adoption of <a href="https://github.com/zalando/public-presentations/blob/master/files/2019-05-16_alerting_monitoring_and_all_that_jazz.pdf"><strong>Symptom Based Alerting</strong></a>.
With Adaptive Paging we had an interesting development. Our initial approach was to make the SLO the threshold upon which we would page the on-call responder. What we soon discovered is that this made our alerts too sensitive to occasional short-lived spikes, just like any other non-Adaptive Paging alert. We mitigated this by providing additional criteria that engineers could use to control the alert at a more granular level (time of day, throughput, length of the error rate). What was initially supposed to be a hands-off task for engineers (defining alerts and thresholds) quickly led us down a path we were already familiar with: engineers were back to defining alerting rules because the target set by the SLO was not enough. After some experiments, we improved Adaptive Paging by having it use <a href="https://sre.google/workbook/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts">Multi Window Multi Burn Rate</a> alert threshold calculation. This change resulted in two relevant outcomes. First, it brought <strong>Error Budgets</strong> to the forefront: deciding whether to page someone was no longer a question of whether the SLO was breached, but rather whether the Error Budget was at risk of being depleted. The second outcome, and arguably the more important one, is that we made it possible for the operations guarded by our alert handler to have their respective rules (length of the sliding windows and the alarm threshold) <strong>derived automatically from the SLO without any effort from the engineering teams</strong>, work that was usually done through trial and error.</p>
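<p>To make the multiwindow, multi-burn-rate idea concrete, here is a minimal sketch. This is not our actual Adaptive Paging implementation; the <code>should_page</code> helper, the window names, and the 14.4x threshold (taken from the SRE Workbook's example for a 0.1% error budget over 1h/5m windows) are illustrative assumptions.</p>

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """Error budget burn rate: 1.0 means the budget is consumed
    exactly over the SLO period; higher means faster depletion."""
    return error_rate / (1.0 - slo)

def should_page(rates: dict, slo: float,
                long_window: str, short_window: str,
                threshold: float) -> bool:
    """Page only if BOTH windows burn fast: the long window filters
    short-lived spikes, the short window confirms the problem is
    still ongoing (so we stop paging once it recovers)."""
    return (burn_rate(rates[long_window], slo) >= threshold
            and burn_rate(rates[short_window], slo) >= threshold)

# Hypothetical example: 99.9% SLO, observed error rates per window.
slo = 0.999
rates = {"1h": 0.020, "5m": 0.018}   # burn rates 20x and 18x
print(should_page(rates, slo, "1h", "5m", threshold=14.4))  # → True
```

<p>The point of the two-window check is that a single spike in the short window alone never pages, and a long-resolved incident stops paging as soon as the short window recovers.</p>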
<p>The challenge with rolling out Operation Based SLOs was that reporting and getting an overview of those SLOs was not easy, with the data fragmented in different tools. To address this issue, a <strong>new Service Level Management tool</strong> was developed. As we evolved the concept of SLOs, so too did we evolve the tooling that supported it. Other than reporting SLOs for the different operations, we also gave a view on the Error Budget. Knowing how much Error Budget is left makes it easier to use it to steer prioritization of development work.</p>
<p><img alt="SLO Tool" src="https://engineering.zalando.com/posts/2021/10/images/slo-tool.png#center"></p>
<figcaption style="text-align:center">Our operation based Service Level Management Tool (not actual data)</figcaption>
<p><br/></p>
<p>Late in 2020 we began developing what we called the <strong>SRE Curriculum</strong>, an initiative aimed at scaling the <strong>educational benefits of SRE</strong>. Specifically, this meant sharing the wealth of knowledge that SREs have accumulated over time about the sharp edges of production. We were looking not only to raise the bar on the company’s operational capabilities, but also to facilitate interactions with other teams by providing a common understanding of the topics covered by the curriculum.</p>
<p>In the previous years we did several training sessions for incident response, distributed tracing, and alerting strategies. These were ad hoc engagements when teams requested our support. With the advent of the pandemic, many things changed and we had to adapt; those in-person training sessions were one of those things. We did try to run some via video conference, but it did not have quite the same result. At the same time, the company’s Tech Academy was facing the same challenges, so we joined forces to develop a new series of training sessions in a new format. The deliverables of this new format were a video and a quiz for each topic, with the content of each training created and reviewed by subject matter experts to ensure a common understanding and high quality. This way we captured knowledge in a form that could be consumed by anyone in the company, at any time and at their own pace. Also, by making these training sessions part of the onboarding process, any engineer joining Zalando would get an introduction to some of the SRE practices we were rolling out.</p>
<p><img alt="Curriculum recording" src="https://engineering.zalando.com/posts/2021/10/images/studio-picture.jpeg#center"></p>
<figcaption style="text-align:center">The studio where we recorded some of the training sessions</figcaption>
<p><br/></p>
<p>The support of the SRE Enablement team is still in high-demand for ad hoc projects. After another collaboration between SRE and the Checkout teams, the senior management of that department officially pitched for the creation of an <strong>Embedded SRE team</strong>. This is something we had in the back of our minds for further down the road. But to have it being requested by another department was an interesting development. In any case, here we were. This development presented quite a few new challenges (and opportunities):</p>
<ul>
<li>What will the team work on? What will its responsibilities be?</li>
<li>Who will the team report to?</li>
<li>Is this time bound? Or is it a permanent setup?</li>
<li>If they report to separate departments, how will they review the collaboration? Or how do we do performance evaluation effectively for SREs working in a different department?</li>
<li>How will the embedded SRE team collaborate with the product development team?</li>
<li>How will the embedded team keep in sync with the central team?</li>
</ul>
<p>The Embedded team will report to the SRE department, and <strong>both SRE and product area management have aligned on a set of KPIs</strong> like Availability and On-Call Health. The former will be dictated by the SLOs defined for that product area, while the latter aims at making sure the operational load is not taking a toll on the product development team. On-Call Health will be measured taking into account paging alerts and how often an individual is on call.</p>
<p>We’re still figuring out most things as we go along, but this is an exciting development. This team will be different from the Enablement team, in the sense that it will have a much more concrete scope. It will be able to be more hands-on with the code and tooling used within the product development team. It will be a voice for reliability within that product area, able to influence the prioritization of topics which ensure a reliable customer experience in our Fashion Store. The <strong>SRE department will also benefit from having a source of precious feedback</strong> on whatever the department is trying to roll out to the wider engineering community.</p>
<p>You may remember from our <a href="https://engineering.zalando.com/posts/2021/09/sre-journey-part2.html">last article</a> that hiring was always a challenge (a topic you can also read about in the experiences of other companies that rolled out SRE). Now we’re planning to bootstrap another team, which cannot be making things any easier. But the truth is that having a department with teams of a different nature also had an <strong>unexpected benefit for our hiring</strong>. Before, our capacity constraints prevented us from hiring candidates who weren't yet a good fit for the original position with the plan to develop them further and establish the SRE mindset. Now we can have a candidate with potential join one of the teams in the department, and from there grow into the SRE role. Whether they later join the SRE Enablement team or not is not that important (although team rotation is quite active in Zalando); any team can benefit from having someone with the SRE mindset. Also, <strong>we strive for close collaboration within the department</strong>, so it’s not like engineers are isolated in their respective teams.</p>
<p>And this is it, mostly. You are all caught up with how SRE has been adopted in Zalando, and what we’ve been up to. And what a ride it has been! Attempting to create a full SRE organization, later starting with a single central team, reaching the limits of that team, creating a department, further growing that department with an embedded SRE team… Were we 100% successful? No (also, SREs don’t believe in 100%). But where we failed we did the Postmortem, and the learnings we got from it turned into action items in our strategy. This has been working really well for us, but there’s still so much to do. There are many interesting directions SRE can develop in, so we’re really excited to see what challenges come next. Until we reach our next stage of evolution, we’ll keep doing what we do best: dealing with ambiguity and uncertainty. And help Zalando ensure customers can buy fashion reliably!</p>
<hr>
<p><em>Curious about what might come next? Then <a href="https://zsre.page.link/enablement-job-ad">join us at SRE</a> and help write the next chapters of this story.</em></p>
<hr>
<h1>Tuning Image Classifiers using Human-In-The-Loop</h1>
<p><em>Paul O'Grady, 2021-10-13</em></p>
<p><strong>TL;DR</strong>: We present an Expectation–Maximization (EM) algorithm for iteratively estimating the optimal class-confidence threshold for an image classifier using human annotators. The algorithm is developed for classifiers that are applied to out-of-distribution images, and efficiently constructs a validation data set to estimate an optimal threshold for this use case.</p>
<p>In this blog post we describe an algorithm we developed when building our product image analysis infrastructure, where we use human-in-the-loop to tune the thresholds of our image classifiers. We discuss the algorithm in the following, and present some mathematical details and a simple code example in the appendices.</p>
<h2>Background</h2>
<p>When a customer browses for a product on the Zalando website they may use descriptive terms to search for what they want, for example a customer may use a specific term such as <a href="https://en.zalando.de/women/?q=leopard+print+dress"><em>leopard print dress</em></a> instead of providing a more generic term such as casual dress. One approach we use to support product search using descriptive terms is to automatically generate additional product information from product images using computer vision techniques. In particular, we train image classifiers to identify products that have a particular fashion attribute such as a specific pattern or style, e.g. leopard print, which correspond to descriptive search terms.</p>
<h2>Problem</h2>
<p>A typical image classifier generates a <em>class-confidence score</em> (a value between 0 & 1) at its output to indicate that a given input image belongs to one of the specified output classes, i.e., the image shows a particular fashion attribute. To generate a binary decision from the classifier output a <em>class-confidence threshold</em> parameter is selected based on a classifier performance metric such as <a href="https://en.wikipedia.org/wiki/Precision_and_recall"><em>precision</em> & <em>recall</em></a>. Once the threshold has been selected the model can be deployed and used to generate class labels for an input image, which can be used in product search.</p>
<p>Over time the characteristics of the input product images may change, leading to a drift in the input <em>data distribution</em>. For image classifiers that are used to generate predictions for <em>out-of-distribution</em> input images the performance of the classifier may degrade. For example this may occur when an image classifier is trained on Zalando product images before the introduction of a new photography style on a revamped Zalando website, for which there are no annotated image examples in the new style available to retrain the model.</p>
<p>To solve this problem we modify the class-confidence threshold of the classifier to compensate for data distribution drift, and developed an <a href="https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm">Expectation-Maximization (EM) algorithm</a> that we call <em>AutoThreshold</em> for this purpose. AutoThreshold estimates an <em>optimal class-confidence threshold</em> for an image classifier using manual annotations from a selection of the classifier's predictions on the out-of-distribution data. Additionally, the process of creating annotations for the out-of-distribution data helps in the generation of a new data set that can be used to train a new version of the image classifier.</p>
<h3>Selecting Classifier Thresholds</h3>
<p>The optimal threshold value for an image classifier is the class-confidence score, a value between 0 & 1, for which the set of predictions above that score leads to optimal classifier performance. Ideally this value would be 0.5, i.e., the center of the range. However, in practice this is never the case, due to factors such as class imbalance and imperfectly calibrated output scores, so the threshold is usually estimated after training to achieve the best results.</p>
<p>The estimated optimal threshold for each output class of an image classifier is evaluated using an annotated image data set, i.e. validation set, where each image in the set is manually assigned a class label. The image classifier is tested by using the validation data set as input and comparing the classifier's predictions to the manually assigned labels. We can measure classifier performance using metrics such as precision & recall, which indicate the quality and quantity of the results. Optimizing the threshold is usually a tradeoff between precision & recall, where we want to find a threshold value that results in an <em>acceptable</em> score for both. Typically, a performance metric that combines both precision and recall, such as the <a href="https://en.wikipedia.org/wiki/F-score"><span class="math">\(f_\beta\)</span>-measure</a>, is used, and the class-confidence score that maximizes the metric is chosen as the threshold value.</p>
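<p>As a concrete illustration of this threshold sweep, here is a minimal sketch in Python. The helper names <code>f_beta</code> and <code>best_threshold</code> are our own illustrations, not part of any production pipeline: for each candidate threshold we compute precision & recall against the validation labels and keep the threshold that maximizes the <span class="math">\(f_\beta\)</span>-measure.</p>

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta combines precision & recall; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def best_threshold(scores, labels, beta=1.0):
    """Sweep every observed class-confidence score as a candidate
    threshold; return the (threshold, f_beta) pair maximizing the metric."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_beta(precision, recall, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

<p>On a toy, perfectly separable validation set such as scores <code>[0.1, 0.2, 0.8, 0.9]</code> with labels <code>[0, 0, 1, 1]</code>, the sweep picks the threshold 0.8, where both precision and recall are 1.</p>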
<h3>Estimating Thresholds in the Absence of Data</h3>
<p>For our use case there exists no training or validation data set for the out-of-distribution input image set. Furthermore, we do not annotate all images in advance, as this would be a costly, and time consuming, exercise for the scale of the data at Zalando (currently around <a href="https://en.zalando.de/catalogue/">600k products</a>). To overcome these issues we make use of the simple fact that when classifier predictions are <em>ordered</em> by class-confidence score—for a well trained image classifier—high-confidence class predictions exhibit greater correspondence with the image annotations than low-confidence predictions, which indicates model performance, and allows us to search for an optimal threshold between both extremes (demonstrated in the <a href="#annotation_plot">plot below</a>). With this in mind, we frame threshold selection as an optimization problem using manual annotators, who generate annotations to be used in the metric calculations required to estimate a threshold.</p>
<p>Specifically, we take an iterative approach, where images to be annotated are conditioned on the image classifier, and annotators annotate a subset of the classifier's most confident predictions first. The generated annotations are used to estimate a threshold using our selected performance metric, and the process is repeated until our estimated threshold converges. This process can be implemented as an Expectation-Maximization algorithm, and describes a <em>human-in-the-loop</em> procedure, which generates a validation data set for the out-of-distribution data over a number of iterations. Furthermore, the data set is generated in an efficient way, both in terms of the number of annotations required, and the selection of image examples which contribute most to the discovery of an optimal threshold.</p>
<h2>Problem Definition</h2>
<p>Taking a binary image classifier as our motivating example, which typically has a sigmoid output layer, the value generated at the output for each of the <span class="math">\(n\)</span> input images can be interpreted as a class-confidence score, or probability <span class="math">\(p_{i}\)</span>, that an input image, <span class="math">\(\mathbf{x}_i\)</span>, belongs to the output class, <span class="math">\(c\)</span>. For the purposes of image attribute identification, the predictions at the output, <span class="math">\(\mathbf{p} =[p_{1},\dots,p_{n}]\)</span>, undergo a thresholding operation to replace the class-confidence scores with a binary class label, which indicates a transform from a continuous to categorical probability distribution. Since the output layer is a sigmoid function, where output values are thresholded by the parameter <span class="math">\(t\)</span> into two binary categories, <em>true</em> & <em>false</em>, we can model the classifier's output distribution using a <a href="https://en.wikipedia.org/wiki/Bernoulli_distribution"><em>Bernoulli distribution</em></a>, i.e., <span class="math">\(P(\mathbf{x}_i=c | p_{i})\)</span>. Furthermore, the distribution of annotations also follows a Bernoulli distribution. Using these details, we frame the problem of threshold estimation within the framework of the Expectation-Maximization algorithm, where we present algorithm details below, and present a more detailed mathematical explanation in Appendix A.</p>
<h3>Threshold Estimation Using the EM Algorithm</h3>
<p>The Expectation-Maximization algorithm is an iterative method to find <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation"><em>maximum likelihood</em></a> estimates of parameters (such as our classifier threshold) in the presence of <em>unobserved latent variables.</em> In our problem setting, the predictions made by the classifier are observed by our annotators to generate image annotations. However, the order of the images presented to the annotators is conditioned on the classifier's class-confidence score, which is unknown to our annotators. As mentioned, the estimated optimal threshold corresponds to a class-confidence score, and thus our latent variable allows us to estimate an optimal threshold for our classifier. Each iteration of the EM algorithm alternates between performing an Expectation step (E-step), which constructs a likelihood function to estimate the latent variable, and a Maximization step (M-step), which computes parameters that maximize the function constructed in the E-step. For our algorithm, the E-step generates annotations for the classifier's most confident predictions and the M-step estimates the optimal class-confidence threshold using the new set of annotations. Both steps are repeated at each iteration until the estimated threshold converges.</p>
<h2>Algorithm Details - Binary Classifier</h2>
<p>For a set of images, <span class="math">\(\mathbf{X}=[\mathbf{x}_1,\dots,\mathbf{x}_n]\)</span>, and their class-confidence scores, <span class="math">\(\mathbf{p}\)</span>, we construct a set of images ordered by their scores, <span class="math">\(\mathbf{X}_{\tt asc} = {\tt sort}(\mathbf{X},\mathbf{p})\)</span>, to estimate the optimal threshold, <span class="math">\(\hat{t}\)</span>, for the output class. We use <span class="math">\(\mathbf{X}_{\tt asc}\)</span> as input to the AutoThreshold algorithm, and specify a number of hyperparameters including the subset window size <span class="math">\(m\)</span>, and classifier performance metric <span class="math">\({\tt metric}(.)\)</span> (e.g., <span class="math">\(f_{\beta}\)</span>-measure). We define a data windowing function that selects images to be annotated by centering a window of size <span class="math">\(m\)</span> on <span class="math">\(\mathbf{X}_{\tt asc}\)</span> at a position that corresponds to the current threshold estimate (class-confidence score), i.e., <span class="math">\(\mathbf{X}_{\tt subset} = {\tt window}(\mathbf{X}_{\tt asc}, \hat{t}, m)\)</span>. We denote associated predictions for the windowed subset as <span class="math">\(\mathbf{p}_{\tt subset}\)</span>, and denote the annotations generated for this set as <span class="math">\(\mathbf{a}_{\tt subset}\)</span>.
Furthermore, we define a thresholding function <span class="math">\({\tt threshold}(\mathbf{p}_{\tt subset}, t)\)</span>, which generates true and false class labels from model predictions to be used as input to the performance metric.</p>
<p>The EM algorithm is outlined below:</p>
<ol>
<li>Specify hyperparameters <span class="math">\(m\)</span> & <span class="math">\({\tt metric}\)</span></li>
<li>Initialise the current threshold estimate <span class="math">\(\hat{t}\)</span> to the maximum class-confidence score, i.e. 1</li>
<li><em>E-step:</em> Generate a new subset of manual annotations, <span class="math">\(\mathbf{a}_{\tt subset}\)</span>, for the selected images, <span class="math">\(\mathbf{X}_{\tt subset} = {\tt window}(\mathbf{X}_{\tt asc}, \hat{t}, m)\)</span></li>
<li><em>M-step:</em> Estimate a new threshold estimate which corresponds to the maximum metric value for the new set of annotations, <span class="math">\(\hat{t} = {\underset {t} {\operatorname {argmax} }} \ \, {\tt metric}(\mathbf{a}_{\tt subset}, {\tt threshold}(\mathbf{p}_{\tt subset}, t))\)</span></li>
<li>Return to step 3 until convergence</li>
</ol>
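<p>The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the production AutoThreshold implementation: <code>annotate</code> stands in for the crowdsourcing step (it receives a batch of images and returns 0/1 labels), and <code>metric</code> for a performance metric such as an <span class="math">\(f_\beta\)</span> implementation; the window is approximated by index positions around the current estimate, and annotations accumulate across iterations.</p>

```python
def autothreshold(images_asc, scores_asc, annotate, metric,
                  m=100, max_iter=20, tol=1e-3):
    """EM-style loop: E-step annotates a window of size m centred on
    the current threshold estimate; M-step re-estimates the threshold
    maximizing `metric` over all annotations gathered so far."""
    t_hat = 1.0                       # step 2: start at the maximum score
    annotations = {}                  # index -> label, grows each E-step
    for _ in range(max_iter):
        # E-step: find the position closest to t_hat and annotate the
        # not-yet-annotated images inside the window around it.
        centre = min(range(len(scores_asc)),
                     key=lambda i: abs(scores_asc[i] - t_hat))
        lo = max(0, centre - m // 2)
        hi = min(len(scores_asc), centre + m // 2)
        new = [i for i in range(lo, hi) if i not in annotations]
        for i, label in zip(new, annotate([images_asc[j] for j in new])):
            annotations[i] = label
        # M-step: among the annotated scores, pick the threshold that
        # maximizes the metric on the annotations versus thresholded preds.
        idx = sorted(annotations)
        labels = [annotations[i] for i in idx]
        t_new = max((scores_asc[i] for i in idx),
                    key=lambda t: metric(labels,
                                         [1 if scores_asc[i] >= t else 0
                                          for i in idx]))
        if abs(t_new - t_hat) < tol:  # step 5: stop on convergence
            return t_new
        t_hat = t_new
    return t_hat
```

<p>Starting from the maximum score, each iteration slides the annotation window down towards the region where positive and negative annotations begin to mix, which is exactly where the optimal threshold lies.</p>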
<h3>Practical Details</h3>
<p>Below are some practical details on the operation of the algorithm:</p>
<ul>
<li>
<p>Note that <span class="math">\(\hat{t}\)</span> can be initialized to any value between 0 & 1; if a good initial estimate is available it can be used to initialize the algorithm, and if not, initializing to 1 is a good choice. Also note that when initializing to the maximum, due to edge effects, the windowing function will only capture the <span class="math">\(m/2\)</span> examples beneath <span class="math">\(\hat{t}\)</span>.</p>
</li>
<li>
<p>The EM algorithm typically converges to a local optimum; for our use case there is a global optimum, and we have observed (for a suitably selected subset size) very good convergence and results with this approach.</p>
</li>
<li>
<p>Note that the algorithm operates on subsets of the unannotated data, so the number of available unannotated images, <span class="math">\(n\)</span>, can grow while the algorithm runs; <span class="math">\(n\)</span> is not required to be fixed. Furthermore, the number of required annotations (and hence algorithm iterations) will depend on the metric and subset size chosen.</p>
</li>
</ul>
<p>Finally, for a multilabel classifier, where the output classes, <span class="math">\(\mathbf{c}= [c_1,\dots,c_k]\)</span>, are independent but not mutually exclusive of each other, the above algorithm can be performed for each class separately, where the task is to estimate <span class="math">\(\hat{t}_j\)</span> for each of the <span class="math">\(j=1,\ldots,k\)</span> classes.</p>
<h2>Threshold Estimation Example</h2>
<p>Below we present an annotation plot for a run of our EM algorithm for a <a href="https://en.zalando.de/women/?q=leopard+print+dress"><em>leopard print</em></a> image classifier, which is a binary classifier and has a single class output. The middle subplot presents the annotations for the images sent to a crowdsourcing platform, ordered in ascending class-confidence score (as illustrated by the orange curve), where positive labeled images are indicated at the top of the subplot by blue dashes and negative labelled images are indicated at the bottom of the subplot by purple dashes. We can see that for high confidence predictions there are many positive annotations with few negative annotations, illustrating that the classifier is performing well. However there is a point at which the occurrence of positive labels is frequently punctuated by negative annotations, illustrating that the classifier performs poorly beyond this point. We can see from the subplot that the threshold estimated by the EM algorithm (as indicated by the black dot) is positioned just before the classifier begins to perform poorly, which demonstrates the algorithm's usefulness in estimating an optimal class-confidence threshold. Furthermore, the annotation density subplot indicates a natural separation between the cluster of positive and negative annotations, and the estimated threshold corresponds to this also.</p>
<div id="annotation_plot"></div>
<p><img alt="leopard print annotations analysis" src="https://engineering.zalando.com/posts/2021/10/images/leopard_print_annotations.png"></p>
<p>To illustrate further we present a <em>slope plot</em> below, where we generate a cumulative sum of annotations and examine the slope of the curve, where annotations are assigned values 1 & 0 for positive and negative labels respectively, and are ordered by the class-confidence scores generated by the classifier (as was the case in the previous plot). The resultant plot is piecewise linear, where flat-line segments in the curve above the threshold represent consecutive False Positives, whereas those beneath the threshold represent consecutive True Negatives. Conversely, sloped-line segments in the curve above the threshold represent consecutive True Positives, whereas those beneath the threshold represent consecutive False Negatives. For our purposes we would like the curve above the threshold to have a slope as close to 1 as possible, and on average to have a steeper slope above the threshold than beneath it.</p>
<p><img alt="leopard print slope plot" src="https://engineering.zalando.com/posts/2021/10/images/leopard_print_slope_plot.png"></p>
<p>In the slope plot we observe the following:</p>
<ul>
<li>There are many long sloped-line segments above the threshold, whereas there are few beneath the threshold</li>
<li>There are many long flat-line segments beneath the threshold, whereas there are few above the threshold</li>
<li>The slope on average above the threshold is steeper than beneath it</li>
</ul>
<p>Therefore, for the leopard print image classifier predictions, we see that the threshold estimated by the AutoThreshold algorithm successfully identifies an appropriate class-confidence threshold.</p>
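<p>The slope comparison described above is easy to compute numerically. Below is a small illustrative sketch (the <code>slope_stats</code> helper is our own, not from the article's code): since annotations are 1s and 0s, the average slope of the cumulative-annotation curve over a segment is simply the fraction of positive annotations in that segment.</p>

```python
def slope_stats(scores, labels, threshold):
    """Average slope of the cumulative-annotation curve above and
    beneath `threshold`, with predictions ordered by descending
    class-confidence score.  A slope near 1 above the threshold means
    mostly True Positives; a slope near 0 beneath it means mostly
    True Negatives."""
    ordered = sorted(zip(scores, labels), key=lambda x: -x[0])
    above = [l for s, l in ordered if s >= threshold]
    below = [l for s, l in ordered if s < threshold]
    slope = lambda seg: sum(seg) / len(seg) if seg else 0.0
    return slope(above), slope(below)
```

<p>For a well-chosen threshold we expect the first returned slope to be close to 1 and clearly steeper than the second, mirroring the visual criterion in the slope plot.</p>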
<h2>Conclusion</h2>
<p>We have presented a novel algorithm for the task of optimal threshold estimation for an image classifier that is applied to out-of-distribution data, where an EM algorithm and human-in-the-loop is used to generate annotations for the out-of-distribution data, which are used to calculate a threshold to compensate for the difference in distributions. The algorithm is simple to implement, and is efficient in terms of the number of annotated image examples required to estimate an optimal threshold.</p>
<p>In future work, we will explore using the EM algorithm and human-in-the-loop to train a classifier in the context of <a href="https://en.wikipedia.org/wiki/Active_learning_(machine_learning)"><em>active learning</em></a>, i.e., the case where there is no annotated data set to train a classifier.</p>
<p><em>If you would like to work on similar problems, consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?filters%5Bcategories%5D%5B0%5D=Product%20Design%20%26%20User%20Research&filters%5Bcategories%5D%5B1%5D=Applied%20Science&filters%5Bcategories%5D%5B2%5D=Software%20Engineering&filters%5Bcategories%5D%5B3%5D=Product%20Management%20%28Technology%29&search=machine%20learning">Data Science teams!</a></em></p>
<h2>Appendix A: Mathematical Details</h2>
<p>Below we provide further details on the presented algorithm's interpretation as an <a href="https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm">Expectation-Maximization (EM) algorithm</a>.</p>
<h3>EM Algorithm Description</h3>
<p>Using standard notation, the EM algorithm can be described as follows: For a set of observed data <span class="math">\(\mathbf{X}\)</span> generated from a statistical model with unknown parameters <span class="math">\(\boldsymbol{\theta}\)</span>, and a set of latent variables <span class="math">\(\mathbf{Z}\)</span>, which are unobserved but affect the distribution of the data nonetheless, we estimate the values for <span class="math">\(\boldsymbol{\theta}\)</span> by maximizing the marginal likelihood of the observed data,</p>
<p><span class="math">\({\displaystyle L({\boldsymbol {\theta }};\mathbf {X} )=p(\mathbf {X} \mid {\boldsymbol {\theta }})=\int p(\mathbf {X} ,\mathbf {Z} \mid {\boldsymbol {\theta }})\,d\mathbf {Z} }\)</span>,</p>
<p>i.e, we generate a <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood estimate (MLE)</a> for <span class="math">\(\boldsymbol{\theta}\)</span>. However, this quantity is often intractable since <span class="math">\(\mathbf {Z}\)</span> is unobserved and its distribution is unknown before obtaining <span class="math">\(\boldsymbol{\theta}\)</span>.</p>
<p>The EM algorithm seeks to overcome this issue, and finds the MLE of the marginal likelihood by iteratively maximizing a specified <span class="math">\(Q\)</span> function, which is defined as the expected value of the log likelihood function of <span class="math">\({\boldsymbol {\theta }}\)</span>, i.e., <span class="math">\(Q({\boldsymbol {\theta }}\mid {\boldsymbol {\theta }}^{(t)})=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,{\boldsymbol {\theta }}^{(t)}}\left[\log L({\boldsymbol {\theta }};\mathbf {X} ,\mathbf {Z} )\right]\,\)</span>. The <span class="math">\(Q\)</span> function is maximized over two steps: In the first step—the E-step—the data-dependent parameters of the <span class="math">\(Q\)</span> function are calculated, while in the second step—the M-step—we seek to maximize the function constructed in the E-step over the parameters <span class="math">\(\boldsymbol{\theta}\)</span>, where the value that achieves the maximum is our new estimate, <span class="math">\(\boldsymbol {\theta }^{(t+1)}\)</span>.</p>
<h3>AutoThreshold as an EM Algorithm</h3>
<p>Using the above notation and translating to our algorithm description, our observations, <span class="math">\(\mathbf{X}\)</span>, are the vector of annotations generated by human-in-the-loop, <span class="math">\(\mathbf{a}\)</span>; our unobserved latent variables, <span class="math">\(\mathbf{Z}\)</span>, are the ordered classifier predictions used to generate <span class="math">\(\mathbf{a}\)</span>, i.e. <span class="math">\(\mathbf{p}\)</span>; and the unknown model parameters, <span class="math">\({\boldsymbol {\theta }}\)</span>, are defined by the statistical model used to generate <span class="math">\(\mathbf{X}\)</span>, which in our case is the <em>Bernoulli distribution</em>, as the annotators answer a yes-no question when generating annotations for our image data set. For the Bernoulli distribution, there is a single model parameter <span class="math">\(p\)</span>, which is simply the probability that an observation will be true.</p>
<p>For our use case, where we estimate a class-confidence threshold, <span class="math">\(\hat{t}\)</span>, for an image classifier in order to generate binary predictions, the parameter <span class="math">\(p\)</span> has a direct correspondence, which can be explained as follows: For an ideal image classifier with perfect accuracy applied to a balanced data set (i.e., a data set with an equal number of true and false examples) the output distribution of the class labels will be uniform and the parameter <span class="math">\(p\)</span> will be 0.5, as all predictions will be correct, and a true or false outcome will have equal probability as the observations are balanced. Similarly, in the ideal case the sigmoid units at the output will be perfectly normalized and the class-confidence threshold used to assign predictions to categories will also be 0.5 (as is the standard assumption with logistic regression analysis etc.). Also, 0.5 corresponds to the sample mean of the observed predictions (where true & false are represented numerically by 1 & 0) which is the <em>MLE</em> for the parameter <span class="math">\(p\)</span>.</p>
<h3>Known Unknowns</h3>
<p>As we move away from the ideal case, where the data may not be balanced or the image classifier may exhibit errors, the parameter <span class="math">\(p\)</span> and threshold <span class="math">\(t\)</span> deviate from 0.5 and both become unknown (but still remain in the range from 0 to 1), since the classifier's output class distribution, <span class="math">\(\mathbf{y}\)</span>, becomes unknown. However, a direct correspondence between the two parameters remains. To overcome this issue, and estimate an appropriate value for <span class="math">\(t\)</span> using a known distribution, i.e., <span class="math">\(\hat{t}\)</span>, we generate a validation data set, i.e., a set of manually annotated images, and test the image classifier by generating class predictions for the images and comparing them against the image annotations. The goal is to estimate a value for <span class="math">\(\hat{t}\)</span> that will generate a class label output distribution, <span class="math">\(\mathbf{y}\)</span>, as close as possible to <span class="math">\(\mathbf{a}\)</span>.</p>
<p>However, as already discussed in this article, there are additional practical considerations when evaluating the performance of an image classifier such as precision & recall, and simply comparing annotations to class predictions to determine performance may not lead to the selection of a useful classifier. To choose a suitable image classifier, the effect of the class-confidence threshold itself must be considered, which leads to a meta-labeling of the model's class predictions using the annotations in the validation data set. In particular, all positively annotated images that are correctly classified are known as True Positives (TP), whereas those that are incorrectly classified are known as False Negatives (FN). Conversely, all negatively annotated images that are correctly classified are known as True Negatives (TN), whereas those that are incorrectly classified are known as False Positives (FP).</p>
<p>Using these four categories of class prediction, a performance metric can indicate how close an image classifier's class output distribution is to the validation data set, while also giving an indication of the classifier's performance when it comes to precision & recall.</p>
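<p>To make the meta-labeling concrete, the sketch below (an illustrative example, not the implementation used in this article) counts the four categories for a set of annotations and thresholded predictions:</p>

```python
def meta_label(annotations, predictions, threshold):
    """Count TP, FP, TN, FN for binary predictions thresholded at `threshold`."""
    tp = fp = tn = fn = 0
    for a, p in zip(annotations, predictions):
        predicted_positive = p >= threshold
        if a == 1 and predicted_positive:
            tp += 1      # positively annotated, correctly classified
        elif a == 1:
            fn += 1      # positively annotated, incorrectly classified
        elif predicted_positive:
            fp += 1      # negatively annotated, incorrectly classified
        else:
            tn += 1      # negatively annotated, correctly classified
    return tp, fp, tn, fn

# Example: a small validation set scored at a class-confidence threshold of 0.5
annotations = [1, 1, 1, 0, 0, 0]
predictions = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1]
print(meta_label(annotations, predictions, 0.5))  # (2, 1, 2, 1)
```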
<h3>Averages Over Categories</h3>
<p>As mentioned above, an important component of the EM algorithm is calculating the maximum likelihood estimate for the unknown parameter <span class="math">\(\boldsymbol{\theta}\)</span>. For our use case, where the observations are generated by a Bernoulli distribution, the MLE for the parameter <span class="math">\(p\)</span> is the sample mean. However, as discussed above, we must also consider precision & recall, which necessitates a performance metric to determine a class-confidence threshold that optimizes <span class="math">\(p\)</span> with respect to the validation data set. Performance metrics such as precision & recall can be interpreted as <em>averages over categories</em>, which provides a direct connection to the MLE for <span class="math">\(p\)</span>. For example, recall can be considered an average over the meta-labeled positive annotations TP & FN, i.e., recall = TP/(TP+FN); while precision can be considered an average over the meta-labeled annotations above the threshold, i.e., precision = TP/(TP+FP). Furthermore, precision and recall may be combined into a derived performance metric such as the <span class="math">\(f_\beta\)</span>-measure, which also averages over the values for precision & recall. In summary, for a chosen performance metric, the optimal value for <span class="math">\(\hat{t}\)</span> generates a Bernoulli distribution <span class="math">\(\mathbf{y}\)</span> that is as close as possible to <span class="math">\(\mathbf{a}\)</span>, while also specifying a level of control over precision and recall.</p>
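<p>These averages can be written out directly. The following sketch (hypothetical helper names, not part of the article's implementation) computes precision, recall, and the <span class="math">\(f_\beta\)</span>-measure from the meta-labeled counts:</p>

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision and recall as averages over meta-labeled categories,
    combined into the f-beta measure f_b = (1 + b^2) * P * R / (b^2 * P + R)."""
    precision = tp / (tp + fp)   # average over predictions above the threshold
    recall = tp / (tp + fn)      # average over the positive annotations
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

# Example: 8 true positives, 2 false positives, 4 false negatives
p, r, f1 = precision_recall_fbeta(8, 2, 4)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```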
<h3>Optimization Loop</h3>
<p>Now that we have described how the AutoThreshold algorithm fits within the framework of the EM algorithm, we will provide further detail on the algorithm's optimization loop.</p>
<p>At each iteration, the number of items in <span class="math">\(\mathbf{a}\)</span>, and their corresponding <span class="math">\(\mathbf{p}\)</span>, increases by our specified window size, <span class="math">\(m\)</span>. This increases both the amount of data available to calculate our specified performance metric, <span class="math">\({\tt metric}(.)\)</span>, and the number of candidate values for maximizing <span class="math">\(\hat{t}\)</span>. In the E-step we increase the available observations (by generating new annotations for our most confident predictions), and in the M-step we maximize the threshold to estimate its optimal value. Here the E-step is arguably the most important, since it generates the required validation data set: the original problem is to generate a sufficient number of annotations for an unannotated data set in order to estimate a threshold. By growing the available observations in suitably sized subsets until the algorithm converges, we minimize the overall number of annotations needed to estimate an optimal threshold, which is the goal of this algorithm.</p>
<h3>Finally</h3>
<p>To conclude we present some other interesting points to consider about this algorithm:</p>
<ul>
<li>
<p>For this use case we apply the EM algorithm to a discrete probability distribution using categorical observations, i.e., annotations. Typically EM is applied to problems where observations are drawn from a continuous probability distribution, such as the Gaussian distribution.</p>
</li>
<li>
<p>For this use case we have our latent variables, <span class="math">\(\mathbf{p}\)</span>, before we obtain our observations, <span class="math">\(\mathbf{a}\)</span>. This is the reverse of the standard implementation of EM, and illustrates the flexibility of the EM algorithm's two-step learning iteration when applied to human-in-the-loop annotation.</p>
</li>
<li>
<p>For this use case we have human-generated observations, where usually the EM algorithm is applied to <a href="https://asp-eurasipjournals.springeropen.com/articles/10.1155/2008/784296">sensor observations</a>.</p>
</li>
</ul>
<h2>Appendix B: AutoThreshold Python Implementation</h2>
<p>Below we present a simple code implementation of the AutoThreshold algorithm applied to a binary classification task using synthetic data.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python3.8</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span>
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">f1_score</span>
<span class="n">SyntheticData</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s1">'SyntheticData'</span><span class="p">,</span> <span class="p">[</span><span class="s1">'predictions'</span><span class="p">,</span> <span class="s1">'annotations'</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">generate_predictions_and_annotations</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Returns synthetic predictions and annotations for a step classifier response,</span>
<span class="sd"> ordered by prediction score.</span>
<span class="sd"> Note: The returned synthetic data has an optimal threshold at 0.5</span>
<span class="sd"> """</span>
<span class="n">predictions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
<span class="n">annotations</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n</span><span class="o">//</span><span class="mi">2</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n</span><span class="o">//</span><span class="mi">2</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">SyntheticData</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">annotations</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">predictions_generator</span><span class="p">(</span><span class="n">synthetic_data</span><span class="p">,</span> <span class="n">thresh_ind</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Returns predictions for the current subset window as specified by `thresh_ind`.</span>
<span class="sd"> Note: In the normal operation of AutoThreshold this step would generate predictions</span>
<span class="sd"> for our out-of-distribution images from our image classifier. Here, our toy example</span>
<span class="sd"> is run on synthetic data and our precomputed predictions are simply returned.</span>
<span class="sd"> """</span>
<span class="k">return</span> <span class="n">synthetic_data</span><span class="o">.</span><span class="n">predictions</span><span class="p">[</span><span class="n">thresh_ind</span><span class="o">-</span><span class="n">M</span><span class="o">//</span><span class="mi">2</span><span class="p">:</span><span class="n">thresh_ind</span><span class="o">+</span><span class="n">M</span><span class="o">//</span><span class="mi">2</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">annotations_generator</span><span class="p">(</span><span class="n">synthetic_data</span><span class="p">,</span> <span class="n">thresh_ind</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Returns annotations for the current subset window as specified by `thresh_ind`.</span>
<span class="sd"> Note: In the normal operation of AutoThreshold this step would source annotations</span>
<span class="sd"> from a crowdsourcing platform. Here, our toy example is run on synthetic data and</span>
<span class="sd"> our precomputed annotations are simply returned.</span>
<span class="sd"> """</span>
<span class="k">return</span> <span class="n">synthetic_data</span><span class="o">.</span><span class="n">annotations</span><span class="p">[</span><span class="n">thresh_ind</span><span class="o">-</span><span class="n">M</span><span class="o">//</span><span class="mi">2</span><span class="p">:</span><span class="n">thresh_ind</span><span class="o">+</span><span class="n">M</span><span class="o">//</span><span class="mi">2</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">calculate_optimal_threshold</span><span class="p">(</span><span class="n">annotations</span><span class="p">,</span> <span class="n">predictions</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Returns the index of the optimal threshold using the F1 score.</span>
<span class="sd"> **Example:**</span>
<span class="sd"> >>> predictions = [0, 0.2, 0.4, 0.6, 0.8, 1.0]</span>
<span class="sd"> >>> annotations = [0, 0, 0, 1, 1, 1]</span>
<span class="sd"> >>> thresh_ind = calculate_optimal_threshold(annotations, predictions)</span>
<span class="sd"> >>> threshold = predictions[thresh_ind]</span>
<span class="sd"> >>> threshold</span>
<span class="sd"> 0.6</span>
<span class="sd"> """</span>
<span class="n">scores</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">threshold</span> <span class="ow">in</span> <span class="n">predictions</span><span class="p">:</span>
<span class="n">labels</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">prediction</span> <span class="ow">in</span> <span class="n">predictions</span><span class="p">:</span>
<span class="n">label</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">prediction</span> <span class="o">>=</span> <span class="n">threshold</span> <span class="k">else</span> <span class="mi">0</span>
<span class="n">labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
<span class="n">scores</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f1_score</span><span class="p">(</span><span class="n">annotations</span><span class="p">,</span> <span class="n">labels</span><span class="p">))</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">auto_threshold</span><span class="p">(</span><span class="n">synthetic_data</span><span class="p">,</span> <span class="n">annotation_generator</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Main loop of the AutoThreshold algorithm.</span>
<span class="sd"> """</span>
<span class="c1"># Specify initial estimate; here we start from the highest confidence which is</span>
<span class="c1"># the n-th ordered prediction</span>
<span class="n">thresh_ind</span> <span class="o">=</span> <span class="n">N</span>
<span class="n">thresh_est</span> <span class="o">=</span> <span class="n">synthetic_data</span><span class="o">.</span><span class="n">predictions</span><span class="p">[</span><span class="n">thresh_ind</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">MAX_ITERS</span><span class="p">):</span>
<span class="c1"># E-Step: Generate annotations for the subset of ordered predictions</span>
<span class="n">predictions_subset</span> <span class="o">=</span> <span class="n">predictions_generator</span><span class="p">(</span><span class="n">synthetic_data</span><span class="p">,</span> <span class="n">thresh_ind</span><span class="p">)</span>
<span class="n">annotations_subset</span> <span class="o">=</span> <span class="n">annotations_generator</span><span class="p">(</span><span class="n">synthetic_data</span><span class="p">,</span> <span class="n">thresh_ind</span><span class="p">)</span>
<span class="c1"># M-Step: Estimate local threshold index for the newly annotated subset</span>
<span class="n">thresh_ind_subset</span> <span class="o">=</span> <span class="n">calculate_optimal_threshold</span><span class="p">(</span><span class="n">annotations_subset</span><span class="p">,</span> <span class="n">predictions_subset</span><span class="p">)</span>
<span class="c1"># Estimate new threshold</span>
<span class="n">thresh_ind_old</span> <span class="o">=</span> <span class="n">thresh_ind</span>
<span class="n">thresh_ind</span> <span class="o">=</span> <span class="p">(</span><span class="n">thresh_ind_old</span> <span class="o">-</span> <span class="n">M</span><span class="o">//</span><span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="n">thresh_ind_subset</span>
<span class="n">thresh_est</span> <span class="o">=</span> <span class="n">synthetic_data</span><span class="o">.</span><span class="n">predictions</span><span class="p">[</span><span class="n">thresh_ind</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Iter: </span><span class="si">{}</span><span class="s1">, Est: </span><span class="si">{:.3f}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">thresh_est</span><span class="p">))</span>
<span class="c1"># Check convergence</span>
<span class="k">if</span> <span class="n">thresh_ind</span> <span class="o">==</span> <span class="n">thresh_ind_old</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Converged'</span><span class="p">)</span>
<span class="k">break</span>
<span class="k">return</span> <span class="n">thresh_est</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">AutoThreshold Toy Example.</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="c1"># Specify arguments: Max algorithm iterations, number of synthetic predictions & subset size</span>
<span class="n">MAX_ITERS</span> <span class="o">=</span> <span class="mi">25</span><span class="p">;</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">10000</span><span class="p">;</span> <span class="n">M</span> <span class="o">=</span> <span class="mi">500</span>
<span class="c1"># Synthetically generate ordered classifier predictions and annotations</span>
<span class="n">synthetic_data</span> <span class="o">=</span> <span class="n">generate_predictions_and_annotations</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
<span class="c1"># Run AutoThreshold to estimate optimal classifier threshold</span>
<span class="n">thresh_est</span> <span class="o">=</span> <span class="n">auto_threshold</span><span class="p">(</span><span class="n">synthetic_data</span><span class="p">,</span> <span class="n">annotations_generator</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">Estimated threshold value: </span><span class="si">{:.3f}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">thresh_est</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n\t</span><span class="s2">Fin.</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>Code output will look like:</p>
<div class="highlight"><pre><span></span><code>$ ./autothreshold.py
AutoThreshold Toy Example.
Iter: 0, Est: 0.975
Iter: 1, Est: 0.950
Iter: 2, Est: 0.925
Iter: 3, Est: 0.900
Iter: 4, Est: 0.875
Iter: 5, Est: 0.850
Iter: 6, Est: 0.825
Iter: 7, Est: 0.800
Iter: 8, Est: 0.775
Iter: 9, Est: 0.750
Iter: 10, Est: 0.725
Iter: 11, Est: 0.700
Iter: 12, Est: 0.675
Iter: 13, Est: 0.650
Iter: 14, Est: 0.625
Iter: 15, Est: 0.600
Iter: 16, Est: 0.575
Iter: 17, Est: 0.550
Iter: 18, Est: 0.525
Iter: 19, Est: 0.500
Iter: 20, Est: 0.500
Converged
Estimated threshold value: 0.500
Fin.
</code></pre></div>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Space efficient machine learning feature stores using probabilistic data structures - a benchmark2021-10-05T00:00:00+02:002021-10-05T00:00:00+02:00Enno Shiojitag:engineering.zalando.com,2021-10-05:/posts/2021/10/space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures.html<p>In this post, we describe a technique for providing machine learning feature stores with sublinear space requirements, and perform a benchmark that uses bloom filter as the backing data store. Such feature stores can be an effective alternative to the commonly used key-value-store-based feature stores in certain situations.</p><p><img alt="Bloom Filter" src="https://engineering.zalando.com/posts/2021/10/images/bloom_filter.png#previewimage"></p>
<h2>The problem</h2>
<p>When building Machine Learning (ML) applications - such as recommender systems - there is often a need to provide a "feature store" which can enrich the request to the system with additional ML features.</p>
<p>For example: whether a user has looked at an article before is often very informative about whether the user will click on or buy that article this time. So, companies keep a record of which articles their users have clicked or bought recently, and use this data in their recommender systems. Other commonly used data include past browsing history, purchase history, and user information such as demographics and explicitly shared preferences.</p>
<p>These data are usually stored in key-value stores like Redis, using the user ID as the key, and the features as value.</p>
<p>When a request is made to the recommender system, a query is made to this key-value store using the user ID, and the retrieved features are fed to the recommendation algorithm together with the data contained in the original request. When there are many users, these feature stores can easily get very large.</p>
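<p>In code, this enrichment step amounts to a single lookup per request. The sketch below uses a plain Python dict as a stand-in for an external key-value store such as Redis (the names and record structure are illustrative assumptions):</p>

```python
# A hypothetical request-time enrichment step; a plain dict stands in
# for the external key-value store (e.g. Redis) described above.
feature_store = {
    "user-42": {"viewed_articles": ["a1", "a7"], "purchases": 3},
}

def enrich_request(request):
    """Merge stored historical features into the incoming request."""
    features = feature_store.get(request["user_id"], {})
    return {**request, **features}

request = {"user_id": "user-42", "device": "mobile"}
enriched = enrich_request(request)
print(enriched["viewed_articles"])  # ['a1', 'a7']
```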
<p>This creates significant challenges in terms of the development and operation of ML applications.</p>
<ul>
<li><strong>They add to the processing time</strong>: Adding a network call commonly adds 2-10ms to your response time. To make matters worse, it also adds a lot of variance to the response time due to the variation of message sizes across users</li>
<li><strong>Additional hosting costs/maintenance cost</strong>: Distributed databases with strict performance requirements can be expensive to host</li>
<li><strong>Additional operational complexities</strong>: Operations like backfill can become very expensive to set up and execute</li>
<li><strong>Development complexities</strong>: An external database adds a dependency to the application code, which adds some complexity to the development/testing process (like having to pre-populate this DB for tests). Intrusive performance optimizations like size limits, aggregations, prioritization of users are often necessary, which adds development time and increases the coupling between model design and infrastructure</li>
<li><strong>Multiple lookups can be prohibitively expensive</strong>: For example, imagine you want to rank a thousand products and retrieve features for each product - this would be extremely difficult with an external database under a strict latency budget. Another hypothetical example is retrieving features for composite keys (interactions), e.g. "How many times were products X and Y bought together?". If the feature state is small enough to live in the same process's memory, multiple lookups are far cheaper and thus feasible.</li>
</ul>
<h2>The solution</h2>
<p>What if, instead of having a big, unwieldy database, we could read a much smaller dataset into memory and query it as a feature store from within the process? This is essentially what we can do with "sketching" data structures, a type of probabilistic data structure.</p>
<p>Sketching data structures can store large amounts of data in a compact (sublinear) space at the expense of accuracy. In other words, they store a "summary" of the original data. They are essentially a lossy compression algorithm for your features. Just like JPEG compression for your images, it can compress input data at varying "compression levels" - low-compression level means better quality but larger sizes, and high-compression level means lower quality but smaller sizes.</p>
<p>This allows us to trade-off accuracy in exchange for space requirements. As we will see below, the trade-off is highly favorable - a very small sacrifice in accuracy can save a lot of space.</p>
<p>In this article we will only describe and benchmark bloom-filter-backed feature stores in detail, but theoretically, other sketching data structures like <a href="https://en.wikipedia.org/wiki/HyperLogLog">HyperLogLog</a>, <a href="https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch">Count-Min Sketch</a>, <a href="https://en.wikipedia.org/wiki/Quotient_filter">Quotient Filters</a> etc. could be used, too.</p>
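<p>For readers unfamiliar with the data structure, the following minimal Bloom filter sketch (an illustrative toy, not the implementation benchmarked here) demonstrates its key property: membership queries may return false positives, but never false negatives:</p>

```python
import hashlib

class TinyBloomFilter:
    """A minimal Bloom filter sketch: k hash functions over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k deterministic bit positions by salting a single hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True only if every position is set; an unset bit proves absence.
        return all(self.bits[pos] for pos in self._positions(item))

bf = TinyBloomFilter()
bf.add("user-1^article-9")
print(bf.might_contain("user-1^article-9"))  # True, guaranteed
print(bf.might_contain("user-2^article-9"))  # False with overwhelming probability
```

Note that an added item is always reported as present, while an absent item is only misreported if all of its bit positions happen to collide with set bits.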
<h2>Benchmark of a sketching-data-structure-based feature store backed by a Bloom-Filter</h2>
<p>Below is a benchmark based on a real-life click prediction dataset. It shows that prediction models that use a bloom-filter-based feature store can achieve the same level of prediction accuracy & prediction throughput with a vastly smaller feature state that can easily be fit into memory.</p>
<h3>Benchmark setting</h3>
<p>We used a real-life click prediction dataset which has two types of features:</p>
<ul>
<li><strong>Request features</strong>: Features that are immediately available in the request, like country, article id, device type, context URL and so on</li>
<li><strong>Historical features</strong>: Features that are based on accumulated historical data, like browsing history, purchase history, preferences that were saved in the past etc.</li>
</ul>
<p>The historical features were aggregated using count, max etc. (e.g. how many times did a user browse an item, what was the last time they looked at it etc.) and were then discretized to yield categorical features. They were then stored into feature stores.</p>
<p>The training data had about 5.7 mil examples. Out of these 5.7 mil examples, 2.8 mil had historical data (the rest had only request features). Combined, the data had 1.762 bil data points after feature extraction.</p>
<p>Finally, a logistic regression classifier was used to predict clicks. Our variants were as follows:</p>
<ul>
<li><strong>No history</strong>: A model without a feature store (so that it could only use request features)</li>
<li><strong>Uncompressed history</strong>: A model that simulated use of a conventional feature store (the features were pre-fetched)</li>
<li><strong>Compressed history</strong>: A model that used a bloom filter based compressed feature store</li>
</ul>
<h3>Implementation of the bloom-filter-based-compressed-feature-store</h3>
<p>Below is a simplified implementation in Python that illustrates how the feature store was implemented. It returns what articles a user had looked at before, given their user_id. This is not the actual implementation that was used in the benchmark. The benchmark used a JVM-based implementation, and was more general in nature (it stored arbitrary categorical features).</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Set</span>
<span class="kn">from</span> <span class="nn">bloom_filter</span> <span class="kn">import</span> <span class="n">BloomFilter</span>
<span class="k">class</span> <span class="nc">FeatureStore</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">store</span><span class="p">:</span> <span class="n">BloomFilter</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">store</span> <span class="o">=</span> <span class="n">store</span>
<span class="bp">self</span><span class="o">.</span><span class="n">possible_articles</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">user_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">article_ids</span><span class="p">:</span> <span class="n">Set</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="k">for</span> <span class="n">article_id</span> <span class="ow">in</span> <span class="n">article_ids</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">possible_articles</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">article_id</span><span class="p">)</span>
<span class="n">composite_key</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">user_id</span><span class="si">}</span><span class="s1">^</span><span class="si">{</span><span class="n">article_id</span><span class="si">}</span><span class="s1">'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">store</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">composite_key</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">retreive_articles</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">user_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">Set</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="n">ret</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">for</span> <span class="n">article_id</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_articles</span><span class="p">:</span>
<span class="n">composite_key</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">user_id</span><span class="si">}</span><span class="s1">^</span><span class="si">{</span><span class="n">article_id</span><span class="si">}</span><span class="s1">'</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">store</span><span class="o">.</span><span class="n">might_contain</span><span class="p">(</span><span class="n">composite_key</span><span class="p">):</span>
<span class="n">ret</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">article_id</span><span class="p">)</span>
<span class="k">return</span> <span class="n">ret</span>
</code></pre></div>
<p>The most important element to point out is the additional state <code>self.possible_articles</code>. This would hold the set of all possible features (in this case, all article IDs), and the code is brute forcing all of them in order to reconstruct the set of articles viewed by the user. This may appear to be a very expensive thing to do, but in practice it is very cheap in relation to the total processing. In my simple benchmark, the difference was undetectable. It is also worth noting that this process could be optimized, for example through the use of binary search, and/or by only querying for important features.</p>
<p>The compressed history variant had a parameter that determined the level of compression: a higher compression level meant lower quality and a smaller size, a lower compression level meant higher quality and a larger size. What do we mean by "quality" here? In a nutshell, the bloom filter tells us if a binary categorical feature is present (1) or not (0). When the bloom filter says a feature is NOT present, it is always correct - i.e. there are no false negatives. However, when the bloom filter says a feature is present, it can be wrong. In other words, at some probability, we will mistakenly set the feature value to 1, when in fact it should have been 0 (i.e. a false positive). This adds noise to our model's input. This probability can be tuned via a parameter, and the higher the false positive rate, the smaller the state size.</p>
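<p>The standard bloom filter sizing formulas make this trade-off concrete; a minimal sketch (the function name and exact rounding choices here are ours, not from the system described above):</p>

```python
import math

def bloom_parameters(n_items: int, fp_rate: float) -> tuple:
    """Return (bits, hash_count) for a bloom filter sized to hold
    n_items at roughly the given false positive rate."""
    # Classic sizing formulas: m = -n*ln(p) / (ln 2)^2,  k = (m/n) * ln 2
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k
```

<p>Note that the state grows only linearly in <code>-log(p)</code>, which is why relaxing the false positive rate is such a cheap knob for shrinking the filter.</p>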
<p>For more details on how this compression level parameter works, and generally how bloom filters work and their characteristics, see e.g. <a href="https://llimllib.github.io/bloomfilter-tutorial/">here</a>, <a href="https://en.wikipedia.org/wiki/Bloom_filter">here</a> and <a href="https://freecontent.manning.com/all-about-bloom-filters/">here</a>.</p>
<p>As an evaluation metric, we used click <a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic">ROC-AUC (Area Under the Curve of the Receiver Operating Characteristic curve)</a>, a common metric for recommender systems.</p>
<h3>Result</h3>
<p>The scatter plot below shows the AUC (y axis) of the classifier at varying compression levels (x axis = size of the feature store in bytes in logarithmic scale). The dotted green line is the AUC with a key-value-store-based feature store equivalent (i.e. Uncompressed). The dotted red line is the AUC without any history features (i.e. No history).</p>
<p><img alt="AUC plotted against feature state size" src="https://engineering.zalando.com/posts/2021/10/images/feature_store-to-roc_auc-plot.png"></p>
<p>As expected, our bloom-filter-backed feature store achieves performance between the two lines (uncompressed ~= 0.80 and no history ~= 0.70).</p>
<p>The estimated size of the key-value-store-based feature store was about 15GB. Hence, the results show that our compressed feature store achieves the same level of classification performance (AUC~=0.7997) using just 3% of memory (470MB vs 15GB). The state size can be further reduced at the expense of classification performance. For example, 90% of the uplift provided by the feature store can be retained by using merely ca. 40MB of state (AUC~=0.79). This would be just 0.3% of the size of an uncompressed feature store. Note that this "saving" grows as the data volume increases due to the sublinear space complexity.</p>
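<p>A back-of-the-envelope check of the savings quoted above (figures taken from the text, with 15 GB taken as 15 * 1024 MB):</p>

```python
# Sizes as reported in the experiment above
uncompressed_mb = 15 * 1024   # ~15 GB key-value-store-based feature store
compressed_mb = 470           # bloom-filter state at the same AUC (~0.7997)
tiny_mb = 40                  # state retaining ~90% of the uplift (AUC ~0.79)

full_ratio = compressed_mb / uncompressed_mb   # ~0.031 -> "just 3%"
tiny_ratio = tiny_mb / uncompressed_mb         # ~0.0026 -> "just 0.3%"
```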
<p>When it comes to throughput (computational efficiency), all of the variants achieved similar throughput (20-22k predictions per second per core on my 2018 Mac). That is, the additional overhead was undetectable in my performance tests.</p>
<h2>The Limitation</h2>
<p>So the benchmark results look very good - why would anyone use a conventional key-value-store-based feature store at all? Alas, the new feature stores come with severe limitations and are thus not a drop-in replacement for conventional feature stores.</p>
<h3>You have to know what to ask</h3>
<p>As described above, we need to keep the set of possible features in order to get the desired output. In a lot of use cases this is not an issue, but in some situations it may be prohibitively expensive (e.g. imagine reconstructing bag-of-word encoding of past user reviews).</p>
<h3>They are difficult to update (and thus to keep "fresh")</h3>
<p>The second, and probably by far the more important weakness is the difficulty associated with updating them.</p>
<p>Feature "freshness", as in how quickly recent events can be reflected to the feature store is very important, as recent events tend to have high informational value. Many distributed key-value stores have good write performance, and thus it's very feasible to keep them very "fresh" even when high load is involved. The situation is very different with sketching-data-structure-based feature stores.</p>
<p>First, let's consider the appending of new information to our new feature store.</p>
<p>Most sketching data structures (including bloom filters) allow incremental appends (so far, so good). However, since the complete state is loaded into each node's RAM, every write must be applied on every node - meaning each node (process) must be able to handle 100% of the event traffic. This is usually impossible - common event streams like views and clicks are very high volume, and processing that amount of writes on a single node is not a practical option. One could consider batching, but in many key-value-based feature stores the target update latency is shorter than a few seconds, which makes this option extremely difficult.</p>
<p>Theoretically bloom filters could be distributed so that each node only needs to process a shard of the traffic - but at this point one would have converted one's real-time transaction server into a distributed database.</p>
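<p>One consolation is that plain bloom filters do merge cheaply: the union of two filters built with identical parameters is just a bitwise OR of their bit arrays, so shards built independently on different workers can later be combined. A hypothetical minimal filter (ours, not the implementation described in this post) to illustrate:</p>

```python
import hashlib

class TinyBloom:
    """Toy bloom filter - illustrative only."""
    def __init__(self, m_bits: int = 1024, k_hashes: int = 4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: str):
        # Derive k deterministic bit positions by salting the key
        for i in range(self.k):
            h = hashlib.sha256(f'{i}:{key}'.encode()).digest()
            yield int.from_bytes(h[:8], 'big') % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

    def merge(self, other: 'TinyBloom') -> None:
        # Union is a bitwise OR, valid only for filters that share
        # the same size and hash functions.
        assert (self.m, self.k) == (other.m, other.k)
        self.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
```

<p>Each shard would process only its slice of the event stream and periodically ship its bit array for merging - but, as noted above, at that point one is effectively building a distributed database.</p>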
<p>Second, let's consider deletion (expiry) of information.</p>
<p>The situation is even worse here, because due to their nature, sketching data structures don't allow deletion of individual records. Thus, to delete a record from our new feature state, one has to completely regenerate it by re-processing the entire source dataset (sans the information we want to delete). This is extremely expensive and thus can only be done on a low-frequency batch basis. There are some sketching-data-structure variants that allow some degree of expiry (see e.g. <a href="https://arxiv.org/pdf/2001.03147.pdf">Age-partitioned Bloom Filters</a>), but there are no mature implementations available.</p>
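<p>The coarse workaround hinted at above can be sketched as generation rotation: keep a small ring of filters, write only to the newest, query all of them, and expire whole generations at once. Plain <code>set</code>s stand in for bloom filters here so the sketch stays self-contained; this is our illustration, not the age-partitioned scheme from the cited paper:</p>

```python
from collections import deque

class RotatingFilter:
    """Expire data in whole generations rather than per record."""
    def __init__(self, generations: int = 3):
        # A bounded ring: appending when full drops the oldest generation.
        self.ring = deque([set() for _ in range(generations)],
                          maxlen=generations)

    def add(self, key: str) -> None:
        self.ring[-1].add(key)           # always write the newest generation

    def might_contain(self, key: str) -> bool:
        return any(key in g for g in self.ring)  # query every generation

    def rotate(self) -> None:
        # Called on a schedule (e.g. daily): the oldest generation
        # falls off the ring, deleting everything it held at once.
        self.ring.append(set())
```

<p>The price is granularity: expiry happens per generation, never per record, and query cost grows with the number of generations kept.</p>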
<h3>They cannot support complex queries and updates</h3>
<p>Finally, sketching-data-structure-based feature stores don't support complex queries or updates like "remove all events that happened on day X". With key-value-store-based feature stores, the additional cost of storing some metadata (like event timestamps) is relatively minor. But this can be a major undertaking for a sketching-data-structure-based feature store.</p>
<h2>Conclusion</h2>
<p>Sketching-data-structure-based feature stores cannot substitute for conventional feature stores in all use cases, but they can be an attractive option when using an external feature store is prohibitively expensive. For example, if:</p>
<ul>
<li>One can't afford the additional network call to an external feature store</li>
<li>Many feature lookups need to be performed per one request</li>
</ul>
<hr>
<p><em>Are you interested in working on similar problems? Consider a <a href="https://jobs.zalando.com/de/jobs/3402876-senior-software-engineer-java-scala-m-f-d">Senior Software Engineer</a> position in our teams.</em></p>Tracing SRE’s journey in Zalando - Part II2021-09-21T00:00:00+02:002021-09-21T00:00:00+02:00Pedro Alvestag:engineering.zalando.com,2021-09-21:/posts/2021/09/sre-journey-part2.html<p>Follow Zalando's journey to adopt SRE in its tech organization.</p><p><em>Welcome to the second part of our journey establishing SRE in Zalando. You’ll find the <a href="https://engineering.zalando.com/posts/2021/09/sre-journey-part1.html">first part here</a>. Don’t miss out on the third and final post in one week.</em></p>
<h2>2018 - The Return of SRE</h2>
<p>In our previous blog post we left it with the plans for Site Reliability Engineering (SRE) in Zalando having to change. So, what were those changes and what were the challenges we faced in this new iteration?
In this blog post we’ll go straight to the first quarter of 2018, when two sister SRE teams were bootstrapped around the same time in different departments. One of them was the <strong>SRE Enablement</strong> team in Digital Foundation (DF - a central functions department). The other was the <strong>Digital Experience SRE</strong> team (DX - the department responsible for the customer-facing part of our Fashion Store). The latter was created from a grassroots initiative, while the DF one was reimagined by that department's management.</p>
<p>After the decision made back in 2017 to grow the number of on-call teams, the issue of overwhelmed on-call teams was gone. As expected, the side effect of that decision was that teams were now much more aware of the operational burden of their services and would take steps to reduce that burden. Post-Mortems started becoming a regular practice in 2017, which also helped (although the practice was not yet well established). But while teams were slowly becoming more ‘operationally capable’, the complexity of our platform was growing at a much faster pace, with no one keeping a holistic view of the service landscape.
You’ll notice from the name of the DF team that there is already something implied: SRE <strong>Enablement</strong>. This is where the new team differentiated itself from the 2016 initiative. The challenge that gave purpose to the Enablement team was <strong>raising the bar on our operational practices</strong>. This covered monitoring, incident response, chaos engineering and resilience engineering.</p>
<p><img alt="Service Landscape" src="https://engineering.zalando.com/posts/2021/09/images/service-landscape.png"></p>
<figcaption style="text-align:center">Service Landscape</figcaption>
<p><br/></p>
<p>Both SRE teams had very limited resources (only 2 engineers each), and they obviously shared the same goals. To better align the efforts of both teams, an <strong>SRE Program</strong> was kicked off to unite them around common goals. As before, the practices and mindset described in Google’s original <a href="https://sre.google/sre-book/table-of-contents/">SRE book</a> were used as the main inspiration for our own SRE teams.
The teams were composed of experienced engineers, with a strong background in software development, knowledge of systems engineering, and incident response (very much aligned with the profile that was outlined back in 2016). These engineers also enjoyed a fair amount of social capital across the organization, which greatly facilitated the collaboration with other teams.</p>
<p>Compared to the previous iteration, the SRE Program was not aiming at significant organizational changes. This gave some degree of freedom regarding the projects the Program would tackle. At the beginning of the Program, the 2 teams got together and made a list of all the topics that were SRE relevant and that we wanted to work on. When we were done, the size of the list was considerable (there are so many interesting, relevant and challenging topics in SRE). With our limited capacity, however (6 team members between the two teams - 1 Lead, 1 Program Manager, 4 Engineers), we had to be careful when picking our initiatives. Although this meant that we had to drop many of the topics we wanted to work on, <strong>that careful selection contributed significantly to the success of the Program</strong>, and the reputation we built for the SRE name within the company.</p>
<p>The SRE Program took on the <strong>rollout of Distributed Tracing</strong> across the engineering organization, helped <a href="https://engineering.zalando.com/posts/2018/06/loading-time-matters.html"><strong>improve the Page Load Time</strong></a> for some of Zalando’s pages, staffs the newly created <a href="https://sre.google/workbook/incident-response/"><strong>Incident Commander role</strong></a>, and helps with Cyber Week preparations, namely <a href="https://engineering.zalando.com/posts/2019/04/end-to-end-load-testing-zalandos-production-website.html"><strong>Load Tests</strong></a>. SREs, in the role of Incident Commanders, provided on-site support during Black Friday in a dedicated Situation Room. SREs also worked with other teams on <strong>efficiency topics</strong> that led to significant cost savings with cloud infrastructure while preserving reliability targets.</p>
<p><img alt="Distributed Tracing Workshop" src="https://engineering.zalando.com/posts/2021/09/images/ot-workshop.jpg"></p>
<figcaption style="text-align:center">Distributed Tracing Workshop</figcaption>
<p><br/></p>
<p>SLOs, as introduced back in 2016, were still in place, with hundreds of new services specifying them. Despite the growing number of SLOs, they were still not used to help the teams strike a balance between feature development and operational improvements. One of the things that made it more challenging was the fact that Zalando runs many thousands of services in production. We figured that not all of them had the same relevance. To try to put some structure into the SLOs we had, <strong>Service Tier definitions</strong> were published. To help with the Service Tiers, a <strong>new SLO reporting tool</strong> was developed. The new tool defined canonical SLIs and used the tier classification. However, this work was limited in scope: it targeted a single department, Digital Experience, home to one of the SRE teams. Services in other departments were not included in this effort and there was no mandate for them to adopt the new Service Tier definitions. Attempting to roll this out for the entire company (>4000 services) would not be feasible.</p>
<p>On the cultural level, the SRE Program took ownership of the SRE Guild. Guilds in Zalando are self-organized groups of colleagues, sharing a common interest, that meet regularly to exchange knowledge. The SRE Guild was actually a remnant from the 2016 initiative, but was left dormant. <strong>We saw the SRE Guild as an agent of cultural change</strong> to help us spread the SRE mindset. We then devoted efforts to develop a format that would be engaging and sustainable. Guild sessions provided a regular event with talks around all things SRE, whether it’s presenting the work of the SRE Program, or giving the floor for other teams or engineers to share knowledge. Postmortems became a regular topic in these sessions. This format is still in place today.</p>
<p><img alt="Black Friday 2018 Situation Room" src="https://engineering.zalando.com/posts/2021/09/images/black-friday-2018.jpg"></p>
<figcaption style="text-align:center">Black Friday 2018 Situation Room</figcaption>
<p><br/></p>
<p>Despite the success of the SRE Program, the fact that the individual teams were part of different organizations with different reporting chains led to challenges whenever the departments' priorities and guidelines were at odds with each other. Teams in Zalando would seek out guidance from SRE, not knowing which team to reach out to, or even that there were 2 separate teams. To understand how two SRE teams that were working together could offer inconsistent guidance, it’s important to remember that they belonged to different departments. The SRE DX team could focus on the problem space of the DX department and offer customized solutions for those teams. The SRE DF team had the entire company in scope, so whatever that team did, it had to be applicable on a different scale. The SRE Program was planned for the year of 2018, culminating with the end of Cyber Week. Following that plan, after Cyber Week was over the program ended and each team went back to work on projects relevant to their respective departments.</p>
<h2>2019 - Combining forces as a single SRE team</h2>
<p>In early 2019 both SRE teams were officially united into a single team in the DF department (the department of one of the original teams). With this merger, SRE now had a single voice in the company.</p>
<p>The experience with Distributed Tracing in the previous year was quite positive - Do you get the pun in the blog post’s name, now? 🤓. For one, it became a fundamental tool for incident response because it allowed for quicker insights, reducing the time spent resolving incidents. The coverage across Zalando’s services kept growing. The standardized data model, the development of Zalando-specific Semantic Conventions, and an API to consume the tracing data allowed the SRE team to build additional value from it.</p>
<p>One of the tools we developed based on Distributed Tracing is an Alert Handler called <strong>Adaptive Paging</strong> (which we <a href="https://www.usenix.org/conference/srecon19emea/presentation/mineiro">talked about in SRECon’19</a>). This alert handler monitors the error rate of what we call <strong>Critical Business Operations</strong><sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> (CBO) and when it is triggered it uses the tracing data to determine where the error comes from across the entire distributed system, and pages the team that is closest to the problem. This alert handler was also a game changer in our push for a different alerting strategy: <strong>Symptom Based Alerting</strong>. You can learn more about it in the <a href="https://github.com/zalando/public-presentations/blob/master/files/2019-05-16_alerting_monitoring_and_all_that_jazz.pdf">slides of one of the talks</a> we did on this topic.</p>
<p><img alt="Adaptive Paging Diagram" src="https://engineering.zalando.com/posts/2021/09/images/adaptive-paging.jpg"></p>
<figcaption style="text-align:center">Adaptive Paging will traverse the Trace and identify the team to be paged</figcaption>
<p><br/></p>
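<p>The core idea can be illustrated with a toy trace walk (the span records, field names and team mapping below are invented for illustration; the real Adaptive Paging works on Zalando's tracing data model): find the deepest erroring span in the trace and page the team owning that service, rather than every team on the failing path.</p>

```python
# Hypothetical span records: (span_id, parent_id, service, has_error)
spans = [
    ('1', None, 'frontend', True),
    ('2', '1',  'cart',     True),
    ('3', '2',  'pricing',  True),   # deepest failure -> closest to root cause
    ('4', '1',  'catalog',  False),
]
service_to_team = {'frontend': 'team-web', 'cart': 'team-cart',
                   'pricing': 'team-pricing', 'catalog': 'team-catalog'}

def team_to_page(spans, ownership):
    """Return the team owning the deepest erroring span - the error
    closest to its origin - instead of paging everyone on the path."""
    parents = {sid: pid for sid, pid, _, _ in spans}
    depth = {}
    def depth_of(sid):
        if sid not in depth:
            pid = parents[sid]
            depth[sid] = 0 if pid is None else depth_of(pid) + 1
        return depth[sid]
    erroring = [(depth_of(sid), svc) for sid, _, svc, err in spans if err]
    _, service = max(erroring)   # pick the deepest erroring span
    return ownership[service]
```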
<p>A <strong>throughput calculator</strong> based on Tracing data is also developed that helped the Load Test efforts for Cyber Week preparations. By applying the expected throughput for a CBO, we could estimate the impact on all the components that are part of the same journey, usually through cascading remote procedure calls.</p>
<p><img alt="Throughput Calculator" src="https://engineering.zalando.com/posts/2021/09/images/throughput-calculator.png"></p>
<figcaption style="text-align:center">Throughput Calculator</figcaption>
<p><br/></p>
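<p>The propagation logic behind such a calculator can be sketched as scaling per-edge call ratios down the dependency graph (the graph, the <code>'entry'</code> root name and the ratios here are invented for illustration, not taken from the actual tool):</p>

```python
def propagate_throughput(root_rps: float, call_ratios: dict) -> dict:
    """Estimate requests/second for every component reachable from the
    CBO entry point. call_ratios maps (caller, callee) -> average number
    of callee requests per caller request, as derived from trace data."""
    load = {'entry': root_rps}
    # Walk edges in insertion order; assumes callers appear before their
    # callees (a topological order), which per-trace data naturally gives.
    for (caller, callee), ratio in call_ratios.items():
        load[callee] = load.get(callee, 0.0) + load[caller] * ratio
    return load

# E.g. each entry request fans out to 2 cart calls, and each cart call
# triggers 1.5 stock checks further downstream:
ratios = {('entry', 'cart'): 2.0, ('cart', 'stock'): 1.5}
```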
<p>Finally, through our use of Distributed Tracing, and Adaptive Paging, we made a significant change in our SLO strategy. We moved away from service based SLOs, and started rolling out <a href="https://engineering.zalando.com/posts/2022/04/operation-based-slos.html">Operation based SLOs</a>.</p>
<p>Through internal and external hiring we grew the team up to 7 SREs. But that team size notwithstanding, <strong>hiring was always a challenge</strong>. Then, and today. The combination of the required skill set for an SRE at Zalando and the different definitions of the SRE role across the industry means many candidates do not meet the bar, or simply have a different skill set. Nevertheless, it was agreed that we would not compromise on our hiring bar. While growing engineers and teaching the SRE mindset was something seen as positive (and definitely a way to scale the team further), with our reduced size we could not provide effective mentorship. Any engineers we would hire needing that mentorship would not be set up for success.</p>
<p><img alt="SRECon DT Workshop" src="https://engineering.zalando.com/posts/2021/09/images/sre-con-workshop.jpg"></p>
<figcaption style="text-align:center">We took the previous year’s Distributed Tracing Workshop to SRECon’19</figcaption>
<p><br/></p>
<p>Both 2018 and 2019 were successful years for SRE, but there are quite a few differences between the two. In 2018 we worked exclusively on topics that SRE did not own. We were a mix of a <a href="https://cloud.google.com/blog/products/devops-sre/how-sre-teams-are-organized-and-how-to-get-started">consulting team and a kitchen sink team</a>. We either volunteered for some of the projects we worked on, or were asked to help due to capacity reasons or because the projects required a specific skill set. Our main challenge was <strong>how to decide what to work on</strong>. There was no mathematical formula to determine this. It was always a matter of balancing the following dimensions:</p>
<ul>
<li>Likelihood of success (Would we be in way over our head? Could we actually influence the outcome?)</li>
<li>Company’s priorities</li>
<li>Enablement (If we’re working with a team, will that team learn something from the engagement, or were we expected to do everything ourselves?)</li>
</ul>
<p>In 2019 we still operated partially in the same kitchen sink/consulting mode, but the big difference is that in 2019 we started working on our own products, which also means we started taking some control of our roadmap.</p>
<p>Overall, 2019 was the year we started reaping the benefits of the achievements from the previous years. We had given a clear signal that a single (small) team of engineers dedicated to Reliability could bring significant benefits to an organization the size of Zalando. But, to an extent, we were also a victim of our success. Despite having our own backlog and a list of topics we wanted to work on, <strong>the team became increasingly more in demand</strong> from different parts of the organization. Our help was requested to improve Operational Excellence in departments, to assist in the roll out of major launches, to review Technical Design Documents, to help in PostMortem investigations, <a href="https://engineering.zalando.com/posts/2020/10/how-zalando-prepares-for-cyber-week.html">Cyber Week preparations, Production Readiness Reviews</a>… As before, we had to pick our battles carefully. Accepting every challenge with our reduced capacity meant that we would likely do a poor job in all of them. And anything in our backlog that we had promised and wouldn’t deliver would also affect our reputation.</p>
<p>Things are starting to get interesting. After a few successful projects, SRE’s reputation in the company grew. We merged the two SRE teams into a single team, making sure that SRE could continue to grow unaffected by fragmentation. The SRE Guild kept on going, further spreading the SRE mindset. We grew the team, and even started to focus on our own backlog. But SRE is still a single, small team in a very large organization. How far can we stretch this model? Well, that's what we're going to talk about in the last blog post in this series, in one week's time.</p>
<hr>
<p><em>EDIT 1: Don't stop now. The third and last part of our series is already available <a href="https://engineering.zalando.com/posts/2021/10/sre-journey-part3.html">here</a>.</em></p>
<hr>
<p><em>Excited by what you learned so far? We're looking for that kind of excitement. <a href="https://zsre.page.link/enablement-job-ad/">Join us at SRE</a> to be part of the journey.</em></p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Grossly summarizing it, Zalando is an e-commerce platform, so a Critical Business Operation is anything that affects our Business, like ‘Add To Cart’, ‘Place Order’ or ‘View Catalog’ <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Tracing SRE’s journey in Zalando - Part I2021-09-13T00:00:00+02:002021-09-13T00:00:00+02:00Pedro Alvestag:engineering.zalando.com,2021-09-13:/posts/2021/09/sre-journey-part1.html<p>Follow Zalando's journey to adopt SRE in its tech organization.</p><h2>2016 - First attempt at rolling out SRE</h2>
<p><em>Welcome to the first installment of our three part series following Zalando’s SRE journey. Be sure to come back for the other two, with the next one being published in a week.</em></p>
<p>Site Reliability Engineering (SRE) is a recent discipline in the Software Engineering field that is growing in popularity, with many companies turning to this new way of working to solve their operational issues, or to support their growing scale.
But being a recent discipline, it’s not yet well established how organizations should adopt SRE, or even what the role of a Site Reliability Engineer is (although the role enjoys increasing demand).
At Zalando we also took a stab at implementing SRE within our organization. We looked at it as a way to help us scale our engineering efforts, improving efficiency and making life for our developers easier. Today, Zalando includes in its organization a Site Reliability Engineering department, but the journey to reach this point was filled with challenges and learnings that we are now sharing with everyone.</p>
<p>In this series of blog posts we will take our readers through the road so far. We’ll describe what worked well for us, and what didn’t. Where we failed, and where we succeeded. We’ll also look into how we defined the role of an SRE within the company, and how SRE is growing in Zalando.</p>
<p>Before we get to the ‘How’, let’s start with the ‘Why’. Why would we want to have SRE in Zalando? Well, for that we need to understand the point that we were at as a company before this journey began. That takes us back to 2016 when we were well into our move to the cloud, migrating our monoliths to a microservices architecture (you can find more details about this and what came after in the <a href="https://srcco.de/posts/one-decade-in-zalando-tech.html">blog post</a> from our colleague Henning Jacobs).</p>
<p><img alt="A view of Zalando Tech pre-cloud" src="https://engineering.zalando.com/posts/2021/09/images/zalando-pre-cloud.png"></p>
<figcaption style="text-align:center">A view of Zalando Tech pre-cloud</figcaption>
<p><br/></p>
<p>The move to the cloud came with disruptive changes to the way we were working. Teams were now responsible end-to-end for the software they built. That meant designing, developing, testing, deploying and operating the applications the teams owned. I’ll skip the gruesome details, but to put it simply, before this time, developers developed, and operators operated<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. This meant that the vast majority of our engineers were not experienced in a good chunk of their newfound responsibilities. This lack of experience coupled with the hypergrowth that we were going through resulted in a lot of different and complex issues. These issues were mostly around the operational aspect of software development (monitoring, automated testing, deploying, incident handling, managing the cloud runtime).</p>
<p>One of the more obvious pain points was the on-call support. Before we started the microservice migration, our service landscape was small enough that 5 on-call teams could cover the whole stack. Each team had a large enough rotation, and the domain was well understood by each team member. The monoliths were also quite similar in terms of monitoring and operations, making it easier to tackle issues even in services that a given engineer would not be so familiar with. That gradually changed as new teams were created, and more and more services were deployed in the cloud. And there was little standardization across those services. The on-call teams did not grow to meet the new demands, and were increasingly overwhelmed by the new services that they were responsible for.</p>
<p><img alt="Our deploy tool for our data center services" src="https://engineering.zalando.com/posts/2021/09/images/dc-deploy-tool.png"></p>
<figcaption style="text-align:center">Our deploy tool for our data center services</figcaption>
<p><br/></p>
<p>But 2016 is also the year that Google published their book <a href="https://sre.google/sre-book/table-of-contents/">Site Reliability Engineering</a>. The practices and mindset described in that book seemed to provide some answers to the growth pains we were experiencing. For that reason, it became the main inspiration for implementing the SRE mindset, role and practices in Zalando. <strong>How it all started, though, was through a grassroots initiative to promote and pitch for an investment in SRE.</strong> After convincing enough managers, mostly by explaining the pain points being felt by the engineering teams and how SRE could be a solution for those pains, a group of engineers teamed up under a project scope to drive this implementation. One of their main goals was to solve the on-call situation, and make it sustainable.
A quick side note: If the ‘convincing management’ part feels grossly summarized, or like it was just too easy, it’s important to bring up that <strong>Zalando is a company that does not shy away from change.</strong> It’s a core part of the company’s DNA and culture. And the <a href="https://hbr.org/2011/03/culture-trumps-strategy-every">culture of an organization</a> always plays a key role in enabling (or resisting) such changes.</p>
<p><img alt="SRE Brainstorming session" src="https://engineering.zalando.com/posts/2021/09/images/sre-brainstorm.jpg"></p>
<figcaption style="text-align:center">SRE Brainstorming session</figcaption>
<p><br/></p>
<p>Now that there was initial buy-in from management, there were oh so many things to discuss. But the one that had the most influence on the following steps was <strong>“How do we structure SRE?”</strong>. Again, remember that this had to be done in a way that would solve the on-call problem.
Should we go for a central team? We were already too big for that (our headcount had grown to 1,000+), so odds were that we wouldn’t be effective - although it would make staffing easier because we’d need fewer SREs.
Should we distribute one SRE per team? The scope would be too large for the lone SREs. Not to mention that, over time, they’d likely become the Ops engineer for the team they were in.
It was agreed that we would need several SRE teams. But that still raised the question: at what granularity should we create SRE teams? In the end we went with <strong>one SRE team per Product Cluster</strong>. This would give SREs end-to-end responsibility over a domain, without having too wide of a scope.</p>
<p><img alt="SRE team structures" src="https://engineering.zalando.com/posts/2021/09/images/sre-team-structures.jpg"></p>
<p>There was another concern around the reporting chain. This was an easy discussion, as we quickly converged to following the <a href="https://sre.google/workbook/how-sre-relates/#consider-reliability-work-as-a-specialized-role">guidance in the SRE book</a> and consider reliability work as a specialized role and have them separate from the product delivery teams.</p>
<p>To further gauge the interest in the SRE role and mindset, we sent out a survey to our engineering Org. In that survey we included a description of the desired profile for an SRE. That profile included: <strong>Software engineering, Operational mindset, Systems engineering, Software architecture skills, Troubleshooting skills</strong>.</p>
<p><img alt="Survey to gauge SRE interest" src="https://engineering.zalando.com/posts/2021/09/images/sre-survey.png"></p>
<figcaption style="text-align:center">Survey to gauge SRE interest</figcaption>
<p><br/></p>
<p>The survey results also gave us an idea of the talent pool that might be interested in moving to an SRE role. To further promote the role and the initiative within the company, several talks were given across the company and its different hubs, which, at the time, already included Helsinki, Dublin, and Dortmund.</p>
<p>With few engineers fitting that profile, we had to be smart about where to start rolling out SRE. Ideally, we would start with the area with the most need for SRE practices. But to know which area that was, we first had to measure the health of the different products at Zalando in order to prioritize.
Fortunately, at the core of SRE we have <a href="https://sre.google/sre-book/service-level-objectives/">Service Level Objectives</a> (SLOs) and Service Level Indicators (SLIs). Since there was no standardized way of measuring availability, the first thing the team working on the SRE initiative decided to do was to roll out SLOs and SLIs. Workshops were conducted across the company for engineers and Product Managers, and the first SLO reporting tool (SLR) was developed.</p>
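<p>To make the SLO/SLI distinction concrete, here is a small, hypothetical TypeScript sketch (the function names are ours, not part of the SLR tool): the SLO target implies an error budget, while the SLI is the value you actually measure.</p>

```typescript
// Hypothetical helpers illustrating the arithmetic behind an SLO.
// An SLO target implies an "error budget": the downtime you are
// allowed within a window before breaching the objective.
function errorBudgetMinutes(sloPercent: number, windowDays: number): number {
  const totalMinutes = windowDays * 24 * 60;
  return totalMinutes * (1 - sloPercent / 100);
}

// An SLI is the measured counterpart: the fraction of good events.
function availabilitySli(goodEvents: number, totalEvents: number): number {
  return totalEvents === 0 ? 1 : goodEvents / totalEvents;
}

const budget = errorBudgetMinutes(99.9, 30); // ≈ 43.2 minutes per 30 days
```

<p>In other words, a 99.9% availability target over 30 days leaves roughly 43 minutes of error budget, which is what makes an SLO actionable for prioritization.</p>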
<p><img alt="Zalando’s SRE Logo" src="https://engineering.zalando.com/posts/2021/09/images/sre-logo.png#center"></p>
<figcaption style="text-align:center">Zalando’s SRE Logo</figcaption>
<p><br/></p>
<p>To further demonstrate the <strong>educational benefit of SRE</strong>, the SRE program team ran Reliability Workshops as part of <a href="https://engineering.zalando.com/tags/cyber-week.html">Cyber Week</a> preparations to discuss and review Reliability Patterns for the more critical services. In those Reliability Workshops we covered Retry Strategies, Circuit Breakers and Fallbacks.</p>
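<p>As a flavour of what those workshops covered, here is a minimal, illustrative sketch of the retry pattern with exponential backoff and jitter. The attempt counts and delays are invented for the example and are not Zalando's actual configuration:</p>

```typescript
// Illustrative retry helper: retry a flaky async operation with
// exponentially growing, jittered delays between attempts.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff with jitter, to avoid thundering herds
      // when many clients retry at once.
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

<p>Circuit breakers and fallbacks build on the same idea: instead of retrying forever, you stop calling a failing dependency for a while and serve a degraded response.</p>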
<p>Many services did have SLOs defined and collected, but they still did not end up influencing the software development process. The vast majority of SLOs were defined through initiatives from engineers. But in a microservice architecture, a product is implemented by multiple services, and Product Managers had a hard time establishing a link between the different SLOs and their own expectations for the products they were responsible for. Management was kept in the loop, but not directly involved, so there was no real motivation for management to uphold the SLOs.</p>
<p>Senior management agreed that SRE concepts like SLOs and reliability patterns were a much needed practice, and that teams should continue applying them. However, there was a clear preference for building the missing operational capabilities in the delivery teams themselves. <strong>The way chosen to kickstart that capability building was to put each delivery team on-call for the critical services they owned. This decision was fundamental to properly establishing the “you build it, you run it” mentality we still have today.</strong></p>
<p>With teams now responsible 24/7 for their own services, the plans for Zalando SRE would necessarily have to change. Join us for the next chapter of our series to learn more about the next steps of this journey.</p>
<hr>
<p><em>EDIT 1: No reason to stop reading here. The second part of our series is already available <a href="https://engineering.zalando.com/posts/2021/09/sre-journey-part2.html">here</a>.</em></p>
<hr>
<p><em>Already curious enough to want to be part of this story and its future chapters? <a href="https://zsre.page.link/enablement-job-ad">Then come join us at SRE</a>. We're always looking for talented engineers to deliver our strategy.</em></p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>We did have some engineers with end-to-end responsibility. They would deploy, monitor and even be on-call for the services of their respective area. This was not standardized in the company, and depended greatly on the leadership of their respective teams. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Micro Frontends: Deep Dive into Rendering Engine (Part 2)2021-09-09T00:00:00+02:002021-09-09T00:00:00+02:00Jan Brockmeyertag:engineering.zalando.com,2021-09-09:/posts/2021/09/micro-frontends-part2.html<p>Learn all the details about Rendering Engine - from data fetching to layout composition.</p><p>Zalando's <a href="https://en.zalando.de/">Fashion Store</a> has been running on top of microservices for quite some time already. This architecture has proven to be very flexible, and project <a href="https://www.mosaic9.org/">Mosaic</a> has extended it – although partially – to the frontend, allowing HTML fragments from multiple services to be stitched together, and served as a single page.</p>
<p>Fragments in Mosaic can be seen as the first step towards a Micro Frontends architecture. With the ambitions of the Interface Framework as presented in <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">the first blog post</a>, we did not want to just stop at serving multiple HTML pieces, we wanted more:</p>
<ul>
<li><em>Implemented once, works anywhere</em> - UI blocks should work in different contexts and be context-aware, not context-bound.</li>
<li><em>Declarative data dependencies</em> - Components get the data they need but do not re-implement data fetching over and over.</li>
<li><em>Simplified A/B Testing</em> - Zalando's decisions are data driven, so experimentation is at the core of our decision making. Running an A/B test that spans multiple pages and user flows should be possible with minimal alignment and zero delivery interruption.</li>
<li><em>Feels like Zalando</em> - We want a consistent and accessible look and feel for all user journeys and ability to experiment with design fast, across multiple user flows.</li>
<li><em>Power to the engineers</em> - Any developer should be able to contribute to all the Fashion Store experience. This means universal tooling and setup, first-class React integration, easy testing (also for work-in-progress code), and continuous integration.</li>
</ul>
<p>That's how Renderers came to be.</p>
<h2>Introducing Renderers</h2>
<p>A Renderer is a self-contained Javascript module that runs inside the Rendering Engine framework. It fully relies on the framework to encapsulate all the implementation details like data fetching and layout composition.</p>
<p><img alt="Architecture of Interface Framework" src="https://engineering.zalando.com/posts/2021/09/images/rengine-architecture_if.png"></p>
<p>A Renderer declares its data dependencies using GraphQL queries and, based on that data, provides a visual representation of a single Entity type (check <a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Part 1</a> for a detailed explanation on Entities).</p>
<p>This visual representation is a React component, but data management and layout composition is handled solely by the Rendering Engine framework.</p>
<p>So, Renderers are visualisation components for Entities.</p>
<p>The mapping of Entities to Renderers is one-to-many, since different visual representations may exist for a given entity type. A Product Entity, for example, can be represented as a detailed product page, or as a compact card component in collection view. Each Renderer, on the other hand, corresponds to one specific entity type only.</p>
<p>All Renderers share some important properties:</p>
<ol>
<li>Renderers are <em>composable</em>. A Renderer is able to embed other Renderers as children, or be embedded by other Renderers.</li>
<li>Renderers are <em>declarative</em>. They specify their dependencies and behaviour but delegate all implementation to the Rendering Engine, the framework that runs them.</li>
<li>Renderers are <em>self-sufficient</em>. A Renderer can visualise its Entity no matter on which page or in which context it appears. This ensures that the choice and arrangement of Renderers remains as flexible as possible.</li>
</ol>
<h2>Enabling dynamic content for Zalando’s mobile apps</h2>
<p>Project Mosaic was solely focused on the web. However, Zalando offers its Fashion Store as two experiences: the Web and the Native Apps. Since they share most parts of the user journey, it was natural to explore if the Apps could benefit from a system based on Entities and Renderers, too.</p>
<p>We knew it would be too much of a stretch for Mosaic fragments. But there's literally nothing that binds Renderers specifically to the Web!</p>
<p>In the Zalando app, we had already implemented server-side layout steering for some parts of the application experience such as the main App landing page. Instead of relying on hardcoded views, the app would receive layouts from a remote Zalando server over the network. The preferred format here would be JSON, but otherwise the same challenges were present: we wanted dynamic, personalizable UIs with declarative data dependencies.</p>
<p>If Renderers were able to output JSON instead of HTML, we could reuse the same rendering core as for the web with the same benefits.
Our Renderers relied on React for their output. To cover the app-specific use case, we added a custom React reconciler that consumed custom React elements and output app-compatible JSON instead of HTML. Now web developers can contribute native app features by reusing the same APIs they already use to deliver web experiences, bringing the web and native app experiences closer together. All the existing tools, infrastructure support, and the constantly evolving platform APIs are now shared.</p>
<h2>The life of a Renderer</h2>
<p>So, how does it look under the hood?</p>
<p>We decided to organise the Renderers API as a set of so-called life cycle methods, each accepting a function declaring Renderer's behaviour for a given context or case. All Renderers are implemented using TypeScript.</p>
<p><img alt="Screenshot of collection carousel Renderer" src="https://engineering.zalando.com/posts/2021/09/images/rengine-carousel.png"></p>
<p>Let’s have a look at a simplified version of a collection carousel Renderer:</p>
<div class="highlight"><pre><span></span><code><span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">MOVE</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"@tracking/event-names"</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">SimpleCarousel</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"@dx/react-carousel-tile"</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">tile</span><span class="p">,</span><span class="w"> </span><span class="nx">ViewTracker</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"@if/rendering-engine/api"</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">React</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"react"</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">query</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"./query.graphql"</span><span class="p">;</span>
<span class="k">export</span><span class="w"> </span><span class="k">default</span><span class="w"> </span><span class="nx">tile</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="nx">withQueries</span><span class="p">(({</span><span class="w"> </span><span class="nx">entity</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">({</span>
<span class="w"> </span><span class="nx">carousel</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">query</span><span class="p">,</span><span class="w"> </span><span class="nx">variables</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">withProcessDependencies</span><span class="p">(({</span><span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">data</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">action</span><span class="o">:</span><span class="w"> </span><span class="s2">"error"</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="o">:</span><span class="w"> </span><span class="s2">"No collection data found."</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">action</span><span class="o">:</span><span class="w"> </span><span class="s2">"render"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">data</span><span class="p">,</span>
<span class="w"> </span><span class="nx">tiles</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entities</span><span class="o">:</span><span class="w"> </span><span class="nx">getCollectionEntities</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">.</span><span class="nx">withRender</span><span class="p">((</span><span class="nx">props</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">data</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">collection</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">tiles</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entities</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">tools</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">props</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="o"><</span><span class="nx">ViewTracker</span><span class="o">></span>
<span class="w"> </span><span class="o"><</span><span class="nx">SimpleCarousel</span>
<span class="w"> </span><span class="p">{...</span><span class="nx">collection</span><span class="p">}</span>
<span class="w"> </span><span class="nx">onNextClickCarousel</span><span class="o">=</span><span class="p">{()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tools</span><span class="p">.</span><span class="nx">tracking</span><span class="p">.</span><span class="nx">track</span><span class="p">({</span><span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="nx">MOVE</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}}</span>
<span class="w"> </span><span class="o">></span>
<span class="w"> </span><span class="p">{</span><span class="nx">entities</span><span class="p">}</span>
<span class="w"> </span><span class="o"><</span><span class="err">/SimpleCarousel></span>
<span class="w"> </span><span class="o"><</span><span class="err">/ViewTracker></span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
</code></pre></div>
<p>Renderers are implemented using the <a href="https://en.wikipedia.org/wiki/Fluent_interface">fluent interface</a> approach. By calling the <code>tile()</code> function of the Rendering Engine API, we are setting up a Renderer that defines various <em>lifecycle methods</em>. Each method receives a function that encapsulates the associated behaviour and has fully typed interfaces. Since renderers are declarative, they do not execute any of the lifecycle methods themselves. Instead, the Rendering Engine framework runs all of them, in due order and context, fetches data and dependencies, and passes the output down to other methods when necessary.</p>
<p>The most important lifecycle methods are:</p>
<h3><code>withQueries</code></h3>
<p>Declares a data dependency via a GraphQL query. Data is fetched automatically by the framework and is available when the other life cycle methods are called.</p>
<h3><code>withProcessDependencies</code></h3>
<p>Based on data delivered by <code>withQueries</code>, defines further action (render, error etc.) and allows data pre-processing, which is then passed to the <code>withRender</code> method. The chosen action tells the Rendering Engine that the Renderer should redirect, or be displayed in an error state.</p>
<p>This life cycle method is also responsible for specifying the child entities of the current Renderer. In this example we want to display the collection entities as outfit or product cards based on their entity type. It is important to note that a given Renderer does not know which Renderers will be used for its child entities.</p>
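<p>To illustrate, the possible outcomes of this lifecycle method can be modelled as a discriminated union. The shape below is a hypothetical reconstruction based on the example above, not the actual Rendering Engine API:</p>

```typescript
// Hypothetical result type of a withProcessDependencies callback:
// either render with (possibly pre-processed) data, show an error
// state, or redirect.
type ProcessResult<TData> =
  | { action: "render"; data: TData; tiles?: { entities: unknown[] } }
  | { action: "error"; message: string }
  | { action: "redirect"; location: string };

// The framework can branch exhaustively on the chosen action.
function describeResult(result: ProcessResult<unknown>): string {
  switch (result.action) {
    case "render":
      return "render with data";
    case "error":
      return `error: ${result.message}`;
    case "redirect":
      return `redirect to ${result.location}`;
  }
}
```

<p>A discriminated union like this is what gives the lifecycle functions their fully typed interfaces: the compiler forces the framework to handle every possible action.</p>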
<h3><code>withRender</code></h3>
<p>Returns the root React component to be used as the Renderer output.</p>
<p>For the Web, this is transformed into HTML and rendered on the server (SSR). Later on, the markup is hydrated on the client side with the data. For the Apps, we use a custom React reconciler and custom (non-Web) components to output JSON instead of HTML. However, most of the data flow, dev tooling and infrastructure remain the same for both use cases.</p>
<p>Renderers also offer more advanced features:</p>
<ul>
<li><strong>Progressive Hydration:</strong> we can mark specific renderers to be hydrated early, i.e. kicking off their React hydration as fast as possible on the client-side, and thus making its content interactive before its parent renderer.</li>
<li><strong>Code Splitting:</strong> we only load and parse the Renderers needed on a given, personalised page, which gives us good performance out of the box.</li>
<li><strong>Renderer State:</strong> Renderers have access to a local Renderer State. The concept is similar to <a href="https://reactjs.org/docs/react-component.html#setstate">React’s setState</a>. It enables you to re-run renderer lifecycle methods, for example to fetch additional data, and re-render the updated child entities. The "classical" React state can still be used via React Hooks.</li>
</ul>
<h3>Data sharing</h3>
<p>Renderers are not intended to share client-side state with each other. We want to avoid unwanted data coupling and allow Renderers to be reused in other contexts with minimal risk.</p>
<p>Renderers have access to Zalando’s GraphQL Mutation APIs which allows remote data to be modified. Since all Renderers use the same data schema for their data dependencies, they can subscribe to changes in the schema to limit the need for cross-renderer communication.</p>
<h2>Rendering Engine</h2>
<p>Rendering Engine is the framework powering the Renderers. It is a backend service written in TypeScript and running on Node.js, coupled with a client-side JavaScript module that runs in the browser.</p>
<p>Rendering Engine encapsulates all the complexity and implementation details for the declarative Renderers. It processes incoming customer requests, matches Entities to Renderers, fetches data and other dependencies such as A/B testing assignments, asynchronously renders the response and delivers it back to the Web and Native App clients.</p>
<p>The following sections describe the main responsibilities of Rendering Engine.</p>
<h3>UI Composition</h3>
<p>All layouts in Interface Framework are represented as trees of nested entities that are visualized using the matching Renderers. The mapping of Entities to Renderers is fully described by a set of rendering rules.</p>
<p>In computer science terms, Rendering Engine recursively and asynchronously transforms a tree of entities into a tree of UI elements. On each step, it takes an entity node and its metadata as input, outputs a UI node plus zero or more child entity nodes, and then recurs over children.</p>
<p>The page rendering always starts with an Entity. We call it the Root Entity since it typically defines what the page is about. After the Rendering Engine receives a request, it extracts the root Entity from the request headers and looks up a matching Renderer. Once a Renderer is found, the Rendering Engine runs the Renderer lifecycle methods to fetch data. In case there are any child entities associated with this Renderer, the same resolution process happens recursively. Thus, each Renderer may "suggest" which entities should be rendered as its children, but has no control over the actual renderer choice. That choice is based exclusively on the Rendering Rules.</p>
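<p>Stripped of asynchrony and streaming, that recursive resolution step can be sketched in a few lines. The types and the rule table below are illustrative only; the real engine resolves data and renders asynchronously:</p>

```typescript
// Simplified, synchronous model of the entity-to-UI transformation.
interface Entity {
  type: string;
  id: string;
  children?: Entity[];
}

interface UINode {
  renderer: string;
  children: UINode[];
}

// Hypothetical flat rule lookup: entity type -> renderer name.
const rules: Record<string, string> = {
  outfit: "outfit_view",
  product: "product_card",
};

// Each step maps one entity node to a UI node, then recurs over
// the child entities suggested by the Renderer.
function resolve(entity: Entity): UINode {
  return {
    renderer: rules[entity.type] ?? "fallback",
    children: (entity.children ?? []).map(resolve),
  };
}
```

<p>The real Rendering Rules are contextual rather than a flat map (the same entity type can resolve to different Renderers depending on where it appears, as the outfit example later shows), but the recursive shape is the same.</p>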
<p>The important part here is that we do not block the resolution process. As soon as the entity is matched to a Renderer and the data resolved, the Rendering Engine kicks off the rendering process and starts streaming the HTML content to the client.</p>
<h3>Data Fetching</h3>
<p>The Rendering Engine takes care of fetching the GraphQL queries from the Fashion Store API. It uses an implementation of <a href="https://github.com/zalando-incubator/perron">Perron</a>, a data client with built-in support for circuit breakers, error handling and retries.</p>
<p>All queries to FSA are batched and cached based on a <a href="https://github.com/graphql/dataloader">DataLoader</a> implementation. This prevents duplicate calls to backends during the same request.</p>
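<p>The core idea behind DataLoader-style batching can be sketched in a few lines: keys requested within the same tick are collected, resolved with a single batched call, and cached per key. This is an illustrative miniature, not Perron's or DataLoader's actual implementation:</p>

```typescript
// Minimal sketch of request batching and caching à la DataLoader.
class MiniLoader<K, V> {
  private cache = new Map<K, Promise<V>>();
  private queue: { key: K; resolve: (value: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    // Per-key cache prevents duplicate backend calls in one request.
    const cached = this.cache.get(key);
    if (cached) return cached;
    const promise = new Promise<V>((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once the current tick's loads have been collected.
        queueMicrotask(() => this.flush());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    const values = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}
```

<p>With a loader like this scoped to a single incoming request, many Renderers can ask for the same entity data while the backend sees one batched call.</p>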
<h3>Universal Rendering</h3>
<p>As Zalando is an e-commerce platform, our typical web page consists predominantly of static content with islands of interactivity, and we aim to serve content as fast as possible. This is why Rendering Engine was built from the ground up with full Server-Side Rendering (SSR) support. Each Renderer first generates its markup on the server; the Rendering Engine stitches it all together and streams the HTML to the client, which then hydrates the components using our runtime module.</p>
<p>For the Web use case, we provide additional Zalando-specific APIs which add interactivity, mutate data if necessary, lazy-load extra contents etc. For the Native app, the Rendering Engine only serves the JSON markup and the actual rendering happens in App clients for iOS and Android.</p>
<h3>Mosaic backward compatibility</h3>
<p>We knew that the migration from Mosaic to Interface Framework would not happen in a day. Our Mosaic codebase was extensive and actively maintained. Therefore, the Rendering Engine allowed Mosaic fragments to be used directly inside Renderers.</p>
<p>This made our migration path very smooth. In fact, we now view Mosaic fragments as a powerful API our framework supports, and we still use them sometimes. In addition, this opened up extra integration and observability benefits for the legacy implementations.</p>
<h3>Monitoring and Tracing</h3>
<p>Improved observability is yet another benefit of the integrated platform. The Rendering Engine automatically collects and reports <a href="https://web.dev/vitals/">Web Vitals</a> so that we can correlate performance variations with code changes. A number of custom client-side metrics are also collected. All this happens automatically, so developers who contribute to Renderers can focus on the customer experience.
We also integrate a variety of common enterprise tools for log aggregation, OpenTracing and client-side error monitoring, with zero integration time for Renderer developers.</p>
<h3>Developer Experience</h3>
<p>Rendering Engine focuses on providing a great developer experience with the following features:</p>
<ul>
<li><strong>Local Development Environment:</strong> the framework provides an integrated development server and an on-demand compilation of Renderers. It only builds the Renderers that are shown on the current page. This ensures fast build times even when more and more Renderers are added to the application.</li>
<li><strong>Multiple version support:</strong> Rendering Engine uses the Zalando Design System as a UI component library. The UI components are defined as dependencies for each particular Renderer. To allow greater flexibility, it supports using multiple versions including convenient tools and hooks to simplify the version maintenance.</li>
<li><strong>Continuous Integration & Deployment:</strong> New code changes are tested and built automatically, with specific performance reports for every page. These reports include bundle sizes and Lighthouse metrics. Deployments to Kubernetes happen continuously in preview and production environments.</li>
<li><strong>Automatic Persisted Queries:</strong> all GraphQL queries to the Fashion Store API are persisted on the server side together with a unique identifier. It helps reduce the request size, since the Rendering Engine client runtime sends the identifier instead of the whole query string.</li>
<li><strong>Localization:</strong> Rendering Engine supports localized bits of text inside Renderers.</li>
</ul>
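<p>The mechanism behind persisted queries can be sketched as follows: the query text is hashed once into a stable identifier, and the client later sends only that identifier while the server maps it back to the full query. This is an illustrative sketch, not the actual Fashion Store API contract:</p>

```typescript
import { createHash } from "crypto";

// Hypothetical server-side registry of persisted GraphQL queries.
const persisted = new Map<string, string>();

// Build time / deploy time: register the query and derive its id.
function persist(query: string): string {
  const id = createHash("sha256").update(query).digest("hex");
  persisted.set(id, query);
  return id;
}

// Request time: the client sends only the id; the server resolves
// it back to the full query text.
function lookup(id: string): string | undefined {
  return persisted.get(id);
}
```

<p>Because the identifier is a fixed-length hash, the client request stays small regardless of how large the query grows.</p>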
<h2>Page Rendering Explained</h2>
<p>Let’s have a look at what happens in Interface Framework on a high-level when you visit a page on the Zalando website. In this example, the user visits an outfit view by choosing one from Zalando’s <a href="https://en.zalando.de/get-the-look-women/">Get the Look</a> page.</p>
<p><img alt="Data flow for page rendering" src="https://engineering.zalando.com/posts/2021/09/images/rengine-data-flow.png"></p>
<p>The request gets picked up by <a href="https://github.com/zalando/skipper">Skipper</a>, which is an HTTP router and reverse proxy for service composition. Skipper identifies the matching route and forwards the request to the Rendering Engine along with the entity parameters:</p>
<div class="highlight"><pre><span></span><code><span class="nx">entity</span><span class="o">-</span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="s">"outfit"</span>
<span class="nx">entity</span><span class="o">-</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="s">"ern:outfit::4NXOAez0Qti"</span>
</code></pre></div>
<p>The Rendering Engine gets the request with the entity above, that is called the root entity. The root entity defines the main content of the page. Based on the Rendering Rules, a matching Renderer is selected for this root entity.</p>
<p>For the outfit page, the set of Rendering Rules looks like the following:</p>
<div class="highlight"><pre><span></span><code><span class="k">export</span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">outfitViewRule</span><span class="o">:</span><span class="w"> </span><span class="nx">RenderingRule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">selector</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entity</span><span class="o">:</span><span class="w"> </span><span class="s2">"outfit"</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">renderer</span><span class="o">:</span><span class="w"> </span><span class="s2">"outfit_view"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">children</span><span class="o">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">selector</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entity</span><span class="o">:</span><span class="w"> </span><span class="s2">"outfit"</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">renderer</span><span class="o">:</span><span class="w"> </span><span class="s2">"outfit_highlight-b"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">children</span><span class="o">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">selector</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entity</span><span class="o">:</span><span class="w"> </span><span class="s2">"product"</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">renderer</span><span class="o">:</span><span class="w"> </span><span class="s2">"product_horizontal-highlight-product-card"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">selector</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entity</span><span class="o">:</span><span class="w"> </span><span class="s2">"collection"</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">renderer</span><span class="o">:</span><span class="w"> </span><span class="s2">"collection_simple-carousel"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">children</span><span class="o">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">selector</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">entity</span><span class="o">:</span><span class="w"> </span><span class="s2">"outfit"</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">renderer</span><span class="o">:</span><span class="w"> </span><span class="s2">"outfit_outfit-card"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">],</span>
<span class="p">};</span>
</code></pre></div>
<p>The Renderer for the root entity is the Outfit View Renderer. We can refer to it as the top-level or root Renderer for the request. The Renderer has a data dependency in the form of the following GraphQL query.</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nx">outfit</span><span class="p">(</span><span class="nx">id</span><span class="o">:</span><span class="w"> </span><span class="s2">"4NXOAez0Qti"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span>
<span class="w"> </span><span class="nx">creator</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">variant</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">relevantEntities</span><span class="p">(</span><span class="nx">first</span><span class="o">:</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">edges</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The query is executed in the <a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">Fashion Store API</a> and various parts of the query go through different resolvers depending on the fields that are present. Each of the resolvers then calls one or many microservices that provide data.</p>
<p>In our example, we ask for the creator’s name of the outfit together with two relevant entities. One resolver will call the Recommendation System to get the relevant entities for this outfit. Here, our relevant entities are a collection with other outfits from the same creator and a collection with outfits that look similar.</p>
<p>Each Renderer decides which relevant entities appear as its children and adds placeholders for them. This is achieved via the <code>withProcessDependencies</code> lifecycle method.
The Rendering Engine picks up all relevant entities and determines matching Renderers. For each of these nested Renderers, the process repeats recursively until no more nested entities must be rendered.</p>
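<p>As an illustration only, the recursive matching described above can be sketched in Python. The Rendering Engine itself is implemented in JavaScript/React; the entity shape and the <code>resolve</code> helper below are hypothetical simplifications that ignore data dependencies, placeholders, and the <code>withProcessDependencies</code> lifecycle:</p>

```python
# A miniature version of the renderer configuration shown above:
# each node has a selector, a renderer name, and optional children.
CONFIG = {
    "selector": {"entity": "outfit"},
    "renderer": "outfit_highlight-b",
    "children": [
        {"selector": {"entity": "product"},
         "renderer": "product_horizontal-highlight-product-card"},
        {"selector": {"entity": "collection"},
         "renderer": "collection_simple-carousel",
         "children": [
             {"selector": {"entity": "outfit"},
              "renderer": "outfit_outfit-card"},
         ]},
    ],
}

def resolve(entity, config):
    """Match an entity against a config node, then recursively resolve
    its relevant child entities against the node's child configs, until
    no more nested entities are left."""
    assert config["selector"]["entity"] == entity["type"]
    resolved_children = []
    for child in entity.get("children", []):
        for child_config in config.get("children", []):
            if child_config["selector"]["entity"] == child["type"]:
                resolved_children.append(resolve(child, child_config))
                break
    return {"renderer": config["renderer"], "children": resolved_children}

# An outfit whose relevant entities are a product and a collection
# that itself contains another outfit.
page = resolve(
    {"type": "outfit", "children": [
        {"type": "product"},
        {"type": "collection", "children": [{"type": "outfit"}]},
    ]},
    CONFIG,
)
```

<p>Note how the nested outfit resolves to <code>outfit_outfit-card</code> rather than the root renderer: the matching renderer depends on where in the hierarchy the entity appears.</p>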
<p>After all the Renderers and their data dependencies are collected, the Rendering Engine renders the React components of each Renderer and streams the content to the client.
The next picture shows a sketch of the outfit page that is divided into the corresponding Renderers. Each Renderer is responsible for one part of the page.</p>
<p><img alt="Hierarchy of Renderers for the outfit page" src="https://engineering.zalando.com/posts/2021/09/images/rengine-outfit-page.png"></p>
<h2>Conclusion</h2>
<p>We have presented a deep dive into Rendering Engine with all its key functionalities. The final part of this blog series will cover a comparison between Mosaic and Interface Framework and what we have learned during the migration.</p>
<p><strong>Update 2023/07</strong>: See <a href="https://engineering.zalando.com/posts/2023/07/rendering-engine-tales-road-to-concurrent-react.html">Rendering Engine Tales: Road to Concurrent React</a> for an update on Rendering Engine and how we integrated React Concurrent features as part of our upgrade to React 18.</p>
<hr>
<p><em>We always look for talented Engineers to join Zalando as a <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&search=frontend&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Frontend">Frontend Engineer</a>!</em></p>Using Internal Mobility For Growth2021-09-02T00:00:00+02:002021-09-02T00:00:00+02:00Gary Raffertytag:engineering.zalando.com,2021-09-02:/posts/2021/09/internal-mobility.html<p>A look at how Zalando Direct, our B2B marketplace, is using Internal Mobility as a catalyst for growth.</p><p>Long time readers of this blog will remember that back in 2019, we <a href="https://engineering.zalando.com/posts/2019/03/rotating-engineers-at-zalando.html">published a feature</a> on the benefits of rotating engineers between teams. For those of you who have not seen it, the article described an initiative that aimed to establish cross-functional knowledge sharing, encourage cross team collaboration, and bring greater product awareness, by providing engineers with an opportunity to work on different teams within our Developer Productivity department.</p>
<p>Within Zalando, we are incredibly passionate about enabling our engineers to progress and to develop. This empowerment and growth mindset is deeply woven into our fabric. Take a peek at <a href="https://jobs.zalando.com/en/our-founding-mindset/">Our Founding Mindset</a>. Four of them are focused on empowerment.
I myself am particularly drawn to <strong>#makeUsBetterNotBigger</strong>.</p>
<p>Let’s take a look at how another of our business units, Zalando Direct, our B2B marketplace, is using Internal Mobility as a catalyst for development.
Within the unit, the leadership team maintains a directory of opportunities used to foster engineers' growth. This directory covers community-driven initiatives such as our architecture review groups and our weekly hacking sessions, in addition to our department-driven topics and task forces, such as improving the observability of our systems. One such development opportunity is Internal Mobility.</p>
<p>Internal Mobility is described as an exciting avenue for growth that enables engineers to join a different team on either a fixed-length assignment, or on a permanent basis. In this article, I would like to focus on the former, which was our most recent success story.
This story involved a Frontend Engineer who had been with Zalando Direct for over one year, and was joining my team on a short-term assignment for one month.</p>
<p>The goals of the team swap were to:</p>
<ul>
<li>Provide a solid opportunity to expand knowledge and expertise by contributing to a new domain.</li>
<li>Provide the destination team with an experienced extra engineer to contribute to their large and growing backlog.</li>
<li>Further demonstrate that Internal Mobility can successfully provide a development opportunity for our engineers.</li>
</ul>
<h2>Kicking Things Off</h2>
<p>The engineer’s lead initiated the assignment, so let’s understand what that entails. First and foremost, it is imperative that our engineer is comfortable with, and excited about, the opportunity. Taking ownership of one’s own career progression and personal development is something that I look for when an engineer is on a seniority trajectory. I am always more than happy to double down my investment in them if I know that it will be maximised.</p>
<p>Thereafter, it is important to agree on scope and duration. Engineers know that diving into an unscoped project is a fool’s errand, and this is no different. Up front, it is important to be clear on what is expected from all parties, and what are the boundaries. In this case, it was agreed that the duration would be one month, and that the scope was to work on a particular area of partner-facing functionality within our platform, zDirect. For some additional context, zDirect is a web application that enables our partners to grow and steer their business on Zalando.</p>
<h2>Onboarding</h2>
<p>Onboarding a new joiner to our team is always a great opportunity to critically assess how well our process works. One factor that can accelerate onboarding productivity is the new joiner’s familiarity with the languages and tools. We were able to keep the tech stack unified; it is a subset of the technologies sponsored by Zalando as part of the <a href="https://opensource.zalando.com/tech-radar/">tech radar</a>. This, coupled with the engineer’s understanding of the ecosystem, meant that we were able to get up and running in no time at all. Additionally, we got some incredibly helpful feedback that enabled us to improve our onboarding documentation. Given that we are growing at an incredible pace, streamlining the onboarding process for new hires pays dividends in productivity and experience. <strong>Always be squeezing your Time-To-Ship!</strong></p>
<p>From this point onwards, we had a new team member. They joined all of our ceremonies, paired with their colleagues, and got to grips with the team’s ways of working. Similarly, they attended social settings such as team lunches and activities.
They immediately started shipping value, and right away boosted our team’s throughput. This required collaboration with our engineers, our product manager, and our designer.
We do not work in isolation, and this is an important aspect of the assignment. Please don’t extract somebody from their team environment and have them work alone. A <a href="https://rework.withgoogle.com/blog/five-keys-to-a-successful-google-team/">well-known study</a> on team dynamics stated that <em>“Who is on a team matters less than how the team members interact, structure their work, and view their contributions”</em>.</p>
<p>Use this opportunity to solidify your team and to hone the dynamics of collaboration.</p>
<h2>So How Did This Experiment Go?</h2>
<p>Ultimately, this assignment enabled our team to deliver increased value for our stakeholders. Throughput aside, however, the assignment yielded much more.
As a leader, I thrive on helping my team succeed. One of the most rewarding stages of this assignment was doing a final retrospective with our new team member. Throughout the process I could see a continuous stream of high-quality deliveries, but I wanted to drill down further into the personal experience. To hear that they</p>
<blockquote>
<p>“developed technically, acquired a better understanding of how the business operates, and identified different processes and ideas to bring back to their own team”</p>
</blockquote>
<p>was of course music to my ears. Moreover, they were inspired to go out and enroll in a TypeScript course (we provide every engineer with a healthy training budget to use for their own growth) and incorporate it into their development plan. I like to think of this as the flywheel effect on growth.</p>
<p>My last question to them was “Would you do it again?”, which was answered with an enthusiastic “Yes”.</p>
<h2>Conclusion</h2>
<p>Internal mobility assignments are a really effective way to provide engineers with an opportunity to learn new skills, to work in a new domain, and to push themselves out of their comfort zone.</p>
<p>All experiments come with learning opportunities, and the goal of trying something new is to broaden our understanding and experiences. Two important learnings for us (as the receiving team) were:</p>
<ol>
<li>We needed to improve our onboarding documentation.</li>
<li>Engineers should not have to switch back and forth during such an assignment.</li>
</ol>
<p>For the former, our new member was able to pinpoint some gaps in the process, and we have since created an internal ways-of-working document to alleviate this for the next person. For the latter, there was an instance when our new member needed to respond to a topic for their original team, which broke the productivity flow and led to some context switching. This is something that we will avoid next time.</p>
<p>Sidenote: Context-switching is a productivity killer. I remember reading <em>Quality Software Management: System Thinking</em>, by Gerald Weinberg, and being horrified by the impact that switching has on delivery.</p>
<p>That being said, I believe that any endeavour that yields learnings is a successful endeavour. The benefits and learnings that come from internal rotation are in abundance, and I would highly recommend that you try this in your organisation. Presently, we have a number of engineers on different assignments, ranging from weeks to months.</p>
<p>I opened up this article by referring back to an experiment conducted back in 2019. One of the goals that the authors hoped for was that rotations would become more of a regular thing in Zalando, and it’s awesome to be able to write this piece two years later, and say that, yes it is something that we are doing regularly, and continuously learning from.</p>Knowledge Graph Technologies Accelerate and Improve the Data Model Definition for Master Data2021-07-29T00:00:00+02:002021-07-29T00:00:00+02:00Katariina Karitag:engineering.zalando.com,2021-07-29:/posts/2021/07/knowledge-graph-master-data-mdm.html<p>In the ongoing master data management project the challenge is to create a consolidated golden record of particular business information scattered across multiple systems of different business units. Applying knowledge graph technologies has proven to be an effective means to automatically derive a logical data model for this golden record and improve stakeholder communication.</p><h2>The Master Data Management Challenge</h2>
<p>Master data management (MDM) is a technology-enabled discipline in which business and Information Technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> At Zalando we are at an early phase of realising MDM for our internal data assets and we have chosen to do it in a consolidated style.</p>
<p>Typically, MDM projects are started because an organisation does not have a central view of a specific subject matter; instead, that information, such as the contact details of a business partner, is scattered across systems, with each maintaining its own, possibly differing, record of these details. In our practical approach, MDM is a set of practices to create a common, shared, and trusted view on data, also called a golden record, for a particular domain. In our MDM project, source systems are identified, and their data is consumed, processed through a match and merge process, cleansed and quality assured, and then stored centrally according to a canonical data model. This centrally stored golden record is then published back to the source systems for consideration and possible correction in their respective systems.</p>
<p>We are currently designing a central MDM component that harmonises the different records into the central and trusted golden record. Its form needs to be defined in a logical data model. This is a set of definitions of tables and columns in which the consolidated record pulled, matched, and merged from the different sources is stored. Deriving this model is usually done manually, which has the following drawbacks:</p>
<ul>
<li>The amount of manual work to create the logical data model increases in proportion to the number of system tables.</li>
<li>Usually, the data models are read and created by colleagues from engineering with limited business know-how.</li>
<li>Both the data model of the source records and the data model of the golden record are communicated as technical, textual definition files (an SQL schema or a spreadsheet).</li>
<li>For business stakeholders who are domain experts, the contents of these technical definition files and how they relate to each other are hard to grasp.</li>
<li>This limits the domain expert's ability to convey their knowledge correctly to the engineers creating the data model, which leads to errors and misunderstandings.</li>
</ul>
<p>Because of these drawbacks, the risk is that an MDM tool is released with a faulty and incorrect model that needs iterations of rework. As the logical data model is a main driver for the effort of creating an MDM tool, affecting the user interface, processes, business rules, and data storage, this risk might have a large impact and delay the delivery of business value.</p>
<p>Because the communication between business and engineering about a correct logical data model happens via textual technical specification files, an effective and efficient data governance decision-making process is hindered, too; such a process is important for making the golden record trustworthy.</p>
<p>The logical data model is not the only deliverable in such an MDM project. We also have to deliver the mapping from each system's data model to that of the golden record, and define whether the mapping can be done directly, 1-to-1, or whether it needs to go through some kind of transformation. For example, system A may define an address differently from system B.</p>
<p>System A:
<em>Address</em></p>
<ul>
<li>address_line_1</li>
<li>address_line_2</li>
<li>address_line_3</li>
</ul>
<p>System B:
<em>Address</em></p>
<ul>
<li>street</li>
<li>zip_code</li>
<li>city</li>
<li>country_code</li>
</ul>
<p>The golden record data model needs to define the optimal and correct way to store an address object, as well as define how the differing systems' data models map to it. If done manually, this work also increases with the number of system tables.</p>
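<p>To make the direct versus transformed mapping concrete, here is a small Python sketch. The golden-record field names, the <code>normalise_a</code>/<code>normalise_b</code> helpers, and the assumed meaning of system A's address lines are all illustrative assumptions, not the actual MDM implementation:</p>

```python
# Hypothetical golden-record shape for an address (illustrative only).
GOLDEN_FIELDS = ("street", "zip_code", "city", "country_code")

def normalise_b(record):
    """System B already matches the golden record: a direct 1-to-1 mapping."""
    return {field: record[field] for field in GOLDEN_FIELDS}

def normalise_a(record):
    """System A stores free-form address lines, so a transformation is
    needed. We assume here that line 1 holds the street, line 2 holds
    '<zip> <city>', and line 3 holds the country code; real data would
    require a far more robust parsing algorithm."""
    zip_code, city = record["address_line_2"].split(" ", 1)
    return {
        "street": record["address_line_1"],
        "zip_code": zip_code,
        "city": city,
        "country_code": record["address_line_3"],
    }

golden = normalise_a({
    "address_line_1": "Example Street 1",
    "address_line_2": "10115 Berlin",
    "address_line_3": "DE",
})
```

<p>Both helpers produce the same golden-record shape, so the downstream match and merge process only ever sees one canonical address format.</p>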
<h2>Using Knowledge Graph Technologies</h2>
<p>In order to improve this manual definition, we made use of knowledge graph technologies by describing all systems' data models in a named directed graph. We then mapped each column of a system to a set of business concepts, such as "address", "contact person", or "business partner". These business concepts have attributes as well as relationships with other concepts. For example, the business partner concept is connected to the address concept as in the image below.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>
<p><img alt="Business Partner has Address" src="https://engineering.zalando.com/posts/2021/07/images/address-business-partner.png"></p>
<p>We are using Neo4j to create these human-readable images of the mappings, since it has, in our opinion, the best look and feel in the current landscape of knowledge graph technologies. Most domain experts can read these images much more easily than the above-mentioned data model definition files. Currently, we are mapping tens of tables and hundreds of columns, so creating the images manually would be laborious and error-prone; it is therefore far more efficient to generate them from the knowledge graph. The number in brackets in the colour legend is the total number of nodes of this type in the knowledge graph.</p>
<p>For the above mentioned example of system A and B storing address information differently, we can model this in the knowledge graph in the following way. Columns from system A, such as address line 1, 2, and 3, map <em>indirectly</em> (one-to-many) to the address concept. This means that these columns need to be processed into the MDM system with a transformation algorithm. Columns from system B, however, map <em>directly</em> (one-to-one) to respective attributes of the address concept. See the image below for an illustration.</p>
<p><img alt="Column Mappings for Address" src="https://engineering.zalando.com/posts/2021/07/images/column-mapping.png"></p>
<h2>Focusing Manual Work Where it Should Be</h2>
<p>The only manual work to be done is to record the mapping from the systems' tables and columns to business concepts, their attributes, and their relationships. For example, systems A and B are mapped in the following way:</p>
<p>System A:
<em>Address</em></p>
<ul>
<li><strong>address id</strong> -> concept: <strong>Address</strong>, relationship: <strong>has contact</strong> (target)</li>
<li><strong>business partner id</strong> -> concept: <strong>Business Partner</strong>, relationship: <strong>has contact</strong> (source)</li>
<li><strong>address_line_1</strong> -> concept: <strong>Address</strong></li>
<li><strong>address_line_2</strong> -> concept: <strong>Address</strong></li>
<li><strong>address_line_3</strong> -> concept: <strong>Address</strong></li>
</ul>
<p>System B:
<em>Address</em></p>
<ul>
<li><strong>id</strong> -> concept: <strong>Address</strong>, relationship: <strong>has contact</strong> (target)</li>
<li><strong>business partner id</strong> -> concept: <strong>Business Partner</strong>, relationship: <strong>has contact</strong> (source)</li>
<li><strong>street</strong> -> concept: <strong>Address</strong>, attribute: <strong>street name</strong></li>
<li><strong>zip_code</strong> -> concept: <strong>Address</strong>, attribute: <strong>postal code</strong></li>
<li><strong>city</strong> -> concept: <strong>Address</strong>, attribute: <strong>city name</strong></li>
<li><strong>country_code</strong> -> concept: <strong>Address</strong>, attribute: <strong>country code</strong></li>
</ul>
<p>And that is all that needs to be done manually. A domain expert can provide us with these definitions; the only coordination required is to ensure that the exact same names are used for concepts, attributes, and relationships. This is done by cross-referencing the systems' business concepts and unifying their wording.</p>
<h2>Generating the Logical Data Model</h2>
<p>The mapping from systems' tables and columns to business concepts is processed and written into the knowledge graph, which then holds the following types of nodes:</p>
<ul>
<li><strong>System</strong>, the name of one system owning tables and columns.</li>
<li><strong>Table</strong>, the name of a table from a particular system.</li>
<li><strong>Column</strong>, one column in one system with respective schema definitions, such as data type.</li>
<li><strong>Concept</strong>, a business concept such as Address.</li>
<li><strong>Attribute</strong>, one single data record defining the concepts, such as street name for the address concept.</li>
<li><strong>Relationship</strong>, a connecting information between two concepts flowing from one, the source concept, to the other, the target concept. For example business partner "has contact" address.</li>
</ul>
<p>The logical data model is then systematically created (via a Python script) from the concepts, attributes, and relationships. Each concept becomes a table of its own, whose columns are all of its attributes plus an internal identifier for the concept. Each relationship also becomes a table of its own, with the internal identifiers of the source and target concepts as foreign key columns.</p>
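<p>The generation step can be illustrated with a short Python sketch. The concept and attribute names below, as well as the in-memory data structures standing in for the knowledge graph, are simplified assumptions; the actual script works against the graph itself:</p>

```python
# Simplified in-memory stand-in for the knowledge graph contents:
# concepts with their attributes, and relationships between concepts.
concepts = {
    "business_partner": ["name"],
    "address": ["street_name", "postal_code", "city_name", "country_code"],
}
relationships = [("business_partner", "has_contact", "address")]

def generate_logical_model(concepts, relationships):
    """Each concept becomes a table (an internal id plus its attributes);
    each relationship becomes a table with the internal identifiers of
    the source and target concepts as foreign key columns."""
    tables = {}
    for concept, attributes in concepts.items():
        tables[concept] = ["id"] + attributes
    for source, name, target in relationships:
        tables[f"{source}_{name}_{target}"] = [f"{source}_id", f"{target}_id"]
    return tables

model = generate_logical_model(concepts, relationships)
```

<p>Running this over the full knowledge graph yields one table per concept and one link table per relationship, which is exactly the shape of the golden record's logical data model.</p>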
<p>Since the graph records which systems' tables and columns contribute to each concept, we can also generate the so-called transformation data model, which shows how each system's columns map (directly or indirectly) to the logical data model of the golden record.</p>
<p>By using knowledge graphs for a live-data representation of all systems' logical data models and how they map to a semantic layer of business concepts, we are able to automatically generate the logical data model of the golden record inside the knowledge graph, together with additional information on how it connects to the systems' data models. This enables us to keep a record of data lineage from each system to the golden record and, additionally, to use contemporary knowledge graph visualisation tools to give domain experts an intuitive and understandable representation of how each system is connected to the golden record. We see two main advantages here:</p>
<ol>
<li>The dialogue between business and technology in designing the golden record logical data model has improved and accelerated the process of creating a correct model.</li>
<li>All deliverables, such as the logical data model and the transformation data model, can be queried directly from the knowledge graph and do not need to be created manually, which is less error-prone.</li>
</ol>
<p>We estimate that during the development of the MDM component this approach will keep on saving us time by avoiding misunderstandings and improving stakeholder communication.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Wikipedia on <a href="https://en.wikipedia.org/wiki/Master_data_management">Master Data Management</a> 23.7.2021 <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>For knowledge graph experts it is worthwhile to note that because this is a schema for the logical data model, also relationships between concepts are modeled as nodes. This is a deliberate design choice. It enables us to map data model information to relationships. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>How we use Kotlin for backend services at Zalando2021-07-01T00:00:00+02:002021-07-01T00:00:00+02:00Ole Sassetag:engineering.zalando.com,2021-07-01:/posts/2021/07/kotlin-for-backend-services.html<p>In the latest update to the Tech Radar, Kotlin has moved to ADOPT. As part of this effort, Zalando's Kotlin Guild has created a set of recommended tools and libraries for backend development, which this blog post takes a closer look at.</p><p><img alt="Kotlin Logo" src="https://engineering.zalando.com/posts/2021/07/images/kotlin-logo.png#previewimage"></p>
<h2>The adoption of Kotlin at Zalando</h2>
<p>As outlined in <a href="https://engineering.zalando.com/tags/tech-radar.html">prior posts</a>, Zalando uses a <a href="https://opensource.zalando.com/tech-radar">Tech Radar</a> to provide guidance on technology selection.</p>
<p><a href="https://engineering.zalando.com/posts/2021/06/zalando-tech-radar-scaling-contributions.html">Recently</a>, we moved <a href="https://kotlinlang.org">Kotlin</a> from TRIAL to ADOPT. With this change we are doubling down on the support of Kotlin as the 3rd JVM language next to <a href="https://www.java.com">Java</a> and <a href="https://scala-lang.org">Scala</a>. This is the result of increased adoption within the company (100+ new applications were written in Kotlin in a year), positive feedback from engineers starting to use it, as well as creation of guidelines, coding standards, reference projects, and service templates by the Zalando Kotlin Guild.</p>
<p>The experience that our Engineering Community gained over the recent years with Kotlin matches the developer stories of other companies. A nice collection of success stories can be found on the Android <a href="https://developer.android.com/kotlin/stories">blog</a>. Kotlin allows writing more succinct code with fewer pitfalls compared to Java and comes with a lot of useful features and libraries (e.g. <a href="https://kotlinlang.org/docs/data-classes.html">data classes</a>, <a href="https://kotlinlang.org/docs/null-safety.html">null safety</a>) that Java does not (yet) have as part of its standard library. This is probably also a reason why it is more <a href="https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-wanted">wanted</a> and less <a href="https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-dreaded">dreaded</a> than Java and Scala in the 2020 Stackoverflow insights. Additionally, <a href="https://kotlinlang.org/spec/type-inference.html">type inference</a>, <a href="https://kotlinlang.org/docs/reference/collections-overview.html#collection-types">read only collections</a> as well as the rich support for functional programming in the standard libraries were among the things our developers see as benefits compared to Java.</p>
<h2>The Kotlin Guild</h2>
<p>The Kotlin <a href="https://engineering.zalando.com/tags/guild.html">Guild</a> was founded with around 10 core members who want to help the language grow in Zalando. Moving the language to ADOPT in the latest <a href="https://engineering.zalando.com/posts/2021/06/zalando-tech-radar-scaling-contributions.html">Tech Radar Update</a> was a central milestone in that effort, as the ADOPT status comes with support from central infrastructure teams and the created documentation as well as templates, which help to promote a standardized tech stack and make bootstrapping new services easier. Due to being driven by our language guild, the whole process was kept transparent and open for contributions from the Engineering Community.</p>
<p>As a preparation for wider adoption of Kotlin, we collected internal good practices and defined a set of tools and libraries, recommended as default choices, for developing RESTful backend services and Android apps with Kotlin. For additional input, we looked at how frequently tools are used within the company, sat together with experts on specific topics, consulted external sources, and asked the whole Engineering Community to review the final recommendations via a survey. Overall, we made sure that our recommendations support a positive developer experience and fit the needs of most services, which do not directly serve customer traffic.</p>
<p>Looking forward, the Kotlin Guild will continue to foster knowledge exchange as well as community building for its 250+ members. We also plan to cover more use cases with our documentation, such as purely functional services using <a href="https://arrow-kt.io">Arrow</a>, and will make sure we stay up to date with new developments in the Kotlin space. Next to that, the members support each other with technical issues, and regular talks are hosted.</p>
<h2>How we build Backend Services at Zalando</h2>
<p>Our internal developer tooling allows teams to initialize a repository from a template project. These templates come with out-of-the-box configuration and integrations which teams can then adapt to their needs. As an added benefit, they nudge teams towards higher consistency across different services and departments.</p>
<p>All APIs are defined in the OpenAPI format using <a href="https://swagger.io">Swagger</a>. This allows our API portal to list all available APIs in one place along with their API linting results via <a href="https://github.com/zalando/zally">Zally</a>. API linting can also be required to pass for MUST validations on every build. Many of our teams follow the <a href="https://engineering.zalando.com/posts/2019/04/developing-zalando-apis.html">API first</a> principle throughout service development.</p>
<p>Given that most services are deployed in <a href="https://kubernetes.io">Kubernetes</a>, we consider <a href="https://opensource.zalando.com/skipper">Skipper</a> filters the best way to handle Authentication and Authorization. This can either be achieved in Skipper <a href="https://opensource.zalando.com/skipper/reference/filters/#oauthtokeninfoallscope">directly</a>, via <a href="https://opensource.zalando.com/skipper/kubernetes/routegroups/#routegroups">Route Groups</a> or <a href="https://zalando-incubator.github.io/fabric-gateway/fabric-gateway-features/#authentication">Fabric Gateway</a>. Skipper is designed to handle a large number of requests and is less likely to be misconfigured than for example <a href="https://spring.io/projects/spring-security">Spring security</a>.</p>
<p>Many JVM based Web services in Zalando are built using <a href="https://spring.io/projects/spring-boot">Spring Boot</a> and we believe that this is also a good option when using Kotlin. This choice is mainly driven by the large adoption, but also because Spring <a href="https://spring.io/guides/tutorials/spring-boot-kotlin">integrates</a> really well with Kotlin, is compatible with multiple application servers, and supports reactive programming via <a href="https://docs.spring.io/spring-integration/docs/5.1.2.RELEASE/reference/html/webflux.html">WebFlux</a>. We do also see growing adoption of <a href="https://ktor.io">Ktor</a> and predict it to gain popularity within Zalando in the future, possibly even in conjunction with <a href="https://www.graalvm.org">GraalVM</a>.</p>
<h2>Libraries we use for Backend Services</h2>
<p>As a build system, we prefer <a href="https://gradle.org">Gradle</a> over <a href="https://maven.apache.org">Maven</a> because of its great customizability and build performance. Gradle is also used to build the Kotlin language itself, as well as by many major framework projects like <a href="https://github.com/spring-projects/spring-boot">Spring Boot</a>. On top of that, the build configuration scripts can be <a href="https://docs.gradle.org/current/userguide/tutorial_using_tasks.html">written</a> in Kotlin.</p>
<p>Linting is a very good practice to keep the style in a codebase consistent and to settle disputes over correct indentation. <a href="https://github.com/pinterest/ktlint">Ktlint</a> is our tool of choice as it follows the official <a href="https://kotlinlang.org/docs/coding-conventions.html">coding</a> conventions, is <a href="https://plugins.gradle.org/plugin/org.jlleitschuh.gradle.ktlint">easy to run</a> in Gradle, and enforces a focused set of rules, so it integrates seamlessly into the software development process.</p>
<p><a href="https://github.com/MicroUtils/kotlin-logging">Kotlin-logging</a> is recommended for logging as it automatically adds class names to the log, lazily evaluates messages, and is built on top of <a href="http://www.slf4j.org">slf4j</a>.</p>
<p>For <a href="https://redis.io">Redis</a> access, we recommend using <a href="https://github.com/lettuce-io/lettuce-core">Lettuce</a> which is part of <a href="https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-data-redis">spring-boot-starter-data-redis</a>, as it is a thread safe client with nice support for reactive programming.</p>
<p>To access relational databases, we see <a href="https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-data-jpa">spring-boot-starter-data-jpa</a> as a solid choice if you want to use an ORM, but advise considering <a href="https://www.jooq.org">jOOQ</a> in cases where database transactions become more complex. jOOQ can also be combined with other clients and even used on top of JPA. It has the added benefit of supporting database specifics such as Postgres <a href="https://www.postgresql.org/docs/current/datatype-json.html">JSON types</a>.</p>
<p>Zalando is investing into traceability with <a href="https://opentracing.io">Open Tracing</a> and we recommend <a href="https://github.com/zalando/opentracing-toolbox">opentracing-toolbox</a> which eases <a href="https://github.com/zalando/opentracing-toolbox/tree/main/opentracing-kotlin">integration</a> of tracers, particularly in Spring Boot projects. Tracing allows linking requests across services and is also great to set up <a href="https://www.usenix.org/conference/srecon19emea/presentation/mineiro">automated alerting</a>.</p>
<h2>Conclusion</h2>
<p>We hope this gives you some idea why Kotlin is gaining popularity for backend development within Zalando. This is just the beginning of the journey - and you can <a href="https://jobs.zalando.com/en/jobs/?search=kotlin">become part of it</a>!</p>Zalando Tech Radar - Scaling Contributions to Technology Selection2021-06-24T00:00:00+02:002021-06-24T00:00:00+02:00Bartosz Ocytkotag:engineering.zalando.com,2021-06-24:/posts/2021/06/zalando-tech-radar-scaling-contributions.html<p>Learn how we scaled contributions to Zalando Tech Radar</p><p><img alt="Zalando Tech Radar" src="https://engineering.zalando.com/posts/2021/06/images/zalando-tech-radar.jpg#previewimage"></p>
<h2>Introduction</h2>
<p>In our previous post about <a href="/posts/2020/07/technology-choices-at-zalando-tech-radar-update.html">Technology Choices at Zalando</a> we spoke about a few problems with scaling technology selection in Tech companies. Since then, we have focused on the remaining categories of the <a href="https://opensource.zalando.com/tech-radar/">Tech Radar</a> beyond languages and the Tech Radar contribution process. Now, we'd like to reflect on our lessons learned, which you can use when designing technology selection processes.</p>
<h2>Scaling contributions</h2>
<p>One of the challenges for us to solve was scaling contributions to the Tech Radar across our 250+ delivery teams. Technologists are often more excited about promoting a new, promising technology than about working on guidelines or sharing knowledge about already well-known tech. Such individuals are also essential for continued innovation. On the other hand, companies look for organizational efficiency by ensuring talent mobility across teams, supported by a more or less standardized tech stack. This makes it easier to address cross-team dependencies in product delivery by allowing teams to contribute to code bases beyond their area of responsibility. Further, it creates career opportunities for Engineers, who can quickly switch teams and work on a challenging, high-impact project. Thus, for technology selection, there is a natural tension between early adopters' vested interest and the needs of the organization they work for. At Zalando, we have created a two-sided contribution model for the Tech Radar:</p>
<ul>
<li>Anyone in Zalando is encouraged to contribute knowledge about technologies we have on the Tech Radar, to suggest promising ones to evaluate, and to play a key role in this process.</li>
<li>Our Principal Engineers are maintainers of the Tech Radar and are moderating information collection on incoming suggestions, driving creation of good practices for technologies being evaluated or used, and for promoting technologies to increase their adoption.</li>
</ul>
<p>Ring change suggestions are supported by issue templates in our internal Tech Radar GitHub repository. These templates provide guidance on common questions around use case fit, key differences from alternatives already on the Tech Radar, conformance to our Technology Selection Principles, and support within the Engineering Community.</p>
<p>We encourage and expect our Engineers to contribute information about usage, lessons learned from production incidents, or challenges they face at scale. Voluntary contributions alone, however, are not enough to keep our view of the technologies in use up to date. We therefore additionally gather usage data from our AWS accounts, source code repositories, and our infrastructure platform offerings. This information is consolidated in a documentation page with a common structure across all entries:</p>
<p><img alt="Zalando Tech Radar: example documentation entry" src="https://engineering.zalando.com/posts/2021/06/images/tech-radar-docs-entry.png"></p>
<p>Finally, we leverage Principal Engineers to moderate and drive discussions around technology adoption at Zalando. These colleagues have a sufficiently broad view on technology usage and performance in production across multiple teams and serve as a multiplying factor. They're responsible for encouraging teams they work with to share knowledge and highlight technology usage based on the software systems in their areas - either themselves or by enabling others to do so. Additionally, they moderate discussions within technology guilds or initiate working groups to create specific artifacts for the technologies, like collections of good practices or guidelines tailored to our environment, use cases, and scale. Such working groups are also excellent opportunities to develop or identify talent within the company.</p>
<h2>Re-scoring - how have we decided upon changes?</h2>
<p>After a longer period of time with no regular changes to the Tech Radar, we had a re-scoring exercise to complete. A similar approach was used originally at ThoughtWorks and can be used to create a Tech Radar from the ground up.</p>
<p>Within our Principal Engineering Community, we formed a working group per dimension: Datastores, Data processing, Infrastructure, and Queues. Our <a href="https://opensource.zalando.com/tech-radar/">Tech Radar visualization</a> merges Data processing and Queues in a single Data Management dimension for simplicity. Each working group was responsible for the data collection and analysis. One person from each group compiled the information in a structured format where per technology there was a case made for a ring change (or not). The change reasoning was supported by data points on usage, incidents, and expertise we gained since the technology was added to the Tech Radar (a few years in some cases) as well as conformance with our Technology Selection Principles. Where necessary to build a solid case, we reached out to teams in order to understand more details about their use cases or experience, if this was not sufficiently documented through recent information in our Tech Radar.</p>
<p>Based on the collected data, Principal Engineers participated in a review and re-scoring exercise. In a spreadsheet, we collected votes. Every 'nay' vote required a short rationale which we later discussed in the group to ensure we did not miss out on usage or use cases. We also found inconsistencies in the way we handle technologies with multiple deployment options (self-hosted vs. managed or vendor offerings), for which we did not find a good solution yet.</p>
<p>After the voting, the collected ring changes were discussed with our Senior Leadership Team. The main focus was on ensuring long-term support for the technologies we promote to ADOPT and that technologies on lower rings are in line with long-term strategies (e.g. Data Strategy).</p>
<p>Finally, the changes were communicated to our Engineers, along with the detailed rationale per ring change and further information on the re-scoring process and contributions moving forward.</p>
<h2>Notable changes</h2>
<p>With the re-scoring, we moved a few technologies to ADOPT, confirming our investment in these. To scale adoption, in some cases, we formed dedicated teams that operate service offerings available to all Zalando Engineers and Data Scientists.</p>
<h3>Airflow</h3>
<p><a href="https://airflow.apache.org/">Apache Airflow</a> is a Workflow Orchestration tool used by data teams in Zalando. We have a central infrastructure team responsible for managing Airflow as a Service for our data teams.</p>
<h3>Databricks</h3>
<p>We've been using <a href="https://spark.apache.org/">Apache Spark</a> for various analytical and Machine Learning use cases and talked about our usage before (see <a href="https://youtu.be/Fy_KnCxp1lo">Data Warehousing with Spark Streaming at Zalando</a>). Databricks is also the core element of our Machine Learning Platform, available to all Engineers.
More recently, we went from a centralized Data Lake approach towards a distributed Data Mesh architecture backed by Spark and built on Delta Lake powered by Databricks. See our talk <a href="https://www.youtube.com/watch?v=eiUhV56uVUc">Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes Beyond the Data Lake</a> for more information.</p>
<h3>GraphQL</h3>
<p>We've blogged about our <a href="/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">GraphQL usage</a> before. We have 200+ developers that contributed to the GraphQL API layer powering the <a href="https://en.zalando.de/">Zalando shop</a> over the past 2.5 years. We also have other use cases in production, for example in back-office applications for our Buying department.</p>
<h3>Kotlin & TypeScript</h3>
<p>Having seen continued and growing usage of <a href="https://kotlinlang.org/">Kotlin</a> and <a href="https://www.typescriptlang.org/">TypeScript</a>, we have initiated workstreams within our language guilds to define guidelines, coding standards, reference projects, and service templates. These artifacts help teams adopt the languages moving forward. Further, they help build a shared understanding of what we consider production-proven frameworks and libraries, along with recommended configuration options.
We've shared our <a href="/posts/2019/02/typescript-best-practices.html">TypeScript best practices</a> in the past and more details about <a href="/posts/2021/07/kotlin-for-backend-services.html">promoting Kotlin at Zalando</a>.</p>
<h3>SageMaker</h3>
<p>We have blogged before about our usage of <a href="https://aws.amazon.com/sagemaker/">Amazon SageMaker</a> for <a href="/posts/2021/02/machine-learning-pipeline-with-real-time-inference.html">ML Pipelines with Real-Time Inference</a> and <a href="/posts/2020/06/distributed-xgb-sagemaker.html">distributed training</a>. See also our talk on <a href="https://www.youtube.com/watch?v=6UVdMtNUpDE">using SageMaker for training ML models</a> from the AWS Summit 2019.</p>
<h2>Tech Radar changes moving forward and future focus</h2>
<p>The re-scoring exercise described in this post was a house-keeping exercise supported by clarifying the purpose of the Tech Radar, long-term ownership, and the contribution model. The amount of upcoming changes will of course depend on contributions from our Engineering Community and our appetite for trying out new technologies. While changes to ADOPT/HOLD are going to be evaluated on a quarterly basis, we have a steady stream of ongoing assessments and trials.</p>
<p>The Principal Engineering Community focuses on:</p>
<ul>
<li>supporting and guiding contributions from the Engineering Community,</li>
<li>identifying promising technologies to invest in,</li>
<li>collecting best practices and expertise around technologies on TRIAL and ADOPT.</li>
</ul>
<p>With the last point we aim to define paved roads for Engineers describing for example battle-tested configurations for typical use cases or standardized monitoring dashboards with their explanation for the key and most common technologies. While this is today already the case for our <a href="https://www.youtube.com/watch?v=G8MnpkbhClc">PostgreSQL as a Service offering</a> built on top of <a href="https://github.com/zalando/patroni">Patroni</a> and <a href="https://github.com/zalando/postgres-operator">Postgres Operator</a>, given a dedicated team responsible for this infrastructure, we don't have such guidance collected across all our ADOPT technologies yet.</p>
<h2>Challenges we have not solved yet</h2>
<p>There are a few challenges that the Tech Radar does not solve for today, mostly related to consistency and completeness of the technology landscape. If we resolve any of these challenges, we will surely share our insights and lessons learned.</p>
<p>Some technologies (e.g. etcd) have been successfully used in our infrastructure teams, but we would not want any delivery team to use these (e.g. for configuration management counting as "infrastructure") as we have more suitable building blocks in our platform.</p>
<p>In other cases, we have invested into service offerings built around open-source software (e.g. Airflow) and we would rather have teams extend this platform offering rather than deploy their own infrastructure.</p>
<p>We also have solutions built in-house (e.g. our request router - <a href="https://github.com/zalando/skipper">Skipper</a>) which are an essential part of our cloud infrastructure. Teams cannot easily opt out of these. Such technologies will most likely be moved to a different place that represents the maturity of the development infrastructure at Zalando from a product perspective.</p>
<p>For technologies, where we chose vendor offerings built on top of a technology (e.g. Databricks for Spark), the question arises whether to include one or both and with which ring assignment (setting Spark to HOLD while keeping Databricks on ADOPT may sound confusing). Here, we consider using the underlying technology and outlining the recommended deployment options.</p>
<p>Finally, there are 3rd-party products which allow us to deliver solutions faster, without the need to reinvent the wheel. One example is Content Management Systems - we've built a few over the past years and strive not to do so again. The question arises how to make these sufficiently visible to our Engineers, so that they're considered while building future products for our customers.</p>
<hr>
<p><em>If you would like to work on similar challenges and help scale our approach to technology selection, consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&search=principal&filters%5Bcategories%5D%5B0%5D=Software%20Engineering">engineering teams at Zalando</a>.</em></p>Making the Remote Onboarding a Success2021-04-22T00:00:00+02:002021-04-22T00:00:00+02:00Martin Schwitallatag:engineering.zalando.com,2021-04-22:/posts/2021/04/making-the-remote-onboarding-a-success.html<p>Onboarding new people to the team is always a big challenge, and it became even more complicated during the pandemic, when most people work from home. This post describes a couple of steps we took to make the remote onboarding of three new team members a success.</p><p>When the pandemic started in 2020, many Zalando employees moved to working from home. It changed our working habits in many ways, and Zalando published <a href="https://engineering.zalando.com/posts/2020/03/how-to-work-remotely-at-zalando.html">remote working guidelines</a> to support its employees. Those guidelines, however, only cover remote working - what happens if you change companies during the pandemic?</p>
<p>Joining a new company and getting onboarded can already be pretty tough during normal times. Starting a new job requires you to learn new skills and build new relationships within the company. Working from home amplifies these problems by introducing virtual barriers: it's not possible to walk up to somebody and ask a question, or to introduce yourself to people you meet by chance.</p>
<p>We were recently confronted with the challenge of growing our engineering team from two to five people within two months. In this article I describe how we tackled this challenge to make sure that the new team members got onboarded quickly and felt welcome in this new setup.</p>
<h2>Onboarding Buddy</h2>
<p>One of the first decisions we made was to assign an onboarding buddy to each new team member. The onboarding buddy is the go-to person for the new team member in case of questions or problems where support is needed, e.g. setting up the notebook. As some people might feel uncomfortable asking people they barely know for help, especially remotely, we set up daily 1:1 sessions to discuss the current state of the onboarding, answer open questions, and provide regular feedback. As time went on, the frequency of the 1:1s decreased, because people got used to working in the team.</p>
<h2>Feedback</h2>
<p>Providing regular feedback is key to success during the onboarding. It creates a continuous feedback loop that informs the new team members how their contribution is viewed, gets them used to Zalando's feedback culture, and lets us reflect on how the onboarding is working out and whether it needs to be tweaked. To make sure we don't forget to provide feedback, we set up monthly feedback sessions between the team and each new team member. While doing this we experimented with three different formats.</p>
<ul>
<li>An open round where everybody shares the feedback freely.</li>
<li>The feedback is given in short 1:1 sessions between each team member.</li>
<li>The team collects the feedback and then presents one summarized view to the new team member.</li>
</ul>
<p>Overall it’s impossible to say which format is the best. It could be intimidating in the beginning to receive feedback from the whole team in an open round, but fine at a later point in time when the team knows each other better. It depends on the situation and the people and we gave our new team members the possibility to choose. As those feedback sessions were also meant for the new member to provide feedback to the team, we prepared some questions to collect the feedback.</p>
<ul>
<li>What do you think about the onboarding so far?</li>
<li>Is there any information that you missed or would have liked to receive earlier?</li>
<li>Is your workload manageable for you? Are the tasks too easy/too difficult?</li>
<li>Would you like to receive more/less support?</li>
<li>Is there anything you would like to work more on?</li>
<li>How comfortable would you feel if all other team members fall sick and you are alone working on tasks and support requests?</li>
</ul>
<p>The last question is probably the most important one. It asks the new team members to reflect on themselves and check how confident they already are about their skills. This is an important indicator for the team to perhaps put some focus on certain areas that were missed so far in the onboarding. This way we found out that we needed to become better at introducing our on-call and incident process, as this had been completely missed.</p>
<p><img alt="Picture of Sokoban Standup" src="https://engineering.zalando.com/posts/2021/04/images/team_pic.jpg"></p>
<h2>Technical Onboarding</h2>
<p>The onboarding of course includes a technical part as well. We did the obligatory domain introduction and some introductions to our ways of working, like the sprint ceremonies. It's important not to overwhelm the new team members at the start. Much, if not most, of this information can also be shared later, when it becomes necessary; it's better to focus on the basics in the beginning and give that time to sink in. But at some point the new team members need to get their hands dirty and work on some real tasks.</p>
<p>To make the start easier, we defaulted to pair programming or even mob programming in the beginning. The rule was that tasks had to be done by at least two people unless circumstances prevented it. Pair programming while working remotely is even more important than usual: not only does it allow for easy, "on the job" knowledge sharing, it also lets the participants bond and get to know each other. The pair programming was done with simple tools - the person programming used the IDE of their choice and shared their screen via the call so that the others could follow the coding. Of course, other tools and IDE plugins exist that try to make this setup even better, but in our experience it worked pretty well without them.</p>
<p>In our team we have a role that rotates each day; that person takes care of incoming support requests from internal clients. Usually this requires a certain level of domain and system knowledge. We decided to onboard the new team members to this role pretty quickly. On the one hand it frees up time for the more experienced engineers, and on the other hand it provides another learning opportunity for the new team members. As long as this was transparently communicated to clients, they didn't mind that some support requests took longer than usual, and the new team members made huge progress on domain knowledge in a relatively short time.</p>
<h2>Relationships</h2>
<p>The last part of the onboarding relates to the relationships inside the team. We are not just robots coming into work, but we are humans with emotions, goals and sometimes also problems. I believe that trust is an essential ingredient for efficient teams. It allows you to speak up freely, you can make mistakes and addressing conflicts leads to constructive discussions. And during the pandemic you are missing out on a lot of opportunities to get to know your new team-mates as there are no team lunches, no short discussions at the coffee machine and no rounds of table tennis during the breaks. This can quickly start to feel like you are being left alone with your problems. Therefore we introduced a weekly “Team Bonding” session which was moderated by our producer. The producer is responsible for team processes in our team and in case you don't have such a role, any person, be it a team member, team lead or somebody outside the team, could facilitate this meeting.</p>
<p>Every week she came up with new ideas for the session. Sometimes we just presented personal objects from our homes to each other; another time we did PowerPoint karaoke or played a game like Taboo. Some of those exercises had concrete goals, like improving your presentation skills, but in the end it was always about the people and getting to know them. What drives your team-mates? What kind of humour do they have? What keeps them up at night right now? Opening up really helps to create this bond and increase the trust in each other. Such exercises can of course also be done when everybody is back at the office to continue the bonding between team-mates - they are not only valuable when working remotely.</p>
<h2>Summary</h2>
<p>Summing up this article, it boils down to a few simple points. Take your time to do a proper onboarding, and be transparent with clients and leads about possible delays to support requests or roadmaps. Remind yourself constantly to provide feedback, to give guidance and prevent unpleasant surprises. And don't forget about the personal relationships that need to be created, because they allow you to trust each other and feel safe while making mistakes. Following these rules is very time intensive, but it pays off in the long run: we were able to build an awesome team in just about three months that is already more productive than before. Of course there is no one-size-fits-all solution for onboarding, and different teams might have different needs, but this setup worked very well for us.</p>
<h2>Other Resources</h2>
<ul>
<li><a href="https://miro.com/guides/remote-work/onboarding">Miro: Remote Onboarding Checklist</a></li>
<li><a href="https://resources.owllabs.com/blog/remote-employee-onboarding">OwlLabs: 7 Remote Employee Onboarding Tips and Checklist for Your Next New Hire</a></li>
<li><a href="https://about.gitlab.com/company/culture/all-remote/onboarding/">GitLab: The guide to remote onboarding</a></li>
<li><a href="https://www.hive.hr/blog/improve-your-remote-onboarding-experience/">Hive: 16 Ways to Improve Your Remote Onboarding Experience</a></li>
<li><a href="https://martinfowler.com/articles/on-pair-programming.html">Martin Fowler: On Pair Programming</a></li>
<li><a href="https://www.agilealliance.org/glossary/pairing/">Agile Alliance: Pair Programming</a></li>
<li><a href="https://www.agilealliance.org/glossary/mob-programming/">Agile Alliance: Mob Programming</a></li>
<li><a href="https://www.remotemobprogramming.org/">Remote Mob Programming</a></li>
<li><a href="https://www.powerpointkaraoke.com/">PowerPoint Karaoke</a></li>
<li><a href="https://slidelizard.com/en/blog/powerpoint-karaoke-rules-and-free-download">A Guide to PowerPoint Karaoke</a></li>
</ul>Modeling Errors in GraphQL2021-04-13T00:00:00+02:002021-04-13T00:00:00+02:00Boopathi Rajaa Nedunchezhiyantag:engineering.zalando.com,2021-04-13:/posts/2021/04/modeling-errors-in-graphql.html<p>GraphQL excels in modeling data requirements. Modeling errors as schema types in GraphQL is required for certain kinds of errors. In this post, let's analyze some cases where errors contain structured data apart from the message and the location information.</p><p><img alt="Use case to distinguish different errors" src="https://engineering.zalando.com/posts/2021/04/images/use-case.jpg"></p>
<h2>GraphQL Errors</h2>
<p>GraphQL is an excellent language for writing data requirements in a declarative fashion. It gives us a clear and well-defined concept of nullability constraints and error propagation. In this post, let's discuss where GraphQL falls short regarding errors, and how we can model those errors to fit some of our use cases.</p>
<p>Before we dive into the topic, let's understand how GraphQL currently treats and handles errors. The response of a GraphQL query is of the following structure -</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"data"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"foo"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"errors"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Something happened"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"path"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"foo"</span><span class="p">,</span><span class="w"> </span><span class="s2">"bar"</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<h2>Error extensions</h2>
<p>The Schema we define for GraphQL applies only to the <code>data</code> field of the response. The <code>errors</code> field is a fixed, well-defined structure - <code>Array<{ message: string, path: string[] }></code> in its simplest form - which the Schema we define does not affect.</p>
<p>Let's say the client queries a field using an ID. How can the client know from the above error object whether the error is an internal server error or the ID was simply not found? Parsing the message is a no-go because it is not reliable.</p>
<p>Luckily, GraphQL provides a way to extend the error structure - the <code>extensions</code> field. An <code>error.extensions</code> object can convey additional information related to the error: properties, metadata, or other clues from which the client can benefit. For the above example, we can model the response as -</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">data</span><span class="o">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nx">errors</span><span class="o">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">message</span><span class="o">:</span><span class="w"> </span><span class="s2">"Not Found"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">extensions</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">code</span><span class="o">:</span><span class="w"> </span><span class="s2">"NOT_FOUND"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">],</span>
<span class="p">};</span>
</code></pre></div>
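<p>With a machine-readable <code>code</code> in place, the client no longer needs to parse messages. Below is a minimal sketch of what client-side handling could look like; the function name and return values are illustrative assumptions, not part of any GraphQL standard:</p>

```javascript
// Hypothetical client-side helper: react to a GraphQL response based on
// error codes in `extensions` instead of parsing error messages.
function classifyError(response) {
  const errors = response.errors ?? [];
  if (errors.some((e) => e.extensions?.code === "NOT_FOUND")) {
    return "not-found"; // e.g. render a 404 page
  }
  if (errors.length > 0) {
    return "internal-error"; // fall back to a generic error screen
  }
  return "ok";
}

const response = {
  data: { foo: null },
  errors: [{ message: "Not Found", extensions: { code: "NOT_FOUND" } }],
};
console.log(classifyError(response)); // "not-found"
```

<p>Because the decision is driven by <code>extensions.code</code>, the server remains free to reword error messages without breaking clients.</p>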
<h2>Errors for Customers</h2>
<p>When we have a GraphQL API that delivers content to the end-user - the customer - we effectively have two levels of users:</p>
<ol>
<li>The <strong>Developer</strong> or <strong>user</strong> of the API - UI/UX/front-end developer.</li>
<li>The <strong>Customer</strong> or <strong>end-user</strong> - The one who does not see any technical layers but gets the product's experience in its most presentable format. The Front-end developer builds this experience using data from the GraphQL API.</li>
</ol>
<p>Since using the word <strong>user</strong> might be confusing, from now on, <strong>Developer</strong> will refer to the front-end developer, and <strong>Customer</strong> will refer to the end-user.</p>
<p><img alt="Customer vs Developer" src="https://engineering.zalando.com/posts/2021/04/images/customer-developer.jpg"></p>
<p>When we have an API whose data is directly consumed by two levels of these users - Developer and Customer, there might be different error data requirements. For example, let's take <code>mutations</code> - when the Customer enters an invalid email address,</p>
<ol>
<li>The <strong>Developer</strong> who uses the GraphQL API needs to know that the Customer has entered an invalid email address, in a <strong>parseable format</strong> - a boolean, an enum, or any other data structure will work, as long as it does not require parsing the error message.</li>
<li>The <strong>Customer</strong> only cares about the error message, shown in a nicely styled format close to the text box. Also, for <strong>different languages</strong> or locales, the error message needs to be in the corresponding <strong>translated</strong> text.</li>
</ol>
<p>Let's try to model this using the error extensions discussed above -</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"data"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"errors"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Die E-Mail-Addresse ist ungültig"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"extensions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"code"</span><span class="p">:</span><span class="w"> </span><span class="s2">"INVALID_EMAIL"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>While this would work, we soon end up in a case where multiple input fields in a mutation can be invalid. What can we do here? Do we model them as different errors, or fit everything into the same Error?</p>
<p>Customer errors still need to be usable by Developers, who ultimately transform our data structures into UI elements and propagate the errors to the Customer. They need to understand the Error to, say, highlight the affected input text-box with a <strong>red</strong> border. To keep it simple, let's try modeling these as a single error with multiple validation messages -</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"data"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"errors"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Multiple inputs are invalid"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"extensions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"invalidInputs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"code"</span><span class="p">:</span><span class="w"> </span><span class="s2">"INVALID_EMAIL"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Die E-Mail-Addresse ist ungültig"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"code"</span><span class="p">:</span><span class="w"> </span><span class="s2">"INVALID_PASSWORD"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Das Passwort erfüllt nicht die Sicherheitsstandards"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>The codes <code>INVALID_EMAIL</code> and <code>INVALID_PASSWORD</code> will help the front-end dev or <strong>Developer</strong> highlight the field in the UI, and the message will be displayed to the user right under that text-box.</p>
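<p>As a sketch, a server-side validator producing this error shape might look like the following. The helper name <code>validateRegistration</code> and the validation rules are our illustrative assumptions, not part of any real API.</p>

```javascript
// A minimal sketch of a server-side validator producing the error shape
// above. `validateRegistration` and its rules are hypothetical, not part
// of the article's API.
function validateRegistration({ email, password }) {
  const invalidInputs = [];
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
    invalidInputs.push({
      code: "INVALID_EMAIL",
      message: "Die E-Mail-Addresse ist ungültig",
    });
  }
  if (password.length < 8) {
    invalidInputs.push({
      code: "INVALID_PASSWORD",
      message: "Das Passwort erfüllt nicht die Sicherheitsstandards",
    });
  }
  if (invalidInputs.length === 0) return null;
  // One error entry carrying all invalid inputs in its extensions:
  return { message: "Multiple inputs are invalid", extensions: { invalidInputs } };
}
```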
<p>All this leads to a complicated structure very soon and is not as friendly as the data modeled with a GraphQL schema.</p>
<h2>Why you no Schema?</h2>
<p><img alt="Errors don't have type definitions" src="https://engineering.zalando.com/posts/2021/04/images/error-schema.jpg"></p>
<p>The biggest problem we face in modeling these in the extensions object is that it's not discoverable. We use a language as powerful as GraphQL to define each field of our data structures using schemas, but when designing errors, we fall back to a <strong>loose mode</strong> that uses none of the ideas GraphQL brought us.</p>
<p>Maybe future extensions of the language will let us write schemas for Errors just as we write them for Queries and Mutations, so that developers using the Schema get all the benefits of GraphQL even when handling errors. For now, let's concentrate on modeling this within the existing language specification.</p>
<h2>Errors in Schema</h2>
<p>We want to enjoy the power of GraphQL - the discoverability of fields of data, the tooling, and other aspects for errors. Why don't we put some of these errors in the Schema instead of capturing them in extensions?</p>
<p>For example, the mutation discussed previously can be modeled like this -</p>
<ol>
<li>mutation returns a <code>Result</code> type</li>
<li><code>Result</code> type is a <code>union</code> of <code>Success</code>, <code>Error</code>.</li>
<li>Error schema contains necessary error info - like translated messages, etc.</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Mutation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">register</span><span class="p">(</span><span class="n">email</span><span class="p">:</span><span class="w"> </span><span class="no">String</span><span class="err">!</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">:</span><span class="w"> </span><span class="no">String</span><span class="err">!</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">RegisterResult</span>
<span class="p">}</span>
<span class="k">union</span><span class="w"> </span><span class="err">RegisterResult</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">RegisterSuccess</span><span class="w"> </span><span class="err">|</span><span class="w"> </span><span class="err">RegisterError</span>
<span class="k">type</span><span class="w"> </span><span class="err">RegisterSuccess</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="w"> </span><span class="nl">email</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="err">!</span>
<span class="p">}</span>
<span class="k">type</span><span class="w"> </span><span class="err">RegisterError</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">invalidInputs</span><span class="p">:</span><span class="w"> </span><span class="err">[</span><span class="n">RegisterInvalidInput</span><span class="err">]</span>
<span class="p">}</span>
<span class="k">type</span><span class="w"> </span><span class="err">RegisterInvalidInput</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">field</span><span class="p">:</span><span class="w"> </span><span class="n">RegisterInvalidInputField</span><span class="err">!</span>
<span class="w"> </span><span class="nl">message</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="err">!</span>
<span class="p">}</span>
<span class="k">enum</span><span class="w"> </span><span class="err">RegisterInvalidInputField</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">EMAIL</span>
<span class="w"> </span><span class="n">PASSWORD</span>
<span class="p">}</span>
</code></pre></div>
<p>This structure looks exactly like the one we designed above inside error extensions. The advantage of modeling it this way is that errors now get the benefits of GraphQL - typed, discoverable fields.</p>
<h2>When you have a hammer,</h2>
<p>Now, with the idea of modeling errors as Schema types, we are left with more questions than answers -</p>
<ol>
<li>Should I model all errors as GraphQL types?</li>
<li>How should I decide when to use error extensions and when to use GraphQL types for modeling errors?</li>
<li>etc.</li>
</ol>
<p><img alt="The Problem hammer" src="https://engineering.zalando.com/posts/2021/04/images/problem-nails.jpg"></p>
<p>When multiple teams maintain the platform, many people contribute to and think about modeling different parts of the Schema. There should be clear definitions of the different aspects of the existing data structures, and of the reasoning that led to those solutions. The design and the Schema are changed far less often than they are read and used.</p>
<p>GraphQL gave us the mindset of <a href="https://graphql.org/learn/thinking-in-graphs/">"Thinking in Graphs"</a>. If we suggest a new way of modeling errors, we need to talk about this mindset and its ideas. Not all errors fit into this modeling (error types in Schema), and it will make the GraphQL API less usable if we approach it by looking at all the errors as nails.</p>
<h2>Classification</h2>
<p>To model errors, let's look for analogies in how programming languages model them. For example,</p>
<ol>
<li>Go: Error vs. panic</li>
<li>Java: Error vs. Exception</li>
<li>Rust: Result vs. panic</li>
</ol>
<p>These languages model errors in two variants. In one (an <code>error</code> value in Go), we inform the Developer who uses the function, and the Developer decides whether to handle it or pass it through. In the other (a <code>panic</code> in Go), we skip everything and bring the program to a halt, informing the end-user of the program that something has happened. Capturing this small variation as two different things helps us understand the intention of the data in errors.</p>
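<p>The two variants can be sketched in JavaScript terms - a returned error value the caller inspects, versus a thrown exception that halts the current operation. This is an illustration of the analogy, not code from the article:</p>

```javascript
// A returned value the caller can inspect (like Go's `error`):
// the caller decides whether to handle it or pass it through.
function parsePrice(input) {
  const value = Number(input);
  if (Number.isNaN(value)) {
    return { ok: false, error: "NOT_A_NUMBER" };
  }
  return { ok: true, value };
}

// ...versus a thrown exception that aborts the operation (like `panic`):
// the program halts and the end-user is told something went wrong.
function mustParsePrice(input) {
  const result = parsePrice(input);
  if (!result.ok) throw new Error(result.error);
  return result.value;
}
```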
<h3>Part 1. Action-ables</h3>
<p>What is an error? It tells us that something is wrong and gives us some information on what action can be taken. We can think of errors as containers of <strong>action</strong>-ables. When modeling them, we classify them into different groups depending on <strong>who</strong> can take that action.</p>
<p>In the GraphQL context, the front-end can handle some errors itself - either with a fallback or a retry. For other errors, like invalid inputs, the front-end cannot act; only the Customer who entered the invalid input can fix it.</p>
<p>Instead of modeling the errors loosely, we now have a concrete use-case - model it for whoever can take action.</p>
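<p>The classification by action-taker can be sketched as a simple routing function. The codes and the split are illustrative assumptions, not a fixed list:</p>

```javascript
// Sketch of the classification: route each error to whoever can act on it.
// The codes and the mapping are illustrative assumptions.
function whoCanAct(errorCode) {
  switch (errorCode) {
    case "INVALID_EMAIL":
    case "INVALID_PASSWORD":
      return "CUSTOMER"; // only the Customer can correct their input
    case "NOT_FOUND":
    case "NOT_AUTHORIZED":
      return "DEVELOPER"; // the front-end can fall back, retry, or show a login
    default:
      return "DEVELOPER"; // unknown errors are treated as system bugs
  }
}
```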
<h3>Part 2. Bugs in the system</h3>
<p>Errors convey information - either to the <strong>Developer</strong> or to the <strong>Customer</strong>. If the Error conveys a bug in the system, it should <strong>not</strong> be modeled as a schema error type. Here, the system means all the services and software involved in our entire product, not just the GraphQL service. This distinction is essential because it separates the end-user/Customer from the Developer who uses the API - the end-user sees our product as one thing, not as many individual services.</p>
<p>In the <code>404 Not Found</code> case, modeling the errors as schema types would have made the Schema less usable. Let's take a product look-up use-case -</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"foo"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="nc">ProductSuccess</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">success</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="nc">ProductError</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">error</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">collection</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"bar"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="nc">CollectionSuccess</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">products</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="nc">ProductSuccess</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">success</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="nc">CollectionError</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">error</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>This way of handling errors at every level is not friendly for front-end developers. It's too much to type in a query and too many branches to handle in the code.</p>
<h3>Part 3. Error propagation</h3>
<p>We also have to remember not to disrupt GraphQL's semantics of error propagation. If an error occurs at one place in the query, it propagates upwards in the tree until it reaches the first nullable field. This propagation does not happen with error types in the Schema, so it is essential to reserve these schema error types for specific use-cases. We go back to Part 1: Action-ables - we design these types for actions that the end-user or Customer can take.</p>
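<p>The propagation rule can be illustrated with a toy completion function - this is a simplified model of the behavior, not the real graphql-js executor:</p>

```javascript
// A toy model of the propagation rule, not the real graphql-js executor:
// when a non-nullable field errors, the nearest nullable ancestor becomes
// null; errors are collected separately, as in `response.errors`.
const PROPAGATE = Symbol("propagate");

function completeValue(resolve, nonNull, errors) {
  try {
    return resolve();
  } catch (e) {
    errors.push(e.message);
    return nonNull ? PROPAGATE : null; // bubble up only through non-null fields
  }
}

function completeObject(fields, nonNull, errors) {
  const result = {};
  for (const [name, field] of Object.entries(fields)) {
    const value = completeValue(field.resolve, field.nonNull, errors);
    if (value === PROPAGATE) {
      // a non-nullable child errored: this object nulls out
      // (or propagates further if it is itself non-nullable)
      return nonNull ? PROPAGATE : null;
    }
    result[name] = value;
  }
  return result;
}
```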
<h2>The Problem type</h2>
<p>Naming is half the battle in GraphQL. Since the name <code>error</code> is already taken by the GraphQL language (<code>response.errors</code>), it would be confusing to name our error types in the Schema <code>Error</code>. Looking for inspiration once again, there is a well-defined concept in <a href="https://tools.ietf.org/html/rfc7807">RFC 7807 - Problem Details for HTTP APIs</a>. So, we will call all our errors in the Schema Problems and, as it has always been, all other errors errors.</p>
<p>The above register schema with the <code>Problem</code> type would look like this -</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="err">Mutation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">register</span><span class="p">(</span><span class="n">email</span><span class="p">:</span><span class="w"> </span><span class="no">String</span><span class="err">!</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">:</span><span class="w"> </span><span class="no">String</span><span class="err">!</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">RegisterResult</span>
<span class="p">}</span>
<span class="k">union</span><span class="w"> </span><span class="err">RegisterResult</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">RegisterSuccess</span><span class="w"> </span><span class="err">|</span><span class="w"> </span><span class="err">RegisterProblem</span>
<span class="k">type</span><span class="w"> </span><span class="err">RegisterSuccess</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="w"> </span><span class="nl">email</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="err">!</span>
<span class="p">}</span>
<span class="k">type</span><span class="w"> </span><span class="err">RegisterProblem</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="err">"</span><span class="n">translated</span><span class="w"> </span><span class="n">message</span><span class="w"> </span><span class="n">encompassing</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">invalid</span><span class="w"> </span><span class="n">inputs</span><span class="err">."</span>
<span class="w"> </span><span class="nl">title</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="err">!</span>
<span class="w"> </span><span class="nl">invalidInputs</span><span class="p">:</span><span class="w"> </span><span class="err">[</span><span class="n">RegisterInvalidInput</span><span class="err">]</span>
<span class="p">}</span>
<span class="k">type</span><span class="w"> </span><span class="err">RegisterInvalidInput</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">field</span><span class="p">:</span><span class="w"> </span><span class="n">RegisterInvalidInputField</span><span class="err">!</span>
<span class="w"> </span><span class="err">"</span><span class="n">translated</span><span class="w"> </span><span class="n">message</span><span class="err">."</span>
<span class="w"> </span><span class="nl">message</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="err">!</span>
<span class="p">}</span>
<span class="k">enum</span><span class="w"> </span><span class="err">RegisterInvalidInputField</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">EMAIL</span>
<span class="w"> </span><span class="n">PASSWORD</span>
<span class="p">}</span>
</code></pre></div>
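<p>A resolver returning this union might be sketched as follows. The helpers <code>validateRegistration</code> and <code>createAccount</code> are hypothetical stand-ins for real validation and persistence, and the German title is an illustrative translation:</p>

```javascript
// Hypothetical stubs standing in for real validation and persistence:
function validateRegistration({ email }) {
  return email.includes("@")
    ? []
    : [{ field: "EMAIL", message: "Die E-Mail-Addresse ist ungültig" }];
}

function createAccount(email) {
  return { id: "account-1" }; // pretend we persisted the account
}

const resolvers = {
  Mutation: {
    register(_, { email, password }) {
      const invalidInputs = validateRegistration({ email, password });
      if (invalidInputs.length > 0) {
        return {
          // __typename lets the executor pick the union member
          __typename: "RegisterProblem",
          title: "Registrierung fehlgeschlagen", // illustrative translated title
          invalidInputs,
        };
      }
      const account = createAccount(email);
      return { __typename: "RegisterSuccess", id: account.id, email };
    },
  },
};
```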
<h2>Problem or Error</h2>
<p><img alt="Errors vs Problems" src="https://engineering.zalando.com/posts/2021/04/images/problem-vs-error-2.jpg"></p>
<p><strong>Problem</strong> refers to the Error as a Schema type. <strong>Error</strong> refers to the Error that appears in the <code>response.errors</code> array with an error code at <code>error.extensions.code</code>.</p>
<h3>Case 1: Resource Not Found</h3>
<p>404s are bugs in the system in the case of navigation. If the Customer navigates from the home page to a product page and ends up on a 404 page, some service selected an id that resolves to a 404 - and it was most likely already invalid at selection time. It is not caused by anything the Customer entered. Also, these errors need to be propagated. So, this becomes an Error with the error code <code>NOT_FOUND</code>, and not a Problem.</p>
<h3>Case 2: Authorization</h3>
<p>Authorization errors are of the Error type and do not fit the Problem type. At first glance, the action taker seems to be the Customer, who needs to log in. But the UI can act here and show a login dialog box to the Customer; in apps, the app decides to take the Customer to the login view. The action belongs first to the front-end, and only then to the Customer. So, we model it for the Developer/front-end as an Error with the error code <code>NOT_AUTHORIZED</code>, and not a Problem.</p>
<h3>Case 3: Mutation Inputs</h3>
<p>Mutation inputs are the one case where it is crucial to construct Problem types. They contain input directly from the Customer, and only the Customer can act on it. So, we model these errors as Problems and not Errors.</p>
<h3>Case 4: All other bugs / errors</h3>
<p>Any runtime exception in the code, or Internal Server Errors from any backends that the GraphQL layer connects to, should be modeled as Errors and need not contain an error code. This way, the front-end can easily treat all responses without an error code as Internal Server Errors and act accordingly - retry, or show the Customer an error page.</p>
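<p>On the front-end, this handling can be sketched as a single dispatch on the error code; anything without a known code falls through to the generic error path. The codes and actions are illustrative assumptions:</p>

```javascript
// Sketch of the front-end side: anything without a recognised error code
// is treated as an Internal Server Error. Codes and actions are
// illustrative assumptions.
function handleGraphQLError(error) {
  const code = error.extensions && error.extensions.code;
  switch (code) {
    case "NOT_FOUND":
      return { action: "SHOW_404" };
    case "NOT_AUTHORIZED":
      return { action: "SHOW_LOGIN" };
    default:
      return { action: "SHOW_ERROR_PAGE" }; // retry or generic error page
  }
}
```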
<h2>Conclusion</h2>
<p>We have discussed the Problem type as a possible solution for cases where the error object in the GraphQL response does not suffice. But we have to be careful not to overuse it where the error extensions already provide enough value.</p>
<p>Using the Problem type in <strong>unnecessary</strong> places complicates both the query and the front-end code. Our GraphQL Schema should simplify and provide a friendly interface.</p>
<h2>Related posts</h2>
<p>In case you are interested, here are further posts in the GraphQL series -</p>
<ul>
<li><a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">Introduction to how we use GraphQL at Zalando</a></li>
<li><a href="https://engineering.zalando.com/posts/2023/10/understanding-graphql-directives-practical-use-cases-zalando.html">Understanding GraphQL Directives: Practical Use-Cases at Zalando</a></li>
<li><a href="https://engineering.zalando.com/posts/2022/02/graphql-persisted-queries-and-schema-stability.html">GraphQL persisted queries and Schema stability</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/03/optimize-graphql-server-with-lookaheads.html">Optimize GraphQL Server with Lookaheads</a></li>
</ul>
<hr>
<p>If you would like to work on similar challenges, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=95c8de231us&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B9%5D=Applied%20Science%20%26%20Research&filters%5Bcategories%5D%5B10%5D=Product%20Design%20%26%20User%20Experience&filters%5Bcategories%5D%5B11%5D=Product%20Management&search=software%20engineer">joining our engineering teams</a>.</p>
<h1>Optimize GraphQL Server with Lookaheads</h1>
<p>2021-03-18 - Boopathi Rajaa Nedunchezhiyan</p>
<p>GraphQL offers a way to optimize the data between a client and a server. We can use the declarative nature of a GraphQL query to perform lookaheads. Lookaheads provide us a way to optimize the data between the GraphQL server and a backend data provider - like a database or another server that can return partial responses.</p>
<p>In our first post about <a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">How we use GraphQL at Zalando</a>, we briefly covered performance optimizations using <a href="https://github.com/zalando-incubator/graphql-jit">GraphQL-JIT</a>. GraphQL-JIT allowed us to scale our implementation without performance degradation. In this post, we share another optimization we use - <strong>Lookaheads</strong>.</p>
<p><img alt="Lookaheads" src="https://engineering.zalando.com/posts/2021/03/images/lookaheads.png"></p>
<h2>Same Model; Different Views</h2>
<p>In our GraphQL service, we do not have resolvers for every single field in the schema. Instead, we have certain groups of fields resolved together as a single request to a backend service that provides the data. For example, let's take a look at the <code>product</code> resolver,</p>
<div class="highlight"><pre><span></span><code><span class="nx">resolvers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Query</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">product</span><span class="p">(</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ProductBackend</span><span class="p">.</span><span class="nx">getProduct</span><span class="p">(</span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">};</span>
</code></pre></div>
<p>This resolver is responsible for getting multiple properties of the <code>Product</code> - name, price, stock, images, material, sizes, brand, color, other colors, and further details. The same <strong>Product</strong> type in the schema can be rendered as a Product Card in a grid or as an entire Product Page. A Product Card requires far less data than the complete product details of a product page.</p>
<p><img alt="Different views of the same model" src="https://engineering.zalando.com/posts/2021/03/images/same-model-different-views.png"></p>
<p>Every time the product resolver is called, the GraphQL service requests the entire response from the product backend. Though GraphQL allows us to specify data requirements and fetch optimally, this benefit applies only to the client-server communication. The data transfer between the GraphQL server and the backend server remains unoptimized.</p>
<h2>Partial Responses</h2>
<p>Most backend services at Zalando support <a href="https://cloud.google.com/blog/products/api-management/restful-api-design-can-your-api-give-developers-just-information-they-need">partial responses</a>. In the request, one can specify a list of fields, and the response contains only these fields, trimming everything that was not requested. The backend service treats this list as a filter and returns only those fields. It is similar to what GraphQL offers us, and the request looks somewhat like this -</p>
<div class="highlight"><pre><span></span><code><span class="err">GET /product?id=product-id&fields=name,stock,price</span>
</code></pre></div>
<p>Here, the <code>fields</code> query parameter is used to declare the required response fields. The backend can use this to compute only those response fields. Likewise, the backend can pass it further down the pipeline to another service or database. The response for the above request would look like the following -</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Fancy T-Shirt"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"stock"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AVAILABLE"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"price"</span><span class="p">:</span><span class="w"> </span><span class="s2">"EUR 35.50"</span>
<span class="p">}</span>
</code></pre></div>
<p>Partial responses help in reducing the amount of data over the wire and give a good performance boost. A GraphQL query is also precisely the same thing - it provides a well-defined language for the fields parameter in the above request.</p>
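<p>Building such a partial-response request from a GraphQL server can be sketched in a few lines. The endpoint shape follows the example request above; the helper name is our own illustration:</p>

```javascript
// Sketch of building the partial-response request shown above.
// `productRequestUrl` is a hypothetical helper, not a real client API.
function productRequestUrl(id, fields) {
  const encoded = fields.map(encodeURIComponent).join(",");
  return `/product?id=${encodeURIComponent(id)}&fields=${encoded}`;
}
```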
<h2>Lookahead</h2>
<p>Let's leverage these partial responses and use them in the GraphQL server. When resolving the product, we must know which fields were requested within it - in other words, we need to <strong>look ahead</strong> in the query to get the sub-fields of the product.</p>
<div class="highlight"><pre><span></span><code><span class="k">query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"foo"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span>
<span class="w"> </span><span class="n">price</span>
<span class="w"> </span><span class="n">stock</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>A thing to note - name, stock, and price do not have explicitly declared resolvers. When resolving <strong>product</strong>, how can we know what its sub-selections are? Here, navigating the query <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST (Abstract Syntax Tree)</a> helps. During execution, the resolver function will receive the AST of the current field. The structure of the AST depends on the language and implementation. For <a href="https://github.com/graphql/graphql-js">GraphQL-JS</a>, or <a href="https://github.com/zalando-incubator/graphql-jit">GraphQL-JIT</a> executors, it is available in the last parameter (of the resolver function) which is called a <strong>Resolve Info</strong>.</p>
<div class="highlight"><pre><span></span><code><span class="nx">resolvers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Query</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">product</span><span class="p">(</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">getFields</span><span class="p">(</span><span class="nx">info</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ProductBackend</span><span class="p">.</span><span class="nx">getProduct</span><span class="p">(</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">fields</span><span class="p">);</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">};</span>
</code></pre></div>
<p>We use the query AST in the resolve info to compute the list of fields under product, pass this list of fields to the product backend, which supports partial responses, and then send the backend response as the resolved result.</p>
<h2>Field Nodes</h2>
<p>The resolve info is useful for doing a lot of optimizations. Here, for this case, we are interested in the <strong>fieldNodes</strong>. It is an array of objects, each representing the same field - in this case - <strong>product</strong>. Why is it an array? A single field may appear in more than one place in a query - for instance, fragments, inline fragments, aliasing, etc. For simplicity, we will not consider fragments and aliasing in this post.</p>
<p>The entire query is a tree of field nodes where the children at each level are available as selection sets.</p>
<p>Each fieldNode has a <strong>Selection Set</strong> - a list of <strong>subfield nodes</strong>. Here, the selection set will be the field nodes of name, stock, and price. So the <code>getFields</code> implementation (without considering fragments and aliasing) will look like the following -</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span><span class="w"> </span><span class="nx">getFields</span><span class="p">(</span><span class="nx">info</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: handle all field nodes in other fragments</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">info</span><span class="p">.</span><span class="nx">fieldNodes</span><span class="p">[</span><span class="mf">0</span><span class="p">].</span><span class="nx">selectionSet</span><span class="p">.</span><span class="nx">selections</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span>
<span class="w"> </span><span class="p">(</span><span class="nx">selection</span><span class="p">)</span><span class="w"> </span><span class="p">=></span>
<span class="w"> </span><span class="c1">// TODO: handle fragments</span>
<span class="w"> </span><span class="nx">selection</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span>
<span class="w"> </span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>When we pass the product resolver's info, the <code>getFields</code> function returns <code>["name", "stock", "price"]</code>. We can take this list and pass it to the backend as the query parameter.</p>
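<p>As a quick sanity check, here is a self-contained sketch with a hand-built mock of the resolve info; the nesting (<code>fieldNodes[0].selectionSet.selections</code>) mirrors the GraphQL-JS AST shape, and the product data is only for illustration -</p>

```javascript
// Minimal mock of the resolve info for the query:
//   { product(id: "foo") { name stock price } }
// The shape mirrors GraphQL-JS: info.fieldNodes[0].selectionSet.selections
const mockInfo = {
  fieldNodes: [
    {
      name: { value: "product" },
      selectionSet: {
        selections: [
          { name: { value: "name" } },
          { name: { value: "stock" } },
          { name: { value: "price" } },
        ],
      },
    },
  ],
};

function getFields(info) {
  // Fragments and aliases are ignored here, as in the post
  return info.fieldNodes[0].selectionSet.selections.map(
    (selection) => selection.name.value
  );
}

console.log(getFields(mockInfo)); // → [ 'name', 'stock', 'price' ]
```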
<p>For simple use-cases like these, where the backend data structure and the GraphQL schema are the same, it's possible to use GraphQL fields as the backend fields. When it's a bit different, we need to map the schema fields to backend fields for the request. Also, we need to map the backend fields back to schema fields for the response.</p>
<h2>Different schemas</h2>
<p>If the backend fields are different from the GraphQL schema fields, then there exists a mapping from schema fields to backend fields. A simple mapping may be the difference in the name of the fields. For example, <code>name</code> in schema might be <code>title</code> in the backend. This mapping can get complex where a single schema field might derive from multiple backend fields. For example, price in schema might be a concatenation of <em>currency</em> and <em>amount</em> from the backend. It gets interesting when we have nested structures - for example, <code>price</code> in schema might be a concatenation of <code>price.currency</code> and <code>price.amount</code>.</p>
<h2>The response is partial</h2>
<p>Another aspect of this mapping is that it's not enough to think about it one way - from schema fields to backend fields. That direction only covers the request from the GraphQL server to the backend server. The response that the backend sends must also be transformed back to match the schema, and that transformation isn't free when the field mapping has such complications.</p>
<p>When we have a single transform function that converts backend response to match the schema, we have to understand that it is built from a <a href="https://cloud.google.com/blog/products/api-management/restful-api-design-can-your-api-give-developers-just-information-they-need">partial response</a> and not the complete response -</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span><span class="w"> </span><span class="nx">backendProductToSchemaProduct</span><span class="p">(</span><span class="nx">backendProduct</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="nx">backendProduct</span><span class="p">.</span><span class="nx">title</span><span class="p">,</span>
<span class="w"> </span><span class="c1">// we have a problem here -</span>
<span class="w"> </span><span class="nx">price</span><span class="o">:</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">backendProduct</span><span class="p">.</span><span class="nx">price</span><span class="p">.</span><span class="nx">currency</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">backendProduct</span><span class="p">.</span><span class="nx">price</span><span class="p">.</span><span class="nx">amount</span><span class="si">}</span><span class="sb">`</span><span class="p">,</span>
<span class="w"> </span><span class="nx">stock</span><span class="o">:</span><span class="w"> </span><span class="nx">backendProduct</span><span class="p">.</span><span class="nx">stock_availability</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</code></pre></div>
<p>In the above implementation, when the query is <code>{ product(id) { name } }</code>, the transformer will try to convert as if the complete response were available. Since the backend responded with partial data (only the <code>title</code> field, because only <code>name</code> was requested), the access to a nested property will throw an error - <code>Cannot read property 'currency' of undefined</code>. We could add a <code>null</code> check at every place, but the code quickly becomes unmaintainable. So we need a way to model it both ways -</p>
<ol>
<li>Map schema fields to backend fields during the request to the backend</li>
<li>Map backend fields to schema fields with the response from the backend</li>
</ol>
<h2>Dependency Maps</h2>
<p>The mapping we talked about in our scribbling phase is what a dependency map is. Every schema field depends on one or many nested fields in the backend. A way to represent this can be as simple as an object whose keys are schema fields, and the values are a list of <a href="https://github.com/mariocasciaro/object-path#usage">object paths</a>.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span><span class="w"> </span><span class="nx">dependencyMap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"title"</span><span class="p">],</span>
<span class="w"> </span><span class="nx">price</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"price.currency"</span><span class="p">,</span><span class="w"> </span><span class="s2">"price.amount"</span><span class="p">],</span>
<span class="w"> </span><span class="nx">stock</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"stock_availability"</span><span class="p">],</span>
<span class="p">};</span>
</code></pre></div>
<p><img alt="Dependency Map" src="https://engineering.zalando.com/posts/2021/03/images/dependencies.png"></p>
<p>From this dependency map, we can create our request to the backend. Let's say the backend takes a query parameter <code>fields</code> in the following form - a comma-separated list of object path strings. Depending on the implementation, there can be a wide variety of formats for this. Here, we will take a simple one.</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span><span class="w"> </span><span class="nx">getBackendFields</span><span class="p">(</span><span class="nx">schemaFields</span><span class="p">,</span><span class="w"> </span><span class="nx">dependencyMap</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Set helps in deduping</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">backendFields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Set</span><span class="p">(</span>
<span class="w"> </span><span class="nx">schemaFields</span>
<span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">field</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">dependencyMap</span><span class="p">[</span><span class="nx">field</span><span class="p">])</span>
<span class="w"> </span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">acc</span><span class="p">,</span><span class="w"> </span><span class="nx">field</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">[...</span><span class="nx">acc</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">field</span><span class="p">],</span><span class="w"> </span><span class="p">[])</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="c1">// a Set has no join method, so spread it into an array first</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[...</span><span class="nx">backendFields</span><span class="p">].</span><span class="nx">join</span><span class="p">(</span><span class="s2">","</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>For the schema fields <code>name</code> and <code>price</code>, the computed backend fields collapse into the string <code>title,price.currency,price.amount</code>, and we can construct the request to the backend -</p>
<div class="highlight"><pre><span></span><code>GET /product?id=foo&amp;fields=title,price.currency,price.amount
</code></pre></div>
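<p>A runnable version of this step (repeating the helper with the <code>Set</code> spread fix so the snippet stands alone) -</p>

```javascript
// Dependency map from the post: schema field → backend object paths
const dependencyMap = {
  name: ["title"],
  price: ["price.currency", "price.amount"],
  stock: ["stock_availability"],
};

function getBackendFields(schemaFields, dependencyMap) {
  // Set helps in deduping; spread it before joining (Set has no join)
  const backendFields = new Set(
    schemaFields
      .map((field) => dependencyMap[field])
      .reduce((acc, paths) => [...acc, ...paths], [])
  );
  return [...backendFields].join(",");
}

console.log(getBackendFields(["name", "price"], dependencyMap));
// → title,price.currency,price.amount
```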
<h2>Transformation Maps</h2>
<p>After the request, we know that the backend returns a partial response instead of the complete response. We also saw above that a single function that transforms the entire backend response to schema fields is not enough. Here, we use a <strong>transformation map</strong>. It's a map of schema fields to transformation logic. Like the dependency map, the keys are schema fields, but the values are transform functions that use only specific fields from the backend.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span><span class="w"> </span><span class="nx">transformerMap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">resp</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">resp</span><span class="p">.</span><span class="nx">title</span><span class="p">,</span>
<span class="w"> </span><span class="nx">price</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">resp</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">resp</span><span class="p">.</span><span class="nx">price</span><span class="p">.</span><span class="nx">currency</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">resp</span><span class="p">.</span><span class="nx">price</span><span class="p">.</span><span class="nx">amount</span><span class="si">}</span><span class="sb">`</span><span class="p">,</span>
<span class="w"> </span><span class="nx">stock</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">resp</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">resp</span><span class="p">.</span><span class="nx">stock_availability</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div>
<p>As you see here, each value is a function that reads only the backend fields listed for that schema field in the <strong>dependency map</strong>. To construct the result object from the partial response of the backend, we take the same computed sub-fields (from the <code>getFields</code> function) and apply them to the transformer map. For example -</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span><span class="w"> </span><span class="nx">getSchemaResponse</span><span class="p">(</span><span class="nx">backendResponse</span><span class="p">,</span><span class="w"> </span><span class="nx">transformerMap</span><span class="p">,</span><span class="w"> </span><span class="nx">schemaFields</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">schemaResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">const</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="nx">schemaFields</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">schemaResponse</span><span class="p">[</span><span class="nx">field</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">transformerMap</span><span class="p">[</span><span class="nx">field</span><span class="p">](</span><span class="nx">backendResponse</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">schemaResponse</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
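<p>Putting the two maps side by side, a minimal sketch (the product data is invented for illustration) of how a partial backend response flows through the transformer map -</p>

```javascript
// Transformer map matching the nested backend shape from the dependency map
const transformerMap = {
  name: (resp) => resp.title,
  price: (resp) => `${resp.price.currency} ${resp.price.amount}`,
  stock: (resp) => resp.stock_availability,
};

function getSchemaResponse(backendResponse, transformerMap, schemaFields) {
  const schemaResponse = {};
  for (const field of schemaFields) {
    // Only the requested fields are transformed, so the partial
    // backend response never trips over missing properties
    schemaResponse[field] = transformerMap[field](backendResponse);
  }
  return schemaResponse;
}

// Partial response for fields=title,price.currency,price.amount
const partial = { title: "Sneaker", price: { currency: "EUR", amount: "89.95" } };
console.log(getSchemaResponse(partial, transformerMap, ["name", "price"]));
// → { name: 'Sneaker', price: 'EUR 89.95' }
```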
<h2>So far,</h2>
<p>Let's recap the concepts we have unwrapped so far -</p>
<ol>
<li><code>getFields</code>: compute sub-fields by looking ahead in AST</li>
<li><code>getBackendFields</code>: compute backend fields from sub-fields and dependency map</li>
<li>request the backend with the computed backend fields</li>
<li><code>getSchemaResponse</code>: compute schema response from partial backend response, sub-fields, and the transformer map</li>
</ol>
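<p>The four steps above can be wired together in one compact sketch. The helpers are the (corrected) versions from earlier sections, <code>mockBackend</code> is a hypothetical stand-in for the real backend call, and the product data is invented -</p>

```javascript
const dependencyMap = {
  name: ["title"],
  price: ["price.currency", "price.amount"],
};
const transformerMap = {
  name: (resp) => resp.title,
  price: (resp) => `${resp.price.currency} ${resp.price.amount}`,
};

// Step 1: look ahead in the AST for the requested sub-fields
function getFields(info) {
  return info.fieldNodes[0].selectionSet.selections.map((s) => s.name.value);
}

// Step 2: translate schema fields into deduped backend object paths
function getBackendFields(schemaFields, map) {
  return [...new Set(schemaFields.flatMap((f) => map[f]))].join(",");
}

// Step 3 (mocked): pretend the backend honors ?fields= with a partial response
async function mockBackend(id, fields) {
  return { title: "Sneaker", price: { currency: "EUR", amount: "89.95" } };
}

// Step 4: build the schema response from the partial backend response
function getSchemaResponse(resp, map, fields) {
  return Object.fromEntries(fields.map((f) => [f, map[f](resp)]));
}

async function resolveProduct(id, info) {
  const fields = getFields(info);
  const backendFields = getBackendFields(fields, dependencyMap);
  const backendResponse = await mockBackend(id, backendFields);
  return getSchemaResponse(backendResponse, transformerMap, fields);
}
```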
<h2>Batching</h2>
<p>At Zalando, like <a href="https://cloud.google.com/blog/products/api-management/restful-api-design-can-your-api-give-developers-just-information-they-need">partial responses</a>, most of our backends support batching multiple requests into a single request. Instead of getting a single resource by its <code>id</code>, most backends can get several resources by <code>ids</code>. For example,</p>
<div class="highlight"><pre><span></span><code>GET /products?ids=a,b,c&amp;fields=name
</code></pre></div>
<p>will return the response,</p>
<div class="highlight"><pre><span></span><code><span class="p">[{</span><span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"a"</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"b"</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="p">}]</span>
</code></pre></div>
<p>We should take advantage of such features. One of the popular libraries that aid us in batching is the <a href="https://github.com/graphql/dataloader">DataLoader</a> by Facebook.</p>
<p>We provide the dataloader with a batch function - an implementation that takes an array of inputs and returns an array of outputs/responses in the same order. The dataloader takes care of combining and batching requests from multiple places in the code in an optimal fashion. You can read more about it in the DataLoader <a href="https://github.com/graphql/dataloader">documentation</a>.</p>
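<p>DataLoader itself is the tool to reach for, but the core batching trick fits in a few lines. A toy version (not the real library - it has no caching, deduplication, or per-key error handling) that flushes all <code>load</code> calls made in the same tick as one batch -</p>

```javascript
// Toy illustration of the batching idea behind DataLoader:
// load() calls issued in the same tick are collected and
// dispatched to the batch function as a single call.
function makeLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // The first call in this tick schedules the flush; later calls join it
        queueMicrotask(async () => {
          const batch = queue;
          queue = [];
          try {
            const results = await batchFn(batch.map((item) => item.key));
            batch.forEach((item, i) => item.resolve(results[i]));
          } catch (err) {
            batch.forEach((item) => item.reject(err));
          }
        });
      }
    });
  };
}

// Two load calls in the same tick produce one batched backend call
const batches = [];
const load = makeLoader(async (ids) => {
  batches.push(ids);
  return ids.map((id) => ({ id }));
});
Promise.all([load("foo"), load("bar")]).then(() => {
  console.log(batches); // → [ [ 'foo', 'bar' ] ]
});
```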
<h2>Dataloader for Product resolver</h2>
<p>When a Product appears in multiple parts of the same GraphQL query, each will create separate requests to the backend. For example, let's consider this simple GraphQL query -</p>
<div class="highlight"><pre><span></span><code><span class="k">query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">foo</span><span class="p">:</span><span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"foo"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span><span class="n">productCardFields</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nl">bar</span><span class="p">:</span><span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"bar"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span><span class="n">productCardFields</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The products <code>foo</code> and <code>bar</code> are batched together into a single query using aliasing. If we implement a resolver for a product that naively calls the ProductBackend, we will end up with <strong>two</strong> separate requests. Our goal is to make it a single request. We can implement this with a dataloader -</p>
<div class="highlight"><pre><span></span><code><span class="k">async</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">getProductsByIds</span><span class="p">(</span><span class="nx">ids</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// fetch resolves to a Response; parse the JSON body</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">fetch</span><span class="p">(</span><span class="sb">`/products?ids=</span><span class="si">${</span><span class="nx">ids</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s2">","</span><span class="p">)</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">productLoader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">DataLoader</span><span class="p">(</span><span class="nx">getProductsByIds</span><span class="p">);</span>
</code></pre></div>
<p>We can use this <code>productLoader</code> in our <code>product</code> resolver -</p>
<div class="highlight"><pre><span></span><code><span class="nx">resolvers</span><span class="p">.</span><span class="nx">Query</span><span class="p">.</span><span class="nx">product</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">async</span><span class="w"> </span><span class="p">(</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">product</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">productLoader</span><span class="p">.</span><span class="nx">load</span><span class="p">(</span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">product</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div>
<p>The Dataloader takes care of the magic of combining multiple calls to the load method into a single call to our implementation - <code>getProductsByIds</code>.</p>
<h2>Complexities</h2>
<p>The DataLoader deduplicates inputs, optionally caches the outputs and also provides a way to customize these behaviours. In the <code>productLoader</code> defined above, our input is the product <strong>id</strong> - a <strong>string</strong>. When we introduce the concept of <a href="https://cloud.google.com/blog/products/api-management/restful-api-design-can-your-api-give-developers-just-information-they-need">partial responses</a>, the backend expects more than just the <code>id</code> - it also expects the <code>fields</code> parameter used to select the fields for the response. So our input to the loader is not just a string - let's say, it's an object with the keys <code>id</code> and <code>fields</code>. The dataloader implementation now becomes -</p>
<div class="highlight"><pre><span></span><code><span class="k">async</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">getProductsByIds</span><span class="p">(</span><span class="nx">inputs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">inputs</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">input</span><span class="p">.</span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="c1">//</span>
<span class="w"> </span><span class="c1">// We have a problem here</span>
<span class="w"> </span><span class="c1">// v</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">inputs</span><span class="p">[</span><span class="mf">0</span><span class="p">].</span><span class="nx">fields</span><span class="p">;</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">fetch</span><span class="p">(</span>
<span class="w"> </span><span class="sb">`/products?ids=</span><span class="si">${</span><span class="nx">ids</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s2">","</span><span class="p">)</span><span class="si">}</span><span class="sb">&fields=</span><span class="si">${</span><span class="nx">fields</span><span class="si">}</span><span class="sb">`</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div>
<p>In the above code block, the problem is highlighted with a comment - each of the <code>productLoader.load</code> calls can carry a different set of fields, yet only the first input's fields are sent. What is our strategy for merging all of these fields? And why do we need to merge at all?</p>
<p>Let's go back to an example and understand why we should handle this -</p>
<div class="highlight"><pre><span></span><code><span class="k">query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">foo</span><span class="p">:</span><span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"foo"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nl">bar</span><span class="p">:</span><span class="w"> </span><span class="n">product</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="s">"bar"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">price</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The product <code>foo</code> requires <strong>name</strong> and product <code>bar</code> requires <strong>price</strong>. If we remind ourselves how this gets translated to backend fields using the dependency map, we end up with the following calls -</p>
<div class="highlight"><pre><span></span><code><span class="nx">productLoader</span><span class="p">.</span><span class="nx">load</span><span class="p">({</span>
<span class="w"> </span><span class="nx">id</span><span class="o">:</span><span class="w"> </span><span class="s2">"foo"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"name"</span><span class="p">],</span>
<span class="p">});</span>
<span class="nx">productLoader</span><span class="p">.</span><span class="nx">load</span><span class="p">({</span>
<span class="w"> </span><span class="nx">id</span><span class="o">:</span><span class="w"> </span><span class="s2">"bar"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"price.currency"</span><span class="p">,</span><span class="w"> </span><span class="s2">"price.amount"</span><span class="p">],</span>
<span class="p">});</span>
</code></pre></div>
<p>If these two calls get into a single batch, we need to merge the fields such that both of them work during the transformation of backend fields to schema fields. Unfortunately, most backends cannot select a different set of fields per id within a single request. If yours can, you probably do not need merging. But for our use-case and probably many others, let's continue assuming merging is necessary.</p>
<h2>Merging fields</h2>
<p><img alt="Merge fields and IDs" src="https://engineering.zalando.com/posts/2021/03/images/merge.png"></p>
<p>In the above example, the correct request to the backend would be -</p>
<div class="highlight"><pre><span></span><code>GET /products
  ? ids = foo , bar
  &amp; fields = name , price.currency , price.amount
</code></pre></div>
<p>The merge strategy is quite simple; it's a union of all the fields. Structurally we need the following transformation - <code>[ { id, fields } ]</code> to <code>{ ids, mergedFields }</code>. The following implementation merges the inputs -</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span><span class="w"> </span><span class="nx">mergeInputs</span><span class="p">(</span><span class="nx">inputs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Set</span><span class="p">();</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">const</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="nx">inputs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ids</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">input</span><span class="p">.</span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">const</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="nx">input</span><span class="p">.</span><span class="nx">fields</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fields</span><span class="p">.</span><span class="nx">add</span><span class="p">(</span><span class="nx">field</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ids</span><span class="p">,</span>
<span class="w"> </span><span class="nx">mergedFields</span><span class="o">:</span><span class="w"> </span><span class="p">[...</span><span class="nx">fields</span><span class="p">].</span><span class="nx">join</span><span class="p">(</span><span class="s2">","</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</code></pre></div>
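<p>Applying the merge to the two <code>load</code> calls from the aliasing example gives exactly the merged request shown above (the helper is repeated here, with each input carrying a single <code>id</code>, so the snippet stands alone) -</p>

```javascript
// Merge [ { id, fields } ] into { ids, mergedFields }
function mergeInputs(inputs) {
  const ids = [];
  const fields = new Set(); // Set dedupes fields across inputs
  for (const input of inputs) {
    ids.push(input.id);
    for (const field of input.fields) {
      fields.add(field);
    }
  }
  return { ids, mergedFields: [...fields].join(",") };
}

console.log(
  mergeInputs([
    { id: "foo", fields: ["name"] },
    { id: "bar", fields: ["price.currency", "price.amount"] },
  ])
);
// → { ids: [ 'foo', 'bar' ], mergedFields: 'name,price.currency,price.amount' }
```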
<h2>Putting it all together</h2>
<p>Combining all the little things we handled so far, the flow for the <code>product</code> field resolution would be -</p>
<ol>
<li><code>getFields</code>: compute sub-fields by looking ahead in AST</li>
<li><code>getBackendFields</code>: compute the list of backend fields from sub-fields and dependency map</li>
<li><code>productLoader.load({ id, fields: backendFields })</code>: use the product loader to schedule fetching a product in the dataloader.</li>
<li><code>mergeInputs</code>: merge the different inputs to the dataloader into a list of ids and a union of the backend fields from all inputs.</li>
<li>Send the batched input as a request to the backend and get the partial response</li>
<li><code>getSchemaResponse</code>: compute schema fields from partial backend response, sub-fields computed in the first step, and the transformer map</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="kd">const</span><span class="w"> </span><span class="nx">productLoader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">DataLoader</span><span class="p">(</span><span class="nx">getBackendProducts</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">resolvers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Query</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">async</span><span class="w"> </span><span class="nx">product</span><span class="p">(</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">__</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">getFields</span><span class="p">(</span><span class="nx">info</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">backendFields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">getBackendFields</span><span class="p">(</span><span class="nx">fields</span><span class="p">,</span><span class="w"> </span><span class="nx">dependencyMap</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">backendResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">productLoader</span><span class="p">.</span><span class="nx">load</span><span class="p">({</span>
<span class="w"> </span><span class="nx">id</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fields</span><span class="o">:</span><span class="w"> </span><span class="nx">backendFields</span><span class="p">,</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">schemaResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">getSchemaResponse</span><span class="p">(</span>
<span class="w"> </span><span class="nx">backendResponse</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fields</span><span class="p">,</span>
<span class="w"> </span><span class="nx">transformerMap</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">schemaResponse</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">};</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">dependencyMap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"title"</span><span class="p">],</span>
<span class="w"> </span><span class="nx">price</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"price.currency"</span><span class="p">,</span><span class="w"> </span><span class="s2">"price.amount"</span><span class="p">],</span>
<span class="w"> </span><span class="nx">stock</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"stock_availability"</span><span class="p">],</span>
<span class="p">};</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">transformerMap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">resp</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">resp</span><span class="p">.</span><span class="nx">title</span><span class="p">,</span>
<span class="w">  </span><span class="nx">price</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">resp</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">resp</span><span class="p">.</span><span class="nx">price</span><span class="p">.</span><span class="nx">currency</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">resp</span><span class="p">.</span><span class="nx">price</span><span class="p">.</span><span class="nx">amount</span><span class="si">}</span><span class="sb">`</span><span class="p">,</span>
<span class="w"> </span><span class="nx">stock</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">resp</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">resp</span><span class="p">.</span><span class="nx">stock_availability</span><span class="p">,</span>
<span class="p">};</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">getFields</span><span class="p">(</span><span class="nx">info</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">info</span><span class="p">.</span><span class="nx">fieldNodes</span><span class="p">[</span><span class="mf">0</span><span class="p">].</span><span class="nx">selectionSet</span><span class="p">.</span><span class="nx">selections</span><span class="w"> </span><span class="c1">// TODO: handle all field nodes in other fragments</span>
<span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">(</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">selection</span><span class="w"> </span><span class="c1">// TODO: handle fragments</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">selection</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span>
<span class="w"> </span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">getBackendFields</span><span class="p">(</span><span class="nx">schemaFields</span><span class="p">,</span><span class="w"> </span><span class="nx">dependencyMap</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Set helps in deduping</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">backendFields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Set</span><span class="p">(</span>
<span class="w"> </span><span class="nx">schemaFields</span>
<span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">field</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">dependencyMap</span><span class="p">[</span><span class="nx">field</span><span class="p">])</span>
<span class="w"> </span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">acc</span><span class="p">,</span><span class="w"> </span><span class="nx">field</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">[...</span><span class="nx">acc</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">field</span><span class="p">],</span><span class="w"> </span><span class="p">[])</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">backendFields</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">async</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">getBackendProducts</span><span class="p">(</span><span class="nx">inputs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">ids</span><span class="p">,</span><span class="w"> </span><span class="nx">mergedFields</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">mergeInputs</span><span class="p">(</span><span class="nx">inputs</span><span class="p">);</span>
<span class="w">  </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">fetch</span><span class="p">(</span>
<span class="w">    </span><span class="sb">`/products?ids=</span><span class="si">${</span><span class="nx">ids</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s2">","</span><span class="p">)</span><span class="si">}</span><span class="sb">&fields=</span><span class="si">${</span><span class="nx">mergedFields</span><span class="si">}</span><span class="sb">`</span>
<span class="w">  </span><span class="p">);</span>
<span class="w">  </span><span class="c1">// fetch resolves to a Response; parse the JSON body</span>
<span class="w">  </span><span class="kd">const</span><span class="w"> </span><span class="nx">products</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span>
<span class="w">  </span><span class="k">return</span><span class="w"> </span><span class="nx">products</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">mergeInputs</span><span class="p">(</span><span class="nx">inputs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Set</span><span class="p">();</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">const</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="nx">inputs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="nx">ids</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">input</span><span class="p">.</span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">const</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="nx">input</span><span class="p">.</span><span class="nx">fields</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fields</span><span class="p">.</span><span class="nx">add</span><span class="p">(</span><span class="nx">field</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ids</span><span class="p">,</span>
<span class="w"> </span><span class="nx">mergedFields</span><span class="o">:</span><span class="w"> </span><span class="p">[...</span><span class="nx">fields</span><span class="p">].</span><span class="nx">join</span><span class="p">(</span><span class="s2">","</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">getSchemaResponse</span><span class="p">(</span><span class="nx">backendResponse</span><span class="p">,</span><span class="w"> </span><span class="nx">schemaFields</span><span class="p">,</span><span class="w"> </span><span class="nx">transformerMap</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">schemaResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">const</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="nx">schemaFields</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">schemaResponse</span><span class="p">[</span><span class="nx">field</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">transformerMap</span><span class="p">[</span><span class="nx">field</span><span class="p">](</span><span class="nx">backendResponse</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">schemaResponse</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
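<p>To make the data flow concrete, the pure helpers above can be exercised standalone. The snippet below repeats the two maps and the two pure functions with sample data; the nested shape of the backend's <code>price</code> object and the sample values are assumptions for illustration:</p>

```javascript
// Standalone sketch of the helpers above. The backend response shape
// (a nested `price` object) is an assumption for illustration.
const dependencyMap = {
  name: ["title"],
  price: ["price.currency", "price.amount"],
  stock: ["stock_availability"],
};

const transformerMap = {
  name: (resp) => resp.title,
  price: (resp) => `${resp.price.currency} ${resp.price.amount}`,
  stock: (resp) => resp.stock_availability,
};

function getBackendFields(schemaFields, dependencyMap) {
  // Set dedupes backend fields shared by several schema fields
  return new Set(schemaFields.flatMap((field) => dependencyMap[field]));
}

function getSchemaResponse(backendResponse, schemaFields, transformerMap) {
  const schemaResponse = {};
  for (const field of schemaFields) {
    schemaResponse[field] = transformerMap[field](backendResponse);
  }
  return schemaResponse;
}

// A query selecting only `name` and `price` ...
const schemaFields = ["name", "price"];
const backendFields = [...getBackendFields(schemaFields, dependencyMap)];
console.log(backendFields);
// → ["title", "price.currency", "price.amount"]

// ... is answered from a partial backend response containing just those fields.
const backendResponse = { title: "Sneaker", price: { currency: "EUR", amount: 99 } };
console.log(getSchemaResponse(backendResponse, schemaFields, transformerMap));
// → { name: "Sneaker", price: "EUR 99" }
```

<p>Only <code>title</code>, <code>price.currency</code>, and <code>price.amount</code> are requested from the backend, and the transformer map reshapes the partial response back into the schema's field names.</p>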
<h2>Conclusion</h2>
<p>The code, patterns, and nuances we have seen here may differ across applications and languages. The critical point is to leverage the declarative nature of GraphQL and optimize for user experience at every stage of a request's lifecycle.</p>
<p>Field filtering using dependency maps and transformer maps lets us manage the complexity of optimizing GraphQL servers for performance. Though it looks like a lot of work, at runtime it outperforms the unoptimized alternative, which pays for parsing huge JSON responses, transferring the extra bytes, and having the backend construct data nobody asked for.</p>
<p>You also have to weigh whether such optimizations pay off for every backend. As the GraphQL schema grows, these solutions scale well; at Zalando's scale, they have proved better than transferring a giant unoptimized blob of data.</p>
<hr>
<p>If you would like to work on similar challenges, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=95c8de231us&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B9%5D=Applied%20Science%20%26%20Research&filters%5Bcategories%5D%5B10%5D=Product%20Design%20%26%20User%20Experience&filters%5Bcategories%5D%5B11%5D=Product%20Management&search=software%20engineer">joining our engineering teams</a>.</p>
<hr>
<h2>Related posts</h2>
<ul>
<li><a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">Introduction to how we use GraphQL at Zalando</a></li>
<li><a href="https://engineering.zalando.com/posts/2023/10/understanding-graphql-directives-practical-use-cases-zalando.html">Understanding GraphQL Directives: Practical Use-Cases at Zalando</a></li>
<li><a href="https://engineering.zalando.com/posts/2022/02/graphql-persisted-queries-and-schema-stability.html">GraphQL persisted queries and Schema stability</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/04/modeling-errors-in-graphql.html">Modeling Errors in GraphQL</a></li>
</ul>
<h1><a href="https://engineering.zalando.com/posts/2021/03/flexbox-layout-behavior-in-jetpack-compose.html">Flexbox Layout Behavior in Jetpack Compose</a></h1>
<p>2021-03-16, by Andy Dyer</p>
<p>Much of the layout behavior defined in the flexbox spec has a direct analog in Jetpack Compose.</p>
<h3>Introduction</h3>
<p>The <a href="https://drafts.csswg.org/css-flexbox-1/">CSS Flexible Box Layout specification</a> (AKA flexbox) is a useful abstraction for describing layouts in a platform agnostic way. For this reason, it is widely used on the web and even <a href="https://github.com/google/flexbox-layout">on mobile</a>. Readers familiar with <a href="https://developer.android.com/reference/androidx/constraintlayout/widget/ConstraintLayout"><code>ConstraintLayout</code></a> can think of flexbox as conceptually similar to the <a href="https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow"><code>Flow</code></a> virtual layout it supports. This type of layout is ideal for grids or other groups of views with varying sizes.</p>
<p>In the <a href="https://play.google.com/store/apps/details?id=de.zalando.mobile">Zalando Fashion</a> <a href="https://apps.apple.com/de/app/zalando-fashion-and-shopping/id585629514">Store apps</a>, we are using flexbox to define the layout of our backend-driven screens, which I <a href="http://andydyer.org/blog/2019/12/22/appcraft-faster-than-a-speeding-release-train/">spoke about previously</a>. Thus far, we have been using <a href="https://github.com/facebook/litho">Litho</a> on Android and <a href="https://github.com/TextureGroup/Texture">Texture</a> on iOS (both of which use the flexbox based <a href="https://github.com/facebook/yoga">Yoga layout engine</a>) for rendering backend driven screens because they support things that are essential when building fully dynamic UI at runtime such as async layout, efficient diffing of changes, and view flattening.</p>
<p>As Google prepares <a href="https://developer.android.com/jetpack/compose">Jetpack Compose</a> (now in beta) for production release, we have started evaluating it as a successor to Litho. Compose offers numerous <a href="https://developer.android.com/reference/kotlin/androidx/compose/foundation/layout/package-summary#top-level-functions">layout composables</a>, many with bits of flexbox like behavior. However, there is no <code>Flexbox</code> composable that does it all and no blog post explaining how flexbox concepts map to Compose, so I wrote this one. I also built <a href="https://github.com/abdyer/flexbox-compose">this sample app</a>, parts of which I will reference in code examples below.</p>
<p>Before we continue, yes, I know technically it's called <em>Compose UI</em> and not simply <em>Compose</em>, but <a href="https://jakewharton.com/a-jetpack-compose-by-any-other-name/">as Jake said</a>, most of us are already thinking of it this way. Insert a "UI" where necessary while reading if you'd like.</p>
<h3>Flex</h3>
<p>Let's start with the flex attributes, which describe the direction, size, and horizontal/vertical alignment of a layout's children.</p>
<h4>Flex Direction</h4>
<p><a href="https://drafts.csswg.org/css-flexbox-1/#flex-direction-property">Flex direction</a> specifies whether items are arranged vertically or horizontally. Compose has <a href="https://developer.android.com/reference/kotlin/androidx/compose/foundation/layout/package-summary#row"><code>Row</code></a> and <a href="https://developer.android.com/reference/kotlin/androidx/compose/foundation/layout/package-summary#column"><code>Column</code></a> composables that work for simple horizontal and vertical layouts.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">RowExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Row</span><span class="p">(</span>
<span class="w"> </span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">fillMaxWidth</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">padding</span><span class="p">(</span><span class="n">bottom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">background</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">primaryVariant</span><span class="p">),</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>If <a href="https://drafts.csswg.org/css-flexbox-1/#flex-wrap-property">flex wrap</a> behavior is needed to control how items wrap across multiple rows, the <a href="https://developer.android.com/reference/kotlin/androidx/compose/foundation/layout/package-summary#flowrow"><code>FlowRow</code></a> and <a href="https://developer.android.com/reference/kotlin/androidx/compose/foundation/layout/package-summary#flowcolumn"><code>FlowColumn</code></a> composables will do this. However, <a href="https://android-review.googlesource.com/c/platform/frameworks/support/+/1521704">these were deprecated</a> before I even finished writing this article, so the best we can do is use the old implementation as a reference for our own.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Deprecated</span>
<span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">FlowRowExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">FlowRow</span><span class="p">(</span>
<span class="w"> </span><span class="n">mainAxisSpacing</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">8.</span><span class="n">dp</span><span class="p">,</span>
<span class="w"> </span><span class="n">crossAxisSpacing</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">8.</span><span class="n">dp</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">repeat</span><span class="p">(</span><span class="m">20</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Child</span><span class="p">(</span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">48.</span><span class="n">dp</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">24.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The above code results in the following UI:
<img alt="Flex wrap example" src="https://engineering.zalando.com/posts/2021/03/images/flex-wrap.jpg"></p>
<h4>Flex Grow & Shrink</h4>
<p><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/flex-grow">Flex grow</a> controls how children will expand to fill available space in their parent layout. <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/flex-shrink">Flex shrink</a> is its opposite, controlling how children will shrink relative to siblings if their parent layout does not have room for all of them.</p>
<p>Use the <code>weight()</code> modifier for flex grow behavior. Compose does not really have a flex shrink analog, but with its variety of layout composables, this can be overcome with a different approach in most cases. Depending on your specific needs, one approach could be to use <code>Modifier.preferredWidth(IntrinsicSize.Min)</code> to specify that a composable should not take up any more space than its children require. You can read more about it <a href="https://jetc.dev/slack/2021-01-17-matching-parent-size.html">here</a> in this question reposted from the <a href="https://slack.kotlinlang.org/">kotlinlang Slack</a> in Mr. Mark Murphy's excellent <a href="https://jetc.dev">jetc.dev</a> newsletter.</p>
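<p>As a rough sketch of that intrinsic-size approach (an assumption-laden illustration, not a full flex shrink implementation; <code>Child</code> is the same placeholder composable used in the other examples):</p>

```kotlin
@Composable
fun IntrinsicWidthExample() {
    // The row takes only the minimum width its children require,
    // instead of stretching to fill its parent.
    Row(
        modifier = Modifier
            .preferredWidth(IntrinsicSize.Min)
            .background(color = MaterialTheme.colors.primaryVariant),
    ) {
        Child()
        Child()
    }
}
```

<p>Children then size to their content, which approximates the effect of letting items shrink rather than overflow their container.</p>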
<div class="highlight"><pre><span></span><code><span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">FlexGrowExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Row</span><span class="p">(</span>
<span class="w"> </span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">fillMaxWidth</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">padding</span><span class="p">(</span><span class="n">bottom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">background</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">primaryVariant</span><span class="p">),</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">FlexChild</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">weight</span><span class="p">(</span><span class="m">1F</span><span class="p">))</span>
<span class="w"> </span><span class="n">FlexChild</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">weight</span><span class="p">(</span><span class="m">2F</span><span class="p">))</span>
<span class="w"> </span><span class="n">FlexChild</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">weight</span><span class="p">(</span><span class="m">1F</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The above code results in the following UI:
<img alt="Flex grow example" src="https://engineering.zalando.com/posts/2021/03/images/flex-grow.jpg"></p>
<p>When the utmost flexibility is needed, there's always <a href="https://developer.android.com/codelabs/jetpack-compose-layouts#5">implementing your own</a> <code>Layout</code> composable or the raw power of the <a href="https://developer.android.com/jetpack/compose/layout#contraintlayout">ConstraintLayout composable</a>, which can be used directly from Compose. If you don't mind reading Java instead of Kotlin, the implementation in Google's <a href="https://github.com/google/flexbox-layout/blob/master/flexbox/src/main/java/com/google/android/flexbox/FlexboxHelper.java"><code>flexbox-layout</code> library</a> is a good starting point for understanding the algorithm.</p>
<h3>Alignment</h3>
<p>Alignment controls how items are arranged on their vertical and horizontal axes. This can be done on a parent layout with the <code>*-content</code> properties or on the children themselves using the <code>*-self</code> properties.</p>
<h4>Main Axis</h4>
<p>Main axis alignment refers to how children are aligned on the main axis of their parent; horizontal for rows and vertical for columns. In the flexbox spec, this is known as <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/justify-content"><code>justify-content</code></a>. In Compose, main axis alignment is controlled by the <code>horizontalArrangement</code> parameter passed to <code>Row</code> and the <code>verticalArrangement</code> parameter passed to <code>Column</code>. Possible values include start/end, center, and space around/between/evenly.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">ArrangementExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Row</span><span class="p">(</span>
<span class="w"> </span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">fillMaxWidth</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">padding</span><span class="p">(</span><span class="n">bottom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">background</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">primaryVariant</span><span class="p">),</span>
<span class="w"> </span><span class="n">horizontalArrangement</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Arrangement</span><span class="p">.</span><span class="na">SpaceBetween</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The above code results in the following UI:
<img alt="Arrangement example" src="https://engineering.zalando.com/posts/2021/03/images/space-between.jpg"></p>
<h4>Cross Axis</h4>
<p>Cross axis alignment refers to how children are aligned on the non-main axis of their parent; vertical for rows and horizontal for columns. In the flexbox spec, <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/align-items"><code>align-items</code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/align-content"><code>align-content</code></a> control the alignment of children from the parent, while <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/align-self"><code>align-self</code></a> lets a child override that alignment for itself. In Compose, cross axis alignment is controlled by the <code>verticalAlignment</code> parameter passed to <code>Row</code>, the <code>horizontalAlignment</code> parameter passed to <code>Column</code>, and the <code>align</code> modifier on their child composables. Both accept start, end, and center as possible values.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">AlignmentExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Row</span><span class="p">(</span>
<span class="w"> </span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">fillMaxWidth</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">height</span><span class="p">(</span><span class="m">150.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">padding</span><span class="p">(</span><span class="n">bottom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">background</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">primaryVariant</span><span class="p">),</span>
<span class="w"> </span><span class="n">verticalAlignment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Alignment</span><span class="p">.</span><span class="na">CenterVertically</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="n">Child</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The above code results in the following UI:
<img alt="Alignment example" src="https://engineering.zalando.com/posts/2021/03/images/alignment.jpg"></p>
<p>You may have noticed that the space around/between/evenly options from <code>justify-content</code> are not listed for the cross axis. This is because Compose has no cross axis space around/between alignment. However, the same layout can be achieved by combining other composables.</p>
<p>Flexbox also specifies a <code>stretch</code> option for cross axis alignment. In Compose, the <code>stretch</code> equivalent would be individual children using the <code>fillMaxSize()</code>/<code>fillMaxWidth()</code>/<code>fillMaxHeight()</code> modifiers.</p>
<h3>Layout</h3>
<p>Finally, let's look at a few other attributes that affect a view's size and position.</p>
<h4>Aspect Ratio</h4>
<p>Compose's <code>aspectRatio()</code> modifier works exactly as you'd expect. It takes a float representing the desired ratio and uses that value to determine the size in the unspecified layout direction (width or height).</p>
<p>For example, specifying <code>fillMaxWidth()</code> and <code>aspectRatio(16F / 9F)</code> results in a rectangle that fills the width of the screen with a height corresponding to 9/16 of that width.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">AspectRatioExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Box</span><span class="p">(</span>
<span class="w"> </span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">padding</span><span class="p">(</span><span class="n">bottom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">background</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">secondary</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">fillMaxWidth</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">aspectRatio</span><span class="p">(</span><span class="m">16F</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">9F</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">border</span><span class="p">(</span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2.</span><span class="n">dp</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">secondaryVariant</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>The above code results in the following UI:
<img alt="Aspect ratio example" src="https://engineering.zalando.com/posts/2021/03/images/aspect-ratio.jpg"></p>
<h4>Padding & Margins</h4>
<p>Compose has a <code>padding()</code> modifier, but no margin modifier. Because modifiers are applied in order, <code>padding()</code> can play both roles: padding applied <em>before</em> a <code>background()</code> modifier leaves space outside the background, behaving like a margin, while padding applied <em>after</em> it leaves space inside, behaving like CSS padding. A margin can therefore be treated as extra padding applied at the right point in the modifier chain.</p>
<h4>Absolute Position</h4>
<p>When absolute positioning is needed to place one composable on top of another, the <a href="https://developer.android.com/reference/kotlin/androidx/compose/foundation/layout/package-summary#box"><code>Box</code></a> composable can be used. <code>Box</code> children can use the <code>align()</code> modifier to specify where they are aligned within the box including top start/center/end, bottom start/center/end, and center start/end.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Composable</span>
<span class="kd">fun</span><span class="w"> </span><span class="nf">AbsolutePositionExample</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Box</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Box</span><span class="p">(</span>
<span class="w"> </span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">fillMaxWidth</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="na">height</span><span class="p">(</span><span class="m">240.</span><span class="n">dp</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">background</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MaterialTheme</span><span class="p">.</span><span class="na">colors</span><span class="p">.</span><span class="na">primaryVariant</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">Child</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">align</span><span class="p">(</span><span class="n">Alignment</span><span class="p">.</span><span class="na">TopStart</span><span class="p">))</span>
<span class="w"> </span><span class="n">Child</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">align</span><span class="p">(</span><span class="n">Alignment</span><span class="p">.</span><span class="na">TopEnd</span><span class="p">))</span>
<span class="w"> </span><span class="n">Child</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">align</span><span class="p">(</span><span class="n">Alignment</span><span class="p">.</span><span class="na">BottomStart</span><span class="p">))</span>
<span class="w"> </span><span class="n">Child</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">align</span><span class="p">(</span><span class="n">Alignment</span><span class="p">.</span><span class="na">BottomEnd</span><span class="p">))</span>
<span class="w"> </span><span class="n">Child</span><span class="p">(</span><span class="n">modifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Modifier</span><span class="p">.</span><span class="na">align</span><span class="p">(</span><span class="n">Alignment</span><span class="p">.</span><span class="na">Center</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The above code results in the following UI:
<img alt="Absolute position example" src="https://engineering.zalando.com/posts/2021/03/images/absolute-position.jpg"></p>
<h3>Conclusion</h3>
<p>In this article, we have seen that much of the layout behavior defined in the flexbox spec has a direct analog in Compose, along with a few places where we have to do a bit more work to approximate certain concepts. Please see <a href="https://github.com/abdyer/flexbox-compose">the sample app repo</a> for the code as well as my first attempt at working with the <a href="https://developer.android.com/jetpack/compose/navigation">Compose Navigation library</a>.</p>
<p>During our recent Hack Week, we had a chance to spend more time with Compose. We were impressed with how easy it was to get started, and we managed to build a fairly performant Compose-powered implementation of our home screen. For a beta, it's quite promising!</p>
<p>Thanks for reading!</p>
<p><em>We're hiring! Consider joining <a href="https://jobs.zalando.com/en/tech/client-engineering">our teams</a> who work on the Zalando Mobile App and build user experiences for our customers.</em></p>
<hr>
<p><strong><a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Micro Frontends: from Fragments to Renderers (Part 1)</a></strong> (2021-03-11, Jan Brockmeyer)</p>
<p>Moving beyond Project Mosaic. Get an insight into the declarative view composition framework that powers new features for Zalando's website.</p>
<p>In 2015, we wanted to improve how we delivered features to customers and move away from a monolithic shop system. <a href="https://www.mosaic9.org/">Project Mosaic</a> and its microservices approach for the frontend were vital to support this transition. Mosaic enabled a relatively large number of teams to work on the main Zalando website <a href="https://engineering.zalando.com/posts/2015/10/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store.html">independently and without performance compromises</a>. At its core, Mosaic architecture relies on page Fragments, which are owned by different teams.</p>
<p>Mosaic helped us deliver features quickly and experiment at scale, contributing to Zalando’s growth, but we <a href="https://engineering.zalando.com/posts/2018/12/front-end-micro-services.html">identified limitations to the Fragments approach</a>. The main pain points for Zalando at that time were:</p>
<ul>
<li>Differences in tech stacks, bundling, and deployment practices across fragments led to inconsistent user experience and cross-team collaboration difficulties</li>
<li>A high entry barrier for teams contributing to the customer experience. To be able to add new features to the website, engineers had to<ul>
<li>build and operate their fragments (usually frontend and backend services)</li>
<li>discover and integrate with all the data sources</li>
<li>re-implement or adapt the UI</li>
<li>re-implement or adjust tracking & A/B testing</li>
</ul>
</li>
</ul>
<p>In 2018, we started designing Interface Framework (IF) to overcome these issues. The key goal of this transition was to build a platform that unified the tech stack and centralized the deployment and operation process for various parts of the Zalando website. It would enable a fully personalized customer experience and guarantee overall UX consistency based on a new design language.</p>
<p>Now, we'd like to give you an update on our approach to frontend development in the form of a blog series. The first part covers the key features of the new framework and provides an overview of its architecture.</p>
<h2>Why Interface Framework</h2>
<h3>Consistent Entity Data</h3>
<p>We identified a reasonably small number of content pieces in use by Zalando that can be visualized or used for personalization purposes. For example, a Product, a Collection, or an Outfit. When organized in tree-like structures, they can be used to define the layouts and content of the Zalando core user journey pages. When used individually, they can be the common language used across microservices to exchange data.</p>
<p>We call them Entities. Each Entity has a type and a unique id.</p>
<h3>Dynamic View & Content Composition</h3>
<p>Interface Framework supports dynamic composition of the user interface. It composes a page by forming a tree of nested Entities and transforming it into a tree of matching Renderers. The mapping of Entities to Renderers is specified in a declarative set of layout rules, which we call rendering rules. A Renderer is responsible for visualizing data related to an Entity.</p>
<p>Let's assume we are presenting a product page with some slots below the article to show additional content. Our personalization service chooses to provide three pieces of content: a collection, an outfit, and another collection. It determines what content the customers see on the page.</p>
<p>The Rendering Engine then decides to visualize the first collection as a carousel, the outfit as a card component, and the second collection as another carousel. It is responsible for how the content gets rendered to the customers.</p>
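<p>Conceptually, rendering rules are a declarative mapping from Entity types to Renderers. The following sketch is a hypothetical illustration of this idea; the type names, rule shape, and <code>resolveRenderer</code> helper are our assumptions, not Interface Framework's actual API:</p>

```typescript
// Hypothetical sketch: all names below are illustrative, not the real API.
type Entity = { type: string; id: string; children?: Entity[] };

// A rendering rule maps an entity type (optionally narrowed by the slot
// it appears in) to the Renderer used to visualize it.
type RenderingRule = { entityType: string; slot?: string; renderer: string };

const rules: RenderingRule[] = [
  { entityType: "product", renderer: "ProductMainView" },
  { entityType: "collection", renderer: "CollectionCarousel" },
  { entityType: "outfit", slot: "below-article", renderer: "OutfitCard" },
  { entityType: "outfit", renderer: "OutfitMainView" },
];

// Resolve the Renderer for one entity: a slot-specific rule wins over
// the entity type's default rule.
function resolveRenderer(entity: Entity, slot?: string): string | undefined {
  const specific = rules.find(
    (r) => r.entityType === entity.type && r.slot === slot
  );
  const fallback = rules.find(
    (r) => r.entityType === entity.type && r.slot === undefined
  );
  return (specific ?? fallback)?.renderer;
}
```

<p>For the product page scenario above, both collections would resolve to carousels, while the outfit in the slot below the article would resolve to a card component.</p>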
<h3>Integrated Monitoring</h3>
<p>Interface Framework automatically connects all views to the internal monitoring tools, ensuring that only the unified, user-consent-compliant, and thoroughly tested implementation is used. It helps to prevent incidents and disruptions in business reporting and personalization.</p>
<h3>Orchestrated A/B Testing</h3>
<p>A/B tests can now run in an orchestrated way to compare the results and make informed choices. This ensures features are tested with a representative user base, using standardized A/B testing scenarios and KPIs to ease comparison between features. Defining and setting up global A/B tests also means reducing the overhead of doing it for every page.</p>
<p>The integration of <a href="https://engineering.zalando.com/posts/2021/01/experimentation-platform-part1.html">Zalando’s A/B testing platform</a> in IF allows us to:</p>
<ul>
<li>Implement experiments with only a few lines of code, and get the implementation automatically validated</li>
<li>Track experiments automatically without additional efforts to analyze the results</li>
<li>Continue managing experiments via the intuitive A/B testing platform UI</li>
<li>Keep experiment latency overhead low by batching all requests to the A/B testing platform for all Renderers</li>
</ul>
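<p>The batching point above can be illustrated with a small DataLoader-style sketch: experiment lookups made by individual Renderers during one render pass are collected and sent as a single request. The class and function names here are hypothetical, not the actual platform client:</p>

```typescript
// Hypothetical sketch of request batching; all names are illustrative.
type BatchFn = (keys: string[]) => Promise<string[]>;

class ExperimentBatcher {
  private queue: { key: string; resolve: (v: string) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn) {}

  // Each Renderer calls load() independently for its experiment key...
  load(key: string): Promise<string> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // ...but the flush runs once, on the next microtask, after every
        // Renderer in this pass has enqueued its lookup.
        queueMicrotask(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    const results = await this.batchFn(batch.map((e) => e.key));
    batch.forEach((e, i) => e.resolve(results[i]));
  }
}
```

<p>Ten Renderers asking for ten experiment assignments then cost one round trip to the A/B testing platform instead of ten.</p>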
<h3>Integrated Testing for Developers</h3>
<p>As Interface Framework provides a single integration point where all code is developed and deployed, we give developers access to deployment previews, which allow any open pull request to be previewed in an environment very close to production. This setup is different from the traditional staging approach. The preview deployment is connected to production endpoints and follows 100% production routing while ensuring that only authenticated developers can access it.</p>
<h3>Consistent UX Design</h3>
<p>For all pages running on Interface Framework, the look & feel, accessibility features, and actual components used are defined by a design system. Our server-side rendering framework, which we call Rendering Engine, takes over the complexity of component version management and optimizes client code bundle size.</p>
<h3>Page Performance Quality Gates</h3>
<p>We evaluated best practices from CI/CD pipelines for Fragments from various teams and combined them to measure the performance of pages served by Interface Framework. We support the following tools:</p>
<ul>
<li><strong><a href="https://github.com/GoogleChrome/lighthouse-ci">Lighthouse CI</a>:</strong> a tool to automatically run performance and accessibility tests for specific pages. It validates assertions with results and decides whether the current score is good enough for production.</li>
<li><strong>Bundle Size Limits:</strong> we have a tool to automatically compute and check bundle sizes for Renderers on every pull request. It shows the results for all Renderers that have changed with the current version being released.</li>
<li><strong>Client Metrics:</strong> we provide a built-in layer to report <a href="https://web.dev/vitals/">Web Vitals</a> and custom metrics to capture all Zalando pages’ user experience.</li>
</ul>
<h3>Increased Organizational Speed and Efficiency</h3>
<p>We are still organized around feature teams which have frontend engineers embedded. The main difference is that they now work in a monolithic repository providing a unified and automated environment that offers new joiners a quick onboarding. The teams develop features and UI elements within Renderers. These Renderers are associated with Entities that make up our new page semantics.</p>
<p>There is quite a cultural shift as some ownership lines are now blurred in Renderers, with multiple teams contributing to most of them. As a result, we now have a much more collaborative development environment where teams benefit from their best practices. A centralized repository also means it is easier to ship large project changes and contribute to other teams' code, supported by a set of contribution guidelines.</p>
<p>We now have an aligned set of modern frontend technologies (React, TypeScript, GraphQL), a centralized server infrastructure, a release process, and a robust set of monitoring capabilities with dashboards and alerts. We are more efficient in terms of operations, and new reliability patterns immediately impact the whole website.</p>
<h2>Architecture Overview</h2>
<p>The following chart gives an overview of the underlying architecture. It contains all the core components of Interface Framework.</p>
<p><img alt="Interface Framework's Architecture" src="https://engineering.zalando.com/posts/2021/03/images/architecture_if.png"></p>
<p>The <a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html"><strong>GraphQL API</strong></a> is a data aggregation layer. It is to become the primary data source for all web pages at Zalando and reduce data integration costs across many teams. It provides a unified way for accessing content as an output of personalization services like the Recommendation System.</p>
<p>The <strong>Rendering Engine</strong> is a backend service and client-side runtime running in Node.js and the browser. Its primary purpose is to resolve and render a tree of Entities for a given request. The Recommendation System controls the structure of this tree.</p>
<p>A <strong>Renderer</strong> is a self-contained, reusable piece of code that runs inside the Rendering Engine. It declaratively specifies all of its data dependencies via GraphQL and uses the Zalando Design System to represent a single Entity visually.</p>
<p>The mapping of Entities to Renderers is one-to-many since different visual representations are possible for an Entity. An outfit Entity, for example, can be represented as a main view or a card component within a collection. Each Renderer, on the other hand, corresponds to one specific Entity type.</p>
<p>We support a hybrid approach with Interface Framework. The Rendering Engine can serve views in different configurations:</p>
<ul>
<li>View is a Mosaic Template and only uses Fragments.</li>
<li>View contains both Renderers and Fragments.</li>
<li>View only consists of Renderers.</li>
</ul>
<p>This support for both rendering modes was, and still is, very beneficial for teams migrating their pages from Mosaic to IF. Currently, we serve around 90% of traffic via Interface Framework.</p>
<h2>Future Posts</h2>
<p>In upcoming posts, we will dive deeper into the framework’s core components and share what we have learned during the transition from Mosaic to Interface Framework.</p>
<ul>
<li><a href="https://engineering.zalando.com/posts/2021/09/micro-frontends-part2.html">Part 2: Deep Dive into Rendering Engine</a></li>
</ul>
<p><strong>Update 2023/07</strong>: See <a href="https://engineering.zalando.com/posts/2023/07/rendering-engine-tales-road-to-concurrent-react.html">Rendering Engine Tales: Road to Concurrent React</a> for an update on Rendering Engine and how we integrated React Concurrent features as part of our upgrade to React 18.</p>
<hr>
<p><em>We always look for talented Engineers to join Zalando as a <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&search=frontend&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Frontend">Frontend Engineer</a>!</em></p>
<hr>
<p><strong><a href="https://engineering.zalando.com/posts/2021/03/how-we-use-graphql-at-europes-largest-fashion-e-commerce-company.html">How we use GraphQL at Europe's largest fashion e-commerce company</a></strong> (2021-03-04, Aditya Pratap Singh)</p>
<p>Managing consistent and backwards-compatible APIs for Web and mobile App frontends is always a complex task in the long term. At Zalando, we have used GraphQL to solve some of the common problems of frontend data requirements while gaining speed of delivery in a large and quickly growing organisation. This article is about <strong>GraphQL as a Unified Backend-For-Frontend (UBFF)</strong> and is the first in a series of posts about problems we solved with our use of GraphQL at Zalando.</p><p><img alt="GraphQL logo" src="https://engineering.zalando.com/posts/2021/03/images/graphql.png#previewimage"></p>
<h2>Background</h2>
<p>Today's large-scale organizations leveraging microservice architecture face a plethora of problems at the data aggregation and presentation layers. Managing consistent and backwards-compatible APIs for Web and mobile App frontends is definitely one of the complex ones. The balance between a frontend developer's need for a consistent data source and a product manager's need to deliver new features quickly in a fast-paced, large organization is a tough nut to crack. It is very common for frontend developers to struggle to find the right backend service to deliver a given feature.</p>
<p>The <a href="https://samnewman.io/patterns/architectural/bff/">Backend-for-frontend (BFF)</a> concept is a pattern pioneered by SoundCloud wherein a backend application is created for every business- and frontend-specific use case. With our adoption of microservices at Zalando in 2015, we used this pattern to create a large number of BFFs for the Web product details page, Web wishlist page, mobile app wishlist view, mobile app home view and so on. The BFF pattern is very similar to Netflix’s approach of <a href="https://netflixtechblog.com/embracing-the-differences-inside-the-netflix-api-redesign-15fd8b3dc49d">Embracing the Differences</a>, which pointed out four key characteristics for APIs serving frontend applications:</p>
<ul>
<li>Embrace the differences of the devices</li>
<li>Separate content gathering from content formatting/delivery</li>
<li>Redefine the border between “Client” and “Server”</li>
<li>Distribute innovation</li>
</ul>
<p>While these two approaches addressed most of these concerns of frontend development, they also introduced other issues for a large organisation like Zalando:</p>
<ol>
<li>Lack of optimal balance between fast feature delivery and developer experience</li>
<li>Duplication of efforts due to the large number of Backend-for-Frontend microservices</li>
<li>Inconsistent experience for Zalando customers across platforms</li>
<li>Fragmented handling of <code>Security</code> and <code>Authentication</code> concerns</li>
<li>Fragmented <code>Observability</code> implementations</li>
</ol>
<p>Out of the above problems, <em>Inconsistent experience for Zalando customers across platforms</em> is a subtle one to understand. It is most evident when the same business logic and aggregation is implemented in multiple ways in multiple backends, leading to broken customer experiences. This is a classic example of <a href="https://en.wikipedia.org/wiki/Conway's_law">Conway's law</a>: the organization's structure leaks into the product, while users, who interact with several frontend applications of the same organization, expect a single coherent experience.</p>
<p>The diagram below shows the inconsistency issue that is not uncommon across different user interfaces for the same application if served via multiple backends. In the mobile app the delivery date range for an article on Zalando is <strong>5-9 Feb</strong> whereas in the desktop version it’s <strong>1-3 Feb</strong>. Even though this particular example is hypothetical, we have seen such inconsistent data bugs at Zalando in the past due to the different BFFs having fragmented logic across different services.</p>
<p><img alt="Inconsistent data across desktop and mobile" src="https://engineering.zalando.com/posts/2021/03/images/inconsistent-data.png"></p>
<p>All the above problems become significantly harder at large scale. We observed this at Zalando as well, and used our <em>Unified Backend-For-Frontend</em> graph of <code>Entities</code> approach to address most of these concerns.</p>
<h2>Our setup</h2>
<p><code>GraphQL</code> is a query language developed by Facebook to enable declarative data fetching. The users of the API declaratively specify the shape of the data requirement via the query and response structure they expect.</p>
<p>For example, in order to fetch the name of the example product mentioned above you can query it as:</p>
<p><img alt="graphql query" src="https://engineering.zalando.com/posts/2021/03/images/graphql-query.png"></p>
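<p>In text form, the query from the image above might look like the following sketch. The field names are illustrative, not Zalando's actual schema; the point is that the response mirrors the query's shape, so callers receive exactly the fields they asked for:</p>

```typescript
// Illustrative query; the real schema's field names may differ.
const query = `
  query ProductName($id: ID!) {
    product(id: $id) {
      name
    }
  }
`;

// The response mirrors the query's nesting, so it can be typed precisely.
type ProductNameResponse = {
  data: { product: { name: string } };
};

// A typical transport is a single POST to the GraphQL endpoint.
async function fetchProductName(endpoint: string, id: string): Promise<string> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables: { id } }),
  });
  const json = (await res.json()) as ProductNameResponse;
  return json.data.product.name;
}
```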
<p>From the <a href="https://spec.graphql.org/June2018/#sec-Overview">GraphQL specification design principles</a>, GraphQL was created with business requirements and hierarchical views in modern applications in mind:</p>
<blockquote>
<p><strong>Hierarchical</strong>: GraphQL specification recommends the language to be structured in hierarchy to be well suited for Hierarchical Views in modern frontend applications</p>
<p><strong>Product-centric</strong>: The evolution of a GraphQL schema is directly influenced by the product/business features being developed by frontend engineers</p>
</blockquote>
<p>These are the two main principles we have kept in mind at Zalando while building a single <strong>GraphQL API</strong> as a <strong>Unified Backend-For-Frontends (UBFFs)</strong> for all Web and mobile App frontend feature teams. We use a monorepo which has a shared ownership across 12+ domain teams using a set of contribution principles. This is similar to the <strong>one unified graph</strong> concept highlighted in <a href="https://principledgraphql.com/integrity#1-one-graph">Principled GraphQL</a>.</p>
<p>We use an <code>Entity</code> system where entities are the first-class citizens in the graph with our custom implementation of GraphQL specification (<a href="https://github.com/zalando-incubator/graphql-jit">graphql-jit</a>) for performance optimization. The Entities themselves represent content and domain models spread across the Zalando shop e.g. <code>Product</code>, <code>Campaign</code> (elaborating the Entity model will be its own post in the series). The overall application data flow looks like this.</p>
<p><img alt="Architecture and data flow across desktop and mobile" src="https://engineering.zalando.com/posts/2021/03/images/architecture.png"></p>
<p>We started with the GraphQL solution at Zalando in the first half of 2018 and have had the service in production since the end of 2018. The unified GraphQL schema has grown significantly in the last 2 years to a dense graph now with more than 12 domains and serves more than 80% of Web and 50% of the App use cases (as of February 2021).</p>
<h2>Advantages</h2>
<p>With our implementation of GraphQL running in production for the last 2 years at Zalando, we addressed most of the aforementioned concerns and observed multiple advantages including:</p>
<ul>
<li>Improved efficiency for developers to find and access data in one place as opposed to finding and integrating with the individual APIs.</li>
<li>Improved developer experience via GraphQL tools such as explorer with live assortment data.</li>
<li>Faster deployments, which means shipping features sooner and happier product managers.</li>
<li>Consistent customer experience across platforms with a single consistent data source for frontends.</li>
<li>Reduced duplication of effort to develop the same feature across platforms.</li>
<li>Easy to enforce governance and organisational best practices.</li>
<li>The GraphQL layer has a "No Business Logic" principle, which allows domain specific backend APIs to steer domain or platform (Web vs. App) specific content on their own.</li>
</ul>
<h2>Known concerns and challenges</h2>
<h3><a href="https://samnewman.io/patterns/architectural/bff/#reuse">Code reuse leading to bloated code base</a></h3>
<p>Our approach with GraphQL has been to avoid any platform or domain specific logic in the GraphQL layer and instead let the domain specific teams drive this via presentation layer backend services. This allows us to keep a business logic agnostic data-aggregation layer which serves frontend developers and also helps in operational maintenance.</p>
<p><img alt="Presentation layer ensuring business logic agnostic graph" src="https://engineering.zalando.com/posts/2021/03/images/responsibilities-architecture.png"></p>
<h3>Adoption and learning curve</h3>
<p>Given GraphQL was a new technology for our teams, it involved investment in terms of learning curve and adoption. We addressed the adoption using some common mechanisms:</p>
<ol>
<li><strong>One-stop-shop Documentation</strong>: We use a single <a href="https://documentation.divio.com/">structured documentation</a> with an embedded GraphQL editor, schema documentation, Voyager for schema exploration, and practice exercises to help our new users adopt GraphQL.</li>
<li><strong>Support chat</strong>: Just like any platform team, we provide a support channel for queries from users and contributors of the GraphQL service.</li>
<li><strong>Trainings</strong>: Since GraphQL was new at Zalando, we conducted adoption trainings in which 150+ developers learned about using GraphQL at Zalando, reaching a broad population of developers intending to switch.</li>
<li><strong>Consultation</strong>: GraphQL schema design is always a tricky topic even for frontend developers who can use GraphQL. In order to ensure a single, dense, unified graph, our team also provided consultation for all new domains being integrated into the Unified graph.</li>
</ol>
<p>These four measures have resulted in increasing the number of contributors to our monorepo from 50 to 150+ in 2020 and developers using GraphQL for feature development from 70 to 200.</p>
<h3><a href="http://www.designsmells.com/articles/does-your-architecture-smell/">God Component</a></h3>
<p>God component is a design smell in which a component is excessively large, either in terms of lines of code (LOC) or the number of classes. We have a monorepo for the unified GraphQL service, which makes it a potential architectural and operational risk. We address the architectural risk through shared ownership at Zalando, guided by a set of contribution principles. For the operational risk, we observe and address most issues with reliability patterns such as <code>Circuit breakers</code>, <code>Timeouts</code> and <code>Retry</code> patterns. We also introduced the <a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/bulkhead">Bulkhead pattern</a> to provide more fault tolerance and isolation by deploying the application to serve traffic per platform (separate deployments for Web and mobile Apps).</p>
<h2>Related work on Unified GraphQL</h2>
<p>Unified Graph is a known concept adopted by many large organisations. Below is a list of some that use a unified GraphQL graph in production:</p>
<ol>
<li>
<p>GitHub has a <a href="https://docs.github.com/en/graphql">GraphQL implementation with a single graph</a> covering all of its domains, including repos, users, the marketplace and more.</p>
</li>
<li>
<p>Shopify has single GraphQL implementations for its <a href="https://shopify.dev/docs/storefront-api/reference">StoreFront</a> (customer facing) and <a href="https://shopify.dev/docs/admin-api/graphql/reference">Admin</a> (merchant facing) APIs, allowing customers and partners to build experiences on the unified graph for each of those.</p>
</li>
<li>
<p>Airbnb has been working on a unified schema for its GraphQL solution, which it shared in a <a href="https://www.youtube.com/embed/pywcFELoU8E">GraphQL Summit 2019 talk</a>.</p>
</li>
<li>
<p>Expedia moved from REST-specific services to a <a href="https://www.apollographql.com/customers/expediagroup/">central data graph using GraphQL</a>, solving the problem that developers spent more time figuring out which REST endpoint to call than developing features.</p>
</li>
<li>
<p><a href="https://www.apollographql.com/docs/apollo-server/federation/introduction/">Apollo Federation</a> is Apollo's solution for providing a single data graph over multiple graphs across an organization. The difference between Zalando's Unified Graph and Apollo Federation is that instead of multiple graphs connected via a library and a gateway, we have a single service that connects all domains in a single schema graph. This has tradeoffs, which we address <a href="#god-component">here</a>, but we gain by keeping a single graph in terms of tooling, deployment and governance.</p>
</li>
<li>
<p>Netflix also has its own version of one-graph, used in the Netflix Studio ecosystem and elaborated on in <a href="https://netflixtechblog.com/how-netflix-scales-its-api-with-graphql-federation-part-1-ae3557c187e2">this blog post series</a>.</p>
</li>
</ol>
<h2>Conclusion and next steps</h2>
<p>The Unified Backend-For-Frontend (UBFF) GraphQL is not a silver bullet, but a tradeoff that has worked well for our frontend data-fetching problems at Zalando. The next articles in this series will cover other aspects of our usage of GraphQL at Zalando, such as <em>Observability</em>, <em>Performance Optimization</em>, <em>Security</em>, <em>Tooling</em> and <em>Errors</em>, which allowed us to scale adoption of the service to 200+ Web and App developers and serve the use cases of more than 25-30 feature teams.</p>
<h2>References</h2>
<ul>
<li><a href="https://samnewman.io/patterns/architectural/bff/">Backend For Frontend Pattern by Sam Newman</a></li>
<li><a href="https://netflixtechblog.com/embracing-the-differences-inside-the-netflix-api-redesign-15fd8b3dc49d">Netflix API redesign</a></li>
<li><a href="http://spec.graphql.org/draft">GraphQL spec</a></li>
<li><a href="https://martinfowler.com/bliki/CircuitBreaker.html">Circuit Breaker pattern</a></li>
<li><a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/bulkhead">Bulkhead pattern</a></li>
<li><a href="https://netflixtechblog.com/how-netflix-scales-its-api-with-graphql-federation-part-1-ae3557c187e2">Netflix GraphQL Federation approach</a></li>
</ul>
<p><em>If you would like to work on similar challenges and help scale our approach to developing web and mobile clients, consider joining <a href="https://jobs.zalando.com/en/tech/client-engineering">our client engineering teams</a>.</em></p>Building an End to End load test automation system on top of Kubernetes2021-03-02T00:00:00+01:002021-03-02T00:00:00+01:00Amila Kumaranayakatag:engineering.zalando.com,2021-03-02:/posts/2021/03/building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes.html<p>Learn how we built an end-to-end load test automation system to make load tests a routine task.</p><h2>Introduction</h2>
<p>At Zalando we continuously invent new ways for customers to interact with fashion. In order to provide an excellent customer experience, we must ensure our systems can technically handle high-traffic events such as Cyber Week or other sales campaigns. We have published a <a href="https://engineering.zalando.com/posts/2020/10/how-zalando-prepares-for-cyber-week.html">detailed article</a> on how Zalando prepares for Cyber Week. Checkout and payment related systems are particularly important during sales events. As we continuously evolve our systems and add new features to optimize the customer experience, it is cumbersome and expensive to manually test our systems' capability to handle high traffic.</p>
<p>Our department is responsible for Zalando's payment processing systems, which must maintain high availability and reliability. To achieve high stability, we set out to build an automated end-to-end load testing system capable of simulating real user behaviour across the whole system of microservices. The system automatically steers generated traffic based on a dynamically adjusted orders-per-minute configuration. In order to really push our services to the edge, we wanted to run the load testing system in our test cluster, as this enables us to break things when necessary without causing customer impact. These tests can be conveniently managed and triggered by our team and serve as the first quality gate of the Payment system.
As part of the Cyber Week preparation, we formed a dedicated project team tasked with making our vision come to life.</p>
<p>To summarize, we wanted to build a load testing tool with the following features:</p>
<ul>
<li>Automatic load test execution based on a schedule.</li>
<li>Simple API through which developers can manually trigger a load test.</li>
<li>Load test tool to be run in our test environment, scaling our Kubernetes services and Amazon ECS<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> (Elastic Container Service) environment up to our production configuration and then executing load tests.</li>
<li>Automated alarms if a load test causes SLO (<a href="https://sre.google/sre-book/service-level-objectives/">Service Level Objective</a>) breaches.</li>
<li>The generated load test traffic must imitate our customer's checkout flow.</li>
</ul>
<p>The diagram below illustrates how the testing system (NodePool A) and our Payment platform (NodePool B and ECS) are deployed:
<img alt="Load Test Flow" src="https://engineering.zalando.com/posts/2021/03/images/loadtestconductor-flow.png"></p>
<h2>Traffic generation</h2>
<p>Our first step was to select a load testing framework. We considered multiple options such as Locust, Vegeta and JMeter. This was filtered down to <a href="https://locust.io/">Locust</a> and <a href="https://github.com/tsenart/vegeta">Vegeta</a> due to <a href="https://jmeter.apache.org/">JMeter</a> not being popular internally. We chose Locust as it was more popular within our development teams, thus the test suite would be easier to maintain. We have also <a href="https://engineering.zalando.com/posts/2019/04/end-to-end-load-testing-zalandos-production-website.html">blogged before</a> on how we leveraged Locust in prior preparations for sales events.</p>
<p>Locust works in both standalone and distributed mode. In distributed mode, one controller (master) process coordinates multiple workers; such a setup is required to overcome single-machine resource limits and generate higher loads. We created Locust scripts covering multiple business processes, mimicking real-world traffic patterns to our services. These scripts were then packaged as a Docker image and deployed as a distributed Locust system.</p>
<h2>Mock External Dependencies</h2>
<p>When we defined the scope of the load tests, we agreed to focus only on testing internal service components and not to involve external dependencies in routine tests. Therefore we decided to mock these dependencies.</p>
<p>The table below compares a variety of tools that can be used to implement mocks.</p>
<table>
<thead>
<tr>
<th></th>
<th style="text-align: center;">Mobtest</th>
<th style="text-align: center;">Wiremock</th>
<th style="text-align: center;">Mockserver</th>
<th style="text-align: center;">Mockoon</th>
<th style="text-align: center;">Hoverfly</th>
</tr>
</thead>
<tbody>
<tr>
<td>Language</td>
<td style="text-align: center;">Javascript</td>
<td style="text-align: center;">Java</td>
<td style="text-align: center;">Java</td>
<td style="text-align: center;">Javascript</td>
<td style="text-align: center;">Golang</td>
</tr>
<tr>
<td>Github star/fork</td>
<td style="text-align: center;">1289/173</td>
<td style="text-align: center;">3453/934</td>
<td style="text-align: center;">2280/616</td>
<td style="text-align: center;">1402/63</td>
<td style="text-align: center;">1468/131</td>
</tr>
<tr>
<td>Config (API, route, ...)</td>
<td style="text-align: center;">Json config</td>
<td style="text-align: center;">Json</td>
<td style="text-align: center;">Js config</td>
<td style="text-align: center;">Js config</td>
<td style="text-align: center;">Json</td>
</tr>
<tr>
<td>Latency simulation</td>
<td style="text-align: center;">Fixed</td>
<td style="text-align: center;">Fixed / Random</td>
<td style="text-align: center;">Fixed</td>
<td style="text-align: center;">Fixed</td>
<td style="text-align: center;">Fixed / Random</td>
</tr>
<tr>
<td>Fault simulation</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td>Stateful behaviour</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">State machine</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">key-value map</td>
</tr>
<tr>
<td>Easy to extend</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td>Proxying</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td>Response templating</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td>Request matching</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td>Record & Replay</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
</tr>
</tbody>
</table>
<p>After evaluating multiple options we settled on <a href="https://github.com/SpectoLabs/hoverfly">Hoverfly</a> as the mocking solution. Hoverfly makes it easy to set up mocks with static or dynamic responses, and we created and deployed mocks for multiple external dependencies. Furthermore, we wanted to run the load tests against services that could simultaneously be used for other tests. This meant a service needed to dynamically switch a dependency between the real service and its mock. For this, we leveraged header-based routing using <a href="https://opensource.zalando.com/skipper/">Skipper</a>: a service decides whether to use the mock or the actual dependency by examining whether a request belongs to a load test.</p>
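<p>In Skipper's route syntax (eskip), such header-based switching can be sketched roughly as follows; the route names and backend addresses are placeholders, not our actual configuration:</p>

```
// requests marked as load test traffic are routed to the Hoverfly mock
mock: Header("Load-Test", "true") -> "http://hoverfly-mock.test.local";

// all other traffic still reaches the real dependency
real: * -> "https://real-dependency.example.com";
```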
<p>Hoverfly example mocking a service with PATCH endpoint:</p>
<div class="highlight"><pre><span></span><code><span class="x">{</span>
<span class="x"> "data": {</span>
<span class="x"> "pairs": [</span>
<span class="x"> {</span>
<span class="x"> "request": {</span>
<span class="x"> "path": [{</span>
<span class="x"> "matcher": "exact",</span>
<span class="x"> "value": "/test"</span>
<span class="x"> }],</span>
<span class="x"> "method": [{</span>
<span class="x"> "matcher": "exact",</span>
<span class="x"> "value": "PATCH"</span>
<span class="x"> }]</span>
<span class="x"> },</span>
<span class="x"> "response": {</span>
<span class="x"> "status": 204,</span>
<span class="x"> "body": "",</span>
<span class="x"> "encodedBody": false,</span>
<span class="x"> "headers": {</span>
<span class="x"> "Date": [</span>
<span class="x"> "</span><span class="cp">{{</span> <span class="nv">currentDateTime</span> <span class="s1">'Mon, 02 Jan 2020 15:04:05 GMT'</span> <span class="cp">}}</span><span class="x">"</span>
<span class="x"> ],</span>
<span class="x"> "Load-Test": [</span>
<span class="x"> "true"</span>
<span class="x"> ]</span>
<span class="x"> },</span>
<span class="x"> "templated": true</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x"> ],</span>
<span class="x"> "globalActions": {</span>
<span class="x"> "delays": []</span>
<span class="x"> }</span>
<span class="x"> },</span>
<span class="x"> "meta": {</span>
<span class="x"> "schemaVersion": "v5",</span>
<span class="x"> "hoverflyVersion": "v1.1.2",</span>
<span class="x"> "timeExported": "2020-01-07T13:21:02+02:00"</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>To start Hoverfly using this configuration, one can simply run:</p>
<div class="highlight"><pre><span></span><code><span class="n">hoverfly</span> <span class="o">-</span><span class="n">webserver</span> <span class="o">-</span><span class="kn">import</span> <span class="nn">simulation.json</span>
</code></pre></div>
<h2>Load Test Conductor</h2>
<p>In order to meet our goal of running automated load tests in the test cluster, we needed to design a system that could manage the full lifecycle of a load test and ensure the cluster and deployed applications match our production configuration. Applications in the load test environment are updated to match the resource allocation, instance count and application version of the production environment.</p>
<h3>Load test lifecycle</h3>
<p>We defined the lifecycle of one load test as follows:</p>
<ol>
<li>Deploy all applications in the test environment to be the same version as production.</li>
<li>Scale up the applications in the test environment to meet the resource configuration of the production environment.</li>
<li>Generate load test traffic that replicates real customer behaviour.</li>
<li>Scale down applications in the test environment after the test as a cost saving measure.</li>
<li>Clean up databases and remove unnecessary test data.</li>
</ol>
<p>For this purpose, we built a microservice in Golang called the load-test-conductor that executes and manages these load test phases and transitions. Our design was heavily influenced by the declarative approach Kubernetes popularized for infrastructure management: the service provides a simple API through which engineers run load tests by declaring the desired state of a load test. Executing a load test is now just one API call away!</p>
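<p>As a sketch of what that single API call could look like, the snippet below builds such a request in Python; the endpoint path and payload fields are hypothetical, not the conductor's actual API:</p>

```python
import json
import urllib.request

# Desired state of the load test (field names are invented for illustration)
payload = {
    "target_orders_per_minute": 300,
    "ramp_up_minutes": 30,
    "plateau_minutes": 60,
}

req = urllib.request.Request(
    "http://load-test-conductor.test.local/load-tests",  # hypothetical endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # sending the request would start the load test
```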
<p>On the diagram below, you can find the system components of the Load Test Conductor:
<img alt="Conductor Components" src="https://engineering.zalando.com/posts/2021/03/images/conductor_components_1.png"></p>
<h2>Deployment and Scaling</h2>
<p>To ensure that the exact version of the service running in production is deployed and services are pre-scaled, we automated deployment and scaling of the application within the Load Test Conductor. We use our Continuous Delivery Platform (CDP) to find the version deployed in production using the Kubernetes client and trigger a new deployment of this exact version in our staging environment. Applications which need to be included in a load test can be provided as an environment-specific configuration. The <strong>Deployer</strong> component triggers a deployment and waits until all deployments have completed. Afterwards, the <strong>Scaler</strong> component triggers scaling based on the target configuration. Our load test conductor currently supports scaling resources in Kubernetes and AWS ECS environments. It also handles scaling back down to the previous state once the load test completes or fails.</p>
<h2>Load generation</h2>
<p>We chose to run Locust in distributed mode to mimic customer traffic. Each Locust worker executes our test scripts and interacts with our microservices in order to simulate the customer journey through our systems. We wanted to be able to test different load scenarios, so we implemented an algorithm in the load-test-conductor that steers the simulated load through the API provided by Locust, which allows changing the number of simulated users and the rate at which they are spawned. Our algorithm ramps up load based on a business KPI (orders placed per minute). Users of the test system define a ramp-up time, a plateau time and the target orders per minute that the test should reach; the algorithm then spawns simulated users based on these parameters, dynamically recalculating the hatch rate and user count needed to reach the defined orders-per-minute target.</p>
<h4>Load generation pseudo code</h4>
<div class="highlight"><pre><span></span><code><span class="n">set</span><span class="w"> </span><span class="n">initial</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="mi">1</span>
<span class="n">set</span><span class="w"> </span><span class="n">calculation</span><span class="w"> </span><span class="n">interval</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="mi">60</span><span class="w"> </span><span class="n">seconds</span>
<span class="k">while</span><span class="w"> </span><span class="nb">load</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="n">time</span><span class="w"> </span><span class="n">has</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">exceeded</span>
<span class="w"> </span><span class="n">get</span><span class="w"> </span><span class="n">locust</span><span class="w"> </span><span class="n">status</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">defined</span><span class="w"> </span><span class="n">calculation</span><span class="w"> </span><span class="n">interval</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">minute</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">reported</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">locust</span><span class="o">.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">user</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">locust</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">equal</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">zero</span>
<span class="w"> </span><span class="nb">print</span><span class="w"> </span><span class="s2">"load test is being initialized."</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">hatch</span><span class="w"> </span><span class="n">rate</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">one</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">user</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">initial</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">users</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">minute</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">minute</span><span class="w"> </span><span class="n">equal</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">zero</span>
<span class="w"> </span><span class="nb">print</span><span class="w"> </span><span class="s2">"load test stalled due to no orders getting generated."</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">hatch</span><span class="w"> </span><span class="n">rate</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">one</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">user</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">one</span>
<span class="w"> </span><span class="k">else</span>
<span class="w">        </span><span class="n">calculate</span><span class="w"> </span><span class="n">total</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">needed</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">achieve</span><span class="w"> </span><span class="n">target</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">minute</span><span class="w"> </span><span class="n">rate</span><span class="w"> </span><span class="n">using</span>
<span class="w"> </span><span class="n">current</span><span class="w"> </span><span class="n">locust</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">minute</span><span class="w"> </span><span class="n">rate</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">minute</span><span class="w"> </span><span class="n">rate</span><span class="o">.</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">that</span><span class="w"> </span><span class="n">needs</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">created</span><span class="o">.</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">time</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nb">load</span><span class="w"> </span><span class="n">test</span><span class="o">.</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">iterations</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nb">load</span><span class="w"> </span><span class="n">test</span><span class="o">.</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">spawn</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">iteration</span><span class="o">.</span>
<span class="w"> </span><span class="n">calculate</span><span class="w"> </span><span class="n">hatchrate</span>
<span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">hatch</span><span class="w"> </span><span class="n">rate</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">calculated</span><span class="w"> </span><span class="n">hatchrate</span>
<span class="w">        </span><span class="n">set</span><span class="w"> </span><span class="n">loadtest</span><span class="w"> </span><span class="n">user</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">calculated</span><span class="w"> </span><span class="n">users</span>
<span class="w"> </span><span class="n">update</span><span class="w"> </span><span class="n">locust</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="nb">load</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="n">parameters</span><span class="p">,</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">triggers</span><span class="w"> </span><span class="nb">load</span><span class="w"> </span><span class="n">generation</span><span class="o">.</span>
<span class="w">    </span><span class="n">sleep</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">calculation</span><span class="w"> </span><span class="n">interval</span><span class="w"> </span><span class="n">time</span><span class="o">.</span>
</code></pre></div>
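<p>The proportional recalculation at the heart of this loop can be distilled into a few lines of Python. This is our simplification of the pseudocode above, under the assumption that orders scale roughly linearly with the number of simulated users:</p>

```python
def users_needed(current_users: int, current_opm: float, target_opm: float) -> int:
    """Estimate the total number of simulated users required to reach the
    target orders-per-minute (opm) rate, given the currently observed rate."""
    if current_users == 0 or current_opm <= 0:
        # initializing or stalled: (re)start from a single user
        return 1
    return max(1, round(current_users * target_opm / current_opm))

print(users_needed(50, 120.0, 300.0))  # 125: scale 50 users up by the factor 300/120
```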
<h2>Test Execution & Test Evaluation</h2>
<p>To trigger the load test, we used a Kubernetes CronJob that calls the API of the load test conductor. For our Payment system, load tests take about 2 hours to complete.</p>
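<p>A minimal CronJob for this could look roughly like the following; the name, schedule, image and URL are placeholders rather than our actual configuration:</p>

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-load-test
spec:
  schedule: "0 2 * * *"        # every night at 02:00
  concurrencyPolicy: Forbid    # never let two load tests overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              # one-shot container that calls the conductor's API
              image: curlimages/curl:latest
              args: ["-X", "POST", "http://load-test-conductor/load-tests"]
```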
<p>To monitor the system during test execution, we leverage Grafana dashboards that provide insights into the most important metrics, such as latency, throughput and response-code rates. Through manual inspection of the graphs, we also evaluate whether a load test was successful. Additionally, we use alerts that trigger when a service did not meet its SLO during a test.</p>
<p>Test results thus have to be evaluated manually to decide whether the outcome is successful, which is sufficient for us for the time being.</p>
<h2>Conclusions</h2>
<p>Overall, the solution fulfilled the goal of a successful preparation and scaling of our applications. However, running load tests in the test cluster posed several challenges. Sometimes new deployments were rolled out during tests, which caused the service to point to pods with minimal resources instead of the scaled-up ones. Several infrastructure components, such as cluster node types, databases and centrally managed event queues (<a href="https://github.com/zalando/nakadi">Nakadi</a>), had to be adjusted for similarity with the production environment. This required a lot of communication and alignment with the teams managing those services.</p>
<p>We made the deployment of the production versions of the applications an optional feature, so that developers can test their feature branch code. The load test tool has become our standard way to verify for every developed change that the applications can handle peak production traffic.</p>
<p>Giving developers the possibility to run load tests by a simple API call encourages and enables them to thoroughly load test applications.</p>
<p>Since these load tests are conducted in a non-production environment, we could stress the services until they failed. In combination with load tests in production, this was essential for preparing our production services for higher load.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>ECS is only used by a small set of isolated services, all other services run on <a href="https://engineering.zalando.com/tags/kubernetes.html">Kubernetes</a>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Integration tests with Testcontainers2021-02-25T00:00:00+01:002021-02-25T00:00:00+01:00Marek Hudymatag:engineering.zalando.com,2021-02-25:/posts/2021/02/integration-tests-with-testcontainers.html<p>We explore how to write integration tests using the Testcontainers.org library in Java-based backend applications.</p><p>In this article, I will show how teams at <a href="https://zms.zalando.com/">Zalando Marketing Services</a> use integration tests in Java-based backend applications. We will cover the idea behind integration tests: the main concept and the attributes of a good integration test. Then, we will walk through an example based on the Testcontainers library used in a Spring environment.</p>
<h2>Integration tests</h2>
<p>There are many definitions of integration testing. For example, the definition found on <a href="https://en.wikipedia.org/wiki/Integration_testing">Wikipedia</a> is: <code>Integration testing is the phase in software testing in which individual software modules are combined and tested as a group</code>.</p>
<p>For this article, we define integration tests as tests of communication between our code and external components, e.g. database, one of the AWS services (like S3, Kinesis, DynamoDB, SQS, and others) or an external system with which we are communicating over HTTP.</p>
<p>The purpose of integration tests is to assess how our code behaves when communicating with external services: not only in happy-path scenarios, but especially in corner cases, e.g. when an external service responds with an unexpected HTTP code, when the HTTP response arrives only after the defined timeout, or when AWS S3 responds with internal errors.</p>
<h2>Amount of integration tests</h2>
<p>While implementing tests, we need to remember to maintain the proper balance between different test types. Integration tests cannot be the core of the testing codebase.</p>
<p><a href="https://martinfowler.com/articles/practical-test-pyramid.html">The pyramid of testing</a> shows us the proportions of different types of tests. For backend applications, the foundations are unit tests and component tests. Integration tests complement unit tests and other test types like component, system, and manual tests.</p>
<p><img alt="Pyramid of testing" src="https://engineering.zalando.com/posts/2021/02/images/pyramid-of-testing.png"></p>
<p><a href="https://en.wikipedia.org/wiki/System_testing">System tests</a> and manual tests should ideally be the rarest type of tests.
From our experience, we estimate the number of integration tests to be around 25% of unit tests, but it varies from application to application.</p>
<h2>Integration tests with Testcontainers library</h2>
<p>Let's see how to organize an integration test with the Testcontainers library, and how to manage the startup and teardown of Docker containers.
<a href="https://www.testcontainers.org/">Testcontainers.org</a> is a JVM library that allows users to run and manage Docker containers directly from Java code. <a href="https://www.testcontainers.org/#who-is-using-testcontainers">Zalando uses it</a> mainly for integration tests.
An integration test is run similarly to a unit test (a method annotated with <code>@Test</code>).</p>
<p>The integration test additionally runs external components as real Docker containers. An external component can be one of the following:</p>
<ul>
<li><strong>database storage</strong> - for example, run real PostgreSQL as a Docker image,</li>
<li><strong>mocked HTTP server</strong> - you can mimic the behavior of other HTTP services by using Docker images from MockServer or WireMock,</li>
<li><strong>Redis</strong> - run real Redis as a Docker image,</li>
<li><strong>streams or queues</strong> (like RabbitMQ and others),</li>
<li><strong>AWS components</strong> like S3, Kinesis, DynamoDB, and others, which you can emulate with Localstack</li>
<li>other <strong>application</strong> that can be run as a Docker image.</li>
</ul>
<p>It is very easy to run Docker images from Java code. Every Docker image can be run with <code>GenericContainer</code>. For the most popular images, ready-made wrapper classes are provided for convenient usage.</p>
<p>To make sure that every container is stopped after use and its resources are released, the library uses JVM shutdown hooks and a special Docker image, <code>Ryuk</code>. The shutdown hooks stop containers when the tests finish. In case the Java process is no longer available, the <code>Ryuk</code> container stops all remaining containers. It is worth mentioning that it is possible to disable the <code>Ryuk</code> container.</p>
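<p>The shutdown-hook half of this mechanism can be sketched with plain JDK code, no Testcontainers involved (<code>registerStopHook</code> is a hypothetical helper name, not library API):</p>

```java
public class ShutdownHookDemo {

    // Registers a JVM shutdown hook running the given cleanup action,
    // similar in spirit to what Testcontainers does per started container.
    static Thread registerStopHook(Runnable cleanup) {
        Thread hook = new Thread(cleanup);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerStopHook(() -> System.out.println("stopping container ..."));
        System.out.println("tests running");
        // When main returns, the JVM runs the registered hook before exiting.
    }
}
```

<p>Ryuk covers the case this sketch cannot: if the JVM is killed abruptly, shutdown hooks never run, so an external watchdog container is needed to clean up.</p>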
<p><img alt="Your service communicates with external components run as Docker images." src="https://engineering.zalando.com/posts/2021/02/images/concept.jpg"></p>
<h2>Maven configuration</h2>
<p>To use Testcontainers, add a Maven dependency with the current library version.</p>
<div class="highlight"><pre><span></span><code><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.testcontainers<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>testcontainers<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>${testcontainers.version}<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><scope></span>test<span class="nt"></scope></span>
<span class="nt"></dependency></span>
</code></pre></div>
<p>It's important to have control over test execution. Unit tests should be executed before integration tests; this follows from the pyramid of testing and helps keep feedback loops short.
In some cases, you may want to skip integration tests, for example when your local machine is slow and you want to run them only on CI/CD.</p>
<p>To run the integration tests after your unit tests, simply add the <code>maven-failsafe-plugin</code> to your project. The Failsafe and Surefire plugins work in different build phases.
By default, the Maven Surefire plugin executes unit tests during the test phase. It includes all classes whose names end with Test, Tests or TestCase.
The Failsafe plugin runs integration tests in the integration-test phase. To separate execution, we configure the Failsafe plugin to run classes with the suffix <code>IntegrationTest</code>.
We also create a special profile, here <code>with-integration-tests</code>, to control whether we want to run integration tests or not.</p>
<div class="highlight"><pre><span></span><code><span class="nt"><profiles></span>
<span class="w"> </span><span class="nt"><profile></span>
<span class="w"> </span><span class="nt"><id></span>with-integration-tests<span class="nt"></id></span>
<span class="w"> </span><span class="nt"><build></span>
<span class="w"> </span><span class="nt"><pluginManagement></span>
<span class="w"> </span><span class="nt"><plugins></span>
<span class="w"> </span><span class="nt"><plugin></span>
<span class="w"> </span><span class="nt"><groupId></span>org.apache.maven.plugins<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>maven-failsafe-plugin<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><executions></span>
<span class="w"> </span><span class="nt"><execution></span>
<span class="w"> </span><span class="nt"><goals></span>
<span class="w"> </span><span class="nt"><goal></span>integration-test<span class="nt"></goal></span>
<span class="w"> </span><span class="nt"><goal></span>verify<span class="nt"></goal></span>
<span class="w"> </span><span class="nt"></goals></span>
<span class="w"> </span><span class="nt"></execution></span>
<span class="w"> </span><span class="nt"></executions></span>
<span class="w"> </span><span class="nt"><configuration></span>
<span class="w"> </span><span class="nt"><includes></span>
<span class="w"> </span><span class="nt"><include></span>**/*IntegrationTest.java<span class="nt"></include></span>
<span class="w"> </span><span class="nt"></includes></span>
<span class="w"> </span><span class="nt"></configuration></span>
<span class="w"> </span><span class="nt"></plugin></span>
<span class="w"> </span><span class="nt"></plugins></span>
<span class="w"> </span><span class="nt"></pluginManagement></span>
<span class="w"> </span><span class="nt"></build></span>
<span class="w">    </span><span class="nt"></profile></span>
<span class="nt"></profiles></span>
</code></pre></div>
<p>An invocation of the Maven command would look like:</p>
<div class="highlight"><pre><span></span><code>mvn clean verify -P with-integration-tests
</code></pre></div>
<h2>Basic integration test with Testcontainers</h2>
<p>Let’s set up a basic integration test with JUnit 5 and Spring Boot.</p>
<p>An integration test class can look like the example below. The test class inherits from <code>AbstractIntegrationTest</code>. The test method creates an entity in a database run as a Docker container. Later, we read the entity from the database and check whether it has been written correctly.</p>
<div class="highlight"><pre><span></span><code><span class="kd">class</span> <span class="nc">AccountRepositoryIntegrationTest</span><span class="w"> </span><span class="kd">extends</span><span class="w"> </span><span class="n">AbstractIntegrationTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Autowired</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">AccountRepository</span><span class="w"> </span><span class="n">dao</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">shouldCreateAccount</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// given</span>
<span class="w"> </span><span class="n">Account</span><span class="w"> </span><span class="n">account</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">createAccount</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// when</span>
<span class="w">        </span><span class="n">dao</span><span class="p">.</span><span class="na">save</span><span class="p">(</span><span class="n">account</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// then</span>
<span class="w"> </span><span class="n">Optional</span><span class="o"><</span><span class="n">Account</span><span class="o">></span><span class="w"> </span><span class="n">actualOptional</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dao</span><span class="p">.</span><span class="na">findById</span><span class="p">(</span><span class="n">account</span><span class="p">.</span><span class="na">getId</span><span class="p">());</span>
<span class="w"> </span><span class="n">Account</span><span class="w"> </span><span class="n">expected</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">createAccount</span><span class="p">();</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">actualOptional</span><span class="p">).</span><span class="na">isPresent</span><span class="p">();</span>
<span class="w"> </span><span class="n">assertThat</span><span class="p">(</span><span class="n">actualOptional</span><span class="p">.</span><span class="na">get</span><span class="p">()).</span><span class="na">isEqualTo</span><span class="p">(</span><span class="n">expected</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The test class below is an abstract class inherited by all integration tests. It contains static references to Docker containers - <a href="https://www.testcontainers.org/test_framework_integration/manual_lifecycle_control/#singleton-containers">singleton containers</a>.
In the static block, we start all containers. We do not need to stop them; this is done automatically. In the example below, the <code>PostgreSQLContainer</code> is going to listen on a random port. To facilitate adding properties with dynamic values, we use the <code>@DynamicPropertySource</code> annotation introduced in Spring Framework 5.2.5 (it has a more compact syntax than <code>ApplicationContextInitializer</code>).</p>
<div class="highlight"><pre><span></span><code><span class="nd">@SpringBootTest</span><span class="p">(</span><span class="n">webEnvironment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">WebEnvironment</span><span class="p">.</span><span class="na">RANDOM_PORT</span><span class="p">)</span>
<span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">abstract</span><span class="w"> </span><span class="kd">class</span> <span class="nc">AbstractIntegrationTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="n">PostgreSQLContainer</span><span class="w"> </span><span class="n">postgreSQL</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">PostgreSQLContainer</span><span class="p">(</span><span class="s">"postgres:13.1"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withUsername</span><span class="p">(</span><span class="s">"testUsername"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withPassword</span><span class="p">(</span><span class="s">"testPassword"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withDatabaseName</span><span class="p">(</span><span class="s">"testDatabase"</span><span class="p">);</span>
<span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">postgreSQL</span><span class="p">.</span><span class="na">start</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nd">@DynamicPropertySource</span>
<span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">postgresqlProperties</span><span class="p">(</span><span class="n">DynamicPropertyRegistry</span><span class="w"> </span><span class="n">registry</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">registry</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"db_url"</span><span class="p">,</span><span class="w"> </span><span class="n">postgreSQL</span><span class="p">::</span><span class="n">getJdbcUrl</span><span class="p">);</span>
<span class="w"> </span><span class="n">registry</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"db_username"</span><span class="p">,</span><span class="w"> </span><span class="n">postgreSQL</span><span class="p">::</span><span class="n">getUsername</span><span class="p">);</span>
<span class="w"> </span><span class="n">registry</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"db_password"</span><span class="p">,</span><span class="w"> </span><span class="n">postgreSQL</span><span class="p">::</span><span class="n">getPassword</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h2>The @Testcontainers annotation</h2>
<p>There are also other ways of running your containers. You can use the annotations provided by the <code>junit-jupiter</code> Maven module:</p>
<div class="highlight"><pre><span></span><code><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.testcontainers<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>junit-jupiter<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>${testcontainers.version}<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><scope></span>test<span class="nt"></scope></span>
<span class="nt"></dependency></span>
</code></pre></div>
<p>A test class annotated with <code>@Testcontainers</code> runs all containers annotated with <code>@Container</code>. Additionally, when a container field is static, the container is shared between test methods. You can control the startup order of containers by using the <code>dependsOn</code> method of <code>GenericContainer</code>. The main limitation is that containers <strong>cannot be reused between test classes</strong>. Moreover, this extension has only been tested with sequential test execution; using it with parallel test execution is unsupported and may have unintended side effects.
The test class would look like the example below.</p>
<div class="highlight"><pre><span></span><code><span class="nd">@Testcontainers</span>
<span class="nd">@SpringBootTest</span><span class="p">(</span><span class="n">webEnvironment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">WebEnvironment</span><span class="p">.</span><span class="na">RANDOM_PORT</span><span class="p">)</span>
<span class="nd">@ActiveProfiles</span><span class="p">(</span><span class="s">"test"</span><span class="p">)</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">ApplicationIntegrationTest</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Container</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="n">PostgreSQLContainer</span><span class="w"> </span><span class="n">postgreSQL</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">PostgreSQLContainer</span><span class="p">(</span><span class="s">"postgres:13.1"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withUsername</span><span class="p">(</span><span class="s">"testUsername"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withPassword</span><span class="p">(</span><span class="s">"testPassword"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="na">withDatabaseName</span><span class="p">(</span><span class="s">"testDatabase"</span><span class="p">);</span>
<span class="w"> </span><span class="nd">@DynamicPropertySource</span>
<span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">postgresqlProperties</span><span class="p">(</span><span class="n">DynamicPropertyRegistry</span><span class="w"> </span><span class="n">registry</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">registry</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"spring.datasource.url"</span><span class="p">,</span><span class="w"> </span><span class="n">postgreSQL</span><span class="p">::</span><span class="n">getJdbcUrl</span><span class="p">);</span>
<span class="w"> </span><span class="n">registry</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"spring.datasource.password"</span><span class="p">,</span><span class="w"> </span><span class="n">postgreSQL</span><span class="p">::</span><span class="n">getPassword</span><span class="p">);</span>
<span class="w"> </span><span class="n">registry</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"spring.datasource.username"</span><span class="p">,</span><span class="w"> </span><span class="n">postgreSQL</span><span class="p">::</span><span class="n">getUsername</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nd">@Test</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">contextLoads</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h2>Lifecycle of integration test</h2>
<p>All tests (including integration tests) should follow principles defined as FIRST. The acronym FIRST was defined in the book <a href="https://www.oreilly.com/library/view/clean-code-a/9780136083238/">Clean Code</a> written by Robert C. Martin.</p>
<ul>
<li><strong>[F]</strong>ast - A test should not take more than a second to finish the execution.</li>
<li><strong>[I]</strong>solated - No order-of-run dependency.</li>
<li><strong>[R]</strong>epeatable - A test method should NOT depend on any data in the environment/instance in which it is running.</li>
<li><strong>[S]</strong>elf-Validating - No manual inspection required to check whether the test has passed or failed.</li>
<li><strong>[T]</strong>horough - Should cover every use case scenario and NOT just aim for 100% coverage.</li>
</ul>
<p>Running a Docker container for every test method can take an enormous amount of time. To increase performance we need to make a real-life compromise: we can run a container per test class, or even once for the whole integration test run. The latter approach has been presented in the code above.
If we decide to share Docker containers between tests, we need to prepare for it. There are several ways to do so, e.g.:</p>
<ul>
<li>Tests should operate on unique IDs, names, etc. That way, we avoid violating database constraints, and no clean-up after test execution is needed. Some problems can still occur: for example, when counting rows in a database table, you may also count rows created by other tests.</li>
<li>Tests should clean up the state after execution. This approach consumes much more development time and is error-prone.</li>
</ul>
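<p>The unique-IDs approach can be as simple as deriving every test fixture from a fresh UUID (a sketch; <code>uniqueEmail</code> is a hypothetical helper, not part of any library):</p>

```java
import java.util.UUID;

public class TestData {

    // Each test builds its fixtures around a fresh UUID, so tests that
    // share one database container cannot collide on unique constraints.
    static String uniqueEmail() {
        return "user-" + UUID.randomUUID() + "@example.com";
    }

    public static void main(String[] args) {
        System.out.println(uniqueEmail());
        System.out.println(uniqueEmail()); // a different value on every call
    }
}
```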
<p>If we would like to run tests concurrently, it would require even more discipline from developers.</p>
<h2>Advantages of using the Testcontainers library</h2>
<ul>
<li>You run tests against real components, for example, the PostgreSQL database instead of the H2 database, which doesn’t support the Postgres-specific functionality (e.g. partitioning or JSON operations).</li>
<li>You can mock AWS services with Localstack or Docker images provided by AWS. This simplifies administrative actions, cuts costs and lets your build run offline.</li>
<li>You can run your tests offline - no Internet connection is needed. It is an advantage for people who are traveling or if you have a slow Internet connection (when you have already run them once and there is no version change in the container).</li>
<li>You can test corner cases in HTTP communication like:<ul>
<li>programmatically simulate timeout from external services (e.g. by configuring MockServer to respond with a delay that is bigger than the timeout set in your HTTP client),</li>
<li>simulate HTTP codes that are not explicitly supported by our application.</li>
</ul>
</li>
<li>Implementation and integration tests can be written by the same backend developers and shipped in the same pull request.</li>
<li>Even one integration test can verify if your application context starts properly and your database migration scripts (e.g. Flyway) are executing correctly.</li>
</ul>
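<p>The timeout corner case above hinges on the client defining an explicit timeout that the mocked server can be told to exceed. A minimal sketch using the JDK's built-in <code>java.net.http.HttpClient</code> (an assumption here; any HTTP client with configurable timeouts works the same way, and the URL is made up):</p>

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutSetup {

    // Builds a request whose timeout is shorter than the delay you would
    // configure on the mock server, so the test exercises the timeout path.
    static HttpRequest delayedRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMillis(500)) // client-side timeout
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1))
                .build();
        HttpRequest request = delayedRequest("http://localhost:8080/orders");
        // In the real test, the mock server would be configured to respond
        // after e.g. 2 seconds, and the test would assert that sending this
        // request fails with java.net.http.HttpTimeoutException.
        System.out.println(request.timeout().orElseThrow());
    }
}
```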
<h2>Disadvantages of using the Testcontainers library</h2>
<ul>
<li>It brings another dependency into your system that you need to maintain.</li>
<li>You need to run containers at least once - this consumes time and resources. For example, PostgreSQL as a Docker container needs around 4 seconds to start on my machine, whereas the H2 in-memory database needs only 0.4 seconds. From my experience, Localstack, which emulates AWS components, can take much longer to start, up to 20 seconds on my machine.</li>
<li>A continuous integration (e.g. Jenkins) machine needs to be bigger (build uses more RAM and CPU).</li>
<li>Your local computer should be pretty powerful. If you run many Docker images, it can consume a lot of resources.</li>
<li>Sometimes, integration tests with Testcontainers are still not sufficient. For example, while testing REST responses with a MockServer container, you can miss changes in the real API: the integration test will not reflect them, and your code can still crash in production. To minimize this risk, you may consider leveraging contract testing via <a href="https://spring.io/projects/spring-cloud-contract">Spring Cloud Contract</a>.</li>
</ul>
<h2>Code example</h2>
<p>You can find usage examples in my <a href="https://github.com/marekhudyma/application-style">GitHub project</a>.</p>
<p><em>If you would like to help us improve our tests and thus help shipping high-quality features for our customers, please consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Product%20Design%20%26%20User%20Research&filters%5Bcategories%5D%5B1%5D=Applied%20Science&filters%5Bcategories%5D%5B2%5D=Software%20Engineering&filters%5Bcategories%5D%5B3%5D=Product%20Management%20%28Technology%29&filters%5Bentities%5D%5B0%5D=zms">Engineering Teams</a></em> at Zalando Marketing Services (ZMS).</p>A Machine Learning Pipeline with Real-Time Inference2021-02-16T00:00:00+01:002021-02-16T00:00:00+01:00Henning-Ulrich Essertag:engineering.zalando.com,2021-02-16:/posts/2021/02/machine-learning-pipeline-with-real-time-inference.html<p>How we improved an ML legacy system using Amazon SageMaker</p><p>Customers love the freedom to try the clothes first and pay later. We’d love to offer everyone the convenience of deferred payment. However, fraudsters exploit this to acquire goods they never pay for. The better we know the probability of an order defaulting, the better we can steer the risk and offer the convenience of deferred payment to more customers.</p>
<p>That’s where our Machine Learning models come into play.</p>
<p><img alt="payments" src="https://engineering.zalando.com/posts/2021/02/images/payments.png#center"></p>
<p>We have been tackling this problem for a while now.
Everything started with a simple Python and scikit-learn setup.
In 2015 we decided to migrate to Scala and Spark in order to scale better. You can read about this transition <a href="https://engineering.zalando.com/posts/2016/05/scalable-fraud-detection-fashion-platform.html">on our engineering blog</a>.
Last year we started to explore the potential value of tooling provided by Zalando's Machine Learning Platform (ML Platform) team as part of our strategy investment.</p>
<h3>Pain Points with the existing solution</h3>
<p>Our current solution serves us well. However, it has a few pain points, namely:</p>
<ol>
<li>It’s highly coupled to Scala and Spark, which makes using state-of-the-art libraries (mostly Python) difficult.</li>
<li>It contains custom tailored code for functionalities which nowadays can be replaced by managed services. This adds an additional layer of complexity, making it difficult to maintain and to onboard new team members.</li>
<li>It is problematic in production: it uses a lot of memory, suffers from latency spikes, and new instances start slowly, which affects scalability.</li>
<li>It has a monolithic design, meaning that feature preprocessing and model training are highly coupled. There is no pipeline with clear steps and everything runs on the same cluster during training.</li>
</ol>
<h3>Requirements for the New System</h3>
<p>We started the project by writing down requirements for the new solution. The requirements fulfilled by our current system still stand:</p>
<ul>
<li><strong>API</strong>: the new system needs to conform to the existing API. We receive a JSON request with order data, and return a response in JSON format.</li>
<li><strong>Latency</strong>: the deployed service must respond to requests quickly. 99.9% of responses must be returned under a threshold in the order of milliseconds.</li>
<li><strong>Load</strong>: the busiest model must be able to handle hundreds of requests per second (RPS) on a regular basis. During sales events, the request rate for a model may be an order of magnitude higher.</li>
<li><strong>Support for multiple models in production</strong>: several models, divided per assortment type, market, etc., must be available in the production service at any given time.</li>
<li><strong>Unified feature implementation</strong>: our model features require preprocessing (extraction from the request JSON) both in production and in our training data (which comes in the same JSON format). The preprocessing applied to incoming requests in production must be identical to that applied to the training data. We want to avoid implementing this logic twice for both cases.</li>
<li><strong>Performance metrics</strong>: we must be able to compare the performance between the new and the old version of a model (using the same data) to improve our tagging capabilities.</li>
</ul>
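<p>The latency requirement above ("99.9% of responses under a threshold") can be checked offline against recorded response times with a nearest-rank percentile. A sketch; the sample latencies and the threshold value are made up:</p>

```java
import java.util.Arrays;

public class LatencyCheck {

    // Nearest-rank percentile: the smallest recorded value such that at
    // least fraction p of all samples are less than or equal to it (0 < p <= 1).
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] ms = {12, 15, 11, 14, 250, 13, 12, 16, 13, 12};
        long thresholdMs = 200; // hypothetical SLO threshold
        long p999 = percentile(ms, 0.999);
        System.out.println("p99.9 = " + p999 + " ms, within SLO: " + (p999 <= thresholdMs));
    }
}
```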
<p>To alleviate the current pains, we require our new system to meet the following criteria in addition to those above:</p>
<ol>
<li><strong>Independence from a specific model framework</strong>: our research team develops improved models with different frameworks, such as PyTorch, Tensorflow, <a href="https://engineering.zalando.com/posts/2020/06/distributed-xgb-sagemaker.html">XGBoost</a>, etc.</li>
<li><strong>Fast scale-up</strong>: the production system should adjust to growing traffic and accept requests in a matter of minutes.</li>
<li><strong>Clear pipeline</strong>: the pipeline should have clear steps, especially the separation between data preprocessing and model training should be easy to understand.</li>
<li><strong>Use existing services</strong>: ML tooling made quite a leap in the recent years and when possible we should take advantage of what’s available instead of building custom solutions.</li>
</ol>
<h3>Architecture of the New System</h3>
<p>The system is a machine learning workflow built primarily from services provided by AWS. At Zalando, we use a tool provided by Zalando’s ML Platform team called <a href="https://www.linkedin.com/pulse/building-ml-workflows-zalando-zflow-s%25C3%25A1nchez-fern%25C3%25A1ndez/">zflow</a>. It is essentially a Python library built on top of <a href="https://aws.amazon.com/step-functions/">AWS Step Functions</a>, <a href="https://aws.amazon.com/lambda/">AWS Lambdas</a>, <a href="https://aws.amazon.com/sagemaker/">Amazon SageMaker</a>, and <a href="https://databricks.com/">Databricks</a> Spark, that allows users to easily orchestrate and schedule ML workflows.</p>
<p>With this approach we steer away from implementing the whole system from scratch, hopefully making it easier to understand, which was one of the pain points (#2) of our prior system.</p>
<p>In this new system, a single workflow orchestrates the following tasks:</p>
<ul>
<li>Training data preprocessing, using a Databricks cluster and a scikit-learn batch transform job on SageMaker</li>
<li>Training a model using a SageMaker training job</li>
<li>Generating predictions with another batch transform job</li>
<li>Generating a report to demonstrate the model’s performance, done with a Databricks job</li>
<li>Deploying a SageMaker endpoint to serve the model</li>
</ul>
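<p>As an illustrative sketch only (zflow is an internal library, so the snippet below does not use its real API), the five tasks above can be chained into a minimal AWS Step Functions style state machine definition in plain Python; all step names here are invented:</p>

```python
# Hypothetical sketch: chain the five workflow steps into an ASL-like
# state machine definition. zflow's real API differs; names are invented.
steps = [
    "PreprocessTrainingData",  # Databricks + scikit-learn batch transform
    "TrainModel",              # SageMaker training job
    "BatchPredict",            # SageMaker batch transform job
    "GenerateReport",          # Databricks job producing the PDF report
    "DeployEndpoint",          # SageMaker endpoint deployment
]

def build_state_machine(steps):
    """Link each step to the next; the last one terminates the workflow."""
    states = {}
    for i, name in enumerate(steps):
        state = {"Type": "Task"}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0], "States": states}

definition = build_state_machine(steps)
```

<p>Expressing the workflow as data like this is what lets the orchestrator (Step Functions, in our case) retry, visualize, and schedule the steps independently of the code running inside each of them.</p>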
<p><img alt="statemachine" src="https://engineering.zalando.com/posts/2021/02/images/statemachine.jpg"></p>
<p>The platform solution allowed us to create a clean workflow with a lot of flexibility when it comes to technology selection for all the steps. We consider this a big improvement with regard to our pain point #4.</p>
<p>Using a SageMaker training job allows us to substitute the model training step with any model available as a SageMaker container. In rare cases, when the algorithm is not already provided, we can still implement the container ourselves. This gives us much more flexibility and addresses pain point #1 discussed before.</p>
<h5>Model Evaluation</h5>
<p>After training finishes, a SageMaker model is generated. To evaluate the performance of the model candidate, we perform inference on a dedicated test dataset. As we needed metrics beyond those provided out of the box by SageMaker, we added a custom Databricks job to calculate them and plot them in a PDF report (example below, showing a poorly performing model).</p>
<p><img alt="PR_AUC" src="https://engineering.zalando.com/posts/2021/02/images/PR_AUC_ROC.png"></p>
<h5>Model Serving</h5>
<p>At inference time, a SageMaker endpoint serves the model. Requests include a payload which requires preprocessing before it is delivered to the model. This can be accomplished using a so-called “inference pipeline model” in SageMaker.</p>
<p><img alt="Model serving" src="https://engineering.zalando.com/posts/2021/02/images/model_serving.png#center"></p>
<p>The inference pipeline here consists of two Docker containers:</p>
<ul>
<li>A scikit-learn container for processing the incoming requests, i.e. extracting features from the input JSON or basic data transformations</li>
<li>The main model container (e.g. XGBoost, PyTorch) for model predictions</li>
</ul>
<p>The containers are lightweight and optimized for serving, and they are able to scale up sufficiently fast. This solved our pain point #3.</p>
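<p>The composition of the two containers can be sketched in plain Python; the payload field names and the scoring rule below are invented for illustration, while in SageMaker each stage runs in its own container with the output of the first passed as input to the second:</p>

```python
import json

def preprocess(request_body: str) -> list:
    """First container's job: extract model features from the request JSON.
    The feature names here are made up for illustration."""
    payload = json.loads(request_body)
    return [payload.get("amount", 0.0), float(payload.get("is_returning_customer", 0))]

def predict(features: list) -> float:
    """Second container's job: run the actual model. A trivial stand-in
    scoring rule replaces the real XGBoost/PyTorch model here."""
    return 1.0 if features[0] > 100.0 and features[1] == 0.0 else 0.0

def inference_pipeline(request_body: str) -> float:
    """The inference pipeline chains both containers: preprocessing
    output is fed directly into the model container."""
    return predict(preprocess(request_body))

score = inference_pipeline('{"amount": 250.0, "is_returning_customer": 0}')
```

<p>Because the same <code>preprocess</code> logic is packaged once and reused for both training data and live requests, the pipeline also addresses the unified feature implementation requirement above.</p>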
<h3>Performance Metrics</h3>
<h5>Latency and Success Rate</h5>
<p>We then performed a series of load tests. During every load test the endpoint was hit continuously for 4 minutes. We varied:</p>
<ul>
<li>The EC2 instance type</li>
<li>Number of instances</li>
<li>The request rate. Different rates were applied to different AWS instance types. For example, it does not make sense to use ml.t2.medium instances to serve a model at the highest request rates, as they are not meant for such load.</li>
</ul>
<p>We reported the following metrics:</p>
<ul>
<li><strong>Success</strong>: the percentage of all requests which returned an HTTP 200 OK status. 100% is optimal. Although there is no hard threshold here, the success rate should be high enough to serve endpoint requests.</li>
<li><strong>99th</strong>: the 99th percentile of response times across all requests, in milliseconds. To be usable, an endpoint must be able to respond to requests within the agreed sub-second threshold.</li>
</ul>
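<p>Both metrics are straightforward to compute from raw load-test samples; the sketch below (with made-up numbers) uses the nearest-rank method for the percentile:</p>

```python
# Computing the two reported metrics from raw load-test samples.
# Each sample is (http_status, latency_ms); the numbers are illustrative.
samples = [(200, 35.0)] * 98 + [(200, 180.0)] + [(503, 900.0)]

def success_rate(samples) -> float:
    """Percentage of requests that returned an HTTP 200 status."""
    return 100.0 * sum(1 for status, _ in samples if status == 200) / len(samples)

def p99(samples) -> float:
    """99th percentile of response times (nearest-rank method)."""
    latencies = sorted(lat for _, lat in samples)
    rank = max(0, round(0.99 * len(latencies)) - 1)
    return latencies[rank]

rate = success_rate(samples)  # 99.0 for the sample data above
tail = p99(samples)           # 180.0: the one 900ms failure sits past p99
```

<p>Note how a single slow failure barely moves p99 but immediately shows up in the success rate, which is why we track both.</p>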
<p>Sample results, for the ml.m5.large instance type:</p>
<p><img alt="load1" src="https://engineering.zalando.com/posts/2021/02/images/load3.png"></p>
<p>Some of our findings:</p>
<ul>
<li>For a rate of 200 requests/s, a single ml.m5.large instance can handle the load with a p99 of under 80ms.</li>
<li>For a rate of 400 requests/s, the success rate only approaches 100% once 4 or more ml.m5.large instances are used. Response times are under 50ms.</li>
<li>For the 1000 requests/s rate, 2 or more ml.m5.4xlarge or ml.m5.12xlarge instances can keep the success rate near 100% with response times below 200ms.</li>
</ul>
<h5>Cost</h5>
<p>Based on our estimates, the cost of serving our models will increase significantly after the migration, by up to 200%. The main reason is the cost efficiency of the legacy system, where all models are served from one big instance (replicated only for scaling). In the new system, every model gets its own instance(s).</p>
<p>Still, this is a cost increase that we are willing to accept for the following reasons:</p>
<ul>
<li>Model flexibility. Having a separate instance per model means every model can use a different technology stack or framework for serving.</li>
<li>Isolation. Every model’s traffic is separated, meaning we can scale each model individually, and flooding one model with requests doesn’t affect other models.</li>
<li>Use of managed services instead of maintaining a custom solution.</li>
</ul>
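<p>A back-of-the-envelope sketch makes the trade-off concrete. All prices and counts below are entirely made up; they only illustrate how moving from one shared instance to per-model instances can triple the hourly serving bill:</p>

```python
# Hypothetical numbers only: illustrating a 200% serving-cost increase
# when moving from one shared instance to per-model instances.
shared_instance_hourly = 2.0   # made-up price of the one big legacy instance
shared_replicas = 3            # the legacy instance is replicated only for scaling

models = 10                    # made-up number of served models
per_model_hourly = 0.6         # made-up price of a smaller per-model instance
per_model_replicas = 3         # each model is now scaled independently

legacy_cost = shared_instance_hourly * shared_replicas         # 6.0 per hour
new_cost = models * per_model_hourly * per_model_replicas      # 18.0 per hour
increase_pct = 100.0 * (new_cost - legacy_cost) / legacy_cost  # 200.0
```

<p>The per-model instances are individually cheaper, but multiplying them by the number of models and their replicas dominates the total, which is the cost we accept in exchange for isolation and flexibility.</p>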
<h5>Scale-up Time</h5>
<p>We would like to be able to adjust our infrastructure to traffic as fast as possible. This is why we verified how much time it takes to scale the system up. Based on our experiments, adding an instance to a SageMaker endpoint with our current configuration reduces scale-up time by 50% over our old system. However, we wish to explore options for reducing this time further.</p>
<h3>Cross Team Collaboration</h3>
<p>Development of this system was a collaborative effort of two different teams: Zalando Payments and Zalando Machine Learning Platform, with each contributing members to a dedicated virtual team. This inter-team working style is typical for the ML Platform team, which offers the services of data scientists and software engineers to accelerate onboarding to the platform. To define the scope of the collaboration, the two teams created a Statement of Work (SoW) specifying what services and resources the ML Platform would provide, and for what length of time. The entire collaboration lasted 9 months.</p>
<p>The two teams collaborated in a traditional Kanban development style: we developed user stories, broke them into tasks, and completed each task. We met weekly for replanning and had daily standups to catch up.</p>
<p>Our collaboration was not without friction. Having developers from two different teams brings coordination overhead from both sides. For example:</p>
<ul>
<li>We had periods where the ML Platform team members had to deliver training programs for other parts of the company, and could not devote much time to this project. Similarly, members of the Payments team would occasionally need to attend to unrelated firefighting duties and miss a week of the collaborative project. Clearly communicating these external influences was very important, as the Payments team members were not aware of what was happening in the ML Platform team, and vice versa.</li>
<li>Sharing knowledge between the two teams was crucial, especially in the early stages of the project. While the Payments team members are experts in the underlying business domain, the ML Platform team members were not. Similarly, while the ML Platform team members are experienced with the tools used for the project, the Payments team members did not have this expertise.</li>
</ul>
<h3>Conclusion and Outlook</h3>
<p>Our new system fulfills the requirements of the old system, while addressing its pain points:</p>
<ol>
<li>Because we use Amazon SageMaker for the model actions (training, endpoints, etc.), the system is guaranteed to be independent of the modeling framework.</li>
<li>Each model served behind a SageMaker endpoint scales more quickly than in the old system, and we can easily increase the number of instances used for model training to speed up our pipeline execution.</li>
<li>Each stage of the pipeline has a clear purpose, and thanks to SageMaker Inference Pipelines, data processing and model inference can take place within a single endpoint.</li>
<li>Because we are using Zalando ML Platform tooling, our new system takes advantage of technology from AWS, in particular Amazon SageMaker.</li>
</ol>
<p>We plan to use a similar architecture in other data science products.</p>
<p>The project was a successful test of a team collaboration across departments, and proved that such collaboration can bring great results.</p>
<p><em>If you would like to work on similar problems, consider <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Applied%20Science&filters%5Bcategories%5D%5B1%5D=Software%20Engineering&filters%5Btypes%5D%5B0%5D=Full-Time&search=%22machine%20learning%22">joining our Machine Learning teams</a>.</em></p>Find out what challenges Customer Conversion solves at Zalando2021-02-11T00:00:00+01:002021-02-11T00:00:00+01:00Kerstin Schartnertag:engineering.zalando.com,2021-02-11:/posts/2021/02/customer-conversion-at-zalando.html<p>We have spoken with our Director Customer Conversion, Pascal Hahn to find out more about their Product and to understand what the teams are looking for in the upcoming Hiring Sprint Event</p><p><img alt="Pascal Hahn" src="https://engineering.zalando.com/posts/2021/02/images/pascal-hahn.jpg#right"></p>
<p>When our <a href="https://pages.beamery.com/zalando/page/hiring-sprint-event?utm_source=beamery&utm_medium=landingpage-p-paid&utm_campaign=2018-dim&utm_term=LinkedInTRM&utm_content=hiringsprintFeb">Hiring Sprint</a> kicks off next month, we will be looking for great professionals to join some of our stellar teams – Shopping Cart, Checkout, Sales Orders and Returns. All meaningful segments of our <strong>Customer Conversion</strong> organization, these teams are responsible for forging and shaping some of the most relevant experiences in the Zalando customer journey. Skilled in innovating and versed in perfection, our Customer Conversion organization might become your next career step if you ace our Hiring Sprint.</p>
<p>To give you a better idea of what awaits you here, I have spoken with our Director Customer Conversion, Pascal Hahn, who talked me through the priorities of his teams and shared some advice for those who are keen to join ;)</p>
<h3>Pascal, could you introduce the major functions and priorities of your teams?</h3>
<p><strong>Customer Conversion</strong> is the organization that enables our <strong>35M</strong> customers to shop on Zalando. We are split into two departments: the Purchase department that delivers experiences from Shopping cart to Order confirmation, and the Post Purchase department that is responsible for processing orders, sorting out order details, order history as well as return experiences. Each department delivers experiences end-to-end, from ideation, product inception and development to operating and scaling them. Our mission is to let customers buy their beloved pieces easily and effortlessly by providing seamless, convenient and reliable experiences throughout. The work we do is a broad mix of designing and building new capabilities, experimenting, expanding and extending existing experiences or improving scalability and operational posture overall.</p>
<h3>"Solving something that matters" - what does it mean for the team? What does it mean for you personally?</h3>
<p>There’s no e-commerce without people shopping; and to work on the experiences that Zalando customers across all 17 markets use when they shop for their next favorite piece is a great mission. Being part of delivering excellent shopping experiences is what makes working at Zalando very special for me.</p>
<h3>What do you appreciate the most about the challenges you face in your job?</h3>
<p>To have a shot at solving problems that affect millions of users, together with some of the industry’s brightest minds is a privilege. When I started here about a year ago, I didn't know much about the inner workings of retail, and ever since I haven’t had a single day at Zalando without learning something new. Going forward, I still feel like there’s so much to learn.</p>
<h3>Pascal, could you give some advice to people who'd like to work in the Customer Conversion organization?</h3>
<p>If you’re excited about innovating at the intersection of the physical and the digital; if you take pride in building and operating systems that “just work”; if you enjoy using state-of-the-art tech at scale – this is the right place for you to work at. Whether you choose to work on product innovations with our product management team, or join us as an engineer or engineering leader that owns, delivers and operates our experiences, or as a data scientist who works on detecting transactional risks that affect our overall business – we offer a number of roles and challenges.</p>
<h3>What do you think is the main achievement of the teams in Customer Conversion of the past few years?</h3>
<p>The COVID pandemic has posed many challenges to our customers, team members, teams and business. When some markets introduced severe lockdowns, we had to react quickly building new features with very short timelines. Keeping the Zalando Store open and coping with the increased scale while delivering new features to our customers continually has been no easy feat. In addition, all the while we were working from home and had to cope with our own personal difficulties brought on by the virus and the imposed restrictions.</p>
<p>For more details on how to participate in our 1st Hiring Sprint follow <a href="http://zln.do/3nNskEV">this Link</a>!</p>It's Never Too Late For a Career Change2021-02-04T00:00:00+01:002021-02-04T00:00:00+01:00Julia Millertag:engineering.zalando.com,2021-02-04:/posts/2021/02/its-never-too-late-for-a-career-change.html<p>A story of a Business Analyst and Product Manager turning into a Software Engineer.</p><p>Is it ever too late to follow your dream and start a new career? Well, I was 30 and had been working for Zalando for more than 4 years when I decided to change my career path for the second time. I made the decision a year ago, joined my new team in April 2020, and I didn't regret it for a single day.</p>
<p>Since that transition, a lot of people approached me with questions and asked me for advice. I started to realize that my experience could be valuable to others out there. Some people may want to change their career too but are afraid of failure or do not have enough support from their friends or colleagues, or maybe haven’t even shared their thoughts with anyone yet.</p>
<p>This article contains answers to the questions I was frequently asked. I hope it might support you with the decision whether a career in software engineering is what you always wanted, provide you with arguments to convince people around you that switching careers is a great idea if you do it for the right reasons, or just help you go through a difficult time of uncertainty.</p>
<p><img alt="Julia after the Coding Camp" src="https://engineering.zalando.com/posts/2021/02/images/inside-zalando.jpg"></p>
<h3>What did you do before you became an engineer?</h3>
<p>I studied business mathematics and joined Zalando as a Business Analyst after completing my master's degree. At my first job, I was helping out one of the Product Managers (PM) in my department. One year later I was offered the opportunity to become a PM myself. By that time, product duties had already taken more than 50% of my working time, so it was an easy decision. I continued to work as PM for another 3 years.</p>
<h3>How did you become interested in coding?</h3>
<p>I was always working quite closely with engineers in my team. At some point, they realized that I enjoy thinking about technical stuff too, and started to involve me in their discussions. I still remembered a bit of coding that I did during my bachelor years, and I started spending some of my free time attending online courses and re-learning how to code.</p>
<h3>How did you learn to code?</h3>
<p>My interest was growing, but at the same time, I had to admit that I couldn't spend enough time coding outside my work. You should know that I'm a very social person, so almost every evening in my normal week is blocked for some kind of social activity. I love to travel, so the weekends didn't help either. I decided to give it a proper try: take a sabbatical and do a full-time course at <a href="https://www.ironhack.com/en/berlin">Ironhack</a> coding camp for 9 weeks. With the help of this course I built the foundation for my current programming skill set.</p>
<h3>Why did you decide to switch to engineering?</h3>
<p>After 9 weeks of coding every day<sup id="fnref:*"><a class="footnote-ref" href="#fn:*">1</a></sup>, I still enjoyed it. So I said to myself, this is what I'd like to be paid for! It felt right to pursue something that is so much fun even while it's sometimes frustrating.</p>
<h3>How did you know it was the right decision?</h3>
<p>This was the key question for me. It was a life-changing decision, so I wanted to be fully aware of my motivations and confident that I really want it. My key takeaways were:</p>
<ol>
<li><strong>Make sure not to trade one trouble for another.</strong> It's absolutely crucial to know that you want to become an engineer rather than just escape your current job. To verify that it's not about my current product or team, I first switched to another department still as a PM but working on a completely different topic. Only after spending half a year with the new project, I could say with certainty that my wish was not about the circumstances but the engineering job itself.</li>
<li><strong>Make sure you want to become an engineer for the right reasons.</strong> I made a list of pros and cons for both my current job and software development and then talked to engineers I knew to ensure it's not just how I <em>imagine</em> this job to be. If some aspects of your current role make you unhappy, make sure it's not going to be a major part of your future role. If you are happy with your job, but the main reason is that you think you could earn more money as an engineer – please, think twice. However, if you can see how becoming a software engineer would fit your interests, character, and life goals much better than your current job – go for it!</li>
</ol>
<h3>What do you like most about engineering?</h3>
<p>My favorite topic! There are so many things! Here are just a few highlights:</p>
<ol>
<li><strong>Power of creativity</strong>: when you write code, you create something that wasn't there before. Sometimes it's really tangible, like a new button, sometimes it's a new behavior you introduce, sometimes a performance gain. Whatever it is, the act of creation makes you feel almost like a god ^^.</li>
<li><strong>Joy of focus</strong>: I love that engineering goals are usually very tangible. I also love that, at least at the beginning of your engineering career, you can focus on one task at a time. In my previous roles, I would often end up juggling a lot of balls at the same time, which can be very exhausting. It’s an extremely satisfying experience to really complete something end to end, even if it’s just a little button that does exactly one thing.</li>
<li><strong>Solving puzzles</strong>: you often have to solve what feels like real mysteries. When you investigate failures or look for root causes of a bug, you are the Sherlock Holmes in this story. If you are into this kind of puzzles, it's going to be amazing.</li>
<li><strong>Constant learning</strong>: no matter how long you are in this job, there is always more to learn - new frameworks, programming languages, tools, principles, concepts, entire new areas of technology. This feeling is shared by every engineer I know, regardless of how many years of experience they have. Your brain is always working, and it's beautiful.</li>
</ol>
<h3>Weren't you afraid to start on a new path after 4 years of a professional career?</h3>
<p>Of course I was! Every new start is terrifying. But if you know why you are doing it and you have the support of your colleagues, friends and family, it's less scary. Even if you don't have that, the engineering community is a lovely place – there are always people who will point you in the right direction when you ask for help. Also, what's the worst thing that could happen? If a year down the line I should realize that it's not the right thing for me, I can always return to my previous job with even more valuable experience in my mental backpack.</p>
<h3>How did you feel about throwing away years of professional experience?</h3>
<p>The answer is simple: I didn't throw them away. Whatever you were doing before, whatever you learned and practiced, stays with you and you can most certainly use it in your new role. In my case, it was easy to justify: I brought with me the knowledge about the software development lifecycle, soft skills and business acumen. If you worked in a different role before, you still learned useful things there: maybe you were part of a team, a problem solver or a great communicator, or maybe you are amazing at structuring things. Whatever it is, you are going to need it and it's going to help you.</p>
<h3>How did your friends and family react?</h3>
<p>I was a bit afraid to tell them. "I'm 30, and I finally figured out what I want to become when I grow up" sounded weird even in my own head. But almost everyone I shared my idea with was so supportive and excited once I explained my motivation, that soon I started to gain a lot of energy from telling people about my goal and sharing my plans.</p>
<h3>Is it better to do the change inside your current company or join a new one?</h3>
<p>Well, it really depends on your current situation. On the one hand, I would highly recommend doing the first steps in your current company because it makes things <em>easier</em>. You already know the company, you know some people, you are not a complete newbie. I’m not sure if Zalando is special that way, but I received unimaginable amounts of support from my leads, colleagues and the company itself. Zalando invests in its people, so I was financially supported from the very first milestone on this way. My wonderful company paid for my coding camp, and the only thing I had to do in return was to sign that I won’t leave within the next year (which I didn’t intend to do anyway). Every next step would have also been way harder in a new environment.
On the other hand, if you are not happy with your current employer, staying there only to make the transition easier is probably not the best idea. In short: if you like your company, make your transition there; if not, don't be afraid to leave.</p>
<h3>What concrete steps can I take towards switching to engineering?</h3>
<p>The way to engineering can be very different. Here is how I would go about it:</p>
<ol>
<li>Try online programming courses to see if you like it. While doing that myself, I collected a <a href="https://docs.google.com/document/d/1pWs9v7ecaksEYonProyTuGimee5Y8zgY0ZqAiQ5lR3E/edit?usp=sharing">list of resources</a> that I found helpful, feel free to check it out and add new ones using the comments.</li>
<li>If you are still not quite sure, take a vacation or a sabbatical and give it a full-time test-drive.</li>
<li>Write a list of things that you love about your current job and that you think you might love about being an engineer. Talk to someone about it and verify that you have the right motivation.</li>
<li>Talk to your manager about your goal. Together you can figure out what would be the right way: a slow transition with a part-time involvement, or a full switch at a time frame that is satisfactory for both of you.</li>
<li>Do it :)</li>
</ol>
<p><img alt="Trying online courses" src="https://engineering.zalando.com/posts/2021/02/images/coding-with-a-cat.jpg"></p>
<h2>Conclusion</h2>
<p>I have met a lot of wonderful people who would like to change their careers and try something new. Many of them have always dreamed of becoming an engineer but were told not to. Actually, my own sister once said that I shouldn’t study Computer Science because I’m not smart enough for that, so I didn’t. It can be scary, you might feel like people are going to be judgmental about it, you might be afraid to lose your stability - and it’s all justified. My goal here is to let you know that you are not alone with your fear, that the change is not as crazy as it might sound, and that there are more people like you who have already successfully made the transition and can support you. Give it a try!</p>
<p>If you have any questions that I haven’t covered here, don't hesitate to <a href="https://www.linkedin.com/in/julia-miller-ber/">reach out</a> to me, and I'll gladly share everything I know.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:*">
<p>I'd like to point out that this was a very special situation for a limited amount of time. In normal times and especially during quarantine I pay a lot of attention to my work-life-balance and strongly recommend everyone to do the same. <a class="footnote-backref" href="#fnref:*" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Stop using constants. Feed randomized input to test cases.2021-02-02T00:00:00+01:002021-02-02T00:00:00+01:00Vijaya Prakash Kandeltag:engineering.zalando.com,2021-02-02:/posts/2021/02/randomized-input-testing-ios.html<p>Most test cases assert using hand typed constants. Leveraging randomized input is a much better approach.</p><h1>Introduction</h1>
<p>Testing is a widely accepted practice in the software industry. I am an iOS engineer and, like most of us, have been writing tests. The way I approach testing changed radically a few years back, and I have since used and shared this technique within Zalando and outside. In this post, I will explain what is wrong with most test cases and how to apply randomized input to improve tests.</p>
<p>This is our sample code under test:</p>
<div class="highlight"><pre><span></span><code><span class="kd">struct</span> <span class="nc">DomainStore</span> <span class="p">{</span>
<span class="kd">private</span> <span class="kd">let</span> <span class="nv">internalStorage</span> <span class="p">=</span> <span class="n">UserDefaults</span><span class="p">.</span><span class="n">standard</span>
<span class="kd">func</span> <span class="nf">set</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span> <span class="k">for</span> <span class="n">key</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
<span class="n">internalStorage</span><span class="p">.</span><span class="kr">set</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">forKey</span><span class="p">:</span> <span class="n">key</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">get</span><span class="p">(</span><span class="k">for</span> <span class="n">key</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">-></span> <span class="nb">String</span><span class="p">?</span> <span class="p">{</span>
<span class="n">internalStorage</span><span class="p">.</span><span class="n">value</span><span class="p">(</span><span class="n">forKey</span><span class="p">:</span> <span class="n">key</span><span class="p">)</span> <span class="k">as</span><span class="p">?</span> <span class="nb">String</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h2>The usual testing approach</h2>
<div class="highlight"><pre><span></span><code><span class="kd">func</span> <span class="nf">test_setValueCanBeRetrieved</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nv">storage</span> <span class="p">=</span> <span class="n">DomainStore</span><span class="p">()</span>
<span class="n">storage</span><span class="p">.</span><span class="kr">set</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="s">"Zalando"</span><span class="p">,</span> <span class="k">for</span><span class="p">:</span> <span class="s">"companyName"</span><span class="p">)</span>
<span class="kd">let</span> <span class="nv">obtained</span> <span class="p">=</span> <span class="n">storage</span><span class="p">.</span><span class="kr">get</span><span class="p">(</span><span class="k">for</span><span class="p">:</span> <span class="s">"companyName"</span><span class="p">)</span><span class="o">!</span>
<span class="n">XCTAssertEqual</span><span class="p">(</span><span class="s">"Zalando"</span><span class="p">,</span> <span class="n">obtained</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>Imagine someone opens your code a few months down the road and modifies the code under test ever so slightly.</p>
<div class="highlight"><pre><span></span><code><span class="kd">struct</span> <span class="nc">DomainStore</span> <span class="p">{</span>
<span class="kd">private</span> <span class="kd">let</span> <span class="nv">internalStorage</span> <span class="p">=</span> <span class="n">UserDefaults</span><span class="p">.</span><span class="n">standard</span>
<span class="kd">func</span> <span class="nf">set</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span> <span class="k">for</span> <span class="n">key</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
<span class="n">internalStorage</span><span class="p">.</span><span class="kr">set</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">forKey</span><span class="p">:</span> <span class="n">key</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">get</span><span class="p">(</span><span class="k">for</span> <span class="n">key</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">-></span> <span class="nb">String</span><span class="p">?</span> <span class="p">{</span>
<span class="k">return</span> <span class="s">"Zalando"</span> <span class="c1">// Note</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>This diligent test runs on your machine or on CI, and it passes. Does that mean the production code works fine? Of course not. Most Test Driven Development (TDD) practitioners would move past this DomainStore, but should you? How can we reveal and address quality issues like this one?</p>
<p><strong>Fundamentally, we are testing with a constant String while the production method suggests it can take any String.</strong></p>
<p>When we check this function signature:</p>
<div class="highlight"><pre><span></span><code><span class="kd">func</span> <span class="nf">set</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span> <span class="k">for</span> <span class="n">key</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span>
</code></pre></div>
<p>It tells us it can take any <code>String</code> instance, not just <code>"Zalando"</code>. However, our previous test asserted on only one instance of the String type.</p>
<h2>Better approach: Feed Randomized Input to test cases</h2>
<p>The fundamental idea of this technique is <strong>never to feed test cases hand-typed constants.</strong> What do we feed in then? Welcome <code>randomness</code>.</p>
<p>This is our fixed test case.</p>
<div class="highlight"><pre><span></span><code><span class="kd">func</span> <span class="nf">test_setValueCanBeRetrieved</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nv">storage</span> <span class="p">=</span> <span class="n">DomainStore</span><span class="p">()</span>
<span class="kd">let</span> <span class="nv">value</span> <span class="p">=</span> <span class="nb">String</span><span class="p">.</span><span class="n">random</span> <span class="c1">// Note</span>
<span class="kd">let</span> <span class="nv">key</span> <span class="p">=</span> <span class="nb">String</span><span class="p">.</span><span class="n">random</span>
<span class="n">storage</span><span class="p">.</span><span class="kr">set</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="n">value</span><span class="p">,</span> <span class="k">for</span><span class="p">:</span> <span class="n">key</span><span class="p">)</span>
<span class="kd">let</span> <span class="nv">obtained</span> <span class="p">=</span> <span class="n">storage</span><span class="p">.</span><span class="kr">get</span><span class="p">(</span><span class="k">for</span><span class="p">:</span> <span class="n">key</span><span class="p">)</span><span class="o">!</span>
<span class="n">XCTAssertEqual</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">obtained</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p><strong>Note:</strong></p>
<ul>
<li><code>String.random</code> produces a random instance of <code>String</code>. At Zalando, we use the <a href="https://github.com/kandelvijaya/Randomizer">Randomizer</a> library for generating random inputs. It covers most of the commonly used types in the Standard Library.</li>
<li>If <strong>Randomizer</strong> doesn’t fit your needs, feel free to extend it or add your own conformance to the <code>Random</code> protocol.</li>
</ul>
<p>Now the tampered code above will not pass this test case. We don’t know ahead of time which values we will test with, and these values differ across runs, effectively exercising our production code with many permutations of possible values. This is the essence of randomized input tests (sometimes referred to as permutation tests).</p>
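<p>To show that the idea is language-agnostic, the same round-trip test can be sketched in Python. Note that <code>random_string</code> and this <code>DomainStore</code> are stand-ins invented for this sketch, not part of the Randomizer library or the Swift code above.</p>

```python
# Language-agnostic sketch of the randomized round-trip test above.
# random_string and this DomainStore are stand-ins for the Swift
# originals, not part of the Randomizer library.
import random
import string

def random_string(length: int = 12) -> str:
    """Produce a random alphanumeric string, akin to String.random."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))

class DomainStore:
    """A toy key-value store playing the role of the Swift DomainStore."""
    def __init__(self):
        self._storage = {}

    def set(self, value: str, key: str) -> None:
        self._storage[key] = value

    def get(self, key: str):
        return self._storage.get(key)

def test_set_value_can_be_retrieved():
    storage = DomainStore()
    value, key = random_string(), random_string()
    storage.set(value, key=key)
    # A hard-coded fake (always returning "Zalando") would now fail.
    assert storage.get(key) == value

test_set_value_can_be_retrieved()
```

<p>A fake implementation that returns a constant passes the constant-based test but fails this one, which is exactly the quality gap randomized inputs expose.</p>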
<h3>Going beyond a simple case</h3>
<p>Here’s one example test case from our module. The code below creates a random label component, sets random accessibility options on the model layer, and then asserts that the rendered view has the correct accessibility information.</p>
<div class="highlight"><pre><span></span><code><span class="kd">func</span> <span class="nf">test_whenAccessibilityProvided_andComponentHasTapAction_thenAccessibilityIsSet</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nv">props</span> <span class="p">=</span> <span class="n">LabelProps</span><span class="p">.</span><span class="n">random</span>
<span class="kd">let</span> <span class="nv">accessibilityModel</span> <span class="p">=</span> <span class="n">APIAccessibility</span><span class="p">.</span><span class="n">random</span>
<span class="kd">let</span> <span class="nv">component</span> <span class="p">=</span> <span class="n">LabelComponent</span><span class="p">(</span>
<span class="n">componentId</span><span class="p">:</span> <span class="p">.</span><span class="n">random</span><span class="p">,</span>
<span class="n">flex</span><span class="p">:</span> <span class="p">.</span><span class="n">random</span><span class="p">,</span>
<span class="n">actions</span><span class="p">:</span> <span class="p">.</span><span class="n">random</span><span class="p">,</span>
<span class="n">props</span><span class="p">:</span> <span class="n">props</span><span class="p">,</span>
<span class="n">accessibility</span><span class="p">:</span> <span class="n">Accessibility</span><span class="p">(</span><span class="n">with</span><span class="p">:</span> <span class="n">accessibilityModel</span><span class="p">,</span> <span class="n">componentType</span><span class="p">:</span> <span class="p">.</span><span class="n">label</span><span class="p">(</span><span class="n">props</span><span class="p">)),</span>
<span class="n">debugProps</span><span class="p">:</span> <span class="n">DebugProps</span><span class="p">()</span>
<span class="p">)</span>
<span class="kd">let</span> <span class="nv">node</span> <span class="p">=</span> <span class="n">MockNode</span><span class="p">()</span>
<span class="n">component</span><span class="p">.</span><span class="n">actions</span> <span class="p">=</span> <span class="p">[</span><span class="n">EventType</span><span class="p">.</span><span class="n">tap</span><span class="p">:</span> <span class="p">[</span><span class="n">ComponentAction</span><span class="p">(.</span><span class="n">random</span><span class="p">,</span> <span class="p">.</span><span class="n">log</span><span class="p">(.</span><span class="n">random</span><span class="p">))]]</span>
<span class="n">component</span><span class="p">.</span><span class="n">updateAccessibility</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
<span class="n">XCTAssertTrue</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">isAccessibilityElement</span><span class="p">)</span>
<span class="n">XCTAssertEqual</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">accessibilityLabel</span><span class="p">,</span> <span class="n">accessibilityModel</span><span class="p">.</span><span class="n">label</span><span class="p">)</span>
<span class="n">XCTAssertEqual</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">accessibilityHint</span><span class="p">,</span> <span class="n">accessibilityModel</span><span class="p">.</span><span class="n">hint</span><span class="p">)</span>
<span class="n">XCTAssertTrue</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">accessibilityTraits</span><span class="p">.</span><span class="bp">contains</span><span class="p">(.</span><span class="n">staticText</span><span class="p">))</span>
<span class="n">XCTAssertTrue</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">accessibilityTraits</span><span class="p">.</span><span class="bp">contains</span><span class="p">(.</span><span class="n">button</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div>
<p><strong>Note:</strong></p>
<ul>
<li>User-defined types (usually structs) are composed of standard library types and predefined custom types. We can extend user-defined types in our test target to conform to <code>Random</code>. An example conformance for LabelProps is shown below:</li>
</ul>
<div class="highlight"><pre><span></span><code><span class="kd">struct</span> <span class="nc">LabelProps</span><span class="p">:</span> <span class="n">Codable</span><span class="p">,</span> <span class="nb">Hashable</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nv">text</span><span class="p">:</span> <span class="nb">String</span>
<span class="kd">let</span> <span class="nv">backgroundColor</span><span class="p">:</span> <span class="nb">String</span><span class="p">?</span>
<span class="kd">let</span> <span class="nv">font</span><span class="p">:</span> <span class="n">FontProps</span>
<span class="p">}</span>
<span class="kd">extension</span> <span class="nc">LabelProps</span><span class="p">:</span> <span class="n">Random</span> <span class="p">{</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kd">var</span> <span class="nv">random</span><span class="p">:</span> <span class="n">LabelProps</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">LabelProps</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="p">.</span><span class="n">random</span><span class="p">,</span> <span class="n">backgroundColor</span><span class="p">:</span> <span class="p">.</span><span class="n">random</span><span class="p">,</span> <span class="n">font</span><span class="p">:</span> <span class="p">.</span><span class="n">random</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<ul>
<li>We could use code generation in a build phase to synthesize the <code>Random</code> conformance. Although this is out of scope for this post, it's similar in spirit to how <code>Equatable</code> conformance is synthesized.</li>
<li>Due to Swift’s type inference, <code>.random</code> will use the exact type’s <code>Random</code> conformance.</li>
<li>For cases where we need to compare against the input value, we can store the generated model in a local property, as we did for <code>accessibilityModel</code>.</li>
<li>Sometimes the function under test expects an <code>Email</code>, <code>URL</code>, <code>Deeplink</code> or <code>PhoneNumber</code>. These data types are often represented by <code>String</code>, but <code>String.random</code> is not good enough in this case. There are two ways of tackling this: extend String with something like <code>String.randomEmail</code>, or create a concrete type that conforms to <code>Random</code>.</li>
</ul>
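<p>Such a domain-specific generator can be sketched as follows; a plain random string is not a valid email, so we compose one from structured random parts. The names <code>random_word</code> and <code>random_email</code> are invented for this sketch and are not part of the Randomizer API.</p>

```python
# Sketch of a domain-specific generator: a plain random string is not a
# valid email, so we compose one from structured random parts. The names
# random_word and random_email are invented for this sketch.
import random
import string

def random_word(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=length))

def random_email() -> str:
    # local-part @ domain . tld, each piece randomized
    return f"{random_word()}@{random_word()}.{random.choice(['com', 'de', 'net'])}"

print(random_email())  # prints something like "qhzkvbma@xrtelwop.de" (randomized)
```
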
<h2>Conclusion</h2>
<p>This technique was not my own realization. I picked up the phrase <strong>“Don’t use constants in tests”</strong> from <a href="https://twitter.com/jdortiz">Jorge Ortiz</a> during his workshop on Clean Architecture at <a href="https://www.swiftaveiro.xyz/">Swift Aveiro</a>, 2017. It changed the way I write tests, and I hope this technique will help you too.</p>
<p>The technique of permutation testing with random input applies to all software testing, not just iOS development. The only requirement is <code>Type.random</code>.</p>
<p><em>If you also care about high code quality, consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Product%20Design%20%26%20User%20Research&filters%5Bcategories%5D%5B1%5D=Applied%20Science&filters%5Bcategories%5D%5B2%5D=Software%20Engineering&filters%5Bcategories%5D%5B3%5D=Product%20Management%20%28Technology%29&search=mobile">Mobile Engineering</a> teams!</em></p>Creating a uniform landscape for macOS Software2021-01-21T00:00:00+01:002021-01-21T00:00:00+01:00Bernardo Prieto Curieltag:engineering.zalando.com,2021-01-21:/posts/2021/01/creating-a-uniform-landscape-for-mac-software.html<p>Here's how we managed to automate the patch management process through the use of JAMF Pro, open source tools and a set of in-house developments to tie these tools together.</p><p><img alt="macOS installed software packages" src="https://engineering.zalando.com/posts/2021/01/images/preview.png#previewimage"></p>
<p>At the time of this writing, we have a universe of Mac applications (identified and version-inventoried) within a fleet of a little over 3,000 Mac devices at Zalando, from which a subset, selected by importance, frequency of updates or size of the install base, is part of a so-called <strong>software lifecycle</strong>.</p>
<p>However, in July 2019, when a <a href="https://support.zoom.us/hc/en-us/articles/360031244812-Security-CVE-2019-13449">vulnerability was discovered in <strong>Zoom</strong></a> (long before becoming the mainstream video conference app during the COVID-19 pandemic), Information Security requested the immediate deployment of the latest patch to every device that had the app installed and a report of the progress of this task.</p>
<p>The report and the patch were not a challenge in themselves — this was already part of what we were doing with core applications such as Google Chrome, or Chat — but the process was nothing more than a set of manual and repetitive chores that could be streamlined.</p>
<p>So this defined a set of goals:</p>
<ul>
<li>Procure patches and updates in a proactive way</li>
<li>Test them and then deploy to our users as soon as possible after their release</li>
<li>Keep detailed information about the patch levels of key applications</li>
<li>Automate, as much as possible, all these tasks</li>
</ul>
<h1>Our tools</h1>
<h2>JAMF Patch Management</h2>
<p>The Mac Management Platform in use in <strong>Zalando</strong>, called <a href="https://www.jamf.com"><strong>JAMF Pro</strong></a>, provides Patch Management functionalities that are great at detecting the patch level of devices and deploying the appropriate versions; however, getting this functionality to work properly has the following requirements.</p>
<h3>A source of patch definitions</h3>
<p>The first thing the system needs is the so-called <em>definition of the title</em><sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> including dates, versions, OS requirements, etc. in a JSON format. <strong>JAMF</strong> (the company behind JAMF Pro) offers a web service with a basic set of titles, but of course, that doesn’t cover all our core applications. Fortunately, it’s also possible to configure additional sources of patch definitions, either local or from third parties.</p>
<h3>Installation packages</h3>
<p>Each vendor has different locations to provide their installers; additionally, for the management platform to be able to install applications (or its updates), they need to be uploaded to distribution points in a <em>PKG format</em>, which is not always what the vendor provides.</p>
<h2>AutoPkg</h2>
<p>An open source tool developed by the community of Mac admins around the world, called <a href="http://autopkg.github.io/autopkg/"><strong>AutoPkg</strong></a>, provides a framework to automate many of the tasks surrounding patch management. The steps taken through the process are defined in plist-format files called <em>recipes</em>, which AutoPkg follows.</p>
<h3>Recipes</h3>
<p>The community of AutoPkg users has generated recipes that cover a broad range of applications and that are updated regularly; nevertheless, for security reasons, AutoPkg requires manual inspection of downloaded recipes or the creation of local copies, before allowing an automated execution. AutoPkg recipes have a parent-child relationship which brings modularity and also the chance of having different results depending on the child recipe that was executed.</p>
<h3>Processors</h3>
<p>Each step of a recipe is executed by a piece of <strong>Python</strong> code called a <em>processor</em>. AutoPkg includes dozens of these processors, each with a specific functionality, but it can also run custom processors, coded by users, to provide functionality not covered by the standard ones.</p>
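<p>As a rough sketch, a custom processor subclasses AutoPkg's <code>Processor</code> and exchanges recipe variables through <code>self.env</code>. The class name, message format and variable names below are invented for illustration (a real announcement processor would also post the message to a chat webhook); the import falls back to a stub so the sketch runs standalone.</p>

```python
# Rough sketch of a custom AutoPkg processor that formats a chat
# announcement for a newly packaged version. The class name, message
# format and variables are invented for illustration; a real processor
# would also post the message to a chat webhook.
try:
    from autopkglib import Processor  # available inside an AutoPkg run
except ImportError:  # stub so the sketch runs standalone
    class Processor:
        def __init__(self):
            self.env = {}

class ChatAnnouncer(Processor):
    description = "Announces a newly packaged title in a chat group."
    input_variables = {
        "NAME": {"required": True, "description": "Title name."},
        "version": {"required": True, "description": "Packaged version."},
    }
    output_variables = {
        "announcement": {"description": "Formatted chat message."},
    }

    def main(self):
        # AutoPkg passes recipe variables in and out through self.env.
        self.env["announcement"] = (
            f"New version of {self.env['NAME']} available: "
            f"{self.env['version']} (packaged and uploaded)"
        )

if __name__ == "__main__":
    processor = ChatAnnouncer()
    processor.env = {"NAME": "Postman", "version": "9.1.2"}
    processor.main()
    print(processor.env["announcement"])
```
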
<h1>Our solution</h1>
<p>The combination of JAMF Patch Management and AutoPkg was the right one to accomplish our goals, but it doesn't cover our needs out of the box, so our work evolved into three different projects.</p>
<h2>Cookbook</h2>
<p>The name was obvious for the project aiming to standardize and manage our AutoPkg recipes.</p>
<p>For improved modularity of the process, each application that we have introduced into the software lifecycle has its own set of recipes:</p>
<ul>
<li>Download from the vendor</li>
<li>Create a package</li>
<li>Sign the package<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></li>
<li>Upload to the distribution points</li>
</ul>
<p>In addition to the recipes, we created three custom processors to:
<img alt="Chat message about a new version of Postman available." src="https://engineering.zalando.com/posts/2021/01/images/chatbot.jpg#right"></p>
<ul>
<li>Announce in a Google Chat group the availability of a new version, packaged and uploaded to our system</li>
<li>Generate the JSON patch definition and upload it to our own definition server, for titles not covered by JAMF</li>
<li>Update information in our reporting tool, LineUp</li>
</ul>
<p>Finally, for better organization of the workload, <em>Cookbook</em> is a git repository. We work locally, push our changes to the repository and then after merging, we pull on a server called <em>Apple Packaging Station</em> that runs AutoPkg on a regular schedule with help from a third party tool called <a href="https://www.lindegroup.com/autopkgr"><strong>AutoPkgR</strong></a>.</p>
<h2>LineUp</h2>
<p>When we first created a report about the deployment of the patch of <strong>Zoom</strong>, we pulled the information from our platform directly into a <strong>Google Spreadsheet</strong> and then used <strong>Google Data Studio</strong> to generate a chart.</p>
<p>This may seem okay for a one-shot requirement, but in reality this happens often throughout the year and becomes hard to maintain or scale. We then opted for a custom database (hosted in Zalando’s shared <a href="https://engineering.zalando.com/tags/postgresql.html"><strong>Postgres</strong></a> cluster) queried with <strong>Grafana</strong>, which offers great visualization capabilities.</p>
<p>But then, with a proper database structure already holding the data, the next logical step was to add a custom visualization tool and provide it with its own API to update the information. This is when <strong>LineUp</strong> was born.</p>
<p><img alt="LineUp Example" src="https://engineering.zalando.com/posts/2021/01/images/lineup.jpg#right"></p>
<p>At the beginning, we were just looking for a simple mechanism to show information from the database without requiring a client application or the user to run SQL queries, but even the simplest web development frameworks, once connected to a database, have the power to do much more than this. We selected <strong>Django</strong> as our framework and, after developing these simple views, we decided to leverage its capabilities and come up with <strong>detailed views for each Mac application</strong>, creating a module that uses JAMF’s API to get up-to-date information about them.</p>
<p>Then, while working on this, it was natural to expand the scope and include the inventory of applications running in the <strong>Windows</strong> and <strong>Ubuntu</strong> platforms and to do so, we developed a module to query Zalando’s <strong>asset management platform</strong>.</p>
<h2>PackageChanger</h2>
<p>After each scheduled execution of our <strong>AutoPkg</strong> recipes we end up with a set of packages uploaded to the distribution points, notifications about them in our Chat group, and a JAMF server aware of these new application versions. Then it’s time to test the updates and release them if they work properly.</p>
<p>This was another tedious process done in JAMF’s web UI. Each update means going through a set of screens to associate the new version with a package, assign that version to a group of testers and later, release the version to the rest of the users as well as set it as the baseline installer for new devices.</p>
<p>To simplify these steps, we created <strong>PackageChanger</strong>, a command line tool that, through JAMF’s API, lets us work with packages and versions in a faster and simpler way than using the web UI.</p>
<p><img alt="PackageChanger Example" src="https://engineering.zalando.com/posts/2021/01/images/package-changer.jpg#left"></p>
<p>To work with the API we selected <a href="https://github.com/PixarAnimationStudios/ruby-jss"><strong>Ruby-JSS</strong></a>, a Ruby library developed by the Mac admins at <strong>Pixar Animation Studios</strong>, which to this day is the most comprehensive and well-documented library for interacting with it.</p>
<h1>Our next steps</h1>
<p>The work done so far has significantly improved the way we make updates available, especially for key applications, and has given us <strong>real-time information</strong> during the first few hours after a software vulnerability is disclosed. We are, nevertheless, still missing some refinements for a completely streamlined software lifecycle.</p>
<h2>User interaction</h2>
<p>Patch management from JAMF offers us two ways to deploy patches: <strong>automatic push</strong> or through the <strong>Self Service application</strong> notifying the user when updates are available. The latter would be optimal, but the notification mechanism <a href="https://www.jamf.com/jamf-nation/discussions/30475/broken-notification-center-notifications">does not work</a> and leaves us with our user base unaware of patches. On the other hand, pushing updates has proven to be a source of discomfort for users, especially because updated applications need to be closed and reopened and it’s really difficult to find a convenient moment to do this.</p>
<p>As a response, we are working on <strong>an alternative notification mechanism</strong>, so we can continue to offer updates through Self Service, but making users aware of them with enough frequency and convenience so that they install them in a comfortable and timely manner.</p>
<p><img alt="UpdateBuddy Example" src="https://engineering.zalando.com/posts/2021/01/images/update-buddy.jpg#left"></p>
<h2>Quality gate</h2>
<p>Before generally releasing a patch we deploy it to a small subset of devices whose owners are considered <strong>testers</strong>. This allows us to know if the installer works and if the application runs as expected after the update.</p>
<p>These tests may be enough for simple applications — such as Google Chat — but fall short for specialized or complex ones — such as <strong>Tableau Desktop</strong> — where only a trained user would be able to tell if the new version is ready to be deployed to the user base.</p>
<p>The next improvement in this direction would be a <strong>quality gate</strong>, in which additional tests for releases are described and a bigger set of testers can go through them, decide if they are passed successfully, and then approve collectively the deployment of a patch.</p>
<h2>Increased selection of titles</h2>
<p>The initial set of applications covered by patch management was selected because of the obvious level of use they get within Zalando: Google Chrome, Chat, Backup and Sync, etc.</p>
<p>Afterwards, when <strong>LineUp</strong> provided us with information about the number of installations of each application, we had a roadmap of sorts to know which applications should be covered next. For example, we discovered that over one third of the Mac fleet has <strong>Docker</strong> installed on them, so we decided to start offering it in Self Service and provide patch management so that we can be sure our user base has easy access to this tool.</p>
<p>Here, the next step is part of a continuous improvement cycle, in which we will keep adding applications to the automated lifecycle.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Within patch management, the word <em>title</em> is used to refer to pieces of software that can be inventoried and have versioning, and range from internal tools to applications from the App Store. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>At the time of this writing, <strong>macOS Catalina</strong> and <strong>macOS Big Sur</strong> allow the installation, through an MDM<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>, of unsigned packages. This may change with future releases of macOS and make it crucial to include an automated signing step, which we already have. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>MDM stands for <em>Mobile Device Management</em>, which consists of a platform and a set of tools for the administration of mobile devices such as smartphones, tablets and laptops. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>Experimentation Platform at Zalando: Part 1 - Evolution2021-01-12T00:00:00+01:002021-01-12T00:00:00+01:00Shan Huangtag:engineering.zalando.com,2021-01-12:/posts/2021/01/experimentation-platform-part1.html<p>Challenges and solutions of our experimentation platform at Zalando</p><p>Online controlled experimentation, also known as the A/B test, has been the gold standard for evaluating improvements in software systems. By changing one factor at a time, an A/B test causally measures, from real users, whether one product variant is better than another.</p>
<p>As an increasingly important area in tech companies, experimentation platforms face -- apart from their scientific challenges -- many unique engineering problems. In this blog series, we will share what we’ve learned at Zalando. During this journey, we have presented our work at well-known conferences including <a href="https://pydata.org/berlin2018/schedule/presentation/69/">PyData 2018</a>, <a href="http://ide.mit.edu/sites/default/files/agendas/CODE%202018%20Agenda.pdf">Conference on Digital Experimentation 2018</a>, and <a href="https://causalscience.org/programme/day-1/">Causal Data Science Meeting 2020</a>.</p>
<p>In this first post, we’ll introduce the evolution of the experimentation platform at Zalando. Technical challenges and solutions for the experimentation engine, the analysis system, data quality, and data visualization will follow in upcoming posts.</p>
<p>The next sections are structured using the Experimentation Evolution Model in <a href="https://exp-platform.com/Documents/2017-05%20ICSE2017_EvolutionOfExP.pdf">Fabijan et al., 2017</a>.</p>
<h2>Phase one: crawl (before 2016)</h2>
<p>As natural as data-driven decisions sound today, they were not the focus in Zalando's early stages. Back then, A/B tests were set up by each team individually and manually -- as were their analyses.</p>
<p>Soon we discovered that such a setup could neither ensure A/B test quality, nor tell us whether product teams actually ran A/B tests before making decisions. There was very little A/B testing knowledge in most product teams at the time -- we realized the need for a centralized experimentation service. In order to take full control of the data infrastructure as well as analysis features, we needed an in-house experimentation platform at Zalando instead of off-the-shelf A/B testing tools.</p>
<p>In 2015, the first version of Zalando's experimentation platform, <em>Octopus</em>, was released. It is named after <a href="https://en.wikipedia.org/wiki/Paul_the_Octopus">Paul the Octopus</a>, who correctly predicted the winners of matches at the 2010 FIFA World Cup, with a small error rate. That’s the essence of an experimentation platform, except that our metrics are based on trustworthy statistics rather than Paul’s mood of the day.</p>
<p>In this period, our biggest challenge was a <strong>lack of cross-functional knowledge</strong>. The initial platform was built by a virtual team with members from various parts of Zalando. The platform had three parts: experiment management, experiment execution, and experiment analysis. In the early days, the team's focus was on execution because there were few service customers - analyses could be performed manually in the worst case. This initial virtual team consisted of engineers and data scientists who had little knowledge of each other's domains at that time. For example, data scientists didn't have production software experience and didn't know Scala, whereas software engineers didn't know statistical concepts. To decouple the development processes of one subgroup from another, we ended up building an open-source statistics library wrapped by the backend production system.</p>
<h2>Phase two: walk (2016-2020)</h2>
<p>Even though wrapping analysis scripts into a production software system is not a scalable solution, it worked for the load at that time. Through hard groundwork, we achieved a platform where teams can configure and manage their A/B tests in one place. Another major benefit of platformization is that the randomization process and analysis methods are now standardized. Octopus uses a two-sided t-test with a 5% significance level to analyze results.</p>
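<p>To make the standardized analysis concrete, here is a minimal sketch of such a two-sided t-test on a conversion-style metric, using Welch's unequal-variance form. The sample data is invented, and this is not Octopus's actual implementation.</p>

```python
# Minimal sketch of a two-sided t-test on a conversion-style metric,
# using Welch's unequal-variance form. The sample data is invented;
# this is not Octopus's actual implementation.
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t statistic for two independent samples (Welch's form)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

control   = [0.12, 0.15, 0.11, 0.14, 0.13, 0.12, 0.16, 0.14]
treatment = [0.15, 0.17, 0.16, 0.18, 0.14, 0.17, 0.19, 0.16]

t = welch_t(treatment, control)
# For large samples, |t| > 1.96 roughly corresponds to significance at
# the two-sided 5% level; small samples need the exact t distribution.
print(f"t = {t:.2f}, significant at 5%: {abs(t) > 1.96}")
```
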
<p>During these years, we have boosted the number of running A/B tests at Zalando.</p>
<p><img alt="Number of experiments" src="https://engineering.zalando.com/posts/2021/01/images/num_exp.png"></p>
<p>There is a decrease in the number of A/B tests in early 2020. This could have been due to teams focusing on large-scale coordinated product initiatives, which were not A/B testable during this period. Another possible cause is that we suggested pausing A/B tests due to abnormal user behaviour at the beginning of COVID-19 in Europe.</p>
<p>On the other hand, we also faced a few big challenges. The keywords of improvements in this period are <em>scalability</em> and <em>trustworthiness</em>:</p>
<ul>
<li><strong>Establishing experimentation culture</strong>. Many teams started to make product decisions through A/B testing; however, it’s a big company and the experimentation culture hadn’t reached every corner. We started to look at use cases from various departments and integrated them into Octopus. We also provided in-person A/B testing training in the company at regular intervals. In addition, a company-wide initiative ensured that each team had embedded A/B test owners (product analysts or data scientists) with sufficient knowledge of experimentation.</li>
<li><strong>Source data tracking</strong>. The experimental data were collected from each product team through tracking events (we track only users who provided appropriate consent). A dedicated tracking team ingested these events, unified the data schema, and stored them in a big data database. However, data tracking concepts were not holistically understood across the company -- some teams defined their own version of the tracking event schema. This inconsistency resulted in corrupted and missing data, and as a consumer of this data, our A/B test analyses suffered from poor data quality. The situation started to improve after a period of extensive cross-team communication and reorganization.</li>
<li><strong>A/B test design quality</strong>. Since we found that A/B tests from different teams had varying levels of quality, we introduced an A/B test design audit process as well as weekly consultation hours. Aspects of quality include a testable hypothesis, a clear problem statement, a clear outcome KPI, the A/B test runtime, and finishing based on planned stopping criteria. We also wrote internal blogs regularly to share our tips for effective A/B testing in Octopus.</li>
<li><strong>A/B test analysis method quality</strong>. To make our services trustworthy, we revisited our analysis methods rigorously in peer reviews with applied scientists from other teams. We documented analysis steps transparently. Through scientific peer reviews, we have identified potential improvement areas such as non-inferiority tests.</li>
<li><strong>The right analysis tool</strong>. A/B tests are not always feasible for every use case. For example, comparing performance between two countries. In such cases, quasi-experimental methods are better suited. We provided guidelines and software packages to help analysts to choose the right causal inference tool.</li>
<li><strong>Randomization engine latency</strong>. Some applications have strict latency requirements. For example, slightly higher loading times on product detail pages may cause customers to churn. We reduced the latency of our services through a few engineering optimizations. Technical details will be discussed in later posts.</li>
<li><strong>Controlled rollout</strong>. In some cases, teams want to gradually increase the traffic into the tests, so that they don’t accidentally show a buggy variant to a lot of users. In other cases, several teams are working on a complex feature release and want to release the product at the same time. In general, such staged rollouts are called controlled rollouts. To support these use cases, Octopus created new features such as traffic ramp-up in experimentation and <a href="https://martinfowler.com/articles/feature-toggles.html">feature toggles</a>.</li>
<li><strong>Analysis system scalability</strong>. The biggest challenge we had in this period was that our initial analysis system could no longer handle the load of concurrent A/B tests due to constraints in its architecture. As the maintenance cost of the analysis system grew too high, we had no capacity left to improve our analysis methods. We concluded that the need for a new analysis system was pressing. In the end, we spent two years rebuilding the analysis system in Spark. Our lessons learned will be shared in a separate post.</li>
</ul>
<p><img alt="Causal inference tool usage" src="https://engineering.zalando.com/posts/2021/01/images/ci_tool_usage.png"></p>
<h2>Phase three: run (2020-)</h2>
<p>At this point, an experimentation culture is established in most parts of the company. With the scalable infrastructure ready, the team can now work on more advanced statistical methods.</p>
<p>We are looking forward to bringing experimentation at Zalando to a new stage by:</p>
<ul>
<li><strong>Scaling out experimentation expertise</strong>. We have designed a new company-wide training curriculum with a smoother learning experience. It covers causality, statistical inference, and analysis tools at Zalando. We have also expanded the scope of causal inference research peer reviews to the whole company.</li>
<li><strong>Automating data quality indicators</strong>. A/B testing results are highly sensitive to data quality. The most important data quality indicator is <a href="https://exp-platform.com/Documents/2019_KDDFabijanGupchupFuptaOmhoverVermeerDmitriev.pdf">sample ratio mismatch</a>: the actual sample size split differs significantly from the expected one. Companies similar to Zalando have identified that between 6% and 10% of their A/B tests have sample ratio mismatch; a similar analysis on our historical data shows that at least 20% of A/B tests within Zalando are affected. Our platform automatically raises alerts to the affected team when sample ratio mismatch is detected, and further data investigation is needed before analysis results are shown to users in the platform's dashboard. Another major data quality issue is the data tracking consent required by the GDPR. As we process data only for visitors who have provided their consent, we have been researching the resulting selection bias in A/B tests and how to correct for it.</li>
<li><strong>Overall evaluation criteria</strong>. Over the last few years, we have learned from our users that selecting an outcome KPI for A/B tests is a big pain point. We now provide teams with qualitative guidelines: a) KPIs should be team-specific and sensitive to the product each team controls, i.e. each team can drive its KPIs by changing product features; b) KPIs should be proxies for long-term customer lifetime value, rather than short-term revenue. We plan to incorporate these guidelines into Octopus with scientifically proven methods.</li>
<li><strong>Faster experimentation</strong>. We found that the median runtime of an A/B test at Zalando is about three weeks, which is longer than at similar companies in the tech industry. Many tests also face time constraints from business requirements. We plan to support trustworthy analysis for faster experimentation with more advanced analysis methods, such as variance reduction, Bayesian analysis, and multi-armed bandits.</li>
<li><strong>Stable unit assumption</strong>. In practice, each unit in an A/B test may not represent a unique person. For example, we currently cannot detect the same person across the Zalando website and the Zalando app and assign them the same variant. Solving this problem creates new engineering challenges due to latency requirements.</li>
<li><strong>Data visualization</strong>. Smart data visualization provides answers to questions you didn’t know you had. With complex and hierarchical data from A/B tests, there is quite some potential for data visualization designs.</li>
</ul>
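The three-week median runtime mentioned in the list above is largely dictated by statistics: detecting a small relative lift on a conversion metric requires a large sample in each arm. Below is a minimal sketch of the standard normal-approximation sample-size calculation; the conversion rate, lift, and traffic numbers used in the usage note are illustrative assumptions, not Zalando figures.

```python
import math

def required_sample_per_arm(base_rate: float, min_rel_lift: float) -> int:
    """Approximate per-arm sample size for a two-proportion test.

    Normal approximation, two-sided alpha = 0.05, power = 0.8.
    """
    z_alpha = 1.96    # z quantile for two-sided 5% significance
    z_beta = 0.8416   # z quantile for 80% power
    delta = base_rate * min_rel_lift             # absolute difference to detect
    variance = 2 * base_rate * (1 - base_rate)   # variance of the difference
    return math.ceil(variance * (z_alpha + z_beta) ** 2 / delta ** 2)

def runtime_days(n_per_arm: int, visitors_per_day: int, arms: int = 2) -> float:
    """Days needed to collect n_per_arm units in each arm."""
    return arms * n_per_arm / visitors_per_day
```

With an assumed 4% conversion rate, a 2% relative lift to detect, and 100,000 eligible visitors per day, this yields roughly 19 days of runtime, close to the three-week median; methods like variance reduction shorten tests precisely by lowering the effective variance in this formula.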
<p><img alt="Number of sample ratio mismatch" src="https://engineering.zalando.com/posts/2021/01/images/share_srm.png"></p>
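The sample ratio mismatch check itself is simple; the following sketch uses a two-sided binomial z-test, which for two groups is equivalent to the commonly used chi-squared test. The 0.001 alert threshold is a typical industry choice, not necessarily the one Octopus uses.

```python
import math

def srm_p_value(n_control: int, n_treatment: int,
                expected_ratio: float = 0.5) -> float:
    """Two-sided p-value that the observed split matches the expected one.

    Normal approximation to the binomial: under a correct split, the
    control count is Binomial(n, expected_ratio).
    """
    n = n_control + n_treatment
    mean = n * expected_ratio
    std = math.sqrt(n * expected_ratio * (1 - expected_ratio))
    z = (n_control - mean) / std
    # two-sided tail probability of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

def has_srm(n_control: int, n_treatment: int,
            expected_ratio: float = 0.5, alpha: float = 0.001) -> bool:
    """Flag a sample ratio mismatch at the given (deliberately strict) alpha."""
    return srm_p_value(n_control, n_treatment, expected_ratio) < alpha
```

For example, a 50,000 / 50,500 split under an expected 50/50 ratio is unremarkable, while 50,000 / 52,000 would raise an alert.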
<h2>Summary</h2>
<p>To sum up, the experimentation platform at Zalando has evolved substantially since 2015. Nevertheless, we remain focused on bringing more <em>scalable</em> and more <em>trustworthy</em> experimentation to Zalando. We thank all team members, contributors and leaders who made it happen during this incredible journey.</p>
<h2>Future posts</h2>
<p>In the upcoming posts, we will provide more details about the technical challenges and solutions of the experimentation engine, analysis system, data quality issues, and data visualization. Stay tuned!</p>
<p><em>If you would like to get to know the Experimentation platform sooner, consider joining our <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering&filters%5Bcategories%5D%5B1%5D=Applied%20Science&filters%5Bcategories%5D%5B2%5D=Product%20Design%20%26%20User%20Research&filters%5Bcategories%5D%5B3%5D=Product%20Management%20%28Technology%29&filters%5Btypes%5D%5B0%5D=Full-Time&filters%5Bentities%5D%5B0%5D=zalando&search=engineer">Engineering teams</a>.</em></p>How Zalando prepares for Cyber Week2020-10-08T00:00:00+02:002020-10-08T00:00:00+02:00Bartosz Ocytkotag:engineering.zalando.com,2020-10-08:/posts/2020/10/how-zalando-prepares-for-cyber-week.html<p>Learn how we prepare our platform for Cyber Week - the highest traffic period in the year.</p><h2>Introduction</h2>
<p>Cyber Week has become an increasingly important time of the year in e-commerce. <a href="https://corporate.zalando.com/en/newsroom/en/news-stories/zalando-achieves-record-breaking-cyber-week-results">In 2019</a>, we attracted 840,000 new customers and our sales (Gross Merchandise Volume) increased by 32% compared to the previous year. During the event we grew faster as a business than during the rest of the year, where we grow at a 20-25% rate. Our peak reached 7,200 orders per minute, compared to 4,200 the year before (+71% YoY).</p>
<p>From an engineering point of view, Cyber Week is a very exciting time, during which all systems are exposed to load far beyond any peak seen throughout the year. Supporting the event itself has been extremely rewarding for everyone involved, thanks to close collaboration between teams and a strong focus on operational excellence and reliability. While preparing for past Cyber Weeks, we created new capabilities in our teams and platform that serve us throughout the whole year. Looking back at the past years, we would like to share our experience and how our capabilities evolved over time around the key themes of <em>Site Reliability Engineering</em>, <em>Load Testing in Production</em>, and the <em>Preparation</em> approach itself.</p>
<h2>Site Reliability Engineering</h2>
<h3>Phase 1: Building up knowledge about reliability engineering</h3>
<p>Six years ago, when our e-commerce platform was still within on-premise data centers, we had a handful of on-call teams. Two of these teams owned the backend and frontend systems of our e-commerce platform and were primarily responsible for Cyber Week preparations and support during the event. When we started moving more and more critical systems into the AWS cloud as part of our <a href="https://engineering.zalando.com/posts/2018/12/front-end-micro-services.html">micro-frontend architecture</a>, we adopted the "you build it - you run it" mindset and the number of on-call teams increased dramatically, to around 100 teams today. This also meant that we needed to educate many teams about designing for reliability. To achieve that, we formed a team of 10 colleagues who were passionate about SRE and who signed up to perform <a href="https://landing.google.com/sre/sre-book/chapters/evolving-sre-engagement-model/#:~:text=The%20most%20typical%20initial%20step,a%20service%20operating%20in%20production.">production readiness reviews</a> of our applications ahead of Cyber Week. In preparation for that, we ran a series of workshops with teams to share knowledge about reliability patterns, and identified clusters of applications that required adjustments so that the platform would remain stable under various failure types (e.g. failures of dependencies, overload, timeouts).</p>
<h3>Phase 2: Distributed tracing</h3>
<p>We use distributed tracing following the OpenTracing standard across our platform. This allows us to inspect the performance of our distributed system and quickly find contributing factors for increased latency or error rates across our applications. After instrumenting a set of applications and proving the intended wins, we leveraged Cyber Week preparations to scale this effort. In the first year, we focused on critical, tier-1 systems involved in the hot path of the browse journey in <a href="https://en.zalando.de">our shop</a>. The following year, we expanded coverage to tier-2 systems for applications in the scope of Cyber Week. During the instrumentation, we adopted additional conventions that help us identify the traffic sources: App, Web, push notifications, load tests. This allows us to better understand traffic patterns and perform capacity planning based on the request ratios between incoming traffic and the respective parts of our platform.</p>
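Once the request ratios are known from tracing data, capacity planning reduces to simple arithmetic. A hedged sketch of that calculation; the service names, fan-out ratios and per-instance throughput numbers are invented for illustration:

```python
import math

def plan_capacity(edge_peak_rps: float,
                  fanout: dict[str, float],
                  per_instance_rps: dict[str, float],
                  headroom: float = 0.3) -> dict[str, int]:
    """Instances needed per service for a forecast edge-traffic peak.

    fanout[s]: downstream requests to service s per edge request,
               measured from distributed traces.
    per_instance_rps[s]: sustainable throughput of one instance of s,
               measured in load tests.
    headroom: fraction of extra capacity kept free for spikes.
    """
    plan = {}
    for service, ratio in fanout.items():
        service_rps = edge_peak_rps * ratio * (1 + headroom)
        plan[service] = math.ceil(service_rps / per_instance_rps[service])
    return plan
```

For a hypothetical 10,000 rps edge peak, a catalog service called 2.5 times per edge request and sustaining 500 rps per instance would need 65 instances with 30% headroom.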
<h3>Phase 3: Dedicated team for SRE enablement</h3>
<p>What started as a grass-roots movement around SRE practices in Phase 1 has evolved into an SRE department within Zalando, which is focused on reliability engineering, observability, and providing the necessary infrastructure for monitoring, logging and distributed tracing. The SRE team also organizes trainings and knowledge exchange within the SRE guild, where teams share lessons learned and pitfalls of operating systems in production and collaborate on formulating best practices.</p>
<p>Distributed tracing has been a game-changer for us. We have leveraged tracing data to reduce the alert fatigue of our on-call teams through an approach called adaptive paging. It's an alert handler that leverages the causality from tracing and OpenTracing's semantic conventions to page the team closest to the problem. From a single alerting rule, a set of heuristics is applied to identify the most probable cause, paging the respective team instead of the alert owner. See our SRECon talk <a href="https://www.usenix.org/conference/srecon19emea/presentation/mineiro">Are We All on the Same Page? Let's Fix That</a>, which explains our approach in detail.</p>
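The core idea can be illustrated with a simplified trace walk: starting from the span that raised the alert, follow erroring child spans downwards and page the owner of the deepest failure, on the heuristic that errors propagate up from their root cause. This is our own simplified illustration, not the actual implementation described in the talk:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    service: str                  # owning service, maps to an on-call team
    error: bool                   # OpenTracing 'error' tag on the span
    children: list["Span"] = field(default_factory=list)

def team_to_page(alerting_span: Span) -> str:
    """Follow erroring child spans to the deepest failure, return its owner."""
    for child in alerting_span.children:
        if child.error:
            return team_to_page(child)
    return alerting_span.service
```

For a trace gateway -> checkout -> payments where all three spans carry the error tag, this pages the payments team rather than the owner of the gateway alert.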
<h2>Load testing in Production</h2>
<h3>Phase 1: Feeling lucky</h3>
<p>Over the years of operating our shop in the Data Center, we learned how to scale our shop's frontend. We kept adding servers and scaling our Solr fleet responsible for Product Data and Search until this became impractical due to the multi-month lead time needed to procure new physical servers. The Solr fleet was the one benefiting most from auto-scaling in the cloud and thus the first system that we moved to the cloud six years ago. Our backend services (e.g. product information management, inventory management, order management, customer accounts and data), however, formed an over-provisioned system with a fixed number of instances in the Data Center. At its heart were PostgreSQL instances heavily optimized by our Database infrastructure team, which we scaled through sharding and switching from spinning disks to SSDs.</p>
<p>This was sufficient for Cyber Week in 2015, when commercial campaigns were just about the right size for our capacity. With no past knowledge of what type of traffic to expect, we were amazed at how much headroom our backend systems really had: never before had we seen load throughout the day surpass every past evening peak. There were of course some challenges with scaling, but we overcame these with minor tuning of the system configuration during the event, mostly by pausing some asynchronous processing that was not essential for accepting and processing orders.</p>
<h3>Phase 2: Load Tests in Production</h3>
<p>In a cloud-based system that relies heavily on auto-scaling for cost-optimization, proper testing and capacity planning are a must. To achieve that, we set the target to better understand our scalability limits. We tried many approaches, and in our experience the only effective one for a large-scale system like ours is live load tests in production. Testing in production is an established practice, but difficult to execute well: mistakes become really costly as the customer experience is degraded, so this approach requires the ability to quickly notice customer impact and react by aborting the test or otherwise mitigating the incident.</p>
<p>To achieve our goal, we wrote simulators that place sales orders for test products that can be clearly differentiated from real customer orders, processed to a certain degree, and then skipped at the stage of fulfillment. This gives us an understanding of the limitations of our order processing system and all its dependencies, incl. inventory management and payment processing. Further, as shared before in <a href="https://engineering.zalando.com/posts/2019/04/end-to-end-load-testing-zalandos-production-website.html">end-to-end load testing Zalando’s production website</a>, we wrote a simulator that traverses the user journey across key customer touch-points in our shop. We run this simulation in production for all countries and mimic the traffic patterns we observe during sales events. Through that we uncover scalability bottlenecks and verify that certain resilience patterns work properly. Running the simulation is a fun and thrilling exercise, especially when the whole team suddenly starts hearing pagers fire as we continue to increase the test traffic.</p>
<h3>Phase 3: Load Tests inform capacity planning</h3>
<p>Having written and evolved the user journey simulator for two years, we were not fully satisfied with its ability to generate load at scale. There were too many rough edges, and tuning the simulator to generate the required load profiles consumed too much of our development time. We decided it was better to leverage an existing product that would do the job better. This paid off heavily: last year we were able to run the tests on both the App and Web platforms simultaneously.</p>
<p>The different types of load tests that we ran in production last year helped inform capacity planning based on commercial goals and the projected sales. The final, clean run of tests also gave us sufficient confidence that the platform was scaled to sustain a certain amount of incoming traffic and sales in the peak minute and thus contributed to a smooth event for our teams.</p>
<h2>Preparation as a project</h2>
<p>The Cyber Week project is always at the top of our project lists and we dedicate the highest attention to the preparation work. Over the past years, we have progressively increased collaboration between the engineering and commercial teams and have dedicated Program Managers responsible for the delivery of the project. Every year we tune the structure and reporting within this project.</p>
<p>Thanks to the high priority of the Cyber Week preparations, every year we are able to invest in a key theme that helps us build up new capabilities that we did not have before - be it resilience engineering know-how, load testing in production, capacity planning, production readiness reviews, or collaboration across the company. On top of that, we also run dedicated projects aimed at increasing scalability of our platform and deliver changes to the customer experience for sales events.</p>
<h2>During the event</h2>
<p>After months of preparation, the event itself is a cherry on top - it's the time where we see how the time invested has paid off. If we are well prepared, we expect a rather uneventful time in terms of the number of production incidents. For the key period where we expect the highest load on our systems, we organize a Situation Room to ensure rapid incident response. In the room, we gather representatives from key engineering teams, SRE team, and dedicated Incident Commanders to closely watch the operational performance of our platform. It's basically a control center with dozens of screens and graphs, that looked like this in 2019:</p>
<p><img alt="Zalando's Cyber Week Situation Room" src="https://engineering.zalando.com/posts/2020/10/images/cw-situation-room.jpg"></p>
<h2>Summary</h2>
<p>We've explored the key themes in Zalando's Cyber Week preparation journey. We are constantly tuning our approach based on insights from each year, adapting the areas we invest in to business growth and commercial campaign requirements. This year has the added twist of remote working, which will likely require us to rethink how to organize the Situation Room efficiently. With seven weeks until Cyber Week, our preparations for this year's event are well underway and we look forward to sharing results and lessons learned in follow-up posts. With our growing application landscape, there are sufficient challenges ahead, as we have more than 1122 applications (out of 4000+) in scope of the Cyber Week preparations.</p>
<p><img alt="Applications in scope for Cyber Week" src="https://engineering.zalando.com/posts/2020/10/images/applications-in-scope.png"></p>
<p>If you would like to know more and help us out with similar challenges, consider joining the <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Architecture&filters%5Bcategories%5D%5B1%5D=Software%20Engineering%20-%20Backend&filters%5Bcategories%5D%5B2%5D=Software%20Engineering%20-%20Data&filters%5Bcategories%5D%5B3%5D=Software%20Engineering%20-%20Frontend&filters%5Bcategories%5D%5B4%5D=Software%20Engineering%20-%20Full%20Stack&filters%5Bcategories%5D%5B5%5D=Software%20Engineering%20-%20Leadership&filters%5Bcategories%5D%5B6%5D=Software%20Engineering%20-%20Machine%20Learning&filters%5Bcategories%5D%5B7%5D=Software%20Engineering%20-%20Mobile&filters%5Bcategories%5D%5B8%5D=Software%20Engineering%20-%20Principal%20Engineering&filters%5Bcategories%5D%5B9%5D=Applied%20Science%20%26%20Research&filters%5Bcategories%5D%5B10%5D=Product%20Design%20%26%20User%20Experience&filters%5Bcategories%5D%5B11%5D=Product%20Management&search=%22Site%20Reliability%20Engineer%22">SRE</a> or other <a href="https://jobs.zalando.com/en/jobs/?filters%5Bcategories%5D%5B0%5D=Software%20Engineering">Engineering Teams</a>.</p>Meet Boris Malensek, Our Head Of Engineering In Merchant Operations2020-09-08T00:00:00+02:002020-09-08T00:00:00+02:00Kerstin Schartnertag:engineering.zalando.com,2020-09-08:/posts/2020/09/meet-boris-malensek-head-of-engineering-merchant-operations.html<p>We have talked with Boris about his career journey within Zalando, the evolution of Merchant Operations, and the engineering culture within the company.</p><p><img alt="Boris Malensek" src="https://engineering.zalando.com/posts/2020/09/images/boris-malensek.jpg#right"></p>
<p>We spoke about his professional journey within Zalando, the evolution of Merchant Operations, and the engineering culture within the company.</p>
<p>The interview was initially conducted for Zalando’s External Talent Community.</p>
<h3>Boris, let’s go back to the start. What attracted you to Zalando in the first place?</h3>
<p>What attracted me to Zalando in the first place was how quickly the company was able to adapt to change. I liked that it was constantly trying out new things, even if at a given moment they didn’t seem like the best solutions. At Zalando, there have always been believers in change, and for me that is important. I think of the process as a journey, and who you share this journey with has always been important to me.</p>
<h3>Do you think that’s the main incentive for people to join Zalando – the constant change?</h3>
<p>I don’t think there is just one formula, one reason, why people choose to join the company. But what candidates should understand is that Zalando will always change. We will probably become a more stable organisation over time, but there will always be changes. We will continue to try out new things, and people should not be afraid of that. Some things turn out to be a great success, others don't, but we will always try to innovate and be better than before.</p>
<h3>What is special and particular about Software Engineering at Zalando?</h3>
<p>The engineering culture. Since the day I joined, it remains the most impressive engineering culture I’ve experienced. What I mean by engineering culture is the support you receive at every level: from a single line of code up to global challenges. There is always someone ready to help you, someone to learn from, and that’s really powerful. Our feedback culture is getting stronger, with people having healthy attitudes towards sharing feedback. In general, we strive to build a community based on trust.
Zalando has invested a lot in technology and our solutions and tooling are state-of-the-art. The way we enable our engineering teams to deploy their software – fast, autonomously, at scale and still compliant – is impressive. That sets us apart from many other companies.
Our approach to solving problems is unique. We always try to put the customer first, we try to understand why we do what we do, what the purpose is, and this is important. We always aim to explain our strategy in the clearest way possible.</p>
<h3>As the Head of Engineering in Merchant Operations, what do you do and what are your responsibilities?</h3>
<p>Firstly, on a daily basis I enable the team to tackle complex challenges by providing guidance when they are unsure of how to come to an optimal solution. However, my main goal is to make myself “obsolete”: I aim to develop the team in such a way that they feel empowered to solve problems independently.
An important part of my role as a leader is to hire the best talent for our business unit and the broader organisation. I am also responsible for planning and outlining strategies for upcoming technological, architectural or organisational changes that support the longer term Zalando Group Strategy. I work on building a network within and outside Zalando, so that I can turn to like-minded engineers and leaders for help with problems. Finally, I am accountable for the software that we deliver: it needs to be scalable and resilient, and when we fail, we need to fail fast, learn from it, and move forward to continuously improve on what we have done before.</p>
<h3>Boris, you have just had your 5-year anniversary at Zalando and have gone through several stages of career growth from a Senior Software Engineer to an Engineering Manager, to a Head of Engineering. When the time came to pursue the next steps in your development, what motivated you to choose a management path? What does being an engineering leader entail?</h3>
<p>Most of us want to grow by simply stepping out of our comfort zone. That’s definitely something that still drives me today, and at Zalando I have opportunities to do that. I came to Zalando as an experienced Senior Software Engineer, and leading people and projects was not new to me. When I joined Zalando, there was a reorganisation within the company and with perseverance and self-driven efforts, I enthusiastically grabbed the opportunity to become an Engineering Manager.
Being a leader has taught me the importance of creating opportunities for career growth within an organisation. I aim to provide opportunities for growth both within my team and beyond - I believe that it's important to support employees' growth first and foremost, no matter where it may take them.</p>
<h3>Merchant Operations is often referred to as a great success story within Zalando, could you tell us about how this business unit evolved?</h3>
<p>Merchant Operations has a rich history. I have been involved with the department from the very start, but when I joined it five years ago it was called Brand Solutions. Brand Solutions was building a prototype for a marketplace. It had a small tech team, and I was the third software engineer to be hired for the team. We had a great commercial team working alongside us, developing the idea of the marketplace and managing important partner relationships. Over time, we grew into a fully-fledged organisation. Three years ago, David Roberts joined us as the VP of Merchant Operations, and around the same time our objective became clear: build a B2B marketplace model, to bring Zalando closer to being the Starting Point for Fashion by increasing our assortment to include external partners. Currently, we have around 80 people in the engineering organisation, compared to just 10 in the early days. We have engineers in <a href="https://jobs.zalando.com/en/tech/jobs/?filters%5Boffices%5D%5B0%5D=Berlin&filters%5Bcategories%5D%5B0%5D=Technology&filters%5Bcategories%5D%5B1%5D=Product%20Design">Berlin</a> and <a href="https://jobs.zalando.com/en/tech/jobs/?filters[offices][0]=Dublin%20%28Ireland%29">Dublin</a>. Our Dublin team has been a great success story, having ramped up really quickly after the beginning of our expansion in October 2019 to a team of 15 today.
What makes Merchant Operations unique is that it started as a pure operations team. However, if you want to reach the scale required to become a giant in the fashion e-commerce industry, you need to focus on innovating through technology - and that is how we began to transform. Our biggest initiative currently is Zalando Direct (<a href="https://jobs.zalando.com/en/tech/jobs/?filters%5Bcategories%5D%5B0%5D=Technology&filters%5Bcategories%5D%5B1%5D=Product%20Design&search=zDirect">zDirect</a>) which steers the business of external partners to Zalando's platform and extensive customer base, which increases our offering and convenience proposition exponentially.</p>
<h3>Lastly, could you give a piece of advice for a Senior Software Engineer who would like to join Zalando?</h3>
<p>Patience is very important. I think it is always important to give yourself some time to learn, grow and focus on what you believe to be your ultimate goal. If you are a Senior Software Engineer and still in doubt about the direction you would like to take with your development, you have to think about this first and foremost. Your goal may be ambitious. But it’s really important that you think of constructive steps you can take to move towards it. Be disciplined. Stay <a href="https://jobs.zalando.com/en/tech/">determined</a>, don't be afraid to ask for what you want, and remember to remain open to a path of continuous learning. It's only when you step outside of your comfort zone, that you realise what you are capable of.</p>Inbox Zero is not a Lifestyle2020-07-17T00:00:00+02:002020-07-17T00:00:00+02:00Tim Kroegertag:engineering.zalando.com,2020-07-17:/posts/2020/07/leading-self.html<p>Personal productivity is subject of frequent debate and optimization. Learn how to stay organized as a leader and feel accomplished every day.</p><p><img alt="Photo of a laptop on a desk showing the author on a video call on the screen and a Google calendar screenshot partially obscuring the author on the screen" src="https://engineering.zalando.com/posts/2020/07/images/tim-laptop-calendar.jpg#previewimage"></p>
<p>The following guidelines and tricks help me with task management, time management, planning & prioritization, reacting to ad-hoc situations, and the sense of not having accomplished anything during the day. There is some overlap with our Remote Work Guidelines<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. My meta-advice for applying anything from this article: start with one improvement, don’t try it all at once. Start with tools you have at hand. It’s an ongoing improvement process, and it’s ok to fail and start over. I've been iterating over this on and off for roughly three years now.</p>
<p>Having worked as a software developer early in my career, I've been a manager for roughly 10 years now, with a year back in an individual contributor role in between. An aspect to consider when reading about my experience and the suggestions provided is that a <em>manager's schedule</em> is somewhat different from a <em>maker's schedule</em>. Depending on your organization's challenges, a manager still needs to be able to create, e.g. to provide structure and strategy, and this needs an environment comparable to that of a maker. On the other hand, makers will benefit from applying some of the solutions outlined in this article when they need to adapt to a challenging environment themselves. "Different types of work need different types of schedules"<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>, and while this article is primarily aimed at managers, I believe that makers can take away some learnings, too, especially when they are planning to transition from an individual contributor role to a manager's career path.</p>
<p>To limit the scope of this article and the suggested solutions, a nice concept to introduce is the concept of <em>constants</em>. I'm going to refer to constants as constraints that are considered to be true, and can’t be ignored, at least not for too long: I have eight hours per day and 40 hours per week for work. I need to eat and take a break. I will need to process email and other requests. I need time to plan, and some plans I made will need to be changed.</p>
<p>In order to address all this, I need transparency on what kind of time and energy I have available, and what work needs to be done by when. I will need to understand how flexible I can change what I have planned to adapt to a new situation. For all this, I use the Google calendar and a task management tool.</p>
<h2>Configure work time</h2>
<p><a href="https://support.google.com/calendar/answer/7638168?hl=en">Setting up your working hours in Google Calendar</a> is a good reminder, for you and your colleagues, of when you are available and when you should not be working. Make a conscious decision whenever you break the rule and work outside of your working hours. When your colleagues see that they are inviting you to an event outside of your work hours, they will reconsider, or at least reach out to you first. That way you assert a certain control over your calendar and the invites you receive.</p>
<h2>Make a decision for every event</h2>
<p>Events without a decision clutter your calendar and make the organizers’ lives harder. Make a decision on every incoming event on the same day, or the next day at the latest, and move on. State a clear reason in a comment when you decline an event.</p>
<h2>Hide declined events</h2>
<p>You’ve already made a decision on those events, and you don’t need declined events to clutter your calendar. If you ever need to revisit that decision, you can enable showing declined events for that purpose in your calendar's settings, and disable it again afterwards.</p>
<p><img alt="Screenshot of Google Calendar's view options configuration with 'show declined events' deselected" src="https://engineering.zalando.com/posts/2020/07/images/calendar-view-options.png" title="Google Calendar view options"></p>
<h2>Defragment your calendar</h2>
<p>If you have many short appointments like 1:1's, group them together. If short appointments come in, try to fill gaps or place them next to other meetings. That way you optimize for continuous free space which helps with blocking time for focused work that takes more than just 30 minutes. You can also use Google Calendar's <em>reschedule event</em> functionality to ask the organizer to reschedule, if you prefer a different time, and the other participants are available.</p>
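Defragmenting a calendar is essentially an interval problem: merge the busy blocks, look at the gaps, and place a new short meeting into the smallest gap that fits, so that long free blocks survive for focused work. A small illustrative sketch (times as minutes since midnight; the defaults model a 10:00-19:00 day):

```python
def free_gaps(busy: list[tuple[int, int]],
              day_start: int = 600, day_end: int = 1140) -> list[tuple[int, int]]:
    """Return free (start, end) gaps between merged busy intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(busy):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping: extend
        else:
            merged.append([start, end])
    gaps, cursor = [], day_start
    for start, end in merged:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < day_end:
        gaps.append((cursor, day_end))
    return gaps

def best_slot(busy: list[tuple[int, int]], duration: int):
    """Smallest gap that fits: fills fragments first, preserving long blocks."""
    fitting = [g for g in free_gaps(busy) if g[1] - g[0] >= duration]
    return min(fitting, key=lambda g: g[1] - g[0], default=None)
```

For example, with meetings at 10:00-11:00, 11:30-12:00 and 14:00-16:00, a new 30-minute 1:1 lands in the 11:00-11:30 fragment instead of breaking up the long free afternoon block.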
<h2>Block recurring events</h2>
<p>Take back control over how and when you are working on what. Some things need to be done every day (processing email, responding to calendar invites and chats, having lunch, or planning and prioritizing work), and you need to make room for that. You can always cut back if you’re running out of overhead tasks. My work time, as you can see in the following screenshot, is from 10:00 to 19:00. I usually do not exceed my 40-hour work week with this setup.</p>
<p><img alt="Screenshot of a typical week in my Google Calendar. There are daily recurring blocks for standups, lunch, processing things, and ending the day. Other weekly or monthly recurring events are grouped mostly on Wednesday afternoon and are a different color." src="https://engineering.zalando.com/posts/2020/07/images/calendar-typical-week.png" title="A typical work week"></p>
<p>For all tasks that need doing, I follow a Getting Things Done (GTD) approach<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>. I process my inbox after lunch because I like to get started with work I planned instead of new input from my inbox. When processing, I make prioritization decisions mostly based on importance and urgency<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup>. Processing means that I try to organize all tasks into my task management system, which makes it easier for me to discover these tasks at the right time in the right context. A task management system can be anything from a formatted text file or a Google document to a more sophisticated, dedicated task management app. Setting this up is a topic on its own. I suggest starting with whatever you have at hand. I try to follow a strict agenda for task processing:</p>
<ul>
<li>Review perspectives<sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup><ul>
<li>What is happening today and the next few days?</li>
<li>What input am I waiting for that will be provided by someone else?</li>
<li>What is stalled (i.e. it’s not clear what the next step would be)?</li>
</ul>
</li>
<li>Process email inbox</li>
<li>Process assigned <a href="https://drive.google.com/drive/search?q=followup:actionitems">Google Followup Action Items</a></li>
<li>Process our internal communication platform</li>
<li>Process Google Chat (pull mode)</li>
<li>Process other inboxes (e.g. task management tool inbox). Categorize and compartmentalize tasks & projects.</li>
<li>Plan and schedule events in the calendar for important or full focus tasks</li>
<li>Flag tasks I plan to complete today</li>
</ul>
<p>Tasks that are <em>flagged</em> are the focus for today and are highlighted in my task management system (e.g. listed on top of the text file). That way I can always go to one spot after some inevitable context switching to get back on track fast. In the evening I try to clear out my inbox, and process and schedule all tasks that came in after lunch for the next few days, so I can start the next morning without having to look into my email inbox. That way I might reach <em>Inbox Zero</em> from time to time, which feels extremely good. A much more important aspect than trying to achieve Inbox Zero all the time is measuring how much you have on your plate, and whether your inbox is constantly filling up or you're able to keep a healthy balance. <strong>Inbox Zero is a signal, not a lifestyle.</strong></p>
<h2>Categorize calendar entries</h2>
<p>When you categorize your calendar entries, you can see immediately what can be easily rescheduled or canceled in case of emergencies and urgent and important ad-hoc requests. You see how much time you have available, and you can reflect much better on what you did at the end of the day or week. It’s good to feel accomplished about your “focus week”, or “hiring week”, the “catch-up week” or an “off-the-charts week” if you made those choices deliberately. I use the following colors to categorize events.</p>
<ul>
<li>Red: Lunch (to remind myself of the importance)</li>
<li>Bright blue: Inbox processing / quick topics / Getting Things Done (GTD)</li>
<li>Light purple: 1:1's / Jour Fixes with directs and skip-level directs</li>
<li>Dark grey: Recurring department or team meetings</li>
<li>Yellow: Everything hiring related like interviews, preparation and briefings</li>
<li>Orange: Focus time</li>
<li>Dark blue: Mentoring, Career Development, Performance Management</li>
<li>Light orange: Trainings</li>
<li>Green: Everything else (default for incoming events, because green is hope)</li>
</ul>
<p>You can also use emojis to make your calendar look nicer. I’m a visual person and I used this trick to cheat myself into caring more about my calendar and getting into the habit of maintaining good calendar hygiene. If emojis don’t work for you, maybe you’ll find something else. My colleague Lacey Nagel uses an elaborate emoji mapping for events she owns:</p>
<ul>
<li>🌊 blockers for time to focus on specific tasks</li>
<li>📌 user research/interviews</li>
<li>🥙 planned breaks / lunch by myself</li>
<li>🍱 lunch with other Zalandos</li>
<li>🙌 1:1's</li>
<li>🐩 backlog refinement</li>
<li>🗺 planning</li>
<li>🔬 retro</li>
<li>🎂 reminders for colleagues’ birthdays</li>
</ul>
<p>I use some of those and use the following additional emojis for my calendar:</p>
<ul>
<li>📥 processing my inbox / mail</li>
<li>🧹 finishing up for the day</li>
<li>🎓 career development</li>
</ul>
<h2>A Hiring Week</h2>
<p>Looking at my calendar, I know at a glance that I don’t have to try to reschedule something yellow, but I can delay focus time, make a conscious decision to cut back on inbox processing, or move a 1:1. Even if you didn’t work on what you planned to (e.g. product review), because you had to jump in and interview a candidate, you can feel good about it looking at the yellow accomplishments at the end of your week.</p>
<p><img alt="Screenshot of a week in my Google Calendar that was focused more on hiring. Two days consist almost completely out of yellow events." src="https://engineering.zalando.com/posts/2020/07/images/calendar-hiring-week.png" title="A 'hiring' week"></p>
<h2>Plan and schedule your focus work</h2>
<p>If you don’t block those time slots in your calendar, someone else will do it. Understand your energy levels<sup id="fnref:6"><a class="footnote-ref" href="#fn:6">6</a></sup>. You might just want to get a few small things done and out of the way, to get the energy to work on the product strategy next. Maybe you don’t have a lot of energy left, so you can read a document that was shared, or watch an all-hands that was recorded earlier. Different kinds of tasks need different levels of energy. I adopted the energy levels “Short Dashes”, “Full Focus”, “Hanging around”, and “Depleted”. These can be contexts, tags, categories, or different To-Do lists in your task management system, to allow easy access to these tasks.</p>
<h2>A Focus Week</h2>
<p>In the example below I had to get the Performance & Development statements for my directs ready before the due date, so I put blockers in the calendar and focused on it. I also finalized a quarterly product review. Another thing you can see is that I was in the mood to go through a few emails and process my inbox earlier on Tuesday, so instead of cutting my lunch short, I switched the inbox processing event and the lunch event around.</p>
<p><img alt="Screenshot of a week in my Google Calendar where I could focus on preparing documents for product review, performance assessment, and career development sessions. Roughly 40% of the 40 work hours are orange 'focus' blockers across the week." src="https://engineering.zalando.com/posts/2020/07/images/calendar-focus-week.png" title="A 'focus' week"></p>
<h2>A Management Week</h2>
<p>In the next example you can see that preparing material for performance management is a diligent effort and takes a lot of time, as does participating in the corresponding alignment meetings (PRCs). I cut back heavily on inbox processing and lunch, and did some overtime to make it work. At the same time I did not want to cancel the training sessions I had scheduled a long time ago and had been looking forward to, or miss out on a project closing dinner on Thursday to celebrate success. That was a conscious decision again, so I can’t complain about it afterwards. Cutting back on a routine can be a slippery slope to breaking an established good habit, so be mindful to get back to a normal setup as soon as possible, and compensate for the overtime by taking some time off the following week.</p>
<p><img alt="Screenshot of a week in my Google Calendar where I spent most of the time preparing performance assessment material and participated in the corresponding alignment meetings, including some overtime. Most days are orange because of the focused preparation, and blue because of the career development & performance management character of the events." src="https://engineering.zalando.com/posts/2020/07/images/calendar-management-week.png" title="A 'management' week"></p>
<h2>Feel accomplished working asynchronously</h2>
<p>Transitioning from the office to working remotely, especially when using asynchronous communication, can further reduce the feeling of being appreciated and accomplished. The lack of face-to-face communication means less exposure to this type of appreciation. As someone giving feedback, or when reading something that someone else created or contributed to, you can compensate by explicitly expressing your appreciation. A thank you here and there goes a long way, even if it’s not actionable feedback. It doesn’t have to be. As someone who misses this kind of appreciation, I try to find other signals that potentially correlate with doing a good job and being appreciated for it, like the number of readers of a document, or the amount of comments, discussion, and other contributions on topics I'm driving.</p>
<h2>What has changed since going full-remote in March 2020?</h2>
<p>One thing that has changed is that because of the lack of commute, I had more time in the morning, and I started to eat breakfast. Not doing that before meant that I would need to have lunch at noon because I hadn't eaten properly in the morning and would be hungry already. Now with a proper breakfast to start the day, I have shifted lunch to 1pm and process my inbox right before at 12. I essentially switched those events around. You also see that we introduced recurring executive sync meetings at the end of the day to stay connected while working in a remote-first setup.</p>
<p><img alt="Screenshot of how a typical week in my Google Calendar looks after going remote. The daily inbox processing events at 1pm have switched places with the lunch event, which was at 12 beforehand." src="https://engineering.zalando.com/posts/2020/07/images/calendar-remote-week.png" title="The new remote week setup"></p>
<h2>Closing comment</h2>
<p>I hope this blog post helps you in <em>leading yourself</em>. Reflecting on how I feel today compared to when I started out on this journey a few years ago, it is a night and day difference. When you learn concepts like the Eisenhower matrix or Getting Things Done (GTD), most of the time you don't get specific tips and details on how to apply them on a day-to-day basis. I'm sharing my concrete experience as a template for you to start out with, customize, and iterate on.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p><a href="https://engineering.zalando.com/posts/2020/03/how-to-work-remotely-at-zalando.html">Guidelines for remote work at Zalando</a> <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://fs.blog/2017/12/maker-vs-manager/">Maker vs. Manager</a> <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://hamberg.no/gtd/">GTD in 15 minutes – A Pragmatic Guide to Getting Things Done</a> <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p><a href="https://www.eisenhower.me/eisenhower-matrix/">Eisenhower Matrix</a> <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:5">
<p>The term 'perspective' is task management tool specific: <a href="https://medium.com/smarter-productivity/a-modern-approach-to-gtd-contexts-and-perspectives-in-omnifocus-32a5256f1a0e">A modern approach to GTD contexts and perspectives in OmniFocus</a> <a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:6">
<p><a href="https://medium.com/smarter-productivity/a-modern-approach-to-gtd-contexts-and-perspectives-in-omnifocus-32a5256f1a0e">A modern approach to GTD contexts and perspectives in OmniFocus</a> <a class="footnote-backref" href="#fnref:6" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
</ol>
</div>Technology Choices at Zalando - Updating our Tech Radar Process2020-07-15T00:00:00+02:002020-07-15T00:00:00+02:00Bartosz Ocytkotag:engineering.zalando.com,2020-07-15:/posts/2020/07/technology-choices-at-zalando-tech-radar-update.html<p>We have revisited the process of technology selection at Zalando, adjusted the Tech Radar ring semantics, and moved towards principle-based decision making. In this post, we would like to share the process and its outcomes so far.</p><p><img alt="Zalando Tech Radar" src="https://engineering.zalando.com/posts/2020/07/images/zalando-tech-radar.jpg#previewimage"></p>
<h2>Challenges with our Tech Radar</h2>
<p>The <a href="https://opensource.zalando.com/tech-radar/">Zalando Tech Radar</a> is modelled after the <a href="https://www.thoughtworks.com/radar">Thoughtworks Technology Radar</a> and includes a ring-based scoring for a certain technology/framework along with supplementary information about pros, cons, restrictions, usage, and lessons learned at Zalando, available as a knowledge base for our teams. Since publication, the approach and <a href="https://engineering.zalando.com/posts/2018/01/building-tech-radar.html">visualization engine</a> have been used by others and also showcased at conferences <a href="https://twitter.com/arungupta/status/1194653758275256320">as an example</a> of how tech companies manage their technology choices.</p>
<p>Our initial concept of the Tech Radar suffered from a series of problems, which we have observed in the Engineering Community while maintaining the Tech Radar:</p>
<ol>
<li>The ring change criteria were too high-level, not being specific to technology types (e.g. programming languages, data stores) or context (e.g. backend, data science, mobile), their support by our infrastructure, and their impact on engineering usage. They allowed neither transparent, objective, and recurring rescoring of the Tech Radar nor clear guidance for our engineers on how to select or suggest technologies to evaluate.</li>
<li>The Tech Radar has been easy to ignore due to the lack of a formal process, and delivery teams have often been making key technology choices in isolation without consulting the guild maintaining the Tech Radar. Radar entries and ring changes were proposed only after technologies were already in production, instead of following the Tech Radar cycle. This led to a disconnect between the ring assignments and factual usage across teams.</li>
<li>The Tech Radar relied on voluntary contributions, which degraded in frequency because they were neither clearly incentivized nor part of the job expectations for higher grades. Contributions were usually driven by a small group of engineers forming an informal guild, who collected lessons-learned material and encouraged teams across the organization to contribute. The guild lacked a formal mandate to make company-wide technology decisions and insufficiently represented our departments across the company.</li>
</ol>
<h2>Confirming the problem statements</h2>
<p>To address these problems we embarked on a journey, starting with confirming the observed problems with our Engineering Managers and getting more insights into how they manage technology choices in their teams. We also explored potential effects on delivery in the past years. We found that Engineering Managers have felt insufficiently supported by the company to manage expectations and technology choices in their teams and missed the ability to lean on stricter guidance. Further, too broad a technology choice has affected the growth rate of their teams and created challenges with cross-team code contributions.</p>
<h2>Technology choices in Tech companies</h2>
<p>Having confirmed the problem, we’ve been collecting ideas on how the problems can be approached. We began with researching how other tech companies are managing technology selection. Unlike Zalando, other established tech companies (Google, Spotify, Tencent, <a href="https://github.com/foursquare/fsqio/blob/master/src/docs/fsqio/policies/new_technology_proposal.md">Foursquare</a>, and other <a href="https://www.cncf.io/people/end-user-community/">CNCF End User companies</a>) use a much stricter technology selection process, limit programming language choices, and invest into changing the <a href="https://cloudblogs.microsoft.com/opensource/2019/10/16/announcing-dapr-open-source-project-build-microservice-applications/">way applications are built</a> to leverage centralized control planes, which increases development velocity. They limit the tech stack choices due to the amount of investment into infrastructure support and the high cost of removing technologies that did not prove to be useful.</p>
<p>Too many technologies adopted company-wide make it challenging and expensive for Infrastructure teams to provide high-quality and well-integrated tooling, e.g. CI/CD, observability, profiling, vulnerability scanning, compliance, governance, etc. It also causes the teams that provide infrastructure solutions to strongly depend on coordinated and continuous community contributions for technologies that are not supported centrally. A broad freedom of choice leads to increased difficulties in supporting software long-term when the original authors have left the company, which is guaranteed to happen sooner or later. There are also other problems related to development collaboration: (1) the effort of adjusting to cross-language communication becomes significant as teams will repeatedly implement the same functional components in different ways, (2) the code duplication rate increases and it's costly to address non-functional requirements of services in terms of performance, high availability, and scalability, and (3) cross-team collaboration across different code bases is hindered.</p>
<p>Generally, aside from specialized use cases, flexibility around technology choices provides especially high value when organizations are able to identify technologies that bring a paradigm shift (e.g. Kubernetes) paired with business value and use case fit. This proves to be a difficult task, and companies rarely get the timing right.</p>
<h2>Data collection</h2>
<p>We sourced information from the Engineering Community through a Programming Language survey among our developers. The survey indicated how many engineers are currently using a certain language, which languages they feel comfortable working with and to what degree, as well as which language they would like to support others with in terms of guidelines or ad-hoc help. We cross-checked this data with our 4,000+ applications and derived how the different programming languages have gained traction and popularity over time.</p>
<h2>Setting the bar for ADOPT languages</h2>
<p>We have collected expectations around the level of support that we would like to see for ADOPT languages, ranging from clear guidelines on the VM lifecycles, integration into CI/CD systems, observability, size and health of the community within and outside of the company, and the ability to hire engineers to grow our teams using those languages, up to best practices for common tasks like performance analysis and tuning through inspection of heap dumps or flame graphs. We then collected data on how all languages we use in production benchmark against these criteria to see how big the gap between our expectations and reality is.</p>
<h2>Defining new ring semantics</h2>
<p>We have redefined the ring semantics as follows:</p>
<ul>
<li><strong>ADOPT</strong>: technologies with broad adoption, in which Zalando is willing to invest long-term</li>
<li><strong>TRIAL</strong>: captures all current experiments in production</li>
<li><strong>ASSESS</strong>: active, non-production assessments of promising technologies and trends</li>
<li><strong>HOLD</strong>: discouraged from broad adoption where the company is not willing to invest further; no new applications may use this technology</li>
<li><strong>NIL</strong>: no ring assignment, captures previous assessments and findings for long-term documentation purposes (we periodically archive HOLD entries as NIL)</li>
</ul>
<p>We optionally limit the ring assignments through a clear scope recommendation: Backend, Mobile, Web, Data, Machine Learning, and Infrastructure. This allows us to better differentiate between the specifics of those use cases. The updated semantics allow us to be broad in assessing the value of emerging technologies, but be selective in terms of their deployments to production and level of investment into adoption and promotion within the company. For TRIAL, we also involve explicit sponsorship from our Engineering Heads, who will support production trials and commit to being accountable for divesting from non-promising technologies and the removal of failed experiments from our technology landscape.</p>
<h2>Technology Selection Principles and Principal Engineering Community</h2>
<p>The timing for making changes to the Tech Radar was fortunate for two reasons. First, we have started an update of our role expectations for Software Engineers and Engineering Managers and included the responsibility and accountability for technology selection along with incentivizing contributions to the process in the new expectations. Second, we created a community of Principal Engineers with the most senior engineers across the company as members, who have been empowered to make decisions on technology selection and thus maintain the Tech Radar. We kicked off the community with a day-long remote off-site where we captured engineering challenges we face at Zalando, brainstormed on principles for technology selection, and had an initial exchange about the implications of new ring assignments and learnings about the programming languages we use in production. In departments that were not represented by Principal Engineers, we included our Senior Engineers to contribute instead. Following the off-site, we have formalized Technology Selection Principles that provide guidance on technology choices in terms of breadth and depth, focus on company instead of local decision making, etc. <a href="https://www.meeteor.com/post/principle-based-decision-making">Principle-based decision making</a> enables healthy discussions and differs enormously from preference-based decision making, which easily becomes personal and leads to conflicts.</p>
<h2>Parting ways with Clojure, Haskell, and Rust</h2>
<p>Having reviewed the use cases where our teams have used the languages that are not on ADOPT, their current adoption within Zalando since 2016, the available set of languages, and the level of investment required to bring them to ADOPT, we have decided to part ways with Clojure, Haskell, and Rust and not create new applications in those languages moving forward. Although our teams have built many services using these languages and learned how to operate these at scale with many successes, following our technology selection principles, we decided to not further invest in these languages as their unique capabilities are not giving us any further <a href="https://dehora.net/journal/leverage-in-engineering-organisations">leverage</a> at this point in time. Instead, we are focusing our community efforts on Kotlin and TypeScript and expect our language communities to help us move these to ADOPT later this year.</p>
<p>Please note that this decision is specific to the context of Zalando (1,200+ developers, 4,000+ applications) and our current technology landscape and engineering practices. As such, this decision is not transferable to other organizations nor to be understood as a statement about the technical capabilities of the languages themselves. We encourage readers to follow a similar exercise as ours to derive decisions for their context.</p>
<h2>Next steps</h2>
<p>So far, we have reviewed the area of programming languages as the one having the biggest long-term impact on our engineers and system architecture as well as being the one sparking many debates on which language is better and why (when arguing based on preferences). As the next step, we are proceeding with reviewing the remaining categories of the Tech Radar, so stay tuned for further updates on our journey. (Update: check out our follow-up post on <a href="/posts/2021/06/zalando-tech-radar-scaling-contributions.html">Scaling Contributions to the Tech Radar</a>)</p>
<p><em>If you would like to work with us and help shape our technology landscape, join our team and help us set the course as a <a href="https://jobs.zalando.com/en/tech/jobs/?filters%5Bcategories%5D%5B0%5D=Software%20Engineering&search=principal">Principal Engineer</a>.</em></p>Launching the Engineering Blog2020-07-01T00:00:00+02:002020-07-01T00:00:00+02:00Henning Jacobstag:engineering.zalando.com,2020-07-01:/posts/2020/07/launching-the-engineering-blog.html<p>We recently re-launched Zalando's Engineering Blog. Learn how we have set up a blog with a Lighthouse score of 100.</p><p>Our Engineering Blog was launched in June 2020 after a long hiatus of the previous tech blog.
This post describes the technical setup behind <code>engineering.zalando.com</code>.</p>
<p>You will learn:</p>
<ul>
<li>Which static site generator we selected and why.</li>
<li>What customizations we applied to design the blog and the publishing process.</li>
<li>How we serve static HTML using Skipper and S3.</li>
</ul>
<h2>Static Site Generator</h2>
<p>Our previous tech blog used a CMS which only a limited number of people had access to.
The CMS also lacked a workflow to propose and review drafts.
As authors of the Engineering Blog will (mostly) be software engineers, we decided to switch to a git-based workflow
and a static site generator.</p>
<p><a href="https://www.staticgen.com/">StaticGen</a> provides a nice overview of many different static site generators.
Nearly all of them provide the necessary features to generate a static HTML site from blog posts written in Markdown.
So which static site generator to choose?</p>
<p>With the need to customize the blog engine, e.g. with custom templates and features like author titles,
the main criterion for the static site generator is a familiar programming language for templating and for plugins.
The static site generator should generate plain HTML and not contain unnecessary features we won't use.
The winner was <a href="https://getpelican.com/">Pelican</a>:</p>
<p><img alt="StaticGen: Pelican stats" src="https://engineering.zalando.com/posts/2020/07/images/staticgen-pelican.png#center"></p>
<ul>
<li>Pelican is written in Python. Python is the language most people at Zalando are familiar with, so it's a safe bet.</li>
<li>Templates are written in <a href="https://palletsprojects.com/p/jinja/">Jinja</a>. Jinja is a popular templating system, it's <a href="https://github.com/search?l=Python&q=org%3Azalando+org%3Azalando-incubator+jinja2&type=Code">used in Zalando Open Source</a> and <a href="https://github.com/search?l=Python&q=user%3Ahjacobs+jinja2&type=Code">I use it in my own OSS projects</a>.</li>
<li>Atom/RSS feeds are supported out-of-the-box.</li>
<li>There are <a href="https://github.com/getpelican/pelican-plugins">many existing plugins</a> and it's easy to write your own in Python.</li>
<li>It's actively developed. The <a href="https://github.com/getpelican/pelican/">last git commit</a> was 16 days ago at the time of writing.</li>
</ul>
<h2>Customization</h2>
<p>We implemented the blog's design with plain HTML/CSS. The CSS is generated via <a href="https://postcss.org/">PostCSS</a> and <a href="https://tailwindcss.com/">Tailwind CSS</a>.
Customizing Pelican's Jinja templates was straightforward.</p>
<p>Other customizations we did:</p>
<ul>
<li>Enable <a href="https://engineering.zalando.com/atom.xml">the Atom feed</a> via the <code>FEED_ATOM</code> setting in <code>pelicanconf.py</code>.</li>
<li>Generate <a href="https://engineering.zalando.com/sitemap.xml">the sitemap XML</a> with the <a href="https://github.com/pelican-plugins/sitemap">sitemap plugin</a>.</li>
<li>Add author titles with the <a href="https://pypi.org/project/pelican-metadataparsing/">pelican-metadataparsing plugin</a>.</li>
<li>Minify generated HTML with the <a href="https://pypi.org/project/pelican-htmlmin/">pelican-htmlmin plugin</a>.</li>
</ul>
<p>In addition to the above, we want to make sure that automatic linting is in place for blog posts:</p>
<ul>
<li>Required meta keys must be present, e.g. title, summary, and author names.</li>
<li>The blog post Markdown file must be in the right year/month folder.</li>
<li>Article tags should be curated via an explicit allowlist. We want to avoid introducing many unnecessary tags and different tags for the same concept, e.g. "Postgres" vs. "PostgreSQL".</li>
</ul>
<p>Linting is done via <a href="https://pre-commit.com/">pre-commit</a> which calls a custom Python script to validate blog post Markdown files.
The <code>.pre-commit-config.yaml</code> looks something like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">minimum_pre_commit_version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1.21.0</span>
<span class="nt">repos</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">repo</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">meta</span>
<span class="w"> </span><span class="nt">hooks</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">check-hooks-apply</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">check-useless-excludes</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">repo</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">local</span>
<span class="w"> </span><span class="nt">hooks</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">validate-content</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Validate blog content</span>
<span class="w"> </span><span class="nt">language</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">system</span>
<span class="w"> </span><span class="c1"># run with poetry to get dependencies (Pelican)</span>
<span class="w"> </span><span class="nt">entry</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">poetry run ./validate-content.py</span>
<span class="w"> </span><span class="nt">types</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">markdown</span><span class="p p-Indicator">]</span>
<span class="w"> </span><span class="nt">exclude</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">^content/pages/.*.md$</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">repo</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://github.com/pre-commit/pre-commit-hooks</span>
<span class="w"> </span><span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v3.1.0</span>
<span class="w"> </span><span class="nt">hooks</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">check-added-large-files</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">end-of-file-fixer</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">trailing-whitespace</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mixed-line-ending</span>
</code></pre></div>
<p>Zalando's CI/CD system automatically lints all files by executing <code>make lint</code>.</p>
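<p>Locally, contributors can wire the same hooks into git using the standard pre-commit commands (a sketch; it assumes <code>pre-commit</code> is installed, e.g. via <code>pip</code>):</p>

```shell
# install the hooks into .git/hooks once per clone
pre-commit install
# run every configured hook against the whole repository,
# similar to what CI does via `make lint`
pre-commit run --all-files
```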
<h2>Writing a blog post</h2>
<p>Anybody in Zalando can pitch a blog post idea by creating an issue in the git repo:</p>
<p><img alt="Blog post pitch: new issue" src="https://engineering.zalando.com/posts/2020/07/images/blog-post-pitch-new-issue.png"></p>
<p>Bootstrapping a new blog post looks like this:</p>
<div class="highlight"><pre><span></span><code>hjacobs@ZALANDO-123:~/workspace/engineering-blog$<span class="w"> </span>make<span class="w"> </span>new
poetry<span class="w"> </span>run<span class="w"> </span>./scripts/new-post.py
This<span class="w"> </span>will<span class="w"> </span>create<span class="w"> </span>a<span class="w"> </span>new<span class="w"> </span>blog<span class="w"> </span>post,<span class="w"> </span>please<span class="w"> </span>answer<span class="w"> </span>a<span class="w"> </span>few<span class="w"> </span>questions..
Title<span class="w"> </span>of<span class="w"> </span>blog<span class="w"> </span>post:<span class="w"> </span>Launching<span class="w"> </span>the<span class="w"> </span>Engineering<span class="w"> </span>Blog
Slug<span class="w"> </span><span class="o">[</span>launching-the-engineering-blog<span class="o">]</span>:
Date<span class="w"> </span><span class="o">(</span>estimated<span class="o">)</span><span class="w"> </span>of<span class="w"> </span>publishing<span class="w"> </span><span class="o">[</span><span class="m">2020</span>-07-04<span class="o">]</span>:
Author<span class="w"> </span>names<span class="w"> </span><span class="o">(</span>separate<span class="w"> </span>with<span class="w"> </span>semicolon<span class="o">)</span><span class="w"> </span><span class="o">[</span>Henning<span class="w"> </span>Jacobs<span class="o">]</span>:
Author<span class="w"> </span>titles<span class="w"> </span><span class="o">(</span>separate<span class="w"> </span>with<span class="w"> </span>semicolon<span class="o">)</span><span class="w"> </span><span class="o">[</span>Senior<span class="w"> </span>Principal<span class="w"> </span>Engineer<span class="o">]</span>:
<span class="o">========================================</span>
Title:<span class="w"> </span>Launching<span class="w"> </span>the<span class="w"> </span>Engineering<span class="w"> </span>Blog
Slug:<span class="w"> </span>launching-the-engineering-blog
Authors:<span class="w"> </span>Henning<span class="w"> </span>Jacobs
Author<span class="w"> </span>Titles:<span class="w"> </span>Senior<span class="w"> </span>Principal<span class="w"> </span>Engineer
Date:<span class="w"> </span><span class="m">2020</span>-07-04
URL:<span class="w"> </span>/posts/2020/07/launching-the-engineering-blog.html
<span class="o">========================================</span>
Does<span class="w"> </span>this<span class="w"> </span>look<span class="w"> </span>correct?<span class="w"> </span>Answer<span class="w"> </span><span class="s1">'y'</span><span class="w"> </span>or<span class="w"> </span><span class="s1">'n'</span>:<span class="w"> </span>y
Creating<span class="w"> </span>content/2020/07/launching-the-engineering-blog/2020-07-04-launching-the-engineering-blog.md<span class="w"> </span>..
Useful<span class="w"> </span>commands:
-<span class="w"> </span>make<span class="w"> </span>devserver<span class="w"> </span>Start<span class="w"> </span><span class="nb">local</span><span class="w"> </span>webserver,<span class="w"> </span>find<span class="w"> </span>your<span class="w"> </span>draft<span class="w"> </span>on<span class="w"> </span>http://localhost:8000/drafts/
-<span class="w"> </span>make<span class="w"> </span>lint<span class="w"> </span>Validate<span class="w"> </span>content<span class="w"> </span>and<span class="w"> </span>formatting.
Please<span class="w"> </span>edit<span class="w"> </span>your<span class="w"> </span>article<span class="w"> </span><span class="k">in</span><span class="w"> </span>content/2020/07/launching-the-engineering-blog/2020-07-04-launching-the-engineering-blog.md
and<span class="w"> </span>don<span class="err">'</span>t<span class="w"> </span>forget<span class="w"> </span>to<span class="w"> </span>open<span class="w"> </span>a<span class="w"> </span>PR<span class="w"> </span>:-<span class="o">)</span>
</code></pre></div>
<p>Opening a PR to the Engineering Blog repository will trigger a build (<code>make html</code>) on our Zalando Continuous Delivery Platform.
The PR build will publish a preview of the blog under a private (authenticated) URL.</p>
<p>After merging the blog post PR, it will automatically be published on the live site <code>engineering.zalando.com</code>.</p>
<h2>Serving static HTML</h2>
<p>Zalando's Continuous Delivery Platform has a built-in feature to upload files to a given S3 bucket. This feature is used to upload all files from the <code>output</code> directory (generated by Pelican) to the blog's S3 bucket.
The S3 bucket is created via CloudFormation which also configures the S3 website:</p>
<div class="highlight"><pre><span></span><code><span class="nt">AWSTemplateFormatVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2010-09-09</span>
<span class="nt">Metadata</span><span class="p">:</span>
<span class="w"> </span><span class="nt">StackName</span><span class="p">:</span><span class="w"> </span><span class="s">"engineering-blog"</span>
<span class="w"> </span><span class="nt">Tags</span><span class="p">:</span>
<span class="w"> </span><span class="nt">application</span><span class="p">:</span><span class="w"> </span><span class="s">"engineering-blog"</span>
<span class="nt">Resources</span><span class="p">:</span>
<span class="w"> </span><span class="nt">S3Bucket</span><span class="p">:</span>
<span class="w"> </span><span class="nt">Type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">AWS::S3::Bucket</span>
<span class="w"> </span><span class="nt">Properties</span><span class="p">:</span>
<span class="w"> </span><span class="nt">BucketName</span><span class="p">:</span><span class="w"> </span><span class="s">"<BUCKET-NAME>"</span>
<span class="w"> </span><span class="nt">AccessControl</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">PublicRead</span>
<span class="w"> </span><span class="nt">WebsiteConfiguration</span><span class="p">:</span>
<span class="w"> </span><span class="nt">IndexDocument</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">index.html</span>
<span class="w"> </span><span class="nt">ErrorDocument</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">error.html</span>
<span class="w"> </span><span class="nt">DeletionPolicy</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Retain</span>
<span class="w"> </span><span class="nt">BucketPolicy</span><span class="p">:</span>
<span class="w"> </span><span class="nt">Type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">AWS::S3::BucketPolicy</span>
<span class="w"> </span><span class="nt">Properties</span><span class="p">:</span>
<span class="w"> </span><span class="nt">PolicyDocument</span><span class="p">:</span>
<span class="w"> </span><span class="c1"># ...</span>
</code></pre></div>
<p>The <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-websiteconfiguration.html">WebsiteConfiguration property</a> will make the bucket contents available on <code>http://<BUCKET-NAME>.s3-website.<REGION>.amazonaws.com</code>.
The S3 website only provides an HTTP endpoint (no SSL) and not a domain we would want to use publicly.</p>
<p>One way to serve the contents with a custom domain and SSL is to <a href="https://aws.amazon.com/premiumsupport/knowledge-center/cloudfront-serve-static-website/">create a CloudFront web distribution</a>.
I decided not to use CloudFront, as all the required infrastructure for domain and SSL is already in place.</p>
<p>We have <a href="https://github.com/zalando/skipper/">Skipper</a> as the Kubernetes Ingress proxy running for all our 140+ Kubernetes clusters.
<a href="https://github.com/kubernetes-sigs/external-dns">External DNS</a> automatically configures the DNS name and the <a href="https://github.com/zalando-incubator/kube-ingress-aws-controller">Kubernetes Ingress Controller for AWS</a> configures the AWS ALB with the right ACM SSL certificate. So let's reuse this infrastructure and let Skipper proxy all requests to the S3 website bucket endpoint.
This can be achieved by adding a default Skipper route as Ingress annotation:</p>
<div class="highlight"><pre><span></span><code><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">networking.k8s.io/v1beta1</span>
<span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Ingress</span>
<span class="nt">metadata</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"engineering-blog"</span>
<span class="w"> </span><span class="nt">labels</span><span class="p">:</span>
<span class="w"> </span><span class="nt">application</span><span class="p">:</span><span class="w"> </span><span class="s">"engineering-blog"</span>
<span class="w"> </span><span class="nt">annotations</span><span class="p">:</span>
<span class="w"> </span><span class="nt">zalando.org/skipper-routes</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">redirect_app_default: * -> compress() -> setDynamicBackendUrl("http://<BUCKET-NAME>.s3-website.<REGION>.amazonaws.com") -> <dynamic>;</span>
<span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">rules</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">host</span><span class="p">:</span><span class="w"> </span><span class="s">"engineering.zalando.com"</span>
<span class="w"> </span><span class="nt">http</span><span class="p">:</span>
<span class="w"> </span><span class="nt">paths</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">backend</span><span class="p">:</span>
<span class="w"> </span><span class="nt">serviceName</span><span class="p">:</span><span class="w"> </span><span class="s">"engineering-blog"</span>
<span class="w"> </span><span class="nt">servicePort</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">80</span>
</code></pre></div>
<p>Skipper's <code>compress()</code> filter enables <code>gzip</code> compression because the S3 endpoint does not provide response compression out-of-the-box.
The ACM certificate, HTTP/2 support, the S3 website response, and the enabled compression are visible when doing a curl request (output shortened):</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>curl<span class="w"> </span>-v<span class="w"> </span>--compressed<span class="w"> </span>https://engineering.zalando.com<span class="w"> </span>-o<span class="w"> </span>/dev/null
*<span class="w"> </span>SSL<span class="w"> </span>connection<span class="w"> </span>using<span class="w"> </span>TLSv1.2<span class="w"> </span>/<span class="w"> </span>ECDHE-RSA-AES128-GCM-SHA256
*<span class="w"> </span>Server<span class="w"> </span>certificate:
*<span class="w"> </span>subject:<span class="w"> </span><span class="nv">CN</span><span class="o">=</span>engineering.zalando.com
*<span class="w"> </span>subjectAltName:<span class="w"> </span>host<span class="w"> </span><span class="s2">"engineering.zalando.com"</span><span class="w"> </span>matched<span class="w"> </span>cert<span class="err">'</span>s<span class="w"> </span><span class="s2">"engineering.zalando.com"</span>
*<span class="w"> </span>issuer:<span class="w"> </span><span class="nv">C</span><span class="o">=</span>US<span class="p">;</span><span class="w"> </span><span class="nv">O</span><span class="o">=</span>Amazon<span class="p">;</span><span class="w"> </span><span class="nv">OU</span><span class="o">=</span>Server<span class="w"> </span>CA<span class="w"> </span>1B<span class="p">;</span><span class="w"> </span><span class="nv">CN</span><span class="o">=</span>Amazon
*<span class="w"> </span>SSL<span class="w"> </span>certificate<span class="w"> </span>verify<span class="w"> </span>ok.
><span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/2
><span class="w"> </span>Host:<span class="w"> </span>engineering.zalando.com
><span class="w"> </span>user-agent:<span class="w"> </span>curl/7.68.0
><span class="w"> </span>accept:<span class="w"> </span>*/*
><span class="w"> </span>accept-encoding:<span class="w"> </span>deflate,<span class="w"> </span>gzip,<span class="w"> </span>br
<<span class="w"> </span>HTTP/2<span class="w"> </span><span class="m">200</span>
<<span class="w"> </span>content-type:<span class="w"> </span>text/html
<<span class="w"> </span>content-encoding:<span class="w"> </span>deflate
<<span class="w"> </span>etag:<span class="w"> </span><span class="s2">"304fcc9c31aac19255bf1d84669059df"</span>
<<span class="w"> </span>last-modified:<span class="w"> </span>Sat,<span class="w"> </span><span class="m">27</span><span class="w"> </span>Jun<span class="w"> </span><span class="m">2020</span><span class="w"> </span><span class="m">07</span>:23:19<span class="w"> </span>GMT
<<span class="w"> </span>server:<span class="w"> </span>AmazonS3
<<span class="w"> </span>vary:<span class="w"> </span>Accept-Encoding
</code></pre></div>
<h2>Performance</h2>
<p>The static website should be fast, so let's test it. We can use <a href="https://github.com/tsenart/vegeta">Vegeta</a> for some basic HTTP load testing.
A p99 latency of about 60ms looks good:</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="n">echo</span><span class="w"> </span><span class="ss">"GET https://engineering.zalando.com/"</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">vegeta</span><span class="w"> </span><span class="n">attack</span><span class="w"> </span><span class="o">-</span><span class="n">duration</span><span class="o">=</span><span class="mi">60</span><span class="n">s</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">vegeta</span><span class="w"> </span><span class="n">report</span>
<span class="n">Requests</span><span class="w"> </span><span class="o">[</span><span class="n">total, rate, throughput</span><span class="o">]</span><span class="w"> </span><span class="mi">3000</span><span class="p">,</span><span class="w"> </span><span class="mf">50.02</span><span class="p">,</span><span class="w"> </span><span class="mf">50.00</span>
<span class="n">Duration</span><span class="w"> </span><span class="o">[</span><span class="n">total, attack, wait</span><span class="o">]</span><span class="w"> </span><span class="mf">59.995</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="mf">59.98</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="mf">15.246</span><span class="n">ms</span>
<span class="n">Latencies</span><span class="w"> </span><span class="o">[</span><span class="n">min, mean, 50, 90, 95, 99, max</span><span class="o">]</span><span class="w"> </span><span class="mf">12.418</span><span class="n">ms</span><span class="p">,</span><span class="w"> </span><span class="mf">19.751</span><span class="n">ms</span><span class="p">,</span><span class="w"> </span><span class="mf">17.049</span><span class="n">ms</span><span class="p">,</span><span class="w"> </span><span class="mf">25.05</span><span class="n">ms</span><span class="p">,</span><span class="w"> </span><span class="mf">38.382</span><span class="n">ms</span><span class="p">,</span><span class="w"> </span><span class="mf">59.958</span><span class="n">ms</span><span class="p">,</span><span class="w"> </span><span class="mf">244.094</span><span class="n">ms</span>
<span class="n">Bytes</span><span class="w"> </span><span class="ow">In</span><span class="w"> </span><span class="o">[</span><span class="n">total, mean</span><span class="o">]</span><span class="w"> </span><span class="mi">51441000</span><span class="p">,</span><span class="w"> </span><span class="mf">17147.00</span>
<span class="n">Bytes</span><span class="w"> </span><span class="k">Out</span><span class="w"> </span><span class="o">[</span><span class="n">total, mean</span><span class="o">]</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.00</span>
<span class="n">Success</span><span class="w"> </span><span class="o">[</span><span class="n">ratio</span><span class="o">]</span><span class="w"> </span><span class="mf">100.00</span><span class="o">%</span>
<span class="n">Status</span><span class="w"> </span><span class="n">Codes</span><span class="w"> </span><span class="o">[</span><span class="n">code:count</span><span class="o">]</span><span class="w"> </span><span class="mi">200</span><span class="err">:</span><span class="mi">3000</span>
<span class="n">Error</span><span class="w"> </span><span class="k">Set</span><span class="err">:</span>
</code></pre></div>
<p>The user experience with a real browser is much more interesting. <a href="https://developers.google.com/web/tools/lighthouse/">Chrome Lighthouse</a> can be used to assess the page performance.
Google's PageSpeed Insights uses Lighthouse for its score calculation.
Running <a href="https://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fengineering.zalando.com">PageSpeed Insights for the blog</a> reports a nice score of 100 out of 100 (desktop):</p>
<p><img alt="PageSpeed Insights for https://engineering.zalando.com/" src="https://engineering.zalando.com/posts/2020/07/images/page-speed-insights-engineering-zalando-com.png"></p>
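<p>The same audit can also be run locally with the Lighthouse CLI (a sketch; it assumes Node.js and Chrome are installed, and the flags shown are standard Lighthouse options):</p>

```shell
# run only the performance category with the desktop preset
# and open the resulting report in the browser
npx lighthouse https://engineering.zalando.com \
    --only-categories=performance --preset=desktop --view
```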
<p>Thanks go out to our Employer Branding colleagues who created the design and implemented the responsive HTML/CSS layout!</p>
<h2>Summary</h2>
<p>I hope this blog post gives you some inspiration for setting up your own blog with Pelican or some other static site generator.
After re-launching our Engineering Blog, our main focus will be on providing regular, high-quality content.
We still have to figure out the best way to source, review, and schedule blog posts.</p>
<p><a href="https://twitter.com/ZalandoTech">Follow ZalandoTech on Twitter</a> and subscribe to <a href="https://engineering.zalando.com/atom.xml">the Atom/RSS feed</a> to get the latest articles.</p>PgBouncer on Kubernetes and how to achieve minimal latency2020-06-24T00:00:00+02:002020-06-24T00:00:00+02:00Dmitrii Dolgovtag:engineering.zalando.com,2020-06-24:/posts/2020/06/postgresql-connection-poolers.html<p>Experiments with connection poolers on Kubernetes for Postgres Operator</p><h1>Introduction</h1>
<p>In the new Postgres Operator release 1.5 we have implemented a couple of new
interesting <a href="https://github.com/zalando/postgres-operator/releases/tag/v1.5.0">features</a>, including connection pooling support. <a href="https://sanctum.geek.nz/arabesque/vim-koans/">Master Wq</a>
says there is "no greatest tool"; to run something successfully in production,
one needs to understand its pros and cons. Let's dig into the topic and
take a look at the performance aspects of connection pooler support, mostly from
a scaling perspective.</p>
<p>But first, an introduction: why do we so often need a connection
pooler for PostgreSQL (and in fact for many other <a href="https://www.cockroachlabs.com/docs/stable/recommended-production-settings.html#connection-pooling">databases</a> too)? There
are several performance implications of having too many open connections to a
database. They result from how a connection is <a href="https://www.postgresql.org/docs/12/connect-estab.html">opened</a> (PostgreSQL
uses a "process per user" client/server model, in which too many connections mean too
many processes fighting for resources and drowning in context switches and
<a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c#n1736">CPU migrations</a>) and from how <a href="https://www.postgresql.org/message-id/20200301083601.ews6hz5dduc3w2se%40alap3.anarazel.de">certain aspects</a> of transaction handling are
implemented (e.g. <code>GetSnapshotData</code> has <code>O(connections)</code> complexity). Having
said that, there are three places where a connection pooler can be implemented:</p>
<ul>
<li>on the database side, like proposed in this <a href="https://www.postgresql.org/message-id/flat/KL1PR0601MB380006383DE897E2026ACEC6B6D40%40KL1PR0601MB3800.apcprd06.prod.outlook.com#329a9ba21d8f634eebade5d1d62fa3c0">patch</a></li>
<li>as a separate component between the database and the application</li>
<li>on the application side</li>
</ul>
<p>For Postgres Operator we have chosen the second approach. Although there are
pros and cons for all of these options, either of the others would require a lot
of effort (an application-side connection pooler is not under the
operator's control, and a connection pooler built into PostgreSQL is a major
feature that has yet to be developed). Another interesting choice to make in this
case is which connection pooling solution to use. At the moment there are a
couple of options available for PostgreSQL (listed in no particular
order):</p>
<ul>
<li><a href="http://www.pgbouncer.org">PgBouncer</a></li>
<li><a href="https://www.pgpool.net/mediawiki/index.php/Main_Page">Pgpool-II</a></li>
<li><a href="https://github.com/yandex/odyssey">Odyssey</a></li>
<li><a href="https://agroal.github.io/pgagroal/">pgagroal</a></li>
</ul>
<p>PgBouncer is probably the most popular and the oldest solution. Pgpool-II can
actually do much more than just connection pooling (e.g. load
balancing), which makes it a bit more heavyweight than the others. Odyssey and
pgagroal are much newer and aim to be more performance-optimized and scalable
than the alternatives.</p>
<p>Eventually we went for PgBouncer, but the current implementation allows us to switch to
any other solution that conforms to a basic common standard. Now let's
take a look at how PgBouncer performs in tests.</p>
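<p>For reference, enabling the pooler with the operator amounts to a couple of lines in the <code>postgresql</code> manifest. The excerpt below is a sketch (the cluster name is hypothetical; consult the operator documentation for the authoritative schema):</p>

```
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster   # hypothetical cluster name
spec:
  enableConnectionPooler: true   # deploys a pooler next to the cluster
  connectionPooler:
    numberOfInstances: 2
    mode: "transaction"
```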
<h1>Setup</h1>
<p>In fact, we ran a significant number of benchmarks with PgBouncer for different
workloads on our Kubernetes clusters and learned a few interesting details. For
example, I didn't know that a Kubernetes <code>Service</code> can distribute workload in
a not exactly uniform way, so that one can see something like this, where the
third pod is only half utilized and in fact receives half as many queries as the
others:</p>
<div class="highlight"><pre><span></span><code>NAME CPU(cores) MEMORY(bytes)
pool-test-7d8bfbc47f-6bbhr 977m 5Mi
pool-test-7d8bfbc47f-8jtnp 995m 6Mi
pool-test-7d8bfbc47f-ghvpn 585m 6Mi
pool-test-7d8bfbc47f-s945p 993m 6Mi
</code></pre></div>
<p>This can happen if <code>kube-proxy</code> works in <code>iptables</code> <a href="https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables">mode</a> and selects
a pod by probability instead of strict round-robin.</p>
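<p>For illustration, for a <code>Service</code> with four endpoints, <code>kube-proxy</code> renders rules roughly like the following excerpt (chain names here are hypothetical). The first endpoint is chosen with probability 1/4, the second with 1/3 of the remaining traffic, and so on, so the per-pod load is random rather than strictly round-robin:</p>

```
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-POD1
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333333333 -j KUBE-SEP-POD2
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-POD3
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD4
```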
<p>But in this article I want to present one example, produced in the more artificial
environment of my laptop. That's mostly because there we can collect metrics
that are interesting for this particular case but do not make sense to
collect for all workloads. My original idea was to play around with CPU management
policies and <a href="https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy">exclusive CPUs</a>, to show what happens when PgBouncer runs
with a fixed cpuset. But interestingly enough, another effect introduced an even
bigger difference, so the following experiment is more about the scaling of
PgBouncer instances.</p>
<p>To simulate the networking part of our experiment, let's set up a separate network
namespace in which we will run PostgreSQL and PgBouncer, and connect it to the
root namespace via a veth link.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># setup veth link with veth0/veth1 at the ends</span>
$<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span>add<span class="w"> </span>veth0<span class="w"> </span><span class="nb">type</span><span class="w"> </span>veth<span class="w"> </span>peer<span class="w"> </span>name<span class="w"> </span>veth1
<span class="c1"># check that they're present</span>
$<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span>show<span class="w"> </span><span class="nb">type</span><span class="w"> </span>veth
<span class="c1"># add a new network namespace</span>
$<span class="w"> </span>ip<span class="w"> </span>netns<span class="w"> </span>add<span class="w"> </span>db
<span class="c1"># move one end into the new namespace</span>
$<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span><span class="nb">set</span><span class="w"> </span>veth1<span class="w"> </span>netns<span class="w"> </span>db
<span class="c1"># check that now only veth0 is visible</span>
$<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span>show<span class="w"> </span><span class="nb">type</span><span class="w"> </span>veth
<span class="c1"># check that veth1 is visible from the other namespace</span>
$<span class="w"> </span>ip<span class="w"> </span>netns<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>db<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span>show<span class="w"> </span><span class="nb">type</span><span class="w"> </span>veth
<span class="c1"># add corresponding addresses and bring everything up</span>
$<span class="w"> </span>ip<span class="w"> </span>addr<span class="w"> </span>add<span class="w"> </span><span class="m">10</span>.0.0.10/24<span class="w"> </span>dev<span class="w"> </span>veth0
$<span class="w"> </span>ip<span class="w"> </span>netns<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>db<span class="w"> </span>ip<span class="w"> </span>addr<span class="w"> </span>add<span class="w"> </span><span class="m">10</span>.0.0.1/24<span class="w"> </span>dev<span class="w"> </span>veth1
$<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span><span class="nb">set</span><span class="w"> </span>veth0<span class="w"> </span>up
$<span class="w"> </span>ip<span class="w"> </span>netns<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>db<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span><span class="nb">set</span><span class="w"> </span>veth1<span class="w"> </span>up
$<span class="w"> </span>ip<span class="w"> </span>netns<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>db<span class="w"> </span>ip<span class="w"> </span>link<span class="w"> </span><span class="nb">set</span><span class="w"> </span>lo<span class="w"> </span>up
</code></pre></div>
<p>This link is going to be blazingly fast, so let's add a small delay to the veth
interface, corresponding to the empirical network latency we observe in
our Kubernetes clusters. The distribution parameter here mostly emphasizes
its presence, since it is normal by default anyway.</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>tc<span class="w"> </span>qdisc<span class="w"> </span>add<span class="w"> </span>dev<span class="w"> </span>veth0<span class="w"> </span>root<span class="w"> </span>netem<span class="w"> </span>delay<span class="w"> </span>1ms<span class="w"> </span><span class="m">0</span>.1ms<span class="w"> </span>distribution<span class="w"> </span>normal
</code></pre></div>
<p>In our experiment we will run a pgbench test with the query <code>;</code>, which is the
<a href="https://jakewheat.github.io/sql-overview/sql-2008-foundation-grammar.html#direct-SQL-statement">smallest SQL query</a> one can come up with. The idea is not to load the
database itself too much, and to see how a PgBouncer instance handles many
connections, in this case 1000 dispatched via 8 threads. A word of
warning: use pgbench carefully, since in some cases it can itself be a bottleneck
and produce confusing results. In our case we limit this risk by pinning
all the components to separate cores, collecting performance counters to see
where we spend time, and staying alert to strange results. But for a
more diverse workload and a more holistic approach you can use <a href="https://github.com/oltpbenchmark/oltpbench/">oltpbench</a> or
<a href="https://github.com/petergeoghegan/benchmarksql">benchmarksql</a>.</p>
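<p>A run like this can be sketched as follows (a sketch, not the exact command used; it assumes PgBouncer listens on its default port 6432 at the address from the veth setup, and that the target database is named <code>postgres</code>):</p>

```shell
# minimal custom script: a single empty statement, the smallest SQL query
echo ';' > empty.sql
# 1000 client connections dispatched via 8 threads for 60 seconds,
# with per-transaction latency logging (-l) through the pooler
pgbench -h 10.0.0.1 -p 6432 -f empty.sql -c 1000 -j 8 -T 60 -l postgres
```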
<p>The result will be a per-transaction execution <a href="https://www.postgresql.org/docs/current/pgbench.html#id-1.9.4.10.8.6">log</a>. Every component, namely:</p>
<ul>
<li>PostgreSQL instance</li>
<li>Two PgBouncer instances</li>
<li>PgBench workload generator</li>
</ul>
<p>is bound to a single CPU core, with Intel turbo being disabled and CPU
scaling governor for all the cores set to <code>performance</code>. Two instances of
PgBouncer will run with <code>so_reuseport</code> option, which is essentially a way to
get PgBouncer to use <a href="http://www.pgbouncer.org/config.html#so_reuseport">more CPU cores</a>. The only degree of freedom we will
investigate is their location between cores in relation to whether it's a real
separate core, or just a separate hyperthread.</p>
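<p>The per-transaction log can be summarised with standard tools. The sketch below assumes the default <code>pgbench -l</code> log format, whose third column is the transaction latency in microseconds, and computes the mean and the nearest-rank 99th percentile in milliseconds:</p>

```shell
# summarise one or more pgbench per-transaction log files
pgbench_summary() {
    # sort by the latency column (microseconds), then accumulate in order
    sort -n -k3,3 "$@" | awk '
        { lat[NR] = $3; sum += $3 }
        END {
            mean = sum / NR / 1000.0
            p99  = lat[int(NR * 0.99 + 0.5)] / 1000.0  # nearest-rank percentile
            printf "transactions=%d mean=%.3fms p99=%.3fms\n", NR, mean, p99
        }'
}
```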
<h1>Benchmark</h1>
<p>Here are the benchmark results, presenting rolling mean, 99th-percentile latency and
standard deviation values, executed on a rather modest setup with 2 physical
cores, each with 2 hyperthreads, for three cases:</p>
<ul>
<li>Only one instance of PgBouncer on an isolated real core</li>
<li>Two PgBouncers on isolated hyperthreads, but on the same physical core</li>
<li>Two PgBouncers on isolated cores (with potential noise from other components
on the other hyperthread)</li>
</ul>
<p><a href="https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=311">Hyper-Threading</a> means that two components are still fighting for CPU time,
but will share some execution state and cache. Usually this produces more
variance in latency, which we will keep in mind.</p>
<p><img alt="separate_cores tar gz" src="https://engineering.zalando.com/posts/2020/06/images/pgbouncer-cpu-cores.png"></p>
<p>One nice property we can immediately see is that the results are relatively stable,
which is good. Another interesting observation is that, despite the fact that we were
only changing the core placement of each component, we see a significant
difference in latency. The single PgBouncer instance gives the lowest
latency, while for two PgBouncers on the same physical core it's almost two
times higher (with a somewhat minimal increase in throughput). In the case of two
PgBouncers on different physical cores, even with potential competition for
resources with another component (and a different resource consumption
pattern), the latency is somewhere in between (with the best throughput of the
three). Why is that?</p>
<p>In the course of the investigation more and more puzzling measurements were
collected, with <code>perf</code> sampling showing no significant difference in
PostgreSQL activity or between the two PgBouncer instances. Let's take a closer look at
what PgBouncer is actually doing:</p>
<p><img alt="pgbouncer" src="https://engineering.zalando.com/posts/2020/06/images/pgbouncer-flamegraph.png"></p>
<p>As expected, it spends a lot of its time doing networking. The kernel <a href="https://www.kernel.org/doc/html/latest/networking/scaling.html#suggested-configuration">docs</a>
say that:</p>
<blockquote>
<p>For interrupt handling, HT has shown no benefit in initial tests, so limit
the number of queues to the number of CPU cores in the system.</p>
</blockquote>
<p>This could be our working assumption. Network interrupts probably do not scale
well across hyperthreads, so one needs a real core to scale them
out. To gather a bit more evidence, let's take a look at interrupt latencies in
both cases, different cores and different hyperthreads. For that we can use the
<code>irq:softirq_entry</code> and <code>irq:softirq_exit</code> tracepoints and a <a href="http://www.brendangregg.com/perf.html">script from Brendan Gregg</a>:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># one PgBouncer instance is running on a CPU2 with no other PgBouncer on the</span>
<span class="c1"># same physical core. We're interested only in NET_RX,NET_TX vectors.</span>
$<span class="w"> </span>perf<span class="w"> </span>record<span class="w"> </span>-e<span class="w"> </span>irq:softirq_entry,irq:softirq_exit<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-a<span class="w"> </span>-C<span class="w"> </span><span class="m">2</span><span class="w"> </span>--filter<span class="w"> </span><span class="s1">'vec == 2 || vec == 3'</span>
$<span class="w"> </span>perf<span class="w"> </span>script<span class="w"> </span><span class="p">|</span><span class="w"> </span>awk<span class="w"> </span><span class="s1">'{ gsub(/:/, "") } $5 ~ /entry/ { ts[$6, $10] = $4 }</span>
<span class="s1"> $5 ~ /exit/ { if (l = ts[$6, $9]) { printf "%.f %.f\n", $4 * 1000000,</span>
<span class="s1"> ($4 - l) * 1000000; ts[$6, $10] = 0 } }'</span><span class="w"> </span>><span class="w"> </span>latencies.out
</code></pre></div>
<p>And the same for the other case, when one PgBouncer sits together with another
on the same physical core. Here is the 99th percentile of the resulting
latencies:</p>
<p><img alt="softirq_net_rx_net_tx_latencies" src="https://engineering.zalando.com/posts/2020/06/images/pgbouncer-softirq.png"></p>
<p>This indeed points in the direction of network interrupts being a bit slower
when both PgBouncers share the same physical CPU. In theory,
it means that we can get surprising performance results after adding more pods
to a connection pool deployment, depending on where those new pods land: on
an isolated CPU, or on a CPU whose other hyperthread is already busy. In view
of these results it could be beneficial to configure the <a href="https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/">CPU manager</a> in the
cluster, so that this would not be an issue.</p>
<h1>Conclusion</h1>
<p>Having said all of the above, I must admit it's just the tip of the iceberg. If there
are such interesting complications in running a connection pooler within
a single node, you can imagine what happens at a higher architecture level.
We've spent a lot of time discussing different design possibilities for the
Postgres Operator, e.g. whether it should be a single "big" PgBouncer instance
(with many processes reusing the same port) with an affinity to be close to the
database, or multiple "small" instances equidistant from the database. Every
design has its own trade-offs in network round trips and availability
implications, but since we value simplicity (especially in view of such a
complicated topic) we went for a rather straightforward approach relying on
standard Kubernetes functionality:</p>
<ul>
<li>
<p>Postgres Operator creates a single connection pooler deployment and exposes
it via a new service.</p>
</li>
<li>
<p>Connection pooler pods are distributed between availability zones.</p>
</li>
<li>
<p>Due to the nature of connection pooling, pods do CPU-intensive work with
a minimal amount of memory (less than a hundred megabytes in a simple case),
so it makes sense to create as many as needed to prevent resource
saturation. Those pods could be scattered across multiple nodes and availability zones, which
means latency variability.</p>
</li>
<li>
<p>For those cases where this variability cannot be tolerated, we would consider
manually creating a single "big" pooler instance with an affinity to put it
on the same node as the database, and adjusting the CPU manager to squeeze everything
we can from this setup. This instance would be the primary one to
connect to, with another one providing HA.</p>
</li>
</ul>
<p>This simplicity should not be confused with ignorance: it's based on an
understanding of the proposed solution's limitations and what can be adjusted
beyond them. As in my other blog posts and talks, I would love to emphasize the
importance of the described methodology: even if you have a system as complicated
as Kubernetes in your hands, it's important to understand what happens
underneath!</p>Learnings from Distributed XGBoost on Amazon SageMaker2020-06-22T00:00:00+02:002020-06-22T00:00:00+02:00Scott Joseph Smalltag:engineering.zalando.com,2020-06-22:/posts/2020/06/distributed-xgb-sagemaker.html<p>What I learned from distributed training with XGBoost on Amazon SageMaker.</p><h3>Overview</h3>
<p><a href="https://xgboost.readthedocs.io/en/latest/">XGBoost</a> is a popular Python library for gradient boosted decision trees. The implementation allows practitioners to distribute training across multiple compute instances (or workers), which is especially useful for large training sets.</p>
<p>One tool used at Zalando for deploying production machine learning models is the managed service from Amazon called <a href="https://aws.amazon.com/sagemaker/">SageMaker</a>. XGBoost is already included in SageMaker as a built-in algorithm, meaning that a prebuilt docker container is available. This container also supports distributed training, making it easy to scale training jobs across many instances.</p>
<p>Despite SageMaker handling the infrastructure side of things, I found that distributed training with XGBoost and SageMaker is not as easy as simply increasing the number of instances. I discovered a few small "gotchas!" when attempting a few simple trainings. This post will step through my failed attempts, and end with a genuine distributed training with XGBoost in Amazon SageMaker.</p>
<h3>Experiment Setup</h3>
<p>I wanted to get an intuitive idea of how well the training time with XGBoost scaled as the number of instances scaled, as well as the training time when the data size increases. I am not especially interested in producing the "best" model to solve a problem, per se, but there is a natural trade-off between training time and model accuracy that should be considered.</p>
<p>For a data set, I used the <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion MNIST</a> by Zalando Research. The problem itself is to classify small images (28x28 pixels) of clothing as being from 1 of 10 different classes (t-shirts, trousers, pullovers, etc). The data set has 60,000 images for a training set and 10,000 images for a validation set.</p>
<p>To increase the training size, I duplicate the training data to measure the scaling of model training time as the computational resources change. The number of times the training data is duplicated is referred to as the <em>"replication factor"</em>. For a typical ML project, you probably don't want to duplicate the training set outright. Although doing so improves our training and validation accuracies here, this method is likely not as efficient as changing hyperparameters (however, you might create new images with noise to improve regularization). For reference, the size on disk of the training data for different replication factors is provided below.</p>
<ul>
<li>Replication factor 1: 0.63 GB, 60,000 images</li>
<li>Replication factor 2: 1.24 GB, 120,000 images</li>
<li>Replication factor 4: 2.48 GB, 240,000 images</li>
<li>Replication factor 8: 4.95 GB, 480,000 images</li>
</ul>
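<p>The replication itself is straightforward; a minimal sketch of applying a replication factor to an in-memory training set (the function name is illustrative):</p>

```python
def replicate(train_rows, factor):
    """Duplicate the training set `factor` times (the replication factor).

    For Fashion MNIST, factor 8 turns 60,000 rows into 480,000 rows.
    """
    return train_rows * factor
```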
<p>I wanted to use hyperparameters that would give a somewhat reasonable performance for accuracy, so I used a hyperparameter tuning job in SageMaker, with one instance per training. I tuned all of the tunable hyperparameters, except <code>"num_round"</code>, which was fixed to 100. This hyperparameter increases the number of decision trees used, and increases training time and accuracy as its value increases. My hyperparameters were as follows:</p>
<div class="highlight"><pre><span></span><code>hps = {'alpha': 0.0,
       'colsample_bylevel': 0.4083530569296091,
       'colsample_bytree': 0.8040025839325579,
       'eta': 0.11764087266272522,
       'gamma': 0.43319156621549954,
       'lambda': 37.547406128070286,
       'max_delta_step': 10,
       'max_depth': 6,
       'min_child_weight': 5.076838893848415,
       'num_round': 100,             # Not tuned: kept fixed
       'subsample': 0.8915771964367318,
       'num_class': 10,              # Not tuned: defined by Fashion MNIST
       'objective': 'multi:softmax'  # Not tuned: defined by Fashion MNIST
       }
</code></pre></div>
<p>There are additional hyperparameters beyond those listed above which are not tunable. I took those at their default values (which, as you will see, can cause some unexpected results). The full list of <a href="https://xgboost.readthedocs.io/en/release_0.90/parameter.html">hyperparameters offered by XGBoost</a> is different from those <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html">offered by the SageMaker container</a>, as SageMaker adds a few additional hyperparameters which do not control model performance. The objective <code>"multi:softmax"</code> produces a metric called <code>merror</code>, which is defined as <code>#(wrong cases)/#(all cases)</code>.</p>
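<p>For intuition, <code>merror</code> is simply the misclassification rate; a minimal sketch of the definition above:</p>

```python
def merror(y_true, y_pred):
    """merror = #(wrong cases) / #(all cases)."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)
```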
<p>Lastly, the tools. I wrote all of the code for my experiments in Python 3.7 using the <a href="https://sagemaker.readthedocs.io/en/stable/overview.html">Amazon SageMaker Python SDK</a>. I used the SageMaker docker container version 0.90-1 for XGBoost, the URI of which can be found by using the SageMaker Python SDK:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">sagemaker.amazon.amazon_estimator</span> <span class="kn">import</span> <span class="n">get_image_uri</span>
<span class="n">container</span> <span class="o">=</span> <span class="n">get_image_uri</span><span class="p">(</span><span class="n">region</span><span class="p">,</span> <span class="s1">'xgboost'</span><span class="p">,</span> <span class="n">repo_version</span><span class="o">=</span><span class="s1">'0.90-1'</span><span class="p">)</span>
</code></pre></div>
<p>For each of the SageMaker training jobs, I used the <code>ml.m5.xlarge</code> <a href="https://aws.amazon.com/sagemaker/pricing/instance-types/">instance</a>.</p>
<h3>Failed Attempt: Naive Distributed Computing</h3>
<p>My first attempt was to check how the training time scaled as the number of instances increases. I expected to see a roughly linear improvement: if the number of instances doubles, then the training time should be cut in half.</p>
<p>I used a naive approach: other than the settings mentioned in the "Experiment Setup" section, I used default values, including a replication factor of 1. What I found was very different from my expectations:</p>
<p><img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/fullyreplicated.png"></p>
<p>There are two things to note here. Going from 1 to 2 instances <em>increases</em> the training time, though I expected to see the training time cut in half. Going beyond 2 instances, the training time is relatively flat.</p>
<p>Going from 1 to 2 instances demonstrates an internal switch from non-distributed to distributed training in XGBoost. There is a hyperparameter called <code>tree_method</code> which sets the algorithm used for computing splits at a node in a decision tree. The default for <code>tree_method</code> is <code>"auto"</code>. For one instance, the greedy algorithm called <code>"exact"</code> is used. For more than 1 instance, an algorithm called <code>"approx"</code> is used, which approximates the greedy algorithm. The logic behind <code>"auto"</code> is explained in the <a href="https://xgboost.readthedocs.io/en/release_0.90/parameter.html#parameters-for-tree-booster">XGBoost documentation</a> (as well as other algorithm choices), and the implementations of <code>"exact"</code> and <code>"approx"</code> are described in the <a href="https://arxiv.org/abs/1603.02754">XGBoost paper</a> by Chen and Guestrin.</p>
<p>The second thing to note is that after two instances, the training time remains flat as more instances are added. This is because each instance is training using the <em>same data</em>. In SageMaker, unless otherwise specified, the entire training data set is distributed to each instance. This setting is called <code>"FullyReplicated"</code>. However, XGBoost expects that each instance receives a subset of the full data set. Another way to think of this is that the training data is completely replicated a number of times equal to the number of instances, and then one copy is sent to each instance.</p>
<p>The data distribution can be corrected by <em>sharding</em> the training data (dividing it into different files, one for each instance), and defining an <code>s3_input</code> object such as</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">sagemaker</span>
<span class="n">s3_input_train</span> <span class="o">=</span> <span class="n">sagemaker</span><span class="o">.</span><span class="n">s3_input</span><span class="p">(</span><span class="n">s3_data</span><span class="o">=</span><span class="n">s3_location</span><span class="p">,</span>
<span class="n">content_type</span><span class="o">=</span><span class="s1">'csv'</span><span class="p">,</span>
<span class="n">distribution</span><span class="o">=</span><span class="s1">'ShardedByS3Key'</span><span class="p">)</span>
</code></pre></div>
<p>and then starting a training job by passing the <code>s3_input</code> objects for training (and a similar one for validation):</p>
<div class="highlight"><pre><span></span><code>xgb.fit(inputs={'train': s3_input_train, 'validation': s3_input_validation})
</code></pre></div>
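<p>Producing the shards themselves before upload can be as simple as a round-robin split; a minimal sketch (an illustrative helper, not part of the SageMaker SDK):</p>

```python
def shard_rows(rows, n_shards):
    """Round-robin split of the training rows into n_shards lists.

    Each list would then be uploaded as a separate S3 object under the
    same prefix, so that 'ShardedByS3Key' assigns one object per instance.
    """
    shards = [[] for _ in range(n_shards)]
    for i, row in enumerate(rows):
        shards[i % n_shards].append(row)
    return shards
```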
<h4>Take-Aways</h4>
<ul>
<li>XGBoost takes different default actions for the hyperparameter <code>tree_method</code> when moving to distributed training from non-distributed training. We should be mindful of this when estimating training times and when tuning hyperparameters.</li>
<li>XGBoost expects data to be split for each of the instances, but SageMaker by default sends the entire data set to each instance. We need to set the data distribution to <code>"ShardedByS3Key"</code> in SageMaker to match the expectations of XGBoost.</li>
</ul>
<h3>Failed Attempt: Using the Greedy Algorithm</h3>
<p>To correct my previous failed attempt at distributed training, I made two changes to my experiment:</p>
<ul>
<li>I set the value of the hyperparameter <code>tree_method</code> to <code>"exact"</code>, so that each training job uses the same value of <code>tree_method</code>.</li>
<li>I set the data distribution for SageMaker to <code>"ShardedByS3Key"</code>, and divided my training set randomly so that each instance gets a different piece of the training set.</li>
</ul>
<p>In addition to the expected training times (i.e. doubling the number of instances cuts the training time in half), I also tried increasing the replication factor to get a sense of the scaling of training time compared to the size of the training set. I expect something similar: if the size of the training data doubles, then the training time should double.</p>
<p><img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_linear.png"> <img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_log.png"></p>
<p>The first plot shows the training times for each of the 4 replication factors. The second plot is the same as the first, but in log scale. The dotted lines indicate my expected training time (i.e. doubling the number of instances should halve the training time).</p>
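<p>The dotted expectation lines follow an idealized linear-scaling model; a minimal sketch (illustrative only, ignoring any fixed per-training overhead):</p>

```python
def expected_training_time(t_single, replication_factor, n_instances):
    """Idealized expectation: training time grows linearly with data size
    (the replication factor) and shrinks linearly with the number of
    instances. Real runs deviate from this because of overhead.
    t_single is the measured single-instance, replication-factor-1 time.
    """
    return t_single * replication_factor / n_instances
```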
<p>This actually looks pretty good! The training times match well with what one might expect. The trainings with higher replication factors require more time to run computations as there is more data to process. In fact, it's about a factor 2 increase in the training time when the training data size is doubled. It's also worth pointing out that more training data results in better scalability. In fact, with lower replication factors, the training time plateaus (actually, it even increases a little) with a larger number of instances. This would suggest that the overhead costs are eating the benefits of distributing the workload.</p>
<p>At first glance, everything seems to be ok: more training data implies longer training times, more compute resources implies shorter run times. But a check of the training and validation errors shows that something is not right:</p>
<p><img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_training.png"> <img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_validation.png"></p>
<p>As the number of instances increases, the error for training and validation <em>increases</em>. This is an artifact of the hyperparameter <code>tree_method</code>. For distributed training, XGBoost does <strong>NOT</strong> implement the <code>"exact"</code> algorithm. However, SageMaker has no problem letting us select this value in the distributed trainings. In this situation, the training data is divided among the instances, and then each instance calculates its own XGBoost model, ignoring all other instances. Once each instance is finished, the model from the first instance is saved, and the others are discarded.</p>
<p>The timing and error graphs reflect this behavior: as the number of instances increases, the training data on any given instance is smaller, resulting in faster trainings but worse error. A cheaper way to replicate this experiment is to throw away a percentage of the training data and then train with only one instance.</p>
<h4>Take-Aways</h4>
<ul>
<li>Don't use <code>"exact"</code> for the value of <code>tree_method</code> with distributed XGBoost, because it's not actually implemented on the XGBoost side. Use instead <code>"approx"</code> or <code>"hist"</code>.</li>
</ul>
<h3>Successful Attempt: Distributed XGBoost with SageMaker</h3>
<p>After the learnings from the previous attempt, I repeated the experiment, but this time using <code>"approx"</code> for <code>tree_method</code>. This introduces a new hyperparameter, called <code>sketch_eps</code>, for which I used the default value.</p>
<p><img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_linear_approx.png"> <img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_log_approx.png"></p>
<p>The scaling looks good here and similar to those from experiment 2, albeit with longer training times. A check of the training and validation errors is more satisfying:</p>
<p><img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_training_approx.png"> <img alt="null" src="https://engineering.zalando.com/posts/2020/06/images/scaling_validation_approx.png"></p>
<p>From the training and validation errors, we do see noise appearing. Note that there is randomness to using XGBoost: the piece of the training set given to each instance was selected randomly for each training, and node splitting in a decision tree has randomness (see hyperparameters like <code>subsample</code> or <code>colsample_bylevel</code>).</p>
<h4>Take-Aways</h4>
<ul>
<li>Using many instances with a "low" amount of training data is a waste of computational resources. For example, using a replication factor of 1, the training time of using 10 instances is not much better than using 3 or 4.</li>
<li>When the training data is sufficiently large, doubling the number of instances approximately halves the training time.</li>
<li>The scaling in training data size is about what we expect: doubling the training data approximately doubles the training time.</li>
</ul>
<h3>Conclusion</h3>
<p>Amazon SageMaker makes it easy to scale XGBoost algorithms across many instances. But with so many "knobs" to play with, it's easy to create an inefficient machine learning project.</p>How to work remotely at Zalando2020-03-13T00:00:00+01:002020-03-13T00:00:00+01:00Tim Krögertag:engineering.zalando.com,2020-03-13:/posts/2020/03/how-to-work-remotely-at-zalando.html<p>Going fully remote as a company from one day to another is a challenge. Working remotely requires a clear set of “rules to live by” that have 100% buy-in across the company, and a healthy system of meetings, events, and habits that keep people communicating.</p><p><em>This document is heavily informed by remote work guidance from other companies and authors. Notable sources include
FYI's <a href="https://usefyi.com/remote-work-best-practices/">11 Best Practices for Working Remotely</a> and Laurel Farrer’s <a href="https://www.yonder.io/post/how-to-design-powerful-rituals-for-successful-distributed-companies">How
to Design Powerful Rituals for Successful Distributed
Companies</a>. Special
thanks to <a href="https://twitter.com/teemow">Timo</a> from <a href="https://www.giantswarm.io/">GiantSwarm</a> for sharing learnings in an
ad-hoc phone call. Other sources are linked in the appendix. We would like to highlight that we added a link to Alice
Goldfuss’ <a href="https://blog.alicegoldfuss.com/work-in-the-time-of-corona/">Work in the Time of Corona</a>, which was published
after this document was available internally, because of how succinctly and thoroughly she covers areas that other
guidelines address partially at best. Zalando has some remote working experience due to <a href="https://jobs.zalando.com/en/tech/locations/">our tech
hubs</a>, but we do not consider ourselves experts in this matter. That being
said, we want to share our internal guidelines in the hope that others might find them useful.</em></p>
<p>Going fully remote as a company from one day to another is a challenge. Working remotely requires (1) a clear set of
“rules to live by” that have 100% buy-in across the company, and (2) a healthy system of meetings, events, and habits
that keep people communicating.</p>
<p>Due to the current circumstances, we have an opportunity to practice remote collaboration. Compared to just one team
member doing mobile work and everyone else being co-located, we have the advantage that everybody is in the same
situation (all remote). You can even get to know your colleagues better. Maybe introduce your co-workers to your cat
during a video call.</p>
<p>This document contains guidelines, tips, and expectations to make 'remote' possible in our current situation. Please
read these carefully and apply them in your teams, adjusting to your particular circumstances if needed. The most important
baseline rules to follow are:</p>
<ul>
<li>Get VPN (needed for some internal Zalando tools and datacenter access) and make yourself familiar with Zalando's
privacy information [internal link]</li>
<li>Establish daily standups via chat and video</li>
<li>Have weekly 1:1s between manager and team members</li>
<li>Perform weekly team retrospectives</li>
<li>Establish personal and team rituals</li>
<li>Prioritize documentation and clear communication</li>
<li>Embrace asynchronous work and communication</li>
</ul>
<p>We expect every tech leader in Zalando to follow these baseline requirements, and support and empower their teams. The
appendix contains the FAQ, additional tips, and resource links.</p>
<h2>Guidelines</h2>
<h3>Managers</h3>
<ul>
<li>💬 Establish daily standups via chat (asynchronous) and video call (synchronous).</li>
<li>👫 Establish regular weekly 1:1 meetings (video calls) to check in regularly with your directs.</li>
<li>😊 Create a safe environment and culture for team members to report when they are away from the keyboard (e.g. "I'm
AFK" in team chat, or via <a href="https://support.google.com/hangoutschat/answer/9093489">Google Chat Snooze</a>) to prevent
the feeling of being pressured to always be online.</li>
</ul>
<h3>Practice good meeting etiquette</h3>
<ul>
<li>🎥 Prefer <a href="http://meet.google.com">Hangouts Meet</a> over chat, turn on video to understand non-verbal communication.</li>
<li>📵 Be present and don’t fiddle with the phone.</li>
<li>🤩 Use agendas to communicate the purpose of a meeting.</li>
<li>📄 Share a document as pre-read and solicit comments before the meeting.</li>
<li>📝 Write meeting notes (assign a note-taker!) and share them.</li>
<li>👍 Define action items and owners.</li>
<li>⏲️ Start on time, end on time.</li>
</ul>
<p>GitLab provides some <a href="https://about.gitlab.com/company/culture/all-remote/meetings/">good advice for All-Remote
Meetings</a>.</p>
<h3>Prioritize documentation and clear communication</h3>
<ul>
<li>Document more than normal e.g. outlines of your ideas, next steps, meeting notes.</li>
<li>Collaborate virtually, e.g. virtual whiteboards & sticky notes (use <a href="https://docs.google.com/presentation/d/1XfQhQLKRr-BzlxqmJgKLVFlxS5ycuWklRqYvRaiive4/edit#slide=id.p">Google
Slides</a> or
<a href="https://jamboard.google.com/">Google Jamboard</a>, a digital whiteboard), work on documents in real-time. Check out
“Working with Google Software at the Zalando Workspace” instructions [internal link].</li>
<li>Share how you feel by using emojis 🤗. What’s going well? What’s not going well? Explain how you are feeling and when
you need help.</li>
<li>Empathy is everything: always assume positive intent. Tone and nuance can get lost over chat, so assuming your
colleague is coming from a positive place helps with potential misunderstandings. If you think your colleague acts
weird, or a chat is getting too long or confusing, have a video call.</li>
<li>Say what is obvious too: communicating everything explicitly is key to avoid misunderstandings.</li>
<li>Take care of the Google Drive structure so that people can find documents faster. Familiarize yourself with the
search features, e.g. searching within a subfolder is possible via the triangle on the right of the search bar 🔍.</li>
</ul>
<h3>Create boundaries between work and life</h3>
<p>Boundaries between work and life get blurred when working remotely. We want to prevent that work environment and home
environment merge into one. It’s easy to adopt bad routines, like waking up and immediately checking your email, sitting
down for breakfast while working, or working throughout the day without going for lunch or regularly drinking
water. Suddenly it’s 21:00 and you’re dehydrated, hungry, a headache is creeping up, but you’re still working.
Unplugging is important to stay healthy. Our core working hours are between 10:00 and 16:00 local time, and yes, you are
responsible for getting your work done and for attending meetings during your regular hours, but please
use the following guidance to stay healthy.</p>
<ul>
<li>📅 Time-block your day so you have a start and end time: <a href="https://support.google.com/calendar/answer/7638168?hl=en">configure your work time in Google
Calendar</a>. This makes it transparent for your colleagues
and manager when you are available and when not.</li>
<li>🍲 Plan and block your lunch slot as a recurring public event. This helps you stay healthy and manages expectations
for availability.</li>
<li>⏰ Plan regular breaks, e.g. by <a href="https://apps.apple.com/us/app/time-out-break-reminders/id402592703">setting a break
reminder</a> and stay hydrated.</li>
<li>💻 Create a physical space for work at home that you can leave at the end of the day (i.e. don’t work from bed).</li>
<li>💼 Use props that signal your brain that you’re working (e.g. work shoes, work shirt).</li>
<li>📴 Switch off when you're away from work.</li>
<li>🎵 Use background music or sound to help with concentration. Background noise helps in creating an environment which
you associate with working. You can share your favorite playlists within the team for that.</li>
</ul>
<h4>Tune In</h4>
<ul>
<li>👋 Check-in to team-chat by stating that you’re starting to work and what you worked on the day before.</li>
<li>📥 Assign tickets (e.g. GitHub issues) to yourself when you start working on them. Leave a comment to inform the
whole team about progress.</li>
<li>🔕 Update your chat status (e.g. mute) when you need to focus.</li>
</ul>
<h4>Tune Out</h4>
<ul>
<li>✌️ Check-out of team-chat ("heading out from work", "AFK" for "away from keyboard").</li>
<li>🌜 Use the <a href="https://support.google.com/hangoutschat/answer/9093489">Google Chat "Snooze Notifications" feature</a> to
signal absence. If you have set up your work hours in Google Calendar, this happens automatically for non-work
hours.</li>
<li>📤 Commit work frequently instead of only committing locally. Finish up by committing in the evening and provide a
short summary in the ticket on the progress or blockers.</li>
</ul>
<h3>Make yourself visible and be responsive</h3>
<p>Organizing expectations around communication creates a healthy relationship between employees and supervisors — no one
will have concerns about productivity expectations or be left in the dark.</p>
<ul>
<li>✉️ Catch up on email at least twice a day to stay informed.</li>
<li>📅 Check your calendar, respond to invites with a 'yes' or 'no' plus comment. Attend appointments.</li>
<li>💬 Scan relevant chats (esp. your team chat) every hour.</li>
<li>📟 Find a balance between synchronous team interaction and embracing the benefits of an asynchronous work style. You
can stay online when working, and update your team via chat on what you’re working on, or manage expectations around
check-ins. This way we compensate for the loss of ad-hoc availability from not sitting next to each other.</li>
</ul>
<h3>Reflect and Adapt</h3>
<p>The new remote situation is radically different from how your team worked before. Set up weekly team retrospectives
(video call) to recap what worked well and what can be improved. We recommend using <a href="https://docs.google.com/presentation/d/1XfQhQLKRr-BzlxqmJgKLVFlxS5ycuWklRqYvRaiive4/edit#slide=id.p">Google slides to simulate a
whiteboard</a> with
sticky notes: the first slide is the whiteboard. The following slides are for each team member (one slide per member)
where they can prepare red & green "sticky notes" before the retrospective meeting. The meeting runs similar to a
physical meeting: 1) every team member copies their notes to the "whiteboard" (1st slide), 2) the team clusters the
notes on the whiteboard, 3) the team selects 1-2 most important issues, 4) the team defines action items and next steps.</p>
<p>APPENDIX</p>
<h2>FAQ</h2>
<h3>What about the monthly tech onboarding and engineering bootcamp?</h3>
<p>Tech onboarding and engineering bootcamp will happen remotely through <a href="http://meet.google.com">Google Hangouts Meet</a>.</p>
<h3>What should I do if my Internet connection at home is unavailable or slow?</h3>
<p>If you don't have Internet at home, or only an unstable or slow connection, and no company-provided phone, please
contact Helpdesk, which can provide phones for tethering.</p>
<h2>Other Tips for Successful Remote Work</h2>
<p>These tips are copied from Trello's excellent <a href="https://blog.trello.com/remote-work-team-success-guide">The Best Advice For Remote Work Success From 10 Global
Teams</a> (free PDF guide).</p>
<h3>Chat vs. Video Calls</h3>
<p>Recognizing the humanity in team members via seeing their face on a video call is a game-changer:</p>
<ul>
<li>Tools can mask intention and humanity: Keep in mind that at the end of the chat is a human being with feelings and
reactions.</li>
<li>If you have constructive feedback to give, do it over a video call so your intentions come across.</li>
<li>Due to a lack of verbal and emotional cues, one person may perceive a chat conversation as an argument while the
other perceives it as a discussion.</li>
<li>Resentment builds over time due to underlying issues not being addressed. Digital communication gone rogue can breed
misunderstandings and hurt feelings.</li>
</ul>
<h3>Expect Structure</h3>
<p>Establish a process, structure, and agenda around meetings and updates so everyone can follow along no matter their
location. Assign a meeting lead and scribe (note taker) to ensure key decisions are captured in writing.</p>
<h3>Treat Others With Transparency</h3>
<p>Keep important information accessible for everyone: log side chat decisions, record video meetings, and always take
notes to share in public (company-internal) spaces.</p>
<h3>Use Video for Face-to-Face</h3>
<p>Up to 10,000 non-verbal cues can be exchanged in one minute of face-to-face interaction, which makes video meeting
tools (<a href="http://meet.google.com">Hangouts Meet</a>) essential for building relationships with others. You can set up
team-building activities over video that play into the strengths of remote work, like sharing your office view or
introducing your cat to your coworker’s dog and watching the furry friendship unfold.</p>
<h3>Never work from bed</h3>
<p><em>“When I started working 100% remotely at Buffer, I set the rule for myself that I would never work from bed, and here’s
why: It becomes more difficult to fall asleep because working from bed weakens the mental association between your
bedroom and sleep. You may start to feel like you’re always at work and lose a place to come home to. Your quality of
sleep will decrease because using electronics before bed reduces the melatonin you need to fall asleep.”</em>
- Hailley Griffis, Future of Work Marketer, Buffer</p>
<h2>Resources</h2>
<p><a href="https://grafana.com/blog/2020/03/12/how-to-work-from-home-effectively-tips-from-the-remote-first-grafana-labs-team/">Grafana: How to work from home effectively: Tips from the remote-first Grafana Labs
team</a></p>
<p><a href="https://blog.mistro.io/2020/03/11/10-hacks-to-improve-your-wfh-experience-in-10-minutes-or-less/">mistro: 10 hacks to improve your WFH experience in 10 minutes (or
less)</a></p>
<p><a href="https://blog.alicegoldfuss.com/work-in-the-time-of-corona/">Alice Goldfuss: Work in the Time of Corona</a></p>
<p><a href="https://about.gitlab.com/company/culture/all-remote/meetings/">GitLab: All-Remote Meetings</a></p>
<p><a href="https://about.gitlab.com/company/culture/all-remote/what-not-to-do/">GitLab: What not to do when implementing remote: don't replicate the in-office experience
remotely</a></p>
<p><a href="https://www.yonder.io/post/how-to-design-powerful-rituals-for-successful-distributed-companies">Yonder: How to Design Powerful Rituals for Successful Distributed
Companies</a></p>
<p><a href="https://usefyi.com/remote-work-best-practices/">fyi: 11 Best Practices for Working Remotely</a></p>
<p><a href="https://www.techrepublic.com/article/the-10-rules-found-in-every-good-remote-work-policy/">TechRepublic: The 10 rules found in every good remote work
policy</a></p>
<p><a href="https://blog.giantswarm.io/taking-care-remotely/">GiantSwarm: Taking Care Remotely</a></p>
<p><a href="https://blog.giantswarm.io/giant-swarm-is-remote-first-and-i-put-it-to-the-test/">GiantSwarm: Giant Swarm is "Remote First" and I put it to the
test</a></p>
<p><a href="https://blog.giantswarm.io/surviving-and-thriving-how-to-really-work-emotely/">GiantSwarm: Surviving and Thriving: How To Really Work
Remotely</a></p>
<p><a href="https://blog.trello.com/remote-work-team-success-guide">Trello: The Best Advice For Remote Work Success From 10 Global Teams [Free
Guide]</a></p>
<p><a href="https://www.youtube.com/watch?v=y5KH6SpgEng">SRECon2017: Don't Call Me Remote! Building and Managing Distributed Teams -
Facebook</a></p>
<p><a href="https://www.inc.com/jeff-haden/it-only-takes-8-words-to-create-the-best-work-from-home-policy-youll-ever-see.html">Inc.: It Only Takes 8 Words to Create the Best Work-From-Home Policy You'll Ever
See</a></p>
<p><a href="https://klinger.io/post/180989912140/managing-remote-teams-a-crash-course">Andreas Klinger: Managing Remote Teams - A Crash Course - Startup Lessons
Learned</a></p>
<p><a href="https://www.wired.com/story/how-to-work-from-home-without-losing-your-mind/">Wired: How to Work From Home Without Losing Your
Mind</a></p>Open Source: June Updates - New releases, continue to foster diversity and inclusion in tech2019-07-15T00:00:00+02:002019-07-15T00:00:00+02:00Hong Phuc Dangtag:engineering.zalando.com,2019-07-15:/posts/2019/07/oss-june-updates.html<p>This is a recap of open source activities and development at Zalando in the month of June.</p><h2>Project Highlights</h2>
<ul>
<li>
<p><a href="https://github.com/zalando-incubator/kopf">Kopf - Kubernetes Operator Pythonic Framework</a> now supports built-in resources and can be used to write controllers of any kind (pods, namespaces, mixed), not only of custom resources. Check out the latest release for more details <a href="https://github.com/zalando-incubator/kopf/releases">https://github.com/zalando-incubator/kopf/releases</a></p>
</li>
<li>
<p><a href="https://github.com/zalando/skipper">Skipper</a> publishes new releases weekly. Important features were implemented, such as support for proxying the Kubernetes API server and for Kubernetes externalName services from Ingress.</p>
</li>
<li>
<p><a href="https://github.com/zalando-incubator/kube-ingress-aws-controller">Kubernetes Ingress Controller for AWS</a> added dualstack and ssl-policy support in its last release. The controller helps to configure AWS application load balancers according to Kubernetes Ingress resources.</p>
</li>
</ul>
<h2>Foster diversity and inclusion in tech</h2>
<p>Zalando hosted the launch event of Persian Women In Tech Berlin. This is the first event of the new Berlin chapter of the international organization <a href="http://www.persianwomenintech.com/">Persian Women in Tech</a>. <a href="https://twitter.com/sherybrauner">Shery Brauner</a> spoke about her career path from Iran to Germany, daring to take risks, and now leading an engineering team at Zalando.</p>
<p><img alt="shery" src="https://engineering.zalando.com/posts/2019/07/shery1.jpeg#center"></p>
<p>Learn more about Zalando's initiatives around diversity and inclusion topics here: <a href="https://jobs.zalando.com/en/diversity">https://jobs.zalando.com/en/diversity</a></p>
<h2>Zalando Around The World</h2>
<p>Meet and connect with Zalando representatives at tech events around the world:</p>
<p><a href="https://openexpoeurope.com">OpenExpo Europe</a>, Madrid, Jun 20: <a href="https://twitter.com/hpdang">Hong Phuc Dang</a>, InnerSource manager, shared how Zalando applies open source practices internally, tools and processes that we use to foster alignment and collaboration within the company. View <a href="https://github.com/zalando/public-presentations/blob/master/files/2019-06-20_Open_Source_Within_Corporate_Walls-OpenEXPO_Madrid%20.pdf">Hong's slides</a></p>
<p><a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/262032282/">Data Engineering Meetup</a>, Berlin, Jun 20: <a href="https://de.linkedin.com/in/gargsuyash">Suyash Garg</a> gave an update on Zalando <a href="https://github.com/zalando-nakadi">Nakadi project</a>, with a focus on Nakadi SQL - a SQL engine for streaming queries over Nakadi Event Types.</p>
<iframe width="600" height="360" src="https://www.youtube.com/embed/wPxn7lBSUnQ?list=PL28EP3RAJ4QC782oZPG6A4YQw-gFvCrNr" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<hr>
<p><a href="https://www.containerdays.io/">ContainerDays</a>, Hamburg, Jun 24 - 26: <a href="https://twitter.com/try_except_">Henning Jacobs</a> presented his well-known <a href="https://srcco.de/posts/kubernetes-failure-stories.html">Kubernetes Failure Stories</a></p>
<iframe width="600" height="360" src="https://www.youtube.com/embed/LpFApeaGv7A" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<hr>
<p><a href="https://github.com/zalando/public-presentations">Public Zalando Tech Presentations Repository</a> is a compiled list of public talks by Zalando employees including meetup presentations, recorded conference talks, slides, etc. We try to keep the list up-to-date. <strong>Do check it out!</strong></p>How we release open source projects2019-05-27T00:00:00+02:002019-05-27T00:00:00+02:00Per Plougtag:engineering.zalando.com,2019-05-27:/posts/2019/05/how-we-release-open-source-projects.html<p>Zalando has over 200 open source projects on Github, with about two new proposed projects a month. Over the years we've refined our process to ensure it is easy and transparent to publish projects as open source, and keep maintainers accountable for the well-being of their projects.</p><p>This blog post describes how we manage the process of proposing, reviewing and approving projects to become open source, while at the same time ensuring project code follows our compliance rules, and the maintainers of the projects are aware of their responsibilities.</p>
<p><a href="https://opensource.zalando.com/docs/releasing/index/">See our formal release guidelines</a></p>
<h2>Overview</h2>
<p>The process involves five steps that take a project from internal source code, through a review phase, to our incubator. The project eventually either graduates into our top-level organisation or is archived as inactive due to a lack of activity or maintainers:</p>
<ol>
<li>An internal project is proposed for release by a Zalando engineer</li>
<li>The project proposal is reviewed by the internal open source review group</li>
<li>If approved, the project is published on the Zalando Incubator on GitHub</li>
<li>The project activity and health is monitored by the open source team</li>
<li>The project graduates from the incubator and into the main Zalando organisation, or the project is decommissioned and marked as archived.</li>
</ol>
<p>How we monitor incubator projects and decide whether to promote or archive them will be detailed in a later blog post.</p>
<h2>Proposing a new open source project</h2>
<p>The first step to getting an internal project published on the Zalando Incubator is to fill out a Google form and confirm understanding of our requirements, <a href="https://opensource.zalando.com/docs/releasing/index/">which is available here</a>.</p>
<p>Anyone at Zalando can do this, and this step serves two purposes:</p>
<ol>
<li><strong>To collect information required to publish a project</strong>, such as its current location, who will be maintaining it and the long term plan for maintaining it.</li>
<li><strong>To set expectations for the maintainers</strong>, such as amount of time needed to maintain the project, sign-off from the developers' engineering lead and ensuring the project does not require internal Zalando dependencies.</li>
</ol>
<p><img alt="Project Release Form" src="https://engineering.zalando.com/posts/2019/05/os-release-form.png"></p>
<p>You can see a public version of the <a href="https://goo.gl/forms/9C4xlel5DlIK52Xw1">approval form without validation here</a>.</p>
<p>Questions addressing who signed off on publishing the project and how many hours the developers can commit to maintaining the project serve as a good way to set expectations, both for the lead who approved it and for the maintainers. Running a sustainable project requires commitment, and we do not expect developers to use their private time; instead, ensuring time will be made to work on the project should be part of the conversation.</p>
<p>We also address the need for basic project health files, such as a Code of Conduct and ways for users to get in touch about security issues, features or bugs, by providing maintainers with a <a href="https://github.com/zalando-incubator/new-project">standard set of files</a> for guidance. We do this for two reasons:</p>
<ol>
<li>Ownership of code should be visible to other teams inside Zalando and to potential audits. Beyond ownership, these files also communicate how to contribute, how to report security issues, and our code of conduct.</li>
<li>Communication channels must be public so that maintainers of a project can be approached by external contributors. We want to avoid the "throw code over the wall" antipattern, so having clear ways to reach our maintainers is a central part of taking active ownership of code.</li>
</ol>
<h2>The Open Source Review Group</h2>
<p>When a project is proposed, it is automatically shared on an internal mailing-list that consists of everyone at Zalando currently maintaining an approved project. This group is currently about two hundred people, which allows us to spread the decision making process across many different people and viewpoints.</p>
<p><img alt="Review Group" src="https://engineering.zalando.com/posts/2019/05/review-group.png"></p>
<h3>Discussing the why</h3>
<p>The point here is to have as many eyes on the proposal as possible. Specifically, we are interested in discussing the WHY of releasing a project, and the three questions below are central to this discussion:</p>
<ol>
<li>Will the project be sustainable?</li>
<li>Does Zalando gain any value from open sourcing and maintaining it long term?</li>
<li>Does it have any value to anyone outside Zalando?</li>
</ol>
<p>When code is released as open source, you are essentially sharing something of value, and you are also taking responsibility for committing time to the additional overhead associated with open sourcing. This commitment and exchange of value should be justified. There are multiple ways to look at this, such as:</p>
<ol>
<li>The project contributes positively to the employer branding efforts and supports hiring of tech talent</li>
<li>The project helps establish the company as a leader in a certain domain</li>
<li>The project will gain features and bugfixes from external community members</li>
<li>The maintainer team could gain valuable knowledge through collaborating with external community members</li>
</ol>
<p>At Zalando we've seen several projects contribute to our employer branding efforts; however, this is a side effect and should not be the main reason for open sourcing. It is of course nice that Zalando is recognised for its Kubernetes (<a href="https://github.com/kubernetes-incubator/external-dns">External-DNS</a>, <a href="https://github.com/zalando-incubator/stackset-controller">Stackset-Controller</a>, <a href="https://github.com/zalando-incubator/es-operator">es-operator</a>), PostgreSQL (<a href="https://github.com/zalando/patroni">Patroni</a> and <a href="https://github.com/zalando/postgres-operator">postgres-operator</a>) and Machine Learning projects (<a href="https://github.com/zalandoresearch/flair">Flair</a> and <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-mnist</a>). Nonetheless, it is hard to measure the brand impact of such projects, and it is not a long-term motivation for the maintainers or Zalando.</p>
<p>Justifying open sourcing is not easy; a fair amount of guessing is involved, since you do not know how people outside the company will receive and adopt your projects. However, making an assessment of possible impact before release provides good guidance for the project maintainers.</p>
<h3>Reviewing project quality</h3>
<p>Besides discussing the WHY, the open source team looks at compliance-specific areas which could block a release:</p>
<ol>
<li>Do we use dependencies which have incompatible licensing?</li>
<li>Does the source code contain anything confidential (such as tokens, URLs, passwords, etc.)?</li>
<li>Does the project contain functionality or IP which gives Zalando a competitive advantage (such as the code that powers our search results)?</li>
<li>Is the project something Zalando would consider trying to patent?</li>
</ol>
<p>We use a dependency licensing scanning tool, as well as a source code scanner to look for tokens and passwords, to automate this as much as possible.</p>
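A minimal sketch of what such a source-code scan does. The patterns and helper below are purely illustrative, not the actual tooling Zalando uses; real scanners ship far larger rule sets.

```python
#!/usr/bin/env python
"""Illustrative pre-release secret scan (patterns are hypothetical examples)."""
import re
from pathlib import Path

# A few common credential shapes; a real scanner has many more rules.
PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "Hardcoded password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
}


def scan(root):
    """Walk a repository checkout and report (path, line number, finding)."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the scan
        for name, pattern in PATTERNS.items():
            for lineno, line in enumerate(text.splitlines(), 1):
                if pattern.search(line):
                    findings.append((str(path), lineno, name))
    return findings
```

In a release pipeline, a non-empty result would fail the check and send the proposal back to the maintainers.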
<h3>Review Meeting</h3>
<p>Once a month, the review group sits down with the maintainers proposing new projects. The discussions from the mailing-list are considered by the group and a decision is made: the project is either released, rejected, or the maintainers are asked to improve certain aspects of the project before it can be released. By including the maintainers directly in the discussion, we avoid a black box reviewing projects in secrecy; instead, the discussions are fast and transparent to everyone involved.</p>
<p>Depending on the number of project proposals, the meeting takes between 30 and 60 minutes. For each project reviewed, the open source team writes a one-page release notes document which outlines why the project is being released, the discussion in the meeting, and the measures taken to ensure our compliance rules are followed.</p>
<p>After the review meeting, the open source team sits down with the maintainers and performs the release of the project on GitHub.</p>
<h2>Publishing the source code</h2>
<p>After mailing-list discussion and approval in the monthly meeting, the project is released. We have a specific approach to doing this:</p>
<ol>
<li>
<p>We only transfer the current state of the repository to GitHub, so we do not include the git history. While having the history would be very valuable for tracking down the decisions behind code changes, it is simply too big a security risk and would require the maintainers to audit all commits.</p>
</li>
<li>
<p>We automatically merge project files with our baseline files to ensure all repositories have a minimal set of files. These are templated with employee names, emails and GitHub usernames, so contact info and metadata are consistent.</p>
</li>
<li>
<p>The project is set up with a dedicated team assigned to it, with the correct branch protection in place and compliance tooling installed by default (we have a bot called <a href="https://github.com/zalando-incubator/zincr">Zincr</a> for this).</p>
</li>
</ol>
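The history-free transfer in step one can be sketched with plain git commands. The repository names below are hypothetical stand-ins; the real process additionally merges the baseline files and configures teams and tooling.

```shell
set -eu
work=$(mktemp -d) && cd "$work"

# Stand-in "internal" repository with more than one commit:
git init -q internal-project
echo "code" > internal-project/main.py
git -C internal-project add main.py
git -C internal-project -c user.email=dev@example.com -c user.name=dev commit -qm "first"
echo "more code" >> internal-project/main.py
git -C internal-project -c user.email=dev@example.com -c user.name=dev commit -qam "second"

# Export only the tip and start a brand-new history for GitHub:
git clone -q --depth 1 "file://$work/internal-project" oss-export
rm -rf oss-export/.git                       # drop all internal git metadata
git -C oss-export init -q
git -C oss-export add -A
git -C oss-export -c user.email=dev@example.com -c user.name=dev commit -qm "Initial open source release"

git -C oss-export rev-list --count HEAD      # prints 1: a single, history-free commit
```

The published repository starts from one commit, so no internal commit messages, author details or reverted secrets leak out.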
<p>And that is our process for initially releasing new projects. I hope it gave you an insight into what a company of Zalando's size has to consider before releasing new code, and how we have tried to keep the process simple and transparent for the maintainers of our projects.</p>
<p>In future posts, I will go through how we monitor current projects, and how we decide what to keep and what to decommission as the projects evolve.</p>
<p>Recently, I was talking to a long-time friend, previous university colleague and former boss, who mentioned that
Redis was failing to persist data to disk in low-memory conditions. For that reason, he advised never letting a
Redis in-memory dataset grow bigger than 50% of the system memory. Given how wasteful that practice would be,
it's interesting to understand why this can happen and look for alternatives to ensure that Redis will be able to use as
much memory as is available to it, without sacrificing its durability. The following script reproduces the issue:</p>
<div class="highlight"><pre><span></span><code>#!/usr/bin/env python
import random
import string
import uuid

import redis

MEM_GB = 2 * 1024**3
KEY_SIZE = 1024**2
TOTAL_KEYS = int((MEM_GB * 0.5) / KEY_SIZE)


def gen_data():
    # 1024 random characters repeated 1024 times: a 1MB value
    return ''.join([random.choice(string.ascii_letters + string.digits) for x in range(1024)]) * 1024


r = redis.StrictRedis()
for i in range(TOTAL_KEYS):
    # str() keeps the UUID key valid across redis-py versions
    r.set(str(uuid.uuid4()), gen_data())
</code></pre></div>
<p>It will generate random key/value pairs of 1MB each, using up to half of the total memory available. As it was executed
on a 2GB RAM virtual machine, it will create a dataset about 1GB in size. Considering the memory used by the OS and all
other processes, we can be sure that Redis is now using a bit more than 50% of the total system memory. From this point
in time, calling BGSAVE will result in an error:</p>
<div class="highlight"><pre><span></span><code><span class="mf">127.0.0.1</span><span class="p">:</span><span class="mf">6379</span><span class="o">></span><span class="w"> </span><span class="n">BGSAVE</span>
<span class="p">(</span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="n">ERR</span>
</code></pre></div>
<p>And the following message will appear in /var/log/redis/redis-server.log (on a Ubuntu 18.04 LTS system):</p>
<div class="highlight"><pre><span></span><code><span class="mi">10202</span><span class="o">:</span><span class="n">M</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="n">Sep</span><span class="w"> </span><span class="mi">11</span><span class="o">:</span><span class="mi">34</span><span class="o">:</span><span class="mf">16.535</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">Can</span><span class="err">'</span><span class="n">t</span><span class="w"> </span><span class="n">save</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">background</span><span class="o">:</span><span class="w"> </span><span class="n">fork</span><span class="o">:</span><span class="w"> </span><span class="n">Cannot</span><span class="w"> </span><span class="n">allocate</span><span class="w"> </span><span class="n">memory</span>
</code></pre></div>
<p>Looking at the <a href="https://github.com/antirez/redis/blob/4.0.9/src/rdb.c#L1066-L1117">source code for this operation</a>, this
message is shown when the fork() system call returns -1. In <a href="http://man7.org/linux/man-pages/man2/fork.2.html#RETURN_VALUE">its man
page</a>, we can see that this return code only means that
it failed and no child process was created. Based on that information and the error message, one might say that the
process failed because it was duplicating the entire dataset in memory, an action that can't be done with less than half
memory available.</p>
<p>Digging through a bit of Unix history, we'll find that the first generation of Unix OSes <a href="https://books.google.de/books?id=9yIEji1UheIC&lpg=PA295&dq=%22copy+on+write%22&pg=PA295&redir_esc=y#v=onepage&q=%22copy%20on%20write%22&f=false">indeed duplicated the whole
parent address
space</a>
when fork() was called. On modern kernels like Linux, this doesn't happen anymore and the <a href="http://man7.org/linux/man-pages/man2/fork.2.html#NOTES">NOTES section of the same man
page</a> mentions this in detail:</p>
<p><em>Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory
required to duplicate the parent's page tables, and to create a unique task structure for the child.</em></p>
<p>A copy-on-write approach is much more efficient than actually copying data from one place to the other. The child
process will share the same memory pages as its parent, but in the end will only need enough memory to create pointers
to the actual data. Each of these memory pages will only be copied if, and only if, the child process tries to write
something to them, hence the name copy-on-write (CoW). As the data is being dumped to disk, this is a read-only
operation that results in virtually no increase in memory usage.</p>
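These fork() semantics can be seen from Python in a small sketch (POSIX only, since it uses os.fork; the bytearray stands in for the Redis dataset and is not how Redis itself is implemented): the child reads the parent's pages for free, and its writes never reach the parent.

```python
#!/usr/bin/env python
# Sketch of the fork() semantics a background save relies on (POSIX only).
import os

data = bytearray(b"x" * 1024)     # stands in for the in-memory dataset

pid = os.fork()
if pid == 0:                      # child: plays the role of the saving process
    ok = data[:1] == b"x"         # reads the parent's pages without copying them
    data[:1] = b"y"               # a write triggers a private copy of that page
    os._exit(0 if ok else 1)
else:                             # parent: keeps serving requests
    _, status = os.waitpid(pid, 0)
    assert os.WEXITSTATUS(status) == 0
    assert data[:1] == b"x"       # the child's write stayed in its own copy
```

The child saw a consistent snapshot and modified it, yet the parent's data is untouched: exactly what lets BGSAVE dump gigabytes without doubling memory usage.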
<p>The question now is: if nowhere near double the amount of memory is needed, why is it still failing? The answer is that
the Linux kernel cannot commit to allowing a child process to point to that amount of data, as there's no
guarantee it won't modify it. If the kernel allowed that, it could result in a situation where the total system
memory wouldn't be enough to hold everything that was allocated by both parent and child processes. The good news is
that there's a way to overcome that, presented as a tip in the Redis log file:</p>
<div class="highlight"><pre><span></span><code><span class="mi">10202</span><span class="o">:</span><span class="n">M</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="n">Sep</span><span class="w"> </span><span class="mi">11</span><span class="o">:</span><span class="mi">33</span><span class="o">:</span><span class="mf">09.943</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">WARNING</span><span class="w"> </span><span class="n">overcommit_memory</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="kd">set</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="mi">0</span><span class="o">!</span><span class="w"> </span><span class="n">Background</span>
<span class="n">save</span><span class="w"> </span><span class="n">may</span><span class="w"> </span><span class="n">fail</span><span class="w"> </span><span class="n">under</span><span class="w"> </span><span class="n">low</span><span class="w"> </span><span class="n">memory</span><span class="w"> </span><span class="n">condition</span><span class="o">.</span><span class="w"> </span><span class="n">To</span><span class="w"> </span><span class="n">fix</span><span class="w"> </span><span class="k">this</span><span class="w"> </span><span class="n">issue</span><span class="w"> </span><span class="n">add</span>
<span class="s1">'vm.overcommit_memory = 1'</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="sr">/etc/s</span><span class="n">ysctl</span><span class="o">.</span><span class="na">conf</span><span class="w"> </span><span class="n">and</span><span class="w"> </span><span class="n">then</span><span class="w"> </span><span class="n">reboot</span><span class="w"> </span><span class="n">or</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="n">the</span>
<span class="n">command</span><span class="w"> </span><span class="s1">'sysctl vm.overcommit_memory=1'</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="k">this</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">take</span><span class="w"> </span><span class="n">effect</span><span class="o">.</span>
</code></pre></div>
<p>The message is a bit misleading, as a system that is using a bit more than 50% of memory isn't exactly in a "low memory
condition," but it is still consistent with what we know about the problem so far. Before trying any command or
configuration without knowing exactly what it does, let's look at what the 1 option means in the overcommit_memory section
of the <a href="http://man7.org/linux/man-pages/man5/proc.5.html">proc file system man page</a>:</p>
<p><em>In mode 1, the kernel pretends there is always enough memory, until memory actually runs out. One use case for this
mode is scientific computing applications that employ large sparse arrays. In Linux kernel versions before 2.6.0, any
nonzero value implies mode 1.</em></p>
<p>Let's set that mode and try again:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>sudo<span class="w"> </span>sysctl<span class="w"> </span>vm.overcommit_memory<span class="o">=</span><span class="m">1</span>
vm.overcommit_memory<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span>
$<span class="w"> </span>redis-cli
<span class="m">127</span>.0.0.1:6379><span class="w"> </span>BGSAVE
Background<span class="w"> </span>saving<span class="w"> </span>started
</code></pre></div>
<p>After that, the Redis log shows much friendlier messages:</p>
<div class="highlight"><pre><span></span><code><span class="mi">10202</span><span class="o">:</span><span class="n">M</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="n">Sep</span><span class="w"> </span><span class="mi">11</span><span class="o">:</span><span class="mi">47</span><span class="o">:</span><span class="mf">04.663</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Background</span><span class="w"> </span><span class="n">saving</span><span class="w"> </span><span class="n">started</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="mi">10337</span>
<span class="mi">10337</span><span class="o">:</span><span class="n">C</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="n">Sep</span><span class="w"> </span><span class="mi">11</span><span class="o">:</span><span class="mi">47</span><span class="o">:</span><span class="mf">05.833</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">DB</span><span class="w"> </span><span class="n">saved</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">disk</span>
<span class="mi">10337</span><span class="o">:</span><span class="n">C</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="n">Sep</span><span class="w"> </span><span class="mi">11</span><span class="o">:</span><span class="mi">47</span><span class="o">:</span><span class="mf">05.839</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">RDB</span><span class="o">:</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="n">MB</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">memory</span><span class="w"> </span><span class="n">used</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">copy</span><span class="o">-</span><span class="n">on</span><span class="o">-</span><span class="n">write</span>
<span class="mi">10202</span><span class="o">:</span><span class="n">M</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="n">Sep</span><span class="w"> </span><span class="mi">11</span><span class="o">:</span><span class="mi">47</span><span class="o">:</span><span class="mf">05.885</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Background</span><span class="w"> </span><span class="n">saving</span><span class="w"> </span><span class="n">terminated</span><span class="w"> </span><span class="k">with</span><span class="w"> </span><span class="n">success</span>
</code></pre></div>
<p><em>Work with people like Tiago. Have a look at our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">open jobs page</a>.</em></p>Back-Pressure Strategy for a Sharded Akka Cluster2019-05-09T00:00:00+02:002019-05-09T00:00:00+02:00Rohit Sharmatag:engineering.zalando.com,2019-05-09:/posts/2019/05/back-pressure-strategy-for-a-sharded-akka-cluster.html<p>AWS SQS polling from sharded Akka Cluster running on Kubernetes</p><h3>AWS SQS polling from sharded Akka Cluster running on Kubernetes</h3>
<p><strong>NOTE:</strong> This blog post requires the reader to have prior knowledge of <a href="https://aws.amazon.com/sqs/">AWS SQS</a>, <a href="https://doc.akka.io/docs/akka/2.5/actors.html">Akka
Actors</a> and <a href="https://doc.akka.io/docs/akka/2.5/cluster-sharding.html">Akka Cluster
Sharding</a>.</p>
<p>My <a href="https://medium.com/@programmerohit/distributed-cache-with-akka-cluster-sharding-and-akka-http-on-kubernetes-2d695b134154">last
post</a>
introduced <a href="https://doc.akka.io/docs/akka/2.5/cluster-sharding.html">Akka Cluster Sharding</a> as a Distributed Cache
running on <a href="https://kubernetes.io/">Kubernetes</a>.</p>
<p>As that proof of concept (PoC) proved promising, we started building a high-throughput and low-latency system based on
the experience and learnings gained.</p>
<h2>Background</h2>
<p>The system under consideration polls (fetches) messages from AWS SQS and does the following:</p>
<ol>
<li>Processes polled SQS messages (such as JSON modifications)</li>
<li>Stores polled SQS messages in a datastore</li>
<li>Stores the latest state derived from polled SQS messages in-memory</li>
<li>Publishes the processed SQS message to destination AWS SQS (for other systems to work with them)</li>
<li>Acknowledges the polled SQS messages back to the source AWS SQS</li>
</ol>
<p>This sounds pretty simple to implement at first, but turns into a challenging task when it happens at scale (up to
45,000 SQS-messages-processed/second).</p>
<p>Characteristics of the SQS message(s):</p>
<ul>
<li>
<p>SQS message size varies from 5KB to 100KB</p>
</li>
<li>
<p>An SQS message is uniquely identified by an identifier, let’s call it <code>event_id</code>, and there are more than <strong>250,000
unique event_id</strong>(s) in the system</p>
</li>
<li>
<p>SQS messages are versioned, and some lower-versioned SQS messages will be acknowledged back to the source AWS SQS (as these
messages do not affect the state of the system) without any processing (JSON modification), storing into the datastore or
publishing to the destination AWS SQS</p>
</li>
<li>
<p>SQS messages are evenly distributed by <code>event_id</code>, i.e. in theory, all the SQS messages in one batch have a unique
<code>event_id</code></p>
</li>
</ul>
<h2>The Problem</h2>
<p>Polling AWS SQS is easy. Controlled and dynamic polling based on the workload of a highly distributed system is
challenging where failure is inevitable.</p>
<p>In the beginning, the implementation was simple and straightforward. One Actor (let’s say SQS Batch Poller) was
responsible for polling and sending those polled SQS messages to desired entity actors to be processed, stored,
published to destination SQS and eventually be acknowledged back to source SQS.</p>
<p>Moreover, the performance (time taken to process, CPU, memory, etc.) of the system depended on the size of the SQS messages. A
5KB SQS message was quicker to process and required fewer resources than a 100KB SQS message. This variation in the
size of the SQS messages made the workload of the system very dynamic and unpredictable.</p>
<p>This implementation worked fine with a few thousand messages in SQS, but failed catastrophically when this number grew
to millions.</p>
<p>The failure happened because the SQS Batch Poller Actor kept polling SQS messages from AWS SQS without any knowledge of
the state (processed or unprocessed) of the already polled SQS messages. This filled the cluster with more than 120,000
unprocessed SQS messages and reduced the throughput to 10–12 SQS-messages-processed/sec. This resulted in unreachable
Akka cluster nodes (Kubernetes Pods), which were killed by OOM, eventually bringing down the whole system (the Akka
cluster).</p>
<p><em>Why did the Akka Cluster stop polling after ~120,000 SQS messages? Because that’s the <a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-limits.html">limit imposed by AWS
SQS</a>. SQS can only have
~120,000 un-acknowledged or in-flight messages.</em></p>
<p>A better approach to poll SQS, without hitting the Akka cluster’s limits and killing it, was needed. The SQS Batch
Poller Actor needed to be aware of the workload of the system and adjust the rate of polling AWS SQS accordingly.</p>
<h2>Solution</h2>
<p>The solution was to inform the SQS Batch Poller Actor about the state of unprocessed SQS messages (the workload) in the
system, i.e. to implement Back-Pressure.</p>
<p>The key point in the Back-Pressure strategy was to limit the number of unprocessed messages the cluster can have at any
given point in time. This strategy ensured that SQS is only polled if there is a demand for more SQS messages in the
system and allowed the system to behave in a predictable manner irrespective of the size of SQS message.</p>
<p><strong>The diagram below depicts the high-level architecture of the Back-Pressure Strategy.</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d30f329247f41cd032faf20ae3d0d779cc03f540_1_wb9vyueprcbh3ybtszkmvw.jpeg?auto=compress,format"></p>
<p>The architecture consists of two main Actors, namely SQSBatchPollerManager and SQSBatchPoller, responsible for managing
Back-Pressure and Polling SQS.</p>
<p>Before starting to define and implement Back-Pressure strategy, a few important details/assumptions need to be laid
down.</p>
<ul>
<li>
<p><strong>maxUnprocessedMessages:</strong> A configurable limit on maximum number of SQS messages that can be present in the system
at any given point in time. This limit can be adapted according to the throughput requirements and system limits.
Increasing this limit comes at the cost of higher resources such as Memory, CPU, Network, etc.</p>
</li>
<li>
<p><strong>parallelism:</strong> Parallelism factor to limit the number of SQS batches polled in parallel. This prevents
peaks in resource usage, such as overwhelming the database or a third-party service with a burst of
thousands of requests at once to load the initial state of the Entity actors.</p>
</li>
<li>
<p><strong>batchSize:</strong> Each SQS batch can have a maximum of 10 SQS messages.</p>
</li>
</ul>
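<p>Putting these three parameters together, the polling demand boils down to a small calculation: poll at most <em>parallelism</em> batches, and never more than the remaining headroom allows. Here is a minimal sketch with illustrative names and example values, not taken from the actual implementation:</p>

```java
// Sketch of the SQSBatchPollerManager's demand calculation.
// Names and constant values are illustrative, not from the real code.
public final class PollDemand {
    static final int MAX_UNPROCESSED = 1000; // maxUnprocessedMessages
    static final int PARALLELISM = 10;       // batches polled in parallel
    static final int BATCH_SIZE = 10;        // SQS hard limit per batch

    /** How many batches to poll, given the messages currently in flight. */
    static int batchesToPoll(int unprocessed) {
        int headroom = Math.max(0, MAX_UNPROCESSED - unprocessed);
        // Never exceed the headroom, never poll more than PARALLELISM batches.
        return Math.min(PARALLELISM, headroom / BATCH_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(batchesToPoll(0));    // 10: idle cluster, full parallelism
        System.out.println(batchesToPoll(950));  // 5: only 50 messages of headroom
        System.out.println(batchesToPoll(1000)); // 0: back-pressure kicks in
    }
}
```

<p>The important property is that the result drops to zero as the cluster fills up, which is what stops polling before the system is overwhelmed.</p>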
<h3>Involved Actors in Back-Pressure strategy</h3>
<p><strong><a href="https://gist.github.com/sharma-rohit/3090717c52397a8de517be2655a076e1">SQS Batch Poller Manager Actor (SQSBatchPollerManager):</a></strong>
The <strong>SQSBatchPollerManager</strong> actor is responsible for keeping track of unprocessed SQS messages in the system and for
calculating the number of messages to poll from SQS.</p>
<p><strong><a href="https://gist.github.com/sharma-rohit/3c023b8a29fac67b8f9f80b708104c62">SQS Batch Poller Actor (SQSBatchPoller):</a></strong>
The <strong>SQSBatchPoller</strong> actor actually polls SQS message batches from AWS SQS and keeps track of the lifecycle of the polled SQS
messages. It also reports back to the SQSBatchPollerManager once the SQS message batch is completely processed.</p>
<p><strong>Entity Actor (EntityActor):</strong>
<strong>EntityActor</strong> is responsible for processing (such as JSON modification), storing into the datastore, publishing to the
destination SQS, acknowledging the polled SQS message back to the source SQS and, finally, informing
SQSBatchPoller about the successful or failed processing of the polled SQS message.</p>
<p><strong>How do these actors collectively implement the Back-Pressure strategy?</strong>
After successful cluster formation, the cluster is ready to poll and process SQS messages. Let’s walk through the whole process
of back-pressured SQS polling step by step for a better understanding.</p>
<ol>
<li>SQSBatchPollerManager receives a message <strong>PollSqs</strong> to start SQS polling.</li>
<li>Upon receiving <strong>PollSqs</strong> message, SQSBatchPollerManager calculates the number of SQS batches that can be polled in
parallel (<em>parallelism</em>) while not exceeding the maximum number of unprocessed SQS messages
(<em>maxUnprocessedMessages</em>) the cluster can sustain. After calculating the number of SQS messages to poll,
SQSBatchPollerManager creates child actor(s), SQSBatchPoller, and sends a message <strong>PollSqsBatch</strong> to it.</li>
<li>Upon receiving <strong>PollSqsBatch</strong> message from SQSBatchPollerManager, SQSBatchPoller polls AWS SQS and sends these
polled SQS messages to <a href="https://doc.akka.io/docs/akka/2.5/cluster-sharding.html">Cluster Shard Region Actor</a> which
in turn forwards these SQS messages to respective EntityActor.</li>
<li>Upon receiving SQS messages, EntityActor processes them (such as JSON modification), stores the state into the datastore,
publishes to the destination SQS, acknowledges the polled SQS message to the source SQS and finally sends a message
<strong>SQSMessageProcessed</strong> back to SQSBatchPoller.</li>
<li>SQSBatchPoller waits for all the EntityActor(s) to send back an acknowledgement message <strong>SQSMessageProcessed</strong>.
After receiving all the acknowledgements from the concerned EntityActor(s), it sends a message <strong>BatchProcessed</strong>
back to SQSBatchPollerManager and kills itself.</li>
<li>SQSBatchPollerManager upon receiving <strong>BatchProcessed</strong> sends itself a message <strong>PollSqs</strong> and the whole process
repeats from step 2 again.</li>
</ol>
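<p>Stripped of the Akka machinery, steps 1–6 form a simple control loop. The toy simulation below (all names are illustrative, and actor messages are replaced by direct method calls) demonstrates the key property of the strategy: the number of in-flight messages never exceeds the configured limit, no matter how many messages are in the queue:</p>

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy simulation of the back-pressured polling loop (steps 1-6) without
// Akka: actor messages are replaced by direct method calls. Illustrative only.
public final class BackPressureSim {
    static final int MAX_UNPROCESSED = 30; // maxUnprocessedMessages (example value)
    static final int BATCH_SIZE = 10;

    /** Drains the queue; returns the highest in-flight count ever observed. */
    static int drain(Queue<Integer> sqs) {
        int inFlight = 0, maxObserved = 0;
        while (!sqs.isEmpty()) {
            // Step 2: poll batches only while there is headroom.
            while (inFlight + BATCH_SIZE <= MAX_UNPROCESSED && !sqs.isEmpty()) {
                for (int i = 0; i < BATCH_SIZE && !sqs.isEmpty(); i++) {
                    sqs.poll();   // Step 3: SQSBatchPoller fetches a message.
                    inFlight++;
                }
            }
            maxObserved = Math.max(maxObserved, inFlight);
            inFlight = 0;         // Steps 4-5: entities process and acknowledge.
        }                         // Step 6: BatchProcessed triggers the next poll.
        return maxObserved;
    }

    public static void main(String[] args) {
        Queue<Integer> sqs = new ArrayDeque<>();
        for (int i = 0; i < 1000; i++) sqs.add(i);
        System.out.println(drain(sqs)); // 30: never exceeds MAX_UNPROCESSED
    }
}
```

<p>In the real system the processing step is asynchronous and can fail, but the invariant it enforces is the same one this loop shows.</p>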
<p>With this strategy, AWS SQS polling is controlled by the speed of processing SQS messages by the system (Akka
Cluster).</p>
<h2>What’s next</h2>
<p>What’s described above is a simplified version of the actual Back-Pressure strategy used in the production system, but the
underlying principle of Back-Pressure is exactly the same. Some obvious caveats, such as handling SQS failures, node
crashes, actor crashes and optimizations in polling AWS SQS, are excluded here as out of the scope of this post.</p>
<p>I will try to write more about the handling of the failure cases listed above and optimizations in future posts.</p>How to Manage Stakeholder Requests in Big Organizations2019-05-03T00:00:00+02:002019-05-03T00:00:00+02:00Bohdan Feniaktag:engineering.zalando.com,2019-05-03:/posts/2019/05/how-to-manage-stakeholder-requests-in-big-organizations.html<p>Explaining our Facilitator Role</p><p>An important factor of success in an agile environment is that the team works well together. It is also important for a
software engineer to be able to focus for longer periods of time with limited interruptions.</p>
<p>Many companies have solved the challenge of focus and dedication for the team by having a designated role, such as Scrum
Master or Producer, who is responsible for managing stakeholder requests, prioritizing them and communicating to the
development team.</p>
<p>But sometimes requests can't be evaluated by a responsible person in the first place. There are topics where somebody
from the development team needs to have a look as well to help understand the technical side more deeply.</p>
<p>On top of that, sometimes anomalies appear on monitoring dashboards. Network slowdowns, operational issues and many
other things might happen during working time and immediate action could be required to fix the issue.</p>
<p>Who should be responsible in this case? How can we ensure that the team’s stakeholder relationships stay healthy and help us
move forward?</p>
<p>Recently my team, which is responsible for developing an innovative machine learning product for the fashion world,
faced a very similar issue.</p>
<p>Engineers were spending a considerable percentage of their time working on ad-hoc requests from our stakeholders. There
was no proven way to track or organize such requests well, so we could not guarantee the level of support we strive for.</p>
<p>At that point we understood that we needed either a clear owner of such topics or a pre-defined collaborative
responsibility. It simply did not work out-of-the-box and the team needed to institutionalise the meaning of ownership.
And we were up for the challenge.</p>
<p>So the team decided to introduce an internal role in our technical team - a facilitator or, as we call it, Batman -
a role that is perceived more as an honor than a burden, and that everyone is comfortable doing ad-hoc.</p>
<p><strong>Key principles of the role are:</strong></p>
<ul>
<li>Every member of the team shares responsibility for all stakeholder requests and service health, by performing
facilitator duties on a shift basis</li>
<li>Facilitator duties are only valid during working hours</li>
</ul>
<p><strong>Key benefits of the role:</strong></p>
<ul>
<li>Stakeholders are always provided with support within a guaranteed lead time (normally up to 2 hours)</li>
<li>Knowledge about services and requests is spread more evenly in the team by performing the duties and learning from
other team members</li>
<li>Quantity of ad-hoc requests / issues is always visible on the team’s dashboard</li>
</ul>
<p><strong>When to set it up?</strong></p>
<p>The role is reasonable when:</p>
<ul>
<li>The team is working in a big organization with many external and internal stakeholders</li>
<li>There are regular incoming ad-hoc requests and/or anomalies on monitoring dashboards</li>
<li>A significant part of the team (40-50%) is busy working on ad-hoc requests during the sprint</li>
<li>Iteration goals are not achieved regularly because of influx of unplanned work</li>
</ul>
<p><strong>How to set it up?</strong>
Make sure to have an open conversation in the team. Acknowledge the things you would like to take care of. Spend some
time agreeing within the team on what the role means and how it should work. The final definition and duties of the role
can differ; everything depends on the team and the skill set within it.</p>
<p><strong>Here are a few important guidelines we want to share that helped us to set up the role:</strong></p>
<ul>
<li>Define a clear set of expectations from the role, make sure duties are well described and understood by the team and
stakeholders</li>
<li>Set up a schedule for taking on the role and make sure it is flexible enough.</li>
<li>Weekly shifts have proven to be optimal for our team.</li>
<li>Rule of thumb: no person should do two shifts in a row.</li>
<li>Clearly define what is not within the scope of the role.</li>
<li>Set up an F.A.Q. section and maintain it; it will serve both the team and stakeholders well.</li>
<li>Consider talking about the role actively in team retrospectives, or even having a dedicated retrospective about the
role every 1-2 months. Openly talking about successes and challenges helps to adjust the process.</li>
<li>Account for the time needed to perform the duties during planning.</li>
<li>Set up a short handover meeting for remaining tasks from the previous shift.</li>
<li>Document the findings, so the need for the role fades away with time (invest in proper runbooks and knowledge
sharings)</li>
</ul>
<p><strong>What is the feedback from the team regarding the role?</strong></p>
<ul>
<li>Structured approach to ad-hoc requests is a big plus.</li>
<li>Noticeable improvement in knowledge sharing among team members.</li>
<li>Everyone should assume equal responsibility when doing the facilitator job.</li>
<li>The role only works well if everybody in the team embraces it and is diligent when performing facilitator duties.</li>
<li>When the team has different backgrounds (for example, backend engineering and research engineering), time is
needed to adjust to each other’s technical stack and way of thinking.</li>
<li>Handover of existing requests needs to be thought through better.</li>
<li>Sometimes there are not enough small tickets to pick up by the team member on the facilitator shift.</li>
</ul>
<h2>What is next?</h2>
<p>We truly believe that talking about the role and how it develops helps us to adjust the process as we move forward.</p>
<p>We will continue developing the role inside our team to help us become even better.</p>Learning DevOps as a Software Engineer2019-04-25T00:00:00+02:002019-04-25T00:00:00+02:00Angela Igrejatag:engineering.zalando.com,2019-04-25:/posts/2019/04/learning-devops-as-a-software-engineer.html<p>How developing DevOps skills as a Software Engineer helps you to grow and become a better Engineer.</p><p>At Zalando the teams are autonomous and involved in the entire software development process - from gathering stakeholder
requirements to design, implementation, testing and deployment. For me, this was one of the greatest
challenges/opportunities of joining Zalando and it allowed me to grow on so many dimensions of software development, one
of these being DevOps.</p>
<p>When I initially joined Zalando I had previously been focused only on software development and I was eager to understand
how my software should be deployed and operated.</p>
<p>As part of the autonomy mindset, each team is given an AWS account where they can deploy their services. There is common
infrastructure based on <a href="https://stups.io/">STUPS</a> (fully open source by the way) that provides a common way to handle
logging, monitoring and deployment concerns. Today we are actively moving to a <a href="https://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/public-presentations.html">Kubernetes based
setup</a> and a fully integrated
continuous delivery platform.</p>
<p>There are three main topics that I faced while doing DevOps: Monitoring or Visibility, Reliability, and Software
Delivery. Let’s focus on each one individually and how learning about it improved the solutions I bring to production.</p>
<h2>Monitoring / Visibility</h2>
<p>For a period of time, we did not know how our application was behaving. We had no visibility into whether our users
were seeing errors, nor into the latencies of our backends for frontend.</p>
<p>This problem became apparent when there were some errors in one endpoint and we only learned about it when notified by
the end users. This was a personal wake-up call to better understand how the applications the team owns should be
operated.</p>
<p>We started by measuring the four golden signals:</p>
<ul>
<li>
<p><strong>Latency</strong> - We gathered the latency perceived by our application on the various endpoints and from the load
balancer’s perspective. Differences between these two signals can for example showcase long Garbage Collector pauses
that may not be visible in internal application metrics.</p>
</li>
<li>
<p><strong>Request rate</strong> - Abnormal variations should be investigated, especially during a deployment. One can also learn
about the saturation point in terms of requests by monitoring this signal during load tests.</p>
</li>
<li>
<p><strong>Saturation</strong> - We included in this CPU and memory consumption, TCP connection stats like new connections, total
connections and the ones in TIME_WAIT and CLOSE_WAIT states.</p>
</li>
<li>
<p><strong>Error and Success rate</strong> - Like the latency, we measure this inside the application on the various endpoints and on
the load balancer level. Inconsistencies between these two could be explained by misconfiguration of timeouts on the LB
level or other abnormal scenarios like the application refusing new connections.</p>
</li>
</ul>
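<p>As a rough illustration, the last two quantities we alert on can be derived from raw request samples. This is a toy sketch with assumed helper names, not the monitoring stack we actually used (which reads these signals from load balancer and application metrics):</p>

```java
import java.util.Arrays;

// Toy computation of error rate and a latency percentile from raw samples.
// Real setups read these from load balancer and application metrics instead.
public final class GoldenSignals {
    /** Fraction of requests that returned a 5xx status code. */
    static double errorRate(int[] statusCodes) {
        long errors = Arrays.stream(statusCodes).filter(s -> s >= 500).count();
        return (double) errors / statusCodes.length;
    }

    /** Nearest-rank percentile of request latencies (milliseconds). */
    static long latencyPercentile(long[] latenciesMs, double percentile) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        int[] codes = {200, 200, 503, 200, 500, 200, 200, 200, 200, 200};
        long[] lat = {12, 15, 11, 14, 250, 13, 16, 12, 18, 14};
        System.out.println(errorRate(codes));           // 0.2
        System.out.println(latencyPercentile(lat, 99)); // 250: one slow outlier
    }
}
```

<p>Note how the 99th percentile surfaces the single slow request that an average would hide, which is exactly why percentiles are the usual choice for latency alerting.</p>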
<p>We chose not to alert on saturation signals, and only use latency and error rate, since these are the metrics
that affect the end-user experience. If there is no impact on latency and error rate, having the CPU at 99% is
completely acceptable and actually a sign of good design, since it means the application requires very little
slack.</p>
<p>These monitoring capabilities provide us with an understanding of how our system is behaving in real time, information
about application usage patterns, and helps us to foresee possible problems/issues. Now when we are developing a new
service, we do not go to production without having good monitoring in place beforehand.</p>
<h2>Reliability</h2>
<p>Once the monitoring was improved, we saw a lot of inefficiencies that were introduced by our backend for the frontend.
We expected our latencies to match closely the backend metrics but this was not the case. Upon further investigation, it
was discovered that our authentication strategy was introducing significant unnecessary latency.</p>
<p>We also looked at where the stateful components of the system were being stored and added a Redis deployment to hold the
session data. Previously with every deployment our users would need to log in again which meant that releases had to be
aligned with them.</p>
<p>Our work on reliability highlighted that we had not properly considered how different components interacted when
designing the system. Now, thinking about which components can fail and how their failure can affect the system is a
common exercise when building new services or even refactoring current ones.</p>
<h2>Software Delivery</h2>
<p>The last topic we focused on was improving the way we deliver the software the team develops. Initially, releases based
on Docker images were manual and done from developer machines, which was not even compliant with our internal
policies. The first attempt at improving the situation focused on producing these Docker images with a Jenkins
job, which improved the compliance status. The second iteration moved the team to a continuous delivery workflow using
Kubernetes and an internal Continuous Delivery Platform. In order to enable this without reducing the quality of
delivery, we introduced end to end testing (you can read about it
<a href="https://engineering.zalando.com/posts/2019/02/end-to-end-microservices.html">here</a>). These tests run on the staging and production
deployments before the traffic switch to the new deployment. If the tests fail, the deployment is aborted and we are
notified via instant messaging. I am happy with the current state of our delivery process but I continue to learn and
try to find improvements.</p>
<p>Having a continuous delivery workflow reduced the operational needs of the team and allowed us to deliver faster to our
stakeholders.</p>
<p>With our migration to Kubernetes we have also improved our application architecture. We have simplified it to just one
service in the frontend and we moved all the stateful components to a Redis datastore. In the image below you can see
the architecture before and after the Kubernetes migration.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2cdba264cd05222e2dbfa39b8befd438e1e97f62_fraud-cockpit-architecture---before-migration-to-kubernetes.png?auto=compress,format"></p>
<p>System architecture before migration to Kubernetes</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/71ea4615a1301b804581b064df9f63acdbbec99a_fraud-cockpit-architecture---after-migration-to-kubernetes.png?auto=compress,format"></p>
<p>System architecture after migration to Kubernetes</p>
<p><em>As a software engineer getting involved with DevOps helped me to better understand how our applications are delivered
to our customers and empowered me with crucial knowledge to investigate and fix issues autonomously. From my working
experience, gaining DevOps knowledge as a software engineer has greatly improved my ability to have an impact.</em></p>Open Source: March Updates - A new Kubernetes operator & more Cloud Native Apps2019-04-25T00:00:00+02:002019-04-25T00:00:00+02:00Hong Phuc Dangtag:engineering.zalando.com,2019-04-25:/posts/2019/04/oss-march-updates.html<p>This is a recap of open source activities and development at Zalando in the month of March.</p><h2>Project Highlights</h2>
<p>A new operator is added to Zalando’s list of Cloud Native Applications. <a href="https://github.com/zalando-incubator/es-operator">Elasticsearch Operator</a> - an operator for running Elasticsearch in Kubernetes with focus on operational aspects, like safe draining and offering auto-scaling capabilities for Elasticsearch data nodes, rather than just abstracting manifest definitions.</p>
<p>To make things even simpler for developers, we also released a new framework that helps to build Kubernetes operators in Python. <a href="https://github.com/zalando-incubator/kopf">Kopf - Kubernetes Operator Pythonic Framework</a> - a framework and a library to make Kubernetes operator development easier, in just a few lines of Python code. The main goal is to bring Domain-Driven Design to the infrastructure level, with Kubernetes being an orchestrator/database of the domain objects (custom resources), and the operators containing the domain logic (with no or minimal infrastructure logic).</p>
<p><strong>Dedicated Open Source Time In The Zalando Cloud Infrastructure Team</strong> The engineering team led by <a href="https://twitter.com/jannis_r">Jannis Rake-Revelant</a>, which is responsible for some of our most popular open source projects, has, since the beginning of the year, dedicated 20% of its time to ensuring its open source projects are actively maintained and improved. As a company, we believe it is important to take long-term responsibility and show commitment to the open source community from which we benefit every day.</p>
<h2>Zalando Around The World</h2>
<p>Meet and connect with Zalando representatives at tech events around the world:</p>
<p><a href="https://www.gaia.fish">GAIA Conference</a>, Göteborg, Apr 9: <a href="https://twitter.com/mikiobraun">Mikio Braun</a> - our AI expert - gave a keynote on Putting Data Science into Production. Check out his talk below:</p>
<iframe width="600" height="360" src="https://www.youtube.com/embed/jePTtEFBgLI" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<hr>
<p><a href="https://conferences.oreilly.com/strata/strata-eu">Strata Conference</a>, London, Apr 29 - May 2: Dirk Petzoldt - Head of Engineering will share how Zalando handles big data in our online marketing platform. <a href="https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/74071">More details</a></p>
<hr>
<p><a href="https://codingserbia.com">Coding Serbia Conference</a>, Novi Sad, May 15 - 17: <a href="https://de.linkedin.com/in/lmineiro">Luis Mineiro</a> - Senior Site Reliability Engineer, will explain how we set up monitoring and alerting at Zalando and go over the basic concepts of Distributed Tracing and OpenTracing.</p>
<p><img alt="luis" src="https://engineering.zalando.com/posts/2019/04/codingserbia.jpeg"></p>
<hr>
<p><a href="https://githubsatellite.com/">GitHub Satellite</a>, Berlin, May 22 - 23: <a href="https://www.linkedin.com/in/per-ploug-krogslund/">Per Ploug</a> - Open Source Manager, will talk about Open Source and security and try to answer the question: who is actually responsible for the security of open source dependencies?</p>
<h1>More reading</h1>
<ul>
<li>
<p><a href="https://srcco.de/posts/accelerate-software-delivery-performance.html">How we measure delivery performance at Zalando</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/docs">Zalando Open Source Documentation</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/tech-radar/">The Tech Radar: Zalando selection of technology choices</a></p>
</li>
</ul>How to set an ideal thread pool size2019-04-18T00:00:00+02:002019-04-18T00:00:00+02:00Anton Ilinchiktag:engineering.zalando.com,2019-04-18:/posts/2019/04/how-to-set-an-ideal-thread-pool-size.html<p>How to get the most out of java thread pool</p><p>We all know that thread creation in Java is not free. The actual overhead varies across platforms, but thread creation
takes time, introducing latency into request processing, and requires some processing activity by the JVM and OS. This
is where the Thread Pool comes to the rescue.</p>
<p>The thread pool reuses previously created threads to execute current tasks and offers a solution to the problem of
thread cycle overhead and resource thrashing.</p>
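<p>As a minimal illustration, a pool with a fixed number of workers and a bounded queue keeps thread creation and memory usage predictable. The pool and queue sizes below are arbitrary example values, not recommendations:</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal bounded thread pool: a fixed number of workers reuse threads
// instead of creating one thread per task. Pool and queue sizes are
// arbitrary example values, not recommendations.
public final class BoundedPoolDemo {
    static int runTasks(int nTasks) {
        ExecutorService pool = new ThreadPoolExecutor(
                4, 4,                           // core and max pool size
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(64)); // bounded work queue
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.execute(done::incrementAndGet); // task reuses a pooled thread
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(50)); // 50: all tasks completed on 4 threads
    }
}
```

<p>Here 50 tasks are executed by just four long-lived threads; with one thread per task we would have paid the creation cost 50 times.</p>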
<p>In this post, I want to talk about how to set an optimal thread pool size. A well-tuned thread pool can get the most out
of your system and help you survive peak loads. On the other hand, even with a thread pool in place, thread handling
could be a bottleneck.</p>
<h2>Why should I set a limit for my thread pool?</h2>
<p>There is a lovely pre-configured thread pool, Executors.newCachedThreadPool. Why don't we just use it?</p>
<p>Let's look at how it works:</p>
<div class="highlight"><pre><span></span><code><span class="o">/**</span><span class="w"> </span><span class="n">Thread</span><span class="w"> </span><span class="n">Pool</span><span class="w"> </span><span class="n">constructor</span><span class="w"> </span><span class="o">*/</span>
<span class="n">public</span><span class="w"> </span><span class="n">ThreadPoolExecutor</span><span class="p">(</span><span class="nb nb-Type">int</span><span class="w"> </span><span class="n">corePoolSize</span><span class="p">,</span>
<span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="n">maximumPoolSize</span><span class="p">,</span>
<span class="w"> </span><span class="n">long</span><span class="w"> </span><span class="n">keepAliveTime</span><span class="p">,</span>
<span class="w"> </span><span class="n">TimeUnit</span><span class="w"> </span><span class="n">unit</span><span class="p">,</span>
<span class="w"> </span><span class="n">BlockingQueue</span><span class="w"> </span><span class="n">workQueue</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="o">...</span><span class="p">}</span>
<span class="o">/**</span><span class="w"> </span><span class="n">Cached</span><span class="w"> </span><span class="n">Thread</span><span class="w"> </span><span class="n">Pool</span><span class="w"> </span><span class="o">*/</span>
<span class="n">public</span><span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="n">ExecutorService</span><span class="w"> </span><span class="n">newCachedThreadPool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">ThreadPoolExecutor</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Integer</span><span class="o">.</span><span class="n">MAX_VALUE</span><span class="p">,</span>
<span class="w"> </span><span class="mi">60</span><span class="n">L</span><span class="p">,</span><span class="w"> </span><span class="n">TimeUnit</span><span class="o">.</span><span class="n">SECONDS</span><span class="p">,</span>
<span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">SynchronousQueue</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div>
<p>Do you see this SynchronousQueue? It means that each new task will create a new thread if all existing threads are busy.
In the case of high load, at best we will get a thread "starvation" situation, at worst OutOfMemoryError.</p>
<p>It is better to maintain control and not allow clients to "DDoS/throttle" our service.</p>
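<p>As a sketch of the bounded alternative this section argues for, the snippet below builds a fixed-size pool with a bounded queue. The specific sizes and the choice of CallerRunsPolicy are illustrative assumptions, not a recommendation from the original text:</p>

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // A bounded alternative to newCachedThreadPool: at most 10 threads and
    // 100 queued tasks. When both are full, CallerRunsPolicy executes the
    // task on the submitting thread, applying natural back-pressure instead
    // of growing without limit or discarding work.
    static ExecutorService newBoundedPool() {
        return new ThreadPoolExecutor(
                10, 10,                          // core == max: a fixed-size pool
                0L, TimeUnit.SECONDS,            // keep-alive is irrelevant when core == max
                new ArrayBlockingQueue<>(100),   // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = newBoundedPool();
        pool.submit(() -> System.out.println("task executed"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

<p>Under overload this pool slows callers down rather than creating an unbounded number of threads, which is exactly the failure mode of the cached pool described above.</p>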
<h2>Know your limits</h2>
<p>Before you start sizing a thread pool, you have to understand what you are limited by. And I don't only mean hardware.</p>
<p>For example, if a worker thread depends on a database, the thread pool is limited by the database's connection pool size.
Does it make any sense to have 1,000 running threads in front of a database connection pool with 100 connections?</p>
<p>Or if a worker thread calls an external service which can handle only a few requests simultaneously, the thread pool is
limited by the throughput of this service as well.</p>
<p>It is obvious but we often forget it.</p>
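<p>The idea above can be expressed in a few lines: the effective pool size is the tightest of all limits, not just the CPU-derived number. The helper and the numbers below are hypothetical, for illustration only:</p>

```java
public class EffectivePoolSize {
    // Cap a CPU-derived pool size by every downstream limit
    // (e.g. a JDBC connection pool, an external service's capacity).
    static int effectiveSize(int cpuBasedSize, int... downstreamLimits) {
        int size = cpuBasedSize;
        for (int limit : downstreamLimits) {
            size = Math.min(size, limit);
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical: a formula may suggest 1000 threads, but a
        // 100-connection JDBC pool caps what is actually useful.
        System.out.println(effectiveSize(1000, 100, 150)); // 100
    }
}
```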
<p>Of course, one of the most important resources for thread pool is CPU. We can get the total number of CPUs that we have
as follows:</p>
<div class="highlight"><pre><span></span><code>int numOfCores = Runtime.getRuntime().availableProcessors();
</code></pre></div>
<p>This was the classic way to get the number of CPUs for many years. But be careful with this call if you run your service in a
container environment: without specifying any constraints, a containerized process will see all the hardware of the
host OS.</p>
<p>Here are some nice articles on this topic: <a href="https://mjg123.github.io/2018/01/10/Java-in-containers-jdk10.html">Better Containerized JVMs in
JDK10</a> and <a href="https://jaxenter.com/nobody-puts-java-container-139373.html">Nobody puts Java in a container</a>.</p>
<p>Other constraints like memory, file handles, socket handles, could be critical as well.</p>
<h2>Just give me the formula!</h2>
<p>Brian Goetz in his famous book "Java Concurrency in Practice" recommends the following formula:</p>
<div class="highlight"><pre><span></span><code> Number of threads = Number of Available Cores * (1 + Wait time / Service time)
</code></pre></div>
<p><strong>Wait time</strong> is the time spent waiting for IO-bound tasks to complete, say waiting for an HTTP response from a remote
service. It is not limited to IO: it also covers time spent waiting to acquire a monitor lock, or time the thread spends
in the WAITING/TIMED_WAITING state.</p>
<p><strong>Service time</strong> is the time spent being busy, say processing the HTTP response, marshaling/unmarshaling, or any other
transformations.</p>
<p>Wait time / Service time - this ratio is often called the <em>blocking coefficient</em>.</p>
<p>A computation-intensive task has a blocking coefficient close to 0, in this case, the number of threads is equal to the
number of available cores. If all tasks are computation intensive, then this is all we need. Having more threads will
not help.</p>
<p><em>For example:</em></p>
<p>A worker thread makes a call to a microservice, serializes the response into JSON, and executes some set of rules. The
microservice response time is 50ms, processing time is 5ms. We deploy our application to a server with a dual-core
CPU:</p>
<div class="highlight"><pre><span></span><code> 2 * (1 + 50 / 5) = 22 // optimal thread pool size
</code></pre></div>
<p>But this example is oversimplified. Besides an HTTP connection pool, your application may also serve requests from JMS
and probably use a JDBC connection pool.</p>
<p>If you have different classes of tasks it is best practice to use multiple thread pools, so each can be tuned according
to its workload.</p>
<p>In the case of multiple thread pools, just add a target CPU utilization parameter to the formula.</p>
<p>Target CPU utilization is in the range [0..1], where 1 means the thread pool will keep the processors fully utilized.</p>
<p>The formula becomes:</p>
<div class="highlight"><pre><span></span><code> Number of threads = Number of Available Cores <span class="gs">* Target CPU utilization *</span> (1 + Wait time / Service time)
</code></pre></div>
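<p>Both variants of the formula can be sketched as a small helper. The method name and parameters are mine, for illustration; the numbers reproduce the dual-core example above:</p>

```java
public class PoolSizeCalculator {
    // Brian Goetz's sizing formula:
    // threads = cores * targetUtilization * (1 + waitTime / serviceTime)
    static int optimalThreads(int cores, double targetUtilization,
                              double waitTimeMs, double serviceTimeMs) {
        return (int) (cores * targetUtilization * (1 + waitTimeMs / serviceTimeMs));
    }

    public static void main(String[] args) {
        // The example from the text: dual-core CPU, 50ms wait, 5ms service.
        System.out.println(optimalThreads(2, 1.0, 50, 5));   // 22
        // Same workload, but leaving 25% CPU headroom for another pool.
        System.out.println(optimalThreads(2, 0.75, 50, 5));  // 16
    }
}
```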
<h2>Little's law</h2>
<p>At this point we can calculate an optimal thread pool size, we know our theoretical upper bounds and we have some metrics in
place. But how does the number of parallel workers change the latency or throughput?</p>
<p><a href="https://en.wikipedia.org/wiki/Little%27s_law">Little's law</a> can be used to answer this question. The law says that the
number of requests in a system equals the rate at which they arrive, multiplied by the average amount of time it takes
to service an individual request. We can use this formula to calculate how many parallel workers there should be to
handle a predefined throughput at a particular latency level.</p>
<div class="highlight"><pre><span></span><code>L = λ * W
L - the number of requests processed simultaneously
λ – long-term average arrival rate (RPS)
W – the average time to handle the request (latency)
</code></pre></div>
<p>Using this formula, we can calculate the system capacity, or how many instances running in parallel we need in order to
handle the required number of requests per second with a stable response time.</p>
<p>Let's get back to our example. We have a service with an average response time of 55ms (50ms wait time + 5ms service time)
and a thread pool of 22 worker threads.</p>
<p>Applying Little's law formula we get:</p>
<div class="highlight"><pre><span></span><code><span class="mf">22</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mf">0.055</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">400</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">requests</span><span class="w"> </span><span class="n">per</span><span class="w"> </span><span class="n">second</span><span class="w"> </span><span class="n">our</span><span class="w"> </span><span class="n">service</span><span class="w"> </span><span class="n">can</span><span class="w"> </span><span class="n">handle</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">stable</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="n">time</span>
</code></pre></div>
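<p>The rearranged law can be sketched in code as well. The method name is mine; the numbers are the ones from the example above:</p>

```java
public class LittlesLaw {
    // Little's law: L = λ * W. Rearranged, λ = L / W gives the throughput
    // a fixed number of parallel workers can sustain at a given latency.
    static double maxThroughputRps(int workers, double latencySeconds) {
        return workers / latencySeconds;
    }

    public static void main(String[] args) {
        // The example from the text: 22 worker threads, 55ms average latency.
        System.out.println(maxThroughputRps(22, 0.055)); // ≈ 400 requests per second
    }
}
```

<p>Read the other way round, if you need to sustain 400 RPS at 55ms latency, you need roughly 400 * 0.055 = 22 parallel workers.</p>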
<h2>Conclusion</h2>
<p>These formulas are not a silver bullet and cannot magically fit any project, but they can be a great starting point
for yours. The disadvantage of the formulas is that they focus on the average number of requests in the system
and might not suit bursty traffic patterns. You can start with the values calculated by these formulas and
then adjust your thread pool properties after load testing.</p>
<p>And one more time - “measure don’t guess”!</p>End-to-end load testing Zalando’s production website2019-04-11T00:00:00+02:002019-04-11T00:00:00+02:00Brian Mahertag:engineering.zalando.com,2019-04-11:/posts/2019/04/end-to-end-load-testing-zalandos-production-website.html<p>How we made sure we stayed online for Black Friday 2018</p><p>Black Friday is the busiest day of the year for us, with over <a href="https://corporate.zalando.com/en/newsroom/en/stories/infographic-zalandos-black-friday-2018-results">4,200 orders per
minute</a> during the
event in 2018. We need to make sure we’re technically able to handle the huge influx of customers.</p>
<p>As a part of our preparations we ask all of our teams to perform load tests to ensure their individual components will
handle the expected load. In addition, and due to the distributed nature of our <a href="https://engineering.zalando.com/posts/2018/12/front-end-micro-services.html">system's
architecture</a>, we also need to ensure it will handle the
expected load once all components have to work together. To ensure this, we simulate real user behaviour using different
<em>scenarios</em> that contain the most common user actions (e.g. visiting the homepage, browsing the catalogue, adding an
item to cart, checking out) on a large scale on the production system.</p>
<p>In preparation for Black Friday 2018 our Testing & Quality Strategy team, in cooperation with our SRE (Site Reliability
Engineering) team, took on the challenge of providing the tooling required to perform these simulations.</p>
<p><strong>A new set of tools</strong></p>
<p>Our starting point was to look at what was done to prepare for Black Friday 2017. We reviewed a tool that had been
created internally to perform end-to-end load testing. It used scenarios written in JavaScript and ran using a
distributed set of <a href="https://pptr.dev/">Puppeteer</a> nodes, each of them interacting with an instance of a Chrome browser.
Unfortunately, due to the heavy usage of resources by the browser instances at such a large scale, it was prohibitively
expensive to run and so couldn’t be used again.</p>
<p>We went back to the drawing board and, along with feedback gathered from stakeholders that were involved in the previous
year’s efforts, started to design a new solution.</p>
<p>We first looked at existing load testing tools such as <a href="https://jmeter.apache.org/">JMeter</a>,
<a href="https://locust.io/">Locust</a>, and <a href="https://github.com/tsenart/vegeta">Vegeta</a>; all of which we had previous experience
with. We quickly realised that, whilst they all individually had their merits, none of them alone completely solved the
problem.</p>
<p>We needed a way of recording scenarios representing a user interacting with our website in order to simulate traffic
from real users. What's more, we needed a method of translating the scenarios into load test scripts that could be
replayed in a lightweight manner and reused. Finally, we needed a mechanism for cost-effective scaling of the load.</p>
<p>After a few design rounds we came up with the following multi-tool solution:</p>
<p><strong>Locust</strong>
From the learnings of the previous year, we knew that creating our own load test runner from scratch would not be
feasible, nor desirable, in the time we had. Therefore, we decided the core of our solution would be one of the already
existing load testing tools that we had previously investigated. We settled on using Locust due to its <a href="https://docs.locust.io/en/stable/running-locust-distributed.html">in-built ability
to run in a distributed mode</a> and its support for
scripting (it uses Python files as inputs).</p>
<p><strong>HAR files</strong>
In order to easily record the scenarios, we realised we could again reuse existing technologies: a web browser’s session
can be easily exported by modern browsers as <a href="https://en.wikipedia.org/wiki/.har">HAR (HTTP Archive) files</a>. This,
however, presented us with a new challenge: how do we convert these HAR files into something Locust can run?</p>
<p><strong>Transformer</strong>
We built <a href="https://github.com/zalando-incubator/Transformer/">Transformer</a> to convert the scenarios recorded as HAR files
into Locust’s input format, a Python file (the "locustfile").</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/272697ee7923d8c67cbda1b3bde0c3fa2dd02081_har_to_locust.png?auto=compress,format"></p>
<p>Transformer considers each HAR file as a single scenario. It takes every HTTP request recorded there and expresses it
in Locust's words. The result is a locustfile that exactly replays these requests. Transformer can combine multiple HAR
files (i.e. multiple scenarios) into a single locustfile, allowing many scenarios to be replayed in the same load test, each
with its own customizable amount of load (more users visit the catalog than the Help page). And because there are always
exceptions, a plugin mechanism allows each request to be arbitrarily modified and enriched by injecting pre- and
post-processing code in the locustfile. This allowed us to, amongst other things, replay dynamic requests requiring
temporary, JavaScript-generated tokens without actually executing any JavaScript.</p>
<p><strong>Zelt</strong>
The final piece of the puzzle, cost-effectively generating the required load at large scale, was solved by <a href="https://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/public-presentations.html">our in-house
Kubernetes infrastructure</a>. We
built Zelt to orchestrate the execution of Transformer, the distribution of the generated locustfile, and the deployment
of the Locust controller and worker nodes into one of our Kubernetes clusters. It allowed us to easily provision,
scale-up/down, and execute our load tests.</p>
<p><strong>One more for the road</strong>
Another tool, a Node.js library called PuppetHAR, was created to allow us to programmatically generate HAR files from
Puppeteer scripts rather than manually in the browser; ultimately this was never used.</p>
<p><strong>In practice</strong>
We built these tools in close collaboration with our SRE team. They provided us with the scenarios, crafted using data
from our analytics teams to represent real user journeys through the Zalando website. They also provided us with the
inputs to the equations required to translate our internal target metrics, in requests per second (RPS), to Locust’s
input format of number of concurrent users.</p>
<p>To run the load tests, virtual <a href="https://corporate.zalando.com/en/newsroom/en/stories/situation-room-mastering-cyber-week">situation
rooms</a> were created including
us, SRE, and members of the component teams. Using the previously created locustfile, we used Zelt to deploy the load
testing infrastructure in Kubernetes, and used the Locust dashboard to initiate and control the tests.</p>
<p>As the tests were running, the teams that owned the various components receiving the load were monitoring their production
components using our in-house monitoring tools and would let us know if and how things were showing signs of strain
under load. We used the same monitoring tools to observe our progress towards reaching our load targets and concluded
the tests once they had been reached and sustained for a period of time (or if a component team requested us to stop
because of a bottleneck found).</p>
<p>In our final configuration, we ended up running four Locust stacks consisting of 300 nodes each, and reached a total of
130,000 RPS observed.</p>
<p><strong>Learnings</strong>
Overall, the project was a success. We were able to execute end-to-end load tests against the production website on a
scale larger than the actual traffic received during the peak of the Cyber Week campaign. Thanks to this, the teams were
able to act upon the information gathered, discover their optimal scaling configuration, and fix the bottlenecks that
were discovered all before Black Friday.</p>
<p>Throughout the process, however, we faced some challenges that we needed to overcome.</p>
<p><strong>Reverse engineering</strong>
With all record and playback methods, there is no guarantee that what you record will be replayable without error as
states tend to change over time.</p>
<p>Our tooling was no different and we faced this issue frequently. Session identifiers would expire, articles would go out
of stock, rate limiting would kick-in, and security measures would catch us out.</p>
<p>For each instance we had to essentially reverse-engineer our own website and work out which piece was tripping us up and
how to work around it. Not only was this a technical challenge but also one of communication and coordination as we
needed to find the teams responsible for the components we were fouling and work with them to find solutions.</p>
<p>Often we could only verify our solutions during a load test, as the symptoms would only appear in high-load scenarios;
this was obviously costly and slow. In order to try to alleviate this, we started working even closer with the component
teams, bringing them to sit with us and pair on developing solutions whilst they monitored their systems for us.</p>
<p><strong>Locust</strong>
We were happy with Locust initially, but as our solution grew more specific and the scale of the load increased, the
disadvantages of the tool started to show up.</p>
<p>Two of the Locust features that we relied on the most were the distributed mode of the test runners and the weight
system for the scenarios. As we learned the hard way, unfortunately the two features combined <a href="https://github.com/locustio/locust/issues/724">don’t work as expected on
a large scale</a>. We soon started to realize that the health of the Locust
project is far from what we hoped - some very old issues were not fixed, new issues were not addressed and the
maintainers were not responsive. By this time it was already too late to change the tool. Eventually we forked the
project and made the necessary changes to immediately address the most painful issue.</p>
<p><strong>Next steps</strong>
At the time of writing, we’ve already open-sourced <a href="https://github.com/zalando-incubator/Transformer">Transformer</a> and
<a href="https://github.com/zalando-incubator/zelt">Zelt</a>, and plan to open-source PuppetHAR in the future; so keep an eye on
our <a href="https://github.com/zalando-incubator/">Zalando Incubator homepage</a>!</p>
<p>Internally, we’re already preparing for Black Friday 2019 and continue to improve our tools and processes for ensuring a
smooth customer experience during any and all high-load situations.</p>Developing Zalando APIs2019-04-04T00:00:00+02:002019-04-04T00:00:00+02:00Maxim Tschumaktag:engineering.zalando.com,2019-04-04:/posts/2019/04/developing-zalando-apis.html<p>How Zalando software engineers develop internal and external APIs</p><h3><strong>How Zalando software engineers develop internal and external APIs</strong></h3>
<p>Imagine a distributed system consisting of 8,000+ active service applications; developed and operated by 300+ delivery
teams in <a href="https://jobs.zalando.com/tech/locations/">six tech hubs</a>. 1,200+ software engineers use <a href="https://opensource.zalando.com/tech-radar/">various
technologies</a> to implement business needs and are responsible end-to-end for
those components.</p>
<p>A pretty complex system of people and software. And a real challenge to manage the complexity and balance fast delivery
against technical debt.</p>
<p>We believe that interfaces are highly valuable technical assets. That’s why we settled early on standards for API
engineering, including a common API specification language for RESTful service-to-service communication. In our case, it
is the <a href="https://www.openapis.org/">OpenAPI</a> standard for synchronous REST interface specification and <a href="http://json-schema.org/">JSON
Schema</a> for asynchronous events.</p>
<h3>API-as-a-Product and API First Principle</h3>
<p>Zalando is customer-obsessed. As software engineers at Zalando, we <a href="https://opensource.zalando.com/restful-api-guidelines/#api-as-a-product">treat our APIs as
products</a>, always putting ourselves “in the
customer’s shoes.” The best way to provide value is to create a well-designed, explicitly defined, discoverable,
reusable, easy-to-understand interface which implements the demanded functionality.</p>
<p>We believe in the <a href="https://opensource.zalando.com/restful-api-guidelines/#api-first">API First principle</a> and always
follow it. It allows us better alignment between a service provider and consumers (i.e. contract) and contributes
greatly to the API and overall system design quality.</p>
<p>And here is how we typically develop an API:</p>
<h3>API Design</h3>
<p>Often it starts with a business requirement or an idea for a new product. As a software engineer, I make myself familiar
with the domain and the requirements. Already, I think about who the potential consumers of my new API are, how they
interact with the interface and what are the main building parts (business processes, resources) of the domain and the
API.</p>
<p>The next step is to draft an API outside the code first. We adopted RESTful API web service principles with JSON as main
payload format, and use <a href="https://www.openapis.org/">OpenAPI</a> Specification language (a.k.a. Swagger Spec) as format for
our API descriptions.</p>
<p>API design is a crucial aspect of the API quality. In order to have the same look-and-feel experience for the API
consumers and to raise the quality bar of APIs, our engineers and architects condensed their knowledge and experience in
<a href="https://opensource.zalando.com/restful-api-guidelines/">Zalando API Guidelines</a>. I consult them often for design
principles and best practices when drafting a new API.</p>
<p>Zalando’s API Portal provides a central repository where API specifications of all deployed services can be discovered.
I regularly check related APIs to learn from API design practices and to align my application API with other service
APIs of our ecosystem.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/744bc650cffbdf996f309d276f1830f6bd26f123_api-portal.png?auto=compress,format"></p>
<p>The API Portal is the central hub for all API-related information. I can use a comprehensive search to find APIs with
their deployment and version information. Basically, I get all I need here to be able to consume the API: contact and
deployment information, service location, authorization and authentication requirements, and the most important part:
the OpenAPI specification of the interface. This is a great source for examples and inspirations providing even the
history of APIs.</p>
<p>With all information in place, I can draft a very first version of the API specification, using the editor of my choice,
be it Zalando’s IntelliJ IDEA “ <a href="https://plugins.jetbrains.com/plugin/8347-swagger-plugin">Swagger Plugin</a>”, <a href="https://swagger.io/tools/swagger-editor/">Swagger
Editor</a>, or vi.</p>
<p>All API specifications have to be compliant to our <a href="https://opensource.zalando.com/restful-api-guidelines/">API
Guidelines</a>. This ensures the same quality and look-and-feel
experience across all Zalando APIs. The <a href="https://engineering.zalando.com/posts/2015/09/on-apis-and-the-zalando-api-guild.html">API
Guild</a> is the owner of the
guidelines, but everyone is encouraged to contribute.</p>
<p>Becoming and staying compliant creates some efforts. Fortunately, some of the guideline rules can be automatically
checked by <a href="https://github.com/zalando/zally">Zally - our API Linter</a>. Zally is a set of open source tools to automate
compliance and quality assurance of RESTful APIs. It’s able to check lower-level aspects like the format, naming, as
well as higher-level interface specification details like error handling and security.</p>
<p>Now it is time to get some real feedback.</p>
<h3>Early Review and Feedback</h3>
<p>After a team-internal discussion and prototyping work, I ask our peers, the API consumers and other stakeholders, for
feedback. They should get the best experience and be able to easily integrate it into their components. Typically, I
create a GitHub (Enterprise) pull request, a great tool for collaborative reviews, on the API specification file. If the
review is a bigger one (new prominent, external, highly used API, or a bigger change) I additionally invite a special
group of API enthusiasts, the API Guild, and involved architects. They provide feedback on API guidelines compliance and
best practices, and inspire me to improve and harden my API design.</p>
<h3>API Implementation</h3>
<p>After the API design is aligned, implementation of the service is the easiest and the most fun part. We have a polyglot
microservice application environment. Based on our <a href="https://opensource.zalando.com/tech-radar/">Zalando Tech Radar</a>
principles, our teams have high autonomy to pick the best technologies to implement their services. Hence, there are
lots of ways to realize the API. Depending on the implementation use-case, I would pick, for instance, Spring for Java
or Kotlin, Akka HTTP for Scala application, or would Go for Resty. If I decide to use Python this time, our open sourced
<a href="https://github.com/zalando/connexion">Connexion</a> framework will implement a big part for me. It handles HTTP requests
as specified in API specification and maps endpoints to Python functions. Many teams manually implement the API
definitions. Sometimes, generators are also used to create, for example, Java or Scala client and server stubs out of
the API specification.</p>
<h3>Publishing and Operation</h3>
<p>In order to promote my newly implemented service, I’m going to publish its API. This is done via deployment artifact, in
our case a Docker image. All I need to do is to include the API specification into the image. That’s it. After a
deployment to our Kubernetes production infrastructure, the API and all context information appears in API Portal. From
now on the API’s history is tracked and it can be discovered by everyone at Zalando.</p>
<p>From the first deployment on, I’m interested in the performance of my API. With some lines of (Kubernetes deployment)
configuration, I can activate monitoring and get a ( <a href="https://opensource.zalando.com/zmon/">ZMON</a>) monitoring dashboard
“for free.” It is endpoint-based and provides metrics like the number of requests per status code classes, latency
(incl. percentiles), and some basic client load monitoring. Additionally, I can easily configure authorization &
authentication settings and rate limitations for the endpoints via deployment configuration. Especially in times of
many microservices, these infrastructure features are a great relief from the operational perspective.</p>
<h3>Conclusion and Outlook</h3>
<p>Our vision is to build new business capabilities in days, not in weeks, to be highly efficient in engineering and
operation of our SaaS ecosystem at scale, based on consistent high quality APIs that are sustainable and fun to use. We
are now closer to this vision due to our tools and infrastructure features, like API design principles and guidelines,
open API review culture, API portal, and API monitoring.</p>
<p>We are happy that our principles and tools find adoption outside Zalando by other tech companies and API enthusiasts.
Our open source <a href="https://opensource.zalando.com/restful-api-guidelines/">API Guidelines</a> and <a href="https://github.com/zalando/zally">API
Linter</a> gain external contributors and improve every day.</p>
<p>We plan to enrich our API service infrastructure with features like out-of-the-box monitoring,
authentication/authorization, rate limitation. Our API Portal will be a central access hub for relevant API service
operation information (e.g. like hostnames of deployed API services, effective rate limits) and will support backward
compatibility checks and subscriptions for notification on API changes, and much more. We will raise adoption and
developer experience via application-centric integration of all infrastructure services consistently supporting the
developer productivity journey over design, code, build, deploy, and operate phases.</p>
<p><em>If you want to learn more about API engineering at Zalando, please also check out InfoQ interview <a href="https://tinyurl.com/4efc8ss7">How Zalando Delivers
APIs with autonomous teams</a>, and earlier tech blog
post <a href="https://engineering.zalando.com/posts/2015/09/on-apis-and-the-zalando-api-guild.html">On APIs and the Zalando API Guild</a>.</em></p>A Story of Rust2019-03-28T00:00:00+01:002019-03-28T00:00:00+01:00Christian Douventag:engineering.zalando.com,2019-03-28:/posts/2019/03/story-rust.html<p>Introducing Rust in an Enterprise Environment</p><p>Introducing Rust in an Enterprise Environment</p>
<h3>Discovery</h3>
<p>Sometime in 2013, I stumbled across a new programming language called Rust on the internet. Taking a look at the language,
I was impressed by its high-level features. At that time I was a backend Scala developer with a .Net background. When
examining Rust, I found most of the features I used every day, like Pattern Matching, the “New Type Pattern” and a “Scala
like” Iterator API. But there was also something I had really missed elsewhere: no nulls and no exceptions. Since Rust is
also a low-level language without a garbage collector, I was convinced to keep following its progress.</p>
<h3>Early Prototyping</h3>
<p>It was in 2016 when I joined Zalando as a Scala Developer. After half a year we were thinking about introducing a new
application for a simple task. Somehow the question came up on what technology to use and Rust was suddenly mentioned.
We did a prototype quickly, and implementing it was quite easy. It also turned out that implementing a domain model was
very painless, especially regarding serialization due to Rust’s high level abstractions. Unfortunately, we did not need
the application anymore but nevertheless Rust proved to be a valid candidate for solving our problems.</p>
<h3>The Experiment</h3>
<p>A short time later, we had some problems with our main service. It was a Scala web service that resides at a critical
position within Zalando. Under high load, the application consumed great amounts of memory and sometimes even crashed
with the GC running out of memory at almost 100% CPU load. This forced us to massively overscale the application. So we
asked ourselves what would happen if we rewrote the application in Rust. We did just that and it took just a few days to
reimplement the application. Load tests revealed that the Rust application had much better latencies, consumed less
memory and less CPU than the Scala application under the same load and, even better, it could handle more load without
crashing. It is, of course, always easier to rewrite an application than to write one from scratch.</p>
<p>We added some more features and then considered to take it live. This was where we faced the first challenge. Our lead
reminded us that Rust was not yet an “official” technology within Zalando and that taking the application live
would be a serious risk. That was of course correct. Our lead asked us to collect the requirements for safely taking
such an application live.</p>
<p>Afterwards, we approached Zalando’s Technologists Guild and presented our results during a Tech Stand Up. With our
Technologists Guild, we came to the conclusion that Rust should stay with the “Assess” state on our <a href="https://opensource.zalando.com/tech-radar/">Tech
Radar</a> until we gathered more experience. We also collected requirements for
deploying a Rust application but unfortunately things came to a halt since we had to focus on other topics.</p>
<p>What happened was that we started to implement some tooling in Rust.</p>
<h3>Justifying Rust</h3>
<p>It was in the middle of 2017 when we needed to implement a new service. By that time we already had a Rust Study Group
running and the Rust ecosystem evolved further. Since we knew that we couldn’t just start a service, we asked our lead
whether we could do it with Rust. It was a simple streaming application doing some REST calls and writing data to
Redis.</p>
<p>Again, our lead had serious concerns. We would need really good reasons to use Rust over Scala, which was
still our main technology stack. He also doubted whether the tooling was ready for production use, and
the question of how to onboard new team members to such a technology would have to be answered as well. There were of
course more questions, and the stakes were high, but his caution was completely understandable from a lead’s perspective.</p>
<p>In the following weeks, reasons for using Rust were collected. We started to analyze the problems we had with our
current applications and figured out how those problems could be avoided with Rust. Of course there was also the
performance argument but that was definitely not the most important reason. The main reasons were Rust’s safety and
productivity features. But there was one more thing: With Rust we were able to use resources efficiently and there was
already the plan to move to Kubernetes. Being able to have small pods running on Kubernetes could be a real cost
saver.</p>
<p>There was a lot of communication with our lead and we got valuable feedback on the topics where we might need a bit more
reasoning. Well, things were moving slowly and the end of the year was near. At that point in time we had serious doubts
that we would ever use Rust for productive systems.</p>
<h3>When things become real</h3>
<p>It was at the end of 2017 when it was announced that the teams would be restructured due to changing requirements. We
were a team of six developers and would be reduced to four. When this was announced to us there was also another
revelation: Our lead said that from now on we would be a “Rust Team”. That was really unexpected and I have to admit
that I did not really know what to respond to that.</p>
<p>Since we were planning to replace our old system with a new one, we almost immediately started to implement the first
service we needed. It was a rather simple CRUD service, which was a good opportunity to onboard some of the team members
to Rust. The service was ready to be used sooner than expected, even though it was not yet fully finished. Since
we needed more applications to reach our goal, we started to implement the smallest applications in parallel,
gradually increasing the difficulty level for the team up to the final service, which fully utilizes non-blocking IO.</p>
<p>In the end we managed to reach our goal in time, thereby introducing a new technology. Currently we have two REST
services, a streaming application and multiple batching applications written in Rust all running on Kubernetes. The new
applications have been live serving data for two countries over 2018 and are expected to serve even more countries in
the near future. The resource usage of our applications is far below that of our former Scala services and reduces costs
remarkably.</p>
<h3>Conclusion</h3>
<p>With Rust, one can build microservices taking the word “micro” literally. Rust gives the developer an “if it compiles, it
runs” experience, which allows them to focus on business logic. Refactoring and even reengineering can be done quite fearlessly.
The compiler is very helpful and even suggests solutions. A newcomer coming from Scala or C# already knows concepts
like closures and the Iterator API, which makes things a lot easier. And then there is the borrow checker. Given enough
support, newcomers can learn to handle it while still being productive. But one still has to be a bit resistant to pain
when it comes to compile times and the lack of an easy-to-use “corporate” version of crates.io. When starting a project, it
is beneficial to have an experienced Rust developer on board rather than starting completely from scratch. We are still waiting
for futures and async/await to stabilize and for the web ecosystem to become more mature, since it is currently a
challenge to choose an appropriate web framework/toolkit.</p>
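As a small, hypothetical illustration of the closure and Iterator API familiarity mentioned above (the function and data are invented, not taken from our services), iterator chains in Rust read much like their Scala counterparts:

```rust
// Lazily chained iterator adapters, evaluated only by the final consumer (sum).
fn total_discounted_price(prices: &[f64], discount: f64) -> f64 {
    prices
        .iter()
        .filter(|&&p| p > 0.0)          // drop invalid entries
        .map(|&p| p * (1.0 - discount)) // apply the discount via a closure
        .sum()                          // consume the iterator
}

fn main() {
    let prices = [10.0, -1.0, 20.0];
    println!("total: {}", total_discounted_price(&prices, 0.1));
}
```

The adapters allocate nothing and compile down to a simple loop, which is part of why such high-level code stays cheap.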
<p>For us, Rust has so far been a story of success and it is likely that it will stay like this.</p>
<h2>Running Apache Flink on Kubernetes</h2>
<p>2019-03-22, Tobias Bahls</p>
<p><em>What I learned deploying Flink and a stream processing application on Kubernetes</em></p>
<p>Recently, I was developing a small stream processing application using <a href="https://flink.apache.org/">Apache Flink</a>.
Zalando uses Kubernetes as the default deployment target, so naturally I wanted to deploy Flink and the developed job to
our Kubernetes cluster. I learned a lot about Flink and Kubernetes along the way, which I want to share in this
article.</p>
<p><strong>Challenges</strong></p>
<p>Compliance - At Zalando, all code running in production has to be reviewed by at least two people and all deployed
artifacts have to be traceable to a git commit. The <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.7/tutorials/local_setup.html#run-the-example">default
way</a> of
deploying Flink Jobs is to upload a JAR containing the Job with any other required dependencies to a running Flink
cluster. This is not compatible with our internal compliance guidelines.</p>
<p>Container Orchestration Readiness - One of the key selling points of Flink is to do <em>fault tolerant</em> stream processing.
However - as will be outlined in the next section - the reliability features were not designed with container
orchestration systems in mind, which makes operating a Flink cluster on Kubernetes not as straightforward as it could
be.</p>
<p>Fragmented Documentation - Both Flink and Kubernetes are evolving quickly, rendering some documentation (especially blog
posts like this one and forum/newsgroup posts) out of date. Unfortunately, the official documentation currently does not
provide all the information needed to run Flink reliably on Kubernetes.</p>
<p><strong>Flink Architecture &amp; Deployment Patterns</strong>
In order to understand how to deploy Flink on a Kubernetes cluster, a basic understanding of the architecture and
deployment patterns is required. Feel free to skip this section if you are already familiar with Flink.</p>
<p>Flink consists of two components, Job Manager and Task Manager. The Job Manager coordinates the stream processing job,
manages job submission and its lifecycle and allocates work to Task Managers. Task Managers execute the actual stream
processing logic. There should always be exactly one active Job Manager and there can be <em>n</em> Task Managers.</p>
<p>In order to enable resilient, stateful, stream processing, Flink uses <em>Checkpointing</em> to periodically store the state of
the various stream processing operators on durable storage. When recovering from a failure, the stream processing job
can resume from the latest checkpoint. Checkpointing is coordinated by the Job Manager - notably, the Job Manager knows
the location of the latest completed checkpoint, which will become important later on.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e4659175b2b5f3efbfca911300b6e9828368c687_1.png?auto=compress,format"></p>
<p>Flink Clusters can be run in two distinct modes: The first mode, called <em>Standalone</em> or <em>Session Cluster,</em> is a single
cluster that is running multiple stream processing jobs. Task Managers are shared between jobs. The second mode is
called <em>Job Cluster</em> and is dedicated to run a single stream processing job.</p>
<p>A Flink Cluster can be run in <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/jobmanager_high_availability.html">HA
mode</a>. In this mode,
multiple Job Manager instances are running and one is elected as a leader. If the leader fails, leadership is
transferred to one of the other running Job Managers. Flink uses ZooKeeper for handling Leader Election.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/446af88a3e0dbd027722ed9126f61191fda098f0_2.png?auto=compress,format"></p>
<p><strong>Kubernetes Deployment</strong>
Out of the two modes described in the previous section, we chose to run Flink as a Job Cluster. Two reasons drove the
decision: The first reason is that the Docker image for Job Clusters needs to include the JAR with the Flink Job. This
neatly solves the compliance problem as we can re-use the same workflow as we are using for regular JVM applications.
The second advantage is that this deployment model allows scaling Task Managers independently for each Flink Job.</p>
<p>The Job Manager is modeled as a Deployment with one replica, Task Managers as a Deployment with <em>n</em> replicas. The Task
Manager discovers the Job Manager via a Kubernetes Service. This setup deviates from the official documentation that
recommends running the Job Manager of a Job Cluster as a Kubernetes Job. We think that using a Deployment is the more
reliable option in this case (a never-ending streaming job), as the Deployment will make sure that one pod is
always running whereas a Job could complete, leaving the cluster without any Job Manager. This is why our setup
resembles the one describing a session cluster in the
<a href="https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/kubernetes.html#session-cluster-resource-definitions">documentation</a>.</p>
<p>Failures of Job Manager pods are handled by the Deployment Controller which will take care of spawning a new Job
Manager. Since this is usually a relatively fast operation, this frees us from the need to maintain multiple Job
Managers in hot-standby, which would increase the complexity of the deployment. Task Managers address the Job Manager
with a Kubernetes Service.</p>
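The post does not include manifests, but the setup described above might look roughly like the following sketch. All names, labels and the image are placeholders, and the Task Manager Deployment (<em>replicas: n</em>) is analogous:

```yaml
# Hypothetical sketch, not our actual manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1                      # exactly one active Job Manager
  selector:
    matchLabels:
      app: flink-jobmanager
  template:
    metadata:
      labels:
        app: flink-jobmanager
    spec:
      containers:
        - name: jobmanager
          image: registry.example.org/my-flink-job:1.0.0  # job JAR baked into the image
---
# Service through which Task Managers discover and address the Job Manager.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  selector:
    app: flink-jobmanager
  ports:
    - name: rpc
      port: 6123                   # Flink's default jobmanager.rpc.port
```

Using a Deployment rather than a Job here is the key choice: the Deployment controller recreates the single Job Manager pod whenever it dies.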
<p>As outlined above, the Job Manager keeps some state related to checkpointing in its memory. This state would be lost on
Job Manager crashes, which is why it is persisted in ZooKeeper. This means that even though there is no real
need for the leader election and discovery parts of Flink’s HA mode (as these are handled natively by Kubernetes), HA mode
still needs to be enabled just for storing the checkpoint state.</p>
<p>As we already had an etcd cluster and etcd-operator deployed in our Kubernetes cluster, we did not want to introduce
another distributed coordination system. We gave <a href="https://github.com/etcd-io/zetcd">zetcd</a> a try which is a ZooKeeper
API backed by etcdv3. This setup works fine, so we decided to stick with it.</p>
<p>One other issue we faced with this setup was that the Job Manager sometimes got stuck in an unhealthy state that could
only be fixed by restarting the Job Manager. This is handled by a <em>livenessProbe</em> which checks whether the Job Manager is still
healthy and the job is still running.</p>
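The exact probe is not shown in the post. One hypothetical shape, polling Flink's monitoring REST API (default port 8081), could look like this; verifying that the job itself is still running, as described above, would additionally require custom logic such as an exec probe inspecting the REST response:

```yaml
# Sketch only - path, port and thresholds are assumptions, not the actual probe.
livenessProbe:
  httpGet:
    path: /jobs/overview     # Flink's monitoring REST API
    port: 8081
  initialDelaySeconds: 30    # give the Job Manager time to start and submit the job
  periodSeconds: 10
  failureThreshold: 6        # restart after roughly a minute of failed checks
```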
<p>It is also noteworthy that this setup only works correctly with Flink > 1.6.1 as there was <a href="https://issues.apache.org/jira/browse/FLINK-10291">this
bug</a> that prevented resuming from checkpoints in job clusters.</p>
<p><strong>Conclusion</strong>
The above setup has now been running in production for a couple of months and is serving our use case well. This shows
that it is possible to reliably run Flink on Kubernetes, even though there are some small roadblocks along the way.</p>
<p><strong>Going Further</strong></p>
<ul>
<li>
<p><a href="https://youtu.be/w721NI-mtAA">“Flink in Containerland”</a> by Patrick Lucas - main inspiration of the points of this
post</p>
</li>
<li>
<p><a href="https://youtu.be/4B1Dd2qYDGQ">“Redesigning Flink’s Distributed Architecture”</a> by Till Rohrmann</p>
</li>
</ul>
<h2>Open Source: February Updates - Release new projects, join Google Summer of Code Program</h2>
<p>2019-03-17, Hong Phuc Dang</p>
<p><em>This is a recap of open source activities and development at Zalando in the month of February.</em></p>
<h2>Project Highlights</h2>
<ul>
<li>
<p><a href="https://github.com/zalando-incubator/kube-metrics-adapter"><strong>Kube Metrics Adapter</strong></a> gained community attention as it was featured in the Medium post <a href="https://medium.com/google-cloud/kubernetes-autoscaling-with-istio-metrics-76442253a45a">'Kubernetes autoscaling with Istio metrics'</a>. Users provided very positive feedback on the project. Kube Metrics Adapter is currently maintained by the Developer Productivity team at Zalando. It is a general-purpose metrics adapter for Kubernetes that can collect and serve custom and external metrics for Horizontal Pod Autoscaling.</p>
</li>
<li>
<p><a href="https://github.com/zalando-incubator/introscope"><strong>Introscope</strong></a> is a newly released project. It is a babel plugin and a set of tools for delightful unit testing of modern ES6 modules. It allows you to override imports, locals, globals and built-ins (like Date or Math) independently for each unit test by instrumenting your ES6 modules on the fly.</p>
</li>
<li>
<p><a href="https://github.com/zalando/postgres-operator"><strong>Postgres Operator</strong></a> has been accepted as a mentor organization for Google Summer of Code, a global program focused on bringing more student developers into open source software development. This is the first year we participate in Google Summer of Code, with Postgres Operator - a project to create an open-source managed PostgreSQL service for Kubernetes. Students can submit their proposals until April 9 -> <a href="https://summerofcode.withgoogle.com/organizations/5429926902104064/">Apply Now</a></p>
</li>
</ul>
<p><img alt="GSoC" src="https://engineering.zalando.com/posts/2019/03/gsoc.png"></p>
<hr>
<h2>Cloud Native: Bug squashing night!</h2>
<p>We are inviting users and contributors of Zalando Cloud Native Applications to meet project maintainers at our tech office here in Berlin. We will spend this evening together answering user questions, reviewing pull requests, improving documentation and fixing as many bugs as possible. <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/259892690">Sign up now!</a></p>
<p>Participating projects:</p>
<ul>
<li><a href="https://github.com/zalando-incubator/kubernetes-on-aws">Kubernetes on AWS</a></li>
<li><a href="https://github.com/zalando/skipper">Skipper</a></li>
<li><a href="https://github.com/zalando/postgres-operator">Postgres-operator</a></li>
<li><a href="https://github.com/kubernetes-incubator/external-dns">External-DNS</a></li>
<li><a href="https://github.com/zalando-incubator/kube-ingress-aws-controller">AWS Ingress Controller</a></li>
</ul>
<h2>Zalando Open Source Around The World</h2>
<p>Meet and connect with Zalando developers and project maintainers at open source events around the world:</p>
<p><a href="https://events.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2019/"><strong>KubeCon Europe</strong></a>, Barcelona, May 20 - 23: There are two sessions conducted by <a href="https://twitter.com/try_except_">Henning Jacobs</a>, Head of Developer Productivity and <a href="https://github.com/mikkeloscar">Mikkel Larsen</a>, Senior Software Engineer. Check out more details below:</p>
<ul>
<li>
<p><a href="https://kccnceu19.sched.com/event/MPcP/es-operator-building-an-elasticsearch-operator-from-the-bottom-up-mikkel-larsen-zalando-se#">Es-operator: Building an Elasticsearch Operator From the Bottom Up</a>: The talk will walk through how the Elasticsearch operator was designed, what problems it solves and how building it from the bottom up allowed getting it in production fast, gather more learnings and later extending the featureset to make it less manual to operate and reducing the cost of the overall infrastructure.</p>
</li>
<li>
<p><a href="https://kccnceu19.sched.com/event/MPcM/kubernetes-failure-stories-and-how-to-crash-your-clusters-henning-jacobs-zalando-se#">Kubernetes Failure Stories and How to Crash Your Clusters</a>: This talk will show Zalando’s approach to Kubernetes provisioning on AWS, operations and developer experience, especially horror stories of operating 100+ clusters, lessons learned from incidents, failures, user reports and general observations.</p>
</li>
</ul>
<p><a href="https://2019.pgday.it/en/"><strong>PostgreSQL Day Italy</strong></a>, Bologna, May 16 - 17: <a href="https://twitter.com/erthalion">Dmitry Dolgov</a> will speak about ‘<a href="https://2019.pgday.it/en/schedule/#session-37">PostgreSQL at low-level</a>’. In this session, he will discuss how much impact different knobs and options of the Linux kernel have on PostgreSQL and why, what would happen if you run databases in virtualized environment or inside a container. Dmitry will share experiences of running PostgreSQL inside Kubernetes, show how to see what's going on inside and how to break something spectacularly.</p>
<p><a href="http://microxchg.io/2019/index.html"><strong>The Microservices & Serverless Conference in Berlin</strong></a>, Berlin, Apr 1 - 2: <a href="https://twitter.com/otrosien">Oliver Trosien</a> and <a href="https://twitter.com/mikkeloscar">Mikkel Larsen</a> will share how Zalando utilizes Kubernetes to operate large-scale Elasticsearch clusters during their presentation titled 'Operating Elasticsearch in Kubernetes'.</p>
<p><a href="https://devops-gathering.io/"><strong>Devops Gathering</strong></a>, Bochum, Mar 11 - 13: <a href="https://twitter.com/try_except_">Henning Jacobs</a> conducted a session on ‘Ensuring Kubernetes Cost Efficiency across (many) Clusters’. His talk provided insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping.</p>
<iframe width="600" height="312" src="https://www.youtube.com/embed/4QyecOoPsGU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<hr>
<h1>More reading</h1>
<ul>
<li>
<p><a href="https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md">Total DNS outage in Kubernetes cluster</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/docs">Zalando Open Source Documentation</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/tech-radar/">The Tech Radar: Zalando selection of technology choices</a></p>
</li>
</ul>
<h2>Rotating Engineers at Zalando</h2>
<p>2019-03-14, Lothar Schulz</p>
<p><em>Rotating engineers to establish cross-functional knowledge sharing, encourage cross team collaboration, and bring greater product awareness.</em></p>
<p>For the past year, our group of Engineering Leads worked to improve collaboration and cross functional communication
across teams. This was the result of team retrospectives and employee surveys indicating required improvement in these
areas. One initiative which we took to address these issues was to implement role rotation amongst engineers. The goal
of this developer rotation was to establish cross-functional knowledge sharing, encourage cross team collaboration
within the department, and bring greater product awareness.</p>
<p><strong>Preparation</strong>
In order to prepare for the rotation, we first needed to answer a few questions: which teams are involved,
who can rotate, for how long, and how we can ensure business continuity. For our first implementation of team rotation,
we limited it to 5 teams in the Developer Productivity department; those at the lead level and above were excluded.
Next, we put forth an open sign-up form to gauge interest in those looking to take part in the rotation where 20% of the
engineers signed up. Given this interest, we concluded that with proper preparation, a two-week rotation could be done
without impacting deliverables. This preparation included ensuring each team has proper onboarding documentation for new
team members requiring teams to brush up their documentation, another added benefit. Additionally, teams were asked to
prepare a good first issue for the new team members. As in the open source world, many projects help new joiners with
issues labeled <a href="https://help.github.com/articles/helping-new-contributors-find-your-project-with-labels/">good first issue or
similar</a>. And last, each new
team member was paired with a mentor who could answer questions and provide context. We called those mentors buddies.
This setup allowed those who wanted mentoring experience a way to practice their skills in guiding, managing and
coaching.</p>
<p><strong>Rotation</strong>
In December 2018 we started the first two-week engineering rotation. Those taking part moved desks to their new teams.
Buddies were paired up with rotating engineers to get their environments set up. They said <em>hello</em> in team stand ups and
were involved in other team meetings like team retrospectives, team lunches and department stand ups. Buddies helped
rotators to start with good first issues and paired with them along the way. Some of the rotators also got to see
different Engineering Lead styles in 1 on 1 meetings. Other rotators participated in answering support questions.</p>
<p><strong>Feedback / What we learnt</strong>
A follow up retrospective revealed strong positive feedback. The Rotation was perceived as a valuable experience to
understand better what other teams do. Also it was a learning experience about other teams’ products. Rotations also
helped in exchanging ideas about different process workflows and problem solving approaches of Developer Productivity
teams.</p>
<p>Their fresh view, unburdened by team history, brought a beneficial new perspective to the teams. A great example of
what I would call a success here was a deployment visualization, spanning backend and frontend components, that was
driven by rotating engineers. The users’ feedback on the feature was very positive, so it was rolled out to all
clusters soon after. This demonstrated that rotating engineers were able to have end user impact.</p>
<p>What we learned from the rotation retrospective and final survey was the need to reconsider the timing next time.
December contains a holiday season at the end of the month, which affected some rotations. First issues should be
shaped in a way that rotating engineers are able to finish the tasks within the rotation time frame. Another point
mentioned was that rotations - not only for engineers - should be performed regularly.</p>
<p><strong>Conclusion</strong>
In retrospect, the initiative of rotating engineers was a success. Throughout the rotation period in Developer
Productivity, we were able to sharpen the teams’ awareness of each other’s processes, workflows, tasks and methods.
Both buddies and rotating engineers shared their experiences and knowledge with their original teams. It also
highlighted improvement areas for team processes and tasks such as offboarding, integration testing and access roles.
The success of the initiative was further indicated by the request for future, regularly conducted, rotation
opportunities. Our goal is to continue with regular rotations and to expand beyond the engineering role to management
and supporting functions.</p>
<h2>How to Rock your Next Product Training</h2>
<p>2019-03-11, Aleksandra Piwowarek</p>
<h3><em>Need to introduce end-users into your product? It can be fun: we show you how</em></h3>
<p>Want to give your users a great first experience with your new IT application? User trainings for your software product
are the perfect opportunity. As team WMS Training, we develop and deploy training solutions for tech products within the
world of Zalando Logistics, and today we’ll show you how to quickly and easily develop a training session for your
product.</p>
<p>We’ll start you off with three steps to creating a user-centred product training and then follow up with a few ideas to
make your training more fun, memorable, and engaging.</p>
<h3><strong>Three Steps to User-Centred Product Trainings</strong></h3>
<p>Imagine that you and your team have been working on a new feature. After weeks of alignment, stakeholder management, and
development everything is ready for Monday’s go live. You’re eager to finally see these weeks of work materialize into a
solution for your end users.</p>
<p>As you’re finishing up your last email of the day, one of your stakeholders pops by your desk with a “quick question.”</p>
<p>“Just a quick question. I know we’re going live next week. Will there be trainings for our end users?”</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/486a121a97bfdf0a7bd8d5311eeb9911dc12bf46_picture1_corrected.png?auto=compress,format"></p>
<p>To which you reply, “What? Oh yes, of course. We’ll show them how to use it.”</p>
<p>But as they walk away, doubts begin to well up inside of you: <em>The go-live is coming up! There’s no time to prepare
anything… What if I bore them?… But, I’m not a trainer… What if they don’t get it?</em></p>
<p>Well, don’t worry. Even if you don’t have a lot of training experience or time, we’ve got three steps that will help
you develop a user-centred training session quickly and easily, and can be applied to live trainings, webinars, and
eLearnings.</p>
<p><strong>Step 1. Identify your target group and their learning objectives</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/74a73d2861a3e0715acaac9607a9d19cfac85e82_screen-shot-2019-03-07-at-6.36.29-pm.png?auto=compress,format"></p>
<p>Think back to the last training you attended that felt completely irrelevant to you. Maybe it was a standard safety
training, with a focus on lifting things properly, because you work for a logistics company.</p>
<p>Chances are that if the training felt irrelevant, the content was not aligned with your personal learning objectives. By
identifying your target group and their training goals, you’re managing your training like a product, with the learner
as your user – whose problems you want to solve.</p>
<p><strong>Example:</strong>
As a production manager, I want to learn how to pull current performance data from the system in order to evaluate my
department’s output and react to it.</p>
<p>Learning objectives can be framed like user stories, which can make learning objectives clear.</p>
<p><strong>Try it out now:</strong>
Let’s work through an example to gain a better understanding. Imagine someone from your family, totally new to
smartphones, and a few of their friends have recently developed a passion for photography. They want to show it off, so
they decide they’d like to start using Instagram. But they need your help, as someone who is into tech matters. Describe
the target group in this example. Where and how should they be trained? What should they be able to do after the
training?</p>
<p><strong>Step 2. Design an assignment to check that you’ve achieved learning objectives</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/efc385b000f431fdcf4ead1d00766ee9044c0be8_screen-shot-2019-03-11-at-11.27.02-am.png?auto=compress,format"></p>
<p>How will you know that your audience has achieved the learning objectives? Test their knowledge along the way and give
everyone a chance to practice.</p>
<p>We all know that practice makes perfect. During training, you have the unique opportunity to give the user the chance to
practice with you around, before jumping into it on their own.</p>
<p>If your system is still under development but accessible, you can have your users log in and search for test data or
even ask the participants to perform exactly the tasks that they will have in the future.</p>
<p>If this isn’t feasible, never fear. You can easily integrate knowledge checks, with questions like: “I can enter
performance data into the production screen. True or false?” This allows you to reiterate and reinforce key points in an
interactive way.</p>
<p><strong>Try it out now:
</strong>It’s time to create an assignment that will show you that your family member and their friends have learned what they
needed to. Take five minutes to identify one activity that will show you that they have fulfilled the learning
objectives you outlined in Step 1.</p>
<p><strong>Step 3. Determine what learners need to know to complete your assignment</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2ff206ca09d4a0485d264155b8588ff663314fae_screen-shot-2019-03-11-at-1.22.37-pm.png?auto=compress,format"></p>
<p>Now we come to the final step: creating training content. Often many of us run into the following trap; we start by
focusing on content, collecting any and all materials we have: technical descriptions, complex flowcharts, stakeholder
presentations, etc.</p>
<p>The problem? A good deal of that material may not actually be relevant to your audience and may not help them achieve
their learning objective. If anything, it may overwhelm them. These three steps will help you avoid that trap.</p>
<p>Now that you’ve built a user-centred training, how can you go one step further and ensure that the training engages your
audience? The key here is to remember that your audience has a limited attention span, so avoid long explanations when
possible. Instead, break down big concepts into smaller ones and leverage interactivity to make sure that you haven’t
lost anyone.</p>
<p>We’ve found that teaching content in an interactive way engages our audience, gives them a chance to practice what
they’ve learned, and helps them to better remember important points. It has the added benefit of allowing us to check
what they’ve learned. We have some examples of how you can gamify your training in the following section.</p>
<p><strong>Try it out now:
</strong>How many of you have read technical manuals? How much fun are those? Instead, think about how you can present your
family member and their friends with the information they need. Try to avoid information overload and provide your
target group only with what they need. If you can, bring in an element of interactivity to increase user engagement and
enjoyment.</p>
<p><strong>The result? A user-centred training that gives your audience the skills they need to successfully use your product and
leaves them with a great first impression.</strong></p>
<h3><strong>Three Easy-to-Implement Learning Activities</strong></h3>
<p>Not sure where to start when it comes to developing interactive content? We’ve got you covered. Here are a few ideas of
easy-to-add interactions.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/55a8c779e6361faa0ebad0c345bdc79f990ad40a_exercise1.png?auto=compress,format"></p>
<p><strong>Objective:</strong> The learner should be able to understand a high-level process or the data-flow between systems.</p>
<p><strong>Prepare:</strong> Develop a flowchart of the process or the data-flow (e.g. with Powerpoint). Print it out and cut the single
steps into puzzle pieces.</p>
<p><strong>Conduct:</strong> Divide your audience into teams of 2 - 5 participants. Every team gets a set of puzzle pieces and needs to
discuss the order of the workflow. Afterwards you show them your solution.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/367195374d3d3fcc00eb98a18076fe9d79fcf7d4_exercise2.png?auto=compress,format"></p>
<p><strong>Objective:</strong> The learner should be able to understand important terms used in your software product and know where to
find them on the screen.</p>
<p><strong>Prepare:</strong> Take a screenshot of your product and develop a slide with terms and descriptions of important screen
elements. Print it out and cut the terms and descriptions into puzzle pieces.</p>
<p><strong>Conduct:</strong> Divide your audience into teams of 2 - 5 participants. Every team gets a set of puzzle pieces and needs to
discuss their positions on the screenshot. Afterwards you show them your solution.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/55b019aa0eba972837507e541a5f1493cc942bcd_exercise3.png?auto=compress,format"></p>
<p><strong>Objective:</strong> The learner should be able to distinguish between options/make the right decision.</p>
<p><strong>Prepare:</strong> Formulate statements about certain decisions and why they are true. Delete parts of each statement and
leave a blank line. Print out the worksheets.</p>
<p><strong>Conduct:</strong> Divide your audience into teams of two or let them work alone. Everyone receives a worksheet and they need
to come up with the right solution. Afterwards you show them your solution.</p>
<p>If you want to learn more about the three step process, try out the free online course from
<a href="http://info.novoed.com/lxd-course">NovoEd</a>. We took the course last year and found it very useful. And feel free to
<a href="mailto:wms-training@zalando.de">get in touch</a> with us or to send us your solutions to your exercise if you’d like us to
check how you did. We’d love to see them!</p>How to Make Space for Research & Innovation?2019-02-28T00:00:00+01:002019-02-28T00:00:00+01:00Tobias Leonhardttag:engineering.zalando.com,2019-02-28:/posts/2019/02/make-space-research-innovation.html<p>Redesigning research and product development so that the explorative nature of data science becomes a driver for innovation</p><h3>Redesigning research and product development so that the explorative nature of data science becomes a driver for innovation</h3>
<p>Zalando leverages <a href="https://engineering.zalando.com/posts/2018/02/search-deep-neural-network.html">cutting edge</a> <a href="https://engineering.zalando.com/posts/2018/09/texture-distribution-artistic-expression.html">machine
learning</a> technologies to be
Europe’s leading online platform for fashion and lifestyle. In order to develop these products, data scientists and
product roles have to work together closely.</p>
<p>As the Agile Coaching Team, we went on a journey to discover ways to make our machine learning product development more
effective. As a result, we created tools and environments to help data scientists and product roles work together from
day one; making solutions better, testing ideas faster and simplifying collaboration. Now, people in very different
roles understand each other’s backgrounds and motivations better, reducing conflicts and handover costs. Solutions are
viewed early from multiple perspectives and often the best ideas come from places where you least expect them.</p>
<h3>Discovering the data science impossibility patterns</h3>
<p>To avoid jumping to solutions that would have no effect, we did over 20 interviews looking at multidisciplinary areas
and teams who build data science/machine learning (DS/ML) heavy products. We were looking out for “pains and gains” of
all involved roles, analyzing artefacts and work styles, shadowing their meetings.</p>
<p>We found several patterns for which we developed a solution:</p>
<ul>
<li><strong>“Data science takes a long time”</strong> - was often stated as a dogma by data scientists, product roles and leads. Of
course, there are technical constraints, large amounts of data to fetch and models to be trained and tested. But
much of it is about psychology: because data science takes time, it costs a lot of money, and where so much money goes,
impactful deliveries are expected. So expectations to deliver results piled up in big batches in backlogs and
OKRs that “focus 120% on delivery” - which, of course, take a long time! That creates the second problem:</li>
<li><strong>“Data scientists do not know which customer problem they are solving”</strong> - as there is a complete focus on delivery
there is no time to do proper customer discovery and problem definition. Therefore backlog refinement, planning and
demoing gets “hard.” That creates the third problem:</li>
<li><strong>“Agile does not work in data science”</strong> - as “Scrum” is often a synonym for Agile, without a clear problem to
solve, Scrum ceremonies do not work. Also, retrospectives hardly help fix the ceremonies as they are not broken.
Instead, Scrum is just the “wrong” tool to discover customer problems.</li>
</ul>
<p>One of our biggest contributions as coaches was that we created time for innovation; a five-day learning journey
combining directed and self-driven workshops, an open space, coaching sessions, community work, peer-to-peer learning
and teach-back-formats in a co-creative way.</p>
<p>On this journey we outlined a three-day workshop around two top priority topics as real cases, each tackled by one
multidisciplinary initiative. The initiative pulled together experts from formerly separate teams, to deliver customer
value end-to-end. In this way, we progressed while learning (did not interrupt work with trainings), using
multidisciplinary collaboration while introducing it. Tackling real cases like “Personal Relevance in Browsing” and
“Transparency About Personalization” was a clear requirement to get the buy-in from the leads.</p>
<p>We had over 40 customer interviews performed by data scientists and engineers, fostering a much deeper understanding of
the customer and problem space they build solutions for. Almost as a side effect, it raised empathy for product roles,
improving collaboration and lowering the cost of handovers and conflicts.</p>
<p>With this journey, we served and enabled a toolchain for customer discovery, problem definition and small, fast and
cheap experiments of solutions “prototypes” (ideas, hypotheses, assumptions). Zalando packaged these tools in a
framework that we call 4D: Discover - Define - Design - Deliver.</p>
<p>One key element of the learning journey was the co-creation of a concept of how we can use a “<em>hypothesis</em>” to steer
collaboration. Drawing on “hypothesis driven software development”, product work and data science itself, we developed a
concept that enables working with hypotheses across the whole DS/ML workflow. It starts with “Explorative Research”,
which manages input such as time, effort and scope, since by the nature of science the output is not predictable during
exploration. As an early finish criterion we set the capability to formulate a “directional hypothesis”. This enables us to
switch from “Explorative Research” to “Hypothesis Testing Research”, gaining more transparency, predictability and
control. Combined with hypothesis-driven product work and engineering, we can use this and science as central elements
to streamline collaboration in DS/ML heavy products.</p>
<p>In a multidisciplinary setup we co-discovered the customer, and created hypotheses around their needs, pains and gains.
We tested these hypotheses early and learned, refined and iterated.</p>
<p>With this journey, we created space for the unknown; the place where innovation is rooted. We equipped our teammates
with the “right” tools and workflows, and sent them on a learning experience across all disciplines, ranks, teams and
units to find truly new land.</p>A Journey On End To End Testing A Microservices Architecture2019-02-21T00:00:00+01:002019-02-21T00:00:00+01:00Burim Shalatag:engineering.zalando.com,2019-02-21:/posts/2019/02/end-to-end-microservices.html<p>In microservices architecture there are different components working together to enable a business capability, therefore testing all of them can get tricky.</p><p>End to end testing is a testing technique used to test the flow of an application through a business transaction. In
microservices architecture there are different components working together to enable a business capability, therefore
testing all of them can get tricky. In this article you can read about our team’s journey:</p>
<ol>
<li>What our system looks like</li>
<li>What do you get from e2e testing?</li>
<li>How to define e2e tests</li>
<li>How to deal with authentication</li>
<li>What testing framework we chose</li>
<li>How to test canvas</li>
<li>How to test async flows</li>
<li>Automation</li>
</ol>
<p><strong>1 ) Our system</strong></p>
<p>In our team we maintain a system that offers business capabilities such as the ability to explore and filter orders. The
high level components that are used to enable that feature are: The front-end application, the <a href="https://samnewman.io/patterns/architectural/bff/">backend for the front
end</a>, various databases (PostgreSQL, Solr and DynamoDB), message
brokers (we use <a href="https://github.com/zalando/nakadi">nakadi</a>), and a bunch of microservices. You can read more details
about our architecture later in this article.</p>
<p><strong>2 ) What to expect from e2e testing</strong></p>
<p>As you can see, the architecture is quite complex and things might break at different levels. You might have great unit
testing coverage for each component, but if the components can’t talk to each other, users’ expectations of the product
are not met.</p>
<p>You can introduce some integration testing but things might get out of sync if more than one team is responsible for the
same product or even if not all team members share equal ownership of each component (which they should).</p>
<p>You can achieve integration testing from a user perspective by mocking your dependencies (by intercepting requests).
This approach adds complexity when writing tests; on the other hand, end to end testing adds the complexity of getting
all systems into a desired state where you can make your assertions confidently.</p>
<p>Because we wanted to be able to ensure that whenever we are releasing a new feature we are not breaking anything else,
and because changes in the backend could introduce bugs if we are not on the same page, we decided to introduce end to
end testing in our systems. This way we could spot bugs on staging environments.</p>
<p>“Having end to end tests is also a very nice way to document all the user journeys of your application.”</p>
<p><strong>3) Coverage and tests definition</strong></p>
<p>The first step that we took was defining the scope of the systems that we were going to put under testing. It is
strongly advised that when you perform end to end testing you should put all the components under testing, but on the
scale of our company this is not always possible.</p>
<p>In our case we decided to do “domain scoped e2e testing”, because systems out of the domain might already have some e2e
testing and our systems are decoupled from each other. Also it is pretty hard to put systems that are out of the domain
in the desired state you need to perform your tests.</p>
<p>The architecture of the systems that we wanted to test is something like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/23dedd00f618579555479be132e249ec966ce4a4_untitled-diagram-1.jpg?auto=compress,format"></p>
<p>After the scope was known we defined the list of features that we wanted to test. Basically that was all the features,
but either way having a list around helps a lot. Once you have it, it is easy to group related features (one way of
grouping features is by business capability) and split them into smaller tasks so the whole team can work on them. This
will also help you prioritise groups and implement the important ones first. You can prioritise by urgency as well so
you have the critical ones covered first. This also helped us identify which of them require some support from other
teams and this way you remove some potential blockers early.</p>
<p><strong>4) Authentication on test environment</strong></p>
<p>One of the problems we had was authentication. We are using an <a href="https://en.wikipedia.org/wiki/Single_sign-on">SSO</a>
server to authenticate users. The only way to log in through the SSO is with an actual email address and a password.
Achieving this would have required real accounts, which came with a few complications. Because of this we decided to
authenticate users using an auto-generated token when running end to end tests, implemented this feature only for the
staging environment, and bypassed SSO this way.</p>
<p><strong>5) Choosing a testing framework that solves main problems</strong></p>
<p>So far so good. We had an idea what we wanted to achieve and pretty quickly we ended up thinking about how to start and
write end to end tests.</p>
<p>It is suggested that when you write and run e2e tests you should have a deterministic state of all the systems, so that
you can easily assert whether an action was performed as it should be.</p>
<p>Our main problems were:</p>
<ul>
<li>
<p>Have a desired state of the system. This was hard because we didn’t own all the systems.</p>
</li>
<li>
<p>Have a desired state of the application. This was hard because we had a component that uses the html canvas.</p>
</li>
</ul>
<p>The first one was solved by having an API that allows us to insert some data into the system which normally was not an
application use case. The second one was solved by being able to talk to the state management component from e2e
tests.</p>
<p>Now comes the best part, choosing a testing framework. We did some research and we decided to focus on 2 options,
<a href="https://opensource.zalando.com/zalenium/">Zalenium</a> and <a href="https://www.cypress.io/">Cypress</a>. Options like Nightwatch and
Puppeteer were considered as well. Both Zalenium and Cypress offered a really nice set of features like video
recording, pretty nice integration with CI and Docker, a clean API and a nice dashboard, but the final winner for us was
Cypress. We chose Cypress because, first of all, our users mainly use Chrome. Also, Cypress seemed to be much faster than
Zalenium and it managed to solve the problem of flaky tests. Another cool Cypress feature is its dashboard which you
could use to interact with your tests. But the killer feature is that Cypress executes tests on the same environment as
your application.</p>
<p><strong>6) What if you want to test canvas?</strong></p>
<p>Some parts of our application are written in canvas, and interacting with canvas is almost impossible. We decided to
avoid canvas completely and interact with the application runtime. Our application is written in React and because
Cypress runs on the same environment as our application we could dispatch actions and read from state in our tests.</p>
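<p>As an illustration of that design choice, here is a minimal sketch (not our actual application code, and not a Cypress API; all names below are hypothetical): if the app exposes its state container to the test runtime, a test can dispatch actions and assert on state instead of reading canvas pixels.</p>

```typescript
// Hypothetical sketch: a tiny store with the dispatch/getState shape a
// test could use instead of interacting with the canvas directly.
type Action = { type: "SELECT_ORDER"; id: string };

interface AppState {
  selectedOrderId: string | null;
}

const createStore = () => {
  let state: AppState = { selectedOrderId: null };
  return {
    dispatch(action: Action): void {
      // Reducer logic; the real app's reducers would live here.
      if (action.type === "SELECT_ORDER") {
        state = { selectedOrderId: action.id };
      }
    },
    getState(): AppState {
      return state;
    },
  };
};

// A test would dispatch through the exposed store and assert on state:
const store = createStore();
store.dispatch({ type: "SELECT_ORDER", id: "42" });
console.log(store.getState().selectedOrderId); // "42"
```

<p>In practice the application would expose the store on a well-known handle (for example on <code>window</code>, in test builds only), and the test would reach it through the shared browser environment.</p>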
<p><strong>7) Testing asynchronous flows</strong></p>
<p>An interesting problem while testing was how to test application parts which are highly asynchronous in terms of
communication with the backend. We have parts of the application that do short polling. To test this Cypress offers a
dynamic way of configuring
<a href="https://docs.cypress.io/guides/core-concepts/introduction-to-cypress.html#Timeouts">timeouts</a>. For instance you could
do something like:</p>
<p><code>cy.get('some-selector', { timeout: 50000 })</code></p>
<p>This way Cypress periodically checks whether the element is present, retrying until the timeout is reached. As a
timeout value we simply used the SLO targets agreed between teams.</p>
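<p>The retry-until-timeout behaviour can be sketched in plain TypeScript. <code>pollUntil</code> below is a hypothetical helper, not a Cypress API; it only illustrates the mechanism Cypress applies to its queries:</p>

```typescript
// Hypothetical helper: re-run a probe until it yields a value or the
// timeout (e.g. an SLO target) elapses -- the same idea behind
// cy.get(selector, { timeout }).
const pollUntil = async <T>(
  probe: () => T | undefined,
  timeoutMs: number,
  intervalMs: number = 100
): Promise<T> => {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = probe();
    if (value !== undefined) {
      return value;
    }
    if (Date.now() >= deadline) {
      throw new Error(`timed out after ${timeoutMs}ms`);
    }
    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
};

// Simulate a short-polling backend that responds after 250ms:
let response: string | undefined;
setTimeout(() => { response = "order-list"; }, 250);

pollUntil(() => response, 5000).then((value) => console.log(value));
```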
<p><strong>8 ) Automating tests</strong></p>
<p>Automating tests was quite straightforward. In our CI/CD server we spin up two containers, one that runs the application
and another one that runs the tests. After the process is done, those containers are destroyed.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6a91cf2a8ba66382e30c88ecb59bbcb544fac093_e2e-test-automation.jpg?auto=compress,format"></p>
<p>All this process was quite fun to work on and I learned a lot. Having end to end tests helped us understand how users
could use our system and we automated Quality Assurance, something that was previously a manual process and sometimes
also error prone.</p>Typescript Best Practices2019-02-14T00:00:00+01:002019-02-14T00:00:00+01:00Thi Hong Van Phantag:engineering.zalando.com,2019-02-14:/posts/2019/02/typescript-best-practices.html<p>Learning Typescript as a backender</p><p>Typescript is becoming more and more popular. As with everything, there are good and bad sides. How good it is depends
on how you use it in your application. This article will not discuss the good and bad sides of Typescript but some best
practices, which will help for some cases to get the best out of Typescript.</p>
<h3>1. Strict configuration</h3>
<p>Strict configuration should be mandatory and enabled by default, as there is not much value using Typescript without
these settings. Without it, programs are slightly easier to write but you also lose many benefits of static type
checking. The flags that need to be enabled in tsconfig.json are:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"forceConsitentCasingInFileNames"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="err"> </span><span class="nt">"noImplicitReturns"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="err"> </span><span class="nt">"strict"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"noUnusedLocals"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>The most important one is the "strict" flag, which covers a family of stricter checks, including these four flags that you can also add independently:</p>
<ul>
<li>
<p><code>noImplicitThis</code>: Complains if the type of this isn’t clear.</p>
</li>
<li>
<p><code>noImplicitAny</code>: With this setting, you have to define every single type in your application. This mainly applies to
parameters of functions and methods.</p>
</li>
</ul>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="nx">worker</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">name</span>
</code></pre></div>
<p>If you don’t turn on <code>noImplicitAny</code>, <code>worker</code> will implicitly be of type <code>any</code>.</p>
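<p>To make the snippet above compile with <code>noImplicitAny</code>, declare the parameter type explicitly (the <code>Employee</code> type here is illustrative):</p>

```typescript
// With noImplicitAny the parameter must have an explicit type.
interface Employee {
  name: string;
}

const fn = (worker: Employee): string => worker.name;

console.log(fn({ name: "David" })); // "David"
```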
<ul>
<li><code>strictNullChecks</code>: null is not part of any type (other than its own type, null) and must be explicitly mentioned if
it is an acceptable value.</li>
</ul>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Worker</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">getName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">worker?</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">name</span>
</code></pre></div>
<p>This code snippet won’t compile because "worker" is an optional parameter and can be undefined.</p>
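<p>One way to make it compile under <code>strictNullChecks</code> is to handle the <code>undefined</code> case explicitly (again with an illustrative <code>Employee</code> type):</p>

```typescript
interface Employee {
  name: string;
}

// The optional parameter may be undefined, so guard before accessing it.
const getName = (worker?: Employee): string =>
  worker !== undefined ? worker.name : "unknown";

console.log(getName());                  // "unknown"
console.log(getName({ name: "David" })); // "David"
```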
<ul>
<li><code>alwaysStrict</code>: Use JavaScript’s strict mode whenever possible.</li>
</ul>
<p>For further compiler options please find them here:</p>
<p><a href="https://www.typescriptlang.org/docs/handbook/compiler-options.html">https://www.typescriptlang.org/docs/handbook/compiler-options.html</a></p>
<h3>2. General types - prefer to use primitive types</h3>
<p>Use the primitive type number, string, boolean instead of String, Boolean, Number. These types refer to non-primitive
boxed objects which are never appropriately used in Javascript.</p>
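<p>A small illustration of why this matters: a boxed <code>String</code> object is not assignable where the primitive <code>string</code> is expected.</p>

```typescript
// Functions should accept the primitive string, not the boxed String.
const greet = (name: string): string => `Hello, ${name}`;

console.log(greet("Ada")); // "Hello, Ada"

// const boxed: String = new String("Ada");
// greet(boxed); // compile error: 'String' is not assignable to 'string'
```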
<h3>3. Type inference</h3>
<p>Instead of explicitly declaring the type, let the compiler automatically infer it for you; it usually knows the type
better than you do:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'David'</span><span class="p">;</span><span class="w"> </span><span class="c1">//name is string</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">11</span><span class="p">;</span><span class="w"> </span><span class="c1">// age is number</span>
</code></pre></div>
<h3>4. Callback types</h3>
<p>For callbacks whose return value will be ignored, prefer the return type void over any. With any, the result is unchecked:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">cal</span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">x</span><span class="p">();</span>
<span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">doAnything</span><span class="p">();</span><span class="w"> </span><span class="c1">// ok but unchecked</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>Using void is safer because it prevents using any value, which could be unchecked:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">cal</span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="ow">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">x</span><span class="p">();</span>
<span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">doAnything</span><span class="p">();</span><span class="w"> </span><span class="c1">// Error</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<h3>5. Function parameters</h3>
<p>For a function with a lot of parameters, or with several parameters of the same type, it makes sense to change the
function to take an object instead:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">cal</span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">z</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
</code></pre></div>
<p>With such a function, it’s quite easy to pass the parameters in the wrong order, for instance <code>cal(x, z, y)</code>.
Change the function to take an object:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">cal</span><span class="p">(</span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">z</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">})</span><span class="w"> </span><span class="p">{}</span>
</code></pre></div>
<p>The function call will look like: <code>cal({x, y, z})</code> which makes it easier to spot mistakes and review code.</p>
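<p>Combined with destructuring, the object-parameter style stays just as concise inside the function body (the names here are illustrative):</p>

```typescript
// Taking a single object parameter and destructuring it immediately.
function cal({ x, y, z }: { x: string; y: string; z: string }): string {
  return [x, y, z].join("-");
}

// Property order in the call no longer matters:
console.log(cal({ z: "c", x: "a", y: "b" })); // "a-b-c"
```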
<h3>6. Overloads - Ordering</h3>
<p>The more specific overloads should be put before the more general overloads. Example:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Person</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Worker</span><span class="w"> </span><span class="k">extends</span><span class="w"> </span><span class="nx">Person</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">a</span><span class="o">:</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nx">any</span><span class="p">;</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="o">:</span><span class="w"> </span><span class="kt">Person</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">w</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">w</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="p">;</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">w</span><span class="p">);</span><span class="w"> </span><span class="c1">// y: any</span>
</code></pre></div>
<p>Should define the following order:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kr">declare</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">w</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="kr">declare</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="o">:</span><span class="w"> </span><span class="kt">Person</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="kr">declare</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">a</span><span class="o">:</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nx">any</span><span class="p">;</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">w</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="p">;</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">tun</span><span class="w"> </span><span class="p">(</span><span class="nx">w</span><span class="p">);</span><span class="w"> </span><span class="c1">// y: number</span>
</code></pre></div>
<p>This is because the first matching overload is resolved: when the more general overload is declared first, it hides the
more specific ones.</p>
<p><strong>Overloads - use optional parameters</strong></p>
<p>In the following example, the overloads can be collapsed into a single function declaration with optional parameters:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Business</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">z</span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Business</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">y?</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">z?</span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>But this only works when all the overloads share the same return type.</p>
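<p>As a hedged sketch (the Calculator interface and its implementation are our own illustration, not from the article), a single implementation can then serve all the original arities:</p>

```typescript
// One optional-parameter signature replaces three same-return-type overloads.
interface Calculator {
  cal(x: string, y?: string, z?: number): number;
}

const c: Calculator = {
  // A single implementation covers all call shapes.
  cal: (x, y, z) => x.length + (y?.length ?? 0) + (z ?? 0),
};

console.log(c.cal("ab"));           // 2
console.log(c.cal("ab", "cd"));     // 4
console.log(c.cal("ab", "cd", 10)); // 14
```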
<p>Overload - use union type</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Business</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>The last two overloads differ only in the parameter type, so instead you can merge them with a union type:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Business</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="nx">cal</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="kt">number</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
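<p>A self-contained sketch of how the merged interface could be implemented and called (the implementation is our own assumption; the article only shows the declarations):</p>

```typescript
interface Business {
  cal(): string;
  cal(x: string | number): number;
}

const b: Business = {
  // One function satisfies both overloads of the interface.
  cal(x?: string | number): any {
    if (x === undefined) return "no argument";
    return typeof x === "string" ? x.length : x * 2;
  },
};

console.log(b.cal());      // "no argument"
console.log(b.cal("abc")); // 3
console.log(b.cal(21));    // 42
```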
<h3>7. Don’t use "bind"</h3>
<p>"bind" returns any. Take a look at the definition of bind:</p>
<div class="highlight"><pre><span></span><code> bind (thisArg: any, ...argArray: any[]) : any
</code></pre></div>
<p>This means that a function produced by bind is always typed as "any", and bind() does no type checking on the arguments it accepts:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">add</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">y</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">curryAdd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">add</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="mf">111</span><span class="p">);</span>
<span class="w"> </span><span class="nx">curryAdd</span><span class="p">(</span><span class="mf">333</span><span class="p">);</span><span class="w"> </span><span class="c1">// Ok but no type checked</span>
<span class="w"> </span><span class="nx">curryAdd</span><span class="p">(</span><span class="s1">'333'</span><span class="p">)</span><span class="w"> </span><span class="c1">// Allowed because no type check</span>
</code></pre></div>
<p>Better to write it with arrow function:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">curryAdd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="o">:</span><span class="w"> </span><span class="kt">number</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">add</span><span class="p">(</span><span class="mf">111</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">);</span>
<span class="w"> </span><span class="nx">curryAdd</span><span class="p">(</span><span class="mf">333</span><span class="p">)</span><span class="w"> </span><span class="c1">// Ok and type check</span>
<span class="w"> </span><span class="nx">curryAdd</span><span class="p">(</span><span class="s1">'333'</span><span class="p">)</span><span class="w"> </span><span class="c1">// Error</span>
</code></pre></div>
<p>So with the static type check, the compiler discovers the wrong type instead of silently allowing any type through binding. Newer TypeScript versions, however, type-check "bind" on function types more strictly.</p>
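<p>If you do need bind, the stricter checking can be switched on explicitly. A minimal tsconfig.json sketch: the <code>strictBindCallApply</code> option shipped with TypeScript 3.2 and is also enabled by <code>"strict": true</code> (tsconfig.json accepts comments):</p>

```json
{
  "compilerOptions": {
    // Type-checks the arguments passed to bind, call and apply
    // on function types instead of falling back to "any".
    "strictBindCallApply": true
  }
}
```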
<h3>8. Non-existing values - prefer undefined over null</h3>
<p>When a value on an object property or a function parameter is missing, you can use Typescript optional "?" to say so.</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="nx">Worker</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="nx">address?</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>TypeScript's "?" operator doesn't cover null. JavaScript has two "empty" values, null and undefined, but null on a property conveys the same thing as undefined: the value is missing. That's why it's recommended to use undefined for
non-existing values and to forbid the use of null with the TSLint rule:</p>
<p><code>{ "no-null-keyword": true }</code></p>
<p>The optional "?" cannot be used to declare a variable or a function return type as possibly undefined, though. To safely handle a missing 'worker' before using its properties, we can rely on TypeScript's ability to narrow
a parameter's type with a type guard, and use that to unwrap our optional worker:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kr">type</span><span class="w"> </span><span class="nx">Optional</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">T</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="kc">undefined</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">getName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">worker?</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="nx">worker</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">name</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s1">'no worker'</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">worker</span><span class="o">:</span><span class="w"> </span><span class="kt">Optional</span><span class="o">&lt;</span><span class="kt">Worker</span><span class="o">&gt;</span><span class="p">;</span>
</code></pre></div>
<p>or</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">worker</span><span class="o">:</span><span class="w"> </span><span class="kt">Worker</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">getName</span><span class="p">(</span><span class="nx">worker</span><span class="p">));</span><span class="w"> </span><span class="c1">// 'no worker'</span>
</code></pre></div>
<p>The above code snippet will print 'no worker' because our worker is undefined, but with this abstraction type we’ve
safely handled the missing-object use case. Writing <code>Optional&lt;Worker&gt;</code> is a little shorter and gives the same
result.</p>On the Effectiveness of Online Marketing2019-02-07T00:00:00+01:002019-02-07T00:00:00+01:00Mathias Deschampstag:engineering.zalando.com,2019-02-07:/posts/2019/02/effectiveness-online-marketing.html<p>Measuring the incremental effect of online marketing to optimize advertising investment</p><h3><strong>Measuring the incremental effect of online marketing to optimize advertising investment</strong></h3>
<p>One of the core values at Zalando is to be <em>Customer Obsessed</em>, and this applies to online marketing as well. For many
Zalando customers, their experience starts with a catchy ad. Therefore, in <em>Personalized Marketing</em>, our mission is to
reach customers with a personalized message and suggest products tailored to their needs or wants.</p>
<p>By increasing the relevance of marketing, we aim to increase the number of customers interested in our offer, and, in
turn, generate profitable sales for Zalando. While doing so, we constantly face the “never-out-of-fashion” (to quote our
latest Christmas campaign) question: What does marketing really do?</p>
<h3><strong>Simple question, complex answers</strong></h3>
<p>So the central question is, “What is the true incremental effect of online marketing?”</p>
<ul>
<li>Would a customer have bought this pair of shoes even in the absence of marketing?</li>
<li>How much are we growing Zalando’s customer base thanks to online marketing?</li>
</ul>
<p>The answers to these questions are complex and multi-faceted. Measuring incrementality, and not mistaking a success
for a failure, can be nearly impossible, as shown in [1]. Well-established and successful tech giants are not done
answering the question either. Hohnhold, O’Brien and Tang [2] tried to find the best way to measure the impact of marketing
beyond its short-term effect. Optimizing for the next few days or weeks may lower the impact of online marketing in the
long run. Google researchers [2] claim “<em>We have long recognized that optimizing for short-term revenue may be
detrimental in the long term if users learn to ignore the ads.</em>”</p>
<p>One of our objectives is to compute a Return-on-Investment (ROI) for every campaign. This metric allows us to allocate
our resources efficiently. We maximize sales generation and new customer acquisition for every campaign, given an ROI
target.</p>
<h3><strong>Performance measurement landscape</strong></h3>
<p>Zalando took up the challenge and aims to measure the performance of online marketing at scale. The ROI of our marketing
activities is computed through a pipeline composed of several products. While each of them would deserve a dedicated
blog post, this article aims to simply outline their purposes and main challenges.</p>
<p><img alt="Product pipeline overview" src="https://images.prismic.io/zalando-jobsite/523fcf067cc35347ab7cc4043c4f8a941428fa5b_blog_post_1.png?auto=compress,format"></p>
<p><em>Figure 1: Product Pipeline Overview</em></p>
<p>We have built a flexible and scalable data infrastructure based on S3, Hive and Spark on AWS. Spark’s parallelizing
capabilities in combination with AWS EC2 ensure that we can meet our strict SLAs even with a continuously growing amount
of customers and traffic. In the future we plan to automatically scale the size of the cluster depending on the size of
the input data. We decoupled our sub-products and use Hive tables as interfaces between them. This allows for more
autonomy with regard to product development and generally lets us move faster.</p>
<p>At the start of our pipeline, we source all marketing clicks, sales and conversions (e.g. customer acquisitions, app
installs) from Zalando’s Data Lake and DWH to build a structured and unified event data layer. This is one of our
greatest challenges, since the data is very diverse with regard to quality, update frequency, syntax and semantics.
Therefore we are making great efforts to move from client-side towards server-side ad tracking and closely monitor our
data through data quality dashboards. After updating the event data layer, we use our internal cross-device graph to
create the customers’ journeys across all their devices, from first ad interaction to conversion.</p>
<p>Next, with our attribution model, we determine how much incremental value was created by every ad click. The
particularity of the attribution problem is its unknown ground truth. As we cannot interview every single customer, we
will never exactly know why a given customer bought their latest jacket on Zalando. We built a framework that allows us
to iterate quickly and test many different attribution models. We are using SQL for simple transformations, while Scala
is our choice for more complex computations. This way we are able to explore far beyond simple models (e.g. Last touch)
and leverage our huge dataset with more complex models.</p>
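A last-touch rule, the simplest of the models mentioned above, can be sketched as follows. This is a toy illustration only (Zalando's actual attribution models are not published, and ChannelClick and lastTouchAttribution are hypothetical names): the final click before a conversion receives the full conversion value.

```typescript
// Hypothetical journey event: a marketing click on some channel.
interface ChannelClick {
  channel: string;
  timestamp: number; // epoch millis
}

// Last-touch attribution: the latest click before the conversion
// is credited with 100% of the conversion value.
function lastTouchAttribution(
  journey: ChannelClick[],
  conversionValue: number
): { [channel: string]: number } {
  if (journey.length === 0) return {};
  const last = journey.reduce((a, b) => (b.timestamp > a.timestamp ? b : a));
  return { [last.channel]: conversionValue };
}

const journey: ChannelClick[] = [
  { channel: "display", timestamp: 1 },
  { channel: "search", timestamp: 2 },
  { channel: "newsletter", timestamp: 3 },
];
console.log(lastTouchAttribution(journey, 50)); // { newsletter: 50 }
```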
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/eab2f8919df7f0cc3ac78c9e057b1172a9ea31e9_blog_post_2.png?auto=compress,format"></p>
<p><em>Figure 2: Attribution Illustration</em></p>
<p>As reality is unknown, we are running many randomized experiments with the aim of causally inferring the incremental
impact of each marketing campaign. We use geo-based [3] and audience-based test methodologies to achieve this. In the
former, marketing activities are turned off in certain regions and we quantify the impact on revenue, profit and
customer acquisition. The latter splits a given customer base into two groups, giving one group a specific treatment and
measuring the difference in behaviour. We use the results to calibrate our attribution and ensure it reflects reality.</p>
<p>Continuously running such a large number of parallel experiments is a great challenge. The test results need to
accurately reflect the incrementality of marketing campaigns even though it can be heavily affected by seasonality or
ever-changing consumer behaviour. Hence, we are currently building an <em>experimentation platform</em> that sets experiments
up and analyzes the results in an automated way.</p>
<h3><strong>Is That All?</strong></h3>
<p>The next logical step is bidding based on the ROI. We invest a lot of resources to predict the performance of marketing.
Every day, we estimate the incremental profit marketing campaigns will generate in the coming weeks. Each impression can
lead to a click, and each click can lead to a conversion. Every marketing campaign is a different time series, with its own
behavior and characteristics. The magnitudes of different time series may vary by several orders, and while most of them are
quite unique, it is possible to infer some similarities (embeddings are one solution). We are experimenting with
state-of-the-art machine learning models such as DeepAR [4]. All of this makes it an extremely complicated and deeply
interesting problem to model.</p>
<p>The measurement of incrementality opens up many interesting topics that we also tackle in the Personalized Marketing
Team, such as generating the best ads or setting the best target and budget.</p>
<p>Come join us on our journey to build best-in-class data-driven marketing. You can apply for one of our positions as
<a href="https://jobs.zalando.com/jobs/848186-backend-engineer-personalised-marketing/">software engineer</a>, <a href="https://jobs.zalando.com/jobs/833928-big-data-engineer-personalized-marketing/">data
engineer</a> or <a href="https://jobs.zalando.com/jobs/1252140-senior-data-scientist-personalized-marketing/">data
scientist</a>.</p>
<p><em>Thanks to Pablo Croppi, Carolyn Hodgson, Dirk Petzoldt, Dominik Rief for reviewing this article, and to Yanwolf
Hoffmann for design help.</em></p>
<p><strong>REFERENCES</strong></p>
<p>[1] Randall A. Lewis and Justin M. Rao. On the Near Impossibility of Measuring the Returns to Advertising, 2013<br>
[2] H. Hohnhold, D. O’Brien, D. Tang. Focusing on the Long-term: It’s Good for Users and Business, 2015<br>
[3] J. Vaver and J. Koehler. Measuring Ad Effectiveness Using Geo Experiments, 2011<br>
[4] V. Flunkert, D. Salinas, J. Gasthaus. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks</p>Open Source: January Updates - Celebrate 'I Love Free Software Day2019-02-07T00:00:00+01:002019-02-07T00:00:00+01:00Hong Phuc Dangtag:engineering.zalando.com,2019-02-07:/posts/2019/02/oss-january-updates.html<p>This is a recap of open source activities and development at Zalando in the month of January.</p><h2>Project Highlights</h2>
<p><a href="https://twitter.com/LionelMontrieux">Lionel Montrieux</a> brought <a href="https://nakadi.io">Nakadi</a> to <a href="https://fosdem.org/2019/schedule/event/nakadi">FOSDEM 2019</a>. This is one of the largest open source projects released by Zalando. Nakadi is a distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues. It is used in production by over a hundred teams daily and handles over 100 TB of data every day. Try out <a href="https://github.com/zalando/nakadi">Nakadi</a>!</p>
<iframe width="600" height="432" src="https://www.youtube.com/embed/eTQhGMc2EWg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<h2>New projects</h2>
<p>Two new projects entered Zalando-Incubator this month:</p>
<ul>
<li>
<p><a href="https://github.com/zalando-incubator/Transformer">Transformer</a> is a tool to transform/convert web browser sessions (HAR files) into Locust load testing scenarios (locustfile). This tool can be used when users have HAR files (containing recordings of interactions with your website) that they then want to replay in load tests using Locust.</p>
</li>
<li>
<p><a href="https://github.com/zalando-incubator/autoscaler">Autoscaler</a> is a component that automatically adjusts the size of a Kubernetes Cluster so that all pods have a place to run and there are no unneeded nodes. This is a fork of <a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler">Kubernetes Autoscaler</a>. Our goal is to test and deploy the project in a large scale environment at Zalando, then propose upstream contributions, so the whole community can benefit from our experiments.</p>
</li>
</ul>
<p>In addition to the project highlights, we recently released a <a href="https://opensource.zalando.com/docs/resources/harassment-policy">policy</a> to handle harassment in open source. At Zalando we encourage our employees to take active part in open source development. And we as a company commit to providing our full support to developers who engage in open source on Zalando's behalf. Find out more details <a href="https://engineering.zalando.com/posts/2019/02/open-source-harassment-policy.html">here</a>.</p>
<h2>Celebrate <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/258359262">I Love Free Software Day</a></h2>
<p>Join Zalando this Valentine’s Day to celebrate our love for Free Software. This is a chance for us to show our appreciation to the people who contribute to the free and open source community. We are delighted to welcome special guest speakers who have devoted years to growing and fostering the FOSS movement around the world.</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Lennart_Poettering">Lennart Poettering</a> - Author of <a href="https://en.wikipedia.org/wiki/Systemd">systemd</a></li>
<li><a href="https://www.linkedin.com/in/einhverfr">Chris Travers</a> - Contributor of <a href="https://www.postgresql.org/">PostgreSQL</a> since 1999</li>
<li><a href="https://www.linkedin.com/in/miconda">Daniel-Constantin Mierla</a> - Core developer of <a href="https://www.kamailio.org">Kamailio</a></li>
<li><a href="http://linkedin.com/in/thilo-borgmann-902971b4">Thilo Borgmann</a> - Developer of <a href="https://ffmpeg.org">FFmpeg</a></li>
<li><a href="https://memcpy.io">Robert Foss</a> - Contributor of the <a href="https://en.wikipedia.org/wiki/Linux_kernel">Linux Kernel</a></li>
<li><a href="https://github.com/niccokunzmann">Nicco Kunzmann</a> - Mentor of <a href="https://fossasia.org">FOSSASIA</a></li>
</ul>
<p><strong>I Love Free Software Day was first introduced by the Free Software Foundation Europe. Find more details <a href="https://fsfe.org/campaigns/ilovefs/index.en.html">here</a>.</strong></p>
<p><img alt="ilfs" src="https://engineering.zalando.com/posts/2019/02/ilfs.png"></p>
<h2>More reading</h2>
<ul>
<li>
<p><a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/258359262">I Love Free Software Day</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/docs/reports/2019/january-2019/">Zalando Open Source: 2018 Year End Report</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/docs">Zalando Open Source Documentation</a></p>
</li>
<li>
<p><a href="https://opensource.zalando.com/tech-radar/">The Tech Radar: Zalando selection of technology choices</a></p>
</li>
</ul>Defining a company policy to handle harassment in open source2019-02-04T00:00:00+01:002019-02-04T00:00:00+01:00Per Plougtag:engineering.zalando.com,2019-02-04:/posts/2019/02/open-source-harassment-policy.html<p>We're sharing our guidance on Open Source harassment</p><h3>Open Source Participation</h3>
<p>When you as a Zalando employee engage in open source communities as part of your work, you will interact with the wider open source community outside Zalando. This is generally a good experience, and collaborating with developers of many different types and backgrounds is a positive input to your personal development.</p>
<p>However, there is also a small risk of encountering negative or even abusive behavior from community members when you act as an open source contributor or maintainer.</p>
<p>As an employer encouraging open source participation, we have decided to devise a <a href="https://opensource.zalando.com/docs/resources/harassment-policy/">policy</a> for how we as a company can support our employees in case of harassment.</p>
<h3>Statistics on harassment in open source</h3>
<p>An extensive <a href="https://opensourcesurvey.org/2017/">survey by Github</a> in 2017 showed that nearly one out of five respondents have experienced negative behavior personally and 50% have witnessed it directed at others - fortunately, outright harassment is much less likely, with 14% witnessing it and 3% experiencing it personally.</p>
<p>Witnessing and experiencing behavior such as name-calling, stereotyping and outright harassment can have a big negative impact on people's desire to be part of open source communities, especially for women and ethnic or sexual minorities who are already underrepresented in the open source world (3% female, 16% ethnic minority, 7% sexual minority).</p>
<p>So the open source community sees an underrepresentation of minorities, and those who do participate risk encountering hostile behavior. Is the risk of harassment big? No - generally speaking the risk is low, but the impact of potential harassment is very real.</p>
<p>As an industry we must prioritize the topic of diversity in open source. Abusive behavior should not be tolerated, and in case it does happen, companies should be ready to support their employees in dealing with it.</p>
<h3>Supporting employee participation</h3>
<p>As an employer, Zalando encourages its employees to take active part in open source development. Developers are granted time to maintain the <a href="https://github.com/zalando-incubator">projects we release</a> and to contribute upstream to <a href="https://opensource.zalando.com/tech-radar/">projects which are of strategic importance</a> to Zalando. We as a company therefore have an obligation to ensure that we support employees who engage in open source on Zalandos behalf.</p>
<p>Support isn't just about granting time and resources for open source development, support is also understanding the potential risk employees face doing open source and to be ready to offer legal and HR guidance and understanding to employees in the event of harassment.</p>
<p>It is with this mindset that we have put together <a href="https://opensource.zalando.com/docs/resources/harassment-policy/">a formal policy for dealing with harassment in open source</a> for our maintainers and contributors, a policy which employees can use to determine where inside Zalando they can find help to deal with such behavior and also to clarify what they can expect from Legal and HR.</p>
<h3>The policy</h3>
<p>We have divided the policy into 2 parts: proactive and reactive measures.</p>
<p>First of all: <strong>proactively</strong>, we recommend that employees only engage with projects that have a code of conduct in place, and we enforce that all new projects released by Zalando include a code of conduct as part of the <a href="https://github.com/zalando-incubator/new-project">boilerplate files</a> we provide. As part of our internal mandatory training for open source maintainers, and during onboarding of new employees, we also make our expectations very clear: in case of behavior in breach of the code of conduct, you are expected to enforce the code or ask the open source team for help on how to act.</p>
<p>Secondly, if an employee does need guidance, we <strong>reactively</strong> provide the following options:</p>
<ul>
<li>P&O (Zalando HR) can guide you on how to react to abusive behavior and help you determine if legal action is required. Talk to your lead if you need assistance, or reach out directly to the open source team.</li>
<li>The open source team will assist you in reporting the abuse to the responsible platform owner (such as Github)</li>
<li>Zalando legal will provide legal guidance in case such is required</li>
<li>If it is established that there is a need to report the incident to law enforcement, P&O (Zalando HR) and Zalando legal will assist you in collecting evidence and filing a report</li>
</ul>
<h3>A small step forward</h3>
<p>While policies will never solve the root cause, they are a step in the right direction. We believe in equal opportunity and access to the world of open source. We believe open source is important, not just to tech companies but to society as a whole, and we must all do what we can to ensure that the communities building the software that we all rely on are inclusive and safe for everyone to be part of.</p>The Product Playbook2019-01-31T00:00:00+01:002019-01-31T00:00:00+01:00Enzo Avigotag:engineering.zalando.com,2019-01-31:/posts/2019/01/product-playbook.html<p>Shared language and visualizing to deliver great products</p><h3>Shared language and visualizing to deliver great products</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f7fbfb0f4570eb7cee215c7ad702db9826993d01_ashton-clark-424090-unsplash.jpg?auto=compress,format"></p>
<p>Football is an environment with changing variables that players and coaches need to react to. Teams attempt to move the
ball down the field by running or passing in a set number of plays.</p>
<p>If you’ve ever watched a football game, you will see coaches holding a subset of plays from the coach’s playbook they
think may work for the game they are playing. This lets them make decisions in the moment. A coach may have 1,000 plays
in the playbook, but will only use a fraction in a game situation. And each team may have a different playbook.</p>
<p>At Zalando, we came across the idea of creating playbooks for building products in a <a href="https://medium.com/great-products-dont-happen-by-accident/great-products-dont-happen-by-accident-f46323d8ad94">great
article</a>
by <a href="https://medium.com/u/ab13b5676c1a">Jon Lax</a>. We also spotted the nice application of it at
<a href="https://productcoalition.com/the-typeform-product-playbook-49e1a5cc3a08">Typeform</a>.</p>
<p><strong>What is “a play”?</strong>
A play is an agreed-upon set of actions the team takes in a given situation. When the coach says “let’s run Statue of
Liberty Buck Sweep”, everyone knows what that means and what they need to do to execute that play.
A playbook is the collected knowledge of a coach or team on “HOW they do what they do”.</p>
<p>It inspired us to make a playbook of how we build products — how we go from identifying value opportunities to delivering
solutions, then iterating on them or dropping them.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4d6d28a339725e35a6cbe4dc77b564726958acd2_1_txx_iiegwfjhpa5b0a4clg.png?auto=compress,format"></p>
<p>Anything you do that has some repeatable action is a play.</p>
<p>Visualizing our product development like this helps us highlight the emphasis we put on keeping things simple. It helps
demystify product work for the people involved, pushes them forward, and leads to continuous improvement.</p>
<p>Equally important, it forces us to name our plays, which helps create a shared language:</p>
<ul>
<li>We define each play's name as clearly as we can to make sure everyone understands what it means. A play
called MVP could have a lot of meanings.</li>
<li>Clarify the situations when to run a play. While most plays could be run at any point in a product’s life cycle, each
is most effective in a certain situation: big bets require extra scoping effort, quick wins go straight to
design kickoff, spikes are recurrent and ensure continuous discovery, and running loads of A/B tests in a
pre-product-market-fit situation may not be best for us.</li>
<li>Also, why is this play the right one to deliver value to the team?</li>
</ul>
<p>Let’s take a step back, and go through the playbook pillars:</p>
<ul>
<li>The 4Ds</li>
<li>Getting real</li>
<li>The 50% rule</li>
<li>Learning loops</li>
</ul>
<p><strong>🖖 The 4Ds
</strong>Maybe you believe the Customer Journey map method is best, or the Double Diamond, the Hooked model, six-week cycles
or the Lean Canvas.</p>
<p>It doesn’t matter.</p>
<p>Plays can be grouped any way you want. Simply organize your plays to map onto each of the phases.</p>
<p>At Zalando we commonly use the <a href="https://medium.com/zalando-design/how-product-designers-and-data-scientists-can-work-together-18c568baeaf7">4Ds
framework</a>:
<strong>Discover, Define, Design, Deliver.</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/632f9b8e42d9c81688b07c5b11d1206b7e7a69ff_1_wsydoswaib6y1ynb6ouk2g.png?auto=compress,format"></p>
<p><em>Every team member contributes to “Discover” which leads to richer ideas and involvement.</em></p>
<p>The simplicity of the framework helps us ship early to the customer, the only validation we ultimately seek. It lets
us develop a great customer experience while ensuring business impact.</p>
<p><strong>📱 Getting real</strong></p>
<p><em>This mantra is inspired by the book <a href="https://basecamp.com/books/Getting%20Real.pdf">Getting Real</a> by 37signals.</em></p>
<p>Getting Real is about skipping all the stuff that represents real (charts, graphs, boxes, arrows, schematics,
wireframes, etc.) and actually building the real thing.</p>
<p>Getting real is less. Less mass, less software, less features, less paperwork, less of everything that’s not
essential.</p>
<p>Getting Real is staying small and being agile.</p>
<p><strong>❌ Things we don’t do:</strong></p>
<ul>
<li>Timelines that take months, version numbers, roadmaps that predict the perfect future</li>
<li>Functional specs, scalability debates</li>
<li>Endless preference options</li>
<li>Proprietary data formats</li>
<li>The “need” to hire dozens of employees</li>
<li>Asking users hypothetical questions; instead we ask them to complete tasks</li>
</ul>
<p><strong>✅ Things we do:</strong></p>
<ul>
<li>Fewer meetings, fewer abstractions and fewer promises</li>
<li>To launch on time and on budget, we avoid throwing more time or money at a problem; instead we scale back the scope</li>
<li>It’s better to make half a product than a half-assed product</li>
<li>“Just-in-time” thinking</li>
<li>Multi-tasking team members</li>
<li>An open culture that makes it easy to admit mistakes</li>
<li>Basic documentation which makes clear what we do and includes people</li>
<li>Dead simple prototyping</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite%2F6bc3537a-1b23-4184-866d-7d133b6b4a53_1_pd7j5_vzgujhboxz9j7caa.gif?auto=compress,format"></p>
<p><em>🎨 Prototypes often start in a notebook</em></p>
<p>Overall, less mass lets you change direction quickly. You can react and evolve. You can focus on the good ideas and
drop the bad ones.</p>
<p><strong>🌗 The 50% rule
</strong>We believe that for any business to succeed, you’ll need to achieve 3 things:</p>
<ul>
<li>a viable product/service</li>
<li>a large enough market</li>
<li><em>and</em> a way to reach your customers</li>
</ul>
<p>As described by <a href="https://medium.com/u/e1a76f1f570">Gabriel Weinberg</a> in
<a href="https://www.amazon.com/Traction-Startup-Achieve-Explosive-Customer/dp/1591848369">Traction</a>:
startups often spend most of their resources developing their products;
by the time they realize they need to get more customers and try to ramp up their sales and marketing efforts, they’ve run
out of money.</p>
<p>This is why, from the outset, we spend 50% of our time on product development and 50% on traction development. We can’t
predict which traction channels will work; the only way is to test them.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/67bcf7ad6b7178c0d54f08891c2991f6fdf487d0_1_n722l-zfnh_ornk3ywxasq.png?auto=compress,format"></p>
<p>To keep to the 50% rule, we share ‘simple’ documentation about what we are planning. This ensures alignment and makes it
clear what we are trying to achieve.</p>
<p><strong>💫 Learning loops
</strong>At the core of our product DNA is collecting and sharing learnings. The 4Ds cycles foster learning inside the team
and reinforce our plays.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cf3f1fa01e1332830498c73d6eaf1c284ae037a8_1_soaxmnzkafifdzxkym7sig.png?auto=compress,format"></p>
<p><em>4Ds cycles are our “learning loops”</em></p>
<p>To ensure learnings circulate, we have three (internally) public initiatives:</p>
<ul>
<li>A team newsletter to highlight the achievements, but also the failures, from the past six weeks</li>
<li>A team website where features and A/B test hypotheses and results are documented</li>
<li>Real-time funnels to monitor the impact of each change, be it traction or product</li>
</ul>
<p>Learning loops are what keep us ahead of competitors. They enable us to iterate or pivot based not only on instinct but
also on data. They ensure we ultimately ship in the right direction 🚢</p>
<p><strong>Conclusion</strong></p>
<ul>
<li>The product playbook is a powerful way of explaining our underlying thinking.</li>
<li>Thinking in terms of a playbook provides a shared language and visualizes how we do things. It crystallizes a common
understanding of how we build products.</li>
<li>It allows us to embrace continual improvement: we remove old plays and continuously add new ones through learning
loops.</li>
<li>It ensures our team dynamic: we look in the same direction and move forward against clear business goals.</li>
<li>To an extent, it helped to build our relationships.</li>
</ul>
<p><em>If you want to build that culture too, we’re looking for talented people to join Distributed Commerce! For details,</em>
<a href="https://jobs.zalando.com/de/jobs/?gh_src=4n3gxh1&search=%22distributed+commerce%22">check out our job page</a> ❤️</p>
<p><strong>The product playbook
<a href="https://docs.google.com/spreadsheets/d/12zHnlqWzhTnxZJ8R1T52MqqjaoRx_qkKcGjpBd9T8zo/edit?usp=sharing">template</a>. It’s
yours to make a copy and adapt it 👌</strong></p>Nakadi Goes to FOSDEM2019-01-29T00:00:00+01:002019-01-29T00:00:00+01:00Lionel Montrieuxtag:engineering.zalando.com,2019-01-29:/posts/2019/01/nakadi-goes-to-fosdem.html<p>Meet us at FOSDEM to get to know more about Nakadi</p><p><a href="https://nakadi.io"><strong>Nakadi</strong></a> is Zalando’s open source event streaming platform. It is based on Apache Kafka. It started as a simple HTTP proxy, providing a REST interface to publish and consume JSON messages. It quickly evolved, with the addition of schema validation and evolution, self-service authorization, a subscription API for easy consumption, deep integration with Zalando’s infrastructure, a SQL-over-streams engine, and much more. It has now become a real platform for event streaming, and plays an essential role in Zalando’s architecture.</p>
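The REST interface mentioned above boils down to a single HTTP call per batch of events. Here is a minimal sketch in JavaScript; the base URL, event type name, payload fields and token are invented placeholders (the `/event-types/{name}/events` path mirrors Nakadi's documented API, but treat the details as illustrative rather than authoritative):

```javascript
// Sketch: build the HTTP request for publishing a batch of JSON events to
// Nakadi. All concrete values below (URL, event type, token) are placeholders.
function buildPublishRequest(baseUrl, eventType, events, token) {
  return {
    url: `${baseUrl}/event-types/${encodeURIComponent(eventType)}/events`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      // Nakadi accepts an array of events per publish call.
      body: JSON.stringify(events),
    },
  };
}

// Usage: the request object would be sent with fetch(req.url, req.options).
const req = buildPublishRequest(
  'https://nakadi.example.org',
  'order.created',
  [{ order_number: '42' }],
  'TOKEN'
);
```

In practice you would use one of the community client libraries mentioned below rather than hand-rolling HTTP calls, but the wire format stays this simple.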
<p>Nakadi is meant to be simple to use, and self-service. With <a href="https://github.com/zalando-nakadi/nakadi-ui"><strong>Nakadi-UI</strong></a>, our open source web interface, users can create and manage resources such as event types, subscriptions, and SQL queries, by themselves. They can even inspect the contents of their event types, publish events, and get access to monitoring and alerting tools so they can keep on top of the health of their streams. Nakadi-UI is written in Elm, and it is probably one of the largest open source projects in that language.</p>
<p><img alt="Nakadi UI" src="https://engineering.zalando.com/posts/2019/01/nakadi-ui.png#center"></p>
<figcaption style="text-align:center">Fig. 1: A view of an event type and its schema in Nakadi-UI</figcaption>
<p><br/>
The Nakadi community has come up with a collection of great client libraries for the most used languages inside Zalando - Java, Scala, Python, and Golang. You can find Nakadi-UI, and all the community projects on <a href="https://github.com/zalando-nakadi">https://github.com/zalando-nakadi</a>. And Nakadi, its code and documentation, on <a href="https://nakadi.io">https://nakadi.io</a>. Head to the Nakadi-UI repository to get started right away with Docker Compose: you’ll get a local deployment of Nakadi and Nakadi-UI with all their dependencies to play with.</p>
<p>At Zalando, we have been running Nakadi in production for over 3 years. These days, it handles over 100 TB of data every day. It is used by over a hundred teams daily, yet it is entirely maintained by a small team of 8 engineers. Not only do we develop and maintain Nakadi, but we also operate Zalando’s internal deployments, take care of operations, user support, 24x7 incident response, documentation, and much more.</p>
<p>This Sunday, at <a href="https://fosdem.org"><strong>FOSDEM</strong></a>, we will show how we manage to do all this - and still find time to write code. Join us at <a href="https://fosdem.org/2019/schedule/event/nakadi/">12:15</a> in the <a href="https://fosdem.org/2019/schedule/track/hpc,_big_data_and_data_science/">HPC, Big Data, and Data Science devroom</a> for our talk - or grab us in the hallway track during the weekend!</p>A Day in the Life of a Frontend Engineer at Zalando2019-01-24T00:00:00+01:002019-01-24T00:00:00+01:00Cristiano Correiatag:engineering.zalando.com,2019-01-24:/posts/2019/01/frontend-engineer-zalando.html<p>How we work at Zalando</p><p>You’ve probably never had the same day twice at your current job. At Zalando it’s no different. Here, it not only
depends on the product you're currently working on but also on your peers.</p>
<p>Actually, what's expected from a frontend engineer can vary according to a company's philosophy or your own previous
experience: a frontend engineer is often seen as a Swiss army knife, when in reality at Zalando, for example, we
see them as masters of their trade.</p>
<p>If you're considering joining us as a frontend engineer, beware that a day in the life of a frontend engineer for us
usually means:</p>
<p><strong>…BEING A PROBLEM SOLVER / WEARING MULTIPLE HATS</strong>
First and foremost, you're going to be asked on a daily basis to come up with solutions. Topics change quite often since
a lot is asked: from defining data models and structuring APIs together with the backend engineers to challenging the
user interfaces defined by the design team. A frontend engineer's day can be a bit overwhelming at first, but there’s nothing
to be done but to take a deep breath and get your hands dirty.</p>
<p>Your focus is always going to be the user, which means that you'll have users on your mind every day. It's expected from
a frontend engineer to have good UX notions and to always deliver the best experience to our customers.</p>
<p><strong>…SPEAKING JAVASCRIPT ALL DAY LONG</strong>
Discussing JavaScript is basically what we do constantly.</p>
<p>Nobody knows JavaScript from A to Z, but since it is a technology that changes at the speed of light, being on top of it
is quite important, and it is quite healthy to share knowledge amongst colleagues.</p>
<p>We heavily rely on frameworks and libraries at Zalando (mainly React, but you can always encounter other things like
Angular, Vue or Polymer; if you're curious about our stack, check out our <a href="https://opensource.zalando.com/tech-radar/">Tech
Radar</a>), and we do use other technologies for some explicit typing (like
TypeScript or Flow). However, what we value most is:</p>
<ul>
<li>the knowledge of the language itself;</li>
<li>its core functionality;</li>
<li>its asynchronous/synchronous nature;</li>
<li>its browser APIs.</li>
</ul>
<p>We also take some time to consider what's best for the products: "Do I really need a library for this or do I know a
better solution?", "Is this piece of code performant?" These are questions we ask ourselves every day.</p>
<p>Not being afraid of trying new technologies and new ways of implementing the same thing is also part of the job: It
takes a lot of experience to understand that a Senior Engineer is not the one that writes the most complex code but the
one that always writes the simplest instead!</p>
<p><strong>…WRITING THE BEST TEMPLATES</strong>
On a day-to-day basis, we know that a line of CSS can save quite a few lines of JavaScript, so we take our templating very
seriously.</p>
<p>We take the time to make sure our HTML makes sense semantically, as well as ensuring it is accessible to all of our
users and clear to any colleague who may lay eyes upon it. We work on component-based projects, so styling might
get overlooked or might not even be needed, but we do care about clean and performant code, so we see CSS as a vital part
of achieving it.</p>
<p><strong>…ALWAYS BEING A STEP AHEAD</strong>
Being a frontend engineer is very demanding learning-wise: so we allocate a bit of our time to always keep pushing
forward and knowing what's coming.</p>
<p><strong>…HAVING QUALITY AND PERFORMANCE AS TOP PRIORITIES</strong>
Browsers are tricky. We know that we have to allocate some time in order to make sure everything is working correctly
and as intended. Debugging comes as second nature: sometimes it's just a GraphQL mutation or a PUT request that didn't
work, but it's part of our job to know where to look for the mistake and figure out a proper solution.</p>
<p>Non-Functional Requirements are also there to be defended and challenged and we constantly need to figure out the most
efficient ways to achieve them.</p>
<p>Since we use open source technologies, we need to evaluate the risk of encountering vulnerabilities in our products
constantly. Every action our code allows (especially when communicating with backend services) is a potential security
problem, so we do what we can to prevent something like XSS or DOM manipulation from happening. As mentioned before, we
always have the best interests of our customers in mind, and that includes their data and assets.</p>
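As a concrete illustration of that output-escaping mindset, here is a generic sketch (not Zalando-specific code) of the kind of encoding that guards against XSS when user-provided data ends up in markup:

```javascript
// Minimal output-escaping helper: the core idea behind guarding against XSS
// when user data is inserted into HTML. Frameworks such as React do this
// escaping by default; this just illustrates the principle.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;') // must run first, or escapes would be double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// When no markup is needed at all, assigning to textContent instead of
// innerHTML sidesteps the problem entirely:
//   element.textContent = userInput;
```

Escaping at the point of output (rather than trying to sanitize input) is the usual recommendation, because the same string can be dangerous in one context and harmless in another.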
<p>Other than that, another part of our day-to-day is dedicated to preventing something from going wrong. Unit testing is
part of our definition of done. We are fans of UI testing/E2E tests and we have no problem in testing and verifying each
other's work.</p>
<p><strong>...NOT BEING AFRAID TO GO OPERATIONAL</strong>
As DevOps teams, we perform quite a lot of operational work (even if it's just taking care of a deployment). We don't
exactly expect all of us to be AWS or Kubernetes experts, but we do our best to train each other on all we need, so that we
can be as independent as possible.</p>
<p>We set up projects, from the simplest one to a complete and robust one, so pretty much all of us frontenders at Zalando
are familiar with tools like Webpack or Babel.</p>
<p>We also value Continuous Integration and Continuous Delivery and that's always a concern on a daily basis.</p>
<p><strong>…HAVING A CONSTANT AGILE MINDSET</strong>
Having worked with any Agile Methodology before is pretty important. It does not matter whether it was Scrum, Kanban or
a Tribe Model. What is important is that we work as a team and we place the team’s needs above our egos.</p>
<p>We work on scalable projects with lots of dependencies and external parties, so it's quite important to adapt the ways
of working to deliver the most value possible. We do it the Agile way.</p>
<p><strong>…BEING A MENTOR</strong>
Knowledge sharing. Someone next to you is always eager to learn more and another part of our day is dedicated to share
whatever we know is worth sharing.</p>
<p><strong>…BEING A COMMUNITY CONTRIBUTOR</strong>
Knowledge sharing: community version. However we can contribute, we are encouraged to do so. It doesn't matter if it is
open source projects, speaking at conferences or organizing meetups; we do our best to help the surrounding
community.</p>
<p><strong>…BEING INTERNATIONAL IS KIND OF MANDATORY</strong>
We have over 100 nationalities at Zalando, so English is a big part of our day, irrespective of which office
or country we work in. Embracing different cultures is one of the most rewarding aspects of having such a diverse team,
and it is a lot of fun, from the dogs running around on some floors to the Nerf gun wars.</p>
<p><strong>So… A DAY IN THE LIFE OF A FRONTEND ENGINEER FOR US MEANS:</strong>
Arriving at the office with an open mind; knowing that not everything is always going to be perfect and easy, but
striving for continuous improvement and getting better at what we do.</p>
<p>We are a united team that enjoys the journey of being on the same boat and solving problems together. On an individual
level, always being ready to share knowledge, to learn from others, as well as being responsible and accountable for the
amazing work that you can do, are some of the most important qualities that we hope new potential team members would
have.</p>
<p>Join Cristiano, as a Frontend Engineer in our Lisbon Tech Hub: <a href="https://zln.do/2FFE5v8">https://zln.do/2FFE5v8</a></p>The Magic Coaching Wand2019-01-10T00:00:00+01:002019-01-10T00:00:00+01:00Tobias Leonhardttag:engineering.zalando.com,2019-01-10:/posts/2019/01/magic-coaching-wand.html<p>How the Zalando Personalization Unit improved with a diagnostic</p><h3><strong>How the Zalando Personalization Unit improved with a diagnostic</strong></h3>
<p>In our coaching work, doing diagnostics can already create huge improvements without a lot of action on our part.
Working at scale (Zalando has around 150 tech teams), this helps create an impact on the whole organisation.</p>
<p>In this blog post, I will share the story of a diagnostic done in a unit of seven machine learning and data scientist
teams (ML/DS) in Berlin, Helsinki and Dublin. Key points include:</p>
<ul>
<li>a diagnostic is an improvement on its own: what gets measured gets improved, be it that the unit becomes aware of
blind spots or they get confirmation from an expert.</li>
<li>you can initiate improvements at scale if you do the diagnostic co-creatively and openly, having everybody in the
unit agree on the overall situation using tools like “Lean Change Canvas.”</li>
<li>systemic problems are visible; affecting local teams and roles, but can not be solved there. They need to be tackled
at a systemic level.</li>
</ul>
<p>What follows is a personal experience, and how sometimes solutions are not obvious and have to be found by following a
path that only emerges as you walk it.</p>
<h3>The universal key that did not unlock the door</h3>
<p>On the request of the Dedicated Owner (DO) of the unit we did interviews to get multiple perspectives on the “problem”.
We talked to the DO, the Leads, Senior Engineers, Data Scientists, Product Managers, Producers, UX… This is our
universal key of request clarification to differentiate symptoms from root causes and to find the systemic pattern.</p>
<p>Normally request clarifications unveil the path to a solution. This time it failed.</p>
<p>We talked to motivated, honest and open leads that really want to make a difference and to support and grow their colleagues.
We met a DO that gives freedom and support to his leads and teams. We found really passionate Data Scientists, Engineers
and Product Managers. All of them were aware of the problems they collectively faced and what was causing them.</p>
<p>Why was an empowered group like this – with willingness and skill – not able to solve their own problems?</p>
<h3>“You can not understand a system until you try to change it.”</h3>
<p>We started interacting with the system. Which of our solutions would it adopt and which ones would it refuse? Which
problem would the system allow to be solved? We tried a one-day leadership training, a three-day agile workshop with two
teams, a session about agile at scale, story splitting, and a few more topics. We got a lot of good feedback for this
work. It caused a lot of local optimizations and improvements.</p>
<p>But listening to the people, it felt like the “problems” stayed the same. The mood of the people hardly changed. Are we as
humans so used to having problems that we refuse to let them go? That we actually miss them when they are gone?</p>
<h3>The big picture made small</h3>
<p>What are we not seeing? We tried a new approach. Based on “ <a href="https://leanpub.com/leanchangemethod">Jeff Anderson's Lean
Change</a>,” we created a simple canvas: Urgencies - Vision - Next Steps. Rooting it
in a more complex framework would allow us to scale the canvas later into a more powerful collaborative change board.</p>
<p>This time we asked the entire team to fill the canvases within a coaching session. The outcome was, once again,
unbelievable. All teams had a great vision of how they want to work. They know precisely which next steps they can take
to improve.</p>
<h3>The elephant in the room</h3>
<p>We have great teams. They have leaders that ask for and support self-driven improvements.</p>
<p>Why don’t the teams “just do it”?</p>
<p>We asked the Dedicated Owner for a meeting. We prepared a room and then asked the Dedicated Owner to go pick the canvases
(urgencies, vision, next steps) from the open team spaces and pin them up in this room. Starting the meeting with this
physical and transparent act showed the Dedicated Owner taking care of the problems.</p>
<h3>The Breakthrough</h3>
<p>The magic happened in the meeting when we had the canvases from all seven teams on one wall in the session.</p>
<p>The Dedicated Owner started discovering the pattern. First the smaller, local patterns, then the systemic pattern that
seems to affect every team to different degrees but cannot be linked to a single team or role. These are the patterns
you can only see when you take a step back and look at the whole picture.</p>
<p>It wasn’t clear which role should drive which topic or improvement, when, and for what reason. We called this the
“Ownership Pattern.” We also saw that we were jumping from having an idea or a goal right into delivering on it. We
called this the “Product Pattern.”</p>
<p>On the local level, the responsibilities of who owned what and who did what seemed pretty clear. For topics “in between”
(i.e. two roles), “across” (i.e. several teams), as well as “through” (i.e. certain processes), there was a lot more
uncertainty.</p>
<p>Why? What happened in the past that we now have this pattern?</p>
<p>Zalando introduced further team autonomy and dedicated ownership. Zalando became successful because of its ability to
execute and deliver new products very quickly. What is the effect of this on the culture of this area? Are there even
more organisational or cultural influences? We were deep-diving into Zalando's past.</p>
<h3>The Magic Happens</h3>
<p>When we understood the origins we could understand the pattern and the effects. Now, could we initiate change by telling
everyone what insights we found? No.</p>
<p>The magic happens when everyone is having their own, <em>“Aha!”</em> moment, just like the Dedicated Owner in the meeting
before.</p>
<p>The next weeks we invested in creating these <em>“Aha!”</em> moments across the whole department, sharing and aligning the
insights in a self-exploratory way. We also made sure that no one felt blamed or hurt by the insights i.e. about their
role, but everyone had a shared understanding so we could jointly move on.</p>
<p>It was during this time that we suddenly saw improvements happening in the unit without us triggering them: new boards, visual
backlogs, canvases, roadmaps, UX sketches, goal alignments started popping up on the walls. The teams were acting on
their next steps realizing their visions.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/67504ff1b73ac0b381c5c32a64e60bc4c30ec630_screen-shot-2018-12-19-at-4.29.11-pm.png?auto=compress,format"></p>
<p>It was not us coaches, but the Senior Data Scientists, Engineers, Producers, Leads, Product Specialists, Product Owner,
UX, … everybody moving a piece and making an improvement.</p>
<p>We coaches learned that co-creative and open dialogue with personal moments can unlock the door to continuous
improvement. We learned that an outside perspective and self-reflection – without blaming or hurting anyone – are needed
from time to time to unstick and move forward as a unit.</p>
<p>On your next request, instead of creating an improvement plan, you can try an open and co-creative diagnostic and – with
a bit of good fortune – create self-engaged and sustainable improvement at the scale of a unit.</p>
<p><em><a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">Join us</a> at Zalando to initiate improvement and progress in our
units.</em></p>Open Source: December Review - Patroni, Machine Learning Meetup and more2019-01-07T00:00:00+01:002019-01-07T00:00:00+01:00Hong Phuc Dangtag:engineering.zalando.com,2019-01-07:/posts/2019/01/oss-december-update.html<p>This is a recap of open source activities and development at Zalando in the month of December.</p><h1>Project Highlights</h1>
<p><a href="https://github.com/zalando/patroni"><strong>Patroni</strong></a> - one of the most well-known open source projects of Zalando is now deployed as <a href="https://about.gitlab.com/2018/12/05/availability-postgres-patroni/">the Postgres Failover Manager on GitLab.com</a>. Patroni was created a few years back when we needed automatic failover to manage hundreds of in-house clusters. The project started as a fork of <a href="https://github.com/compose/governor">Compose Governor</a>; Patroni quickly overtook the original version and has become one of the most widely used templates for PostgreSQL high availability these days. It is also adopted by <a href="https://www.ibm.com/blogs/bluemix/2018/09/an-update-on-the-updating-of-ibm-cloud-compose-for-postgresql">IBM Cloud</a>. Our team at Zalando published a <a href="https://patroni.readthedocs.io/en/latest/">searchable documentation site</a> to help users get started easily. Do check it out and <a href="https://github.com/zalando/patroni#community">join the Patroni community</a> if you have any questions.</p>
<p>Beside Patroni, Zalando also released other PostgreSQL driven projects such as:</p>
<ul>
<li>
<p><a href="https://github.com/zalando/spilo">Spilo</a>, a Docker image that provides PostgreSQL and <a href="https://github.com/zalando/patroni">Patroni</a> bundled together. Spilo makes it simpler to deploy scalable Postgres clusters in a Kubernetes environment, and also to do maintenance tasks.</p>
</li>
<li>
<p><a href="https://github.com/zalando/PGObserver">PGObserver</a>, a battle-tested monitoring solution for PostgreSQL databases. The project was originally developed to monitor performance metrics of Zalando's different PostgreSQL clusters.</p>
</li>
<li>
<p><a href="https://github.com/zalando-incubator/postgres-operator">Postgres-operator</a> is used internally to manage over 500 Postgres clusters across a large number of Kubernetes installations. Learn more about the current development of this project <a href="https://engineering.zalando.com/posts/2018/11/postgres-operator.html">here</a>.</p>
</li>
</ul>
<h1>Inside Zalando Open Source</h1>
<p><a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/256912495/"><strong>Machine Learning Meetup</strong></a> - the Zalando Open Source Guild hosted a Holiday Hack event which brought together 71 researchers, developers and people who are interested in the field of machine learning to share knowledge and try out open source frameworks and solutions developed by Zalando Research and Engineering Teams.</p>
<p>During the event Zalando Open Source Maintainers conducted talks and guided the attendees to complete multiple challenges and hands-on exercises on two projects 1) <a href="https://github.com/zalandoresearch/flair">Flair - a natural language processing library</a> and 2) <a href="https://github.com/zalando/connexion">Connexion - a Swagger/OpenAPI framework for Python</a>.</p>
<p><img alt="Holiday Hack 2018, Berlin" src="https://engineering.zalando.com/posts/2019/01/holidayhack1.jpg"></p>
<hr>
<p><a href="https://opensource.zalando.com/docs/reports/2019/january-2019/"><strong>Open Source 2018 Year End Report</strong></a> - it has been an amazing year for open source at Zalando: 25 new projects, 11,239 commits, and 5,000 pull requests, with 31% coming from non-employees. We have seen activity and contributions across Zalando repositories throughout the entire year, even in the busy holiday month of December. Click <a href="https://opensource.zalando.com/docs/reports/2019/january-2019/">here</a> to see the full report.</p>
<p><img alt="decembercontribution" src="https://engineering.zalando.com/posts/2019/01/december1.png"></p>
<h1>Zalando Open Source Around The World</h1>
<p><a href="https://events.linuxfoundation.org/events/kubecon-cloudnativecon-north-america-2018/"><strong>KubeCon, December 10 - 14</strong></a> <a href="https://de.linkedin.com/in/cyberdemn">Alexander Kukushkin</a>, Database Engineer at Zalando, delivered a talk on 'Building your own PostgreSQL-as-a-Service on Kubernetes'.</p>
<iframe width="600" height="375" src="https://www.youtube.com/embed/G8MnpkbhClc" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<hr>
<p><a href="https://media.ccc.de"><strong>35c3, December 27 - 30</strong></a> a number of Zalandos participated in the 35th Chaos Communication Congress (35C3) - the annual four-day conference on technology, society and utopia organised by the <a href="https://www.ccc.de/">Chaos Computer Club</a> (CCC). This was a great opportunity for us to meet and connect with developers, tech communities and hackerspaces across Germany and Europe. <a href="https://www.linkedin.com/in/hongphucdang">Hong Phuc Dang</a>, Zalando Open Source Team, had the honor to speak at the Podium on <a href="https://youtu.be/NDcwl3n47ak">Feminist Perspectives on Inclusive and Diverse Spaces and Communities</a> where she exchanged lessons learned and ideas with other panelists on how to create and sustain more diversity in the tech community.</p>
<p><img alt="35c3" src="https://engineering.zalando.com/posts/2019/01/35c31.jpg"></p>
<h1>More reading</h1>
<ul>
<li><a href="https://opensource.zalando.com/docs/reports/2019/january-2019/">Zalando Open Source: 2018 Year End Report</a></li>
<li><a href="https://opensource.zalando.com/docs">Zalando Open Source Documentation</a></li>
<li><a href="https://opensource.zalando.com/tech-radar/">The Tech Radar: Zalando selection of technology choices</a></li>
</ul>Front-End Micro Services2018-12-06T00:00:00+01:002018-12-06T00:00:00+01:00Jeremy Colintag:engineering.zalando.com,2018-12-06:/posts/2018/12/front-end-micro-services.html<p>Fragments: limitations, solutions and our approach</p><p>The “micro frontends” idea has been around for a while now, with great resources such as <a href="https://medium.com/@tomsoderlund/micro-frontends-a-microservice-approach-to-front-end-web-development-f325ebdadc16">this Tom Söderlund
article</a>,
which includes a list of current existing implementations.</p>
<p>In this article, I would like to take an in-depth look at the reference implementation using fragments: explain what it
tries to achieve, where it falls short and possible solutions to those limitations.</p>
<p>What are <strong>Fragments</strong> in the first place? They can be described as isolated pieces of your HTML page, built and served
by independent services (and usually teams) such as Header, Product, Search, etc.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/933f056972448facab5a63ded6571388f24783ff_html_page_fragments.png?auto=compress,format"></p>
<p><em>Example of an e-commerce website using different fragments to render a product page.</em></p>
<p>There are at least four benefits from typical micro services that fragments are trying to bring to the front end:</p>
<ul>
<li><strong>Ease of deployments with better isolation</strong></li>
<li><strong>Improved scalability with smaller pieces</strong></li>
<li><strong>Technological stack isolation with API integrations</strong></li>
<li><strong>Localized complexity with every piece easier to reason about</strong></li>
</ul>
<p>All of those usually lead to more autonomous and engaged teams with an improved DevOps culture.</p>
<p>The idea of fragments was made popular by the Zalando project, <a href="https://www.mosaic9.org/">Mosaic</a>. Many companies like
<a href="https://engineering.hellofresh.com/front-end-microservices-at-hellofresh-23978a611b87">HelloFresh</a> are also following
this approach.</p>
<p>Main implementations for fragments include:</p>
<ul>
<li>Zalando’s <a href="https://github.com/zalando/tailor">Tailor</a>, inspired by Facebook’s
<a href="https://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919/">BigPipe</a></li>
<li><a href="https://micro-frontends.org/">Web Components</a> using <a href="https://en.wikipedia.org/wiki/Server_Side_Includes">Server Side
Includes</a> (<a href="https://www.youtube.com/watch?v=dTW7eJsIHDg">Michael Geers
talk</a>)</li>
<li>Web server HTML <strong><a href="https://en.wikipedia.org/wiki/Transclusion">transclusion</a></strong> using <a href="https://en.wikipedia.org/wiki/Edge_Side_Includes">Edge Side
Includes</a> (<a href="https://www.youtube.com/watch?v=4KVOuQDIfmw">Gustaf Nilsson Kotte
talk</a>)</li>
</ul>
<p>These fragments-based solutions claim <strong>technological stack isolation</strong> but in practice all those fragments are only
running a single framework (often React), which is probably a good thing as client bundle size would otherwise have to
include different frameworks.</p>
<p>They do, however, achieve <strong>ease of deployments</strong> and <strong>improved scalability</strong>, and are easily <strong>server-side rendered</strong>. There is a small catch, though.</p>
<p>Like on the back-end side, a distributed architecture managed by different teams slowly leads to inconsistencies and
different ways of doing things. While it might not be such a big deal for back-end side systems, creating inconsistent
user interfaces and user experiences is an issue most customer-facing websites cannot ignore. The split of your UI
components pipelines also means more infrastructure work to build and ship them to production.</p>
<p>Of course there are solutions to mitigate this. Immowelt for example went for a <a href="https://github.com/ImmoweltGroup/create-react-microservice">front-end micro service
boilerplate</a>. The boilerplate includes an advanced setup of
Immowelt’s front-end stack: React, Redux, Universal rendering, etc. The advantage is to reduce the time to setup for a
new service, limit fragmentation, share common practices between teams but still keep flexibility.</p>
<p>Another solution exposed and detailed by
<a href="https://allegro.tech/2016/03/Managing-Frontend-in-the-microservices-architecture.html">Allegro</a> is to compose the HTML
page from the same high-level front-end components whose unit they call “Box” and to focus on sharing and reusing
components. In this context, the unit or “Box” declares its data dependency and can include other “Boxes.”</p>
<p>Zalando also identified those issues. The most important ones for us, as a company, are the inconsistent digital
experience, which undermines our brand proposition, and the high barrier to entry for contributions from other
teams, caused by the complete technological stack required to build a new fragment.</p>
<p>We are currently working on a replacement for Tailor (Zalando’s fragments based approach) which we call "Interface
Framework" — an architecture stack composed of the following components:</p>
<ul>
<li>Fashion Store API: GraphQL API aggregation layer</li>
<li>Renderers: self-contained pieces of code declaring their own data dependency and visual representation</li>
<li>Recommendation System: backend service which decides which renderers to display for page composition</li>
<li>Rendering Engine: backend service and client-side runtime orchestrating the view composition based on the data
returned by the recommendation engine</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/900efc9faecc86d3bb9f7ed06bd84041751bf0d1_interface-framework.png?auto=compress,format"></p>
<figcaption style="text-align:center">Interface framework</figcaption>
<p>Renderers are developed using Design Systems components in a mono-repository to ensure consistency. The previously
redundant fragment’s stacks are now all centralised within the rendering engine which leads to faster on-boarding and
reduced time to market for feature teams developing renderers.</p>
<p>This new architecture also enables dynamic view composition: at any point in the user journey, the data layer can choose
how the page should look for personalisation purposes. We also want our partners to be able to build renderers
themselves so that they can seamlessly integrate their content within our website.</p>
<p><strong>Update:</strong> See also our series on the details of the Interface Framework:</p>
<ul>
<li><a href="https://engineering.zalando.com/posts/2021/03/micro-frontends-part1.html">Micro Frontends: from Fragments to Renderers (Part 1)</a></li>
<li><a href="https://engineering.zalando.com/posts/2021/09/micro-frontends-part2.html">Micro Frontends: Deep Dive into Rendering Engine (Part 2)</a></li>
<li><a href="https://engineering.zalando.com/posts/2023/07/rendering-engine-tales-road-to-concurrent-react.html">Rendering Engine Tales: Road to Concurrent React</a></li>
</ul>
<hr>
<p><em>We always look for talented Engineers to join Zalando as a <a href="https://jobs.zalando.com/en/tech/jobs/?gh_src=gk03hq&search=frontend&filters%5Bcategories%5D%5B0%5D=Software%20Engineering%20-%20Frontend">Frontend Engineer</a>!</em></p>Open Source: November Review - Maintainer training, new releases and more2018-12-06T00:00:00+01:002018-12-06T00:00:00+01:00Hong Phuc Dangtag:engineering.zalando.com,2018-12-06:/posts/2018/12/oss-november-updates.html<p>This is a recap of open source activities and development at Zalando in the month of November.</p><h1>Project Highlights</h1>
<p><a href="https://github.com/kubernetes-incubator/external-dns"><strong>ExternalDNS version 0.5.9</strong></a> is ready for testing. This project allows you to control DNS records dynamically via Kubernetes resources in a DNS provider-agnostic way. ExternalDNS also successfully made its way to <a href="https://github.com/kubernetes-incubator">the Kubernetes Incubator</a>. Check out <a href="https://github.com/kubernetes-incubator/external-dns/releases/tag/v0.5.9">the list of changes in this new release.</a></p>
<p><a href="https://github.com/zalando-incubator"><strong>Zalando-Incubator</strong></a> welcomed two brand new open source projects: <a href="https://github.com/zalando-incubator/darty">1) Darty</a>, a data dependency manager for data science projects that helps share data across projects and control data versions, and <a href="https://github.com/zalando-incubator/opentracing-sqs-java">2) opentracing-sqs-java</a>, which, as the name suggests, is a Java utility library that simplifies the instrumentation of <a href="https://github.com/zalando-incubator/opentracing-sqs-java">SQS</a> messages with <a href="http://opentracing.io">OpenTracing</a>.</p>
<p><a href="https://github.com/zalando/skipper"><strong>Skipper</strong></a> announced another new release this month; 1,400 commits have been made since the project was first introduced in 2015. Skipper is an HTTP router and reverse proxy for service composition. It is designed to handle >300k HTTP route definitions with detailed lookup conditions, and flexible augmentation of the request flow with filters. This release includes a number of new features: apiMonitoring, an east-west service-to-service API gateway setup in Kubernetes, and automatic HTTP redirects in the Kubernetes ingress controller running on GCP.</p>
<h1>Inside Zalando Open Source</h1>
<p><a href="https://opensource.zalando.com/#os-goals"><strong>Maintainer training program is a work in progress.</strong></a> Early this month, the Open Source team began designing a new training course for our existing and aspiring Zalando project maintainers. While Zalando tech is well-known for doing open source in the open, we never stop exploring new ways to improve and scale up our projects across Zalando. This professional training initiative aims to enhance maintainers’ knowledge around adoption, compliance, community management and sustainability in open source, and thereby help them confidently take full ownership of their open source projects. The course is expected to launch in Q1 2019 and will cover the following topics:</p>
<ul>
<li>Introduction to a maintainer’s multiple roles</li>
<li>Open source adoption guidelines</li>
<li>Process to release open source</li>
<li>Compliance</li>
<li>Advocacy and stewardship</li>
<li>Mentorship and coaching</li>
</ul>
<p><a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/256912495/"><strong>Machine Learning meets Fashion.</strong></a> We are inviting researchers, scientists and anyone interested in the field of Machine Learning and AI to join our journey of re-imagining fashion. We believe that only by working together with the worldwide community can we bring our technologies and know-how to the next level. The Zalando Research team has released a number of publications on some of the most exciting research topics, such as Deep Learning, Computer Vision and Natural Language Processing, Large Scale Bayesian Inference, Reinforcement Learning and Causality. We are very proud to share our work with the community, having released 13 research projects as open source so far, most recently <a href="https://jobs.zalando.com/tech/blog/zalando-research-releases-flair/">Flair</a>, a natural language processing library.</p>
<iframe width="600" height="425" src="https://www.youtube.com/embed/bgDDfqB5iHM" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<figcaption style="text-align:center">Zalando Research Lab</figcaption>
<h1>Zalando Open Source Around The World</h1>
<ul>
<li><strong>HighLoad Moscow, November 8 - 9</strong> -- Zalando engineers participated in this conference to connect with local developers and Russian tech communities. We also gave three presentations, starting with <a href="https://www.linkedin.com/in/valgog">Valentine Gogichashvili</a>, our Head of Engineering, speaking about data engineering inside Zalando; <a href="https://twitter.com/try_except_">Henning Jacobs</a>, Head of Developer Productivity, who gave a lecture on <a href="https://www.slideshare.net/try_except_/optimizing-kubernetes-resource-requestslimits-for-costefficiency-and-latency-highload">‘Optimizing Kubernetes Resource Requests’</a>; and finally <a href="https://github.com/CyberDem0n">Alexander Kukushkin</a>, our Database Expert, who shared his experiences on ‘the migration of a 10 TB PostgreSQL Cluster to AWS’.</li>
</ul>
<p><img alt="Highload conference: Henning Jacobs (top), Valentine Gogichashvili (left), Alexander Kukushkin (right)" src="https://engineering.zalando.com/posts/2018/12/highload.jpg"></p>
<figcaption style="text-align:center">Henning Jacobs (top), Valentine Gogichashvili (left), Alexander Kukushkin (right)</figcaption>
<ul>
<li>
<p><strong>Open Source Diversity Meetup Berlin, November 20</strong> -- <a href="https://www.linkedin.com/in/hongphucdang/">Hong Phuc Dang</a> from the Zalando Open Source Team shared her story on the ‘what and why’ of starting her open source journey in the first place. At Zalando, we are working hard to ensure that inclusion and diversity are firmly embedded in our culture; several initiatives were introduced by the <a href="https://jobs.zalando.com/en/diversity">Diversity Guild</a>, such as Women In Leadership, Inclusive Language and Diversity Day. Moving forward, we are working on increasing diversity across our open source projects.</p>
</li>
<li>
<p><strong>CodeMotion Berlin, November 20 - 21</strong> -- <a href="https://twitter.com/therealpadams">Paul Adams</a>, Zalando Open Source Lead, talked about <a href="https://berlin2018.codemotionworld.com/talk-detail/?detail=10425">Adopting Open Source Best Practice for the Enterprise</a>, with specific examples and policies that he and his team implemented inside Zalando.</p>
</li>
</ul>
<h2>More reading</h2>
<ul>
<li><a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/256912495">Holiday Hack: Machine Learning Meets Fashion Meetup</a></li>
<li><a href="https://opensource.zalando.com/docs">Zalando Open Source Documentation</a></li>
<li><a href="https://opensource.zalando.com/tech-radar">The Tech Radar: Zalando selection of technology choices</a></li>
</ul>Tag-based Navigation of a Fashion Catalog2018-11-29T00:00:00+01:002018-11-29T00:00:00+01:00Paul O'Gradytag:engineering.zalando.com,2018-11-29:/posts/2018/11/exploring-fashion-catalog.html<p>Exploring the Zalando Assortment by Browsing a Product Similarity Graph</p>
<h3>Introduction</h3>
<p>As Europe's leading online fashion and lifestyle platform, Zalando is continually developing new features to enable our
customers to find the products they want. While the standard tools of Search, Categorization & Attribute Filtering are
par-for-the-course for purchasing items online, with an ever-expanding fashion assortment and an increase in the data
available to describe a product, this browsing experience is becoming more cumbersome and time-consuming, particularly
on mobile devices.</p>
<p>At Zalando's Fashion Insights Centre in Dublin, while keeping a focus on developing AI and Big Data driven products and
features in the medium term, we sometimes have time to explore new ideas with a longer-term vision, either through our
annual <em>Hackweek</em> (an internal week-long hackathon) or our <em>Slingshot</em> programme (an “intrapreneurship” program fostered
by Zalando's Innovation Lab). In this blog post we will share with you a project that has journeyed through, and
benefited from, both programmes, and present a new method for browsing an online fashion catalog using a <em>Product
Similarity Graph</em>.</p>
<h3>Product Similarity</h3>
<p>What do we mean when we say that two products are similar? Do we mean that the products are from the same fashion trend,
that they appear visually similar, or that they have a number of attributes in common, for example brand or product
type? In fact it can be all of these things, or a select few, summed up to create an overall similarity score between
two products, using all the data that makes sense for the task at hand.</p>
<p>When looking at the Zalando catalog in total, what does product similarity mean here? Well, it means calculating the
previously described similarity score for each product against all others, (sometimes referred to as a similarity
self-join) and storing the similarity scores in a suitable way. Typically, the scores are represented in a matrix
format, which leads to the construction of a <em>Product Similarity Matrix</em>, where each row contains the similarity scores
for one product against all others in the catalog, likewise for the columns since the matrix is <em>symmetric</em>.</p>
<p>Many of you will notice a potential pitfall as the catalog grows: as the number of products, <em>n</em>, increases,
the number of scores to be generated grows by <em>n</em> for every new product added to the catalog, so the
algorithmic complexity of generating a Product Similarity Matrix is <em>n</em>-squared, i.e., <em>O(n<sup>2</sup>)</em>. Depending
on the use case, the catalog size could be in the millions. To make this problem manageable we use distributed systems
and algorithms such as <a href="http://pyconie2016.pogrady.com/#18">Locality Sensitive Hashing</a>. However, we will spare
the details here; for now, just consider that our Product Similarity Matrix is big. We use the Product
Similarity Matrix within Zalando to provide a Product Similarity Service, which is currently used by our recommendation
team to tackle the cold-start problem associated with Collaborative Filtering.</p>
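<p>The article leaves Locality Sensitive Hashing as an aside, but the idea is easy to sketch. Below is a minimal, illustrative MinHash-based LSH in Python; all function names and parameters are our own and not part of Zalando's production system. Products whose signatures agree on at least one band of hash values land in the same bucket and become candidate pairs, so only candidates need exact similarity scoring, avoiding the full pairwise self-join.</p>

```python
import hashlib
import itertools
from collections import defaultdict

def minhash_signature(attrs, num_hashes=32):
    """MinHash signature of a set of attribute strings.

    Each of the num_hashes "hash functions" is simulated by salting a
    single hash (md5) with the function's index.
    """
    return [
        min(int(hashlib.md5(f"{i}:{a}".encode()).hexdigest(), 16) for a in attrs)
        for i in range(num_hashes)
    ]

def candidate_pairs(products, num_hashes=32, bands=8):
    """Banded LSH: split each signature into bands; products whose
    signatures agree on any full band share a bucket and become a
    candidate pair, replacing the O(n^2) all-pairs comparison.
    """
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for pid, attrs in products.items():
        sig = minhash_signature(attrs, num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(pid)
    return {
        pair
        for bucket in buckets.values() if len(bucket) > 1
        for pair in itertools.combinations(sorted(bucket), 2)
    }
```

<p>With 32 hash functions split into 8 bands of 4 rows, a pair with Jaccard similarity <em>s</em> matches a given band with probability roughly <em>s</em><sup>4</sup>, so highly similar pairs almost always collide somewhere while dissimilar pairs rarely do.</p>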
<h3>Product Similarity Graph</h3>
<p>The Product Similarity Matrix, due to its construction, can be easily interpreted as an <em>Adjacency Matrix</em>, which in
turn can be interpreted as a <em>Network Graph</em>, where products in the catalog are represented as nodes, and the similarity
relationship between products is represented by connections between nodes. Network graphs are interesting mathematical
objects and appear in a number of different areas of Computer Science, such as the study of online social networks (the
social graph) and the study of internet traffic (communications graph). Here we are interested in the relationships
between products in a fashion catalog, and we use a network graph to organise and store Zalando’s product data.</p>
<p>Below, we present a visualization of a Product Similarity Graph for a small number of the products sold by Zalando. As
is typical for graph visualizations we can see a rich structure emerge, where clusters of very similar products form.
These clusters correspond to high-level product attributes such as product type, with similar products being close
together, e.g., low shoes are clustered close to boots but far away from trousers. Within clusters, other more detailed
low-level attributes such as color, materials and styles create a distinction between products. Other connections
between clusters are far apart, which indicate a weak similarity relationship between products. However, these
connections offer an opportunity to browse and explore other parts of the graph and hence other items in the catalog,
offering both a <em>hunting</em> and an <em>exploring</em> mode of product discovery. Finally, it is important to note that the graph is not
fully connected, since many product pairs will exhibit no or low similarity.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ae212856e289ba9dd5d264ebcd7e9ce4819834bd_screen-shot-2018-11-29-at-7.26.10-pm.png?auto=compress,format"></p>
<p><em>Figure 1: A visualization of the Product Similarity Graph for a small selection of items from the Zalando Catalog.</em></p>
<h3>Browsing via Graph Traversal</h3>
<p>Looking at the visualization above, it is easy to imagine traversing the graph to find similar products of
interest by choosing appropriate connections, quickly looking at one product and then another until you find something you
would like to buy, much as you might in a bricks-and-mortar fashion store: you start browsing in
the trousers section, purchase an item, and then decide to buy a matching shirt in another part of the store. But how
would we implement this feature? How do we enable a customer to discover products by browsing a network graph?</p>
<p>Network graphs have been studied for many years and there exist many different algorithms to extract information from a
graph, including algorithms to analyse the structure of a graph and algorithms to determine the optimal path through a
graph between two nodes. However, for our use case, where we would like to use a graph to drive a browsing experience – a
scenario with no predefined terminating node – there has not been much previous work. With this in mind, we present
here a new <em>Graph Exploration Algorithm</em>, called <em>Graph Browser</em>, to enable browsing on a Product Similarity Graph, and
provide a solution to the technical issues with browsing and exploring a graph in general.</p>
<h3>Introducing the <em>Graph Browser</em> Algorithm</h3>
<p>The Graph Browser algorithm enables browsing on a graph by generating a unique set of <em>Navigation Tags</em> for a product of
interest on the graph, which we call the anchor product. The tags are generated directly from the product data
neighbouring the anchor product and describe product attributes such as color, product type, brand etc. Furthermore, the
tags indicate <em>attribute differences</em> between the anchor product and its neighbours in the graph, and allow the user to
browse to a neighbouring product in the graph by simply selecting a tag. Common product attributes, such as color, may
reference many neighbouring products, resulting in the tag referencing many possible products, while other tags may
reference a single product. In the latter case, the new product becomes the anchor product; in the former, the user
chooses a single product to be the new anchor. Once a new anchor product is selected, a new set of navigation tags is
generated, and the process is repeated until the customer finds a product they would like to buy or exits the process.
In this way, the browsing experience is only ever concerned with products that are in the direct neighbourhood of the
anchor product, and the graph is traversed one connection at a time, where each product visited in turn is similar to
the previous one but differing by the selected tag. To explain further, we provide a diagram of the process in Figure 2
below.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7d8cdfe16de99b5396438f12d0527e51ea85c4de_screen-shot-2018-11-29-at-7.31.01-pm.png?auto=compress,format"></p>
<p>Figure 2: A diagram of a Product Similarity Graph and its Navigation Tags (presented in curly braces) as generated by
the Graph Browser algorithm. Starting at P1, “black nike textile shoe”, one possible path to P3, “white fila leather
shoe”, would be to select the navigation tags “white” and “fila” in succession, where selecting “white” makes
P2 the new anchor, and selecting “fila” moves the anchor to P3. Alternatively, selecting the “white” and
“leather” tags would also bring the customer to P3, reflecting a preference for a leather shoe over a
particular brand.</p>
<p>To explain more formally, we present some details on the Graph Browser algorithm below, but first define some
preliminaries: A Product Similarity Graph is constructed of product nodes, P = {p1,...,pn}, and connections between
nodes that represent the similarity score between two products pi & pj, sij = sim_score(pi, pj). Each node has a record
of all the product attributes for that product, pi = {a1,...,am}, which are used to generate the navigation tags. The
algorithm is as follows,</p>
<ol>
<li>A single node in the graph is selected as the anchor node, pi.</li>
<li>A set of all connected nodes to pi is constructed, Pcon = {p1,...,pk}.</li>
<li>A mapping is constructed of the attribute differences between the anchor node, pi, and the attributes of the
products contained in the set of connected nodes, M = diff_attrs(pi, Pcon), where M[pj] returns the set of
attributes that differ, Dij = {a1,...,ad}, between products pj & pi.</li>
<li>We construct a mapping of single attributes, or tags, to the products they reference by inverting our attribute
difference mapping, and indexing attributes individually to the products they belong to, Q = tag_map(M), where
products in Pcon that have common differing attributes are indexed by the same tag, <em>i.e.</em>, the mapping Q[ai]
returns the connected product, or products, that the tag references in the graph.</li>
<li>The set of navigation tags, T = {t1,...,tp}, for the anchor, pi, corresponds to the keys of the mapping Q. The user
selects a tag, ts, from T and the new anchor product is selected from the indexed set of products, Q[ts], where pi
now becomes Q[ts] if there is a single product indexed, or is selected by the user if there are more than one
option.</li>
<li>The algorithm returns to step 1 and the process repeats.</li>
</ol>
<p>which completes the description of the Graph Browser algorithm. We present a small Python code implementation of the
Graph Browser algorithm at the end of this article.</p>
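<p>Complementing the full implementation in the appendix, steps 2–4 (collecting the connected nodes, computing the attribute differences, and inverting them into the tag mapping Q) can be sketched compactly. This is an illustrative, dictionary-based version with variable names of our own choosing; it reproduces the walk from Figure 2, where selecting “white” at P1 moves the anchor to P2.</p>

```python
def navigation_tags(graph, attrs, anchor):
    """Map each navigation tag (an attribute value that differs) to the
    neighbouring products it leads to from the anchor node.

    graph: dict node -> set of connected nodes (Pcon per node)
    attrs: dict node -> set of attribute values
    """
    tag_map = {}
    for neighbour in graph[anchor]:                   # step 2: Pcon
        for tag in attrs[neighbour] - attrs[anchor]:  # step 3: diff_attrs
            tag_map.setdefault(tag, set()).add(neighbour)  # step 4: invert to Q
    return tag_map

# The products of Figure 2, reduced to three nodes:
attrs = {
    'P1': {'nike', 'shoe', 'textile', 'black'},
    'P2': {'nike', 'shoe', 'textile', 'white'},
    'P3': {'fila', 'shoe', 'leather', 'white'},
}
graph = {'P1': {'P2'}, 'P2': {'P1', 'P3'}, 'P3': {'P2'}}

print(navigation_tags(graph, attrs, 'P1'))  # {'white': {'P2'}}
# From P2, the tags 'fila' and 'leather' both lead on to P3,
# while 'black' leads back to P1.
```

<p>Step 5 then simply replaces the anchor with the (possibly user-chosen) product behind the selected tag and loops back to step 1.</p>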
<h2>Other Considerations</h2>
<p>You will notice that the initialization of the anchor node, pi, is not defined in the algorithm; choices
include random initialization, user-specific recommendations and search-query initialization. Furthermore, we do not
discuss how navigation tags are presented to the user; they can be ordered using a ranking function that optimizes
the positioning of the tags, for example by using the similarity scores themselves or a customer’s preferences.
Moreover, for the use case presented here we are interested in differences between products, but we could also
generate tags that represent common product attributes.</p>
<h2>Finally</h2>
<p>The Product Similarity Graph, the Graph Browser algorithm and the Navigation Tags it generates combine to produce a
quick and easy way to browse an online product catalog. While we are only beginning to explore the possibilities of the
Product Similarity Graph within Zalando, we are hopeful that it will be used to drive some of our product discovery
and tag-based navigation features in the future.</p>
<h3>Appendix: Python Graph Browser Implementation</h3>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">networkx</span> <span class="k">as</span> <span class="nn">nx</span>
<span class="c1"># Define product records</span>
<span class="n">products</span> <span class="o">=</span> <span class="p">[</span>
<span class="c1"># ID, Brand, Type, Material, Color</span>
<span class="p">[</span><span class="s1">'P1'</span><span class="p">,</span> <span class="s1">'nike'</span><span class="p">,</span> <span class="s1">'shoe'</span><span class="p">,</span> <span class="s1">'textile'</span><span class="p">,</span> <span class="s1">'black'</span><span class="p">],</span>
<span class="p">[</span><span class="s1">'P2'</span><span class="p">,</span> <span class="s1">'nike'</span><span class="p">,</span> <span class="s1">'shoe'</span><span class="p">,</span> <span class="s1">'textile'</span><span class="p">,</span> <span class="s1">'white'</span><span class="p">],</span>
<span class="p">[</span><span class="s1">'P3'</span><span class="p">,</span> <span class="s1">'fila'</span><span class="p">,</span> <span class="s1">'shoe'</span><span class="p">,</span> <span class="s1">'leather'</span><span class="p">,</span> <span class="s1">'white'</span><span class="p">],</span>
<span class="p">[</span><span class="s1">'P4'</span><span class="p">,</span> <span class="s1">'fila'</span><span class="p">,</span> <span class="s1">'sock'</span><span class="p">,</span> <span class="s1">'synth'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">]</span>
<span class="p">]</span>
<span class="c1"># Use jaccard index as similarity score</span>
<span class="k">def</span> <span class="nf">jaccard_index</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Returns the jaccard index between `x` & `y`.</span>
<span class="sd"> """</span>
<span class="n">x</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">x</span><span class="p">);</span> <span class="n">y</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">y</span><span class="p">))</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">union</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">product_similarity_matrix</span><span class="p">(</span><span class="n">products</span><span class="p">,</span> <span class="n">sim_score</span><span class="o">=</span><span class="n">jaccard_index</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Returns a Product Similarity Matrix for specified ``products`` &</span>
<span class="sd"> similarity score, ``sim_score``.</span>
<span class="sd"> """</span>
<span class="n">prod_sim_mat</span> <span class="o">=</span> <span class="p">{}</span>
<span class="c1"># n(n-1)/2 scores</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">product_i</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">products</span><span class="p">):</span>
<span class="k">for</span> <span class="n">product_j</span> <span class="ow">in</span> <span class="n">products</span><span class="p">[:</span><span class="n">i</span><span class="p">]:</span>
<span class="n">idi</span><span class="p">,</span> <span class="o">*</span><span class="n">attr_i</span> <span class="o">=</span> <span class="n">product_i</span>
<span class="n">idj</span><span class="p">,</span> <span class="o">*</span><span class="n">attr_j</span> <span class="o">=</span> <span class="n">product_j</span>
<span class="n">prod_sim_mat</span><span class="p">[(</span><span class="n">idi</span><span class="p">,</span> <span class="n">idj</span><span class="p">)]</span> <span class="o">=</span> <span class="n">sim_score</span><span class="p">(</span><span class="n">attr_i</span><span class="p">,</span> <span class="n">attr_j</span><span class="p">)</span>
<span class="k">return</span> <span class="n">prod_sim_mat</span>
<span class="k">def</span> <span class="nf">product_similarity_graph</span><span class="p">(</span><span class="n">prod_sim_mat</span><span class="p">,</span> <span class="n">products</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Combine Product Similarity Matrix and ``products`` to construct a</span>
<span class="sd"> Product Similarity Graph.</span>
<span class="sd"> """</span>
<span class="c1"># Create networkx graph</span>
<span class="n">PSG</span> <span class="o">=</span> <span class="n">nx</span><span class="o">.</span><span class="n">DiGraph</span><span class="p">()</span>
<span class="c1"># Add nodes and attrs to graph</span>
<span class="k">for</span> <span class="n">product</span> <span class="ow">in</span> <span class="n">products</span><span class="p">:</span>
<span class="n">id_</span><span class="p">,</span> <span class="o">*</span><span class="n">attrs</span> <span class="o">=</span> <span class="n">product</span>
<span class="n">PSG</span><span class="o">.</span><span class="n">add_node</span><span class="p">(</span><span class="n">id_</span><span class="p">,</span> <span class="n">attrs</span><span class="o">=</span><span class="n">attrs</span><span class="p">)</span>
<span class="c1"># Add edges and scores to nodes</span>
<span class="k">for</span> <span class="n">ind</span><span class="p">,</span> <span class="n">score</span> <span class="ow">in</span> <span class="n">prod_sim_mat</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="k">if</span> <span class="n">score</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="o">=</span> <span class="n">ind</span>
<span class="n">PSG</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">,</span> <span class="n">score</span><span class="o">=</span><span class="n">score</span><span class="p">)</span>
<span class="n">PSG</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="n">end</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">score</span><span class="o">=</span><span class="n">score</span><span class="p">)</span>
<span class="k">return</span> <span class="n">PSG</span>
<span class="k">def</span> <span class="nf">generate_diff_attrs</span><span class="p">(</span><span class="n">prod_sim_graph</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Generate a set of attribute differences between each pair of connected</span>
<span class="sd"> nodes in ``prod_sim_graph`` and add to edges.</span>
<span class="sd"> """</span>
<span class="k">for</span> <span class="n">anchor</span><span class="p">,</span> <span class="n">neighbour</span> <span class="ow">in</span> <span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">edges</span><span class="p">():</span>
<span class="n">anchor_attrs</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">node</span><span class="p">[</span><span class="n">anchor</span><span class="p">][</span><span class="s1">'attrs'</span><span class="p">])</span>
<span class="n">neighbour_attrs</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">node</span><span class="p">[</span><span class="n">neighbour</span><span class="p">][</span><span class="s1">'attrs'</span><span class="p">])</span>
<span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">edge</span><span class="p">[</span><span class="n">anchor</span><span class="p">][</span><span class="n">neighbour</span><span class="p">][</span><span class="s1">'diff_tags'</span><span class="p">]</span> <span class="o">=</span> \
<span class="n">neighbour_attrs</span> <span class="o">-</span> <span class="n">anchor_attrs</span>
<span class="k">return</span> <span class="n">prod_sim_graph</span>
<span class="k">def</span> <span class="nf">generate_nav_tags</span><span class="p">(</span><span class="n">prod_sim_graph</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Generate a navigation tag map for each node of the ``prod_sim_graph``.</span>
<span class="sd"> """</span>
<span class="k">for</span> <span class="n">anchor</span> <span class="ow">in</span> <span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">nodes</span><span class="p">():</span>
<span class="n">tag_map</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">neighbour</span> <span class="ow">in</span> <span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">neighbors</span><span class="p">(</span><span class="n">anchor</span><span class="p">):</span>
<span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">prod_sim_graph</span><span class="p">[</span><span class="n">anchor</span><span class="p">][</span><span class="n">neighbour</span><span class="p">][</span><span class="s1">'diff_tags'</span><span class="p">]:</span>
<span class="n">tag_map</span><span class="p">[</span><span class="n">tag</span><span class="p">]</span> <span class="o">=</span> <span class="n">tag_map</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="p">[])</span> <span class="o">+</span> <span class="p">[</span><span class="n">neighbour</span><span class="p">]</span>
<span class="n">prod_sim_graph</span><span class="o">.</span><span class="n">node</span><span class="p">[</span><span class="n">anchor</span><span class="p">][</span><span class="s1">'nav_tags'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tag_map</span>
<span class="k">return</span> <span class="n">prod_sim_graph</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="c1"># Construct Product Similarity Matrix</span>
<span class="n">prod_sim_mat</span> <span class="o">=</span> <span class="n">product_similarity_matrix</span><span class="p">(</span><span class="n">products</span><span class="p">)</span>
<span class="c1"># Construct Product Similarity Graph</span>
<span class="n">PSG</span> <span class="o">=</span> <span class="n">product_similarity_graph</span><span class="p">(</span><span class="n">prod_sim_mat</span><span class="p">,</span> <span class="n">products</span><span class="p">)</span>
<span class="c1"># Generate difference attributes and attach to edges</span>
<span class="n">PSG</span> <span class="o">=</span> <span class="n">generate_diff_attrs</span><span class="p">(</span><span class="n">PSG</span><span class="p">)</span>
<span class="c1"># Generate navigation tags and attach to nodes</span>
<span class="n">PSG</span> <span class="o">=</span> <span class="n">generate_nav_tags</span><span class="p">(</span><span class="n">PSG</span><span class="p">)</span>
<span class="c1">### Browse the PSG using simulated user inputs ###</span>
<span class="c1"># Setup simulated user inputs</span>
<span class="n">anchor</span> <span class="o">=</span> <span class="s1">'P1'</span>
<span class="n">tag_selections</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'white'</span><span class="p">,</span> <span class="s1">'fila'</span><span class="p">,</span> <span class="s1">'sock'</span><span class="p">]</span>
<span class="n">product_selections</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'P2'</span><span class="p">]</span>
<span class="c1"># Run through simulated inputs</span>
<span class="k">for</span> <span class="n">selection</span> <span class="ow">in</span> <span class="n">tag_selections</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Anchor: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">anchor</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Tag selection: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">selection</span><span class="p">))</span>
<span class="c1"># Graph Browser</span>
<span class="n">products</span> <span class="o">=</span> <span class="n">PSG</span><span class="o">.</span><span class="n">node</span><span class="p">[</span><span class="n">anchor</span><span class="p">][</span><span class="s1">'nav_tags'</span><span class="p">][</span><span class="n">selection</span><span class="p">]</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">products</span><span class="p">)</span><span class="o">></span><span class="mi">1</span><span class="p">:</span>
<span class="n">product_selection</span> <span class="o">=</span> <span class="n">product_selections</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">product_selection</span> <span class="ow">in</span> <span class="n">products</span><span class="p">,</span> <span class="s2">"Bad selection"</span>
<span class="n">anchor</span> <span class="o">=</span> <span class="n">product_selection</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Product selection: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">product_selection</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">anchor</span><span class="p">,</span> <span class="o">=</span> <span class="n">products</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Bought product: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">anchor</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n\t</span><span class="s2">Fin.</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>Zalando Postgres Operator: One Year Later2018-11-26T00:00:00+01:002018-11-26T00:00:00+01:00Sergey Dudoladovtag:engineering.zalando.com,2018-11-26:/posts/2018/11/postgres-operator.html<p>The Postgres operator provides a managed Postgres service for Kubernetes. It extends the Kubernetes API with a custom “postgresql” resource that describes desired characteristics of a Postgres cluster, monitors updates of this resource and adjusts Postgres clusters accordingly. Zalando successfully uses the operator to manage more than 450 Postgres clusters across a large number of Kubernetes installations.</p><h1>Zalando Postgres operator: one year later</h1>
<p><a href="https://github.com/zalando-incubator/postgres-operator">The Postgres operator</a> provides a managed Postgres service for Kubernetes. It extends the Kubernetes API with a custom “postgresql” resource that describes desired characteristics of a Postgres cluster, monitors updates of this resource and adjusts Postgres clusters accordingly. Zalando successfully uses the operator to manage more than 450 Postgres clusters across a large number of Kubernetes installations.</p>
<h2>Moving to production</h2>
<p>More than a year and a half ago, Zalando prepared to run stateless and stateful applications alike on Kubernetes. With tens of teams working with hundreds of databases across multiple Kubernetes clusters, any kind of manual operation was out of the question. To keep the workload manageable, Zalando’s database team therefore decided to automate its operational procedures. The <a href="https://coreos.com/blog/introducing-operators.html">operator pattern</a>, well known in the Kubernetes universe, turned out to be a perfect fit for the job.</p>
<p>At present the operator manages more than 400 Postgres clusters at Zalando: it watches requests for additions, deletions and updates of Postgres manifests and automatically carries out all necessary actions on the clusters. This saves time for engineers and admins alike: instead of manually configuring numerous Kubernetes objects, they just submit a single YAML file describing the desired Postgres cluster setup, and the operator takes care of the rest.</p>
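<p>For illustration, such a manifest might look like the following. This is a sketch based on the operator’s example manifests; the cluster name, team id and sizes are placeholders:</p>

```yaml
# Minimal "postgresql" custom resource for the Zalando Postgres operator.
# Submitting this single YAML file is all a team needs to do; the operator
# creates and maintains the underlying Kubernetes objects.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster   # placeholder cluster name
spec:
  teamId: "acid"               # placeholder team id
  numberOfInstances: 2         # one master plus one replica
  volume:
    size: 1Gi
  postgresql:
    version: "10"
```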
<p>A year ago, the operator had only just left the prototype stage and was still in its infancy. Since then we have extended it into a production-ready Postgres-on-Kubernetes managed service with numerous features such as:</p>
<ol>
<li><a href="https://github.com/zalando-incubator/postgres-operator/blob/master/docs/administrator.md#role-based-access-control-for-the-operator">Role-based access control</a>: By its very nature, the operator requires broad permissions to operate databases in the Kubernetes environment. Given the importance of security, we factored out a separate operator-specific service account and employed the RBAC capabilities of Kubernetes to precisely define the rights required by the operator adhering to the principle of least privilege.</li>
<li>Integration with external services: Postgres databases do not run in isolation but as part of a complex tech infrastructure, so seamless integration with existing tools is of great importance for our customer experience. Our generic <a href="https://github.com/zalando-incubator/postgres-operator/blob/master/docs/reference/cluster_manifest.md#sidecar-definitions">sidecar container support</a> enables running third-party applications side by side with the database pods. One example of such an approach is the <a href="https://github.com/zalando-incubator/postgres-operator/blob/master/docs/reference/operator_parameters.md#scalyr-options">Scalyr sidecar</a>, which transparently ships the Postgres container logs to the Scalyr service, empowering employees to use standard log processing tools.</li>
<li>Log shipping of Postgres logs to cloud storage: While Postgres normally rotates its log files within one week, the operator and Spilo can join forces to continuously archive the database log history in the cloud for as long as necessary.</li>
<li><a href="https://github.com/zalando-incubator/postgres-operator/blob/master/docs/administrator.md#select-the-namespace-to-deploy-to">Support for multiple namespaces</a>: Namespaces enable us to better structure the applications of different teams within a single Kubernetes cluster; a typical use case involves running experiments in a dedicated namespace and then deleting the results, once they are no longer needed, by simply dropping the namespace. To take full advantage of multiple namespaces, we designed and built into the operator the ability to manage databases running in namespaces other than the default one.</li>
<li>API versioning: We keep an eye on the ongoing evolution of Kubernetes and adopt its most useful features for the benefit of operator users in a timely manner. We recently started using Kubernetes-standard <a href="https://github.com/zalando-incubator/postgres-operator/blob/4543bfde96aac406240ee2f1faa591bae7c6b83d/docs/developer.md#code-generation">code generation</a> to implement the API of the “postgresql” custom resource. By doing so we introduced API versioning to the operator and greatly reduced the manual effort needed to support new Kubernetes versions in the operator codebase.</li>
<li>Last but not least, we recognized the ever-increasing adoption of our software and therefore contributed <a href="https://postgres-operator.readthedocs.io/en/latest/">the documentation</a> to ease running this service in environments other than ours.</li>
</ol>
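<p>To make the sidecar support above concrete, a sidecar is declared directly in the cluster manifest. The fragment below is a sketch based on the sidecar documentation linked above; the container name, image and environment values are placeholders:</p>

```yaml
# Fragment of a "postgresql" manifest: run an extra container in every
# database pod, e.g. for shipping logs or metrics.
spec:
  sidecars:
    - name: "log-shipper"                      # placeholder name
      image: "example.org/log-shipper:latest"  # placeholder image
      env:
        - name: "LOG_TARGET"
          value: "s3://my-log-bucket"          # placeholder value
```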
<p>Our efforts culminated in the release of the operator’s <a href="https://github.com/zalando-incubator/postgres-operator/releases/tag/v1.0.0">first stable version</a> in August 2018. As the software we have built proved to be such a success within Zalando, we reached out to the broader cloud computing community to share our experience of developing and operating a managed stateful service on top of Kubernetes. We were pleased to present our achievements at top-tier industry conferences such as <a href="https://archive.fosdem.org/2018/schedule/event/blue_elephant_on_demand_postgres_kubernetes/">FOSDEM 2018</a> and <a href="https://kccna18.sched.com/event/GrU0">KubeCon North America 2018</a>.</p>
<h2>Want to delve in?</h2>
<p>If you want to know more, check out <a href="https://github.com/zalando-incubator/postgres-operator/blob/master/docs/index.md#talks">our talks</a> for a deeper technical perspective on what we are doing. For those of you who want to gain hands-on experience with technologies such as Postgres, Kubernetes, or Go in a thriving open-source environment, we have prepared a list of <a href="https://github.com/zalando-incubator/postgres-operator/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">good first issues</a>. Finally, we <a href="https://jobs.zalando.com/en/tech/jobs/">are always looking</a> for new team members who are eager to work with us full-time on the Zalando database infrastructure.</p>Zalando Research Releases “Flair”2018-11-22T00:00:00+01:002018-11-22T00:00:00+01:00Per Plougtag:engineering.zalando.com,2018-11-22:/posts/2018/11/zalando-research-releases-flair.html<p>Open sourcing machine learning research for natural language processing (NLP)</p><h3><strong>Open sourcing machine learning research for natural language processing (NLP)</strong></h3>
<p>Two years ago, <a href="https://research.zalando.com/">Zalando Research</a> launched with a clear purpose to ensure that Zalando
Tech is at the forefront of research in the areas of data science, machine learning, natural language processing and
artificial intelligence.</p>
<p>Our researchers’ work was previously focused mainly within Zalando. Therefore, we are very excited to announce that we have
released “<a href="https://github.com/zalandoresearch/flair">Flair</a>”, our state-of-the-art natural language processing (NLP)
library. Flair is released under the MIT license and will continue as an actively maintained open source project under Zalando’s
leadership.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8496bef09500a26c88a6f42cb06576b99e7410f7_screen-shot-2018-11-22-at-5.47.19-pm.png?auto=compress,format"></p>
<p><em><a href="https://research.zalando.com/">Zalando Research</a> Team</em></p>
<p>The Flair project is our cutting-edge framework for natural language processing (NLP): a framework that gives a
computer the ability to understand, tag and classify written text. Flair is useful when you want to understand the
meaning of email messages, customer responses, website comments, or any other scenario in which users submit text
that you want to automatically classify or otherwise process.</p>
<p>The library is implemented in Python on top of the popular PyTorch deep learning framework. It packages pre-trained
models for NLP tasks, including <em>named entity recognition</em> (NER) to detect things like person or location names in text
and <em>part-of-speech tagging</em> to detect syntactic word types like verbs and nouns. It allows you to easily apply our
pre-trained models to your text, or train your own <em>sequence labeling</em> or <em>text classification</em> models.</p>
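<p>For example, applying the pre-trained NER model takes only a few lines. The snippet below is adapted from the Flair README and assumes the library is installed (<code>pip install flair</code>); the model is downloaded on first use:</p>

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load the pre-trained named entity recognition tagger
tagger = SequenceTagger.load('ner')

# wrap a plain string in a Sentence object and predict entity tags
sentence = Sentence('George Washington went to Washington.')
tagger.predict(sentence)

# print the sentence with its predicted entity annotations
print(sentence.to_tagged_string())
```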
<p>For instance, we can train Flair to recognize fashion concepts such as <em>brands</em>, <em>colors</em> or <em>seasons</em> in text, or to
classify whole text documents into one or more categories. Check out some example results below:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c84f8dad4b420c03fde63e94b342d701441603f6_screen-shot-2018-11-22-at-5.58.21-pm.png?auto=compress,format"></p>
<p>Due to its versatility, Flair is already part of several in-production systems at Zalando, as machine learning has
become a natural part of our engineering toolbox.</p>
<p><a href="https://github.com/zalandoresearch/flair">You can find documentation and the source code of Flair on Github.</a></p>
<p>This is an important milestone for the open source and research teams at Zalando. Having research mature into
in-production tooling that is then made available to the wider tech ecosystem as open source reflects a healthy,
cutting-edge engineering culture at Zalando.</p>
<h3>Comparison with the state-of-the-art</h3>
<p>Flair’s accuracy outperforms the previous best methods across a wide range of NLP tasks; evaluation against
industry-standard datasets shows substantial improvements:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c81677f32c66cbb9fe3c839759b9bd0cabcc979b_screen-shot-2018-11-22-at-6.07.23-pm.png?auto=compress,format"></p>
<h3>Get involved</h3>
<p>We invite you to start using Flair. There is already <a href="https://github.com/zalandoresearch/flair/tree/master/resources/docs">extensive
documentation</a> available on how to use the
framework, so you can quickly get up and running and experiment with the models included, or train your own if you
wish.</p>
<p>There is a growing community around Flair already, contributing new features and support for other languages.</p>
<p><em>Work in an exciting tech environment. Check out our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">jobs page</a>.</em></p>Train Deep Learning Models on AWS2018-11-08T00:00:00+01:002018-11-08T00:00:00+01:00Oleg Polosintag:engineering.zalando.com,2018-11-08:/posts/2018/11/train-deep-learning-models-aws.html<p>A real-life example of how to train a Deep Learning model on an AWS Spot Instance using Spotty</p><h3><strong>A real-life example of how to train a Deep Learning model on an AWS Spot Instance using Spotty</strong></h3>
<p><a href="https://github.com/apls777/spotty">Spotty</a> is a tool that simplifies training of Deep Learning models on AWS.</p>
<p><strong>Why will you ❤️this tool?</strong></p>
<ul>
<li>it makes training on AWS GPU instances as simple as training on your local computer</li>
<li>it automatically manages all necessary AWS resources including AMIs, volumes and snapshots</li>
<li>it makes your model trainable on AWS by anyone with a couple of commands</li>
<li>it detaches remote processes from SSH sessions</li>
<li>it saves you up to 70% of the costs by using Spot Instances</li>
</ul>
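<p>Once a project has a configuration file (we will write one below), the day-to-day workflow boils down to a few commands. The command names follow the Spotty README; treat this as an illustrative sketch rather than a complete reference:</p>

```shell
# launch a Spot Instance, restore volumes from snapshots and build the Docker image
spotty start

# run a named script from the "scripts" section inside the container
spotty run train

# stop the instance; volumes are snapshotted so state survives
spotty stop
```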
<p>To show how it works, let’s take a non-trivial model and try to train it. I chose one of the implementations of
<a href="https://github.com/Rayhane-mamah/Tacotron-2">Tacotron 2</a>. It’s a speech synthesis system by Google.</p>
<p>Clone the repository of Tacotron 2 to your computer:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/Rayhane-mamah/Tacotron-2.git
</code></pre></div>
<h3>Docker Image</h3>
<p>Spotty trains models inside a Docker container. So we need to either find a publicly available Docker image that
satisfies the model’s requirements, or create a new Dockerfile with a proper environment.</p>
<p>This implementation of Tacotron uses Python 3 and TensorFlow, so we could use the official TensorFlow image:
<a href="https://hub.docker.com/r/tensorflow/tensorflow/">tensorflow/tensorflow:latest-gpu-py3</a>. But this image doesn’t satisfy all the
requirements in the “requirements.txt” file, so we need to extend it and install the remaining libraries on
top.</p>
<p>Create the <em>Dockerfile</em> file in the root directory of the project:</p>
<div class="highlight"><pre><span></span><code><span class="n">FROM</span><span class="w"> </span><span class="n">tensorflow</span><span class="o">/</span><span class="n">tensorflow</span><span class="p">:</span><span class="n">latest</span><span class="o">-</span><span class="n">gpu</span><span class="o">-</span><span class="n">py3</span>
<span class="n">WORKDIR</span><span class="w"> </span><span class="o">/</span><span class="n">root</span>
<span class="c1"># install pyaudio library</span>
<span class="n">RUN</span><span class="w"> </span><span class="n">apt</span><span class="o">-</span><span class="n">get</span><span class="w"> </span><span class="n">update</span><span class="w"> </span>\
<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">apt</span><span class="o">-</span><span class="n">get</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">-</span><span class="n">y</span><span class="w"> </span><span class="n">python3</span><span class="o">-</span><span class="n">pyaudio</span><span class="w"> </span>\
<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">apt</span><span class="o">-</span><span class="n">get</span><span class="w"> </span><span class="n">clean</span><span class="w"> </span>\
<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">rm</span><span class="w"> </span><span class="o">-</span><span class="n">rf</span><span class="w"> </span><span class="o">/</span><span class="k">var</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">apt</span><span class="o">/</span><span class="n">lists</span><span class="o">/*</span>
<span class="c1"># install other requirements</span>
<span class="n">COPY</span><span class="w"> </span><span class="n">requirements</span><span class="o">.</span><span class="n">txt</span><span class="w"> </span><span class="n">requirements</span><span class="o">.</span><span class="n">txt</span>
<span class="n">RUN</span><span class="w"> </span><span class="n">grep</span><span class="w"> </span><span class="o">-</span><span class="n">v</span><span class="w"> </span><span class="s1">'^pyaudio'</span><span class="w"> </span><span class="n">requirements</span><span class="o">.</span><span class="n">txt</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">requirements_updated</span><span class="o">.</span><span class="n">txt</span><span class="w"> </span>\
<span class="o">&&</span><span class="w"> </span><span class="n">pip3</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">-</span><span class="n">r</span><span class="w"> </span><span class="n">requirements_updated</span><span class="o">.</span><span class="n">txt</span>
</code></pre></div>
<p>Here we’re extending the original TensorFlow image and installing all the other requirements (I couldn’t install the pyaudio
library through pip, so I installed it using apt).</p>
<p>Also, create the <em>.dockerignore</em> file with the following content:</p>
<div class="highlight"><pre><span></span><code><span class="gh">#</span> ignore everything
**
<span class="gh">#</span> allow only requirements.txt file
!/requirements.txt
</code></pre></div>
<p>Otherwise, you would get an out-of-space error, because Docker copies the entire build context (including the heavy
“training_data/” directory) to the Docker daemon.</p>
<h3>Spotty Configuration File</h3>
<p>Once we have the Dockerfile, we’re ready to write a Spotty configuration file. Create the <em>spotty.yaml</em> file in the root
directory of the project.</p>
<p>It consists of 3 sections: <em>project</em>, <em>instance</em> and <em>scripts</em>.</p>
<h3>Section 1: Project</h3>
<div class="highlight"><pre><span></span><code><span class="n">project</span><span class="o">:</span>
<span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Tacotron2</span>
<span class="w"> </span><span class="n">remoteDir</span><span class="o">:</span><span class="w"> </span><span class="sr">/workspace/</span><span class="n">project</span>
<span class="w"> </span><span class="n">syncFilters</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">exclude</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="o">.</span><span class="na">idea</span><span class="cm">/*</span>
<span class="cm"> - .git/*</span>
<span class="cm"> - '*/</span><span class="n">__pycache__</span><span class="o">/*</span><span class="err">'</span>
<span class="o">-</span><span class="w"> </span><span class="n">training_data</span><span class="o">/*</span>
</code></pre></div>
<p>The section contains the following parameters:</p>
<ol>
<li><strong>Name of the project</strong>: The name will be used in names of AWS resources. For example, in the name of the S3 bucket
that will be used to synchronize the project code with the instance.</li>
<li><strong>Remote directory</strong>: It’s a directory where the project will be stored on the instance.</li>
<li><strong>Synchronization filters</strong>: Filters are used to exclude directories that shouldn’t be synchronized with the
instance. For example, we ignore the PyCharm configuration, Git files, Python cache files and the training data.</li>
</ol>
<h3>Section 2: Instance</h3>
<div class="highlight"><pre><span></span><code><span class="n">instance</span><span class="o">:</span>
<span class="w"> </span><span class="n">region</span><span class="o">:</span><span class="w"> </span><span class="n">us</span><span class="o">-</span><span class="n">east</span><span class="o">-</span><span class="mi">2</span>
<span class="w"> </span><span class="n">instanceType</span><span class="o">:</span><span class="w"> </span><span class="n">p2</span><span class="o">.</span><span class="na">xlarge</span>
<span class="w"> </span><span class="n">volumes</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Tacotron2</span>
<span class="w"> </span><span class="n">directory</span><span class="o">:</span><span class="w"> </span><span class="o">/</span><span class="n">workspace</span>
<span class="w"> </span><span class="n">size</span><span class="o">:</span><span class="w"> </span><span class="mi">50</span>
<span class="w"> </span><span class="n">docker</span><span class="o">:</span>
<span class="w"> </span><span class="n">file</span><span class="o">:</span><span class="w"> </span><span class="n">Dockerfile</span>
<span class="w"> </span><span class="n">workingDir</span><span class="o">:</span><span class="w"> </span><span class="sr">/workspace/</span><span class="n">project</span>
<span class="w"> </span><span class="n">dataRoot</span><span class="o">:</span><span class="w"> </span><span class="sr">/workspace/</span><span class="n">docker</span>
<span class="n">ports</span><span class="o">:</span><span class="w"> </span><span class="o">[</span><span class="mi">6006</span><span class="o">,</span><span class="w"> </span><span class="mi">8888</span><span class="o">]</span>
</code></pre></div>
<p>The section contains the following parameters:</p>
<ol>
<li><strong>Region</strong>: AWS region where a Spot Instance will be launched.</li>
<li><strong>Instance type</strong>: Type of AWS EC2 instance.</li>
<li><strong>List of volumes</strong>: Each volume has a name, a directory where it will be mounted, and a size. When you start an
instance for the first time, the volume is created. When you stop the instance, a snapshot is taken and
automatically restored the next time.</li>
<li><strong>Docker</strong>: Here we set the path to our Dockerfile. An alternative approach is to build the image locally and push
it to the <a href="https://hub.docker.com/">Docker Hub Registry</a>; then you can use the name of the image instead of a file.
We also set a working directory, which will be used by the scripts from the “scripts” section. Finally, we can change the Docker
data root to a directory on the attached volume, so that downloaded images are saved with the volume snapshot
and restoring the image takes less time on subsequent starts.</li>
<li><strong>Ports</strong>: Ports to expose. In this example, we open 2 ports: 6006 for TensorBoard and 8888 for Jupyter Notebook.</li>
</ol>
<p>Read more about other parameters in the <a href="https://github.com/apls777/spotty/wiki/Configuration-File">documentation</a>.</p>
<h3>Section 3: Scripts</h3>
<div class="highlight"><pre><span></span><code><span class="n">scripts</span><span class="o">:</span>
<span class="w"> </span><span class="n">preprocess</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="n">curl</span><span class="w"> </span><span class="o">-</span><span class="n">O</span><span class="w"> </span><span class="n">http</span><span class="o">://</span><span class="n">data</span><span class="o">.</span><span class="na">keithito</span><span class="o">.</span><span class="na">com</span><span class="sr">/data/speech/</span><span class="n">LJSpeech</span><span class="o">-</span><span class="mf">1.1</span><span class="o">.</span><span class="na">tar</span><span class="o">.</span><span class="na">bz2</span>
<span class="w"> </span><span class="n">tar</span><span class="w"> </span><span class="n">xvjf</span><span class="w"> </span><span class="n">LJSpeech</span><span class="o">-</span><span class="mf">1.1</span><span class="o">.</span><span class="na">tar</span><span class="o">.</span><span class="na">bz2</span>
<span class="w"> </span><span class="n">rm</span><span class="w"> </span><span class="n">LJSpeech</span><span class="o">-</span><span class="mf">1.1</span><span class="o">.</span><span class="na">tar</span><span class="o">.</span><span class="na">bz2</span>
<span class="w"> </span><span class="n">python3</span><span class="w"> </span><span class="n">preprocess</span><span class="o">.</span><span class="na">py</span>
<span class="w"> </span><span class="n">train</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="n">python</span><span class="w"> </span><span class="n">train</span><span class="o">.</span><span class="na">py</span><span class="w"> </span><span class="o">--</span><span class="n">model</span><span class="o">=</span><span class="s1">'Tacotron-2'</span>
<span class="w"> </span><span class="n">tensorboard</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="n">tensorboard</span><span class="w"> </span><span class="o">--</span><span class="n">logdir</span><span class="w"> </span><span class="sr">/workspace/project/</span><span class="n">logs</span><span class="o">-</span><span class="n">Tacotron</span><span class="o">-</span><span class="mi">2</span>
<span class="w"> </span><span class="n">jupyter</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="o">/</span><span class="n">run_jupyter</span><span class="o">.</span><span class="na">sh</span><span class="w"> </span><span class="o">--</span><span class="n">allow</span><span class="o">-</span><span class="n">root</span>
</code></pre></div>
<p>Scripts are optional, but very useful. They can be run on the instance using the following command:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>spotty<span class="w"> </span>run
</code></pre></div>
<p>For this project we’ve created 4 scripts:</p>
<ul>
<li><strong>preprocess</strong>: downloads the dataset and prepares it for training,</li>
<li><strong>train</strong>: starts the training,</li>
<li><strong>tensorboard</strong>: runs TensorBoard on port 6006,</li>
<li><strong>jupyter</strong>: starts a Jupyter Notebook server on port 8888.</li>
</ul>
<p>That’s it! The model is ready to be trained on AWS!</p>
<h3><strong>Spotty Installation</strong></h3>
<h3>Requirements</h3>
<ul>
<li>Python 3</li>
<li>Installed and configured AWS CLI (see <a href="http://docs.aws.amazon.com/cli/latest/userguide/installing.html">Installing the AWS Command Line
Interface</a>)</li>
</ul>
<h3>Installation</h3>
<ol>
<li>
<p>Install Spotty using <a href="http://www.pip-installer.org/en/latest/">pip</a>:</p>
<div class="highlight"><pre><span></span><code>$ pip install -U spotty
</code></pre></div>
</li>
<li>
<p>Create an AMI with NVIDIA Docker. Run the following command from the root directory of your project (where the
<em>spotty.yaml</em> file is located):</p>
<div class="highlight"><pre><span></span><code>$ spotty create-ami
</code></pre></div>
</li>
</ol>
<p>Within several minutes you will have an AMI that can be used for all your projects within the AWS region.</p>
<h3>Model Training</h3>
<ol>
<li>
<p>Start a Spot Instance with the Docker container:</p>
<div class="highlight"><pre><span></span><code>$ spotty start
</code></pre></div>
</li>
</ol>
<p>Once the instance is up and running, you will see its IP address. Use it to open TensorBoard and Jupyter Notebook later.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/10ebee3fca91065c05f120ae48aebcc6fc99880f_screen-shot-2018-11-06-at-4.07.37-pm.png?auto=compress,format"></p>
<ol>
<li>
<p>Download and preprocess the data for the Tacotron model. We already have a custom script in the configuration file to
do that. Just run:</p>
<div class="highlight"><pre><span></span><code>$ spotty run preprocess
</code></pre></div>
</li>
<li>
<p>Once the preprocessing is done, train the model. Run the “train” script:</p>
<div class="highlight"><pre><span></span><code>$ spotty run train
</code></pre></div>
</li>
</ol>
<p>On a “p2.xlarge” instance it will probably take around 8–9 days to reach 120 thousand steps, but you can use instances
with more performant GPUs to speed up the training.</p>
<p>You can detach this SSH session using the key combination <strong>Ctrl + b</strong>, then <strong>d</strong>. The training process won’t be
interrupted. To reattach the session, just run the <em>spotty run train</em> command again.</p>
<h3>TensorBoard</h3>
<p>Start the TensorBoard using the “tensorboard” script:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>spotty<span class="w"> </span>run<span class="w"> </span>tensorboard
</code></pre></div>
<p>TensorBoard will be running on port 6006. You can detach the SSH session using the key combination <strong>Ctrl + b</strong>, then
<strong>d</strong>; TensorBoard will keep running.</p>
<h3>Jupyter Notebook</h3>
<p>You can use Jupyter Notebook to download trained models to your computer. Use the “jupyter” script to start it:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>spotty<span class="w"> </span>run<span class="w"> </span>jupyter
</code></pre></div>
<p>Jupyter Notebook will be running on port 8888. Open it using the IP address of the instance and the URL that you see
in the output of the command.</p>
<h3>SSH Connection</h3>
<p>To connect to the running Docker container via SSH, use the following command:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>spotty<span class="w"> </span>ssh
</code></pre></div>
<p>It uses a <a href="https://github.com/tmux/tmux/wiki">tmux</a> session, so you can always detach it using the key combination
<strong>Ctrl + b</strong>, then <strong>d</strong>, and reattach the session later using the <em>spotty ssh</em> command again.</p>
<h3>Stop Instance</h3>
<p>Don’t forget to stop the instance once you are done! Use the following command:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>spotty<span class="w"> </span>stop
</code></pre></div>
<p>When you stop the instance, Spotty automatically creates snapshots of the volumes. When you start an instance
next time, it restores the snapshots automatically.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bbda7d5dfc939477e23a8173d2577791057d4e2b_screen-shot-2018-11-06-at-4.29.52-pm.png?auto=compress,format"></p>
<h3>Conclusion</h3>
<p>Using Spotty is a convenient way to train Deep Learning models on AWS Spot Instances. It will save you not only up to
70% of the cost, but also a lot of time setting up an environment for your models and notebooks. Once you have a
Spotty configuration for your model, anyone can train it with a couple of commands.</p>
<p><strong>If you enjoyed this article, please star the <a href="https://github.com/apls777/spotty">project on GitHub</a> and share this
article with your friends.</strong></p>
<p><em>This article was <a href="https://towardsdatascience.com/how-to-train-deep-learning-models-on-aws-spot-instances-using-spotty-8d9e0543d365">originally
published</a>
on Medium.</em></p>
<h2>Open Source: October Review - Hacktoberfest, new releases and more</h2>
<p><em>2018-11-06, Hong Phuc Dang</em></p>
<p>This is a recap of open source activities and development at Zalando in the month of October.</p>
<h2>Project Highlights</h2>
<ul>
<li>
<p><a href="https://github.com/zalando/connexion"><strong>Connexion version 2.0</strong></a> with OpenAPI 3 support is ready, <a href="https://engineering.zalando.com/posts/2018/11/connexion-20-release.html">check out what is new in our latest release!</a> Connexion is the <a href="https://swagger.io">Swagger/OpenAPI</a>-first framework for Python on top of Flask, with automatic endpoint validation & OAuth2 support. With 87 active contributors and more than 1,000 repositories worldwide depending on it, Connexion is one of Zalando's most successful open source releases.</p>
</li>
<li>
<p><a href="https://github.com/zalando-incubator/postgres-operator"><strong>Postgres-Operator</strong></a>: after one year of development, this operator now manages more than 500 Postgres clusters across a large number of Kubernetes installations inside Zalando. Our engineers do not need to manually configure numerous Kubernetes objects; they just submit a single text file describing the desired Postgres cluster, and the operator takes care of the rest. Postgres-Operator was first started by Zalando’s database team to provide a managed PostgreSQL service for Kubernetes. <a href="https://github.com/zalando-incubator/postgres-operator">Try out the operator here!</a></p>
</li>
<li>
<p><a href="https://github.com/zalandoresearch/flair"><strong>Flair</strong></a>: <a href="https://research.zalando.com/">Zalando Research</a> recently released a new version of this open source Natural Language Processing framework; it now runs on both Linux and Mac, <a href="https://github.com/zalandoresearch/flair">click here to test!</a> Flair gives users the ability to tag, classify and understand the meaning of email messages, customer responses, website comments, or any other scenario where users submit text feedback to be automatically classified or otherwise processed.</p>
</li>
</ul>
<h2>Inside Zalando Open Source</h2>
<ul>
<li>
<p><strong>Zalando hosted a Hack Night at the Berlin office to celebrate Hacktoberfest - the month of open source.</strong> The main event started with a number of lightning talks by open source projects, followed by a hacking session, where Zalando engineers gathered as teams and worked on challenges under domains of machine learning, database and web plug-ins. Project maintainers were present to support participants completing their very first contribution and pull request.</p>
</li>
<li>
<p><strong>The first Open Source Onboarding Training</strong> was conducted on October 9th as a part of Zalando’s Tech Bootcamp, where we explained to the new joiners the importance of open source at Zalando, how open source fits with Zalando’s culture and the way we work. During this training, we also highlighted the open source journey of a developer and guided people how to contribute to open source projects.</p>
</li>
<li>
<p><strong>Open Source Team released a promotion framework</strong> that helps engineering teams to grow an ecosystem around their open source projects through various outreach and onboarding activities. This framework includes <a href="https://opensource.zalando.com/docs/promoting/write-project-intro-blog/">blogging tips</a>, <a href="https://opensource.zalando.com/docs/promoting/promotion-channels/">utilizing social media</a>, <a href="https://opensource.zalando.com/docs/promoting/organize-release-party/">organizing a release party</a>, and <a href="https://opensource.zalando.com/docs/promoting/write-announcement-email/">writing tips for public announcement</a>.</p>
</li>
</ul>
<h2>Zalando Open Source Around The World</h2>
<ul>
<li>
<p><strong>PostgreSQL Conference Europe, October 23 - 26, 2018</strong>
At PGConf Europe, <a href="https://github.com/CyberDem0n">Alexander Kukushkin</a>, Zalando Database Engineer, presented how Zalando migrated one of the largest Postgres clusters to AWS EC2 with <a href="https://github.com/zalando/patroni">Patroni</a>, a template for PostgreSQL High Availability with ZooKeeper, etcd, or Consul.</p>
</li>
<li>
<p><strong>Open Source Summit Europe, October 22 - 24, 2018</strong> Our first speaker, <a href="https://github.com/erthalion">Dmitry Dolgov</a>, Zalando Software Engineer, delivered a talk on PostgreSQL + Linux Kernel, showing common techniques for configuring the Linux kernel to work efficiently with PostgreSQL. The second speaker, <a href="https://github.com/perploug">Per Ploug</a>, Zalando Open Source Community Manager, gave a presentation on ‘Turning Policy into Tooling’, where he outlined concrete efforts, tools and services that Zalando has developed and uses to remove compliance barriers. Finally, Zalando InnerSource Manager <a href="https://twitter.com/hpdang">Hong Phuc Dang</a> spoke on a 'Mentorship Panel' together with Open Source Program Managers of Intel, Google, Bitergia and a researcher from Inria. The discussion covered topics such as the value of mentorship, mentorship metrics, challenges and diversity.</p>
</li>
</ul>
<p><img alt="Open Source Summit Europe: From left to right Julia Lawall, Senior Researcher (Inria) - Josh Simmons, Open Source Program Manager (Google) - Hong Phuc Dang, InnerSource Manager (Zalando) - Jeffrey Osier-Mixon, Open Source Program Manager (Intel)" src="https://engineering.zalando.com/posts/2018/11/oss-hong.jpg"></p>
<figcaption style="text-align:center">Open Source Summit Europe: From left to right –
Julia Lawall, Senior Researcher (Inria) - Josh Simmons, Open Source Program Manager (Google) - Hong Phuc Dang, InnerSource Manager (Zalando) - Jeffrey Osier-Mixon, Open Source Program Manager (Intel)</figcaption>
<ul>
<li><strong>Github Universe USA, October 16 - 17, 2018</strong>
Zalando joined Github, Oracle and Comcast on a panel discussion about ‘The keys to open source success for enterprise teams’.</li>
</ul>
<p><img alt="Github Universe: From left to right Bonnie Chatterjee, Director, Professional Services (GitHub) - Chad Arimura, Vice President of Serverless (Oracle) - Shilla Saebi, Open Source Community Lead (Comcast) - Per Ploug, Open Source Community Manager (Zalando)" src="https://engineering.zalando.com/posts/2018/11/ghu-per.jpg"></p>
<figcaption style="text-align:center">Github Universe: From left to right Bonnie Chatterjee, Director, Professional Services (GitHub) - Chad Arimura, Vice President of Serverless (Oracle) - Shilla Saebi, Open Source Community Lead (Comcast) - Per Ploug, Open Source Community Manager (Zalando)</figcaption>
<h2>Connexion 2.0 Release</h2>
<p><em>2018-11-05, João Santos</em></p>
<p>Today, we released <a href="https://github.com/zalando/connexion">Connexion</a> 2.0 with OpenAPI 3 support.</p>
<p>Connexion is a Python framework that automagically handles HTTP requests based on the <a href="https://www.openapis.org/">OpenAPI Specification</a>
(formerly known as the Swagger Spec) of your API, described in YAML format. Connexion allows you to write a Swagger specification,
then maps the endpoints to your Python functions.</p>
<p>Besides routing, Connexion also validates requests and responses automatically based on OpenAPI specifications, handles common
authentication schemes, supports API versioning and supports automatic serialization of payloads. It can use both
<a href="http://flask.pocoo.org/">Flask</a> and <a href="https://github.com/aio-libs/aiohttp">aiohttp</a> as backend servers.</p>
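<p>To make the spec-to-function mapping concrete, here is a minimal, self-contained sketch of the idea. This is <em>not</em> Connexion's actual API: the spec fragment, the <code>hello.post_greeting</code> operationId, and the <code>HANDLERS</code>/<code>dispatch</code> names are illustrative assumptions only.</p>

```python
# Conceptual sketch (NOT Connexion's real API): spec-first routing boils
# down to resolving an operationId declared in the spec to a Python callable.

# A tiny slice of an OpenAPI-style spec: path -> method -> operationId.
SPEC = {
    "/greeting/{name}": {
        "post": {"operationId": "hello.post_greeting"},
    },
}

def post_greeting(name: str) -> str:
    """The handler a spec author would point an operationId at."""
    return f"Hello {name}"

# Stand-in registry; a real resolver would import module "hello"
# and look up the attribute "post_greeting" on it.
HANDLERS = {"hello.post_greeting": post_greeting}

def dispatch(path_template: str, method: str, **path_params):
    """Route a request to the function named in the spec."""
    operation_id = SPEC[path_template][method]["operationId"]
    return HANDLERS[operation_id](**path_params)

print(dispatch("/greeting/{name}", "post", name="Jane"))  # -> Hello Jane
```

<p>Connexion's default resolver works along these lines, treating an operationId such as <code>hello.post_greeting</code> as <code>module.function</code>, while additionally layering request/response validation and serialization on top.</p>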
<p>Besides OpenAPI 3 support, this release includes a more streamlined internal structure, better adherence to Swagger 2.0 spec by
default, and support for basic authentication and apikey authentication. For a more detailed list of changes, check
<a href="https://github.com/zalando/connexion/#new-in-connexion-20">Connexion's Read Me</a>.</p>
<p>Connexion 2.0 would not have been possible without the help of all our 87
<a href="https://github.com/zalando/connexion/graphs/contributors">contributors</a>, especially our newest maintainer
<a href="https://me.dtkav.com/">Daniel Grossmann-Kavanagh</a>, who deserves most of the credit for this release.</p>
<h2>#NoEstimates</h2>
<p><em>2018-11-01, Kaiser Anwar Shad</em></p>
<h3><strong>Why I advocate a practice of no estimates as a software engineer</strong></h3>
<p>Before I get to the topic, I would like to clarify one thing: I don’t want to ban estimations generally from software
development, as there are good and solid reasons for it. In a nutshell, business needs to be predictable.</p>
<p>I want to show a software developer's view on how to reduce or even get rid of endless estimations meetings with
doubtful outcomes. Critics would argue that software developers should improve their estimation skills in order to:</p>
<ul>
<li>develop shared understanding within the team, especially in case of uncertainty</li>
<li>make informed decisions when very little data (about the product) is available</li>
<li>make the product more predictable</li>
</ul>
<p>But let’s go step by step on how no estimates lead to the same goals.</p>
<p><em><strong>Note:</strong> When I mention “team,” I mean a software development team and by “developer,” a software developer.</em></p>
<h3>Improving predictability - Is there only one way?</h3>
<p>There are many factors to improve the speed and predictability of software development. In case a team is missing some
of those factors, they should be set and measured as team objectives in the meantime. Here are some of them:</p>
<ul>
<li><strong>Stable and autonomous team:</strong> The product is built by the teams, so the focus should be to make them stable.
Autonomy gives teams self-confidence and fosters maturity, which allows for decisions on how to build things.</li>
<li><strong>Reducing meetings:</strong> Keeping the teams busy with meetings will result in less time for developing. Software
development consumes a lot of concentration and energy, so distractions should be kept to the minimum.</li>
<li><strong>Cadence of collaboration:</strong> Planning together with designers and product teams to clarify the tasks, building and
reviewing together to keep a high quality of code and strengthen knowledge sharing within the team.</li>
<li><strong>High visibility and transparency</strong>: Making the work and progress of development transparent to stakeholders and
managers will increase trust within the organization.</li>
</ul>
<h3>Really, no estimates?!</h3>
<p>Let’s start to gather some arguments for estimations. Business runs on goals and commitments, and deciding which product
should be built is a decision of computing cost against expected profit. With no estimates, how can costs be calculated?
How and which commitments should be made? How can different projects be compared?</p>
<p>At some level, estimations have to be done in order to have numbers like cost and time for business decisions. But
should this happen on the task level? No! Besides spending time to estimate every single task, it puts time pressure on
the developers. Estimation also includes guessing. How certain can you be before starting a task? What is more
important? Delivering on time or finding the best solution for the task? Let the developer decide this, as he/she is
building the product. The idea behind it is to let the teams focus on what they do best: building the right products,
which are reliable, stable and predictable.</p>
<h3>How can this look?</h3>
<p>Here, I want to quote one of the most active advocates of #NoEstimates (who’s been recommending this for as long as 15
years), Vasco Duarte:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2a74982d8e316bee3b98cebd72cd8f9c7cb5f293_screen-shot-2018-11-01-at-8.03.22-pm.png?auto=compress,format"></p>
<ol>
<li>The backlog should not contain <em>well</em>-estimated stories that keep growing after every sprint. Product should
define the most important task, and only after completing it define what to proceed with next, i.e. being agile!</li>
<li>Chunks should be small. It’s easier to split the work within the team and keep everybody up to date about the code
base, i.e. knowledge sharing!</li>
<li>Keeping chunks small decreases the development and review time. I feel amazed after finishing a task that didn’t
take several days or get merged with another task, i.e. boosting motivation to the next level!</li>
<li>Receive feedback from customers/stakeholders, i.e. again, being agile!</li>
</ol>
<p>My takeaway from the practice and experience of #NoEstimates is to empower teams to unleash their best. Provide them
with the space and environment where nobody tells them what needs to be done but rather they do what needs to be done
autonomously. My emphasis on #NoEstimates is to make sure that teams think about the value they deliver rather than
going into a vicious circle of discussions. In the end, what matters is the outcome and not the input. Teams, especially
developers, need to shift their mind towards value-driven development. This enables them to trigger discussions based on
the value they could possibly deliver to the end customer, which is often the missing piece of the puzzle. Teams should not
follow a complaint mechanism but rather make outcome-oriented, customer-centric decisions. Leaders play a vital role in
enabling this kind of culture by providing teams with a “safe to fail” environment, where praise is given for
experimentation. This, in the end, helps the team grow and flourish.</p>
<p>For more information, why not check out some of my favorite sources on #NoEstimates: (1) the <a href="https://www.youtube.com/watch?v=cgvB2wWvi8M">#NoEstimates
video</a> by Vasco Duarte and (2) <a href="https://plan.io/blog/noestimates-6-software-experts-give-their-view/">#NoEstimates: 6 Software Experts Give Their
View on the Movement</a> by Thomas Carney.</p>
<p>Join our team of developers. Check out our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">jobs page</a>.</p>
<h2>Singleton Types</h2>
<p><em>2018-10-25, Joachim Hofer</em></p>
<p>A Scala 3 Experiment</p>
<p>I'll start this post by admitting that I’ve never gone deeply into any kind of Scala coding on the typelevel. It's not
what I, as a common application (or microservice) developer, usually need.</p>
<p>Having stated that, of course, I might be missing out on a whole world of opportunities for better code without knowing it.
Because of that, I put some effort into trying to understand the features of Scala that might sound strange,
overly theoretical, and maybe even useless at first sight.</p>
<p>A concept I couldn't imagine a proper use case for was the so-called "singleton types" (also called "literal types" or
even "literal singleton types"). As it happens, I recently attended a <a href="https://slideslive.com/38907881/literal-types-what-they-are-good-for">Scala Days talk about
them</a>. In this talk, singleton types are used for
improving the type-safety of database queries, and inspired by this, I finally got an idea of where I could try them out
for myself.</p>
<p>Remember matrix multiplication from math, and how the dimensions have to fit? And how all the usual libraries for matrix
multiplication take matrices of any dimension, and then throw runtime exceptions when the dimensions don't fit? That's
what I'll use singleton types for in the following.</p>
<p>Let's start with the conventional approach, leaving out the actual multiplication details for brevity:</p>
<div class="highlight"><pre><span></span><code>final<span class="w"> </span>case<span class="w"> </span>class<span class="w"> </span>Matrix(n:<span class="w"> </span>Int,<span class="w"> </span>m:<span class="w"> </span>Int)<span class="w"> </span>{
<span class="w"> </span>def<span class="w"> </span>*(other:<span class="w"> </span>Matrix):<span class="w"> </span>Matrix<span class="w"> </span>=<span class="w"> </span>{
<span class="w"> </span>require(m<span class="w"> </span>==<span class="w"> </span>other.n,
<span class="w"> </span>s"matrix<span class="w"> </span>dimensions<span class="w"> </span>must<span class="w"> </span>fit<span class="w"> </span>(<span class="nv">$m</span><span class="w"> </span>!=<span class="w"> </span><span class="cp">${</span><span class="n">other</span><span class="o">.</span><span class="n">n</span><span class="cp">}</span>)")
<span class="w"> </span>Matrix(n,<span class="w"> </span>other.m)
<span class="w"> </span>}
}
</code></pre></div>
<p>In this piece of code, the runtime check ensures that the matrix dimensions are not just any integers, but also that
they actually fit, so that the matrix multiplication can work at all.</p>
<p>This check is only necessary because we allow for all kinds of integers here in the first place. This is not the only
option we have, though.</p>
<p>Here's a small Scala 3 REPL session (using dotr) that might surprise you:</p>
<div class="highlight"><pre><span></span><code>scala><span class="w"> </span>val<span class="w"> </span>y<span class="w"> </span>=<span class="w"> </span>3
val<span class="w"> </span>y:<span class="w"> </span>Int<span class="w"> </span>=<span class="w"> </span>3
scala><span class="w"> </span>val<span class="w"> </span>y:<span class="w"> </span>3<span class="w"> </span>=<span class="w"> </span>3
val<span class="w"> </span>y:<span class="w"> </span>Int(3)<span class="w"> </span>=<span class="w"> </span>3
scala><span class="w"> </span>val<span class="w"> </span>z:<span class="w"> </span>4<span class="w"> </span>=<span class="w"> </span>3
1<span class="w"> </span>|val<span class="w"> </span>z:<span class="w"> </span>4<span class="w"> </span>=<span class="w"> </span>3
<span class="w"> </span>|<span class="w"> </span>^
<span class="w"> </span>|<span class="w"> </span>found:<span class="w"> </span>Int(3)
<span class="w"> </span>|<span class="w"> </span>required:<span class="w"> </span>Int(4)
</code></pre></div>
<p>See how <em>y</em> and <em>z</em> somehow get their value ascribed as their type? This is what singleton types are about: A singleton
type is a type inhabited by exactly one value. So we might as well name the type after the value. Of course, singleton
types like <em>3</em> or <em>4</em> are subtypes of <em>Int</em>, just as singleton types like <em>"meep"</em> or <em>"foo"</em> are subtypes of
<em>String</em>.</p>
<p>This is all well and good, but how to make use of these types?</p>
<p>The basic idea here is to restrict the type of the two matrix dimensions to be the singleton type, instead of <em>Int</em>.
Then we can ensure at compile time that two dimensions are exactly the same number by ensuring that they have the same
singleton type.</p>
<p>In order to restrict a type to a singleton type, Scala 3 has a type called <em>Singleton</em>. Combined with the new <em>&</em> (very
similar to what Scala previously had with <em>with</em>, but symmetrical), we can express:</p>
<div class="highlight"><pre><span></span><code><span class="nx">A</span><span class="w"> </span><span class="nx">should</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="nx">an</span><span class="w"> </span><span class="nx">integer</span><span class="w"> </span><span class="nx">singleton</span><span class="w"> </span><span class="k">type</span>
</code></pre></div>
<p>By writing:</p>
<div class="highlight"><pre><span></span><code>A <: Singleton & Int
</code></pre></div>
<p>Making use of this, we can define our matrix class in the following way:</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="nx">Dim</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">Singleton</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="nx">Int</span>
<span class="k">final</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kd">class</span><span class="w"> </span><span class="nx">Matrix</span><span class="p">[</span><span class="nx">A</span><span class="w"> </span><span class="p"><:</span><span class="w"> </span><span class="nx">Dim</span><span class="p">,</span><span class="w"> </span><span class="nx">B</span><span class="w"> </span><span class="p"><:</span><span class="w"> </span><span class="nx">Dim</span><span class="p">](</span><span class="nx">n</span><span class="p">:</span><span class="w"> </span><span class="nx">A</span><span class="p">,</span><span class="w"> </span><span class="nx">m</span><span class="p">:</span><span class="w"> </span><span class="nx">B</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="w"> </span><span class="o">*</span><span class="p">[</span><span class="nx">C</span><span class="w"> </span><span class="p"><:</span><span class="w"> </span><span class="nx">Dim</span><span class="p">](</span><span class="nx">other</span><span class="p">:</span><span class="w"> </span><span class="nx">Matrix</span><span class="p">[</span><span class="nx">B</span><span class="p">,</span><span class="w"> </span><span class="nx">C</span><span class="p">]):</span><span class="w"> </span><span class="nx">Matrix</span><span class="p">[</span><span class="nx">A</span><span class="p">,</span><span class="w"> </span><span class="nx">C</span><span class="p">]</span><span class="w"> </span><span class="p">=</span>
<span class="w"> </span><span class="nx">Matrix</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">other</span><span class="p">.</span><span class="nx">m</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>And with this, we get the compile-time behavior we were aiming for:</p>
<div class="highlight"><pre><span></span><code>scala><span class="w"> </span>val<span class="w"> </span>a<span class="w"> </span>=<span class="w"> </span>Matrix(2,<span class="w"> </span>4)
val<span class="w"> </span>a:<span class="w"> </span>Matrix[Int(2),<span class="w"> </span>Int(4)]<span class="w"> </span>=<span class="w"> </span>Matrix(2,4)
scala><span class="w"> </span>val<span class="w"> </span>b<span class="w"> </span>=<span class="w"> </span>Matrix(4,<span class="w"> </span>3)
val<span class="w"> </span>b:<span class="w"> </span>Matrix[Int(4),<span class="w"> </span>Int(3)]<span class="w"> </span>=<span class="w"> </span>Matrix(4,3)
scala><span class="w"> </span>a<span class="w"> </span>*<span class="w"> </span>b
val<span class="w"> </span>res10:<span class="w"> </span>Matrix[Int(2),<span class="w"> </span>Int(3)]<span class="w"> </span>=<span class="w"> </span>Matrix(2,3)
scala><span class="w"> </span>b<span class="w"> </span>*<span class="w"> </span>a
1<span class="w"> </span>|b<span class="w"> </span>*<span class="w"> </span>a
<span class="w"> </span>|<span class="w"> </span>^
<span class="w"> </span>|<span class="w"> </span>found:<span class="w"> </span>Matrix[Int(2),<span class="w"> </span>Int(4)](a)
<span class="w"> </span>|<span class="w"> </span>required:<span class="w"> </span>Matrix[Int(3),<span class="w"> </span>C]
<span class="w"> </span>|
<span class="w"> </span>|<span class="w"> </span>where:<span class="w"> </span>C<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>type<span class="w"> </span>variable<span class="w"> </span>with<span class="w"> </span>constraint<span class="w"> </span><span class="nt"><:</span><span class="w"> </span><span class="err">Dim</span>
<span class="err">scala</span><span class="nt">></span><span class="w"> </span>val<span class="w"> </span>c<span class="w"> </span>=<span class="w"> </span>Matrix(3,<span class="w"> </span>5)
val<span class="w"> </span>c:<span class="w"> </span>Matrix[Int(3),<span class="w"> </span>Int(5)]<span class="w"> </span>=<span class="w"> </span>Matrix(3,5)
scala><span class="w"> </span>res10<span class="w"> </span>*<span class="w"> </span>c
val<span class="w"> </span>res11:<span class="w"> </span>Matrix[Int(2),<span class="w"> </span>Int(5)]<span class="w"> </span>=<span class="w"> </span>Matrix(2,5)
</code></pre></div>
<p>And that's actually all there is to it. As an aside, notice how concisely the compiler's error message tells us
exactly where we went wrong.</p>
<p>So here we are, having used singleton types to make a very simple matrix multiplication API a bit more type-safe,
and coming out of it with one more tool in our Scala 3 tool belt.</p>
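<p>As a closing aside, the API behind the session above could plausibly be sketched as follows. This is a minimal reconstruction, not the article's actual code: the definition of <code>Dim</code> is assumed from the compiler output (<code>constraint &lt;: Dim</code>), and details such as the <code>toString</code> output may differ.</p>
<div class="highlight"><pre><code>// Assumed sketch: matrix dimensions tracked as singleton Int types.
type Dim = Int &amp; Singleton

class Matrix[A &lt;: Dim, B &lt;: Dim](val rows: Int, val cols: Int):
  // Only compiles when the inner dimensions agree: (A x B) * (B x C) = (A x C)
  def *[C &lt;: Dim](that: Matrix[B, C]): Matrix[A, C] =
    new Matrix[A, C](rows, that.cols)
  override def toString = s"Matrix($rows,$cols)"

object Matrix:
  def apply[A &lt;: Dim, B &lt;: Dim](rows: A, cols: B): Matrix[A, B] =
    new Matrix[A, B](rows, cols)
</code></pre></div>
<p>With this in place, <code>Matrix(2, 4)</code> is inferred as a <code>Matrix[2, 4]</code>, and a mismatched multiplication is rejected at compile time, exactly as in the REPL session above.</p>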
<p>And now, it's your turn to find more use cases for singleton types!</p>
<p>Got something to say about Scala? <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">Join</a> our tech team and leave
your mark!</p>Growing a Product Area at Zalando2018-10-18T00:00:00+02:002018-10-18T00:00:00+02:00Samir Hannatag:engineering.zalando.com,2018-10-18:/posts/2018/10/growing-product-area.html<p>The six month journey of the customer inbox multi-disciplinary team</p><h3><strong>The six month journey of the customer inbox multi-disciplinary team</strong></h3>
<p>The customer inbox multi-disciplinary area operates in the Fashion Store pillar of the Zalando platform organization.
The purpose of the Customer Inbox Unit is to serve customers with personal and practical fashion messages through multiple
channels, i.e. “Target the customers at the right time, at the right place.”</p>
<p>In this post, we share how the Customer Inbox area simplified and transformed from four delivery teams having a
component focus and complicated structures, to a performing unit with a business focus able to grow simple structures
healthily.</p>
<p>Complicated structures do not scale. Healthy organizations tackle the complexity of product development by growing
simple practices and simple structures. Simplicity allows complex mechanisms – like face-to-face conversation,
individual interactions and continuous improvement – to emerge, because <a href="http://noop.nl/2011/04/it-takes-complexity-to-handle-complexity.html">it takes complexity to handle
complexity</a>.</p>
<p><strong>Customer centric teams
</strong>The <a href="https://engineering.zalando.com/posts/2018/08/data-science-products-multi-disciplinary-teams.html">multidisciplinary</a> delivery
teams had a strong technical component focus, with certain benefits but also pitfalls: “It is not our component’s
responsibility” was often the reply given to product managers during product feature discussions. As a result,
product managers were writing technical requirements tailored to the teams’ narrow component purposes, and a lot of time was
spent on organizing dependencies.</p>
<p>Together with the Head of Engineering, we triggered a team workshop, where all of the four teams’ members realized the
waste generated by component-focused teams, and decided to shift to business-focused teams. The realization happened
using the <a href="http://tastycupcakes.org/2016/05/5668/">easter egg simulation</a> (that we tweaked a little to illustrate
component vs business-focused teams).</p>
<p>Teams now conduct end-to-end business initiatives stretching across various components, and innersource (or
manage themselves) the dependencies. Product managers write business initiatives with a customer focus (as opposed to
tickets with a narrow technical focus). The overhead of managing dependencies is reduced. Customer-centric
structures can grow.</p>
<p>An exemplary situation to illustrate the behavioral changes happened shortly after the workshop. In order to
differentiate between customers with and without commercial consent in the email templates, the product team asked our
Smart Communication Team for a corresponding data enrichment feature. The feature turned out to require API changes in a
service that is owned by the Communication Channels Team. Now the product team didn’t need to manage dependencies
anymore since the Smart Communication Team innersourced the API changes.</p>
<p><strong>Single source of information
</strong>Each team was using their components’ GitHub repositories as a place for long-term plans. The 400 issues distributed
across the 10 component backlogs were used as a baseline for planning.</p>
<p>Sitting down with the leadership team in an overall retrospective we analyzed the situation to understand the impact of
such a structure using <a href="https://less.works/less/principles/systems-thinking.html">causal loop diagramming</a>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/210b631ffde6c599c4895166c20eee942ccd663d_screen-shot-2018-10-18-at-6.33.43-pm-1.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b805e9b47cf7d5dac5d0fb018a93fceb3528af8b_screen-shot-2018-10-18-at-6.34.34-pm.png?auto=compress,format"></p>
<p>The leadership team explained to the Inbox Teams the impact of the current backlog structure. The unit cleaned up the
different repositories, paid back a part of the technical debt, and moved the rest of the technical debt to a single
product backlog. The team installed a zero bug policy. Any new bug generated during feature development is directly
fixed without debates. Small issues (improvement ideas, to-dos) not treated in the next two iterations are
systematically deleted.</p>
<p>GitHub backlogs are now used only as a backlog for the next interval of work (the sprint backlog). One <a href="https://less.works/less/framework/product-backlog.html">single product
backlog</a> is used to store product and non-functional
requirements. The number of tickets decreased to 100. This simplification of the structure allows better usage, adoption
and efficiency. All technical tickets are linked to the corresponding product ticket, which increases transparency both
for the product and engineering side.</p>
<p><strong>Shared meeting structure
</strong>The causal loop diagram triggered another change. The Inbox Team had four planning sessions per cycle, with key people
filling up their days with these meetings and carrying dependencies from one session to another: in short, they were
bottlenecks. Two of the three teams were not refining requirements at all before trying to pull them into a cycle,
making planning sessions long and ineffective. Team members participated in these planning sessions without
engagement; the NPS of the planning sessions was -70. Largely due to this ineffective setup, average delivery time was
usually 50% longer than planned.</p>
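<p>As a side note on the metric: NPS (Net Promoter Score) is the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6), so it ranges from -100 to 100, and -70 means detractors outnumbered promoters by 70 percentage points. A quick sketch with hypothetical survey scores (in Scala, matching the code style used elsewhere on this blog):</p>
<div class="highlight"><pre><code>// Net Promoter Score: % promoters (9-10) minus % detractors (0-6).
def nps(scores: Seq[Int]): Int =
  val promoters  = scores.count(_ &gt;= 9)
  val detractors = scores.count(_ &lt;= 6)
  (promoters - detractors) * 100 / scores.size

// Hypothetical survey of ten team members: 1 promoter, 1 passive, 8 detractors
nps(Seq(9, 8, 3, 4, 2, 5, 6, 1, 0, 4))  // -70
</code></pre></div>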
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/50bc1e414e4565a5bac29622d191085ad1fda767_screen-shot-2018-10-18-at-6.39.32-pm.png?auto=compress,format"></p>
<p>All Inbox Teams now run a common pre-planning (i.e. refinement) session and a common planning session, where all teams
refine and plan at the same place, at the same time, for the next cycle of work (two weeks). Each session is a 60 to 90
minute time box, and key people move from one team to another. Teams plan on their own, consulting others when needed for
technical support and dependency management. The team members are satisfied with the setup, the NPS of the planning session
is up to 30, and plans are accurate. To quote one product manager: “When it’s Friday, the sprint is over and the work is
done.”</p>
<p><strong>Conclusion - Handling complexity through simplification
</strong>Movements like <a href="https://en.wikipedia.org/wiki/Cynefin_framework">Cynefin</a>, <a href="http://noop.nl/2011/04/it-takes-complexity-to-handle-complexity.html">management
3.0</a> and
<a href="https://less.works/less/principles/large_scale_scrum_is_scrum.html">LeSS</a> draw the same conclusions. To grow healthy
and tackle the complexity of product development at the scale of a big company, you need to un-scale and simplify your
structures. From an un-growable, complicated structure of 10 backlogs, 400 issues, multiple planning sessions, and
component-focused teams, the Customer Inbox area moved to a resilient and scalable structure with a single source of
information, shared planning sessions, and customer-centric feature teams. Some thoughts from the team:</p>
<p><em>Omar Elasfar (Producer)
“Silos around specific components were torn down, unleashing our teams’ potential to focus on our customers and
incrementally deliver solutions that solve their problems.”</em></p>
<p><em>Sina Golesorkhi (Engineer)
“We have now a team-centric mindset and people value the incremental development and collaboration more than before.”</em></p>
<p><em>Petra Graß (Product Lead)
“Thanks to the workshop, we now think rather about the products and the impact they have than about the team structure
and single features.”</em></p>
<p>These changes occurred in a period of six months driven by the area leadership team and supported by the agile coaches.
It became real thanks to the people working in the Inbox Team.</p>A Team for Teams2018-10-10T00:00:00+02:002018-10-10T00:00:00+02:00Katrin Elise Dreyertag:engineering.zalando.com,2018-10-10:/posts/2018/10/a-team-for-teams.html<p>How we revolutionized the way we worked agile</p><h3><strong>How we revolutionized the way we worked agile</strong></h3>
<p>One and a half years ago we started something new at Zalando. We asked all producers of our department to join one team
with the purpose of helping us create great teams to get things done in the best way possible.</p>
<p><strong>Where did we start from?</strong></p>
<p>The producer role had been introduced at Zalando to provide a team with whatever it lacked at a certain moment in time,
be it a roadmap, team building, process improvement, documentation or even testing. The role was an extremely clever
idea to get through ongoing organizational changes as it made sure that in times of restructuring
the most crucial needs of a team were met and the flow could continue. But as useful as it was to get Zalando through
the change, it caused very diverse perspectives on the responsibilities and capabilities of one and the same role. For
example, some producers would assume the responsibility for a team’s roadmap whereas others saw that responsibility with
the product managers. These diverging perspectives resulted in diverging expectations, and eventually disappointment and
frustration between producers, leads and product managers.</p>
<p>On top of that, producers were designed to be “part of the team,” helping the team to organize from within whatever it
took to do that.</p>
<p>This brought about a few questions and problems:</p>
<ul>
<li>How does a producer empower a team to grow autonomous, if he/she is a part of that team and continuously takes over
operational tasks?</li>
<li>In many cases, a team change for a producer also meant a leadership change, so long-term development was often
interrupted.</li>
<li>How does he/she know what career level he/she is eligible for, if there is no clear set of skills and execution
varies so much?</li>
</ul>
<p>In Retail Operations, teams receive different input, since every producer has a different focus. Producers learn to
effectively enable multiple teams in a shorter time. In this setup, we might be able to help more teams with fewer
producers. Producers would no longer be part of the team, but partners to the team lead, working
collaboratively, and could thus support a wider set of topics, for example product delivery collaboration or
facilitating the alignment workshop for a high-level architecture.</p>
<p><strong>So what did we do?</strong></p>
<p>All of this we wanted to tackle with our newly formed team of agile coaches, and so we had to set up quite a lot of
things to lift the role to the next level.</p>
<ul>
<li>Coaches would sit with the teams they worked with and for, but be directly answerable to a producer lead; they would
learn from and support each other. For example, if an agile team coach is stuck with a challenge, he/she gets support
through a team “intervision.”</li>
<li>We established a team with its own leader to take care of people development, create common practices and
standards, roughly organize the assignments, and be a backup expert for any questions the coaches might have</li>
<li>We clarified the role and expectations on each job grade</li>
<li>We looked into our processes to find out what setup this team needed to effectively integrate with the rest of the
department organisation. For example, we started to align on the goal with the respective team leader before a
coach began working with a team</li>
<li>We invested in the development of the skills necessary to become excellent</li>
</ul>
<p><strong>Role Clarification</strong></p>
<p>The goal was to find the most crucial gap in our organisation, and taking our skills into account, narrow our
contribution down to the most impactful place. After interviews with engineering leads, and others, I found that the
teams needed professional support in the adoption of agile processes and excellent collaboration. Eventually we stopped
working as generalist producers and started to work as specialised agile team coaches. We inserted the term “team” as
there is a central team of agile coaches at Zalando, who develop the agile culture on a company level in coaching upper
management and whole departments, as well as offering standardized trainings. In contrast, we coach teams on their
processes and collaboration for a long period of time and facilitate cross-functional workshops in one department.</p>
<p><strong>Process</strong></p>
<p>However, to get to a place where this role could fully unfold its impact and fit in with the rest of the roles, we
needed to take a few more steps. First of all we had to define a setup with our most important stakeholders: the
engineering leads at retail operations. While the overlap of responsibilities in producers and leads would sometimes
cause frictions or responsibility diffusion, the new role should be complementary and supportive. Through a couple of
iterations we came up with a sponsorship model that starts with an engineering lead or product specialist requesting the
help of an agile team coach. The agile team coach then observes the team for a while and writes down his or her
understanding of the problem and the root causes identified, and determines success measures and milestones for
the coaching. To be a little more specific here: usually the leads ask us to “make the team faster.” However, each team
loses speed for different reasons, so the coach analyses the situation and comes back with insights on the “slowness.”
This could be that the collaboration with the product manager is difficult, or that the team has not learned to speak
openly about issues. It could be that the team does not know how to turn a big problem into small, manageable chunks of
work. Each of these root causes needs to be addressed with a different coaching approach. This is aligned with the
sponsor, the agile lead and, if possible at that point in time, with the team.</p>
<p>The sponsor and the coach from then on have regular check-ins to talk about the status quo, next steps and distribution
of tasks amongst each other. In case of severe dissatisfaction on any side, the issue can be escalated to the agile
lead, who will mediate and try to realign the sponsors and the coach. Our internal processes have evolved into a one-hour
bi-weekly operational meeting, where we talk about our own organization and discuss management updates.</p>
<p>The second meeting is our one hour deep dive. Here we raise all topics that cannot be discussed in a short amount of
time but need some reflection or longer explanation. The last regular meeting we have is a half hour board meeting to
keep us updated on each other’s sponsorships.</p>
<p><strong>Development</strong></p>
<p>The last piece of the puzzle to a successful agile coaching team is the trainings we invested in. The combination of a
deep understanding of agile frameworks, team dynamics, innovation and moderation was not a standard for the producers at
Zalando, and is only rarely found in Scrum masters. So we identified three areas for the development of an
agile team coach and took trainings accordingly.</p>
<p>The first area was team dynamics and team building. A three day training with an experienced academy in Berlin helped us
to learn basic concepts and the necessary attitude for facilitating team building, and just as important, it helped us
to get to know our limits. In the same quarter we visited an onsite change management training. For both trainings, we
made sure to check in on our learnings and experiences in our deep dive sessions. The second area to learn about was
delivery processes. As everyone on the team had a good understanding of Scrum, we took an intensive training on Kanban,
which also helped us to better reflect the success of the process improvements back to our teams. The last area is
innovation and moderation. Team members took part in onsite visualization trainings, moderation trainings and we are now
learning about innovation formats to better be able to support cross-functional ideation and planning. We took time to
reflect on our learnings in our weekly deep dive sessions, co-planned a lot of the support we offered to the teams and
shared new tools and ideas regularly.</p>
<p><strong>So what did we gain?</strong></p>
<p>All of these measures were a big investment for our department and Zalando, and obviously it's important to check whether
they were worth it. However, what we got from this change is considerable:</p>
<ul>
<li>No more role conflicts between engineering leads, product managers and agile team coaches, but appreciative,
respectful working relationships</li>
<li>A professional approach to change initiatives: instead of fixing one small problem after the other, we develop change
strategies and measure our success</li>
<li>A coaching team that effectively and efficiently helps each other</li>
<li>Happy coaches and very good candidates in our last recruiting process</li>
<li>Trust beyond the tech teams: we are currently involved in four non-tech teams, creating transparency, team
spirit and self-organization for them</li>
<li>Skills extrapolated beyond supporting the delivery process: we now contribute great facilitation in
the discovery, definition and design phases of the product development process</li>
<li>A more mature agile culture in our department, as the agile coaching team has established
alignment on best practices such as a clear planning process, estimations and expectation workshops</li>
<li>On the basis of this, we also started to work with elements of scaled agile frameworks, such as a board for overall team
coordination</li>
</ul>Four Pillars Of Leading People2018-10-04T00:00:00+02:002018-10-04T00:00:00+02:00Sergio Laranjeiratag:engineering.zalando.com,2018-10-04:/posts/2018/10/four-pillars-leadership.html<p>Essential building blocks for strong leadership that enables people to grow and achieve results</p><h3><strong>Essential building blocks for strong leadership that enables people to grow and achieve results</strong></h3>
<p>The story of how I ended up working for Zalando in Berlin starts with a LinkedIn message from Joseph Wilkinson, one of
our tech recruiters. In tech, we get a lot of messages on LinkedIn, but this one was different and made me very
interested to know more about Zalando. I already knew something of the company, because I was also working at a fashion
e-commerce platform, but I was not aware of how big and challenging Zalando was. From that first contact, the decision to
join Zalando was an easy one. For me, it was the next big step to grow as a lead and to help the company grow even
more.</p>
<p>A little over seven months ago, I had the opportunity to help open and grow our third international tech location in
Lisbon, Portugal. The first six months were demanding, rewarding and impressive. Zalando is a remarkable and very
well-known company in Europe, but in Portugal our brand is not well established yet. Since we still don't have our
Fashion Store in Portugal, it's funny that some people and companies think we are still a small startup. Once they hear
the numbers, they’re impressed. The first seven months have been a mixture of being able to deliver products, build the
right environment for proper product development, as well as hiring the most talented and high-potential engineers,
product managers and designers. We clearly had a good start, since we are a small team but managed to already have an
impact in the company.</p>
<p>As a lead in a new tech hub, you have a clear influence on shaping the hub and its future. You need to lead by example
and be a beacon of responsibility and accountability that every lead should assume regardless of their company or
circumstances. Being one of the first leads at the tech hub is a great opportunity, as it allows me to create and shape
a culture of high performance teams. The last years have also taught me that as a lead, I always have to think about the
best interest of the company, business and the product. This can only be done with the right people and with the right
mindset, focused on achieving greatness together. Achieving that requires a solid foundation from a lead. And so I
asked myself:</p>
<p><strong>WHAT MAKES A GOOD LEAD?</strong></p>
<p>What I have learned from leading teams for almost five years is that there are four pillars, which are essential for
strong leadership to enable people to grow and achieve results: <strong>Empathy, Inspiration, Trust and Honesty</strong>.</p>
<p><strong>Empathy
</strong>For me, this is the number one skill every lead should have. Empathy helps in communication and in solving problems,
ensuring that everyone shares the same understanding. If you can't understand people, you won't be able to
communicate with them effectively.</p>
<p><strong>Inspiration
</strong>As leads, we don't tell people what to do. We inspire them, give them access to all the information and together we
define the goals and how the journey will be. Every meeting, every talk you have with your team; it's an opportunity to
motivate people and to encourage that extra strength for the extra mile.</p>
<p><strong>Trust
</strong>You need to trust people, and they will trust you. Being trustworthy to everyone, so that ideas can become real and
teams enjoy the right level of autonomy, is the foundation of a high-performance team.</p>
<p><strong>Honesty
</strong>And you can't achieve the above three qualities without being honest: honest with yourself and with your team, in
every moment, in the good times and in the complicated ones; when you need to have a hard talk, but also when you have to
recognize team and individual efforts. Honesty with yourself matters because leads are not owners of all knowledge;
asking for help and support is also part of being a great lead.</p>
<p>High-performing teams are like family. You trust them no matter what, and you know they are there for you. Building this
culture takes time: it requires hard work, and maintaining it is even harder. I envision a leadership that brings
clarity in uncertain moments. Each of these pillars is extremely hard to achieve, but with every decision, talk or
restructuring you do, keep them in mind and remember the company's values and culture to make the right decision.
Aiming towards a leadership able to inspire and share knowledge, so we all know the why and run towards the same goal,
is the ultimate end.</p>
<p>There are many different leadership styles and strategies, and I believe the ability to adapt to the environment you are
working in is crucial: Get to know the people you work with, understand their motivations and how you can empower and
sponsor them to be better. A good lead needs to be emotionally adaptable to the environment they are working in, and be
the guiding light, be committed to the team and to the company. Leading by example defines how much people will trust
you or not. A good leader knows the rules but a great leader knows when to break them.</p>
<p><strong>Leading high performing tech teams at Zalando
</strong>Zalando is a big company, so the challenges of leadership are also big. Every decision you make and software your team
develops has an influence on millions of people. It is an extremely diverse environment and a place where the broad
knowledge of engineering and product is continuously challenged. Your strategies need to be much more global across
departments and business units, so you need to communicate effectively. Zalando has been in my life for almost two years
and I've had the chance to get to know so many amazing teams and people; people I don't just call colleagues but
friends, and that is something that takes time and a trustworthy environment.</p>
<p>I have been in the Lisbon Hub for seven months now and we are building strong and high performance teams responsible for
the development and delivery of core Zalando products. The products we are currently developing will have a huge impact
on the company’s strategy and help to achieve the demanding goals we have for the future. The part I’m most passionate
about are the people in this team; working with them, facing challenges together, growing stronger every day, every
week. We are a small team that is capable of great things and that gets me out of bed every morning.</p>
<p><strong>Leading people to help them achieve their goals
</strong>In the end, the job of a lead is actually very easy. You just need to talk to people, understand their motivations
and their strengths, and then define proper strategies for how those skills can benefit the different goals and
objectives of the teams and the business. As a lead, you need to be prepared to fail and to act on failure, not just give up.
You also need to make sure communication is transparent for all team members, so that everyone knows the “why” and the
expected result and impact. A lead has the responsibility and accountability to help the company meet its goals.
It's a day-to-day job that needs to be consistent.</p>
<p>The small victories of being able to influence, to mentor, and to see how the people you lead achieve their goals are very
rewarding. The most recent meaningful experience I had was talking to people I mentored and led in the past, and
learning they are now also leading people and have grown so much. Knowing that you were an example and had a direct
impact on this: on their growth, on their lives, makes it all worth it. There is no bigger satisfaction and sense of
achievement than when you help people to achieve their goals and dreams. It's the best feeling you can have as a person.
Just imagine all the times you felt you achieved something great and impressive: it was never alone, and you don't want
to celebrate on your own either. Helping your family, your friends, a colleague, a peer or team member, even a stranger,
to conquer their dreams allows you to leave a mark. If you think about what you do at your job and think how much what
you do impacts or helps others, it will help you make better and smarter decisions; not only as a lead, but as a
person.</p>
<p><em><a href="https://zln.do/2zNmE83">Join</a> Sergio in enabling others to leave their mark in our Lisbon Tech Hub as well as other
locations across Europe in Berlin, Dublin, and Helsinki.</em></p>The Journey to Connecting Retail2018-09-27T00:00:00+02:002018-09-27T00:00:00+02:00Javier Martin Pereztag:engineering.zalando.com,2018-09-27:/posts/2018/09/journey-connecting-retail.html<p>Digitizing brick & mortar fashion stores through Connected Retail</p><h3><strong>Digitizing brick & mortar fashion stores through Connected Retail</strong></h3>
<p>Everything started back in 2015, when Zalando was already successful as an online fashion retailer in Europe. However, a
B2B problem had been identified that needed to be tackled: brick-and-mortar fashion stores needed a way to increase their
sales. Seeing the need to connect offline with online in order to help merchants solve this problem is what led me to join
Zalando as a Product Manager in early 2016 at the newly established <a href="https://engineering.zalando.com/posts/2018/08/three-years-of-our-helsinki-tech-hub.html">Helsinki Tech
Hub</a>.</p>
<p>I started working on a topic back then called “Offline” and my first task was to do market research on the problem
mentioned above. I learned that the situation was not ideal, but if we could connect our online and offline sales
channels there would be three main areas we could improve and that could have a huge impact through a technical
solution. For <strong>Zalando</strong> this could mean more inventory and connecting local offerings online. For <strong>brick and mortar
stores</strong> it could mean generating additional sales, with huge potential for opening up new in-store use
cases. And most importantly, for our <strong>Zalando customers</strong> this could mean reduced delivery times and new
experiences.</p>
<p>Based on my findings, I started to build a team to work on a pilot project. Our team, which envisioned being a “fashion
connector” soon became known as team “Silta”, (meaning “bridge” in Finnish), which was quite fitting as that is exactly
what we were aiming to do: bridge offline and online fashion. We wanted to digitize brick and mortar fashion stores to
help stores sell online, as well as reduce Zalando delivery times and improve the in-store customer experience.</p>
<p>In order to validate the hypothesis, we created a pilot with
<a href="https://corporate.zalando.com/en/newsroom/se/press-releases/connected-z-first-adidas-store-berlin-joins-zalando-fashion-platform">adidas</a>,
delivering a parcel from a Berlin brick and mortar store within 25 minutes of ordering from the online Zalando fashion
store. The pilot, which was launched in June 2016, was a great success and it gained a lot of recognition <a href="https://www.businessinsider.com/r-zalando-trials-same-day-free-deliveries-from-adidas-store-2016-6?r=US&IR=T&IR=T">in the
news</a>
and the e-commerce industry.</p>
<p>From a technical point of view, the pilot was not a scalable solution, but it validated our hypothesis, and after this,
the real work started to lay the foundation for a possible solution. After the pilot in the summer of 2016, our team
started to grow (we were already a team of four) and we started working in the product discovery phase. From the
stakeholder point of view we needed to deal with the cross-location complexity of having teams in Berlin and Helsinki.
This is where the position of product manager played a key role in the team by ensuring transparency and clear
information flow. Over the last two and a half years, I have visited Berlin on business trips about 80 times, and I
remember periods when I needed to travel to Berlin every week for several months in order to meet face to face with my
stakeholders, and to keep close to the users, so the project kept moving forward swiftly.</p>
<p>By 2017, based on the pilot and the groundwork our team had done, Zalando decided to build a dedicated product to tackle
offline merchants’ problems. This was great news for Team Silta, and for me personally, having laid the foundation and
been along for the journey from the start. We decided to have a unique name that would be easy to identify both inside
and outside Zalando, which would simply describe what we were trying to do, so our product became known as <strong>Connected
Retail</strong>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/eb8872a541bd06e77dacbcd0b992cc290c1ad05b_screen-shot-2018-09-26-at-10.40.53-am.png?auto=compress,format"></p>
<p>Now, in September 2018, we have just soft-launched a pilot of our new Connected Retail product with a
<a href="https://www.zalando.de/seidensticker/">Seidensticker</a> store in Berlin that will help Zalando scale across Europe,
connecting thousands of offline stores and delighting millions of customers. Our MVP (Minimum Viable Product) is a
custom built Connected Retail system, and includes a ship-from-store feature. This launch is a very important milestone
for both the teams located in Helsinki and Berlin, who have worked on this topic since 2016 across locations, and it
will also play a big role in changing the way our brick and mortar merchants approach their customers.</p>
<p>However, there is still much to learn, and one of the biggest challenges we face is “stock accuracy,” a
multidimensional problem: identifying what is being sold in the offline store versus what is being sold through other
channels. Another complex problem Connected Retail faces is how to digitize merchants for whom technology is still
largely unknown and whose stores are not yet equipped for it. What I have learned is
that merchants know best how they work, and if we can build a product that will solve problems that merchants have, then
they will naturally use the product. Although Connected Retail still has many challenges to overcome, what drives us is
the vision of a future where every Zalando customer can purchase any article located in any physical store, making every
store a small frictionless warehouse.</p>
<p>From a product management point of view, it has been an amazing journey that has tackled the entire product life cycle;
starting from market research, product discovery, competitive analysis and moving into phases such as prototyping and
user testing, and towards an MVP definition and launch. I am proud to see how far we’ve come and excited to see how far
<strong>Connected Retail</strong> will go in helping to digitize brick and mortar fashion stores.</p>
<p><em><a href="https://zln.do/2NMLHws">Join</a> Team Silta and other teams in our Helsinki Tech Hub to solve interesting technical
e-commerce problems, like Connected Retail.</em></p>Shop the Look with Deep Learning2018-09-12T00:00:00+02:002018-09-12T00:00:00+02:00Julia Lasserretag:engineering.zalando.com,2018-09-12:/posts/2018/09/shop-look-deep-learning.html<p>Retrieving fashion products based on a query image</p><h3><strong>Retrieving fashion products based on a query image</strong></h3>
<p>Have you ever seen a picture on Instagram and thought, “Oh, wow! I want these shoes”? or been inspired by your favourite
fashion blogger and looked for similar products (for example, on Zalando)? Visual search for fashion, the task of
identifying fashion articles in an image and finding them in an online store, has been the subject of an ever growing
body of scientific literature over the last few years (see for example [1-11]).</p>
<p>At Zalando, we have many outlets where this search is possible: our app, our Facebook chatbot, etc. We want to provide
our customers with the best shopping experience possible, and words are not always enough to describe fashion.</p>
<p>Visual search poses some interesting challenges: how to deal with variations in image quality, lighting, background,
different human poses and article distortion, or finding the right product in a large database in real-time.</p>
<p>Our working scenario so far has been to build on our home-grown
<a href="https://research.zalando.com/welcome/mission/research-projects/improving-fashion-item-encoding-and-retrieval/">FashionDNA</a>
to retrieve blazers, dresses, jumpers, shirts, skirts, trousers, t-shirts and tops in fashion images, with or without
backgrounds.</p>
<p><strong>Our Data Source
</strong>As a fashion company, Zalando creates outfits every day and therefore generates many fashion images annotated with
their corresponding products. This means that we can use state-of-the-art learning techniques such as deep nets which
have revolutionized computer vision. As can be seen in <em>Figure 1</em>, these images include full body poses, half-body
close-ups as well as detailed close-ups on a garment of interest. Although model poses are usually standardized and do
not really reflect the more natural poses found on Instagram, having these different kinds of shots allows us to handle
different scales. These images also display occlusions (shirts occluded by jackets for example) and back views.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/db91decfc77b8748802ef383e2dc3fa5825130cc_screen-shot-2018-09-12-at-2.50.10-pm.png?auto=compress,format"></p>
<p><em>Figure 1:</em> <em>Examples of images in our dataset. Image types (a-d) are query images featuring models, image type (e)
represents the articles we retrieve from.</em></p>
<p>Unfortunately, an overwhelming majority of our fashion images have standardised clean backgrounds as shown in <em>Figure
1</em>, which means we have to think of a workaround to learn how to handle natural backgrounds.</p>
<p><strong>Studio2Shop: matching model
</strong>We have designed a ConvNet model that takes as input a fashion image with a clean Zalando background and an
assortment of interest, and returns a ranking of the products in the assortment for the eight categories mentioned above.</p>
<p>The products in the assortment are not represented by images, as is common in the literature, but by their
<a href="https://research.zalando.com/welcome/mission/research-projects/improving-fashion-item-encoding-and-retrieval/">FashionDNA</a>.
In other words, only a feature representation of the article is needed.</p>
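<p>In spirit, this turns matching into a retrieval step: the query image is mapped into the FashionDNA feature space, and every product in the assortment is scored against it. The sketch below shows that ranking step only, with a simple dot-product scorer; the names, dimensionality and the random stand-in vectors are illustrative assumptions, not the actual Studio2Shop model.</p>

```python
import numpy as np

def rank_products(query_embedding, product_fdna):
    """Rank an assortment by similarity of its FashionDNA vectors to an
    image embedding (illustrative dot-product scorer, not the real model)."""
    scores = product_fdna @ query_embedding  # one score per product
    return np.argsort(-scores)               # indices, best match first

rng = np.random.default_rng(0)
fdna = rng.normal(size=(50_000, 128))    # hypothetical assortment, 128-d FashionDNA
query = rng.normal(size=128)             # embedding computed from the query image
top50 = rank_products(query, fdna)[:50]  # the kind of shortlist shown in Figure 2
```

<p>The real system learns the image-to-embedding mapping end to end; the key point is that the assortment side is represented only by precomputed feature vectors, so no product images are needed at query time.</p>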
<p><em>Figure 2</em> below illustrates the setting and the results we can get. On the left is the image of a person wearing an
outfit, on the right side are the 50 top-ranking products in the assortment. The articles that are actually present in
the outfit are marked in green.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f491ce0d172f50848db36fa28d83a8d331c43c08_screen-shot-2018-09-12-at-3.10.37-pm.png?auto=compress,format"></p>
<p><em>Figure 2: Random examples of the retrieval test using 20,000 queries against 50,000 Zalando articles. Query images
are in the left-most column. Each query image is next to two rows displaying the top 50 retrieved articles, from left to
right, top to bottom. Green boxes show exact hits.</em></p>
<p>To show its generalization capabilities, we have tested our model on part of an independent dataset published in [7],
without fine-tuning it. Results are shown in <em>Figure 3</em> below. Unfortunately, the dataset was modified to fit our
setting, so our performance is not comparable with the one reported in [7].</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/63f31783857b36ff2da4715ea921fb711502790e_screen-shot-2018-09-12-at-3.13.19-pm.png?auto=compress,format"></p>
<p><em>Figure 3: Random examples of outcomes of the retrieval test on query images from DeepFashion In-Shop-Retrieval [7].
Query images are in the left-most column. Each query image is next to two rows displaying the top 50 retrieved articles,
from left to right, top to bottom. Green boxes show exact hits.</em></p>
<p>Note that this exercise is somewhat academic, as focusing on finding the exact products allows us to assess models
quantitatively. In practice, retrieving exact matches is not critical for two reasons: a) it is quite unlikely that the
exact product is part of the assortment, and b) the customer usually feels inspired, so a similar item will feel just as
rewarding to them, if not more, because it may have a rounder neckline, for example.</p>
<p>Thanks to how this model is built, it is able to provide similar items as a by-product. <em>Figures 2 and 3</em> show that the
style of the 50 top-ranking garments fits the style of the outfit, and that these garments are quite similar to one
another.</p>
<p>This means that we can also retrieve similar products from other assortments. <em>Figure 4</em> below shows the 50 top-ranking
garments from a Zalando assortment on query images from [7], without our model being fine-tuned for such images.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7c28c99c8afe202d91beed82a00607b13a4bb383_screen-shot-2018-09-12-at-3.18.34-pm.png?auto=compress,format"></p>
<p><em>Figure 4: Random examples of outcomes of the retrieval test on query images from DeepFashion In-Shop-Retrieval [7]
against 50,000 Zalando articles. Query images are in the left-most column. Each query image is next to two rows showing
the top 50 retrieved articles, from left to right, top to bottom.</em></p>
<p><em>The details of this work can be found in [12].</em></p>
<p><strong>Extension to images with backgrounds
</strong>Unfortunately, training a similar model for natural images would require large amounts of natural fashion images
annotated with products, which we don’t have. However, we do have large amounts of unannotated fashion images, in
particular those available from public datasets such as Chictopia (10k), but also our own in-house images. The advantage
of public datasets is that the segmentation’s ground-truth is given, whereas we have to segment our images ourselves.</p>
<p>Using these images and their segmentation, we have designed and trained Street2Fashion, a U-net-like segmentation model
that can find the person in the image and simply replaces the background with white pixels. The results shown in <em>Figure
5</em> below are good enough to focus on the fashion in the image.</p>
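<p>Once the segmentation mask is available, the preprocessing itself is simple: keep the person pixels and paint everything else white. A minimal sketch of that step (the U-net that produces the mask is not shown, and the toy image and mask here are illustrative):</p>

```python
import numpy as np

def whiten_background(image, person_mask):
    """Replace background pixels with white, keeping the person.
    image: (H, W, 3) uint8 array; person_mask: (H, W) bool array
    as produced by a segmentation net such as Street2Fashion."""
    out = image.copy()
    out[~person_mask] = 255  # white background, as in Figure 5
    return out

img = np.zeros((4, 4, 3), dtype=np.uint8)  # toy all-black image
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                      # pretend the person fills the centre
segmented = whiten_background(img, mask)
```

<p>The whitened output then looks like a standard studio shot, which is what lets the downstream matching model, trained on clean backgrounds, handle natural images.</p>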
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/753fe88b4e950bbbb13cd8597aa7879385ec2d33_screen-shot-2018-09-12-at-3.27.54-pm.png?auto=compress,format"></p>
<p><em>Figure 5: Examples of segmentation results on test images.</em></p>
<p>We use Street2Fashion as a preprocessing step, and build Fashion2Shop, a model with the same architecture as
Studio2Shop but trained on segmented images. We refer to the full pipeline described in <em>Figure 6</em> as
Street2Fashion2Shop. In practice, a query fashion image is processed by the segmentation model to remove the background,
and can then go through the matching model described above to be matched with appropriate products.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ef41963c85eab2c70e1afcdd47c23e1d62e13e58_screen-shot-2018-09-12-at-3.30.10-pm.png?auto=compress,format"></p>
<p><em>Figure 6: Street2Fashion2Shop. The query image (top row) is segmented by Street2Fashion, while FashionDNA is run on the
title images of the products in the assortment (bottom row) to obtain static feature vectors. The result of these two
operations forms the input of Fashion2Shop, which handles the product matching.</em></p>
<p><em>Figure 7</em> shows results obtained using Street2Fashion2Shop.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/21bc76726591fd7f98ca2907480249605fe4ab62_screen-shot-2018-09-12-at-3.31.28-pm.png?auto=compress,format"></p>
<p><em>(a) Random examples of Zalando products retrieval using query images from LookBook [13].</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0abfe20a539e2ca982e7f05d2f0c39826f8ef171_screen-shot-2018-09-12-at-3.32.15-pm.png?auto=compress,format"></p>
<p><em>(b) Random examples of Zalando products retrieval using query images from street shots.</em></p>
<p><em>Figure 7: Qualitative results on external datasets. For each query, the query image is displayed on the very
left, followed by the segmented image and the top 50 product suggestions. Best viewed zoomed in.</em></p>
<p><em>The details of this work will shortly be available in [14].</em></p>
<p>[1] X. Wang and T. Zhang. Clothes search in consumer photos via color matching and attribute learning. <em>Multimedia
Conference (MM)</em>, 2011.</p>
<p>[2] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu and S. Yan. Street-to-shop: Cross-scenario clothing retrieval via parts
alignment and auxiliary set. <em>Conference on Computer Vision and Pattern Recognition (CVPR)</em>, 2012.</p>
<p>[3] J. Fu, J. Wang, Z. Li, M. Xu and H Lu. Efficient clothing retrieval with semantic-preserving visual phrases.
<em>Asian Conference on Computer Vision (ACCV)</em>, 2012.</p>
<p>[4] Y. Kalantidis, L. Kennedy and L.J. Li. Getting the look: Clothing recognition and segmentation for automatic
product suggestions in everyday photos. <em>International Conference on Multimedia Retrieval (ICMR)</em>, 2013.</p>
<p>[5] K. Yamaguchi, M.H. Kiapour and T.L. Berg. Paper doll parsing: Retrieving similar styles to parse clothing items.
<em>International Conference on Computer Vision (ICCV)</em>, 2013.</p>
<p>[6] J. Huang, R.S. Feris, Q. Chen and S. Yan. Cross-domain image retrieval with a dual attribute-aware ranking
network. <em>International Conference on Computer Vision (ICCV)</em>, 2015.</p>
<p>[7] Z. Liu, P. Luo, S. Qiu, X. Wang and X. Tang. Deepfashion: Powering robust clothes recognition and retrieval with
rich annotations. <em>Computer Vision and Pattern Recognition (CVPR)</em>, 2016.</p>
<p>[8] E. Simo-Serra and H. Ishikawa. Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for
Feature Extraction. <em>Conference on Computer Vision and Pattern Recognition (CVPR)</em>, 2016.</p>
<p>[9] X. Wang, Z. Sun, W. Zhang, Y. Zhou and Y.G. Jiang. Matching user photos to online products with robust deep
features. <em>International Conference on Multimedia Retrieval (ICMR)</em>, 2016.</p>
<p>[10] D. Shankar, S. Narumanchi, H.A. Ananya, P. Kompalli and K. Chaudhury. Deep learning based large scale visual
recommendation and search for e-commerce. <em>CoRR</em>, 2017.</p>
<p>[11] X. Ji, W. Wang, M. Zhang and Y. Yang. Cross-domain image retrieval with attention modeling. <em>Multimedia
Conference (MM)</em>, 2017.</p>
<p>[12] J. Lasserre, K. Rasch and R. Vollgraf. Studio2Shop: from studio photo shoots to fashion articles. <em>International
Conference on Pattern Recognition Applications and Methods (ICPRAM)</em>, 2018.</p>
<p>[13] D. Yoo, N. Kim, S. Park, A.S Paek and I. Kweon: Pixel-level domain transfer. <em>European Conference on Computer
Vision (ECCV)</em>, 2016.</p>
<p>[14] J. Lasserre, C. Bracher and R. Vollgraf. To appear in <em>Lecture Notes in Computer Science</em>, 2018.</p>
<p>-- Our visual search engines are currently powered by the company Fashwell; this work is at the research stage. --</p>Visual Creation and Exploration at Zalando Research2018-09-06T00:00:00+02:002018-09-06T00:00:00+02:00Nikolay Jetchevtag:engineering.zalando.com,2018-09-06:/posts/2018/09/texture-distribution-artistic-expression.html<p>Adversarial texture distribution learning as a tool of artistic expression</p><h3><strong>Adversarial texture distribution learning as a tool of artistic expression</strong></h3>
<p>Deep learning is progressing fast these days. Despite advances that were expected to happen sooner or later (e.g.
accurate face and speech recognition), there are some new developments that would have seemed like a pipe dream years
ago: neural networks can now generate realistic images just by looking at a few examples of their properties.</p>
<p><a href="https://research.zalando.com/">Zalando Research</a> is currently exploring such methods and their potential to aid
Zalando’s content creation, private fashion labels, and sizing recommendation teams, and offer our customers a new
fashion experience. In addition, working with large image collections and generative machine learning models has great
synergy with cutting-edge neural network art. The tools created for fashion research purposes are also useful as tools
for visual artistic creation and exploration.</p>
<p><strong>Generative texture models
</strong>Our research in texture generation is a good example of this. Earlier this year, we developed new deep learning
generative models to learn textures from just a few sample images, and textures are key ingredients in multiple artistic
techniques. Having a tool like a Periodic Spatial Generative Adversarial Network (PSGAN) to learn texture
distributions gives great flexibility in choosing applications for it. <em>Figure 1</em> shows what textures we can learn
and sample using only a single image <em>(Figure 2)</em> as training material.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d4c69c85c1ce205c79e27610c755eee7fd4a78e4_screen-shot-2018-09-06-at-18.03.01.png?auto=compress,format"></p>
<p><em>Figure 1. Ocean textures generated from our model, a PSGAN trained using the single image from Figure 2.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3840942b9c4bf7d52c4a4e37d83edf7d8c58345d_screen-shot-2018-09-06-at-18.05.05.png?auto=compress,format"></p>
<p><em>Figure 2. An example ocean texture, which is used as training material for PSGAN</em></p>
<p><strong>Textures and mosaics</strong>
Mosaics are a classical art form, stretching from the times of the ancient Romans to modern texture transfer
techniques. The artist Max Ernst captured textures by physically copying surfaces and using them in his paintings, a technique
called <a href="https://www.modernamuseet.se/stockholm/en/exhibitions/max-ernst/collage-frottage-grattage/">Frottage</a>. In
present times, selecting a texture as a type of stylization and applying it to a large image with global composition is
a very popular case of ML art, as signified by the success of Neural Art Style Transfer, but also as seen in multiple
advertising campaigns such as <em>Figure 3</em>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1a30ed865bbb76ad2b769da44f9491359adb0f35_screen-shot-2018-09-06-at-17.52.19.png?auto=compress,format"></p>
<p><em>Figure 3. Example outdoor advertisement billboard using mosaic techniques for stylization, seen recently on the streets
of Berlin.</em></p>
<p>In follow-up work we apply generative texture synthesis to create high-resolution mosaics from input content images. It
was demonstrated at the <a href="https://nips2017creativity.github.io/">NIPS 2017 Workshop for Machine Learning and Art</a>, a
great venue to explore collaborations between machine learning researchers and artists. <em>Figure 4a</em> shows how the
texture process learned in the previous paragraph can be conditioned and used to stylize a human face from <em>Figure 4b.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/29b965d9e59a3d21222a3c8189545610bea3f394_mosaicbig_epoch_099_glngl11_sig0.7_fc4.5_ngf140.jpg?auto=compress,format"></p>
<p><em>Figure 4a. A mosaic stylisation of the image from Figure 4b.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3511e28b5e57208c95602b934b2a0636dc50d50a_an621a01u-q11__default__11.jpg?auto=compress,format"></p>
<p><em>Figure 4b. A fashion model from Zalando's catalogue.</em></p>
<p><strong>Texture control: from noise to music</strong>
Static image stylization works by conditioning the latent factors of textures on a target content image, which is a 2D
array. Textures flow into one another in space, and the artist can play flexibly with our tool and condition on a moving
signal in time, leading to smooth transitions and animations between textures. Music visualization
is one such application of our technique. <em>Figure 5</em> shows our <a href="https://youtu.be/rhcr678Tja4">music video</a> accepted at the
ECCV 2018 Computer Vision for Fashion, Art and Design Workshop.</p>
<p><em>Figure 5: Water textures morphing, controlled by the music audio signal.</em></p>
<p>With an appropriate audio descriptor we can map the distribution of audio samples to the distribution of textures. And
since music is a smoothly varying signal, we can create a frame-by-frame animation of a texture process controlled by
music. Selecting input images with a suitable theme (water) to train the PSGAN allows us to emphasize the artist’s
vision and represents a novel form of digital synesthesia. This tool also opens a completely new pathway for collaboration
between musicians and generative model visualizations. It is also a conceptually new approach to art, since we replace
the source of randomness with the variation of a music piece.</p>
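<p>Conceptually, each video frame is a texture sample whose latent code is steered by an audio feature; because the feature varies smoothly over time, so does the texture. The toy sketch below shows only that control loop; the audio descriptor (a per-frame level in [0, 1]) and the two latent codes are stand-ins, and the PSGAN generator itself is not included.</p>

```python
import numpy as np

def latent_for_frame(z_a, z_b, audio_level):
    """Blend two latent texture codes by a smoothed audio feature in [0, 1]."""
    return (1 - audio_level) * z_a + audio_level * z_b

rng = np.random.default_rng(1)
z_water1, z_water2 = rng.normal(size=(2, 64))       # two learned texture codes
levels = np.abs(np.sin(np.linspace(0, np.pi, 30)))  # stand-in for per-frame loudness
frames = [latent_for_frame(z_water1, z_water2, a) for a in levels]
# each latent in `frames` would be fed to the generator to render one video frame
```

<p>Replacing the random per-frame level with a real audio descriptor is what ties the texture morphing to the music rather than to noise.</p>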
<p>We will be showcasing the video at the workshop
<a href="https://www.eventbrite.ca/e/cocktails-in-containers-tickets-49304140010">mixer</a> on September 12th, at the Container
Collective in Munich.</p>
<p><em>Follow Nikolay Jetchev’s <a href="https://twitter.com/NJetchev">twitter account</a> for the latest developments and experiments
with art and generative deep learning.</em></p>Zalando Strengthens its InnerSource Strategy2018-09-05T00:00:00+02:002018-09-05T00:00:00+02:00Hong Phuc Dangtag:engineering.zalando.com,2018-09-05:/posts/2018/09/zalando-strengthens-innersource-strategy.html<p>Learn how teams at Zalando leverage InnerSource for cross-team collaboration</p><p>Zalando is known for its commitment to the open source world. Many of our engineers are active contributors to open source projects like <a href="https://github.com/zalando/patroni">PostgreSQL</a> or <a href="https://github.com/kubernetes-incubator/external-dns">Kubernetes</a>. The Zalando tech department currently consists of more than 2,000 employees who form over 200 delivery teams and virtual teams. Zalando engineers come from 77 nations and are based out of various locations across Europe, which makes us super international but also quite distributed. Collaboration and alignment across delivery teams is challenging as the company continues to grow at an incredible speed. Enhancing InnerSource is an approach that could help Zalando tackle those internal challenges.</p>
<h2>What is InnerSource</h2>
<p>InnerSource is an adaptation of open source software development practices within organizations: applying the collaborative culture and methodologies of open source to internal projects, even if those projects are proprietary. At its essence, InnerSource applies not only to software development but also extends to other business areas such as Finance, Marketing and HR.</p>
<p>The benefits of InnerSource are:</p>
<ul>
<li>To improve developer productivity by increasing cross-team alignment and collaboration.</li>
<li>To improve developer mobility by enabling our software engineers to contribute to the efforts of other teams and to get familiar with software projects and tools used by other teams.</li>
<li>To increase development speed by removing team blockers and pushing discoverability of existing software products and components.</li>
<li>To decrease onboarding time and improve knowledge handover by providing well-documented and discoverable internal projects.</li>
</ul>
<p>InnerSource at Zalando focuses on:</p>
<ul>
<li>Fostering the ‘open source’ culture from within, encouraging individual teams to open up their work and accept feedback and contributions from developers outside of their team.</li>
<li>Promoting pull requests as an initial tool for cross-team collaboration.</li>
<li>Creating a platform where teams have a chance to talk about their work and learn from each other.</li>
<li>Introducing InnerSource pilot projects around Machine Learning starting at Digital Foundation.</li>
<li>Developing collaborative documentation of team best practices and examples.</li>
</ul>
<h2>More about InnerSource at Zalando</h2>
<ul>
<li>Check out <a href="https://opensource.zalando.com/docs/resources/innersource-howto/">How to InnerSource</a> to learn how our development teams prepare for their InnerSource participation.</li>
<li><a href="https://jobs.zalando.com/tech/jobs">Join Zalando Tech Team</a> and be part of the InnerSource movement.</li>
</ul>Three Years of our Helsinki Tech Hub2018-08-30T00:00:00+02:002018-08-30T00:00:00+02:00Elina Zimpfertag:engineering.zalando.com,2018-08-30:/posts/2018/08/three-years-of-our-helsinki-tech-hub.html<p>Getting to know our Finnish tech hub as it turns three</p><h3><strong>Getting to know our Finnish tech hub as it turns three</strong></h3>
<p>In early 2015, Zalando decided to expand its tech expertise and open tech hubs around Europe. First up was Dublin in
April, and not far behind, the Helsinki Tech Hub was <a href="https://corporate.zalando.com/en/newsroom/se/press-releases/zalando-expands-european-tech-operations-helsinki">launched in August
2015</a>.
The Helsinki hub has had an exciting journey so far; from scaling to over 60 employees and designing a custom office to
fit our community in our first year, to continuing to grow to over <a href="https://corporate.zalando.com/en/newsroom/en/stories/introducing-helsinkis-100th-employee">100
employees</a> with over 30
nationalities by our <a href="https://www.hs.fi/mainos/tarinoitasuomesta/art-2000005344850.html">second anniversary</a>. Fast
forward to 2018, we look back on how we grew and what made us the unique Zalando Helsinki (or #Zelsinki) community we
are today. We spoke with one of the most integral members of our Helsinki Tech Hub, our #Zelsinki Community Manager
Elina Zimpfer, who has been with Zalando Helsinki since August 2016.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/74c46fb53b1f4d370a4407a228de242eb78ef0ea_10.-aeaf5101-24e9-4436-98b9-8da195d44fdd.jpg?auto=compress,format"></p>
<h4><strong>When you joined what was the Helsinki Tech Hub like?</strong></h4>
<p>The Zelsinki team was about half the size it is now when I joined. We were still at the old office, which was so full
that newbies didn’t get their own desks when starting. The renovation and the move to the new office had been delayed by
some weeks, so I started in the hurricane of moving offices and doing the final touches to the new location.</p>
<p>It is my everyday job to have an understanding of the local community and make sure that everyone in it is a part of the
global Zalando ecosystem. It is really important that we as remote sites are represented at the heart of Zalando, and
I’m proud to be a part of supporting that goal. The Helsinki Tech team has a very good record on giving talks at our
internal knowledge sharing platform, and I see that as an important window for us to showcase our work to the rest of
the company.</p>
<h4><strong>What kind of things did you do to build the community?</strong></h4>
<p>As a remote site with a smaller team, we have the chance to do many different activities and focus on smaller details. It
soon became clear that our Zelsinki people are very competitive, so different tournaments are very popular. In Zelsinki,
you can win a fantastic handmade trophy in almost anything; we have tournaments for pool, table tennis,
<a href="https://en.wikipedia.org/wiki/M%C3%B6lkky">mölkky</a>, and Mario Kart just to name a few. We also love to have fun
together and celebrate our achievements, so “cupcakes & bubbly” occasions are not unusual. In addition to our Helsinki
internal activities, participating in the global <a href="https://corporate.zalando.com/en/innovation/grassroots-tech-innovation">Zalando Tech Community
projects</a> and events is a good way to keep the
“one tech team” spirit alive. As Zalando is a relatively new company in the Helsinki Tech scene, it’s also important to
raise our profile and give back to the external community. I organize external meetups and other events, and encourage
our people to participate and give talks.</p>
<h4><strong>What is the most important thing about building a local community?</strong></h4>
<p>Each community has its own special features, traditions and characteristics, so the things that work for one community
might not work at others. It’s important to keep the customer in mind and personalize solutions to fit the community
members. Getting people involved in the creation of common events is a great way to really grasp the needs of the
community. We have many special activities in our Helsinki community, amongst them our annual Summer Adventure.</p>
<p>It’s the most wonderful time of the year for Finns. Nightless nights, Finnish sunkissed strawberries, Midsummer and of
course for our Helsinki Tech Hub’s Zalandos, the “Zelsinki” Summer Adventure. This was our third Summer Adventure and we
made it a good one!</p>
<p>Not your typical summer party and activity, at Zalando we want to personalize our experiences for our users. This also
means we want our employees to have the same experience. Three years ago when we started out in Helsinki, we were a
newly grown team of 50 people with 60% of our new colleagues from outside of Finland. Most of our colleagues were
software engineers, so we knew they loved to solve problems and puzzles.</p>
<p>So we got to thinking about how we might show them around Helsinki, get the teams to know each other, and solve some
problems along the way to our summer team event location. Escape rooms and geocaching were very popular at the time, so
we decided to create an amazing race-type experience tailored for our Zelsinki team. And it was a hit!</p>
<p>That was the summer of 2016. Last summer, we decided to reiterate the game by involving our team more and we formed an
event committee. The goal again was to solve the puzzles in randomly selected teams to get to our secret end location.
We incorporated a theme and our Zelsinki Survivors had to face the jungle terrors all while getting to know some famous
Helsinki locations such as Hietaniemi beach and the Jean Sibelius monument.</p>
<p>For our third iteration this summer, the Zelsinki Adventure became somewhat legendary and we had even more interest from
our own community to create something great for their peers. Our Team Assistant Essi Marttila and I were at the
helm, enabling and empowering our people to get involved, and we put together a great organizing committee with diverse
skills. Jari Kalinainen created an iOS app, Antti Pennanen composed music, and Essi and I came up with puzzles,
activities and an exciting Wild West themed storyline. We had more props than ever, and it might have been possible to
see a group of software engineers ride a hobby horse in a park in Helsinki city centre.</p>
<p>This is definitely a tradition that will last in our Helsinki Tech Hub, and we can’t wait to see how we develop the app in
years to come, and have new colleagues join in to create an amazing experience for their peers.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8a52c249f93ed32a08374fb09ff39c552d88ea61_zalando-sommerfest-2018-booth-0805.jpg?auto=compress,format"></p>
<h4><strong>What is the best thing about your job?</strong></h4>
<p>Definitely the people! We have such a great team! I also love the ever-changing nature of my job, every day is different
and a new challenge.</p>
<p>We have a great team here and we work on some of the cornerstones of the fashion store: personalization, browsing, new
emerging business, connected retail and logistics solutions. We’re looking forward to the next three years!</p>
<p><em>Check out our <a href="https://zln.do/2M6rrol">open positions</a> in Helsinki and join one of our Software Engineering teams.</em></p>Zalando at the DatSci Awards 20182018-08-23T00:00:00+02:002018-08-23T00:00:00+02:00Humberto Coronatag:engineering.zalando.com,2018-08-23:/posts/2018/08/data-science-products-multi-disciplinary-teams.html<p>Building data science products in multi disciplinary teams</p><h3><strong>Building data science products in multi disciplinary teams</strong></h3>
<p>For the last three years, I have been working on different data science projects at Zalando, helping our more than 24
million customers find the most relevant items in the assortment we have. Along the way, I have learned how to <a href="https://engineering.zalando.com/posts/2016/11/doing-data-science-the-cloud-and-distributed-way.html">scale
data science</a>, or how to <a href="https://engineering.zalando.com/posts/2018/02/innovation-digital-experience.html">build a
new personalization product from
scratch</a>. Thanks to my experiences, I
am a firm believer in having dedicated and
<a href="https://engineering.zalando.com/posts/2017/06/autonomous-motivation-in-technology-organizations.html">autonomous</a>
multi-functional teams to solve complex problems, especially when they involve learning.</p>
<p>As data scientists, we are used to looking at problems from a data perspective, which has helped the teams I have worked
with gain huge amounts of domain knowledge. We also strive to make data-driven decisions, where running A/B tests or
doing online and offline evaluations of the models we build are some of the most important tools we have. However, what
does it look like to work in a multifunctional team?</p>
<p>In a Zalando team, we usually have one or two data scientists, one or two engineers, a product manager, a designer, and
sometimes a business developer. The details change from team to team, but you get the picture. Not all of these people
are dedicated to the team 100% of their time; sometimes a designer can work with two or three teams, depending on their
areas of interest. The main advantage of working with this setup is that we are able to tackle uncertainties and risks
from many more angles, and way faster than on a researchers-only team.</p>
<p>Something I have learned when working with
<a href="https://medium.com/zalando-design/a-designers-ai-learning-journey-dd3f3079a299">designers</a>, is the many advantages of
early testing and prototyping, and their customer-centric approach to problem solving. Moreover, because they tend to
work in different products from similar areas, the knowledge transfer usually happens more naturally and also faster,
and completely changes the way we work. When working closely with our copywriting team, we learn how to communicate our
products in the right way for our customers, and working with engineers we learn how to make sure to build machine
learning solutions that scale; ones we are able to operate.</p>
<p>A very good example I have <a href="https://engineering.zalando.com/posts/2018/02/innovation-digital-experience.html">previously written
about</a> is the latest product I was in
charge of building, where we were able to collaboratively design a prototype to solve our customer problem of, “<em>How can
we make recommended content more transparent and relevant to our customers</em>?” We did this in four days, writing only a
minimum amount of code. We built six personalized prototypes for user testing, by manually adding “recommended” content
into a static version of the Zalando App. Instead of using an algorithm, we “faked” the algorithmic result by using
human expert curators to choose which content would be shown to each customer.</p>
<p>By faking the personalization part, we were not only able to understand our customers’ expectations about our product,
but we also saved months of development of an algorithmic solution that was not what the customer expected. In
particular, the feedback we got from our customers was far more specific and natural than when using non-personalized
prototypes. For example, instead of asking someone “imagine you love leather jackets and we recommend you matching
boots,” we can know beforehand that they bought a leather jacket last week, and we created “recommendations” of the
boots we thought would better match their style.</p>
<p>Working in this environment is also aligned with our principles of “<a href="https://engineering.zalando.com/posts/2016/08/radical-agility-study-notes.html">radical
agility</a>” and
<a href="https://engineering.zalando.com/posts/2016/08/why-do-we-have-autonomous-teams.html">autonomous</a> teams. During the
process, everyone involved gained customer understanding and domain knowledge from the problem we are trying to solve,
something extremely valuable for data scientists. Moreover, iterating on this is way cheaper and faster than iterating
on A/B test cycles, even when we have a really strong
<a href="https://www.slideshare.net/ssarabadani/building-octopus-an-introduction">testing-as-a-service</a> infrastructure.</p>
<p>This is only one example that shows how much I like working with people from different backgrounds and functions, which
also proves how important diversity is for building great machine learning products, especially in a B2C market that
operates on a European scale like Zalando does.</p>
<p><em>Humberto Corona is a product specialist and data scientist in Zalando's <a href="https://www.youtube.com/watch?v=Bg1Gt5nP4bo&t=32s">Fashion Insights
Center</a> in Dublin. A regular contributor to the tech blog, Humberto
is a finalist in <a href="https://www.datsciawards.ie/finalists-2018/">this year's DatSci awards</a>, where this piece was
<a href="https://www.datsciawards.ie/blog/building-data-science-products/">originally published</a>. <a href="https://www.youtube.com/watch?v=vSaXqKCZrwQ">Ana Peleteiro
Ramallo</a> took the Data Scientist of the Year title in her role at Zalando
in 2017.</em></p>
<p><em>Work with inspiring people like Humberto by applying to one of our open tech positions
<a href="https://jobs.zalando.com/tech/jobs/">here</a>.</em></p>Battle of the Frameworks2018-08-16T00:00:00+02:002018-08-16T00:00:00+02:00Lora Vardarovatag:engineering.zalando.com,2018-08-16:/posts/2018/08/battle-frameworks.html<p>How to choose a JavaScript framework?</p><h3><strong>How to choose a JavaScript framework?</strong></h3>
<p>Developers are often biased about their technology choices. At the beginning of the year, I was about to start working
on a new product and my team could choose any tech stack. I did not want to be one of these biased developers who chose
the framework they liked. I wanted to make an informed and educated decision. I already had experience with React and
AngularJS. I had a good knowledge of Angular and experience with TypeScript. But what about Vue, the framework that most
JavaScript developers wanted to learn according to <a href="https://stateofjs.com/">State of JavaScript</a> 2017 survey?</p>
<p>A friend of mine likes saying that JavaScript frameworks are like weeds: every day a new framework gets released. It does
feel like this, doesn’t it? I was quite skeptical about Vue when it was released, and to be honest, I was quite
skeptical about Vue a long time after it got released. Did we really need another JavaScript framework? I did not really
think so. But I had some free time on my hands and decided to use it to learn Vue so that I could make an informed
decision about which framework to choose.</p>
<p><strong>History Lesson
</strong>AngularJS was started as a side project at Google around 2009. Later it was open-sourced and v1.0 was officially
released in 2011.</p>
<p>React was developed at Facebook. It was open-sourced at JSConf US in May 2013.</p>
<p>Vue was created by Evan You after working for Google using AngularJS in a number of projects. He wanted to extract what
he really liked about AngularJS and build something lightweight. Vue was originally released in February 2014.</p>
<p>Angular 2.0 was announced at the ng-Europe conference in September 2014. The drastic changes in the 2.0 version created
considerable controversy among developers. The final version was released in September 2016.</p>
<p>Side note: AngularJS and Angular 2.0, which was later simply called Angular, are two different frameworks. The naming
really caused a lot of confusion. I believe that the Angular team would have been better off choosing a different
name.</p>
<p>In December 2016 Angular 4 was announced, skipping version 3 to avoid confusion due to the misalignment of the router
package's version, which was already distributed as v3.3.0. The final version was released in March 2017.</p>
<p>Nowadays, developers are looking for smaller, faster, and simpler technologies. All three frameworks (Angular, React and
Vue) are doing lots of work in this direction. You can expect pretty good performance from these frameworks.
<a href="https://www.stefankrause.net/wp/?p=454">Performance benchmarks</a> show similar performance.</p>
<p>In April 2017, Facebook announced React Fiber, a new core algorithm of React. It was released in September 2017.</p>
<p>Angular 5 was released in November 2017. Key improvements in Angular 5 included support for progressive web apps, and a
build optimizer.</p>
<p>Google pledged to do twice-a-year upgrades. Angular 6 was released in April 2018. Angular 7 will be released
September/October 2018.</p>
<p>At the beginning of 2018, a schedule was announced for phasing out AngularJS: after the release of 1.7.0, active
development on AngularJS will continue until the end of June 2018. Afterwards, 1.7 will be supported until June 2021 as
a long-term support release.</p>
<p>The bottom line is that these three frameworks, React, Vue and Angular, are quite mature. And it is likely they’ll be
around for a while.</p>
<p><strong>Key Concepts
</strong>React uses the Virtual DOM pattern. React creates an in-memory data structure cache, computes the resulting
differences, and then updates the browser's displayed DOM efficiently.</p>
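<p>The diffing idea can be sketched in a few lines of plain JavaScript. This is a toy illustration, not React's actual algorithm: nodes are plain objects, and the helper <code>h</code> and the patch format are invented for this example.</p>

```javascript
// Toy sketch of a virtual DOM (NOT React's real implementation): the UI is a
// plain object tree, and a diff computes a minimal patch list instead of
// rewriting the whole DOM.
const h = (type, props, ...children) => ({ type, props: props || {}, children });

function diff(oldNode, newNode, path = "root") {
  if (oldNode === undefined) return [{ op: "create", path, node: newNode }];
  if (newNode === undefined) return [{ op: "remove", path }];
  if (typeof oldNode === "string" || typeof newNode === "string") {
    return oldNode === newNode ? [] : [{ op: "replace", path, node: newNode }];
  }
  if (oldNode.type !== newNode.type) return [{ op: "replace", path, node: newNode }];
  const patches = [];
  const len = Math.max(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < len; i++) {
    patches.push(...diff(oldNode.children[i], newNode.children[i], `${path}/${i}`));
  }
  return patches;
}

const before = h("ul", null, h("li", null, "a"), h("li", null, "b"));
const after = h("ul", null, h("li", null, "a"), h("li", null, "c"));
console.log(diff(before, after)); // only the changed text node yields a patch
```

<p>Running the diff on the two trees above produces a single patch for the changed list item; everything else is left untouched, which is the efficiency win the pattern is after.</p>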
<p>React is all about components. Your React codebase is basically just one large pile of big components that call smaller
components. Props are how components talk to each other; they are the data, which is passed to the child component from
the parent component. It’s important to note that React’s data flow is unidirectional: data can only go from parent
components to their children, not the other way around.</p>
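<p>One-way data flow can be sketched without any framework at all. The component names <code>Greeting</code> and <code>UserList</code> below are invented for this illustration; the point is only that data is owned by the parent and handed down as props.</p>

```javascript
// Framework-free sketch of unidirectional data flow: components are plain
// functions that receive props from their parent and return a UI description.
const Greeting = (props) => `<p>Hello, ${props.name}!</p>`;

const UserList = (props) =>
  `<ul>${props.users.map((u) => `<li>${Greeting({ name: u })}</li>`).join("")}</ul>`;

// The parent owns the data; children only receive it through props and
// cannot push changes back up the tree.
console.log(UserList({ users: ["Ada", "Lin"] }));
// <ul><li><p>Hello, Ada!</p></li><li><p>Hello, Lin!</p></li></ul>
```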
<p>The component approach means that both HTML and JavaScript code live in the same file. React’s way to achieve this is
the JSX language. It allows us to write HTML-like syntax which gets transformed to lightweight JavaScript objects.</p>
<p>To build an Angular application you define a set of components for every UI element, screen, and route. An application
will always have a root component that contains all other components. Components have well-defined inputs and outputs,
and lifecycle.</p>
<p>The idea behind dependency injection is that if you have a component that depends on a service, you do not create that
service yourself. Instead, you request one in the constructor, and the framework will provide you one. This allows you
to depend on interfaces, not concrete types. This results in more decoupled code and improves testability.</p>
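<p>A minimal, framework-free sketch of constructor injection looks like this. All class and method names are invented for the illustration; Angular's injector resolves such dependencies automatically, whereas here we pass them in by hand.</p>

```javascript
// Hypothetical sketch of constructor injection: the component asks for a
// service in its constructor instead of constructing a concrete one itself.
class HttpUserService {
  getUser(id) {
    return { id, name: "Ada" }; // a real implementation would call a backend
  }
}

class FakeUserService {
  getUser(id) {
    return { id, name: "Test User" }; // a test double for unit tests
  }
}

class UserProfileComponent {
  constructor(userService) {
    this.userService = userService; // requested, not constructed here
  }
  render(id) {
    return `Profile of ${this.userService.getUser(id).name}`;
  }
}

// Swapping implementations requires no change to the component itself,
// which is exactly what makes it easy to test.
console.log(new UserProfileComponent(new FakeUserService()).render(1));
// Profile of Test User
```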
<p>Property binding makes Angular applications interactive.</p>
<p>Vue also makes use of the Virtual DOM like React.</p>
<p>In Vue.js the state of the DOM is just a reflection of the data state. You connect the two together by creating
"ViewModel" objects. When you change the data, the DOM updates automatically.</p>
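<p>The reactive idea can be roughly sketched with a <code>Proxy</code>; Vue's real reactivity system is far more involved, and <code>createViewModel</code> and the fake <code>dom</code> string here are invented for the illustration.</p>

```javascript
// Rough sketch of a reactive ViewModel: wrap the data in a Proxy so that
// every write triggers a re-render of the (here: fake) DOM.
function createViewModel(data, render) {
  const vm = new Proxy(data, {
    set(target, key, value) {
      target[key] = value;
      render(target); // the "DOM" updates automatically on data change
      return true;
    },
  });
  render(data); // initial render
  return vm;
}

let dom = "";
const vm = createViewModel({ count: 0 }, (d) => {
  dom = `<span>${d.count}</span>`;
});
vm.count = 1; // plain assignment; no manual DOM call needed
console.log(dom); // <span>1</span>
```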
<p>You create small, decoupled units so that they are easier to understand and maintain. In Vue the components are
ViewModels with pre-defined behaviours. The UI is a tree of components.</p>
<p>In Vue the HTML, JS and CSS for each component live in the same file. Some people hate single-file components; others
love them. I personally think that they are very handy and can make you more productive, as they reduce context switching.</p>
<p><strong>Ecosystems
</strong>This table shows the libraries you may be familiar with in React, Angular or Vue alongside their equivalents in
the other frameworks:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bc2b7168c6597ec10b2fa46361cc2b3d4656352a_screen-shot-2018-08-15-at-11.26.37.png?auto=compress,format"></p>
<p>It is important to note here that Angular is somewhat more prescriptive. Some developers do not like this and prefer to
have freedom choosing the tools they use. It is more of a personal preference.</p>
<p><strong>Lessons Learned
</strong>I had the idea to build a small app with all three frameworks and compare them. And so I did. But it was completely
unnecessary because of <a href="http://todomvc.com/">TodoMVC</a>, a project which offers the same Todo application implemented
using MV* concepts in most of the popular JavaScript frameworks of today.</p>
<p><a href="http://todomvc.com/">TodoMVC</a> is supposed to help you select an MV* framework. But the Todo app is way too simple.</p>
<p>If you are new to web development and you are learning a new framework, the TodoMVC is probably a good start.</p>
<p>But if you are experienced and would like to build real-world, more complex applications there are better
alternatives.</p>
<p>Some better alternatives are RealWorld and HNPWA.</p>
<p><a href="https://github.com/gothinkster/realworld">RealWorld</a> allows you to choose any frontend (React, Angular, Vue and even
more) and any backend (Node, Scala etc) and see how they power a real-world full-stack <a href="http://medium.com">medium.com</a>
clone.</p>
<p><a href="https://hnpwa.com/">HNPWA</a> is a collection of unofficial Hacker News clients built with a number of popular JavaScript
frameworks. Each implementation is a complete Progressive Web App that utilises different progressive technologies.</p>
<p><strong><em>Lesson #1</em></strong> The Todo App is too simple.
Use RealWorld or HNPWA to see what a real-world application would look like. Play with them, build on them and learn.</p>
<p><strong><em>Lesson #2</em></strong> Documentation is very important.
Good documentation helps you to get started quickly. Vue really excels at documentation. This is one of the reasons why
it is so easy to get started with Vue.</p>
<p>React and Angular also have good documentation. Still not as good as Vue in my opinion.</p>
<p>The main problem with the Angular documentation is that often you will stumble upon documentation about AngularJS
instead and it can be very confusing and frustrating. That is why I said earlier that the Angular team would have been
better off if they had chosen a different name for Angular.</p>
<p><strong><em>Lesson #3</em></strong> Community is important.
When documentation fails, you learn that community is also very important. You want to be sure that it will be easy to
get help if you get stuck and cannot find information in the documentation. You want to choose a framework whose
corresponding communities are extensive and helpful; communities where you can discuss best practices and get solid
advice.</p>
<p>Ultimately, you need to answer the following question: <em>Would it be easy to hire more developers who are experts or
willing to work with and learn this framework?</em></p>
<p><strong>Other questions worth asking when choosing a JavaScript framework</strong></p>
<p><em>How high is the “Bus Factor?”
</em>The “Bus Factor” is a number equal to the number of team members who, if run over by a bus, would adversely affect a
project. To put it more simply: Can other people continue working on your projects if you are hit by a bus tomorrow?</p>
<p>Remember that talent is hard to hire. You need to know how easy it is to find developers for each of the frameworks.
Also, what does the learning curve look like for each framework? Again, I think that Vue really excels here. It has the
lowest learning curve of the three.</p>
<p><em>What does the product roadmap look like?
</em>Is it just a prototype? Choose whatever, learn something new.</p>
<p>Would it have a single function that would never change? Do you have to ship it quickly? Choose whatever you are most
familiar with.</p>
<p>Is your product business critical? Probably it is a good decision to be more conservative in your choice.</p>
<p>Is the product going to evolve, have new features, etc.? It should be scalable in that case.</p>
<p><strong>Wrap Up
</strong>There is a point in your programming career when you realise that there isn’t a best framework. All the frameworks
solve the same problems but in different ways. Is it a good thing that there are so many alternatives? Yes. In my
opinion, the competition between Angular, Vue, React, and the other frameworks out there is very healthy. It brings a
lot of innovation and improvements in the entire JavaScript ecosystem. We all benefit from that no matter which
framework we work with.</p>
<p>We are developers. We like fighting about all sorts of important things like tabs versus spaces, trailing commas, etc.
Joking aside, it is somehow in our blood to fight about silly things. I feel that we should appreciate the improvements
all these JavaScript frameworks bring. Because there isn’t a best framework.</p>
<p><em>Don’t ask what the best framework is, ask what the most suitable framework for your product and your team
is.</em></p>
<p><em>Work with open minds like Lora. Have a look at the Zalando <a href="https://jobs.zalando.com/tech/jobs/">jobs site</a>.</em></p>The Future of Data Science2018-08-09T00:00:00+02:002018-08-09T00:00:00+02:00Pascal Pompeytag:engineering.zalando.com,2018-08-09:/posts/2018/08/future-data-science.html<p>Debunking the myth of the data science bubble</p><h3><strong>Debunking the myth of the data science bubble</strong></h3>
<p>We’ve all read articles indicating the looming decline of data science. Some coined the term ‘data science bubble,’ some
even went so far as to set a date for the ‘death of data science’ (they give it five years before the bubble implodes).
This reached a point where anyone working in the field needed to start paying attention to these signals. I investigated
the arguments backing this ‘imminent death’ diagnosis, detected some biases, and drafted an <a href="https://www.linkedin.com/feed/update/urn:li:activity:6424806057633284096/?commentUrn=urn%3Ali%3Acomment%3A(article%3A6974258448439374847%2C6424805976335093760)">early answer on
LinkedIn</a>;
the Zalando communication team picked up on it, and following their encouragement, I prepared a revised version for the
Zalando Blog. This post doesn’t aim at making bold predictions about the future without proper evidence; I have always
found those to be relatively pointless. It just aims to point out that, for all the noise, there is no solid reason to
believe that any of us should worry about our jobs in the years to come. In fact, the very arguments used to predict a
‘data science bubble’ can be turned around as reasons not to worry.</p>
<p>The arguments used by proponents of the data science bubble are generally of three sorts:</p>
<ol>
<li>Increased commoditization</li>
<li>Data scientists should not become software engineers</li>
<li>Full automation</li>
</ol>
<p><strong>Increased Commoditization:
</strong>It is clear that data science work is getting increasingly commoditized: almost all ML frameworks now come with
libraries of off-the-shelf models that are pre-architectured, pre-trained and pre-tuned. Want to do image
classification? Download a pre-trained ResNet for your favorite deep-learning framework and you are almost ready to go.
The net effect is that a single well-rounded data scientist can now solve in a week what a full team couldn't solve in
six months 10 years ago.</p>
<p>Does that mean less demand for data scientists? Certainly not, it only means that investing in data science is now
viable for a lot of domains for which data science was simply too expensive or too complex before. Hence a rising demand
for data science and data scientists. It is useful to take software engineering as a comparison here. Over the years,
most of the complexity around programming has been abstracted and commoditized. Only a few could start anything in
assembly; C made it much easier to develop complex projects; Java commoditised memory management, and so on. Did it make the
demand for software engineers vanish? Certainly not, on the contrary, it increased their productivity and hence their
net value to any organisation.</p>
<p><strong>Data-scientists should not become software engineers:
</strong>I strongly disagree with this assessment: one wouldn’t believe the number of data science projects that end up in a
powerpoint presentation with pretty graphs and then just an ignominious death. Why? Because data scientists often lack
the ability to make their projects deliver continuous value in a well-maintained and monitored production environment.
95% of the data science projects I see do not make it past the POC stage. Going beyond the POC requires a software
engineering mindset.</p>
<p>It is still rare to find data-scientists actually capable of (1) putting a model in a production environment, and then
(2) guaranteeing that machine-learned based value is continuously delivered, monitored and maintained in the long run.
Sadly, that is precisely where the ROI for any data science investment lies. I am not sure pushing data scientists to
move towards management would help there: chronic over-powerpointing and the urge for serial POCs that never make it
beyond the MVP stage are very much management-induced sicknesses. I am not saying data scientists should become software
engineers but, if anything, data-scientists need better engineering and software architecture abilities, not less.</p>
<p><strong>The risk of automation
</strong>Full automation is very unlikely, because in many regards, data science is still more an art than it is a technique.
There is a huge gap between the ‘Hello MNIST’ TensorFlow example and applying ML to a new domain for which no golden
data-set or known model archetype exists. Ever had to use crowdsourcing for gathering labels? Ever ventured into the
uncharted territories of ML? Ever had to solve a problem for which you couldn’t piggyback on an existing git repo? You
will know what I am talking about…</p>
<p>And there we enter the real discussion: data scientists that are not able to go beyond the TensorFlow MNIST CNN example,
the ResNet boilerplate or the vanilla word2vec + LSTM archetype are indeed going to become extinct, the same way no
programmer can make a living out of the ‘Hello World’ code they wrote during their first year of college. But for those
who know how to go beyond that and make ML actually work in a continuous delivery environment, there is a bright future
in front of them and there are good reasons to think it will span much longer than the five years to come.</p>
<p>Sources:</p>
<p><a href="https://blogs.oracle.com/datawarehousing/the-end-of-the-data-scientist-bubble">https://blogs.oracle.com/datawarehousing/the-end-of-the-data-scientist-bubble</a></p>
<p><a href="https://towardsdatascience.com/the-data-science-bubble-99fff9821abb">https://towardsdatascience.com/the-data-science-bubble-99fff9821abb</a></p>
<p><a href="https://medium.com/@TebbaVonMathenstien/are-programmers-headed-toward-another-bursting-bubble-528e30c59a0e">https://medium.com/@TebbaVonMathenstien/are-programmers-headed-toward-another-bursting-bubble-528e30c59a0e</a></p>Agile Principles Over Frameworks2018-08-02T00:00:00+02:002018-08-02T00:00:00+02:00Samir Kecktag:engineering.zalando.com,2018-08-02:/posts/2018/08/agile-principles-over-frameworks.html<p>Embracing the diverse in working agile</p><h3><strong>Embracing the diverse in working agile</strong></h3>
<p>Very often I get asked what agile working looks like at Zalando. Do we use scrum? Do we use Kanban? Do we work with
LeSS? Do we use SAFe? The answer to all of these is, “Yes”.</p>
<p>As Agile Coaches we value principles more than frameworks. The principles are derived out of these diverse frameworks
and they evolve over time. We iterate them, we rewrite them and we focus only on the needs of Zalando. Our guiding
principles are:</p>
<ol>
<li><strong>Customer Centricity to build the right thing</strong> Through practices we ensure that customer value provides the
    direction for our work. We focus on solving customer problems and exploiting market potential rather than finishing
    work whose purpose or impact we do not see or know.</li>
<li><strong>Visualization for transparency and control</strong> As software development and other areas are brain and knowledge work,
we create transparency by visualizing our work. We visualize concepts, workflows, KPIs and therefore can align and
control the work.</li>
<li><strong>Accelerated Feedback to lower risk and increase learning</strong> All our work aims to get feedback as fast as
    possible. Can we involve the user and customer directly and systematically in the process to receive feedback?
    Everything else is just assumption, and this principle avoids building on assumptions.</li>
<li><strong>Manage the flow to optimize for value creation</strong> The focus of our work is value creation and we focus on the work
items running through our system and optimize their flow. By focusing on those work items, we see work from a
different perspective than managing only people.</li>
<li><strong>Build quality in to enable sustainable delivery</strong> This principle ensures that through our ways of working, we have
    quality as a default mode. Coding practices and team policies make quality not something to take care of at the end
    of the process, but the standard of our work.</li>
<li><strong>Continuously improve to stay in the healthy performance zone</strong> Wherever we start, we constantly improve
everything. We improve the human interactions to ensure conflicts don’t build up. We also improve our ways of
working and our workflow. By constantly improving we remain agile, which is the sweet spot between chaos and
bureaucracy.</li>
</ol>
<p><strong>Why are we agnostic about frameworks?</strong>
We have more than 150 teams connected to software development. The skill sets and experiences of individuals are very
diverse. Why should we force a team to Scrum if they work more effectively in a different mode and with different rules?
As long as they are customer centric to build the right things, we give them the autonomy to use whatever frameworks or
parts of them as they like. The context of each team differs. Focusing on principles allows as much customization of
working style as possible and ensures that the most important practices are used.</p>
<p><strong>How do teams learn and use principles concretely?</strong></p>
<p>The teams learn about our principles in a two-day deep dive workshop. The aim is to get teams to understand the “why”
behind the principles. We found that an agile mindset is far more important than following a certain practice. Practices
themselves, without understanding them, do not lead to the intended impact. Next to understanding the “why,” we also
connect concrete practices to each principle. For example, we connected nine concrete practices to the principle of
“Accelerated Feedback” to lower risk and increase learning:</p>
<ol>
<li>Backlog refinement</li>
<li>Estimation</li>
<li>Story splitting</li>
<li>Daily stand up</li>
<li>Test driven development</li>
<li>Continuous integration</li>
<li>Emergent design</li>
<li>Live concept test</li>
<li>Vertical user stories</li>
</ol>
<p>Teams use the principles as a reflection moment, e.g. in retrospectives to see where they can improve. Some teams take
inspiration from the practices, see learning needs or potential for deep dives.</p>
<p>By working with principles over frameworks, we can work with agile methods at scale with a high level of
diversity. The mixture of mindset and concrete practices makes them very impactful. These principles are the joint work
of Frank Ewert, Holger Schmeisky, Samir Hanna, Tobias Leonhardt and myself.</p>
<p><em>Work in an agile environment. Have a look at our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">open tech
positions</a>!</em></p>Agile in People Operations2018-07-26T00:00:00+02:002018-07-26T00:00:00+02:00Sarah Guerriertag:engineering.zalando.com,2018-07-26:/posts/2018/07/agile-people-operations.html<p>Applying agile frameworks to HR processes</p><h3><strong>Applying agile frameworks to HR processes</strong></h3>
<p>At Zalando we set up multi-disciplinary teams to develop our products. We do not have a central tech unit, but tech is
distributed everywhere. This means that the way our techies work together has also spread across the company. Everywhere
in the organization people have touch points with agile frameworks and practices.</p>
<p>The People Operations team is the backbone of our HR processes, managing a high number of tickets around sick notes,
working contracts, job references, work permit support, SAP data maintenance and similar topics. The team consists of 80
people. You might think this is not the first use case to apply agile practices and frameworks. But we decided to try it
out.</p>
<p>To realize our goal of implementing agile practices within our team, we applied three principles:</p>
<ul>
<li>Visualization</li>
<li>Manage the flow</li>
<li>Continuous improvement</li>
</ul>
<h3><strong>Visualization</strong></h3>
<p><strong>What did we do?</strong>
In order to manage something properly, it needs to be transparent. We first started with a big board and then with a big
screen, measuring and displaying all relevant data, e.g. open tickets, lost calls, employee satisfaction, etc. Measuring
straight away already made a change, because we needed to think about ticket categories and immediately discovered
improvement points.</p>
<p><strong>What impact did it generate?</strong>
As a result the team now has complete transparency about tickets and they can manage them using the two following agile
principles.</p>
<h3><strong>Manage the flow</strong></h3>
<p><strong>What did we do?</strong>
We process the tickets differently now. Before the change, all tickets went into a big bucket and teams looked for their
topic based upon keywords. If a keyword was misleading, the ticket was sometimes delayed or an important topic was
discovered too late. After making the data transparent, we now use a dedicated role (the Channel Manager) to dispatch
tickets three times a day. The Channel Manager is able to see patterns and create or change keywords in direct
contact with the internal users. The goal of this role is to automate dispatching as much as possible.</p>
<p><strong>What impact did it generate?
</strong>Moving from only managing teams to managing the flow of work items, we achieved faster cycle times and reduced our
backlogs. We can prioritize very easily and time-critical tickets can be managed adequately. All of this happened
through structured improvement.</p>
<h3><strong>Continuous Improvement</strong></h3>
<p><strong>What did we do?
</strong>We rearranged our workplace, so everybody has a clear view of our big screen. Every two weeks we have a standup for
around 10 minutes at our screen, looking at the numbers and briefly sharing patterns and improvement points. After the
standup, we move on to deep dives on an individual level. Looking at the visualized and omnipresent data, the team
members see very clearly how and where they can improve themselves and have easier access to bring in their
improvements. The same goes for the leadership team: having transparent results makes it clear what impediments they
need to remove for the teams on a more systematic and holistic level.</p>
<p><strong>What impact did it generate?
</strong>We are constantly improving and the level of tickets dispatched in an automated fashion is increasing every month.
Through this, we have managed to reduce our backlog and cycle time every month. We also set up our own way to discover
and tackle our issues quickly. Fewer big “improvement projects” eating a lot of resources, and more weekly improvements
based on KPIs and numbers.</p>
<p>Of course, there are still improvements to be made and we are not adopting all agile principles directly. Nevertheless,
it is a good example for us on how we improved the work of a people operations team through agile working.</p>
<p><em>Work in an agile environment. Our open positions are <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">here</a>.</em></p>Lean Testing, or Why Unit Tests are Worse than You Think2018-07-19T00:00:00+02:002018-07-19T00:00:00+02:00Eugen Kisstag:engineering.zalando.com,2018-07-19:/posts/2018/07/economic-perspective-testing.html<p>An economic perspective on testing</p><h3><strong>An economic perspective on testing</strong></h3>
<p>Testing is a controversial topic. People have strong convictions about testing approaches. Test Driven Development is
the most prominent example. Clear empirical evidence is missing, which invites strong claims. First, I advocate for an
economic perspective towards testing. Secondly, I claim that focussing too much on unit tests is not the most economic approach.
I coin this testing philosophy <em>“Lean Testing.”</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2050dbc47a11b89960c3656258b20a99b0e6ce46_1_8c333d_ynehg4q3udb1wta.jpeg?auto=compress,format"></p>
<p>The main argument is as follows: different kinds of tests have different costs and benefits. You have finite resources
to distribute into testing. You want to get the most out of your tests, so use the most economic testing approach. For
many domains (e.g. GUIs), tests other than unit tests give you more bang for your buck.</p>
<h3>Confidence and Tests</h3>
<p>The article ' <a href="https://blog.kentcdodds.com/write-tests-not-too-many-mostly-integration-5e8c7fff591c">Write tests. Not too many. Mostly
integration</a>' and the related
<a href="https://www.youtube.com/watch?v=Fha2bVoC8SE">video</a> by Kent C. Dodds express the ideas behind Lean Testing well. He
introduces three dimensions with which to measure tests:</p>
<ul>
<li>Cost (cheap vs. expensive)</li>
<li>Speed (fast vs. slow)</li>
<li>Confidence (low vs. high) (click doesn't work vs. checkout doesn't work)</li>
</ul>
<p>The following is the <em>'Testing Trophy'</em> suggesting how to distribute your testing resources.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8438d7cee15610a04bb176396b378f2de33521a5_0_n6d7eq_mtcidg1zr.png?auto=compress,format"></p>
<p>Compared to Fowler's <a href="https://martinfowler.com/bliki/TestPyramid.html">Testing Pyramid</a>, confidence as a dimension is
added. Another difference is that unit tests do not cover the largest area.</p>
<p>One of Kent C. Dodds' <a href="https://twitter.com/kentcdodds/status/977018512689455106">major insights</a> is that you should
actually consider the confidence a test gives you: "The more your tests resemble the way your software is used, the more
confidence they can give you."</p>
<h3>Return on Investment of Tests</h3>
<p>The <a href="https://en.wikipedia.org/wiki/Return_on_investment">Return on investment (ROI)</a> of an end-to-end test is higher
than that of a unit test. This is because an end-to-end test covers a greater area of the code base. Even taking into
account the higher costs, it provides disproportionately more confidence.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/49ed571e403e1648544d811c0d09f0254d6935b0_1_un2-r5att1elndecsaot5g.jpeg?auto=compress,format"></p>
<p>Plus, end-to-end tests exercise the critical paths that your users actually take, whereas unit tests may test corner cases
that are rarely, if ever, encountered in practice. The individual parts may work but the whole might not. These
points can be found in '<a href="http://250bpm.com/blog:40">Unit Test Fetish</a>' by Martin Sústrik.</p>
<p>Further, Kent C. Dodds claims that integration tests provide the best balance of cost, speed and confidence. I subscribe
to that claim. We don't have empirical evidence showing that this is actually true, unfortunately. Still, my argument
goes like this: End-to-end tests provide the greatest confidence. If they weren't so costly to write and slow to run, we
would only use end-to-end tests (although better tools like <a href="https://www.cypress.io/">Cypress</a> mitigate these downsides).
Unit tests are less costly to write and faster to run but they test only a small part that might not even be critical.
Integration tests lie somewhere between unit tests and end-to-end tests so they provide the best balance.</p>
<p><em>As an aside</em>: The term “integration test,” and even more so “end-to-end test,” seems to generate intense fear in some
people. Such tests are supposedly brittle, hard to set up and slow to run. The main idea is simply to not mock so
much.</p>
<p>In the React context of Kent C. Dodds’ article, integration testing refers to not using shallow rendering. An integration
test covers several components at once. Such a test is easier to write and more stable since you do not have to mock so
much and you are less likely to test implementation details.</p>
<p>In the backend world, an integration test would run against a real database and make real HTTP requests (to your
controller endpoints). It is no problem to spin up a Docker database container beforehand and have its state reset after
each test. Again, these tests run fast, are easy to write, reliable and resilient against code changes.</p>
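<p>The reset-per-test pattern described above can be sketched without a real database. The following is a hypothetical illustration (the store, handler and reset helper are invented names): an in-memory map stands in for the Dockerised database, and a reset step between tests plays the role of truncating its tables.</p>

```javascript
// Sketch of the "reset state after each test" pattern. A real setup
// would start a throwaway Postgres container and truncate its tables
// between tests; here an in-memory Map stands in for the database.
const store = new Map();

// The "application" under test: a handler writing through to the store.
function createUser(id, name) {
  if (store.has(id)) throw new Error("duplicate id");
  store.set(id, { id, name });
  return store.get(id);
}

// What a test framework's afterEach hook would do: wipe the database
// so every test starts from a known, empty state.
function resetState() {
  store.clear();
}

// Test 1: creating a user succeeds on a clean store.
const created = createUser(1, "Ada");
resetState();

// Test 2 is unaffected by test 1: the same id is free again.
const createdAgain = createUser(1, "Grace");
```

The point is not the store itself but the lifecycle: each test owns a clean state, which is what makes such integration tests reliable and resilient against code changes.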
<h3>Code Coverage</h3>
<p>Another point is that code coverage has diminishing returns. In practice, most people seem to agree: most projects set the lower
bound for coverage to around 80%. There is supporting research, such as '<a href="https://www.microsoft.com/en-us/research/blog/exploding-software-engineering-myths/">Exploding Software-Engineering
Myths</a>.' What follows are general
arguments.</p>
<p>Even with 100% code coverage you trust your dependencies. They can, in principle, have 0% code coverage.</p>
<p>For many products, it is acceptable to have the common cases work but not the exotic ones ( <a href="http://250bpm.com/blog:40">Unit Test
Fetish</a>). If you miss a corner case bug due to low code coverage that affects 0.1% of your
users you might survive. If your time to market increases due to high code coverage demands you might not survive. And
"just because you ran a function or ran a line does not mean it will work for the range of inputs you are allowing" (
<a href="https://news.ycombinator.com/item?id=14297289">source</a>).</p>
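<p>That quoted point can be made concrete. In this sketch, <code>average</code> is a hypothetical function that reaches 100% line coverage from a single call, yet still misbehaves for an input no test exercises:</p>

```javascript
// A line can be 100% covered and still be wrong for other inputs.
// average() is fully covered by the first call below, yet it
// misbehaves for an empty array.
function average(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

const covered = average([2, 4]); // 3 — every line executed
const uncaught = average([]);    // NaN — same lines, broken result
```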
<h3>Code Quality and Unit Tests</h3>
<p>There is the claim that making your code unit-testable will improve its quality. Many arguments and some empirical
evidence in favor of that claim exist, so I will shed light on the other side.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8cabde18b9bbe28fc47d55611d2519f142267602_1_di1sqwdld0c68q-sodxukg.jpeg?auto=compress,format"></p>
<p>The article ‘ <a href="http://250bpm.com/blog:40">Unit Test Fetish</a>’ states that unit tests are an <em>anti-architecture</em> device.
Architecture is what makes software able to change. Unit tests ossify the internal structure of the code. Here is an
example:</p>
<p>"Imagine you have three components, A, B and C. You have written extensive unit test suite to test them. Later on you
decide to refactor the architecture so that functionality of B will be split among A and C. you now have two new
components with different interfaces. All the unit tests are suddenly rendered useless. Some test code may be reused but
all in all the entire test suite has to be rewritten."</p>
<p>This means that unit tests increase maintenance liabilities because they are less resilient against code changes.
Coupling between modules and their tests is introduced! Tests are system modules as well. See ‘ <a href="https://rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf">Why Most Unit Testing
is Waste</a>’ for these points.</p>
<p>There are also some psychological arguments. For example, if you value unit-testability, you would prefer a program
design that is easier to test than a design that is harder to test but is otherwise better, because you know that you'll
spend a lot more time writing tests. Some further points can be found in ' <a href="http://iansommerville.com/systems-software-and-technology/giving-up-on-test-first-development/">Giving Up on Test-First
Development</a>'.</p>
<p>The article ' <a href="http://david.heinemeierhansson.com/2014/test-induced-design-damage.html">Test-induced Design Damage</a>' by
David Heinemeier Hansson claims that to accommodate unit testing objectives, code is worsened through otherwise needless
indirection. The question is whether extra indirection and decoupled code are always better. Do they not have a cost? What if
you decouple two components that are always used together? Was it worth decoupling them? You can claim that indirection
is always worth it, but you cannot dismiss the harder navigation it causes, both in the code base and at run-time.</p>
<h3>Conclusion</h3>
<p>An economic point of view helps to reconsider the Return on Investment of unit tests. Consider the confidence a test
provides. Integration tests provide the best balance between cost, speed and confidence. Be careful about code coverage
as too high aspirations there are likely counter-productive. Be skeptical about the code-quality improving powers of
making code unit-testable.</p>
<p>To make it clear, I do not advocate never writing unit tests. I hope that I provided a fresh perspective on testing. As
a future article, I plan to present how to concretely implement a good integration test for both a frontend and backend
project.</p>
<p>If you desire clear, albeit unnuanced, instructions, here is what you should do: Use a typed language. Focus on
integration and end-to-end tests. Use unit tests only where they make sense (e.g. pure algorithmic code with complex
corner cases). Be economic. Be lean.</p>
<h3>Sources</h3>
<ul>
<li><a href="https://blog.kentcdodds.com/write-tests-not-too-many-mostly-integration-5e8c7fff591c">Write tests. Not too many. Mostly
integration</a></li>
<li><a href="http://250bpm.com/blog:40">Unit Test Fetish</a></li>
<li><a href="http://david.heinemeierhansson.com/2014/test-induced-design-damage.html">Test-induced Design Damage</a></li>
<li><a href="https://www.microsoft.com/en-us/research/blog/exploding-software-engineering-myths/">Exploding Software-Engineering
Myths</a></li>
<li><a href="https://rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf">Why Most Unit Testing is Waste</a></li>
</ul>
<h3>Additional Notes</h3>
<p>One of the problems of discussing the costs and benefits of unit tests is that the boundary between unit and integration
tests is fuzzy. The terminology is not completely unambiguous so people tend to talk at cross purposes.</p>
<p>To make it clear, high code coverage does not imply an absence of bugs. As Dijkstra said in 1969: “Testing shows the
presence, not the absence of bugs.”</p>
<p>There is research that didn’t find Test Driven Development (TDD) improving coupling and cohesion metrics. TDD and unit
tests aren’t synonyms but in the context of this article it’s still interesting: ‘ <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.304.1723&rep=rep1&type=pdf">Does Test-Driven Development Really
Improve Software Design Quality?</a>’
Another article ‘ <a href="https://blog.ndepend.com/unit-testing-affect-codebases/">Unit Testing Doesn’t Affect Codebases the Way You Would
Think</a>’ analyzes code bases and finds that code with more unit
tests has more cyclomatic complexity per method, more lines of code per method and similar nesting depth.</p>
<p>This article focussed on how you should distribute your automated testing budget across the different kinds of tests. Let's take a step
back and consider reducing the automated testing budget altogether. Then we'd have more time to think about the
problems, find better solutions and explore. This is especially important for GUIs as often there is no 'correct'
behavior but there is 'good' behavior. Paradoxically, reducing your automated testing budget might lead to a better
product. See also ‘ <a href="https://www.youtube.com/watch?v=f84n5oFoZBc">Hammock Driven Development</a>’.</p>
<p>There is a difference between library and app code. The former has different requirements and constraints where 100%
code coverage via unit tests likely makes sense. There is a difference between frontend and backend code. There is a
difference between code for nuclear reactors and games. Each project is different. The constraints and risks are
different. Thus, to be lean, you should adjust your testing approach to the project you're working on.</p>
<p><em>Come work with our tech team. Open job positions <a href="https://jobs.zalando.com/tech/jobs/">here</a>!</em></p>Styling-API Reinvented2018-07-12T00:00:00+02:002018-07-12T00:00:00+02:00Michal Raczkowskitag:engineering.zalando.com,2018-07-12:/posts/2018/07/decoupled-styling-ui-components.html<p>Decoupled styling in UI components</p><h3><strong>Decoupled styling in UI components</strong></h3>
<h3>Styling isolation</h3>
<p>Styling isolation achieved via <a href="https://github.com/css-modules/css-modules">CSS-modules</a>, various
<a href="https://github.com/MicheleBertoli/css-in-js#features">CSS-in-JS</a> solutions or
<a href="https://developers.google.com/web/fundamentals/web-components/shadowdom">Shadow-DOM</a> simulation is already a commonly
used and embraced pattern. This important step in CSS evolution was really necessary for UI components to be used with
more confidence. No more global scope causing name conflicts and CSS leaking in and out! The entire component across
HTML/JS/CSS is encapsulated.</p>
<h3>Styling API - exploration</h3>
<p>I expect CSS technology to offer much more in the future. The encapsulation usually comes hand in hand with the
interface, for accessing what was hidden in an organised way. There are different ways to provide styling-APIs, for
customising the component CSS from the outside.</p>
<p>One of the simplest methods is to support modifiers; flags for the component, used to change appearance, behavior or
state:</p>
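<p>The article's original snippet is not preserved in this version; a hypothetical sketch of such a modifier-based API (all names invented for illustration) might look like:</p>

```javascript
// Hypothetical modifier-based API: the component exposes a few named
// flags and maps them to internal class names. Names are illustrative.
function buttonClasses({ primary = false, disabled = false } = {}) {
  const classes = ["button"];
  if (primary) classes.push("button--primary");
  if (disabled) classes.push("button--disabled");
  return classes.join(" ");
}

const cls = buttonClasses({ primary: true });
// cls === "button button--primary"
```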
<p>This is convenient if there are a few predefined modifiers. But what if the number of different use cases grows? The
number of modifiers could easily go off the scale if we combined many factors, especially for non-enum values like
"width" or "height".</p>
<p>Instead we could expose separate properties that provide a certain level of flexibility:</p>
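<p>Again the original snippet is missing; a hypothetical property-based sketch (invented names) could be:</p>

```javascript
// Hypothetical property-based API: instead of fixed modifiers the
// component exposes individual style properties. Flexible, but the
// API surface grows with every property added.
function buttonStyle({ width = "auto", height = "auto", color = "black" } = {}) {
  return { width, height, color };
}

const style = buttonStyle({ width: "120px", color: "tomato" });
```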
<p>In such cases different modifiers can simply be constructed by users of the component. But what if the number of CSS
properties is large? This solution also doesn't scale nicely. Another con is that any modification of the component's
styles will likely force us to change the API as well.</p>
<p>Another solution is to expose the class that will be attached to the root element (let’s assume it's not a global class
and proper CSS isolation technique is in place):</p>
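<p>As the original snippet is not preserved, here is a hypothetical sketch of root-class injection (names invented), where the caller's class is appended to the component's own root classes:</p>

```javascript
// Hypothetical root-class injection: the caller passes a class that is
// appended to the component's root element, letting the outside world
// control positioning properties such as width or z-index.
function renderButton({ className = "" } = {}) {
  const rootClasses = ["button-root", className].filter(Boolean).join(" ");
  return `<button class="${rootClasses}">OK</button>`;
}

const html = renderButton({ className: "sidebar-button" });
```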
<p>Attaching a class from the outside will effectively overwrite the root element CSS. This is very convenient for
positioning the component, with such CSS properties as: "position," "top," "left," "z-index," "width," and "flex.”
Positioning of the component is rarely the responsibility of the component itself. In most cases it is expected to be
provided from outside. This solution is very convenient and more flexible than former proposals. But it’s limited to
setting the CSS only for the component's root element.</p>
<p>The combination of the above solutions would likely allow us to address many use cases, but is not perfect, especially
for component libraries, where simple, generic and consistent API is very important.</p>
<h3>Decoupled styling</h3>
<p>I'd like to take a step back and rethink the whole idea of styling-API for components. The native HTML elements come
with minimal CSS, enough to make the elements usable. The users are expected to style them themselves. We are not
talking about "customisation", as there is basically no inherent styling in place to "customise". Users inject styling,
via a “class” attribute or “className” property:</p>
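<p>A minimal hypothetical example of such injection (the class name is illustrative, echoing the article's naming):</p>

```html
<!-- Styling is injected from outside via the class attribute;
     the element itself ships with minimal styling. -->
<button class="fashion-store-button">Add to cart</button>
```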
<p>In latest browsers like Chrome, we can also set the styling for more complex HTML5 elements like video elements:</p>
<div class="highlight"><pre><span></span><code><span class="p">.</span><span class="s s-Atom">fashion</span><span class="o">-</span><span class="s s-Atom">store</span><span class="o">-</span><span class="s s-Atom">video</span><span class="p">:</span><span class="o">:-</span><span class="s s-Atom">webkit</span><span class="o">-</span><span class="s s-Atom">media</span><span class="o">-</span><span class="s s-Atom">controls</span><span class="o">-</span><span class="s s-Atom">panel</span> <span class="p">{</span>
<span class="s s-Atom">background</span><span class="o">-</span><span class="s s-Atom">color</span><span class="p">:</span> <span class="s s-Atom">white</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>Thanks to <a href="https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM">Shadow DOM</a> and
webkit-pseudo-elements users can set the styles not only for the root element, but also for important inner parts of the
video component. However webkit pseudo-elements are poorly documented and seem to be unstable. It’s even worse for
<a href="https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_custom_elements">custom elements</a>, because currently
it’s not possible to customise the inner parts of elements (::shadow and /deep/ have been
<a href="https://developers.google.com/web/updates/2017/10/remove-shadow-piercing">deprecated</a>). However, there are other
proposals that will likely fill the gap:</p>
<ul>
<li><a href="https://github.com/w3c/webcomponents/blob/gh-pages/proposals/Custom-Pseudo-Elements.md">Custom pseudo-elements</a></li>
<li><a href="https://drafts.csswg.org/css-shadow-parts/">CSS Shadow parts</a></li>
</ul>
<p>Let's summarise the native approach, which I call "decoupled styling":</p>
<ol>
<li>A component is responsible only for its functionality (and API) and comes with minimal or no styling</li>
<li>A component styling is expected to be injected from outside</li>
<li>There is styling-API in place to style the inner parts</li>
</ol>
<h3>Benefits</h3>
<p>The nature of styling is change, the nature of functionality (and API) is stability. It makes perfect sense to decouple
both. Decoupled styling actually solves many issues that UI-component library developers and users are facing:</p>
<ul>
<li>styling couples components together</li>
<li>same changelog for styling and functionality/API causes upgrading issues (e.g. forced migrations)</li>
<li>limited resilience - changes in styling propagate to all parts of the frontend project</li>
<li>costs of rewriting components to implement a new design</li>
<li>costs of rewriting/abandoning projects, because of outdated components</li>
<li>limitations of styling-API to address different use cases</li>
<li>bottleneck of component library constantly adjusted for different use cases</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3469fa5373c726adc3bc41d813500d93d1899cc6_decoupled-styling-in-ui-components---diagram-1.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/45172f3da146f32024cca54a78c6def7c5b18653_decoupled-styling-in-ui-components---diagram-2.png?auto=compress,format"></p>
<h3>API proposal</h3>
<p>In the world of custom UI components, many components are constructed from other components. In contrast to the native
HTML/CSS approach of injecting a single class name, here we need an API for accessing the nested components. Let’s look at
the following proposal for the API.</p>
<p>Imagine a “Dialog” component that contains two instances of a “Button” component (“OK” and “Cancel” buttons). The Dialog
component wants to set the styling for OK button but leave the styling for the Cancel button unchanged (default):</p>
<div class="highlight"><pre><span></span><code>OK
Cancel
</code></pre></div>
<p>We used the “classes” property to inject CSS classes for two of the Button’s internal elements: the icon and the text.
All properties are optional. It’s up to the component itself to define its styling-API (the set of class names
referencing its child elements).</p>
<p>To use Dialog with its default, minimal styling:</p>
<p>But for cases where we want to adjust the styles, we will inject it:</p>
<p>We injected a class that will be attached to the root element. But we can do much more:</p>
<p>The example above shows how we can access every level of nested components structure in the Dialog. We’ve set the CSS
classes for the root element and OK button. By doing that we will effectively overwrite the styling for the OK button,
that is preset inside Dialog.</p>
<p>In the same way we will be able to set the styling for components that contain Dialogs, and further up, to the highest
level of the application. On the root level of the application, defining the styles will practically mean defining the
application theme.</p>
<h3>Implementation</h3>
<p>I implemented two examples using <a href="https://github.com/facebook/react/">React</a> and
<a href="https://github.com/Microsoft/TypeScript">TypeScript</a>, first with <a href="https://github.com/css-modules/css-modules">CSS
Modules</a> and second with <a href="https://github.com/emotion-js/emotion">Emotion</a>
(CSS-in-JS library). Both are based on the same concept:</p>
<ul>
<li>default, minimal styling for components is predefined as an isolated set of classes</li>
<li>styling-API (set of class names) is defined using TypeScript interface, with all properties optional</li>
<li>components allow injection of class names object (via “classes” parameter) which is “deeply-merged” with default
class names object, overwriting the styles</li>
</ul>
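<p>The “deeply-merged” step from the list above can be sketched as follows. This is an illustration of the concept, not the actual code from the linked repositories; all names are invented:</p>

```javascript
// Injected class names are recursively merged over the component's
// defaults, so callers override only what they need.
function deepMerge(defaults, overrides = {}) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    out[key] =
      typeof value === "object" && value !== null && !Array.isArray(value)
        ? deepMerge(defaults[key] || {}, value)
        : value;
  }
  return out;
}

const defaultClasses = {
  root: "dialog-root",
  okButton: { icon: "icon-default", text: "text-default" },
};

// Caller restyles only the OK button's icon; everything else is kept.
const merged = deepMerge(defaultClasses, { okButton: { icon: "icon-green" } });
```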
<p>React, TypeScript, CSS Modules: <a href="https://github.com/mrac/decoupled-styling-css-modules">https://github.com/mrac/decoupled-styling-css-modules</a>
React, TypeScript, Emotion: <a href="https://github.com/mrac/decoupled-styling-css-in-js">https://github.com/mrac/decoupled-styling-css-in-js</a></p>
<h3>Conclusion</h3>
<p>Decoupling styling from UI components may be a step towards making them truly reusable, drawing on the original idea
behind Cascading Style Sheets: separating the presentation layer from UI logic and markup. Defining boundaries between UI
logic and markup on one side and styling on the other side would likely change the way UX designers collaborate with
engineers. Here designers would style components based on API provided by engineers. It would be easier now to specify
what constitutes a breaking-change within that contract. Putting an ever-changing skin on top of what is stable would
likely reduce costs and friction, and contribute to software quality.</p>Dortmund Turns Six2018-07-10T00:00:00+02:002018-07-10T00:00:00+02:00Bianca Bartlingtag:engineering.zalando.com,2018-07-10:/posts/2018/07/dortmund-turns-six.html<p>Zalando’s maiden tech hub celebrates in style</p><h3><strong>Zalando’s maiden tech hub celebrates in style</strong></h3>
<p>With our 10th anniversary celebrations coming up, 2018 is a very special year in the Zalando universe. But while the
company celebrates 10 years, we in Dortmund are excited to celebrate our own birthday as we turn six.</p>
<p>Every year in July, we stop for a moment in Dortmund to reflect on our past journey together and celebrate the opening
of our Dortmund Tech Hub in 2012. Being the first technology hub outside Berlin, it still feels very special to be part
of this team and its continuing journey.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2727433a01b5f136b381fd4ec19a441a6d794f02_dortmund-six.jpg?auto=compress,format"></p>
<p>Back in the day, this was a big step for Zalando, one in which no one really knew what to expect. How would remote
communication work? How can the community and Zalando grow over several locations but remain one team? What would be the
core task here in Dortmund? Dortmund has a proud history as an industrial city, famous for coal and steel. A rather
pragmatic way of working has developed here over time. So for us, the local "Zalandos," the goal was clear: make it
work.</p>
<p>With a team of six developers and product managers from the outset, the office outgrew its initial space after our Smart
Inventories, Gift Card, and Payments teams increased in number and scope. Boasting over 90 employees now, our Dortmund
hub still holds an important role within the overall organization of Zalando, and maintains a consistent line of
communications with Berlin and beyond.</p>
<p>At our birthday celebration, we looked back on this awesome journey and our great accomplishments. We achieved six years
of growing together, taking our place in the Zalando universe, becoming known as the core pillar for Payment Services
and our Fulfillment network, the backbone of our business. At last count, we number over 13 different teams that all
contribute to core functionality of the Zalando Platform.</p>
<p>Here's to our future!</p>
<p><em>Join our Dortmund team by checking out our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1&location=Dortmund&search=">open
job</a> positions!</em></p>Utilizing the Finite State Machine2018-07-05T00:00:00+02:002018-07-05T00:00:00+02:00Ahmed Shehatatag:engineering.zalando.com,2018-07-05:/posts/2018/07/utilizing-finite-state-machine.html<p>How using a State Machine saved our apps & flows from refactoring</p><h3><strong>How using a State Machine saved our apps & flows from refactoring</strong></h3>
<p>There is a lot to learn about a "Finite State Machine" (FSM).</p>
<h3>A little intro: what is a FSM?</h3>
<p>A Finite State Machine is an abstract model of computation that can be in exactly one of a finite number of states at any given moment.
Finite State Machines are used to model problems in different domains such as AI, games, application flows, etc.</p>
<p>In simpler words: it describes how a program should behave by specifying a fixed set of states and the routes (transitions) between them.</p>
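<p>That idea of states and routes can be illustrated with a small transition table. This is a generic sketch, not tied to any particular library:</p>

```javascript
// A transition table maps (state, event) to the next state;
// any (state, event) pair not listed is simply ignored.
const transitions = {
  locked: { "correct-code": "open" },
  open: { "timeout": "locked" },
};

function next(state, event) {
  return (transitions[state] || {})[event] || state;
}

let state = "locked";
state = next(state, "button-pressed"); // not a route: stays "locked"
state = next(state, "correct-code");   // route taken: now "open"
```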
<h3>A Real World Example</h3>
<p>Let's imagine a safe lock:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5ed72f8d040105aef40e7a9fc533049a9ff38a65_image.png?auto=compress,format"></p>
<p>Simply put, this lock has two states: <strong>locked</strong> and <strong>open</strong>. The diagram below shows the
routes/transitions between these states.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2c993943329ed17cb2d2f20fc09f1771637d369c_image-1.png?auto=compress,format"></p>
<p>Every action on the lock is a transition: each <strong>button pressed</strong> event leaves the lock in the same
<strong>locked</strong> state.</p>
<p>Only after entering the correct combination will the lock move to the <strong>open</strong> state. Afterwards, a security
timeout returns the lock to the locked state after a certain <strong>time has expired</strong>.</p>
<p>Let's imagine a <em>very simple</em> manual way to code this lock in Javascript:</p>
<div class="highlight"><pre><span></span><code><span class="k">const</span><span class="w"> </span><span class="n">OPEN_STATE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"open"</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="n">LOCKED_STATE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"locked"</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="n">lockTimeout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">3000</span><span class="p">;</span>
<span class="k">class</span><span class="w"> </span><span class="n">StateMachine</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">constructor</span><span class="p">(</span><span class="n">code</span><span class="p">){</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">LOCKED_STATE</span><span class="p">;</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">code</span><span class="p">;</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">entry</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">enterDigit</span><span class="p">(</span><span class="n">digit</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">entry</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">digit</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">unlockDevice</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">entry</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">code</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">OPEN_STATE</span><span class="p">;</span>
<span class="w">      </span><span class="n">setTimeout</span><span class="p">(()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">lockDevice</span><span class="p">(),</span><span class="w"> </span><span class="n">lockTimeout</span><span class="p">);</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">arrow</span><span class="w"> </span><span class="n">keeps</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">bound</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">lockDevice</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">LOCKED_STATE</span><span class="p">;</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">entry</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">const</span><span class="w"> </span><span class="n">fsm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">StateMachine</span><span class="p">(</span><span class="s2">"123"</span><span class="p">);</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">fsm</span><span class="o">.</span><span class="n">state</span><span class="p">);</span>
<span class="n">fsm</span><span class="o">.</span><span class="n">enterDigit</span><span class="p">(</span><span class="s2">"1"</span><span class="p">);</span>
<span class="n">fsm</span><span class="o">.</span><span class="n">unlockDevice</span><span class="p">();</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">fsm</span><span class="o">.</span><span class="n">state</span><span class="p">);</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="nb">prints</span><span class="w"> </span><span class="s2">"locked"</span>
<span class="n">fsm</span><span class="o">.</span><span class="n">enterDigit</span><span class="p">(</span><span class="s2">"2"</span><span class="p">);</span>
<span class="n">fsm</span><span class="o">.</span><span class="n">unlockDevice</span><span class="p">();</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">fsm</span><span class="o">.</span><span class="n">state</span><span class="p">);</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">still</span><span class="w"> </span><span class="s2">"locked"</span>
<span class="n">fsm</span><span class="o">.</span><span class="n">enterDigit</span><span class="p">(</span><span class="s2">"3"</span><span class="p">);</span>
<span class="n">fsm</span><span class="o">.</span><span class="n">unlockDevice</span><span class="p">();</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">fsm</span><span class="o">.</span><span class="n">state</span><span class="p">);</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="nb">prints</span><span class="w"> </span><span class="s2">"open"</span>
</code></pre></div>
<p>Every time <strong>unlockDevice()</strong> is called, it checks whether the current entry matches the code. This is called the
<strong>transition condition</strong>. If it holds, the state is allowed to transition to the next (or previous) state.</p>
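<p>For illustration, a lock FSM with such a transition condition might be sketched like this (a hypothetical implementation, not the library code behind the example above):</p>

```javascript
// Minimal sketch: a lock FSM whose transition condition compares the
// entered digits against a secret code before allowing "unlock".
function createLockFsm(code) {
  let entry = "";
  return {
    state: "locked",
    enterDigit(digit) {
      entry += digit;
    },
    unlockDevice() {
      // Transition condition: only change state when the entry matches.
      if (this.state === "locked" && entry === code) {
        this.state = "unlocked";
      }
    },
  };
}

const fsm = createLockFsm("123");
fsm.enterDigit("1");
fsm.unlockDevice(); // entry "1" does not match -> stays "locked"
fsm.enterDigit("2");
fsm.unlockDevice(); // entry "12" -> still "locked"
fsm.enterDigit("3");
fsm.unlockDevice(); // entry "123" matches -> "unlocked"
console.log(fsm.state); // "unlocked"
```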
<p>Here are some examples of FSM libraries in JavaScript that you might find useful:</p>
<ul>
<li><a href="https://github.com/jakesgordon/javascript-state-machine">https://github.com/jakesgordon/javascript-state-machine</a></li>
<li><a href="https://github.com/ianmcgregor/state-machine-js">https://github.com/ianmcgregor/state-machine-js</a></li>
</ul>
<h3>Our use case</h3>
<p>At <a href="https://engineering.zalando.com/">Zalando</a>, our team is responsible for building the Guest Checkout Flow, which
allows non-Zalando customers to purchase without an account. We started with the basic flow, without much thought about what
was to come.</p>
<p>The basic flow was:</p>
<p><em>Product Page -> Personal Info -> Address Info -> Payment -> Confirmation -> Receipt</em></p>
<p>Every page in this design was responsible for the transition to the next page, example:</p>
<div class="highlight"><pre><span></span><code>// product-detail.js
// ...
const buyButtonClicked = () =&gt; {
  goToPersonalPage();
};
// ...

// personal.js
// ...
const confirmButtonClicked = (personalInfo) =&gt; {
  if (personalInfoComplete(personalInfo)) {
    goToAddressPage();
  }
};
// ...
</code></pre></div>
<p>But there's one flaw with this simple design: it's neither extendable nor easily testable.</p>
<p>Our product team wanted to introduce some new functionality to the flow, namely "Login Functionality," which would
completely break the whole design.</p>
<p><em>Logged in users, without personal info or Address info:</em></p>
<p><em>Product Page -> Login -> Personal Info -> Address Info -> Payment -> Confirmation -> Receipt</em></p>
<p><em>Logged in users, without payment info:</em></p>
<p><em>Product Page -> Login -> Payment -> Confirmation -> Receipt</em></p>
<p><em>Logged in users, without address info, but with payment info:</em></p>
<p><em>Product Page -> Login -> Address -> Confirmation -> Receipt</em></p>
<p><em>Logged in users, without address or payment info:</em></p>
<p><em>Product Page -> Login -> Address Info -> Payment -> Confirmation -> Receipt</em></p>
<p>And what about guest users now? Far too many if-else branches.</p>
<h3>Enter The State Machine</h3>
<p>This problem screams for a state machine. We laid down the states we wanted, defined some rules between them,
and let the state machine do its magic.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d7049d0e016ee3b2e586d75b3671feccfe6f9b08_image-2.png?auto=compress,format"></p>
<p>This is a simplified example of how the FSM works. Notice that almost every page returns to the FSM for
consultation on <em>where to go next</em>. The FSM has validation rules that allow it to decide what to do next, using the
<a href="https://redux.js.org/">Redux</a> store.</p>
<p>We called this function <strong>goNext()</strong>. We defined all the possible rules and transitions in the system; the
fallback is to render the product page if the state doesn't match any of the transitions.</p>
<p>The state machine takes the state, follows through the rules and keeps <em>"going next"</em> until it finally reaches the
proper state.</p>
<p>An earlier example of a user with personal + address but with no payment would be:</p>
<p><em>Personal state: User? Has personal? Yes? Go next.<br>
Address state: Has address? Yes? Go next.<br>
Payment: Has payment? No? Stay here.</em></p>
<h3>A challenge to that design</h3>
<p>A good challenge to this design was the implementation of “going back.” The state machine was designed to always move
forward, right? What happens if the user decides to go back to the previous page? Luckily, the Redux state system manages
this; however, it was not covered by our initial design with <strong>goNext()</strong>. The answer was simple: we implemented
<strong>goPrev()</strong>, which applies the same concept as going forward, just the other way around. Same rules, different
direction. It worked quite well (after ironing out some nasty bugs).</p>
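<p>The goNext()/goPrev() mechanics described above could be sketched like this (step names and rules are illustrative, and a plain object stands in for the Redux store):</p>

```javascript
// Hypothetical sketch: an ordered list of checkout steps, each with a rule
// that decides, from a Redux-like state object, whether the step can be
// skipped because the user already satisfied it.
const steps = [
  { name: "login",        skip: (s) => Boolean(s.loggedIn) },
  { name: "personal",     skip: (s) => Boolean(s.personalInfo) },
  { name: "address",      skip: (s) => Boolean(s.addressInfo) },
  { name: "payment",      skip: (s) => Boolean(s.paymentInfo) },
  { name: "confirmation", skip: () => false },
];

// Walk forward from the current step, skipping satisfied steps;
// fall back to the product page if nothing matches.
function goNext(state, current = -1) {
  for (let i = current + 1; i < steps.length; i++) {
    if (!steps[i].skip(state)) return steps[i].name;
  }
  return "product"; // fallback
}

// Same rules, opposite direction.
function goPrev(state, current) {
  for (let i = current - 1; i >= 0; i--) {
    if (!steps[i].skip(state)) return steps[i].name;
  }
  return "product";
}

// Logged-in user with personal + address info but no payment:
const user = { loggedIn: true, personalInfo: {}, addressInfo: {} };
console.log(goNext(user)); // "payment"
```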
<h3>Pros of this FSM Design</h3>
<ul>
<li>Easily maintainable, transitions and states are clearly defined</li>
<li>Testable, unit tests can easily be written with pre-defined states for multiple scenarios</li>
<li>Easily extendable, allowing for new states to be just plugged in along with their rules</li>
</ul>
<h3>Cons of this FSM Design</h3>
<p>If some scenarios are not well defined, the FSM simply redirects the user to the product page, even when they were almost at the
payment step. For example, if some underlying backend service (e.g., a payment provider) returns an unexpected response,
the Redux state can get corrupted and the FSM won't know what to do, redirecting the user to the product page and
leaving them confused: "What on Earth happened to my credit card now?"</p>
<p>We try to cover as many scenarios as possible and also provide the user with a proper error page so that they are not left
confused.</p>
<p>A next-step improvement would be allowing the FSM to "re-try" if something fails.</p>
<p>And as they say, computers and humans aren't perfect.</p>
<p><em>Follow more of Ahmed's writing <a href="https://ashehata.me/">here</a>, or have a look at our <a href="https://jobs.zalando.com/tech/jobs/">open tech
positions</a> to work with people like him!</em></p>The State of Open Source2018-06-28T00:00:00+02:002018-06-28T00:00:00+02:00Per Plougtag:engineering.zalando.com,2018-06-28:/posts/2018/06/state-of-open-source.html<p>The evolution and future of open source at Zalando</p><h3><strong>The evolution and future of open source at Zalando</strong></h3>
<p>Open source software has been the core of Zalando’s tech stack since the company’s humble beginnings, selling flip-flops
from a basement 10 years ago; it’s part of our DNA as a tech company.</p>
<p>For engineering teams at Zalando, open source is a natural part of how we solve problems, we consult and share the
<a href="https://opensource.zalando.com/tech-radar/">TechRadar</a> for guidance on appropriate technologies to use, we contribute
to projects such as <a href="https://github.com/kubernetes-incubator/external-dns">Kubernetes</a>, and work in the open on a very
large part of our infrastructure setup such as <a href="https://github.com/zalando/nakadi">Nakadi</a>,
<a href="https://github.com/zalando/connexion">Connexion</a> and <a href="https://github.com/zalando/patroni">Patroni</a>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b68cc29b47f7cf3a876067182167021deea4254b_screen-shot-2018-06-27-at-18.13.15.png?auto=compress,format"></p>
<p>Today we are releasing the <a href="https://opensource.zalando.com/docs/reports/june-2018/">very first of our reports</a> on open
source at Zalando to give everyone inside and outside the company an insight into how we are actually doing. The
increased insight gives us many reasons to celebrate our contributions, but also signposts to take action in the areas
where we see a need to improve.</p>
<p>While the overall picture is positive and open source at Zalando is maturing, we also see challenges: contributions are
driven by a relatively small group of Zalandos, there are legal and organisational barriers to entry, and no consistent
process for open source work at Zalando.</p>
<p>This is where the newly created <a href="https://opensource.zalando.com/">open source team</a> will get to work. As outlined in the
report, our focus for the rest of the year is ensuring that proper processes are in place, that there is no uncertainty
about how to engage in open source, and that our open source efforts grow in size and reach.</p>
<p>We have drafted the following team objectives to work towards:</p>
<ul>
<li>Increase inter-team alignment with inner source initiatives</li>
<li>Align community interests with Zalando’s business interests</li>
<li>Create & nurture open source projects</li>
<li>Improve developer recruitment and retention</li>
<li>Ensure efficient & legally safe adoption and publishing of open source</li>
</ul>
<p>You can follow the progress of the open source team on our <a href="https://github.com/zalando/zalando.github.io/issues">issue
tracker</a>, find our open source policies on our
<a href="https://opensource.zalando.com/">website</a>, and finally you can read the report on open source at Zalando
<a href="https://opensource.zalando.com/docs/reports/june-2018/">here</a>.</p>
<p><em>Come and work with us at Zalando Tech. Open job positions <a href="https://jobs.zalando.com/tech/jobs/">here</a>.</em></p>The Intrapreneurship Journey at Zalando2018-06-21T00:00:00+02:002018-06-21T00:00:00+02:00Luis Jose Borges Salazartag:engineering.zalando.com,2018-06-21:/posts/2018/06/innovation-intrapreneurship.html<p>Sharing our innovation stories: success, failures, and learnings</p><h3><strong>Sharing our innovation stories: success, failures, and learnings</strong></h3>
<p>Franzi, Humberto, Neil, Lenia, Vivek... These are just some names of the people who are willing to put in the extra
effort and run the additional mile to impact the organization in a way they haven’t done before. The stories of these
Zalando intrapreneurs are the ones I summarized at the <a href="https://innov8rs.co/madrid/">Innov8rs conference</a> in Madrid.</p>
<p>Back in October 2015, Zalando took a leap and opened a <a href="https://engineering.zalando.com/posts/2015/09/zalando-opens-new-playground-for-tech-innovation.html">new playground for Tech
Innovation</a>. It
allowed tech teams to experiment with emerging technologies, support product discovery, kickstart hardware initiatives
around 3D printing, and for prototyping of all sorts. Since then, our innovation strategy and approach have completely
changed. How do we innovate in what we do at Zalando? One of the many vehicles for innovation is
<a href="https://engineering.zalando.com/posts/2015/03/we-launched-it-the-zalando-space-shoe-video.html">Slingshot</a>, our
intrapreneurship program.</p>
<p><strong>Slingshot
</strong>A name drawn from aerospace engineering and orbital mechanics: “a gravitational slingshot is the use of the relative
movement and gravity of an astronomical object to alter the path and speed of a spacecraft, typically to save propellant
and reduce expenses”. Just as it influences a spacecraft, Slingshot is Zalando’s intrapreneurship program,
aimed at accelerating ideas or redirecting their path. Fostered by the <a href="https://corporate.zalando.com/en/innovation/grassroots-tech-innovation">Innovation
Lab</a>, it is the opportunity to validate ideas in
a fixed time frame of 12 days within a three month period. Open to everyone at Zalando, it allows individuals and teams
to dedicate 20% of their paid working time to the project, and provides plenty of space, help and expertise to validate
their ideas for further funding: more time, more money, more support and much more passion and commitment.</p>
<p><strong>Our initial approach
</strong>Our playground a couple of years ago was mostly aimed at developing tech-focused ideas. This meant that innovators
came to our lab to experiment with brilliant visionary ideas with topics around chatbots, AR/VR, IoT and Conversational
Technologies. Some of these ideas were built based on how we understood the technology and how customers could use it to
browse and buy products from the Zalando Store. Good things happened back then, media coverage was one of them and the
team who developed our first fashion chatbot made it to the Facebook F8 Conference. We were just on the right track! At
least, that’s what we thought. We were not. Our use cases couldn’t be validated and our customers didn’t find them
useful. In simple terms: we were trying to create solutions where we didn’t understand the problem of our customers. Our
teams got so attached and fell in love with their solution to the extent that sometimes the problem lost its meaning
along the way.</p>
<p><strong>Our most valuable pivot
</strong>Lean startups talk about quick iteration, pivots, and how to be more innovative while maximizing precious resources.
Honestly, we failed here and there. It took us some time to realize some of the mistakes we made until we finally
decided to move away from a “Solution first” approach to a “Problem first” approach. What does this mean? Our innovation
approach focuses first on the customer problem backed with relevant insights and data, and then a defined problem
statement. Only then are we able to explore many different solutions, prototype one, two or three of them and validate
with the defined target audience. We didn’t reinvent the wheel for this, we relied on wide-spread tools and
methodologies, especially on <a href="https://engineering.zalando.com/posts/2016/11/the-sprint-exposed--how-we-use-it-at-zalando.html">Design
Sprints</a> from Google
Ventures.</p>
<p>An example of this approach is a recent project which came to us. These intrapreneurs wanted to explore <a href="https://engineering.zalando.com/posts/2017/09/a-state-of-the-art-method-for-generating-photo-realistic-textures-in-real-time.html">Computer
Vision</a>
as a solution for Search. When we started the project, we first tried to understand the problem from a user perspective.
We got out of the building, crafted a research question to know how our customers search for outfits and met several of
them. We came up with the user journey in <em>figure 1</em> after talking to them. The important revelation here was that most
of our users started their journey on social media channels and then had a lot of hacky ways to find a similar outfit.
This changed the course of the whole project and we figured out how to innovate these journeys and provide a superior
experience with less effort for our users. Starting with the customer problem: different approach, <em>huge</em> impact!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3aa26a46a683c22ec23f0336f5733b331ccac046_screen-shot-2018-06-21-at-18.54.08.png?auto=compress,format"></p>
<p><strong>Support is paramount
</strong>Our innovators cannot do this work alone. We relied on the expertise of hundreds of our colleagues who
helped us move forward on every single one of the 12 allocated days. We are more than thankful for their willingness to
cooperate and work together to validate ideas. There are teams whose leads are keen and open to share their talent:
designers, developers, business, product people, etc. On top of that, none of these initiatives could happen without the
support of the management in Zalando, who invest in and support fostering a culture for innovation across the company
boosted by quarterly events like <a href="https://engineering.zalando.com/posts/2018/03/cross-department-hackathons.html">Hack
Week</a>.</p>
<p><strong>Our impact
</strong>Slingshot has been around since the end of 2015. Small experiments have been created in Hack Week, validated in
Slingshot, and new teams have been created and grown to something big, as big as Zalando’s <a href="https://corporate.zalando.com/en/newsroom/en/stories/data-personalization-and-future-sizing">Sizing
Team</a>, creating remarkable
business impact. It all started from the bottom and that is how we succeed with our people: we foster an environment for
bottom-up innovation boosted by the passion and drive of our people who have a deep desire to engage in building new
products that reinforce our <a href="https://corporate.zalando.com/en/company/zalandos-platform-strategy">platform strategy</a> and
shape the future of Zalando.</p>
<p>Stories such as <a href="https://engineering.zalando.com/posts/2018/02/innovation-digital-experience.html">Innovation in Digital
Experience</a> by a Zalando data
scientist, also feature in this blog, showing how deeply we impact our people; people who are able to follow their
passion, are truly obsessed with solving customer problems, and are eager to see their products come to life.</p>
<p><strong>Moving forward
</strong>Slingshot is everywhere. What started as a “tech only” initiative has spread around the company and our learnings have
helped us have the right tools in place, iterate faster on our learnings, and offer a compelling value proposition
company-wide. That is to say, enabling a platform for intrapreneurs to kickstart radical ideas that generate business
value or fail fast. Our vision is also clear: to be the accelerator for Zalando visionaries to innovate our business.</p>
<p>At Zalando, more than 100 nationalities are represented, and we are all able to innovate and reimagine our business.
When it comes to business success, it is all about people, people and people.</p>
<p><em>Interested to know more? Luis gave a talk about this journey at Innov8rs Conference in Madrid in May. If you have any
questions or want to get in touch, you can reach him through his <a href="https://www.linkedin.com/in/luisjborges/">LinkedIn</a>.</em></p>All Aboard2018-06-14T00:00:00+02:002018-06-14T00:00:00+02:00Tim Alexander Leutholdtag:engineering.zalando.com,2018-06-14:/posts/2018/06/all-aboard.html<p>What new tech employees can expect from Zalando onboarding</p><h3><strong>What new tech employees can expect from Zalando onboarding</strong></h3>
<p>So, you’ve applied for a technical role at Zalando and you’ve just accepted the offer! If you’re wondering what to
expect, look no further. We are excited to share a peek behind the scenes, so you can see what awaits you in the first
few weeks of this journey, regardless of whether you’re joining in Berlin, Dortmund, Dublin, Hamburg, Helsinki or
Lisbon, to make sure you’re well-equipped to dive into life at Zalando.</p>
<p><strong>Part One</strong>
We designed a special onboarding program for our tech community. The program introduces our new hires to their work
environments. This starts with a company-wide onboarding day in which new hires from all across Zalando find out more
about Zalando’s history, structure, and strategic future. Zalando started as a very small company in Berlin in 2008, and
has now grown to be Europe’s largest online fashion platform with over 15,000 employees. As a new “Zalando,” you’re now
an integral part of this journey, and it’s our job to make sure you’re fully equipped to hit the ground running!</p>
<p><strong>Part Two: General Tech Onboarding</strong>
For most of you, your second day begins with the Tech Onboarding Program, together with all the other tech newbies. This
is taken care of by the Zalando Tech Academy, our internal training centre which caters specifically to tech roles. This
is an exciting time, because you get to interact with future colleagues from across the tech spectrum; product managers
to UX designers, frontend and backend software engineers, as well as tech management, to get deeper insights into what
we do. Early exposure to these different aspects is what makes Zalando Tech such a great place to work.</p>
<p>In this four-day program, a major highlight is a full-day trip to one of our logistics centres near Berlin, where you
are able to see how Zalando Logistics operates on the inside, and how we employ technology to improve processes every
step of the way. Every one of Zalando’s employees works towards the same goal: to ensure a seamless customer experience,
whether you’re working in tech, on the business side, in logistics, or anywhere else. This trip to the warehouse is to
provide exposure to even more areas of Zalando’s business; helping our newbies appreciate the work that goes into
delivering customer satisfaction at every level.</p>
<p>The tech community is an extremely important aspect of working at Zalando, and one of our main goals from the beginning
is to make it easy for you to connect with as many people as possible. That’s why part of your onboarding includes an
overview of Zalando’s culture and community. Zalando is home to dozens of “
<a href="https://engineering.zalando.com/posts/2018/05/dublin-data-science-guild.html">guilds</a>”; self-organized groups of people
who gather to exchange knowledge, share their expertise, or just indulge in things they are passionate about. Our
community management team helps manage the guilds and provides the framework necessary for Zalando’s techies to thrive.
Especially for newcomers and international employees, this is the perfect opportunity to get involved, and receive
recommendations and support.</p>
<p><strong>Part Three: Engineering Bootcamp</strong>
After covering the basics in the first few days, software engineers embark on a further onboarding program, which dives
a bit deeper into the tech environment: Engineering Bootcamp. In this three day program, engineers gain deep insights
into our software development lifecycle, covering all the tools and technologies used by Zalando. You’ll be given
hands-on exercises on all of our tools, so that you can get to coding and deploying projects as soon as possible.</p>
<p>Topics covered include why we use REST APIs, how we use GitHub Enterprise, how we deploy software using
<a href="https://engineering.zalando.com/posts/2017/11/running-kafka-streams-applications-aws.html">AWS</a>, Stups and
<a href="https://engineering.zalando.com/posts/2017/06/postgresql-in-a-time-of-kubernetes.html">Kubernetes</a>, and much more. We
also introduce newbies to our “tech radar”: a graphical illustration of which frameworks, infrastructure, data
management tools, and languages are in use at Zalando, which are being trialed, and which are on hold and not
recommended for new projects.</p>
<p>Our tech onboarding program prepares you for almost any team within the Zalando Tech universe, but of course it’s
followed by team-specific onboarding, in which you learn the ins and outs of projects you’re working on and how they fit
into Zalando’s strategy.</p>
<p>Whether you’re joining the Zalando team as a software engineer, product manager, UX designer, data scientist, or any
other role, we’ve got you covered! At Zalando, we want to ensure that you have the best experience possible when joining
the team.</p>
<p><strong>Now it’s up to you</strong>
<em>Interested in joining our tech team? No matter which tech hub you start in, we’ll bring you to Berlin for our company
onboarding. To discover open positions in our tech team, <a href="https://zln.do/2Jz0pt5">check out our jobs page</a>.</em></p>Loading Time Matters2018-06-11T00:00:00+02:002018-06-11T00:00:00+02:00Christoph Luetke Schelhowetag:engineering.zalando.com,2018-06-11:/posts/2018/06/loading-time-matters.html<p>How Zalando's overall site speed improved by more than 25% in five months</p><h3><strong>How Zalando's overall site speed improved by more than 25% in five months</strong></h3>
<p>We all know that providing a fast user experience is key. Still, it was something of a wake-up call for us last fall when we
saw our aggregated loading time increasing; not because we had increased latency in our systems, but simply because the
share of mobile visits kept increasing. By now, over 75% of our traffic comes from mobile devices (nearly equally split
between app and web). And customer expectations are rising, especially on mobile!</p>
<p>We took this wake-up call as an opportunity to explore the impact of site speed in more detail. Yes, at Zalando every
millisecond of latency counts, but what’s the concrete impact of another 100 msec improvement? We analyzed the
correlations of loading time and revenue per session across every step of the user journey and for every device. The
pattern was very clear and consistent (even if somewhat different in size). Shorter loading times go hand in hand with
higher revenue per session. An A/B test brought the final confirmation: 100 msec loading time improvement led to a 0.7%
uplift in revenue per session.</p>
<p>At Zalando, we live our values by setting bold expectations and making them highly visible for everyone in the company.
Our ambition was a 20% loading time improvement in the first half of 2018. We’re excited that our efforts paid off and
we reached a 25% improvement on our overall loading time within five months. We’re obviously thrilled that this is noted
in <a href="https://medium.com/mmagermany/aldi-check24-dkb-and-opel-lead-the-list-of-germanys-fastest-mobile-websites-2641ffd70d6f">Google’s “Mobile Speed Leaderboards”
study</a>,
which rated us as the fastest mobile site in fashion retail. We’d like to share how we achieved this.</p>
<p>Given Zalando’s size, with hundreds of engineering teams and a breakneck pace of development, some teams are entirely
self-sufficient when it comes to managing their performance, while others embark on a crash program to eliminate
bottlenecks. That’s where Mission Control comes in: targeted engagements with engineering teams and Zalando’s Site
Reliability Engineering (SRE) program. Our site reliability engineers roll up their sleeves and apply their specialized
experience to achieve immediate results, while providing the tools for self-management after the engagement ends.</p>
<p>Over the last few months, a special focus has been on the optimization of the render time and time to interact with our
website. On almost every step of the user journey, the engineers reduced the time to interaction by decreasing the
amount of code that has to be executed. This sounds obvious, but it is not always easy to implement due to the chosen
technology.</p>
<p>We identified an older React version as one of the reasons for a slow loading time. So our platform team updated the
React version that we use from 15.6.1 to 16.2.0. This update was solely responsible for improving the JavaScript
execution time by over 100 milliseconds.</p>
<p>Our engineers from the Search and Browse team started the optimization with profiling their front end components with
the <a href="https://reactjs.org/docs/optimizing-performance.html#profiling-components-with-the-chrome-performance-tab">component-level
profiling</a>,
which was introduced in React 15.4.0 and turned on by default in React 16. It shows the rendering time (mount and update) of
each component and warns about possible performance bottlenecks, such as updates triggered in lifecycle methods. This was a
killer feature for us. Even though it is only available in development builds, the proportions of rendering times resemble
those of the production build.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a4f168469326f3b9bd35f3535c6f45b6523382e3_profiling-desktop-before.png?auto=compress,format"></p>
<p>Combined with Chrome’s Performance Tab, it helped us to identify the bottlenecks.</p>
<p>When we looked into profiling results, it was clear that
<a href="https://gist.github.com/paulirish/5d52fb081b3570c81e3a">reflows</a> were the biggest bottlenecks. The purple boxes are
reflows in JavaScript execution on production.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5d166ad6a621b3c3222ca72b6e56e7a443706e26_reflow.png?auto=compress,format"></p>
<p>On mobile and tablet, <a href="https://github.com/jasonslyvia/react-lazyload">react-lazyload</a> for product images was triggering
two reflows. The Catalog page renders eight products on the server side and 76 products on the client side. The second reflow
took a very long time because it calculated the layout of a big area of the screen for the 76 newly rendered products.
We removed the lazyload and implemented Low Quality Image Placeholders
(<a href="https://www.guypo.com/introducing-lqip-low-quality-image-placeholders/">LQIP</a>) instead, avoiding the reflow entirely.</p>
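<p>A minimal sketch of the LQIP idea (the <code>?w=24&amp;q=10</code> query parameters and the helper below are illustrative, not Zalando's real image API): render a tiny, heavily compressed preview immediately, reserving the final layout size so that swapping in the full image later causes no reflow.</p>

```javascript
// Hypothetical sketch: build markup for a low-quality image placeholder.
// The width/height attributes reserve the final layout box up front,
// so replacing the placeholder with the full image triggers no reflow.
function lqipMarkup(src, width, height) {
  const placeholder = `${src}?w=24&q=10`; // tiny, heavily compressed preview
  return (
    `<img src="${placeholder}" data-src="${src}" ` +
    `width="${width}" height="${height}" alt="">`
  );
}

console.log(lqipMarkup("https://example.com/shoe.jpg", 300, 400));
```

<p>On the client, a small script (or an image <code>onload</code> handler) would then copy <code>data-src</code> into <code>src</code> to load the full-resolution image.</p>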
<p>Before (Mobile):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/25c1f1d3b84a91614e7580784fb16757d8ad534d_profiling-mobile-before.png?auto=compress,format"></p>
<p>After (Mobile):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ecb91d6f244f21446bf8e542612b20ec8694bf07_profiling-mobile-after.png?auto=compress,format"></p>
<p>On desktop and tablet, <a href="https://github.com/bvaughn/react-virtualized">react-virtualized</a> for a product filter dropdown
was triggering a reflow. The product filter component does not show anything until it is clicked, but it was rendered to
provide links for crawlers. We stopped rendering the hidden product filter component, which removed the reflow. For
crawlers, we prepared links generated with string concatenation outside of React components.</p>
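<p>The crawler-link trick might look roughly like the following sketch; the function name and URL shape are illustrative, not the actual production code:</p>

```javascript
// Hypothetical sketch: instead of rendering a hidden filter dropdown with
// React (which triggered a reflow), emit plain anchor tags for crawlers
// by string concatenation outside of any React component.
function crawlerFilterLinks(basePath, filters) {
  return filters
    .map((f) => `<a href="${basePath}?filter=${encodeURIComponent(f)}">${f}</a>`)
    .join("");
}

console.log(crawlerFilterLinks("/catalog", ["red", "blue"]));
// <a href="/catalog?filter=red">red</a><a href="/catalog?filter=blue">blue</a>
```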
<p>Before (Desktop):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9959f0f61c54c82e33f8a2ba30bb45d6dd00fb83_profiling-desktop-before-1.png?auto=compress,format"></p>
<p>After (Desktop):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2af99336e087bd9806f97f7df026297b82a2973a_profiling-desktop-after.png?auto=compress,format"></p>
<p>As a result, we managed to reduce the JavaScript execution time of the Catalog page by about 200 milliseconds on desktop and about
300 milliseconds on mobile devices at the 90th percentile.</p>
<p>Desktop:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b26627c4dea05c88306aa7cf90cacfbf4692d320_desktop-js-execution.png?auto=compress,format"></p>
<p>Mobile:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c405ae4a6a8e12fab0bfec494b09e8e754154446_mobile-js-execution.png?auto=compress,format"></p>
<p>Another optimisation we made was reducing the bundle size. Sending only the code that is necessary improved
performance significantly. In the end, every byte counts, as JavaScript is expensive for the browser to process. Also,
surprisingly many visitors arrive without a primed cache, so it's important to keep JavaScript bundles as small as
possible. To identify where to look and where the best results could be achieved, we used the
<a href="https://github.com/webpack-contrib/webpack-bundle-analyzer">webpack-bundle-analyzer</a>.</p>
<p>We identified libraries that are large in size but not essential for us, and we used <a href="https://webpack.js.org/guides/tree-shaking/">tree
shaking</a> to eliminate dead code. Unfortunately, some CommonJS libraries did
not work well with tree shaking. In these cases, we removed the packages and either chose a smaller alternative or wrote our
own. We also found that some internal libraries were bundling their dependencies into their own bundles with webpack.
This caused our bundle to contain the same code multiple times, because NPM’s deduping mechanism couldn’t detect the
duplication.</p>
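<p>As a sketch of the kind of setup this involves (entry path and file names are illustrative, not our actual configuration), a webpack production build with the bundle analyzer attached might look like this:</p>

```javascript
// webpack.config.js – sketch, assuming webpack 4+ and the
// webpack-bundle-analyzer package; paths are illustrative.
const { BundleAnalyzerPlugin } = require('webpack-bundle-analyzer');

module.exports = {
  // Production mode enables minification and tree shaking
  // (dropping unused ES module exports) out of the box.
  mode: 'production',
  entry: './src/index.js',
  plugins: [
    // Renders an interactive treemap of what ends up in each bundle.
    new BundleAnalyzerPlugin(),
  ],
};
```

<p>Tree shaking relies on ES module syntax; for your own packages, declaring <code>"sideEffects": false</code> in <code>package.json</code> additionally lets webpack drop whole unused modules.</p>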
<p>By applying this approach we reduced the overall size of our Header Fragment by 25% (36.6 KB -> 27.4 KB gzipped):</p>
<p>Header Fragment (before):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5a51951f08b0d273613d4daebefecc82a91c0847_header-before.png?auto=compress,format"></p>
<p>Header Fragment (after):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5ed3bda9d4cdc1029241527bc18e7c78b2297d04_header-after.png?auto=compress,format"></p>
<p>Because each byte counts, we also reduced the total page size (number of DOM elements, JSON data size, e.g. props).</p>
<p>React client-side hydration needs the props that are used for server-side rendering. The props are typically embedded
into the HTML as JSON. In the JSON, we had some unnecessary properties in large arrays of objects that were passed through
from backend APIs. Removing those unused properties reduced the page by up to 17 KB gzipped.</p>
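<p>A minimal sketch of the idea (the helper and the field names are hypothetical, not our actual code): strip everything the hydrated components don’t read before serializing props into the HTML.</p>

```javascript
// Keep only the props the client-side components actually read
// before embedding them as JSON in the server-rendered HTML.
function pickProps(items, allowedKeys) {
  return items.map((item) =>
    Object.fromEntries(
      Object.entries(item).filter(([key]) => allowedKeys.includes(key))
    )
  );
}

// A backend API may return many fields per article...
const articles = [
  { sku: 'A1', name: 'Sneaker', price: 99, internalScore: 0.87, warehouseId: 7 },
];

// ...but the hydrated component only needs a few of them.
const hydrationProps = pickProps(articles, ['sku', 'name', 'price']);
// hydrationProps → [{ sku: 'A1', name: 'Sneaker', price: 99 }]
```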
<p>As the Zalando website uses SVG for icons, SVG optimization was also part of reducing the page size. The <a href="https://github.com/svg/svgo">SVG
Optimizer</a> (SVGO) is a great tool for optimizing SVG images. We had already been using the
tool for a while, but recently we noticed that we had forgotten to apply decimal precision optimization, which specifies the
precision of floating point coordinates. SVG images generated by graphics software usually carry far more precision than is
needed to render pixels. After this optimization we reduced the SVG size by about 50%.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/fb2458c48011788c5d9aee47499f8b644dadf93f_svgo.png?auto=compress,format"></p>
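<p>With a recent SVGO version, this precision setting can be expressed in the config file (shown here as a sketch; older SVGO releases expose the same knob via the CLI’s <code>--precision</code> flag):</p>

```javascript
// svgo.config.js – sketch; the right value depends on your icon sizes.
module.exports = {
  // Round floating point coordinates to one decimal place. SVGs exported
  // from graphics software usually carry far more precision than a pixel
  // grid can actually show.
  floatPrecision: 1,
};
```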
<p>The biggest learning we had from our optimization efforts is:</p>
<p><em>Remove as many of your dependencies as possible, keep the amount of code as small as possible, and your webpage will be
fast (again). A small and fast webpage will make your customers happy and will result in more conversions.</em></p>
<p>Looking to the future, SRE is making a number of improvements to make it easier for Zalando’s hundreds of engineering
teams to self-manage their performance. It starts with setting expectations via Service Level Objectives that are
meaningful from the customer perspective. With expectations set, we measure our Service Level Indicators against those
expectations and dive deep to optimize bottlenecks – that’s where distributed tracing comes in. With expectations
and deep instrumentation, we gain the ability to implement monthly error budgets to help engineering teams better
achieve operational excellence. The journey continues...</p>
<p><em>Join our <a href="https://jobs.zalando.com/tech/jobs/">tech team</a> at Zalando.</em></p>State Management in React2018-05-31T00:00:00+02:002018-05-31T00:00:00+02:00Kaiser Anwar Shadtag:engineering.zalando.com,2018-05-31:/posts/2018/05/state-management-react.html<p>Comparing Redux, MobX & setState in React</p><h3><strong>Comparing Redux, MobX & setState in React</strong></h3>
<p>by Kaiser Anwar Shad and revised by Eugen Kiss</p>
<p><em>Introduction</em></p>
<p>React is a declarative, efficient, and flexible JavaScript library for building user interfaces. Compared to other
frontend libraries and frameworks, React’s core concept is simple but powerful: <em>‘React makes it painless to design
simple views and render them using the virtual DOM’</em>. However, I don’t want to go into detail about the virtual DOM here.
Rather, I want to show <em>three ways</em> you can manage state in React. This post assumes a basic understanding of the
following state management approaches; if you don’t have it, check out the docs first.</p>
<ol>
<li><strong>setState:</strong> React itself ships with built-in state management in the form of a component’s <em>`setState`</em> method,
which will queue a render operation. For more info, see <a href="https://reactjs.org/">reactjs.org</a></li>
<li><strong>MobX:</strong> This is a simple and scalable library that transparently applies functional reactive programming (TFRP), which
stands for: ‘<em>Anything that can be derived from the application state, should be derived. Automatically.’</em> For more
info, see <a href="https://mobx.js.org/">mobx.js.org</a></li>
<li><strong>Redux:</strong> Maybe the most popular state management solution for React. Its core concepts are a single source
of truth, immutable state, and state transitions that are initiated by dispatching actions and applied by pure
functions (reducers). For more info, see <a href="https://redux.js.org/">redux.js.org</a></li>
</ol>
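<p>To make Redux’s core concepts concrete, here is a dependency-free sketch of the idea (a toy store, not the actual Redux API surface): a pure reducer transforms state in response to dispatched actions, and a single store holds the whole state tree.</p>

```javascript
// A pure reducer: (state, action) → new state, never mutating the input.
function counterReducer(state = { count: 0 }, action) {
  switch (action.type) {
    case 'INCREMENT':
      return { ...state, count: state.count + 1 };
    default:
      return state;
  }
}

// A toy store modelling Redux's single source of truth.
function createTinyStore(reducer) {
  let state = reducer(undefined, { type: '@@INIT' });
  return {
    getState: () => state,
    dispatch: (action) => { state = reducer(state, action); },
  };
}

const store = createTinyStore(counterReducer);
store.dispatch({ type: 'INCREMENT' });
// store.getState() → { count: 1 }
```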
<p><em>Location</em></p>
<ol>
<li><strong>setState</strong> is used locally in the component itself. If multiple children need to access a parent’s local state,
the data can either be passed down from the state as props or, with less piping, via React 16’s new Context API.</li>
<li><strong>MobX</strong> state can live in the component itself (local) or in a store (global), so the best approach can be
chosen depending on the use case.</li>
<li><strong>Redux</strong> provides the state globally: the state of the whole application is stored in an object tree
within a single store.</li>
</ol>
<p><em>Synchronicity</em></p>
<ol>
<li><strong>setState</strong> is asynchronous.*</li>
<li><strong>MobX</strong> is synchronous.</li>
<li><strong>Redux</strong> is synchronous.</li>
</ol>
<p>*<em>Why asynchronous?</em> Delaying reconciliation in order to batch updates can be beneficial. However, it can also
cause problems when, e.g., the new state doesn’t differ from the previous one, and it generally makes issues harder to
debug. For more details, check out the <a href="https://github.com/facebook/react/issues/11527#issuecomment-360199710">pros</a>
and <a href="https://blog.cloudboost.io/3-reasons-why-i-stopped-using-react-setstate-ab73fc67a42e">cons</a>.</p>
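<p>A plain-JavaScript model of the batching behaviour (this mimics how React processes a queue of updates; it is not React’s actual implementation): object-form updates that each read the same stale state collapse into one change, while updater functions see each other’s results.</p>

```javascript
// Apply a queue of state updates in order. Each entry is either a partial
// state object or an updater function receiving the latest queued state.
function applyUpdates(initialState, updates) {
  return updates.reduce(
    (state, update) => ({
      ...state,
      ...(typeof update === 'function' ? update(state) : update),
    }),
    initialState
  );
}

// Object form: both updates were computed from the same original count (0),
// so the second simply overwrites the first.
applyUpdates({ count: 0 }, [{ count: 0 + 1 }, { count: 0 + 1 }]);
// → { count: 1 }

// Updater form: each function sees the previous update's result.
applyUpdates({ count: 0 }, [
  (s) => ({ count: s.count + 1 }),
  (s) => ({ count: s.count + 1 }),
]);
// → { count: 2 }
```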
<p><em>Subscription</em></p>
<ol>
<li><strong>setState</strong> is implicit, because it directly affects the state of the component. Changing the state of child
components can be done via passing props (or Context API in React 16).</li>
<li><strong>MobX</strong> is implicit, because it is similar to setState with direct mutation. Also component re-renders are derived
via run-time usage of observables. To achieve more explicitness/observability, actions can (and generally should) be
used to change state.</li>
<li><strong>Redux</strong> is explicit, because a state represents a snapshot of the whole application state at a point in time. It
is easy to inspect as it is a plain old object. State transformations are explicitly labeled/performed with actions.</li>
</ol>
<p><em>Mutability</em></p>
<ol>
<li><strong>setState</strong> is mutable because the state can be changed by it.</li>
<li><strong>MobX</strong> is mutable, because actions can change the state of the component.</li>
<li><strong>Redux</strong> is immutable, because state can’t be changed. Changes are made with pure functions which are transforming
the state tree.</li>
</ol>
<p>* With mutability, the state can be changed directly, so the new state overrides the previous one. Immutability
protects the state from changes; in Redux, instead of directly changing the state, you dispatch actions that
transform the state tree into a new version.</p>
<p><em>Data structure</em></p>
<ol>
<li><strong>setState</strong> -</li>
<li><strong>MobX</strong> Graph: multidirectional; loops can be used. The state stays denormalized and nested.</li>
<li><strong>Redux</strong> Tree: a special kind of graph in which edges run only one way, from parent to child. The state is normalized,
as in a database: entities reference each other only by identifiers or keys.</li>
</ol>
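<p>A small illustration of this difference (the shop-flavoured data is invented for the example): MobX works naturally with nested, denormalized objects, while a Redux-style store normalizes entities into tables keyed by ID that reference each other.</p>

```javascript
// Denormalized, nested shape – natural for MobX observables.
const nested = {
  order: { id: 'o1', items: [{ id: 'i1', name: 'Shoe' }] },
};

// Normalized, database-like shape – idiomatic for a Redux store.
const normalized = {
  orders: { o1: { id: 'o1', itemIds: ['i1'] } },
  items: { i1: { id: 'i1', name: 'Shoe' } },
};

// Entities are looked up by key instead of walking the tree.
const firstItemName =
  normalized.items[normalized.orders.o1.itemIds[0]].name; // 'Shoe'
```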
<p><em>Observing Changes</em></p>
<ol>
<li><strong>setState</strong> -</li>
<li><strong>MobX:</strong> Reactions do not produce new values; instead, they produce side effects and can change the state.</li>
<li><strong>Redux:</strong> An object describes what happened (which action was emitted).</li>
</ol>
<p><em>Conclusion</em></p>
<p>Before starting to write your application, you should think about which problem you want to solve. Do you really need an
extra library for state management, or does React’s built-in setState fulfil your needs? Extend it only as the complexity
demands. If you prefer the mutable way and want bindings derived automatically, then MobX can fit
your needs. If you want a single source of truth (storing state in an object tree within a single store) and to
keep state immutable, then Redux can be the more suitable solution.</p>
<p>Hopefully this post gave you a brief overview of the different ways to manage state in React. Before you start with
one of these libraries, I recommend going through the docs of each. There are a lot more treasures to discover!</p>
<p><em>TL;DR:</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/59db9c034220aaeeb0e0bc4da677bc8bb36ff850_screen-shot-2018-05-31-at-14.15.39.png?auto=compress,format"></p>
<p><strong>This post is inspired by:</strong></p>
<ul>
<li><em>State Management With MobX & MobX-state-tree</em> <em>by Michel Weststrate - Workshop in React Amsterdam 2018</em></li>
<li><a href="https://www.youtube.com/watch?v=ZGVwMkrL2n0">Comparing Redux and MobX with two CTO's and React experts - state management using
reactjs</a></li>
</ul>
<p><em>Check out our open Software Engineering positions on our <a href="https://jobs.zalando.com/tech/jobs/tech/?gh_src=4n3gxh1">jobs
page</a>.</em></p>Scaling Agile at Zalando2018-05-17T00:00:00+02:002018-05-17T00:00:00+02:00Holger Schmeiskytag:engineering.zalando.com,2018-05-17:/posts/2018/05/scaling-agile-zalando.html<p>Sharing successful large scale agile experiences</p><h3><strong>Sharing successful large scale agile experiences</strong></h3>
<p>Zalando has been known for radical approaches to agility since 2015. In order to keep growing and staying successful, we
took the next step in 2017, forming around 30 business units. Each business unit is formed around one particular business
problem, topic or product, with end-to-end responsibility. All disciplines needed are inside this business unit, from
commercial roles to tech teams.</p>
<h3>Challenges in large scale product groups</h3>
<p>Looking at this setup, we experience challenges. You’re probably familiar with them if you work in a similar setup, or if
your company is around the size of one of our business units (&lt;100 people).</p>
<ul>
<li>Who makes product decisions at this size, with several teams and product people?</li>
<li>How to keep the focus on the actual product with so many technical components and intermediate steps?</li>
<li>How to enable 50 engineers to understand their everyday task contribution to the overall quarterly goals?</li>
<li>How to do sprint planning with 20 people?</li>
<li>How to handle cross-cutting concerns like 24/7 and platform work in a feature team setup?</li>
</ul>
<p>By far the biggest question was however: <em>How can this work inside Zalando?</em></p>
<h3>Our Solution Approach</h3>
<p>How do we support these 30+ business units in reaching their business goals through agile working? Rome was not built in a
day. We knew we had to work through process and collaboration.</p>
<p>We used the power of our network and collected successful solutions from all units. The first and most important
observation was that no solution can be mechanically copied; it always has to be adapted to the specific needs of the
unit (“There are no best practices, only practices that work in a specific context”). To enable this adaptation and
learning, in addition to the bare facts we collected:</p>
<ol>
<li>the story and motivation around the solutions</li>
<li>the details of how they are adopted</li>
<li>the (contact details of the) people who created them</li>
</ol>
<p>For the first point, we invited people from these teams to teachback sessions, open to everyone, to share their
experiences in a try/avoid format.</p>
<p>Secondly, from these we created a 20-page guide on how to structure large teams, with background details. Finally, we
connected people facing similar challenges with the pioneers who created the solutions, because all advice needs to be adapted to the
specific BU’s needs.</p>
<h3>Concrete Examples</h3>
<p>For example, the Fashion Store Apps group (5 teams) struggled with their narrow product definition: each platform and
the API were treated as separate products, with separate teams, backlogs, product owners, etc. These needed to be
managed, synchronized, and aligned, and code needed to be integrated. As you can imagine, somewhere along the way the
focus on the customer gets hard to find. To address this, the team <strong>redefined</strong> the <strong>product</strong> as “Fashion Store
Apps,” <strong>reorganized</strong> the <strong>teams</strong> to reflect this, and merged all <strong>backlogs</strong> into <strong>one</strong>.</p>
<p>Another example is how Personalization (6 teams) increased the understanding of the goals and unlocked possibilities. As
is typical in a large organization, goals and concrete products were difficult to define for this department, and the
understanding usually did not carry over to the engineering and data science teams. To tackle this, everyone (including
engineers) took responsibility for creating or refining the press releases that underlie the epics for the upcoming
quarter. Ideas to achieve goals are as likely to come from Product as they are to come from delivery teams. The
concrete outcome is an aligned and <strong>commonly understood overview of the next quarter’s sprints</strong>. This led to much
higher involvement and identification during the quarter, and to more motivated teams.</p>
<h3>A LeSS introduction backwards</h3>
<p>These are only two examples from many more instances of how we scale agile at Zalando. The whole approach is, in a way, a
LeSS introduction in reverse. We take note of which trials work, and we find a high similarity to the LeSS framework
without ever using the word or the whole framework. The practices emerged by themselves, as they made sense to the people
inside the organization. As one engineering lead put it <em>after</em> reading a
<a href="https://less.works/resources/learning-resources/books.html">LeSS</a> book, “It’s nice to see that we were not the only
ones with these ideas.”</p>
<p>Our key learning, directed at all fellow Agile Coaches and Agile Change Agents, is to not implement frameworks, but to
source from working solutions and share the successes.</p>
<p>Eventually we will end up in a form of LeSS organization without anybody inside Zalando connecting emotionally to the
framework itself.</p>
<p><em>If you would like to learn more, feel free to reach out to <a href="mailto:agility-coaching@zalando.de">agility-coaching@zalando.de</a> or have a look at our <a href="https://jobs.zalando.com/jobs/1114352-senior-agile-coach/">open
position</a>.</em></p>
<p><em>Many thanks for the input and support of our colleagues Samir Keck, Tobias Leonhard and Frank Ewert.</em></p>Dublin’s Data Science Guild2018-05-15T00:00:00+02:002018-05-15T00:00:00+02:00Humberto Coronatag:engineering.zalando.com,2018-05-15:/posts/2018/05/dublin-data-science-guild.html<p>How to establish and evolve your data science community</p><h3><strong>How to establish and evolve your data science community</strong></h3>
<p>In Zalando, we have many guilds: self-organized groups of people who share interests. The topics, scope, size, and ways
to organize the guilds vary. We have technical guilds like the
<a href="https://engineering.zalando.com/posts/2016/04/zalando-web-guild.html">web</a> or <a href="https://engineering.zalando.com/posts/2015/09/on-apis-and-the-zalando-api-guild.html">API
guilds</a>, local and artistic guilds
like the knitting guild in Helsinki, and some guilds that support the growth of people in certain job families, like the
Data Science Guild.</p>
<p>For more than two years, I have been co-organizing the Data Science Guild in our tech hub in
<a href="https://engineering.zalando.com/posts/2018/05/dublin-tech-hub-three.html">Dublin</a>, creating a place to share data science knowledge
and best practices, and creating a framework that allows the guild to evolve and grow autonomously.</p>
<p>I like to think of guilds as teams or products. This philosophy helps you build the kind of framework you need to run
or be part of a guild. It gives the guild a reason to exist, and it can potentially tell you when it is time to pivot or
move on to the next thing. When we started the data science guild, it was not perfect (and it isn’t perfect now), but as
we were “releasing” a new product, it was important to see if it was viable: Do people find our talks valuable?
Can we guarantee content for 70% of the 52 weeks in a year?</p>
<p>Initially, we had really good feedback. Everyone was interested in giving talks, attendance was high, and as expected, a
lot of people had ideas on how to make the guild better! A few months after the initial positivity, we started to see
some challenges that needed to be addressed if the guild was going to survive. We needed a structure to maintain a
constant flow of content (talks, discussions, etc.,) and we needed to scale the organization by creating a sense of
collective ownership.</p>
<p>Once we saw there was both a need and value from having the data science guild after the initial ramp-up,
we established our mission, “Sharing Data Science (DS) knowledge
and experience to expand our DS expertise.” We devised and measured Objectives and Key Results (
<a href="https://engineering.zalando.com/posts/2016/11/delivering-a-cross-site-project.html">OKRs</a>), and implemented the collective ownership
model.</p>
<p>We focused on three types of content. First, internal talks led by any of our data scientists, in which they present a topic in
depth: it could be a new library they are using, how to bring a model into production, or the results of the
latest A/B test the team ran. This format is also very useful for dry runs of conference talks. Secondly, we have a
“learning club,” a smaller and less formal setting in which we discuss a recent scientific paper, or watch and discuss
a video lecture. Finally, we also invite researchers from universities to present their work. For now, we focus on
PhD students, who benefit from getting feedback from our data scientists in different teams and from
seeing new opportunities for the application of their work in different contexts.</p>
<p>When we implemented the collective ownership model, we iterated a few times. The idea was to give our community the
opportunity to shape how the guild works, and to avoid having bottlenecks or too few people shouldering too much of the
work. At first, we had one person who had ownership for each of the topics; one in charge of the speaker lineup every
week, one sending the invites, and one taking care of the budget. Worth saying: that didn’t work. It required a lot of
alignment between individuals, adding unnecessary overhead, which no one enjoyed.</p>
<p>We settled on a much smaller model, where we have fewer contributors who are part of small committees for half year
periods (aligned with how we set and evaluate our OKRs). The structure looks like this:</p>
<ul>
<li>Our <strong>content team</strong> designs and maintains a content portfolio that reflects our OKRs. They plan the topics, invite
speakers, book rooms and send agenda updates.</li>
<li>Our <strong>audiovisual</strong> (<strong>AV) committee</strong> is a group of volunteers who know how to operate our not-so-easy-to-use AV
system for streaming and recording presentations. Lately, we also have support from IT for this topic, which eases
some of the burden.</li>
<li>Our <strong>social</strong> <strong>committee</strong> is in charge of coordinating communication with the Data Science guild in Berlin
and running our social events (this involves selecting and buying a mountain of cakes and sweets).</li>
</ul>
<p>When running the Data Science Guild, the most important aspect to consider is communication. Because we depend on our
colleagues to give talks, lead discussions or invite speakers for our external event series, I found that people are
much more willing to say “yes” to participating when the request comes in person. Face to face, our guild member can
spend time explaining exactly what is required, answer any questions the potential speaker has, and set a date in the
calendar for the talk. Of course, we then need to inform attendees with enough lead time so they can plan to attend a
talk; nothing is worse than spending time preparing a talk to which no one shows up! Finally, we also communicate the
evaluation of our objectives and key results to our stakeholders and a wider internal audience in our monthly meeting;
that helps shape the image and reputation of the guild inside our office.</p>
<p>Being part of the Data Science guild has been a wonderful experience. There have been plenty of internal and external
successes, and more importantly, we created a place where we Data Scientists come together beyond our day to day teams.
In the last two years, almost all of us have presented at least once (some much more), we have co-organized and
presented our work in two company-wide data science conferences, invited half a dozen PhD students to give talks in
Zalando, and we have built a reputation for openly sharing knowledge and best practices in the Data Science community in
Dublin, resulting in our members being shortlisted for two <a href="https://engineering.zalando.com/posts/2017/11/datsci-awards-2017.html">DatSci
Awards</a> in 2017. I believe Zalando is a “<a href="https://annual-report.zalando.com/2017/magazine/heres-to-being-a-good-neighbor/">good
neighbor</a>”: a company where everyone
can make a positive impact in their community, whether that is with their team, a guild, or the whole company.</p>
<p><em>If you want to help the guild model grow, we have <a href="https://jobs.zalando.com/jobs/965101-open-source-community-manager-senior/?gh_src=4n3gxh1">community managers
positions</a>, and if you see
yourself being a regular contributor to our talks, you might consider applying for our <a href="https://zln.do/2GjNC74">data science or research
engineering</a> positions.</em></p>How to Make Product Management for Enterprise Systems Work2018-05-09T00:00:00+02:002018-05-09T00:00:00+02:00Dr Andreas Reichharttag:engineering.zalando.com,2018-05-09:/posts/2018/05/product-management-enterprise-systems.html<p>Moving from a more traditional internal IT setup to a product-driven culture</p><h3><strong>Moving from a more traditional internal IT setup to a product-driven culture</strong></h3>
<p>I love building enterprise systems, because you get to work with your customers/users every day and literally see their
lives change as you release new features. In my case, at Zalando, these are systems for fashion buying, supply chain
management, inventory management and procure-to-pay processes (e.g. paying our suppliers for merchandise we bought from
them). But building good enterprise systems is prone to failure, as author and product consultant Sam McAfee has
recently <a href="https://medium.com/startup-patterns/why-enterprise-agile-teams-fail-4ae64f7852d6">pointed out</a>.</p>
<p>While most product management articles and blogs talk about product management for consumer-facing or enterprise SaaS
products, I’ve found that many of the methods and insights discussed in these can also be applied to developing
enterprise software to be used by your company’s internal users.</p>
<p>This article is an attempt to share some of the lessons we’ve learned at Zalando. We’re Europe’s leading online fashion
platform and in the past few years we’ve moved from a more traditional “internal IT setup” to a product-driven culture
that develops industry-leading bespoke enterprise solutions.</p>
<h3>Allow your enterprise systems teams to own a problem</h3>
<p>As with any product decision, the first step to finding the best solution is to own and understand the problem. In a
traditional internal IT setup, someone from the “business side” has an idea of what to improve. Frequently, this idea
includes a detailed description of the solution they would like “IT to implement.”</p>
<p>This is where the problems usually start. Many of these ideas create only limited value, and together they do not form a
coherent product vision. And this is not a surprise; these colleagues know their business but they are not trained
product managers. Unfortunately, the “product manager” (often called “business analyst” in such a setup) does not have
the authority to push back on such requirements, but rather focuses on translating the business requirements into a
detailed design that software engineers can then implement.</p>
<p>At Zalando, we changed this. We created <a href="https://engineering.zalando.com/posts/2018/02/innovation-digital-experience.html">multi-disciplinary
teams</a> that own specific business KPIs, together with
their internal users. For example, our Competence Centre Inventory Management co-owns merchandise availability (an
important KPI for every retailer) with the merchandise planners. The team consists of software engineers, product
managers, product designers, controllers and analysts, who work very closely with our 250 merchandise planners (their
internal users) and our 1,500-plus suppliers to identify the biggest levers to improve merchandise availability <strong>(1)</strong>
(and related KPIs). Some of these levers may require system improvements, some may not. They are free to decide what to
work on next. They are also free to decide what systems to build in-house and where to leverage external solutions. They
just need to ensure that the KPIs they co-own improve; and their progress is reviewed each quarter.</p>
<h3>Bridge the Gap Between “Tech” and “Business”</h3>
<p>The above setup made it even more important for us to break down functional silos, more specifically between commercial
teams (i.e. “the business”) and tech teams (engineers, product managers). This was a big challenge for us. When we
started, our commercial teams had never heard of agile or MVPs – and often just saw “MVP” as an excuse to cut down on
the features they wanted, or “agile” as an excuse for not being able to commit to release dates. Similarly, our tech
teams did not understand the main business decisions and the need for planning ahead. For example, when your business
grows by 20-25% a year, knowing whether a key problem, one that creates a lot of manual effort, will be solved in six
months or not, has significant implications on how many people you need to start hiring today. Similarly, when you plan
a new business line launch (such as Beauty as a new category on Zalando), and you need to prepare a big marketing
campaign and buy the relevant merchandise, all to go live at the start of a new season in nine months from now, you need
some form of a commitment that the required system functionality will be available by then. We found the tech team would
reject deadlines, as in their view, they contradicted an agile approach. It literally required us to bring two different
worlds together.</p>
<p>How did we go about this? From an organisational standpoint, we fully merged the “business” with the “IT” teams (we used
the terminology “tech is everywhere”). We now no longer have one big monolithic IT organization, but have formed smaller
units, where business and tech teams report into the same senior leaders. At the same time, we invested a lot into
strengthening the overall tech community through the creation of topic-specific guilds or interest groups <strong>(2)</strong>, and
other means, such as cross-team tech talks. We also invested in educating our commercial leaders in basic tech concepts,
such as agile development, MVPs and the importance of not building up technical debt.</p>
<p>At a more operational level, we aim for all colleagues in our Enterprise Systems Teams (engineers, product designers,
analysts, product managers) to take internships with their users regularly. This greatly helps to build a network and
increases understanding of the business problems. We’ve built very active key user communities around all our systems,
and use these communities to help us regularly align our feature backlog and priorities, usually monthly. We also ensure
that users are always present in sprint reviews. In addition, we regularly organise informal exchanges across
disciplines through random lunches and other events.</p>
<h3>Recruit and Train the Right Product Managers</h3>
<p>You may think this is all obvious, and you are right. However, recruiting and training the right kind of product manager
for this kind of internal role is not straightforward, and I would argue even more challenging than recruiting and
training product managers for other product management roles.</p>
<p>First, they need to be interested in redesigning and improving internal business processes and KPIs, a skill set and
interest not every product manager brings. Second, they need to be great at managing change. When you build
consumer-facing applications, the consumer either gets it or they will leave. With internal systems (and a captive
audience), you can achieve more impact when you combine a system change with a process change, i.e. a change of “how we
do things” (but this necessarily requires seeking and achieving buy-in from many stakeholders). This means that product
managers for in-house enterprise systems will spend a lot of time with users, taking them along on the journey, running
system training, and getting their users’ buy-in for the new solution.</p>
<p>Finally, our product managers need to make make-or-buy decisions and switch between developing a system in-house as part
of an agile team and implementing an external solution alongside a systems integrator. Traditionally, these two are
different roles with different skill sets. At Zalando, we want our product managers to own the problem and solve it in
the best way possible, and they can only do this if they know how to use both internal and external solutions to solve
that problem.</p>
<p>Of course, each of our product managers also needs to have full command of the main product management methodologies to
discover, define, design, and deliver the right solutions for our users and problems. To do this, we regularly work with
tools, such as press releases, pre-mortems, user story maps, design sprints (which we modified slightly), MVPs, and
different ways to prioritize problems and solutions.</p>
<p>So where do we find these unique product managers? Well, there (still) seems to be only a small number of product
managers out there who bring the full skill set we are looking for. Therefore, we either recruit people from a business
team and teach them product management, or recruit (consumer-facing) product managers and enable them to learn the
additional skills required. Luckily, by now we have quite a large product management community where colleagues can
exchange best practices and help each other. If you are starting from scratch, I suggest you do what we did a few years
ago: get a handful of smart people with different profiles and have them learn from each other, supported by a more
formal product management curriculum.</p>
<p>Of course, I could go into much more detail, but I’ll leave it here for now. I would love to hear any comments from the
in-house systems product community out there, so <a href="https://twitter.com/ZalandoTech">please share your thoughts with
us</a>.</p>
<p><strong>(1)</strong> There are various ways to measure this KPI but it roughly measures the percentage of customers that find the
product they are interested in being in stock at the retailer.</p>
<p><strong>(2)</strong> In guilds, colleagues who share the same technical interests can come together and exchange knowledge, e.g. we
have an API guild, a Scala guild, etc.</p>
<p><em>This piece was first published on
<a href="https://www.mindtheproduct.com/2018/04/make-product-management-enterprise-systems-work/">MTP</a>. Work in an amazing team
with people like Andreas. Check out our <a href="https://jobs.zalando.com/jobs/1148471/?utm_source=techblog&utm_medium=blog-p-o&utm_campaign=2018-pdm&utm_content=05-andreas-reichart">jobs
page</a>.</em></p>Many-to-Many Relationships Using Kafka2018-05-08T00:00:00+02:002018-05-08T00:00:00+02:00Michal Michalskitag:engineering.zalando.com,2018-05-08:/posts/2018/05/many-to-many-using-kafka.html<p>Real-time joins in event-driven microservices</p><h3><strong>Real-time joins in event-driven microservices</strong></h3>
<p>As discussed in my <a href="https://engineering.zalando.com/posts/2018/01/simplicity-by-distributing-complexity.html">previous blog post</a>,
Kafka is one of the key components of our event-driven microservice architecture in <a href="https://engineering.zalando.com/posts/2017/10/zalando-smart-product-platform.html">Zalando’s Smart Product
Platform</a>. We use it for sequencing
events and building an aggregated view of data hierarchies. This post expands on what I previously wrote about the
one-to-many data model and introduces more complex many-to-many relationships.</p>
<p>To recap: to ensure the ordering of all the related entities in our hierarchical data model (e.g. Media for Product and
the Product itself) we always use the same partition key for all of them, so they end up sequenced in a single
partition. This works well for a one-to-many relationship: Since there’s always a single “parent” for all the entities,
we can always “go up” the hierarchy and eventually reach the topmost entity (“root” Product), whose ID we use to derive
the correct partition key. For many-to-many relationships, however, it’s not so straightforward.</p>
<p>Let’s consider a simpler data model that only defines two entities: Products (e.g. Shoes, t-shirt) and Attributes (e.g.
color, sole type, neck type, washing instructions, etc., with some extra information like translations). Products are
the “core” entities we want to publish to external, downstream consumers and Attributes are meta-data used to describe
them. Products can have multiple Attributes assigned to them by ID, and a single Attribute may be shared by many
Products. There’s no link back to a Product in an Attribute.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0d9717fd95a7798e3b15f7cb2d7d4cc049cd3342_screen-shot-2018-05-08-at-11.53.08.png?auto=compress,format"></p>
<p>Given the event stream containing Product and Attribute events, the goal is to create an “aggregation” application that
consumes both event types: it “resolves” the Attribute IDs in Product entities into the full Attribute information required by
the clients and sends these aggregated entities further down the stream. This assumes that Attributes are only available
in the event stream, and calling the Attribute service API to expand IDs to full entities is not feasible for some
reason (access control, performance, scalability, etc.).</p>
<p>Because Attributes are “meta data”, they don’t form a hierarchy with the Product entity; they don’t “belong” to them,
they’re merely “associated” with them. It means that it’s impossible to define their “parent” or “root” entity and,
therefore, there’s also no single partition key they could use to be “co-located” with the corresponding Products in a
single partition. They must be in many (potentially: all) of them.</p>
<p>This is where the Kafka API comes in handy! While Kafka is probably best known for its key-based partitioning capabilities
(see the <em>ProducerRecord(String topic, K key, V value)</em> constructor in Kafka’s Java API), it’s also possible to publish messages
directly to a specific partition using the alternative, probably less known <em>ProducerRecord(String topic, Integer
partition, K key, V value)</em> constructor. This, on its own, allows us to broadcast an Attribute event to all the partitions in a
given topic, but if we don’t want to hardcode the number of partitions in a topic, we need one more thing: the producer’s
ability to provide the list of partitions for a given topic via the <em>partitionsFor</em> method.</p>
<p>The complete Scala code snippet for broadcasting events could now look like this:</p>
<div class="highlight"><pre><code>import scala.collection.JavaConverters._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

import org.apache.kafka.clients.producer.ProducerRecord

Future.traverse(producer.partitionsFor(topic).asScala) { pInfo =>
  val record = new ProducerRecord[String, String](topic, pInfo.partition, partitionKey, event)
  // send the record
}
</code></pre></div>
<p>I intentionally didn’t include the code to send the record, because Kafka’s Java client returns a Java Future, so
converting this response to a Scala Future would require some extra code (i.e. using <em>Promise</em>), which could clutter this
example. If you’re curious about how this could be done without the awful, blocking <em>Future { javaFuture.get }</em> or similar
(please, don’t do this!), you can have a look at the <a href="https://github.com/cakesolutions/scala-kafka-client/blob/master/client/src/main/scala/cakesolutions/kafka/KafkaProducer.scala#L246">code
here</a>.</p>
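<p>On the Java side a similar non-blocking adaptation is possible: Kafka’s <em>producer.send</em> also accepts a callback, which can be used to complete a <em>CompletableFuture</em> instead of blocking on the returned Java Future. Below is a minimal, Kafka-free sketch of that pattern; the <em>AsyncSender</em> interface is a hypothetical stand-in for the producer’s send-with-callback method:</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Sketch: adapt a callback-style send (like Kafka's
// producer.send(record, callback)) into a CompletableFuture,
// instead of blocking on the returned java.util.concurrent.Future.
class CallbackAdapter {
    // Hypothetical stand-in for the producer: invokes the callback
    // with either a result or an error when the send completes.
    interface AsyncSender<T> {
        void send(T record, BiConsumer<T, Exception> callback);
    }

    static <T> CompletableFuture<T> sendAsync(AsyncSender<T> sender, T record) {
        CompletableFuture<T> promise = new CompletableFuture<>();
        sender.send(record, (result, error) -> {
            if (error != null) promise.completeExceptionally(error);
            else promise.complete(result);
        });
        return promise;
    }
}
```

<p>The resulting future composes cleanly (e.g. via <em>thenApply</em>, or from Scala via its own converters) without ever blocking a thread.</p>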
<p>This way we made the same Attribute available in all the partitions, for all the “aggregating” Kafka consumers in our
application. Of course, this carries consequences, and there’s a bit more work required to complete our goal.</p>
<p>Because the relationship information is stored in Product only, we need to persist all the received Attributes
somewhere, so when a new Product arrives, we can immediately expand the Attributes it uses (let’s call it “Attribute
Local View”, to emphasise it’s a local copy of Attribute data, not a source of truth). Here is the tricky part: Because
we’re now using multiple, parallel streams of Attribute data (partitions), we need an Attribute Local View <strong>per
partition</strong>! The problem we’re trying to avoid here, which would occur in case of a single Attribute Local View, is
overwriting the newer Attribute data coming from “fast” partition X, by older data coming from a “slow” partition Y. By
storing Attributes per partition, each Kafka partition’s consumer will have access to its own, correct version of
Attribute at any given time.</p>
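<p>A minimal sketch of such a per-partition local view (class and method names here are hypothetical, not taken from our actual implementation) boils down to keying the stored copy by the pair of partition ID and Attribute ID, so that consumers of different partitions can never overwrite each other’s data:</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: an "Attribute Local View" kept per Kafka partition.
// Keying by (partition, attributeId) ensures that older data from a
// "slow" partition never overwrites newer data from a "fast" one.
class AttributeLocalView {
    // (partitionId, attributeId) -> latest attribute payload seen
    private final Map<String, String> store = new HashMap<>();

    private static String key(int partition, String attributeId) {
        return partition + ":" + attributeId;
    }

    void upsert(int partition, String attributeId, String payload) {
        store.put(key(partition, attributeId), payload);
    }

    Optional<String> lookup(int partition, String attributeId) {
        return Optional.ofNullable(store.get(key(partition, attributeId)));
    }
}
```

<p>As the next paragraph notes, in a real database the same effect is achieved by adding the Kafka partition ID to the primary key; each partition’s consumer then only ever reads and writes rows carrying its own partition ID.</p>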
<p>While storing Attributes per partition might be as simple as adding Kafka partition ID to the primary key in the table,
it may cause two potential problems. First of all, storing multiple copies of the same data means – obviously – that the
storage space requirements for the system are significantly raised. While this might not be a problem (in our case
Attributes are really tiny compared to the “core” entities), this is definitely something that has to be taken into
account during capacity planning. In general, this technique is primarily useful for problems where the broadcast
data set is small.</p>
<p>Secondly, by associating specific versions of Attributes with partition IDs, the already difficult task of
increasing the number of partitions becomes even more challenging, as Kafka’s internal topic structure has now “leaked” into
the database. However, we think that growing the number of partitions is already a big pain (breaking the ordering
guarantees at the point where partitions were added!) that requires careful preparations and additional work (e.g.
migrating to the new topic with more partitions, rather than adding partitions “in place” to the existing one), so it’s
a tradeoff we accepted. Also, to reduce the risk of extra work we try to carefully estimate the number of partitions
required for our topics and tend to overprovision a bit.</p>
<p>If what I just described sounds familiar to you, you might have been using this technique without even knowing what it
is; it’s called <strong>broadcast join</strong>. It belongs to a wider category of so-called map-side joins, and you can find
different implementations of it in libraries like
<a href="https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins-broadcast.html">Spark</a> or <a href="https://docs.confluent.io/current/streams/concepts.html#globalktable">Kafka
Streams</a>. However, what makes this implementation
significantly different is the fact that it reacts to the data changes in real-time. Events are broadcast as they
arrive, and local views are updated accordingly. The updates to aggregations on product changes are instant as well.</p>
<p>Also, while this post assumes that only Product update may trigger entity aggregation, the real implementation we’re
using is doing it on Attribute updates as well. While, in principle, it’s not a difficult thing to do (a mapping of
Attribute-to-Product has to be maintained, as well as the local view of the last seen version of a Product), it requires
significantly more storage space and carries some very interesting performance implications, as a single Attribute update
may trigger an update for millions of Products. For that reason I decided to keep this topic out of the scope of this
post.</p>
<p>As you just saw, you can handle many-to-many relationships in an event-driven architecture in a clean way using Kafka.
You’ll benefit from not risking having outdated information and not resorting to direct service calls, which might be
undesirable or even impossible in many cases. As usual, it comes at a price, but if you weigh pros and cons carefully
upfront, you might be able to make a well-educated decision to your benefit.</p>
<p><em>Like Michal's work and want to be part of the action at our Fashion Insights Center in Dublin? Keep up to date with our
<a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1&location=Dublin">Dublin jobs</a>.</em></p>Investing in the Future of Engineering and Design2018-05-03T00:00:00+02:002018-05-03T00:00:00+02:00Michael Achtzehntag:engineering.zalando.com,2018-05-03:/posts/2018/05/code-university.html<p>Our cooperation with CODE University</p><h3><strong>Our cooperation with CODE University</strong></h3>
<p>At Zalando, we strive to create an environment in which all our engineers, product, and design specialists feel they can
inspire each other, make their ideas a reality, and contribute to providing the best possible platform for Zalando’s
customers to have the ultimate customer experience.</p>
<p>Part of this is making sure we understand what the future generation of product managers, interaction designers, and
software engineers is thinking, and what ideas and innovations they can bring to the table.</p>
<p>Since fall 2017, Zalando has been an official partner of <a href="https://corporate.zalando.com/en/newsroom/en/stories/why-zalando-cooperating-code">CODE
University</a>, a private,
state-recognized university of applied sciences based in Berlin, which offers courses in various software fields in a
hands-on environment, which is embedded in Berlin’s network of digital and tech enterprises. Zalando decided to get on
board with this exciting project in order to offer students insights into real-world applications of the techniques and
principles they learn about in their studies.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/61eb6e69f0cfb937907da0cd5b4acc3821becfa3_code-students.jpg?auto=compress,format"></p>
<p><em>Credit: Manuel Dolderer (CODE)</em></p>
<h3><strong>Bringing Zalando challenges into the classroom</strong></h3>
<p>The major part of Zalando’s cooperation with CODE is covered by the semester projects. Each semester, Zalando outlines
projects for which it seeks input from software engineers, product managers, and interaction designers. These projects
challenge the students to find successful solutions to realistic problems, and to facilitate their learning in a
dynamic, realistic working environment.</p>
<p>For the winter semester of 2017/2018, Zalando’s two projects included one which sought to make same-day delivery by bike
courier more efficient by integrating a chatbot to enable real-time delivery updates, while the other looked into the
potential that an autonomous, indoor drone could have on warehouse processes. Students who select the projects offered
by Zalando work together in groups to address issues with the most efficient solutions, while receiving information and
guidance from the Zalando employees who mentor these projects. At the end of the semester, the students’ projects are
presented to an audience which includes university faculty, other students, and business partners such as Zalando, and
forms part of their final assessment. Two groups which worked on Zalando projects were awarded Best Pitch and Best
Design for the winter semester 2017/2018.</p>
<p>According to the students, working on projects with real applications teaches them what it means to be innovative.
Student Dominic von Zielinski says, “I don’t want people to have to think about my innovations while using them.
Because that’s what innovative means to me.”</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/63bc5b0884243a7ac8b15c0de806e6d86d916fd8_code-university.jpg?auto=compress,format"></p>
<h3><strong>Getting involved in Zalando’s Hack Week</strong></h3>
<p>Along with the longer-term projects which are an essential part of their studies at CODE University, Zalando also
invited CODE University students along to its new-format <a href="https://engineering.zalando.com/posts/2018/03/cross-department-hackathons.html">Hack Week in March of this
year</a>. The new format involves cross-collaboration
between company departments, such as Digital Foundation and People & Organization, to bring engineers and product
managers together with Zalando employees from different disciplines and to see how they can find digital solutions to
challenges within the company.</p>
<p>“What got me on board was the desire to get a better understanding of what’s possible in one week, and how things can
come together so quickly,” says Dominic von Zielinski. “In the end, what kept me on board was seeing diverse people get
together and do crazy and innovative stuff, all while being a part of the team.”</p>
<p>The awards ceremony recognized five teams in different categories, acknowledging their hard work and providing
them with valuable feedback from Zalando’s senior management. Additionally, two teams (including one with CODE
University students) were given the opportunity to take part in the <a href="https://corporate.zalando.com/en/innovation/grassroots-tech-innovation">Zalando Innovation Lab’s Slingshot
Program</a>, in which winning team members are
given the resources they need to dedicate two sprints towards bringing their project to the next level.</p>
<p>“During Zalando’s Hack Week, the main challenge I experienced was to transform complexity into simplicity,” says Edmund
Maruhn, another CODE student who took part in Zalando’s Hack Week. “Once the week was up, I started to challenge myself
and went the extra mile to improve what we had done, for two reasons: I fell in love with the problem we worked on from
the first day, and I was motivated by the fact that we won an award!”</p>
<h3><strong>Investing in the future</strong></h3>
<p>By partnering with CODE University, Zalando not only wants to provide students with hands-on experience in finding
solutions to digital problems; it also seeks to become more involved in Berlin’s wider tech ecosystem, connecting with
other tech companies on this level to inspire students to be innovative, to always question the status quo, and to
continue to nurture the pool of tech talent which has made Berlin the tech hub it is today.</p>
<p><em>Be part of our innovative team! Check out our <a href="https://jobs.zalando.com/tech/jobs/">jobs page</a>.</em></p>Our Dublin Tech Hub Turns Three2018-05-02T00:00:00+02:002018-05-02T00:00:00+02:00Michael Achtzehntag:engineering.zalando.com,2018-05-02:/posts/2018/05/dublin-tech-hub-three.html<p>Celebrating Zalando’s first international tech hub</p><h3><strong>Celebrating Zalando’s first international tech hub</strong></h3>
<p>Three years ago, Zalando decided to start looking beyond Germany’s borders to tap into Europe’s pool of tech talent.
Diverse and brilliant minds from other European cities and beyond contributed to cementing Zalando’s place as Europe’s
most fashionable tech company. So, back in 2015, Zalando’s first move was across the Irish Sea, and now the team is very
excited to celebrate its third anniversary!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4d8222c1350f3db74123155747dbcd746e2fe4a3_zalando-cakes.png?auto=compress,format"></p>
<p><em>Every birthday deserves some cake.</em></p>
<p>The Dublin Fashion Insights Center was opened to tap into that market’s unique pool of <a href="https://engineering.zalando.com/posts/2018/02/innovation-digital-experience.html">talented data
scientists</a>. The team has grown significantly over
the years, now numbering almost 110 dedicated members. The last year alone saw the team write 8.7 million lines of code,
attend 17 conferences and hire 49 new colleagues from a staggering 2,500 applications.</p>
<p>One of our newest additions is Sean Mullaney, who joins us as Dublin’s first VP of Information. Sean has founded a
number of startups, and formerly worked at Google. He brings a huge amount of drive and experience to the Dublin office,
and is keen to help shape Zalando’s data strategy.</p>
<p>“My passion has always been on applied innovation, particularly how to combine Big Data, machine learning, and UX to
create high impact products and services,” says Sean.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/77ea702a545d86d9e51dd096ff7dd6145bc26b83_zalando-dublin-team-smaller.jpg?auto=compress,format"></p>
<p><em>Our Dublin team celebrates in style.</em></p>
<p>As it grows, the Dublin office will retain its focus on <a href="https://engineering.zalando.com/posts/2018/04/democratization-data-science.html">data
science</a> and provide Zalando with the <a href="https://engineering.zalando.com/posts/2017/10/zalando-smart-product-platform.html">tools it
needs</a> to drive strategic growth through artificial
intelligence and leveraging its large datasets, bringing together product managers, designers, data scientists and
software engineers from all backgrounds in Dublin’s Docklands.</p>
<p><em>If you live in Dublin or want to make the move to the land of saints and scholars, check out our Dublin tech hub jobs
<a href="https://zln.do/2rcnu9K">here</a>.</em></p>Short Story of a Long Migration2018-04-26T00:00:00+02:002018-04-26T00:00:00+02:00Oleksandr Volynetstag:engineering.zalando.com,2018-04-26:/posts/2018/04/migrating-java-8.html<p>How we migrated the Zalando Logistics Operating Services to Java 8</p><h3><strong>How we migrated the Zalando Logistics Operating Services to Java 8</strong></h3>
<p>“Never touch working code!” goes the old saying. How often do you disregard this message and touch a big monolithic
system? This article tells you why you should ignore common wisdom and, in fact, do it even more often.</p>
<h2>Preface</h2>
<p>Various kinds of migration are a natural part of software development. Do you remember a case when the current
database didn’t scale enough? Or when a new tech stack was needed because the existing one no longer met changing
requirements? Or a painful migration from a monolithic application to a microservice architecture? There are also
smaller-scale migrations, like upgrading to a newer version of a dependency, e.g. Spring, or of the Java Runtime
Environment (JRE). This is the story of how the relatively simple task of migrating from Java 7 to Java 8 was performed on
a large-scale monolithic application of ultimate criticality to the business.</p>
<h2>Zalos as <em>the</em> service for Logistics Operations</h2>
<p>Zalos (Zalando Logistics System) is a set of Java services, backend and frontend, that contains submodules to operate
most functions inside the warehouses operated by Zalando. The scale of Zalos can be summarized by the following
statistics:</p>
<ul>
<li>more than 80,000 git commits,</li>
<li>more than 70 active developers in 2017,</li>
<li>almost 500 maven submodules,</li>
<li>around 13,000 Java classes with 1.3m lines of code, plus numerous production and test resource files,</li>
<li>operates with around 600 PostgreSQL tables and more than 3,000 stored procedures.</li>
</ul>
<p>Zalos 2, denoted as just Zalos below, is the second generation of the system, and has grown to this size over the past
five years. Patterns that were, at the time, easy to adopt for scaling up architectural functionality have quickly
become a bottleneck with the growing number of teams maintaining it. It is deployed to all Zalando warehouses every
second week, and every week there is a special procedure to create a new release branch. Each deployment takes about
five hours; branching takes about the same time. When also considering urgent patches, it takes a significant portion of
each team’s time to do regular deployment or maintenance operations.</p>
<p>Now, what happens if the system is left unmaintained for a while? The package dependencies and Java libraries become
obsolete and, as a consequence, security instability grows. Then, one day one of the core infrastructure systems has to
change the SSL certificate, and this causes some downtime in all relevant legacy systems operating a deprecated Java
version. For the logistics services these problems might become a big disaster, and you start thinking: “What does it
take to migrate Zalos from Java 7 to Java 8?”</p>
<h3>Migration? Easy!</h3>
<p>With some basic experience of Java 9, the option to go even further was rejected pretty fast: a combination of
Java 9 modularity and 500 sub-modules doesn’t look very promising. Well, bad luck. What else do you need to keep in mind
for Java 8 support? Spring? Sure. GWT? Maybe. Guava? Oh yes. Generics? This too.</p>
<p>This is a good time to talk about the tech stack for Zalos. It contains backend as well as frontend parts, both running
Spring 3. The backend uses PostgreSQL databases via the awesome
<a href="https://github.com/zalando-stups/java-sproc-wrapper">sprocwrapper</a> library. Both backend and frontend rely on
Zalando-internal parent packages to take care of dependency management. The frontend engine is GWT 2.4 with some
SmartGWT widgets. And, to mention a few more challenges, it uses Maven overlays with JavaScript, but more on this
later.</p>
<p>Our first strategy was to bump as <strong>many</strong> package dependencies as we could: Spring 4, which fully supports Java 8; GWT
2.8.2, which already has support for Java 9; Guava 23.0; etc. We were on GWT 2.4, a jump of over five years development-wise.
A hard dependency on our internal Zalando parent packages ruled out the major Spring upgrade too. Guava 23 has deprecated
some methods, and we would have needed to change quite an amount of code: again, a failure.</p>
<p>Let’s try another strategy then: bump as <strong>little</strong> as we can. This strategy worked much better. We only needed to
have Spring 3.2.13 and Guava 20.0, plus required upgrades like <em>javassist</em> and <em>org.reflections</em>. The matrix of
compatible versions is shown in the appendix. GWT dependency was left untouched, although it limits our client code to
Java 7. A compromise but not a blocker: there is little active development of new GWT code anyway.</p>
<p>Now, overlays (or in our case <a href="https://en.wikipedia.org/wiki/Dependency_hell">Dependency Hell</a>) are a <a href="https://maven.apache.org/plugins/maven-war-plugin/overlays.html">feature of
Maven</a> that includes dependencies from a WAR or a ZIP file,
“inlining” the complete package as is, together with all of its dependencies. As an example, this means that should
an overlay have a different version of <em>spring-core</em>, you get two versions of <em>spring-core</em> in the final WAR artifact.
When the application starts, it gets confused about which version to use for which parts of the application, and various
<em>ClassNotFound</em> exceptions pop up. Bad luck: republishing all war-overlays with updated dependencies is required.</p>
<h3>Go-live or don’t rush?</h3>
<p>It took just two weeks of highly-motivated and self-driven work for two people to crack the problem and run the
500-module monolith on the laptop with Java 8. It took two more weeks to deploy it to the staging environment after
fixing multiple issues. After that, it took two more <em>months</em> to finally deploy it to the production environment. Why so
long? Because we deal with a system of the utmost criticality that has several serious constraints, and here they are:</p>
<ol>
<li><strong>Deployments.</strong> Deployment to production lasts up to five hours and it should not interfere with any other
deployment, due to internal limitations of the deployment system. With absolute priority for production deployment
there isn’t much time for experimenting with the migration. Solution? Tweaking the deployment service helped reduce
deployment time by about one third to have some freedom for experimenting on a staging environment.</li>
<li><strong>Development</strong>. There are still about 25 commits per day in the main branch. Breaking it would have a significant
impact on feature development, and it isn’t easy to experiment with JDK versions from the feature branch. This isn’t
good, but still there is a more serious constraint.</li>
<li><strong>Warehouse operations.</strong> They are the backbone of an e-commerce company and should not be interrupted by the
migration. The risk of any bug should be carefully minimized to maintain service liveness.</li>
</ol>
<p>To address at least two of these constraints, we created a concrete three-step plan for executing the migration in a safe
manner while being able to roll back at any time:</p>
<ol>
<li><strong>Upgrade all packages compatible with both Java 7 and 8 without changing the runtime version.</strong> This ensured that
there were no changes to the deployment.</li>
<li><strong>Switch to the Java 8 runtime (JRE), keeping source code in Java 7 mode.</strong> This step ensured that we could safely change
the deployment settings without touching the code and dependencies.</li>
<li><strong>Switch to Java 8 development mode to fully support Java 8 features.</strong> No major deployment changes were done with
this step.</li>
</ol>
<p>In addition to the staging environment, every step was carefully tested on a so-called beta environment, which
operates on production data.</p>
<h3>Outlook</h3>
<p>The migration was completed despite some failed attempts a few years ago. Several things have happened. The service has
become a little more stable and secure. The code can now be written with lambdas, method references, etc. The deployment
service has been improved too. But most importantly, the legacy system got attention. Even though we had one camp of
people who said, “We tried that before, why do you want to try again?” there was also the second camp with, “You are
crazy but yeah, do it”. No matter what was tried before and in what manner, it is never too late to try again.</p>
<p>Keep your legacy code under careful supervision: add code quality metrics, minimize maintenance efforts, optimize
release cycles. With this you will stop having “Legacy Nightmares” but rather have a maintained piece of code.</p>
<h2>Appendix</h2>
<p>Here is a list of Maven dependencies and related changes that finally made everything work together:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2204c2f0585e859f4225cd16ef9b07620f39d1d4_screen-shot-2018-04-22-at-19.07.48.png?auto=compress,format"></p>
<p>In addition, the following compilation and runtime settings were required:</p>
<ul>
<li>&lt;source&gt; and &lt;target&gt; properties for maven-compiler-plugin set to 1.8,</li>
<li>tomcat 7, i.e. run services with “mvn tomcat7:run-war” and not “mvn tomcat:run-war” which uses tomcat 6 by default.</li>
</ul>
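<p>For reference, the compiler setting above corresponds to a <em>pom.xml</em> fragment along these lines (a sketch of the standard Maven configuration, not our actual build file):</p>

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```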
<p><em>Come work with us! Have a look at our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">jobs page</a>.</em></p>Improving Efficiency in Offline Campaigns2018-04-24T00:00:00+02:002018-04-24T00:00:00+02:00Martin Lawrencetag:engineering.zalando.com,2018-04-24:/posts/2018/04/improving-efficiency-offline-campaigns.html<p>How Zalando uses an API to drive marketing profitability</p><h3><strong>Using an API to drive marketing profitability: a gift card study</strong></h3>
<p>Gift cards are becoming increasingly popular in the US and Europe. For time-pressed consumers trying to find a
convenient gift for friends and family, gift cards are an easy solution, and it shows: gift cards are projected to grow
at a 24% Compound Annual Growth Rate (CAGR) until 2023, according to Allied Market Research. Brands reap the benefits
too: they can use them to engage their best customers to onboard family and friends, driving profitable sales.</p>
<p>Zalando wants to build Europe’s most beloved gift card, but the market is competitive. This blog post outlines the
challenges of entering the market, describes our technical approach, and lists some key learnings.</p>
<h3><strong>Online - brave new world</strong></h3>
<p>Successful online businesses use data to their advantage. They make marketing decisions based on data from their own
sites and those of partners. Data science and machine learning are increasingly applied to aggregate campaign outcomes
into insights and provide guidance or even fully steer marketing investments. As a result, marketing processes are being
digitized rapidly.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c989b44273b15b03999f91c15b45a618958eca46_screen-shot-2018-04-24-at-09.38.15.png?auto=compress,format"></p>
<h3><strong>Offline - data scattered throughout proprietary systems</strong></h3>
<p>Yet to succeed in this market, online marketing prowess only brings you so far. To reach consumers on a broad scale, a
brand must sell its gift cards through leading retail stores, as well as offer them via employee benefit and loyalty
programmes.</p>
<p>E-commerce players like Zalando are used to easy integration with partners and have access to a wealth of marketing and
sales data. Providing comparable insights on campaigns executed with brick-and-mortar retailers and businesses presented
us with some tough nuts to crack, so we took a look at our business objectives.</p>
<h3><strong>I. Business Objectives</strong></h3>
<p><strong>1. Integrate with gift card distribution networks
</strong>Unlike e-commerce, gift cards are not only sold online: a very substantial share is actually sold through retail and
to businesses. A sprawling ecosystem of aggregators and resellers serves tens of thousands of businesses and retail
stores. If a brand wants to reach consumers on a broad base, it has to digitally integrate this vast ecosystem. But how
do you find an <strong>approach that scales</strong>?</p>
<p><strong>2. Harvest data in real time
</strong>Using real-time data allows e-commerce players to observe the effects of campaigns as they happen, giving them superior
control over their investments. In the offline world, however, data points are few and far between and are constrained to
monthly CSV reports. So how do you <strong>enable real-time data insights</strong> into sales via brick-and-mortar businesses?</p>
<p><strong>3. Understand Return on Investment (ROI)
</strong>The right data makes the difference between running a profitable campaign and losing money. In this brave new world,
ROI steering has become a mandate. An apples-to-apples comparison between online, retail and business campaigns demands
using the same tools and metrics. Yet how do you <strong>A/B test campaigns</strong> in a brick-and-mortar world?</p>
<h3><strong>II. Digitally Integrating an Ecosystem</strong></h3>
<p>Our data strategy follows a simple mantra: a single API harvests all commercially relevant data from every partner in
real time. Standardized data backhaul reduces complexity and frees up engineers and analysts to focus on initiatives
that create business value. Real time data enables timely and precise business steering as well as improved operations.</p>
<p><strong>1. Migration to RESTful API - challenges</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/689af57c830f9b6ba91bb30aa04ea62908a71590_screen-shot-2018-04-24-at-09.41.09.png?auto=compress,format"></p>
<p><strong>a. Idempotency
</strong>Since gift cards represent financial value for the customer, it’s important to provide a robust and fault-tolerant way
of operating. Errors such as network problems, service interruptions, or user mistakes are a fact of digital life. We
handle incidents based on best practices, yet one of the most important cornerstones of our API is idempotency. We
require our partners to provide a unique operation identifier on every call. Based on these identifiers, we can ensure
that the requested operation is executed only once, even in the case of repeated calls, which can happen due to
connectivity interruptions.</p>
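<p>A minimal sketch of the idea, in Python for illustration (the class and identifiers are invented, not part of the actual partner API): repeated calls carrying the same operation identifier execute the underlying operation exactly once and return the stored result on retries.</p>

```python
# Hypothetical sketch of idempotent request handling; names are illustrative.
class IdempotentProcessor:
    def __init__(self):
        self._results = {}  # operation_id -> result of the first execution

    def process(self, operation_id, operation):
        # A repeated delivery (e.g. after a connectivity interruption) hits
        # the stored result, so the side effect runs exactly once.
        if operation_id not in self._results:
            self._results[operation_id] = operation()
        return self._results[operation_id]

issued = []

def issue_gift_card():
    issued.append("card-001")
    return "card-001"

processor = IdempotentProcessor()
first = processor.process("op-42", issue_gift_card)
retry = processor.process("op-42", issue_gift_card)  # duplicate call, no second card
```

<p>In a real system the result store would of course be durable and shared across instances, not an in-memory dictionary.</p>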
<p><strong>b. Scalability
</strong>To address requirements towards a modern partner API, we designed our API to be scalable to address future growth.
During design and implementation, we took substantial efforts to determine performance limits and push beyond these
limits. Load testing is the best friend of the developer, allowing you to prove that your service is fulfilling the
business requirements. As a nice side effect it may expose hidden problems in the implementation. And it’s always better
to find your problems and solve them before they affect actual customers.</p>
<p><strong>c. Security
</strong>Gift cards represent value, thus any processes that touch a gift card code must adhere to the highest standards of
compliance. We worked closely with internal auditing personnel to identify weaknesses in our process and address these.
Our basic mantra is that a gift card code should only be touched by the customer. This meant harmonizing diverse
existing processes, such as manual distribution by mail, transfer to SFTP servers or upload to websites, towards a
common, secure process.</p>
<p><strong>2. Backhaul of commercial data - challenges</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f228a70a7596baceb864ae4b0cb6fb0c13eac5ec_screen-shot-2018-04-24-at-09.41.48.png?auto=compress,format"></p>
<p><strong>a. Standardization</strong>
Though brick and mortar businesses harvest a wealth of commercial data from their networks, they use proprietary
schemes, making further processing of any such data almost impossible. We decided to mandate the provision of core sales
data from partners and customers. While this sounds somewhat onerous, it was essential for a solution designed to
address business requirements.</p>
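<p>As an illustration of what such a mandate implies in practice, here is a hedged sketch of validating a standardized core-sales record; the field names are hypothetical, not Zalando's actual schema.</p>

```python
# Illustrative only: a minimal standardized sales record a partner might be
# required to report. The required fields are an assumption for this sketch.
REQUIRED_FIELDS = {"card_id", "partner_id", "store_id", "amount", "currency", "sold_at"}

def validate_sale(record):
    """Return the sorted list of missing required fields (empty if valid)."""
    return sorted(REQUIRED_FIELDS - record.keys())

sale = {
    "card_id": "GC-123", "partner_id": "retailer-7", "store_id": "berlin-01",
    "amount": 25.0, "currency": "EUR", "sold_at": "2018-04-24T09:38:15Z",
}
errors = validate_sale(sale)                    # empty: record conforms
missing = validate_sale({"card_id": "GC-123"})  # everything else is missing
```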
<p><strong>b. Real time
</strong>To enable marketing executives to make the right decisions, data needs to be available with little delay. In today’s
increasingly competitive markets, a monthly report is not a solution; it is a problem. Using our API for data backhaul
gets rid of that headache.</p>
<p><strong>c. Attribution
</strong>To track how campaigns perform requires attribution to the specific partner that executed a particular campaign. In
retail this means knowing which retailer sells a particular gift card product at a particular time. Combining such data
with Salesforce-based campaign planning enabled us to assign discounts to individual cards.</p>
<p><strong>d. A/B testability</strong>
To execute A/B tests in the real world of retail demands the ability to execute a campaign in two specific, comparable
geographies. Thus, the retailer must be able to restrict campaigns by region, which is a substantial challenge. In
addition, analysis of results requires information about the specific store that sold an individual card.</p>
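<p>Conceptually, restricting a campaign by region reduces to mapping each store to a test group or excluding it. This is a sketch under stated assumptions (region names and the grouping logic are invented; real comparability analysis is far more involved):</p>

```python
# Hedged sketch: assign stores in two comparable regions to A/B groups for an
# offline campaign test; stores outside the test regions are excluded (None).
def assign_ab_groups(stores, region_a, region_b):
    """Map each store id to 'A', 'B', or None (not part of the test)."""
    groups = {}
    for store in stores:
        if store["region"] == region_a:
            groups[store["id"]] = "A"
        elif store["region"] == region_b:
            groups[store["id"]] = "B"
        else:
            groups[store["id"]] = None  # campaign restricted: outside test regions
    return groups

stores = [
    {"id": "s1", "region": "Dortmund"},
    {"id": "s2", "region": "Essen"},
    {"id": "s3", "region": "Munich"},
]
groups = assign_ab_groups(stores, "Dortmund", "Essen")
```

<p>The per-store attribution data mentioned above is what makes evaluating the two groups possible afterwards.</p>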
<h3><strong>III. Learnings</strong></h3>
<p><strong>1. Plan efforts and timings generously</strong>
Migrating an operative API that connects external partners is a challenge. Any bug will directly impact customers and
revenue, so all inherent risks need to be addressed. Partners are generally loath to change a running system, thus
migration takes considerable time, during which any systems and processes must be operated in parallel.</p>
<p><strong>2. Focus on the what, be flexible on the how</strong>
Winning over partners to change a process is never easy. What made the difference was pragmatism; focusing on outcomes
while being flexible on the solution. We had to accept some constraints and put in some extra effort. But this way, we
got our partners to onboard in a timely way.</p>
<p><strong>3. Data sharing requires a win-win proposition
</strong>While the strategic value of data is clear to businesses, the rationale to share such data with partners is less so.
Today, sharing data with offline partners is mostly restricted to downloadable monthly reports. Providing extended data
in real time requires changes to systems and processes. To decide such discussions in your favor, you must think through
what benefits the change will provide to your partner.</p>
<p><strong>4. Real time data - a treasure trove for machine learning</strong>
Our data team was asked to provide advice on a number of incidents where gift cards were being misused in inventive
ways. The metadata collected for marketing purposes proved extraordinarily useful in such cases. Using machine learning,
we were able to precisely identify patterns of misuse. Real time data enables real time decisions.</p>
<h3><strong>IV. Conclusion</strong></h3>
<p>While it takes considerable effort and persistence to get the necessary metadata from partners operating in the offline
world, our early investments in a modern data backhaul are paying off by providing the transparency we require.</p>
<p>The concrete and tangible benefits for us are:</p>
<ul>
<li>Full attribution of discounts granted by retail and business partners to profit calculation</li>
<li>Tracking of campaign impact on a daily basis</li>
<li>Ability to steer on Return on Investment</li>
<li>Ability to A/B test offline campaigns across geographical regions</li>
</ul>
<p><em>Work with people like Martin, Roman and Marc. <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1&location=Dortmund">Join the
team</a> at our Dortmund tech hub.</em></p>Distributed Cache2018-04-19T00:00:00+02:002018-04-19T00:00:00+02:00Rohit Sharmatag:engineering.zalando.com,2018-04-19:/posts/2018/04/distributed-cache-akka-kubernetes.html<p>Using Akka cluster-sharding and Akka HTTP on Kubernetes</p><h3><strong>Using Akka cluster-sharding and Akka HTTP on Kubernetes</strong></h3>
<p>This article captures the implementation of an application, deployed on Kubernetes, that serves data over HTTP from
cluster-sharded actors.</p>
<p><strong>Use case:</strong> an application serving data over HTTP at a high request rate, with latency on the order of 10 ms and
limited database IOPS available.</p>
<p>My initial idea was to cache it in memory, which worked pretty well for some time. But this meant larger instances due
to duplication of cached data in the instances behind the load balancer. As an alternative I wanted to use Kubernetes
for this problem and do a proof of concept (PoC) of a distributed cache with Akka cluster-sharding and Akka-HTTP on
<a href="https://engineering.zalando.com/posts/2017/06/postgresql-in-a-time-of-kubernetes.html">Kubernetes</a>.</p>
<p>This article is by no means a complete tutorial to Akka cluster sharding or Kubernetes. It outlines knowledge I gained
while doing this PoC. The code for this PoC can be found
<a href="https://github.com/sharma-rohit/distributed-cache-on-k8s-poc">here</a>.</p>
<p>Let’s dig into the details of this implementation.</p>
<p>To form an Akka Cluster, there needs to be a pre-defined, ordered set of contact points, often called <em>seed nodes</em>. Each Akka
node will try to register itself with the first node from the list of <em>seed nodes</em>. Once all the seed nodes have joined
the cluster, any new node can join the cluster programmatically.</p>
<p>The ordered part is important here, because if the first seed node changes frequently, the chances of split-brain
increase. More info about Akka Clustering can be found <a href="https://doc.akka.io/docs/akka/2.5/cluster-usage.html">here</a>.</p>
<p>So, the challenge here with Kubernetes was the ordered set of predefined nodes, and here come
<a href="https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/">StatefulSet</a>(s) and <a href="https://kubernetes.io/docs/concepts/services-networking/service/#headless-services">Headless
Services</a> to the rescue.</p>
<p>StatefulSet guarantees stable and ordered pod creation, which satisfies the requirement of our seed nodes, and the Headless
Service is responsible for their deterministic discovery in the network. So the first pod will be named
“&lt;application-name&gt;-0”, the second “&lt;application-name&gt;-1”, and so on, where <code>&lt;application-name&gt;</code>
is replaced by the actual name of the application.</p>
<p>The DNS names of the seed nodes will be of the form:</p>
<div class="highlight"><pre><span></span><code>&lt;pod-name&gt;.&lt;service-name&gt;.&lt;namespace&gt;.svc.cluster.local
</code></pre></div>
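<p>Assuming the standard Kubernetes StatefulSet/Headless-Service DNS scheme, the seed-node addresses follow mechanically from the application name, namespace, replica count, and port. This illustrative Python snippet shows how a value like the <code>AKKA_SEED_NODES</code> environment variable in the PoC's StatefulSet manifest could be derived:</p>

```python
# Sketch: derive stable seed-node addresses for a StatefulSet behind a
# Headless Service. Only the standard Kubernetes naming scheme is assumed.
def seed_nodes(app, service, namespace, replicas, port):
    return [
        f"{app}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]

nodes = seed_nodes("distributed-cache", "distributed-cache", "default", 3, 2551)
# nodes[0] == "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551"
```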
<h4>Steps:</h4>
<ol>
<li>Start with creating the Kubernetes resources. First, the Headless Service, which is responsible for deterministic
discovery of seed nodes(Pods), can be created using the following manifest:</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Service</span>
<span class="n">apiVersion</span><span class="o">:</span><span class="w"> </span><span class="n">v1</span>
<span class="n">metadata</span><span class="o">:</span>
<span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">distributed</span><span class="o">-</span><span class="n">cache</span>
<span class="w"> </span><span class="n">labels</span><span class="o">:</span>
<span class="w"> </span><span class="n">app</span><span class="o">:</span><span class="w"> </span><span class="n">distributed</span><span class="o">-</span><span class="n">cache</span>
<span class="n">spec</span><span class="o">:</span>
<span class="w"> </span><span class="n">clusterIP</span><span class="o">:</span><span class="w"> </span><span class="n">None</span>
<span class="w"> </span><span class="n">selector</span><span class="o">:</span>
<span class="w"> </span><span class="n">app</span><span class="o">:</span><span class="w"> </span><span class="n">distributed</span><span class="o">-</span><span class="n">cache</span>
<span class="w"> </span><span class="n">ports</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">port</span><span class="o">:</span><span class="w"> </span><span class="mi">2551</span>
<span class="w"> </span><span class="n">targetPort</span><span class="o">:</span><span class="w"> </span><span class="mi">2551</span>
<span class="w"> </span><span class="n">protocol</span><span class="o">:</span><span class="w"> </span><span class="n">TCP</span>
</code></pre></div>
<p>Note that “clusterIP” is set to “None”, which indicates it’s a Headless Service.</p>
<ol start="2">
<li>
<p>Create a StatefulSet, which is a manifest for ordered pod creation:</p>
<div class="highlight"><pre><span></span><code>apiVersion: "apps/v1beta2"
kind: StatefulSet
metadata:
  name: distributed-cache
spec:
  selector:
    matchLabels:
      app: distributed-cache
  serviceName: distributed-cache
  replicas: 3
  template:
    metadata:
      labels:
        app: distributed-cache
    spec:
      containers:
      - name: distributed-cache
        image: "localhost:5000/distributed-cache-on-k8s-poc:1.0"
        env:
        - name: AKKA_ACTOR_SYSTEM_NAME
          value: "distributed-cache-system"
        - name: AKKA_REMOTING_BIND_PORT
          value: "2551"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: AKKA_REMOTING_BIND_DOMAIN
          value: "distributed-cache.default.svc.cluster.local"
        - name: AKKA_SEED_NODES
          value: "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551,distributed-cache-1.distributed-cache.default.svc.cluster.local:2551,distributed-cache-2.distributed-cache.default.svc.cluster.local:2551"
        ports:
        - containerPort: 2551
        readinessProbe:
          httpGet:
            port: 9000
            path: /health
</code></pre></div>
</li>
<li>
<p>Create a service, which will be responsible for redirecting outside internet traffic to pods:</p>
<div class="highlight"><pre><span></span><code>apiVersion: v1
kind: Service
metadata:
  labels:
    app: distributed-cache
  name: distributed-cache-service
spec:
  selector:
    app: distributed-cache
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP
    # this needs to match your container port
    targetPort: 9000
</code></pre></div>
</li>
<li>
<p>Create an <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">Ingress</a>, which is responsible for
defining a set of rules to route traffic from outside internet to
<a href="https://kubernetes.io/docs/concepts/services-networking/service/">services</a>.</p>
<div class="highlight"><pre><span></span><code>apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: distributed-cache-ingress
spec:
  rules:
  # DNS name your application should be exposed on
  - host: "distributed-cache.com"
    http:
      paths:
      - backend:
          serviceName: distributed-cache-service
          servicePort: 80
</code></pre></div>
</li>
</ol>
<p>And the distributed cache is ready to use:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d46f8bfb09aab2c504e7b82b0e72a075141af11b_screen-shot-2018-02-25-at-3.10.04-pm.png?auto=compress,format"></p>
<p><strong>Summary</strong>
This article covered Akka Cluster-sharding on Kubernetes, its prerequisite of an ordered set of seed nodes with
deterministic discovery in the network, and how that requirement can be met with StatefulSet(s) and Headless Service(s).</p>
<p>This approach of caching data in a distributed fashion offered the following advantages:</p>
<ul>
<li>Less database lookup, saving database IOPS</li>
<li>Efficient usage of resources; fewer instances as a result of no duplication of data</li>
<li>Lower latencies to serve data</li>
</ul>
<p>This PoC opens up new doors to think about how we cache data in-memory. Give it a
<a href="https://github.com/sharma-rohit/distributed-cache-on-k8s-poc">try</a> (all steps to run it locally are mentioned in the
<a href="https://github.com/sharma-rohit/distributed-cache-on-k8s-poc/blob/master/README.md">Readme</a>).</p>
<p><em>Interested in working at Zalando Tech? Our job openings are <a href="https://jobs.zalando.com/tech/jobs/">here</a>.</em></p>The Democratization of ‘Data Science As A Service’2018-04-17T00:00:00+02:002018-04-17T00:00:00+02:00Hugh Durkintag:engineering.zalando.com,2018-04-17:/posts/2018/04/democratization-data-science.html<p>How data science is becoming available ‘for the good of all’ businesses</p><h3><strong>How data science is becoming available ‘for the good of all’ businesses</strong></h3>
<p>In his 2010 Ted Talk “ <a href="https://www.ted.com/talks/matt_ridley_when_ideas_have_sex/">When Ideas Have Sex</a>,” Matt Ridley
posits that human prosperity was caused by one thing and one thing only; our unique human ability to specialise and
exchange ideas and tools.</p>
<p>Ridley’s example of the invention of the reading light illustrates how far we’ve come. Thousands of years ago, making an
hour of reading light required hunting an animal and killing it, before rendering it down to make a candle. Today, the
average human earns an hour of reading light in less than half a second. The reclaimed time is spent relaxing,
traveling, and working day to day in specialized industries for the benefit of other humans. Specialization and exchange
create new technologies faster, and at an ever decreasing cost.</p>
<h3>The Democratization of Data Science</h3>
<p>‘Data science as a service’ is the latest way for humans to specialize and exchange data science ideas and tools, and is
fast accelerating a new wave of computing innovations at an ever decreasing cost. At Zalando, Europe’s leading online
fashion platform, we’ve been ‘all in’ on data science almost since the start of our journey to ‘ <a href="https://corporate.zalando.com/en/corporate-responsibility/what-drives-us">reimagine fashion for
the good of all</a>’; delivering customized
experiences, quality search results, and contextually relevant recommendations through AI and Machine Learning. Today,
we’re betting on ‘data science as a service’ as a new way to democratize the previously specialized power of data
science to teams across Zalando.</p>
<h3>Understanding Why Data Science Is Different</h3>
<p>To democratize technologies, you must understand how this ‘new’ innovation is similar and different from legacy
innovations that people already use; in this case, traditional ‘as a service’ API platforms.</p>
<p>First, most platform APIs typically enable users to do one of two things: perform
<a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a>-like operations to create, read, update, and
delete information from a central source of truth (the platform), or ask questions of pre-defined and indexed datasets
within a platform (most people call this ‘analytics’).</p>
<p>Data science platform APIs are different. Acronyms and words like
<a href="https://en.wikipedia.org/wiki/Natural-language_processing">NLP</a> and <a href="https://en.wikipedia.org/wiki/Deep_learning">deep
learning</a> are used to describe <a href="https://en.wikipedia.org/wiki/Data_science">data
science</a>, but what data scientists really do is help machines understand the
evolving, unstructured world around us as humans do. ‘Data science as a service’ APIs provide power by adding structure
to unstructured random inputs and questions like:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d16a15532e04094a61e29a39c2b222e21109a8ca_datasciencequestions.png?auto=compress,format"></p>
<p>In the examples above, what “this” is could refer to unstructured text, images, video, or audio. “Meaningful groups” and
“unusual things” could be subjective. Humans can be biased when answering questions like these. Machines don’t (yet)
have human biases, so helping them to understand unstructured inputs, and create loosely structured outputs requires a
different way for these machines to talk to each other, and to humans.</p>
<p>Second, most platform APIs deliver confident results in a binary way. As an example, querying an API for a set of
records created within a date range will deliver back the correct set of records created within that date range,
provided the data initially provided is accurate. Similarly, when an API is used to read a single record in a database,
the API will confidently retrieve that record and its contents for you.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cbddf1ee70f4df6c3abaa9558e9701b1807e497c_america-architecture-buildings-755050.jpg?auto=compress,format"></p>
<p>Again, data science APIs are different. Imagine someone stopping you on the street, showing you the photo above, and
asking you, “What do you see in this picture?” Your answers might begin with “I’m pretty sure I see...” (a boat on the
Hudson River), or “I definitely see...” (the Empire State Building). You might also be asked clarifying questions like,
“Where do think you see it?” As your answers will be either high, medium, or low confidence answers, ‘data science as a
service’ APIs must also have a means to express their level of confidence.</p>
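<p>To make this concrete, here is a hypothetical response shape for an image-analysis API that reports per-detection confidence; the field names and values are invented for illustration, not the actual Fashion Content Platform API.</p>

```python
# Invented example response: each detection carries a confidence score, and
# the consumer decides which detections to trust via a threshold.
response = {
    "detections": [
        {"label": "Empire State Building", "confidence": 0.97},
        {"label": "boat", "confidence": 0.81},
        {"label": "helicopter", "confidence": 0.22},
    ]
}

def confident_labels(resp, threshold=0.5):
    """Keep only the detections the model is reasonably sure about."""
    return [d["label"] for d in resp["detections"] if d["confidence"] >= threshold]

labels = confident_labels(response)
```

<p>Choosing the threshold is itself a product decision: a stricter cut-off trades recall for precision, mirroring the "pretty sure" versus "definitely" distinction above.</p>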
<h3>Evolving the API Developer Experience, for Data Science</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/157fbe05b50f4f30ef2903527a2621797d4a4c01_1.png?auto=compress,format"></p>
<p>At Zalando, we’ve formed a deep understanding of why ‘data science as a service’ APIs are different through building our
Fashion Content Platform Team. Simply put, our team of data scientists, engineers, designers, and product managers
develop fashion-focussed AI models, capabilities and APIs to enable any team in Zalando to integrate self-serve ‘data
science as a service’ APIs when building relevant and immersive experiences for customers. Here’s some key lessons we
learned along the way.</p>
<h3>Make it real</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2bf8e5a90d1abcb646e02d80fa2c27997b14f296_2.png?auto=compress,format"></p>
<p>‘Data science as a service’ APIs are different, and for both technical and non-technical users, it’s important to
understand why they’re different, by ‘making it real’. For non-technical users, easy to use demo UIs and a ‘Labs’
environment make it easy for any member of any team to understand what our deep learning and NLP models do, and how
integrating them can help them deliver unique customer experiences. They also take the mystery out of data science,
through familiar inputs, simple language, and visual responses with clear explanations. For technical users, ‘make it
real’ happens through easy to use tools to call APIs from within the documentation. Certain fields are pre-filled with
images and text to reduce time-to-first API call.</p>
<h3>What you see, what you get</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/33637c3c7c95ac66294a1f1a62f6986653f25f8e_3.png?auto=compress,format"></p>
<p>For image analysis deep learning APIs, developers must understand quickly how JSON responses – or features built with
them – might surface to customers within their applications. We carry interactive and visual cues through the
documentation and JSON responses are clear, and in-context too. Where relevant, Taxonomies are visual, using imagery to
quickly articulate what ‘A-line’, ‘Cropped’, and ‘Paisley’ might mean. For NLP text analysis APIs like Entity Relations,
JSON responses are structured for easy interpretation, and demo UIs are available for users to understand visually how
Entities relate to each other.</p>
<h3>Set expectations</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/861e89de3b1bc7b656251abd0ff6e66827206af7_4.png?auto=compress,format"></p>
<p>Many developers ‘fail first time’ when using ‘data science as a service’ APIs, as they feel they’ve incorrectly
integrated, or are not getting the results back they require. Like humans, machines are limited to what they know based
on what they’ve seen before, and simple, visual explanations within documentation help developers understand what the
machine knows now, and what it might be learning soon (the AI product roadmap). Providing examples of inputs that work,
and inputs that do not, will help set expectations, and likely help fuel your AI product roadmap with new feature
requests. Product limitations are always an opportunity to prioritise feature requests faster.</p>
<h3>Explain the seemingly obvious</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/fde685cced05668722dc5eded40315350d7b727b_5.png?auto=compress,format"></p>
<p>While terms like ‘confidence score’ and ‘features’ are part of day-to-day conversations amongst data science teams, it’s
easy to forget that developers newer to data science may not understand what they mean, or what their JSON output
represents. Stating the seemingly obvious not only helps developers adopt and integrate with APIs more quickly, it
provides an opportunity for all types of developers to skill up and learn about new technologies, and hopefully will
spark some ideas for them, too.</p>
<h3>Data science as a service for the good of all</h3>
<p>“Data science” in all its various forms has existed for more than 30 years, but the majority of businesses in the world
today don’t understand what it is, or don’t understand the benefits data science can deliver for their business. ‘Data
science as a service’ will address that knowledge and tools gap, enabling businesses everywhere to understand large
datasets, automate manual processes, and deliver relevant customer experiences. We’re way closer to the beginning than
the end of this journey at Zalando, and would love to hear from you if you’re as excited about the possibilities as we
are.</p>
<p><em>Work with people like Hugh. <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1&location=Dublin">Join the team</a> at our
Dublin tech hub.</em></p>Discovering Design Sprints2018-04-12T00:00:00+02:002018-04-12T00:00:00+02:00Julia Millertag:engineering.zalando.com,2018-04-12:/posts/2018/04/discovering-design-sprints.html<p>Our experience of The Sprint</p><h3><strong>Our experience of The Sprint</strong></h3>
<p>About two years ago, <a href="http://jakeknapp.com/">Jake Knapp</a>, <a href="http://johnzeratsky.com/">John Zeratsky</a> and Braden Kowitz
from Google Ventures published “ <a href="http://www.thesprintbook.com/">The Sprint</a>.” They describe a methodology that helps
you answer critical business questions, develop ideas, or tackle problems in just five days, and last year Jake Knapp
shared his insights in a <a href="https://www.youtube.com/watch?v=z0X0ifo_JT8">fireside chat</a> at Zalando.</p>
<p>Last week, we had the chance to see it in action. In this article, we will not go into the details about how the Design
Sprint works, since it’s already described perfectly in the book. We will rather share another valuable asset with you:
our experience and learnings.</p>
<h3>Why Design Sprint?</h3>
<p>Our team started work on a new replenishment process and since Design Sprints were already <a href="https://engineering.zalando.com/posts/2016/11/the-sprint-exposed--how-we-use-it-at-zalando.html">successfully
used</a> at Zalando, we decided to give
them a try. We had already tested some crucial assumptions for the new process we were working on with an excel
prototype, and now wanted to craft the look of the customer interface to allow for deeper learnings.</p>
<h3>Setup and Preparations</h3>
<p><strong>The Team</strong></p>
<p>We recruited a multi-functional group of eight people across the department plus two facilitators. It’s quite a large
group, but we aimed to involve the whole developer team in the discovery as early as possible to get everyone on the
same page in terms of knowledge and decision-making. The following colleagues were involved:</p>
<ul>
<li>Five engineers</li>
<li>One process specialist</li>
<li>One UX designer</li>
<li>One product manager</li>
<li>Two producers to facilitate the workshop</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/74559b6dfcb28e7fe0e7024bad9ab86e575e8d09_screen-shot-2018-04-10-at-10.25.18.png?auto=compress,format"></p>
<p><strong>Learnings:</strong></p>
<ul>
<li>It’s hard to find a week where everyone who you would like to participate can get rid of all meetings and
appointments. But it’s definitely worth it!</li>
<li>Get a strong facilitator. You will not be able to manage the sprint and to contribute to the workshop at the same
time.</li>
</ul>
<h3>The War Room</h3>
<p>According to the guidance, we booked a room in our office for the whole week. Unfortunately, it had terrible acoustics
and turned out to be too small, so after the first day we moved to a bigger and more comfortable one.</p>
<p><strong>Learnings:</strong></p>
<ul>
<li>Test your room before you start the sprint! Make sure it’s not too noisy and it’s big enough. Your team should
easily fit in, together with the big whiteboards and all kinds of supplies. Don’t hesitate to change it if it
doesn’t feel right: your team will thank you! Apart from that, you will need all your energy to focus on the sprint
and not to waste any brainpower on room complaints.</li>
</ul>
<h3>What did we do and what did we learn?</h3>
<p><strong>Day 1: Knowledge Sharing and Alignment
</strong>The first day was dedicated to setting the frame. We introduced the roles of Decider and Facilitator, aligned on the
long-term goal, invited several experts from different teams for interviews, mapped the new process and formulated some
open questions.</p>
<p><strong>Status:</strong> Finally we get to do a Design Sprint!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c3bf718788b412a902724d15cfd6370d0c8d7ca1_img_3737.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b8d97ab28045556789e70254f427b3f2346103eb_screen-shot-2018-04-19-at-09.31.15.png?auto=compress,format"></p>
<p><strong>Learnings:</strong></p>
<ul>
<li>Make sure to formulate the sprint questions correctly. Otherwise, you’ll have to come back to it in the middle of
the sprint, and will possibly lose a lot of energy clarifying it. Look for critical hypotheses you can verify or
falsify. This might put you out of your comfort zone, but you’ll only learn something new if you risk being wrong.</li>
<li>Map the process on a deeper level! We did it high-level in the beginning, but realised in the middle of the sprint
that we had to go deeper. It was extremely difficult to come up with the storyboard without a common understanding
of the different process steps. After some trial-and-error, we decided to step back and invest time in re-doing the
process map.</li>
</ul>
<p><strong>Day 2: Sketching Solutions
</strong>Having a common baseline of knowledge, we shared some ideas that we particularly liked on other products (“Lightning
Demos”) and could incorporate into our product. Then we started to sketch our solutions. Google Ventures suggests some
approaches for structuring the sketching process, all of them based on individual work. There is no exchange or
feedback planned for this half of the day; everyone just develops their own idea.</p>
<p><strong>Status:</strong> This idea is going to be good...</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/17bfa05f36208a3c989920a9c60e33f4bf150642_screen-shot-2018-04-10-at-11.02.26.png?auto=compress,format"></p>
<p><strong>Learnings:</strong></p>
<ul>
<li>Explain the purpose of the Lightning Demos. This is an extremely helpful exercise, but sometimes you end up pitching
your idea instead of just showing and explaining it.</li>
<li>Individual sketching might not always be the right solution. Since there was no exchange of ideas, we felt we
lost the momentum of combining great concepts or getting inspired by the work of others. Next time we would either
exchange ideas or mix the suggested way of sketching with an iterative method such as rapid wire-framing.</li>
</ul>
<p><strong>Day 3: Review and Decide
</strong>On Wednesday we reviewed the ideas drawn by the team and decided what we were going to prototype. After
the Decider voted, we started to build a storyboard for our prototype. As mentioned already, at this point
we ran into several issues: it was not clear what exact question should be answered by the end of the week, and we
missed a detailed process map. We had to make a mental switch from “We can test everything” to “We have to find the
most important hypothesis.” It was very difficult to accept that limitation and align on one statement.
Nevertheless, we overcame the difficulties. We aligned on one hypothesis to test and drew a detailed process map.
After this was accomplished, we came up with good results, but it was definitely the most emotionally exhausting day
of the sprint!</p>
<p><strong>Status:</strong> Roller coaster of emotions!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e0ba9033493ae1d3b8acfed47f3231a2f229b096_screen-shot-2018-04-10-at-11.21.46.png?auto=compress,format"></p>
<p><strong>Learnings:</strong></p>
<ul>
<li>The Decider is the best role ever. Having a dedicated Decider role definitely has its advantages especially when
there is a time constraint. Our Decider did an awesome job by not just making the decision about which prototype we
should build, but also explaining the background and his thoughts to the group.</li>
<li>Make sure everyone knows the process. As mentioned already, after we realized that we were not on the same page, we
had to spend some time redoing the process map.</li>
</ul>
<p><strong>Day 4: The Prototyping</strong></p>
<p>Here you go: after just three days you start building things! Okay, not really building, since the prototype is just
a facade. But it looks pretty much like a real product!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d728a90c7f27c9a64dbf00ef0c9b289439b0f2a3_screen-shot-2018-04-10-at-11.40.25.png?auto=compress,format"></p>
<p><strong>Learnings:</strong></p>
<ul>
<li>Assigning different roles really helped to get everything together in time. Everyone knew what they were responsible
for, and working on for the next few hours.</li>
<li>Don’t get lost in details. The main challenge turned out to be deciding how to spend our time. We concluded:
users don’t really care about how exact your numbers are or whether your buttons look awesome; they focus a lot more on
the workflow and how the data is displayed. So we invested more energy in the look of the interface and neglected
the correctness of the data.</li>
</ul>
<p><strong>Day 5: The Moment of Truth</strong></p>
<p>The most exciting part of the workshop: your team watches how real users interact with the prototype! This day is
definitely the hardest one for the interviewer. You have to be well prepared to talk to users for five hours while
being watched by the rest of the team.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/89d2937ae16d0fa115d1a2c93efe0d914cc0bf1c_screen-shot-2018-04-10-at-12.00.02.png?auto=compress,format"></p>
<p><strong>Learnings:</strong></p>
<ul>
<li>Live testing is awesome! Getting real-life feedback was a great experience and provided a lot of insights. If you
are planning on doing a design sprint, invest some time in advance to find users and align on time slots.</li>
<li>Get some feedback from the team. While this wasn’t part of the design sprint concept we think it’s important to
always gather feedback and take away learnings for the next sprint to come.</li>
</ul>
<h3>What now?</h3>
<p>We are very satisfied with the return on time invested. It was a great experience with solid results we can further test
along the way. It can be very difficult to reach consensus within a team due to different opinions and approaches.
The Design Sprint turned out to be very useful not only for building and testing prototypes, but also for getting
buy-in from the whole team, the management, and the users, thereby increasing everyone’s enthusiasm for the topic.
Additionally, doing a design sprint is an efficient and low-budget way to find out whether you are actually building the
right product for your users or whether you need to rethink your approach.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a6438a74636337c968089f5691d43d77ca0d71e8_screen-shot-2018-04-10-at-12.02.29.png?auto=compress,format"></p>
<p>We hope we have encouraged you to try a design sprint for whatever problem you are facing right now.</p>
<p><em>Come work with us! Have a look at our <a href="https://jobs.zalando.com/tech/jobs/">jobs page</a>.</em></p>Managing Personalized Products2018-04-10T00:00:00+02:002018-04-10T00:00:00+02:00Terhi Hanninentag:engineering.zalando.com,2018-04-10:/posts/2018/04/managing-personalized-products.html<p>A product manager's insights on customization</p>
<p>Personalization is a common term with digital products. But what does it actually mean, why do we do it, and how does it
affect the product manager?</p>
<p>To illustrate, let me tell a personal story. I have gone to the same hairdresser for 10 years. He has seen a big part of
my life, with big changes and evolutions. He knows my preferences. I like to have my appointments in the morning. I like
to try new styles, but I expect him to come up with the new ideas. Additionally, he knows my hair, so he is able to
produce a great end result every time. He remembers all of this about all of his regular customers, and because of that,
not only does he do great cuts, he is able to serve everyone in a slightly different, personal way.</p>
<p>So the basis of personalization is good memory plus the <em>ability</em> to use everything you remember about the person.</p>
<p>The story also illustrates why personalization is important. It is almost impossible to make me go anywhere else,
because the cost of switching is just too high. The trust that has been built over time in this relationship is so
strong, I don’t have any interest in exploring alternative service providers. No matter how technically superior some
other hairdresser is, he can’t win without the relationship.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/99748b090362f7eb24dcefaf2a27936d5c385a14_screen-shot-2018-04-10-at-09.31.33.png?auto=compress,format"></p>
<p>At platforms like Zalando, the challenge of personalization comes from three facts: the scale of stock and customers
(300,000 items and 23 million customers), the requirement of automated relationship building, and the tricky balance
between being cool and being creepy.</p>
<p>How does a product manager’s work change when the product is personalized? If we define product management as the
intersection of tech, business and design, we can observe the impact from these three angles.</p>
<p>Design is all about the user experience. If you haven’t been customer obsessed before, now is the time. You need to be
able to understand your customers at a very deep level. What are the aspects of the experience that actually add value,
if personalized? Is it just the item recommendations, or everything from marketing to delivery? What is the right level
of personalization, what makes it feel good and valuable? This might vary a lot depending on the culture and the
individual. What are the customer’s goals at different times? Are they exploring or exploiting? How do you recognize
those goals and then help customers reach them? Personalization forces you to a new level of customer insight, user
research, and design.</p>
<p>From a business point of view, you need to level up your KPI skills. Zalando is divided into independent product teams
which are responsible for one functional part of the whole; think of home page, category page, product page, wishlist,
search, etc. Personalization must run consistently through all of these. Personalization alone does not make one single
sale. So what is the goal, and how do you measure the value of personalization? Looking at short-term metrics, such as
click-through rate, might lead us to optimize for interest that isn’t actually <em>valuable</em>, causing negative effects further
along in the customer journey. Optimizing for the current session might also be harmful for long-term relationship
building. If the vision is to build a relationship like I have with my hairdresser, then you need to go for the
long-term KPIs like the customer lifetime value, and some measures on how deep the relationship is; maybe by how diverse
the usage of the shop is, and how widely the user shops across categories. Measuring these kinds of KPIs and attributing
them correctly is not trivial.</p>
<p>In tech, we turn to data. First of all, you need the data that is predictive to your problem. You might have to start by
building something just to gather data that enables the next steps. You can get to a certain level with rule-based
systems, but at some point you enter the world of Machine Learning. Being a Product Manager with an ML product requires
understanding of both data science and machine learning. You will be modeling human behaviour with a machine, so you
need to be able to facilitate that discussion with the design, business, and tech experts. Data Science lives up to its
name; it is science, not your standard software development. Be prepared for model exploration, data gathering,
cleaning, labeling, lots of iterations, and discussions on when the performance is good enough. Dig up your statistics
and probability skills, because you need to understand how those relate to what the customer will experience.</p>
<p>In summary, personalization offers an exciting challenge for a product manager to stretch their skills in all aspects of
the role. But like in every product, the core is still the same: Deep understanding of the customer problem, and an
exciting vision for the solution will take you far.</p>
<p><em>Come work for us in our Helsinki tech hub. Take a look at our jobs
<a href="https://jobs.zalando.com/en/?location=Helsinki&search=helsinki&utm_source=techblog&utm_medium=blog-b-organic&utm_campaign=2018-zfi&utm_content=03-helsinki-terhi-personalizedproducts">here</a>.</em></p>The Perks of Being in a Hackathon2018-04-05T00:00:00+02:002018-04-05T00:00:00+02:00Izabela Bratovictag:engineering.zalando.com,2018-04-05:/posts/2018/04/perks-of-hackathon.html<p>How stepping out of our comfort zone led to a hackathon victory</p><h3><strong>How stepping out of our comfort zone led to a hackathon victory</strong></h3>
<p>Zalando Tech doesn't just <a href="https://engineering.zalando.com/posts/2018/03/cross-department-hackathons.html">put on hackathons</a>, we love
to attend them too! Here, we catch up with software engineers, Lisa Knolle and Izabela Bratovic about their time at
#picturepunk.</p>
<p>At the end of last year we took part in a <strong>hackathon</strong>. We came to this decision for the sake of exposing ourselves to
new experiences, new people, and new technologies. Apart from our in-house equivalent, the esteemed <a href="https://corporate.zalando.com/en/newsroom/en/stories/hack-week-becomes-hack-weeks-zalandos-solution-bearing-hackathon-now-celebrated"><strong>Zalando
HackWeek</strong></a>, we were both completely inexperienced in participating in such events. Our event of choice
was <a href="https://www.dpa.com/de/unternehmen/dpa-zeichnet-aus/picturepunk/#picture-punk">#picturepunk</a>, an event hosted
by the German Press Agency (DPA) in cooperation with Adobe, Microsoft, and Google News Lab. Its focus was on
media journalism, and the participants were tasked with finding ways to further improve the industry.</p>
<p>Following the “preparation is key” rule, we sat together and started to brainstorm for the next game-changing idea we
wanted to bring with us. Three fruitful hours later, we decided our sheer willingness to work and contribute would just
have to do. As the date quickly approached, the “hackathon anxiety” started to slowly settle in as well. Backing out was
out of the question. But of course, this initial anxiety turned out to be completely unjustified.</p>
<p>The hackathon was kicked off with an idea pitch that was open for all contestants. Seeing that our brainstorming wasn’t
so off-point immediately boosted our morale. More than a few ideas piqued our interest. The pitches ranged from enabling
journalists to create smart photostories using an app, to making use of blockchain technologies for image licensing.
Luckily, a rather interesting group of participants started to gather around the idea that appealed to us the most, and
sooner rather than later, we found ourselves jotting down features on some post-its together with our freshly founded
team. Our goal was set on finding ways to help journalists comb through the vast amounts of available stock media
material, optimizing their search results and saving them some of their precious time. Our team of seven people
consisted of three UX designers, a media science student, and an entrepreneur; the latter two both with some programming
experience and, finally, the two of us full-time software engineers.</p>
<p>The next 48 hours now seem like nothing more than a short blur. Slightly sleep-deprived and high on a constant supply of
Club Mate, we gave it our best shot to build the MVP. This was achieved using some technologies we have expertise in
mixed with some that we don’t get to use in our day-to-day jobs. The sponsoring companies made sure we had their APIs at
our disposal, and it was just as compelling exploring them as it was using them. Access to <a href="https://cloud.google.com/vision/">Google’s Cloud Vision
API</a> for image analysis and <a href="https://azure.microsoft.com/en-us/services/cognitive-services/">Microsoft Cognitive
Services</a>, which detect human emotions on images, were
some of the tools we had the privilege of using. It was enlightening to see the state of technology in that field and
try to put it to good use. Our application’s backend pulled media from Adobe Stock, enriched it with relevant metadata
using the aforementioned APIs, and handed it over to our friendly user interface. The journalists would then be
presented with many options of filtering through these images, be it by selecting metatags, liking or disliking images
that come up, or even by details on those very images that were detected through image analysis in the previous step.
Having less than 48 hours to prove our skills and put all of that together was what motivated us most and kept us
going.</p>
<p>But, as we would soon find out, building an <strong>MVP</strong> does not necessarily make a winner; what did was the team of cool
individuals we worked with. Having met on the spot, armed with very different skill sets and personalities, we worked
together towards a common goal. The designers’ resourcefulness and fast-paced working style made it possible for the
team to effortlessly impress the jury. Equally important: our teammates’ aptitude for the business side of the
startup world was what brought our presentation to another level.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d1f12d5bf34406ab1afd4753d7c726a04774eaa9_screen-shot-2018-04-05-at-17.11.03.png?auto=compress,format"></p>
<p><em>Photo credit: Silas Stein</em></p>
<p>Finally, and contrary to our expectations, our team was awarded by the judges in two out of four categories: Best
Overall and Best of API. You can imagine why we can only recommend the experience! A hackathon can give you so much room
to learn interesting new things, meet people who share the same drive as you, and the opportunity to challenge yourself
in a different domain or even industry.</p>
<p>Come work for us with people like Lisa and Izabela. Have a look at our <a href="https://jobs.zalando.com/tech/jobs/">jobs page</a>!</p>Cross-Department Hackathons at Zalando2018-03-29T00:00:00+02:002018-03-29T00:00:00+02:00Michael Achtzehntag:engineering.zalando.com,2018-03-29:/posts/2018/03/cross-department-hackathons.html<p>Making Innovative Ideas a Reality</p><p>Last week, at our new-format <a href="https://corporate.zalando.com/en/newsroom/en/stories/hack-week-becomes-hack-weeks-zalandos-solution-bearing-hackathon-now-celebrated">Zalando Hack
Week</a>,
two important departments dropped their day-to-day tasks and embarked on what, for many of them, was their first ever
hackathon.</p>
<p>How did this come about? Zalando’s new Hack Week is a departure from the
<a href="https://engineering.zalando.com/posts/2016/12/the-finish-line--hack-week-5-awards-and-more.html">hackathons</a> we have
organised in the past, which typically only involved the tech department and happened once a year. However, as Europe’s
most fashionable tech company, we see technology as a huge part of everything we do. This presents us with an excellent
opportunity to make the hackathon truly cross-departmental, and to involve talented and innovative minds from many
different areas of the business. And since innovative ideas don’t wait around to crop up once a year, our Hack Week has
become a quarterly event.</p>
<p>This time, our Senior Vice President (SVP) of the People & Organization department, Boris Ewenstein, and our VP of
Digital Foundation, Eric Bowman, got together to organize a collaborative hackathon between their teams, with the goal
of awarding the best projects with the chance to make their ideas a reality. The theme this time? <strong>For the Good of All:
10 Years of Zalando.</strong> Innovating and improving processes across the company to create an even better workplace.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f3b2168e3ebaa7848d4a0031048cd72aa52af2f3_zalando-hack-week---23-march-2018---image-copyright-dan-taylor---dandantaylorphotography.com-4.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/24ed4fc79d53ee7f2218e9cd32ce7e341ab2d5f9_zalando-hack-week---23-march-2018---image-copyright-dan-taylor---dandantaylorphotography.com-9.jpg?auto=compress,format"></p>
<p>What followed was a frenzied week of brainstorming, designing, discussing, and testing. Our judges, consisting mostly of
senior members of the two departments, listened to the pitches of some 50+ teams, and awarded prizes to the best five.
On top of the company’s recognition for their excellent work, the jury provided the winners with helpful feedback on
their projects, should they wish to continue their work on the prototype.</p>
<p><strong>The winners:</strong></p>
<ul>
<li>The <strong>Bravest of the Brave</strong> award went to the group wanting to introduce Zalando’s very own cryptocurrency.</li>
<li>The group wanting to enable the spread of AI insights across Zalando was awarded the <strong>Money Makers</strong> prize.</li>
<li>The <strong>Geeks of the Week</strong> award went to the group wanting to make cluster-based requests on
<a href="https://engineering.zalando.com/posts/2017/03/an-open-source-pulse-check-at-zalando-for-2017.html">Skipper</a> a
reality.</li>
<li>The <strong>Swiss Army Knife</strong> award went to the group behind the idea of creating a one-stop resource of information for
internationals looking to move to Berlin for a job at Zalando.</li>
<li>This quarter’s <strong>Customer Heroes</strong> wanted to bring data from different fashion stakeholders, from the designer
through to the buyer and the producer, together to allow fashion teams at Zalando to create products with even more
different combinations of fabrics, design, colours, and patterns.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ba7404a13f64b140f2dfee7fa8fe394ac5b75d0d_zalando-hack-week---23-march-2018---image-copyright-dan-taylor---dandantaylorphotography.com-119.jpg?auto=compress,format"></p>
<p>The awards were presented in a glitzy awards ceremony, complete with presenters dressed to the nines who called the
participants and judges onto the stage to show recognition for their hard work and excellent initiative.</p>
<p>The most coveted prize was, of course, the golden ticket to <a href="https://corporate.zalando.com/en/innovation/grassroots-tech-innovation">Zalando’s Slingshot
Program</a>, our internal entrepreneurial
development incubator. Teams who win the golden ticket are given all the resources they need, and dedicate 20% of their
working time for a short period to their projects, taking them to the next level and developing the first steps. In this
way, Zalando recognizes the <a href="https://engineering.zalando.com/posts/2015/03/we-launched-it-the-zalando-space-shoe-video.html">great
ideas</a> pitched by
Zalando’s employees, from the bottom up, and helps turn them into reality.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/07845b39c1362ad968a3d535eb198845b5dc07af_zalando-hack-week---23-march-2018---image-copyright-dan-taylor---dandantaylorphotography.com-218.jpg?auto=compress,format"></p>
<p>This time around, two golden tickets were awarded: The first was to the winners of the Swiss Army Knife award, for their
resource for internationals arriving in Berlin. The second went to a team that is setting up a mentorship matching
program, allowing people to connect across the organization with the aim of improving diversity at the management level.
Congrats to all groups involved!</p>
<p>At Zalando, we dare all of our employees to <a href="https://corporate.zalando.com/en/company/who-we-are">Reimagine Fashion</a>.
This can be done on many different levels, and at Zalando, this all starts with technology. <a href="https://twitter.com/ZalandoTech">Stay
tuned</a> for more insights into our hack weeks, and how our <a href="https://engineering.zalando.com/posts/2018/03/just-run-game-day.html">cross-functional
teams</a> are working together to reinvent fashion, tech and
Zalando.</p>
<p><em>Reimagine fashion by working with one of our ace teams. Job listings are <a href="https://jobs.zalando.com/tech/jobs/">here</a>!</em></p>Discovering a Future in Tech2018-03-27T00:00:00+02:002018-03-27T00:00:00+02:00Vivi Brooketag:engineering.zalando.com,2018-03-27:/posts/2018/03/discovering-future-tech.html<p>How former Zalando trainee, Anriika Kauppi, found her calling</p><h3><strong>How former Zalando trainee, Anriika Kauppi, found her calling</strong></h3>
<p>Fresh out of high school, Anriika Kauppi, 19, was interested in becoming a teacher, but instead of taking the scholastic
route, she did a summer traineeship at Zalando’s <a href="https://engineering.zalando.com/posts/2017/12/helsinki-100-employee.html">Helsinki Tech
Hub</a>. With a family background in tech, Anriika wanted to see
what the field had to offer as a career. A year and a half later, she has lived abroad for three months, completed
another internship in the tech field, and applied to study engineering. Now Anriika is in her first year of studying
engineering at the <a href="http://www.tut.fi/en/home">Tampere University of Technology</a> in Finland, and she is passionate about
inspiring her peers and young girls to study technology too.</p>
<p><strong>Do you remember your first encounter with “tech”? What was it?</strong>
It depends on what you define as “tech,” but when I was little, one of my first impressions was when my father showed
me a computer that had a dummy program that taught the basics of how computers work. I printed my name a hundred times.
At that time, however, it didn’t sound very fascinating, probably because it didn’t even occur to me that what it was
doing was so special. In retrospect, after working at Zalando and starting to study, I became fascinated and realized
the possibilities of coding.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/90096463cdc29607b777735b7664eb4fb9cb9249_image1.jpg?auto=compress,format"></p>
<p><em>Anriika stretched herself as a trainee at Zalando.</em></p>
<p><strong>What was your role at Zalando? Why was it interesting for you?</strong>
I was a trainee at Zalando in the summer of 2016 for three months, right after graduating from high school. I did
subjective testing for one of the projects in the Helsinki tech hub. Together with this, I did application testing on a
newly soft-launched application. I also planned how to automate my work for when my traineeship was up, so that the team
could work independently without me. My idea was to automatically test one search engine’s accuracy against another’s,
which, in the end, the team implemented and used after my traineeship was over.</p>
<p>I loved finding the shortcomings of the search engines. However weird they were, there was always still some logic
behind them, and figuring this out was fascinating to me. So much so, I wondered about the fun I could have working in
the tech industry. If I could solve problems such as these in my career, I definitely wanted to go study IT at
university, even though my previous goal was to become a school teacher.</p>
<p>One year after my traineeship at Zalando, I was accepted to study Information Technology at the Tampere University of
Technology in Finland. I am currently in my first year, studying Python and C++.</p>
<p><strong>What are some of your favourite tech products?</strong>
I can’t name one, but the ideal kind of tech product for me is one that’s simple to use and aesthetically pleasing,
while combining multiple touchpoints in one device. For example, some mobile banking applications in Finland do this
very well.</p>
<p><strong>Who is a hero of yours? Why?</strong>
My grandma. She studied programming before it was even taught in universities. Basically, she learned programming
with <a href="https://en.wikipedia.org/wiki/Punched_tape">punched tape</a> before the ‘70s, and she worked with microprocessors.
She often wondered whether she was one of the first female microprocessor programmers in Finland, as at the time there
were hardly any.</p>
<p>Nowadays, it’s also great that I can connect with my father on a similar level. He is also a software developer, and
it’s cool to be able to have deep tech-focused conversations around topics we’re both passionate about, and to learn
from his experience.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/166cd85effffe46e61e1cca668608e03307c4f0b_image4.jpg?auto=compress,format"></p>
<p><em>Anriika is part of “Zelsinkas”: Zalando Helsinki’s Women in Tech</em></p>
<p><strong>Where do you see yourself in ten years?</strong>
I see myself working abroad in an international tech company specializing in perhaps cyber security, testing, or UX
design with a programming aspect. Although, I don’t yet know 100% which direction I’ll go in because the tech world is
so vast, and it changes constantly. I have only started to discover a small part of it so far, but I’m excited by the
opportunities to continue solving the problems that intrigue me. I could also combine teaching and IT. In Finland, they
started to teach coding in all elementary and middle schools as of September 2016.</p>
<p>Now, I want to encourage others, especially girls, to understand the opportunities and possibilities of tech, and show
that it is a creative and fun field with a highly logical mindset.</p>
<p>Looking back at that moment when I saw the dummy program as a child, I realize now that the program was probably only a
for-loop and print function. It’s pretty awesome that I finally get it!</p>
<p><em>In February 2018, <a href="https://www.linkedin.com/in/anriika-kauppi/">Anriika</a> represented herself and Zalando as part of
“Zelsinkas”: Zalando Helsinki’s Women in Tech, at the <a href="https://superada.net/brief-in-english/">Super-Ada</a> event for 16
to 22-year-old women. The event encourages women to study and start a career in technology. Our <a href="https://jobs.zalando.com/tech/locations/?gh_src=4n3gxh1">Helsinki tech
hub</a>, as well as our other locations around Europe: Berlin,
Dublin & Lisbon, are looking for inspirational female tech talent to <a href="https://jobs.zalando.com/tech/culture/?utm_source=tech&utm_medium=blog-b-organic&utm_campaign=2018-glbl&utm_content=03-anriika-womenshistorymonth">join
us</a>
in creating amazing experiences for our customers. Happy Women’s History Month!</em></p>In Praise of TypeScript2018-03-22T00:00:00+01:002018-03-22T00:00:00+01:00Dmytro Zharkovtag:engineering.zalando.com,2018-03-22:/posts/2018/03/make-node-apis-great-typescript.html<p>Insights on making NodeJS APIs great</p><h3><strong>Insights on making NodeJS APIs great</strong></h3>
<p>NodeJS is getting more and more popular these days. It’s gone through a long and painful history of mistakes and
learning. By being a “window” for front-end developers to the “world of back-end,” it has improved the overall tech
knowledge of each group of engineers by giving them the opportunity to write actual end-to-end solutions themselves
using familiar approaches. It is still JavaScript, however, and that makes most back-end engineers nauseous when they
see it. With this article and a number of suggestions, I would like to make NodeJS APIs look a bit better.</p>
<p>If you prefer looking at code over reading an article, <a href="https://github.com/DmitriyNoa/typescript-nodejs-sample">jump</a> to
the sample project directly.</p>
<p>As a superset of JavaScript, TypeScript (TS) enhances ES6 inheritance with interfaces, access modifiers, abstract
classes and methods (yep, you read that correctly... abstract classes in JS), static properties, and strong
typing. All of these can help us a lot. So let’s walk through these cool features and check out how we can use them in
NodeJS applications.</p>
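<p>As a quick, self-contained taste of a few of those features, a sketch might look like this (the class names below are illustrative, not from the original post):</p>

```typescript
// Abstract class with an abstract method, an access modifier on a
// parameter property, and a static property -- all checked at compile time.
abstract class Shape {
  static count = 0;

  protected constructor(private kind: string) {}

  // Subclasses must implement this, or the compiler complains.
  abstract area(): number;

  describe(): string {
    return `${this.kind}: ${this.area()}`;
  }
}

class Square extends Shape {
  constructor(private side: number) {
    super("square");
    Shape.count++;
  }

  area(): number {
    return this.side * this.side;
  }
}

console.log(new Square(3).describe()); // "square: 9"
```

<p>None of the <code>abstract</code> or <code>private</code> keywords survive transpilation; they exist purely for compile-time checks.</p>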
<p>I split this post into two parts: an overview and actual code samples. If you know TS pretty well, you can jump to part
two.</p>
<p><strong>PART 1. OVERVIEW</strong></p>
<p><strong>INTERFACES, CLASSES, ABSTRACT CLASSES, AND TYPE ALIASES</strong>
When I first tried TS, it sometimes felt like it went nuts checking and applying types. It’s technically possible to
define a variable’s type with type aliases, interfaces, classes, and abstract classes, so they can look pretty
similar (kind of twins, or quadruplets in this case), but as I looked into TypeScript more, I found that, just like
siblings, they are actually quite individual.</p>
<p><strong>Interfaces</strong> are “virtual structures” that are never transpiled into JS. Interfaces play a double role in TS:
they can be used to check whether a class implements a certain pattern, and also as type definitions (so-called “structural
subtyping”).</p>
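<p>The “structural subtyping” side can be seen in a tiny self-contained sketch (the names here are illustrative): any value whose shape matches the interface is accepted, with no <code>implements</code> clause anywhere.</p>

```typescript
interface Hero {
  name: string;
  power: string;
}

function describeHero(hero: Hero): string {
  return `${hero.name} (${hero.power})`;
}

// A plain object literal satisfies Hero purely by its structure.
const batman = { name: "Batman", power: "money" };
console.log(describeHero(batman)); // "Batman (money)"
```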
<p>I really like how TS allows us to extend interfaces so we can always modify already existing ones to our own needs.</p>
<p>Say we have a middleware function that performs some checks on a request and adds an additional property to it named
“superheroName”. The TS compiler will not allow you to add it to a standard express request, so we can extend the
Request interface with the needed property.</p>
<div class="highlight"><pre><code>import { Request, Response } from "express";

interface SuperHeroRequest extends Request {
  superheroName: string;
}
</code></pre></div>
<p>And then use it in a route:</p>
<div class="highlight"><pre><code>app.get("/heroes", (req: SuperHeroRequest, res: Response) => {
  if (req.superheroName) {
    res.send("I'm Batman");
  }
});
</code></pre></div>
<p>Of course, let’s not forget about the main function of interfaces: enforcing classes to meet a particular contract.</p>
<div class="highlight"><pre><span></span><code><span class="n">interface</span><span class="w"> </span><span class="n">Villain</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">;</span>
<span class="w"> </span><span class="n">crimes</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">[];</span>
<span class="w"> </span><span class="n">performCrime</span><span class="p">(</span><span class="n">crimeName</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">):</span><span class="w"> </span><span class="nb nb-Type">void</span><span class="p">;</span>
<span class="p">}</span>
/* The compiler will ensure that all properties of the Villain interface are specified in the implementing class and throw an error at compile time if something is missing. */
<span class="k">class</span><span class="w"> </span><span class="n">SuperVillain</span><span class="w"> </span><span class="n">implements</span><span class="w"> </span><span class="n">Villain</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">public</span><span class="w"> </span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">;</span>
<span class="w"> </span><span class="n">public</span><span class="w"> </span><span class="n">crimes</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">[];</span>
<span class="w"> </span><span class="n">constructor</span><span class="p">(</span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">crimes</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">;</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">crimes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">crimes</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">performCrime</span><span class="p">(</span><span class="n">crime</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">crimes</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">crime</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">getCrimesList</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">crimes</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">const</span><span class="w"> </span><span class="n">doctorEvil</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">SuperVillain</span><span class="p">(</span><span class="s2">"Doctor Evil"</span><span class="p">);</span>
<span class="n">doctorEvil</span><span class="o">.</span><span class="n">performCrime</span><span class="p">(</span><span class="s2">"Takeover the world"</span><span class="p">);</span>
<span class="n">doctorEvil</span><span class="o">.</span><span class="n">performCrime</span><span class="p">(</span><span class="s2">"Eat a donut"</span><span class="p">);</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">doctorEvil</span><span class="o">.</span><span class="n">getCrimesList</span><span class="p">());</span>
</code></pre></div>
<p><strong>Abstract classes</strong> are usually used to define base level classes from which other classes may be derived.</p>
<div class="highlight"><pre><span></span><code><span class="n">abstract</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">Hero</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">constructor</span><span class="p">(</span><span class="n">public</span><span class="w"> </span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">public</span><span class="w"> </span><span class="n">_feats</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Similar</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">interfaces</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="n">can</span><span class="w"> </span><span class="n">specify</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="n">signature</span><span class="p">,</span><span class="w"> </span><span class="n">that</span><span class="w"> </span><span class="n">should</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">defined</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">derived</span><span class="w"> </span><span class="n">classes</span><span class="o">.</span>
<span class="w"> </span><span class="n">abstract</span><span class="w"> </span><span class="n">performFeat</span><span class="p">(</span><span class="n">feat</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">):</span><span class="w"> </span><span class="nb nb-Type">void</span><span class="p">;</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Unlike</span><span class="w"> </span><span class="n">interfaces</span><span class="w"> </span><span class="n">abstract</span><span class="w"> </span><span class="n">classes</span><span class="w"> </span><span class="n">can</span><span class="w"> </span><span class="n">provide</span><span class="w"> </span><span class="n">implementation</span><span class="w"> </span><span class="n">along</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="n">signature</span><span class="o">.</span>
<span class="w"> </span><span class="n">getFeatsList</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">_feats</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">class</span><span class="w"> </span><span class="n">SuperHero</span><span class="w"> </span><span class="k">extends</span><span class="w"> </span><span class="n">Hero</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">constructor</span><span class="p">(</span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">_feats</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">super</span><span class="p">(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">_feats</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">performFeat</span><span class="p">(</span><span class="n">feat</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">_feats</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">feat</span><span class="p">);</span>
<span class="w"> </span><span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="err">`</span><span class="n">I</span><span class="w"> </span><span class="n">have</span><span class="w"> </span><span class="n">just</span><span class="p">:</span><span class="w"> </span><span class="o">$</span><span class="p">{</span><span class="n">feat</span><span class="p">}</span><span class="err">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">const</span><span class="w"> </span><span class="n">Thor</span><span class="p">:</span><span class="w"> </span><span class="n">SuperHero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">SuperHero</span><span class="p">(</span><span class="s2">"Thor"</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="s2">"Stop Loki"</span><span class="p">]);</span>
<span class="n">Thor</span><span class="o">.</span><span class="n">performFeat</span><span class="p">(</span><span class="s2">"Save the world"</span><span class="p">);</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">Thor</span><span class="o">.</span><span class="n">getFeatsList</span><span class="p">());</span>
<span class="o">//</span><span class="w"> </span><span class="n">Abstract</span><span class="w"> </span><span class="n">classes</span><span class="w"> </span><span class="n">can</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">used</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">well</span><span class="o">.</span>
<span class="k">const</span><span class="w"> </span><span class="n">Hulk</span><span class="p">:</span><span class="w"> </span><span class="n">Hero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">SuperHero</span><span class="p">(</span><span class="s2">"Bruce Banner"</span><span class="p">);</span>
<span class="n">Hulk</span><span class="o">.</span><span class="n">performFeat</span><span class="p">(</span><span class="s2">"Smash aliens"</span><span class="p">);</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">Hulk</span><span class="o">.</span><span class="n">getFeatsList</span><span class="p">());</span>
<span class="o">//</span><span class="w"> </span><span class="n">A</span><span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">instantiate</span><span class="w"> </span><span class="n">abstract</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">will</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">work</span>
const Loki: Hero = new Hero("Loki", []);
</code></pre></div>
<p>As you can see, we can use any of these constructs by specifying them as a variable’s type. So which should be used,
and when? Let's sum it up.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e5f3879aec28d1e10e84581c6af8b67182a56de1_screen-shot-2018-03-22-at-14.02.02.png?auto=compress,format"></p>
<p><strong>Type aliases</strong> can name both primitive and reference types: string, number, boolean, object. You can’t extend
type aliases with an extends clause, although they can be composed via intersection types.</p>
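<p>As a quick sketch (the names below are made up for illustration), a type alias can name a primitive or an object shape, and an intersection type is the closest alias equivalent to extending:</p>

```typescript
// Type aliases can name primitives and object shapes alike.
type HeroName = string;
type PowerLevel = number;
type Sidekick = { name: HeroName; level: PowerLevel };

// There is no `extends` clause for aliases, but intersection
// types let us compose a wider shape from an existing one:
type NamedSidekick = Sidekick & { alias: string };

const robin: NamedSidekick = { name: "Dick Grayson", level: 7, alias: "Robin" };
console.log(robin.alias); // → "Robin"
```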
<p><strong>Interfaces</strong> can define only reference (object) types. TS documentation recommends that we use <strong>interfaces</strong> for
object type literals. Interfaces can be <strong>extended</strong> and can have multiple merged declarations, so users of your APIs
may benefit from it. <strong>Interface</strong> is a <strong>“virtual” structure</strong> that never appears in compiled JavaScript.</p>
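<p>A small sketch of the merged-declarations behaviour mentioned above (hypothetical names): two interface declarations that share a name are merged into a single shape, which is how users of an API can add fields to interfaces they don’t own.</p>

```typescript
// Two declarations of the same interface name...
interface Gadget {
  name: string;
}
interface Gadget {
  inventor: string;
}

// ...are merged: a valid Gadget must satisfy both declarations.
const batarang: Gadget = { name: "Batarang", inventor: "Lucius Fox" };
console.log(`${batarang.name} by ${batarang.inventor}`);
```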
<p><strong>Classes,</strong> as opposed to interfaces, not only check how an object looks but ensure a <strong>concrete implementation</strong> as
well.</p>
<p><strong>Classes</strong> allow us to specify the <strong>access modifiers</strong> of their members.</p>
<p>The TS compiler always transpiles classes to actual JS code, so they should be used when an actual instance of the class
is created. <strong>EcmaScript</strong> native classes can also be used as type definitions.</p>
<div class="highlight"><pre><span></span><code><span class="nt">let</span><span class="w"> </span><span class="nt">numbersOnly</span><span class="o">:</span><span class="w"> </span><span class="nt">RegExp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">/</span><span class="cp">[</span><span class="mi">0</span><span class="o">-</span><span class="mi">9</span><span class="cp">]</span><span class="o">/</span><span class="nt">g</span><span class="o">;</span>
<span class="nt">let</span><span class="w"> </span><span class="nt">name</span><span class="o">:</span><span class="w"> </span><span class="nt">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Jack"</span><span class="o">;</span>
</code></pre></div>
<p><strong>Abstract classes</strong> are really a mix of the previous two. As it’s not possible to instantiate them directly, you can
only use them as the type of an instance created from a derived class, and through that type only the members declared
on the abstract class itself are visible.</p>
<p><strong>ACCESS MODIFIERS</strong></p>
<p>Unfortunately, JS doesn’t provide access modifiers, so you can’t create, for example, a truly private property. It’s
possible to mimic private property behaviour with closures and additional libraries, but such code looks a bit fuzzy and
rather long. TS solves this issue just like any other object-oriented programming language. There are three access
modifiers available in TS: <strong>public</strong>, <strong>private</strong> and <strong>protected</strong>.</p>
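<p>A minimal sketch of the three modifiers (all names below are illustrative). Note that they are enforced by the compiler only; the emitted JavaScript contains plain properties:</p>

```typescript
class Vault {
  public label: string;        // readable from anywhere
  protected location: string;  // visible here and in subclasses
  private secret: string;      // visible only inside Vault

  constructor(label: string, location: string, secret: string) {
    this.label = label;
    this.location = location;
    this.secret = secret;
  }

  public reveal(): string {
    // A private member is accessible inside its own class.
    return this.secret;
  }
}

class BankVault extends Vault {
  public describe(): string {
    // `location` is protected, so the subclass may read it;
    // reading `this.secret` here would be a compile-time error.
    return `${this.label} at ${this.location}`;
  }
}

const vault = new BankVault("Gold vault", "Gotham", "1234");
console.log(vault.describe()); // → "Gold vault at Gotham"
// vault.secret;   // compile error: private
// vault.location; // compile error: protected
```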
<p><strong>PART 2. THE APPLICATION OR A DIVE INTO THE CODE.</strong></p>
<p>So now, when we know and have all the tooling we need, we can build something great. For example, I would like to build
the backend part of a MEAN (MongoDB, ExpressJS, Angular, NodeJS) stack: a simple RESTful service that will allow us to
perform CRUD operations on some articles. As including all the code would make this post too long, I’ll skip some parts, but
you can always check the full version in the GitHub
<a href="https://github.com/DmitriyNoa/typescript-nodejs-sample">repository</a>.</p>
<p>For project structure, see below:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/54ae4bee074c0dd5d7198e7dee7d21e3f2c04587_screen-shot-2018-03-22-at-14.12.02.png?auto=compress,format"></p>
<p>To make code more declarative, easier to maintain and reusable, I’ll take advantage of ES6 classes and split the
application into logical parts. I’m leaving most of the explanation in the comments.</p>
<p><strong>./classes/Server.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">express</span> <span class="kn">from</span> <span class="s2">"express"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">http</span> <span class="kn">from</span> <span class="s2">"http"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">bodyParser</span> <span class="kn">from</span> <span class="s2">"body-parser"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">mongoose</span> <span class="kn">from</span> <span class="s2">"mongoose"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">dotenv</span> <span class="kn">from</span> <span class="s2">"dotenv"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">logger</span> <span class="kn">from</span> <span class="s2">"morgan"</span><span class="p">;</span>
<span class="o">/*</span> <span class="n">Create</span> <span class="n">a</span> <span class="n">reusable</span> <span class="n">server</span> <span class="k">class</span> <span class="nc">that</span> <span class="n">will</span> <span class="n">bootstrap</span> <span class="n">basic</span> <span class="n">express</span> <span class="n">application</span><span class="o">.</span> <span class="o">*/</span>
<span class="n">export</span> <span class="k">class</span> <span class="nc">Server</span> <span class="p">{</span>
/* Most of the core properties below have their types defined by already existing interfaces. IDE users can jump directly to an interface definition by clicking on its name. */
<span class="o">/*</span> <span class="n">protected</span> <span class="n">member</span> <span class="n">will</span> <span class="n">be</span> <span class="n">accessible</span> <span class="kn">from</span> <span class="nn">deriving</span> <span class="n">classes</span><span class="o">.</span> <span class="o">*/</span>
<span class="n">protected</span> <span class="n">app</span><span class="p">:</span> <span class="n">express</span><span class="o">.</span><span class="n">Application</span><span class="p">;</span>
<span class="o">/*</span> <span class="n">And</span> <span class="n">here</span> <span class="n">we</span> <span class="n">are</span> <span class="n">using</span> <span class="n">http</span> <span class="n">module</span> <span class="n">Server</span> <span class="k">class</span> <span class="nc">as</span> <span class="n">a</span> <span class="nb">type</span><span class="o">.</span> <span class="o">*/</span>
<span class="n">protected</span> <span class="n">server</span><span class="p">:</span> <span class="n">http</span><span class="o">.</span><span class="n">Server</span><span class="p">;</span>
<span class="n">private</span> <span class="n">db</span><span class="p">:</span> <span class="n">mongoose</span><span class="o">.</span><span class="n">Connection</span><span class="p">;</span>
<span class="o">/*</span> <span class="n">restrict</span> <span class="n">member</span> <span class="n">scope</span> <span class="n">to</span> <span class="n">Server</span> <span class="k">class</span> <span class="nc">only</span> <span class="o">*/</span>
<span class="n">private</span> <span class="n">routes</span><span class="p">:</span> <span class="n">express</span><span class="o">.</span><span class="n">Router</span><span class="p">[]</span> <span class="o">=</span> <span class="p">[];</span>
/* This could also be written using generics syntax; choose whichever reads better to you:
private routes: Array&lt;express.Router&gt; = [];
*/
<span class="o">/*</span> <span class="n">public</span> <span class="n">modifiers</span> <span class="n">are</span> <span class="n">default</span> <span class="n">ones</span> <span class="ow">and</span> <span class="n">could</span> <span class="n">be</span> <span class="n">omitted</span><span class="o">.</span> <span class="n">I</span> <span class="n">prefer</span> <span class="n">to</span> <span class="n">always</span> <span class="nb">set</span> <span class="n">them</span><span class="p">,</span> <span class="n">so</span> <span class="n">code</span> <span class="n">style</span> <span class="ow">is</span> <span class="n">more</span> <span class="n">consistent</span><span class="o">.</span> <span class="o">*/</span>
<span class="n">public</span> <span class="n">port</span><span class="p">:</span> <span class="n">number</span><span class="p">;</span>
<span class="n">constructor</span><span class="p">(</span><span class="n">port</span><span class="p">:</span> <span class="n">number</span> <span class="o">=</span> <span class="mi">3000</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">app</span> <span class="o">=</span> <span class="n">express</span><span class="p">();</span>
<span class="n">this</span><span class="o">.</span><span class="n">port</span> <span class="o">=</span> <span class="n">port</span><span class="p">;</span>
<span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">"port"</span><span class="p">,</span> <span class="n">port</span><span class="p">);</span>
<span class="n">this</span><span class="o">.</span><span class="n">config</span><span class="p">();</span>
<span class="n">this</span><span class="o">.</span><span class="n">database</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">private</span> <span class="n">config</span><span class="p">()</span> <span class="p">{</span>
<span class="o">//</span> <span class="nb">set</span> <span class="n">bodyParser</span> <span class="n">middleware</span> <span class="n">to</span> <span class="n">get</span> <span class="n">form</span> <span class="n">data</span>
<span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="n">bodyParser</span><span class="o">.</span><span class="n">json</span><span class="p">());</span>
<span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="n">bodyParser</span><span class="o">.</span><span class="n">urlencoded</span><span class="p">({</span> <span class="n">extended</span><span class="p">:</span> <span class="n">true</span> <span class="p">}));</span>
<span class="o">//</span> <span class="n">HTTP</span> <span class="n">requests</span> <span class="n">logger</span>
<span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="n">logger</span><span class="p">(</span><span class="s2">"dev"</span><span class="p">));</span>
<span class="n">this</span><span class="o">.</span><span class="n">server</span> <span class="o">=</span> <span class="n">http</span><span class="o">.</span><span class="n">createServer</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="err">!</span><span class="n">process</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">PRODUCTION</span><span class="p">)</span> <span class="p">{</span>
<span class="n">dotenv</span><span class="o">.</span><span class="n">config</span><span class="p">({</span> <span class="n">path</span><span class="p">:</span> <span class="s2">".env.dev"</span> <span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="o">/*</span> <span class="n">A</span> <span class="n">simple</span> <span class="n">public</span> <span class="n">method</span> <span class="n">to</span> <span class="n">add</span> <span class="n">routes</span> <span class="n">to</span> <span class="n">the</span> <span class="n">application</span><span class="o">.</span> <span class="o">*/</span>
<span class="n">public</span> <span class="n">addRoute</span><span class="p">(</span><span class="n">routeUrl</span><span class="p">:</span> <span class="n">string</span><span class="p">,</span> <span class="n">routerHandler</span><span class="p">:</span> <span class="n">express</span><span class="o">.</span><span class="n">Router</span><span class="p">):</span> <span class="n">void</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">routes</span><span class="o">.</span><span class="n">indexOf</span><span class="p">(</span><span class="n">routerHandler</span><span class="p">)</span> <span class="o">===</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
this.routes.push(routerHandler);
<span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="n">routeUrl</span><span class="p">,</span> <span class="n">routerHandler</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">private</span> <span class="n">database</span><span class="p">():</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">mongoose</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">process</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">MONGODB_URI</span><span class="p">);</span>
<span class="n">this</span><span class="o">.</span><span class="n">db</span> <span class="o">=</span> <span class="n">mongoose</span><span class="o">.</span><span class="n">connection</span><span class="p">;</span>
<span class="n">this</span><span class="o">.</span><span class="n">db</span><span class="o">.</span><span class="n">once</span><span class="p">(</span><span class="s2">"open"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="s2">"Database started"</span><span class="p">);</span>
<span class="p">});</span>
<span class="n">mongoose</span><span class="o">.</span><span class="n">connection</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s2">"error"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="s2">"MongoDB connection error. Please make sure MongoDB is running."</span><span class="p">);</span>
<span class="n">process</span><span class="o">.</span><span class="n">exit</span><span class="p">();</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">start</span><span class="p">():</span> <span class="n">void</span> <span class="p">{</span>
this.server.listen(this.app.get("port"), () => {
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">((</span><span class="s2">" App is running at http://localhost:</span><span class="si">%d</span><span class="s2"> in </span><span class="si">%s</span><span class="s2"> mode"</span><span class="p">),</span> <span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"port"</span><span class="p">),</span> <span class="n">this</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"env"</span><span class="p">));</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="s2">" Press CTRL-C to stop</span><span class="se">\n</span><span class="s2">"</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">export</span> <span class="n">default</span> <span class="n">Server</span><span class="p">;</span>
</code></pre></div>
<p>I have set the <em>server</em> and <em>app</em> properties to <strong>protected</strong>, so they cannot be accessed or modified from outside the class, yet remain reachable from derived classes. For example, if we want to add web socket
support to our server, we can extend it with a new class and use the <em>server</em> or <em>app</em> properties as needed.</p>
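<p>As a minimal standalone sketch (with hypothetical class names), this is the behaviour <strong>protected</strong> gives us: the member is hidden from outside callers but readable from subclasses.</p>

```typescript
// Sketch: 'protected' hides a member from external code while
// keeping it visible to subclasses.
class Base {
  protected port: number = 8080;
}

class Derived extends Base {
  // OK: a derived class may read the protected member.
  public describe(): string {
    return `listening on ${this.port}`;
  }
}

const d = new Derived();
console.log(d.describe()); // prints "listening on 8080"
// d.port; // compile error: 'port' is protected and only
//         // accessible within 'Base' and its subclasses
```

The SocketServer class below relies on exactly this rule to reach the parent's <em>server</em> property.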
<p><strong>./classes/SocketServer.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">Server</span> <span class="kn">from</span> <span class="s2">"./Server"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">io</span> <span class="kn">from</span> <span class="s2">"socket.io"</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">SocketServer</span> <span class="n">extends</span> <span class="n">Server</span> <span class="p">{</span>
<span class="o">/*</span> <span class="n">this</span><span class="o">.</span><span class="n">server</span> <span class="n">of</span> <span class="n">a</span> <span class="n">parent</span> <span class="n">Server</span> <span class="k">class</span> <span class="nc">is</span> <span class="n">protected</span> <span class="nb">property</span><span class="p">,</span> <span class="n">so</span> <span class="n">we</span> <span class="n">can</span> <span class="n">access</span> <span class="n">it</span> <span class="n">to</span> <span class="n">add</span> <span class="n">a</span> <span class="n">socket</span><span class="o">.</span> <span class="o">*/</span>
<span class="n">private</span> <span class="n">socketServer</span> <span class="o">=</span> <span class="n">io</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">server</span><span class="p">);</span>
<span class="n">constructor</span><span class="p">(</span><span class="n">public</span> <span class="n">port</span><span class="p">:</span> <span class="n">number</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">super</span><span class="p">(</span><span class="n">port</span><span class="p">);</span>
<span class="n">this</span><span class="o">.</span><span class="n">socketServer</span><span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="s1">'connection'</span><span class="p">,</span> <span class="p">(</span><span class="n">client</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="s2">"New connection established"</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">export</span> <span class="n">default</span> <span class="n">SocketServer</span><span class="p">;</span>
</code></pre></div>
<p>Going back to the application.</p>
<p><strong>./app.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">Server</span> <span class="kn">from</span> <span class="s2">"./classes/Server"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">ArticlesRoute</span> <span class="kn">from</span> <span class="s2">"./routes/Articles.route"</span><span class="p">;</span>
<span class="n">const</span> <span class="n">app</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Server</span><span class="p">(</span><span class="mi">8080</span><span class="p">);</span>
<span class="n">const</span> <span class="n">articles</span> <span class="o">=</span> <span class="n">new</span> <span class="n">ArticlesRoute</span><span class="p">();</span>
<span class="n">app</span><span class="o">.</span><span class="n">addRoute</span><span class="p">(</span><span class="s2">"/articles"</span><span class="p">,</span> <span class="n">articles</span><span class="o">.</span><span class="n">router</span><span class="p">);</span>
<span class="n">app</span><span class="o">.</span><span class="n">start</span><span class="p">();</span>
</code></pre></div>
<p>As we can have multiple kinds of articles (products), e.g. electronic, fashion or digital, and they might have rather
different sets of properties, I’ll create a base <strong>abstract class</strong> with the default properties common to
all article types. All other properties can be defined in derived classes.</p>
<p><strong>./classes/AbstractArticle.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="o">//</span> <span class="n">put</span> <span class="n">basic</span> <span class="n">properties</span> <span class="n">into</span> <span class="n">abstract</span> <span class="n">class</span><span class="o">.</span>
<span class="kn">import</span> <span class="nn">ArticleType</span> <span class="kn">from</span> <span class="s2">"../enums/ArticleType"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">BaseArticle</span> <span class="kn">from</span> <span class="s2">"../interfaces/BaseArticle"</span><span class="p">;</span>
<span class="kn">import</span> <span class="o">*</span> <span class="k">as</span> <span class="n">uuid</span> <span class="kn">from</span> <span class="s2">"uuid"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Price</span> <span class="kn">from</span> <span class="s2">"../interfaces/Price"</span><span class="p">;</span>
<span class="n">abstract</span> <span class="k">class</span> <span class="nc">AbstractArticle</span> <span class="n">implements</span> <span class="n">BaseArticle</span> <span class="p">{</span>
<span class="n">public</span> <span class="n">SKU</span><span class="p">:</span> <span class="n">string</span><span class="p">;</span>
<span class="n">constructor</span><span class="p">(</span><span class="n">public</span> <span class="n">name</span><span class="p">:</span> <span class="n">string</span><span class="p">,</span> <span class="n">public</span> <span class="nb">type</span><span class="p">:</span> <span class="n">ArticleType</span><span class="p">,</span> <span class="n">public</span> <span class="n">price</span><span class="p">:</span> <span class="n">Price</span><span class="p">,</span> <span class="n">SKU</span><span class="p">:</span> <span class="n">string</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">SKU</span> <span class="o">=</span> <span class="n">SKU</span> <span class="err">?</span> <span class="n">SKU</span> <span class="p">:</span> <span class="n">uuid</span><span class="o">.</span><span class="n">v4</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">export</span> <span class="n">default</span> <span class="n">AbstractArticle</span><span class="p">;</span>
</code></pre></div>
<p>For this example, I’ll create a Shoe class that derives from the AbstractArticle class and defines its own properties.</p>
<p><strong>./classes/Shoe.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">AbstractActrticle</span> <span class="kn">from</span> <span class="s2">"./AbstractArticle"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">ArticleType</span> <span class="kn">from</span> <span class="s2">"../enums/ArticleType"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Colors</span> <span class="kn">from</span> <span class="s2">"../enums/Colors"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">FashionArticle</span> <span class="kn">from</span> <span class="s2">"../interfaces/FashionArticle"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Price</span> <span class="kn">from</span> <span class="s2">"../interfaces/Price"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Sizes</span> <span class="kn">from</span> <span class="s2">"../enums/Sizes"</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">Shoe</span> <span class="n">extends</span> <span class="n">AbstractActrticle</span> <span class="n">implements</span> <span class="n">FashionArticle</span> <span class="p">{</span>
<span class="n">constructor</span><span class="p">(</span><span class="n">public</span> <span class="n">name</span><span class="p">:</span> <span class="n">string</span><span class="p">,</span>
<span class="n">public</span> <span class="nb">type</span><span class="p">:</span> <span class="n">ArticleType</span><span class="p">,</span>
<span class="n">public</span> <span class="n">size</span><span class="p">:</span> <span class="n">Sizes</span><span class="p">,</span>
<span class="n">public</span> <span class="n">color</span><span class="p">:</span> <span class="n">Colors</span><span class="p">,</span>
<span class="n">public</span> <span class="n">price</span><span class="p">:</span> <span class="n">Price</span><span class="p">,</span>
<span class="n">SKU</span><span class="p">:</span> <span class="n">string</span> <span class="o">=</span> <span class="s2">""</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">super</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">price</span><span class="p">,</span> <span class="n">SKU</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">export</span> <span class="n">default</span> <span class="n">Shoe</span><span class="p">;</span>
</code></pre></div>
<p>You might have noticed that the Shoe class implements the FashionArticle interface. Let’s take a look at it and see how we can
benefit from interfaces and the possibility to extend them.</p>
<p><strong>./interfaces/BaseArticle.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">ArticleType</span> <span class="kn">from</span> <span class="s2">"../enums/ArticleType"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Price</span> <span class="kn">from</span> <span class="s2">"./Price"</span><span class="p">;</span>
<span class="n">interface</span> <span class="n">BaseArticle</span> <span class="p">{</span>
<span class="n">SKU</span><span class="p">:</span> <span class="n">string</span><span class="p">;</span>
<span class="n">name</span><span class="p">:</span> <span class="n">string</span><span class="p">;</span>
<span class="nb">type</span><span class="p">:</span> <span class="n">ArticleType</span><span class="p">;</span>
<span class="n">price</span><span class="p">:</span> <span class="n">Price</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>Interface extension allows us to build on our own interfaces with additional properties.</p>
<p><strong>./interfaces/FashionArticle.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">Colors</span> <span class="kn">from</span> <span class="s2">"../enums/Colors"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">BaseArticle</span> <span class="kn">from</span> <span class="s2">"./BaseArticle"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Sizes</span> <span class="kn">from</span> <span class="s2">"../enums/Sizes"</span><span class="p">;</span>
<span class="n">interface</span> <span class="n">FashionArticle</span> <span class="n">extends</span> <span class="n">BaseArticle</span> <span class="p">{</span>
<span class="n">size</span><span class="p">:</span> <span class="n">Sizes</span><span class="p">;</span>
<span class="n">color</span><span class="p">:</span> <span class="n">Colors</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>We can also extend already existing interfaces. As an example, I’ll create a FashionArticleModel interface that
extends both the Document interface from Mongoose and our FashionArticle interface, so we can use it when creating the
database schema.</p>
<p><strong>./interfaces/FashionArticleModel.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span> <span class="n">Document</span> <span class="p">}</span> <span class="kn">from</span> <span class="s2">"mongoose"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">FashionArticle</span> <span class="kn">from</span> <span class="s2">"./FashionArticle"</span><span class="p">;</span>
<span class="n">interface</span> <span class="n">FashionArticleModel</span> <span class="n">extends</span> <span class="n">FashionArticle</span><span class="p">,</span> <span class="n">Document</span> <span class="p">{};</span>
<span class="n">export</span> <span class="n">default</span> <span class="n">FashionArticleModel</span><span class="p">;</span>
</code></pre></div>
<p>Using the FashionArticleModel interface in the schema allows us to create a model with properties from both the Mongoose
Document and FashionArticle interfaces.</p>
<p><strong>./schemas/FashionArticle.schema.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span> <span class="n">Schema</span><span class="p">,</span> <span class="n">Model</span><span class="p">,</span> <span class="n">model</span><span class="p">}</span> <span class="kn">from</span> <span class="s2">"mongoose"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">FashionArticleModel</span> <span class="kn">from</span> <span class="s2">"../interfaces/FashionArticleModel"</span><span class="p">;</span>
<span class="n">const</span> <span class="n">ArticleSchema</span><span class="p">:</span> <span class="n">Schema</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Schema</span><span class="p">({</span>
<span class="n">name</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
<span class="nb">type</span><span class="p">:</span> <span class="n">Number</span><span class="p">,</span>
<span class="n">size</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
<span class="n">color</span><span class="p">:</span> <span class="n">Number</span><span class="p">,</span>
<span class="n">price</span><span class="p">:</span> <span class="p">{</span>
<span class="n">price</span><span class="p">:</span> <span class="n">Number</span><span class="p">,</span>
<span class="n">basePrice</span><span class="p">:</span> <span class="n">Number</span>
<span class="p">},</span>
<span class="n">SKU</span><span class="p">:</span> <span class="n">String</span>
<span class="p">});</span>
<span class="o">//</span> <span class="n">Use</span> <span class="n">Model</span> <span class="n">generic</span> <span class="kn">from</span> <span class="nn">mongoose</span> <span class="n">to</span> <span class="n">create</span> <span class="n">a</span> <span class="n">model</span> <span class="n">of</span> <span class="n">FashionArticle</span> <span class="nb">type</span><span class="o">.</span>
<span class="n">const</span> <span class="n">ArticleModel</span><span class="p">:</span> <span class="n">Model</span><span class="o">&lt;</span><span class="n">FashionArticleModel</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">model</span><span class="o">&lt;</span><span class="n">FashionArticleModel</span><span class="o">&gt;</span><span class="p">(</span><span class="s2">"Article"</span><span class="p">,</span> <span class="n">ArticleSchema</span><span class="p">);</span>
<span class="n">export</span> <span class="p">{</span><span class="n">ArticleModel</span><span class="p">};</span>
</code></pre></div>
<p>I hope this example application already shows how TypeScript can make your code more declarative, self-documenting and
potentially easier to maintain. Using TS is also a good exercise for frontend developers to learn and apply OOP
paradigms in real-life projects, while backend developers should find many familiar practices and code constructs.</p>
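<p>To illustrate that declarative safety with a small hypothetical sketch (names mirror the article's interfaces, simplified): TypeScript's structural typing rejects objects that do not satisfy an interface, so shape errors surface at compile time rather than at runtime.</p>

```typescript
// Simplified versions of the interfaces used throughout the article.
interface Price {
  price: number;
  basePrice: number;
}

interface BaseArticle {
  SKU: string;
  name: string;
  price: Price;
}

// Compiles: the literal satisfies every member of BaseArticle.
const sneaker: BaseArticle = {
  SKU: "abc-123",
  name: "Runner",
  price: { price: 99, basePrice: 120 },
};

// Would NOT compile -- the compiler reports the missing members:
// const broken: BaseArticle = { name: "Runner" };

console.log(sneaker.price.basePrice); // prints 120
```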
<p>Finally, I suggest jumping into the Articles route class to check the CRUD functionality of the application.</p>
<p><strong>./routes/Articles.route.ts</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span> <span class="n">Request</span><span class="p">,</span> <span class="n">Response</span><span class="p">,</span> <span class="n">Router</span> <span class="p">}</span> <span class="kn">from</span> <span class="s2">"express"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">ArticleType</span> <span class="kn">from</span> <span class="s2">"../enums/ArticleType"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Colors</span> <span class="kn">from</span> <span class="s2">"../enums/Colors"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Shoe</span> <span class="kn">from</span> <span class="s2">"../classes/Shoe"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">Sizes</span> <span class="kn">from</span> <span class="s2">"../enums/Sizes"</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">ArticleModel</span> <span class="p">}</span> <span class="kn">from</span> <span class="s2">"../schemas/FashionArticle.schema"</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">FashionArticleModel</span> <span class="kn">from</span> <span class="s2">"../interfaces/FashionArticleModel"</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">ArticlesRoute</span> <span class="p">{</span>
<span class="n">public</span> <span class="n">router</span><span class="p">:</span> <span class="n">Router</span><span class="p">;</span>
<span class="n">constructor</span><span class="p">()</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">router</span> <span class="o">=</span> <span class="n">Router</span><span class="p">();</span>
<span class="n">this</span><span class="o">.</span><span class="n">init</span><span class="p">();</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">Putting</span> <span class="nb">all</span> <span class="n">routes</span> <span class="n">into</span> <span class="n">one</span> <span class="n">place</span> <span class="n">makes</span> <span class="n">it</span> <span class="n">easy</span> <span class="n">to</span> <span class="n">search</span> <span class="k">for</span> <span class="n">specific</span> <span class="n">functionality</span>
<span class="o">//</span> <span class="n">As these methods will be called in the context of a different class, we need to bind them to the current class instance.</span>
<span class="n">public</span> <span class="n">init</span><span class="p">()</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">router</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s2">"/"</span><span class="p">)</span>
<span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">getArticles</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">this</span><span class="p">))</span>
<span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">createArticle</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">this</span><span class="p">));</span>
<span class="n">this</span><span class="o">.</span><span class="n">router</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s2">"/:id"</span><span class="p">)</span>
<span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">getArticleById</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">this</span><span class="p">))</span>
<span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">updateArticle</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">this</span><span class="p">))</span>
<span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">deleteArticle</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">this</span><span class="p">));</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">I'm not a huge fan of JavaScript callback hell, especially in NodeJS, so I'll use promises instead.</span>
<span class="n">public</span> <span class="n">getArticles</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">Request</span><span class="p">,</span> <span class="n">response</span><span class="p">:</span> <span class="n">Response</span><span class="p">):</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">ArticleModel</span><span class="o">.</span><span class="n">find</span><span class="p">()</span>
<span class="o">.</span><span class="n">then</span><span class="p">((</span><span class="n">articles</span><span class="p">:</span> <span class="n">FashionArticleModel</span><span class="p">[])</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="n">articles</span><span class="p">);</span>
<span class="p">})</span>
<span class="o">.</span><span class="n">catch</span><span class="p">((</span><span class="n">error</span><span class="p">:</span> <span class="n">Error</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">400</span><span class="p">)</span><span class="o">.</span><span class="n">json</span><span class="p">({</span> <span class="n">error</span><span class="p">:</span> <span class="n">error</span> <span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">getArticleById</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">Request</span><span class="p">,</span> <span class="n">response</span><span class="p">:</span> <span class="n">Response</span><span class="p">):</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">const</span> <span class="nb">id</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">params</span><span class="o">.</span><span class="n">id</span><span class="p">;</span>
<span class="n">ArticleModel</span>
<span class="o">.</span><span class="n">findById</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
<span class="o">.</span><span class="n">then</span><span class="p">((</span><span class="n">article</span><span class="p">:</span> <span class="n">FashionArticleModel</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="n">article</span><span class="p">);</span>
<span class="p">})</span>
<span class="o">.</span><span class="n">catch</span><span class="p">((</span><span class="n">error</span><span class="p">:</span> <span class="n">Error</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">400</span><span class="p">)</span><span class="o">.</span><span class="n">json</span><span class="p">({</span> <span class="n">error</span><span class="p">:</span> <span class="n">error</span> <span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">createArticle</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">Request</span><span class="p">,</span> <span class="n">response</span><span class="p">:</span> <span class="n">Response</span><span class="p">):</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">const</span> <span class="n">requestBody</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">body</span><span class="p">;</span>
<span class="n">const</span> <span class="n">article</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Shoe</span><span class="p">(</span><span class="n">requestBody</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">color</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">price</span><span class="p">);</span>
<span class="n">const</span> <span class="n">articleModel</span> <span class="o">=</span> <span class="n">new</span> <span class="n">ArticleModel</span><span class="p">({</span>
<span class="n">name</span><span class="p">:</span> <span class="n">article</span><span class="o">.</span><span class="n">name</span><span class="p">,</span>
<span class="nb">type</span><span class="p">:</span> <span class="n">article</span><span class="o">.</span><span class="n">type</span><span class="p">,</span>
<span class="n">size</span><span class="p">:</span> <span class="n">article</span><span class="o">.</span><span class="n">size</span><span class="p">,</span>
<span class="n">color</span><span class="p">:</span> <span class="n">article</span><span class="o">.</span><span class="n">color</span><span class="p">,</span>
<span class="n">price</span><span class="p">:</span> <span class="n">article</span><span class="o">.</span><span class="n">price</span><span class="p">,</span>
<span class="n">SKU</span><span class="p">:</span> <span class="n">article</span><span class="o">.</span><span class="n">SKU</span>
<span class="p">});</span>
<span class="n">articleModel</span>
<span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="o">.</span><span class="n">then</span><span class="p">((</span><span class="n">createdArticle</span><span class="p">:</span> <span class="n">FashionArticleModel</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="n">createdArticle</span><span class="p">);</span>
<span class="p">})</span>
<span class="o">.</span><span class="n">catch</span><span class="p">((</span><span class="n">error</span><span class="p">:</span> <span class="n">Error</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">400</span><span class="p">)</span><span class="o">.</span><span class="n">json</span><span class="p">({</span> <span class="n">error</span><span class="p">:</span> <span class="n">error</span> <span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">updateArticle</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">Request</span><span class="p">,</span> <span class="n">response</span><span class="p">:</span> <span class="n">Response</span><span class="p">):</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">const</span> <span class="nb">id</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">params</span><span class="o">.</span><span class="n">id</span><span class="p">;</span>
<span class="n">const</span> <span class="n">requestBody</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">body</span><span class="p">;</span>
<span class="n">const</span> <span class="n">article</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Shoe</span><span class="p">(</span><span class="n">requestBody</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">color</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">price</span><span class="p">,</span> <span class="n">requestBody</span><span class="o">.</span><span class="n">SKU</span><span class="p">);</span>
<span class="n">ArticleModel</span><span class="o">.</span><span class="n">findByIdAndUpdate</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">article</span><span class="p">)</span>
<span class="o">.</span><span class="n">then</span><span class="p">((</span><span class="n">updatedArticle</span><span class="p">:</span> <span class="n">FashionArticleModel</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="n">updatedArticle</span><span class="p">);</span>
<span class="p">})</span>
<span class="o">.</span><span class="n">catch</span><span class="p">((</span><span class="n">error</span><span class="p">:</span> <span class="n">Error</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">400</span><span class="p">)</span><span class="o">.</span><span class="n">json</span><span class="p">({</span> <span class="n">error</span><span class="p">:</span> <span class="n">error</span> <span class="p">});</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">deleteArticle</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">Request</span><span class="p">,</span> <span class="n">response</span><span class="p">:</span> <span class="n">Response</span><span class="p">):</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">const</span> <span class="n">articleId</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">params</span><span class="o">.</span><span class="n">id</span><span class="p">;</span>
<span class="n">ArticleModel</span><span class="o">.</span><span class="n">findByIdAndRemove</span><span class="p">(</span><span class="n">articleId</span><span class="p">)</span>
<span class="o">.</span><span class="n">then</span><span class="p">((</span><span class="n">res</span><span class="p">:</span> <span class="nb">any</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">204</span><span class="p">)</span><span class="o">.</span><span class="n">end</span><span class="p">();</span>
<span class="p">})</span>
<span class="o">.</span><span class="n">catch</span><span class="p">((</span><span class="n">error</span><span class="p">:</span> <span class="n">Error</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">console</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">400</span><span class="p">)</span><span class="o">.</span><span class="n">json</span><span class="p">({</span> <span class="n">error</span><span class="p">:</span> <span class="n">error</span> <span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">export</span> <span class="n">default</span> <span class="n">ArticlesRoute</span><span class="p">;</span>
</code></pre></div>
<p>In conclusion, <strong>TypeScript</strong> is a powerful tool that brings a flexible, rich type-checking system to your
code. It also introduces well-known patterns such as <strong>interfaces</strong>, <strong>abstract classes</strong> and <strong>access
modifiers</strong>.</p>
<p>Of course, the application is not ready for production use, as we have to cover everything with tests and set up a
proper development environment, but we can cover that in the future.</p>
<p><em>Work with engineers like Dmytro. Have a look at our <a href="https://jobs.zalando.com/jobs/952450-senior-java-engineer-content-solutions-team/">jobs
page</a>.</em></p>The Art of Ontology2018-03-20T00:00:00+01:002018-03-20T00:00:00+01:00Katariina Karitag:engineering.zalando.com,2018-03-20:/posts/2018/03/semantic-web-technologies.html<p>Introducing Semantic Web Technologies at Zalando</p>
<p>Two years ago, in March 2016, the newly-opened Helsinki office wondered about the expressivity of our current product data. What do attributes like <em>material construction</em> or <em>sport quality</em> really mean, and how can we use them to create
meaningful fashion experiences online for our customers?</p>
<p>It was around that time that, after working for five years solely in the art sector and building digital strategies for
various classical music organisations, I applied to Zalando in hopes of improving my existing technology skills and
perhaps learning some <a href="https://engineering.zalando.com/posts/2018/01/why-we-do-scala.html">Scala</a>. It quickly became
apparent that what the company and teams in Helsinki actually needed was a technical solution for better fashion
understanding. In order to innovate the online shopping experience, we need to be fluent in fashion, and for that we
need background information. We need to be able to know, for example, what vegan clothes are. While product data does
express whether the item is made of wool or leather, etc., nowhere is it explicitly expressed that the absence of these
materials equals “vegan.” So how can we, and the customer, know this?</p>
<p>So, I gave the Scala course a pause and dove into building a fashion knowledge graph for Zalando using
<a href="https://www.w3.org/TR/turtle/">Turtle</a>, <a href="https://json-ld.org/">JSON-LD</a> and <a href="https://www.python.org/">Python</a>.</p>
<h3>The Benefits of a Knowledge Graph at Zalando</h3>
<p>Semantic web technologies use knowledge graphs, technically known as named directed graphs, to provide background data
held by humans in a machine-readable form. For example, the knowledge graph knows that:</p>
<ol>
<li>Silk and wool are animal-based fibers and leather is an animal-based material.</li>
<li>Vegan means refusing to use any animal-based products.</li>
</ol>
<p>Therefore, an application using the knowledge graph can interpret the product data of wool, silk, or leather as not
suitable for vegans, or only offer products that are vegan by excluding those items with materials made from animals.</p>
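<p>As a toy illustration, that inference can be sketched in a few lines of plain Python (the product data below is invented, and this is not our production graph; in practice the facts live as RDF triples and are queried rather than hard-coded):</p>

```python
# Background knowledge (normally triples in the knowledge graph, not code):
# silk and wool are animal-based fibres, leather is an animal-based material.
ANIMAL_BASED = {"silk", "wool", "leather"}

def is_vegan(materials):
    """Vegan means refusing animal-based products, so a product qualifies
    only if none of its materials appear in ANIMAL_BASED."""
    return not set(materials) & ANIMAL_BASED

# Hypothetical catalogue entries, for illustration only.
products = {
    "canvas sneaker": ["cotton", "rubber"],
    "winter boot": ["leather", "rubber"],
    "knit jumper": ["wool"],
}

vegan_products = sorted(name for name, m in products.items() if is_vegan(m))
```

<p>Note that the word "vegan" never appears in the product data itself; it is derived entirely from the explicit background facts.</p>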
<p>In practice, this means that Zalando can:</p>
<ol>
<li>Understand the word "vegan" in a search without ever expressing it in our product data explicitly,</li>
<li>Offer special values in filters, such as "vegan",</li>
<li>Show a page with knowledge about vegan clothing that includes articles on vegan fashion, outfits for vegans, vegan
clothing collections, and vegan-appropriate products from our catalogue.</li>
</ol>
<p>Not only do we make the implicit human knowledge explicit, we also store what kind of information it is and how it
relates to other kinds of knowledge. For example, we know that being vegan is a type of global awareness, as is favoring
sustainable clothes. By understanding the underlying structure of these snippets of human understanding, we can do even
more. We can:</p>
<ol>
<li>Intelligently suggest links for further browsing.</li>
<li>Apply business rules. For example, when a customer is browsing a particular brand, we might not suggest competing
brands.</li>
<li>Know which attributes are complementary to each other and which ones are opposites of each other.</li>
</ol>
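<p>Storing how snippets of knowledge relate to each other can itself be sketched as triples; the facts below are invented examples in plain Python, not Zalando's actual graph:</p>

```python
# Each fact is a (subject, predicate, object) triple, mirroring how a
# named directed graph stores typed relations between concepts.
RELATIONS = [
    ("vegan", "type_of", "global awareness"),
    ("sustainable", "type_of", "global awareness"),
    ("belt", "complements", "dress"),
    ("sporty", "opposite_of", "elegant"),
]

def related(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in RELATIONS if s == subject and p == predicate}
```

<p>A query such as <code>related("vegan", "type_of")</code> then tells an application that being vegan is a kind of global awareness, which is exactly the structure the suggestions and business rules above rely on.</p>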
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d17083dedbdee3ad8824a4875ce038df35e45718_katariina.jpg?auto=compress,format"></p>
<p><em>Katariina presents her work in our Dublin tech hub.</em></p>
<h2>Not all Artificial Intelligence is Machine Learning</h2>
<p>The semantic web is often overlooked or confused with machine learning (ML). Nowadays, artificial intelligence is
mostly treated as a synonym for machine learning. However, the semantic web is a branch of artificial intelligence that is very
different from machine learning.</p>
<p>Typically, when I state my profession as a semantic web practitioner, I hear what most of my colleagues hear: “But
Machine Learning does it better!” I agree with <em>better</em> to some extent, but I am not so sure about the <em>it</em>. ML does
many things better; things semantic web technologies are not really good at. Different learning algorithms are great at
finding and recognising patterns in large data sets. A knowledge graph does not really do that. What it is good at is
providing additional human knowledge to what has been learned or could be learned.</p>
<p>Recently, there have been numerous expert discussions on the explainability of machine learning methods, and more
general discussions about making machine learning processes less opaque. Deep-learning methods are statistical models
based on neural networks that fit to large datasets and learn layers of so called "weights," or numeric parameters. They
produce highly complex and functional black boxes that work, but cannot really answer <em>why they work.</em> Further research
explores how to add ontology information to the learning structure and thus to explain what has been learned.</p>
<p>I find this research exciting, because I think we can get the best results by combining both approaches. Years ago <a href="https://users.ics.aalto.fi/praiko/papers/mlg10.pdf">I
co-wrote a paper</a> on how a classic machine learning application of
document classification based on word vectors could be improved by a few percentage points by adding information from
a general-domain knowledge graph. I have therefore found the sentence “But Machine Learning does it better!” too
dualistic for comparing two very different branches of artificial intelligence. It is like saying “fish tastes
better!” when I express my liking for Béarnaise sauce.</p>
<p>I have one more thought to share, which I have developed over the years I worked with the arts. Much
like any artistic endeavour, fashion seeks to find new and unusual combinations. It seeks to break the status quo and to
shock us in just the right way. Trends love contradicting each other: at one point in time pink and red never go together;
at another, the combination is a total trend. Machine learning can only make recommendations based on existing data.
With the knowledge graph, however, we can ask fashion experts to give us these glimpses of the future, describe the
self-contradicting world that fashion is, and help us present our content to our customers in a meaningful way.</p>
<p><em>Want to change the fashion landscape like Katariina? Check out our <a href="https://jobs.zalando.com/en/?location=Helsinki&search=helsinki&utm_source=techblog&utm_medium=blog-b-organic&utm_campaign=2018-zfi&utm_content=03-helsinki-katariina-semweb">Helsinki job
postings</a>.</em></p>
<p><strong>References</strong></p>
<p><a href="https://www.linkedin.com/pulse/shared-realities-ontology-tech-louisa-heinrich/">https://www.linkedin.com/pulse/shared-realities-ontology-tech-louisa-heinrich/</a></p>
<p><a href="https://blog.openai.com/discovering-types-for-entity-disambiguation/">https://blog.openai.com/discovering-types-for-entity-disambiguation/</a></p>
<p><a href="https://www.wired.com/story/greedy-brittle-opaque-and-shallow-the-downsides-to-deep-learning/">https://www.wired.com/story/greedy-brittle-opaque-and-shallow-the-downsides-to-deep-learning/</a></p>Why MobX?2018-03-15T00:00:00+01:002018-03-15T00:00:00+01:00Eugen Kisstag:engineering.zalando.com,2018-03-15:/posts/2018/03/why-mobx.html<p>Removing the burden of state management</p><h3><strong>Removing the burden of state management</strong></h3>
<p>State management and change propagation are arguably some of the hardest challenges in GUI programming. Many tools
promised to save us from their burden. Only a few remained. Among them is <a href="http://mobxjs.github.io/mobx">MobX</a> and its
flavor of <a href="https://github.com/meteor/docs/blob/version-NEXT/long-form/tracker-manual.md#transparent-reactive-programming">Transparent Reactive
Programming</a>.</p>
<p>To understand the appeal of MobX, it is helpful to first understand how React revolutionized GUI programming.
Traditional approaches allow description of the initial state of the GUI. Further GUI state transitions must be
accomplished with references to GUI elements and piece-wise mutations. This is error-prone as edge cases are easily
missed. With React you describe the GUI at any given point in time. Put differently, taking care of GUI state
transitions, e.g. manipulating the DOM, is a thing of the past: Your GUI code has become declarative.</p>
<p>React’s crucial advantage is making the “how” of updating the GUI transparent. That idea is reapplied by MobX. Not for
GUI manipulation code, but instead for state management and change propagation. In fact, combining both React and MobX
is synergistic since even though React nicely takes care of how to update the GUI, when to update the GUI remains
cumbersome without MobX. Cross communication between components is the biggest pain point.</p>
<p>Let us explore an example in React and JavaScript illustrating MobX’s advantages:</p>
<div class="highlight"><pre><span></span><code>import { observable } from 'mobx'
import { observer } from 'mobx-react'

const store = observable({
  itemCount: 0,
  lastItem: null
})

const handleBuy = (name) =&gt; () =&gt; {
  store.itemCount += 1
  store.lastItem = name
}

const handleClearCart = () =&gt; {
  store.itemCount = 0
  store.lastItem = null
}

const Cart = observer(() =&gt;
  &lt;div&gt;
    Items in cart: {store.itemCount}{' '}
    &lt;button onClick={handleClearCart}&gt;Clear cart&lt;/button&gt;
  &lt;/div&gt;
)

const Header = observer(({ children }) =&gt;
  &lt;div&gt;
    Header
    {children}
  &lt;/div&gt;
)

const LastBought = observer(() =&gt;
  &lt;div&gt;Last bought item: {store.lastItem}.&lt;/div&gt;
)

const Main = observer(() =&gt;
  &lt;div&gt;
    &lt;button onClick={handleBuy('shoes')}&gt;Buy shoes&lt;/button&gt;
    &lt;button onClick={handleBuy('shirt')}&gt;Buy shirt&lt;/button&gt;
  &lt;/div&gt;
)
</code></pre></div>
<p>The example represents a very simplified e-commerce page. Any resemblance to Zalando’s fashion store is, of course,
coincidental. <a href="https://llqor09qlq.codesandbox.io/">Here is a demo</a> and its <a href="https://codesandbox.io/s/github/eugenkiss/mobx-cart-example-zalando">live-editable source
code</a>. The behavior is as follows: Initially, your
cart is empty. When you click on “Buy shoe” your cart item count increases by one and the recently bought component
shows “shoe”. A click on “Buy shirt” does the same for “shirt”. You can also clear your cart. In the live example you
will see that only the cart in the header is re-rendered but not the header itself. You will also see that the recently
bought component does not rerender when you buy the same product in succession.</p>
<p>Notice that the dependency between the observable variables itemCount and lastItem, and the components is not explicitly
specified. Yet, the components correctly and efficiently rerender on changes. You may wonder how this is accomplished.
The answer is that MobX implicitly builds up a dependency graph during execution of the components’ render functions
that tracks which components to rerender when an observable variable changes. A way to think of MobX is in terms of a
spreadsheet where your components are formulas of observable variables. Regardless of how the “magic” works underneath,
and which analogy to use, the result is clear: You are freed from the burden of explicitly managing change
propagation!</p>
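<p>The tracking idea can be sketched in a few lines. The following is an illustrative model in Python, not MobX's actual implementation or API; the names <code>Observable</code> and <code>autorun</code> merely echo MobX's vocabulary:</p>

```python
class Observable:
    """Holds a value and remembers which computations read it."""
    _tracking = None  # the computation currently executing, if any

    def __init__(self, value):
        self._value = value
        self._subscribers = set()

    def get(self):
        # Reading inside a tracked computation registers a dependency.
        if Observable._tracking is not None:
            self._subscribers.add(Observable._tracking)
        return self._value

    def set(self, value):
        if value == self._value:
            return  # unchanged value: nothing reruns
        self._value = value
        for computation in list(self._subscribers):
            computation()  # rerun exactly the computations that read us

def autorun(fn):
    """Run fn once while recording its reads; rerun it on relevant changes."""
    def computation():
        Observable._tracking = computation
        try:
            fn()
        finally:
            Observable._tracking = None
    computation()
    return computation

# Usage: the "component" below reruns only when item_count actually changes.
item_count = Observable(0)
renders = []
autorun(lambda: renders.append(f"Items in cart: {item_count.get()}"))
item_count.set(1)
item_count.set(1)  # same value again: the dependency graph skips the rerun
```

<p>After the two <code>set</code> calls the render function has run exactly twice, which is the spreadsheet-like behaviour described above.</p>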
<p>To sum up, MobX is a pragmatic, non-ceremonial, and efficient solution to the challenge of state management and change
propagation. It works by building up a run-time dependency graph between observable variables and components. Using both
React and MobX together is synergistic. Currently, MobX is used in just a few projects inside Zalando. Seeing as MobX
offers a great deal to anyone writing GUI programs, I am confident that its adoption will rise in the future.</p>
<p><em>Keep in touch with fresh Zalando Tech news. Follow us on <a href="https://twitter.com/ZalandoTech">Twitter</a>.</em></p>Zalando Tech: Lisbon2018-03-13T00:00:00+01:002018-03-13T00:00:00+01:00Vivi Brooketag:engineering.zalando.com,2018-03-13:/posts/2018/03/interview-michael-duergner-lisbon.html<p>Engineering Lead, Michael Duergner on the company’s newest tech hub</p><h3><strong>Engineering Lead, Michael Duergner on the company’s newest tech hub</strong></h3>
<p>Now in its 10th year of operation, Zalando continues to focus on a smooth and inspiring digital experience for its
customers with the <a href="https://corporate.zalando.com/en/newsroom/en/stories/photo-essay-lisbon-tech-hub-announcement">opening of its latest tech hub in
Lisbon</a>, among other
initiatives. We talk to engineering lead, Michael Duergner, who tells us more about the newest Zalando location and the
company’s <a href="https://annual-report.zalando.com/2017/magazine/zalando-gets-personal/">personalization</a> plans.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1a58748b79d065038e10dbf705ebea71ab82a301_2018-01-20-photo-00003432.jpg?auto=compress,format"></p>
<p><em>Michael leads the engineering team in Lisbon, Zalando's third international tech hub.</em></p>
<p><strong>Do you remember your first encounter with “tech”? What was it?
</strong>Oh, that’s a very long time ago. My dad took care of organizing the annual running event for the local sports club and
just got his first “laptop.” Back then, it was more like a huge battery attached to a tiny screen which had orange text
on a black background. It had an enormous 40MB hard disk. I did my first “programming” on it with some “scripting” for
the “database” solution to improve the process for printing the participation certificates for all the runners.</p>
<p><strong>Tell us about your time at Zalando so far.
</strong>I joined the company as part of the acquisition of my startup, AMAZE, so my onboarding was a little different compared
to the normal Zalando onboarding. The first real accomplishment for us as a team back then was navigating the Zalando
tech stack, and getting our services migrated over and deployed. Afterwards, I had the possibility to move to our
Helsinki office and lead a team there focusing on building prototypes to integrate external partners into Zalando. I
also got involved with a couple of the guilds early on; mainly technologists, API and Open Source groups, and I enjoyed
having an impact on shaping our future with the technologies we use, and encouraging teams to write better APIs.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/fa356d14ec23a8f79281d2a00999ed01632c778e_2017-03-21-helsinki_michael-duergner-9512.jpg?auto=compress,format"></p>
<p><em>Michael in our Helsinki tech hub.</em></p>
<p><strong>Can you talk about the new Zalando tech hub in Lisbon? It will focus on the Digital Experience, right?
</strong>Yes, we will be working on Digital Experience related topics only in Lisbon. More specifically we will work on how we
can get external partners integrated into the next generation of the Fashion Store. For example, we’re working on
integrating outfits from external publishers and bloggers in a really scalable way.</p>
<p><strong>What are your first impressions of Lisbon both as a city to live in and as a place to work? What about the Lisbon tech
scene?
</strong>My first impressions of Lisbon are really positive. You can feel the lively startup scene and a strong “getting things
done” vibe around here. People are really motivated to build something new. Besides that, being close to the beach is a
huge plus. The tech scene as a whole is developing quite quickly with new meetups happening regularly. We have already
spoken at a few meetups. Last night, Portuguese native, Luis Mineiro from our Berlin headquarters gave a talk at the
<a href="https://www.meetup.com/DevOps-Lisbon/?_cookie-check=FIaX8LuZDYAkvn_l">DevOps Lisbon meetup</a> group on reliability
patterns. We look forward to hosting our own meetups soon too!</p>
<p><strong>What are some of your favourite tech products and startups?
</strong>I’m a fan of Kickstarter, and have a few of these “this is the next big thing” tech products at home; most of them
related to photography. When it comes to startups, I really like <a href="https://buffer.com/">Buffer</a> for the product they are
building, and <a href="https://basecamp.com/">Basecamp</a> for the type of company they managed to create; focusing on creating a
long-lasting business rather than some super-hyped startup. I’m looking forward to discovering my favorite startups in
Lisbon or Portugal in general. I’ve met some inspiring entrepreneurs at different meetups already.</p>
<p><em>Check out our open positions in <a href="http://zln.do/2EL0do1">Lisbon</a> to join Michael’s engineering team.</em></p>How to Spot a Bad Product2018-03-08T00:00:00+01:002018-03-08T00:00:00+01:00Vadym Kukhtintag:engineering.zalando.com,2018-03-08:/posts/2018/03/how-to-spot-bad-product.html<p>Red flags to look out for in badly written projects.</p><h3><strong>Red flags to look out for in badly written projects.</strong></h3>
<p>Let’s talk about common <strong>red flags</strong>, or in other words, how to identify a badly-written project.</p>
<p>Many of us have experienced a project which is crying and begging for something drastic to change, or even for it to be
put out of its misery altogether, but alas; we don’t have the heart or the resources to “pull the plug” as it were. From
year to year, this poor project grows and grows; each day with new fixes and features being added. It can become so
painful and cumbersome in the end, it simply isn’t tenable anymore. You fix something in one place, but in other parts
something crashes, and so on…</p>
<p>In this post, we’ll help you work out at an early stage whether a project is bad, without reading its source code
(project structure aside).</p>
<p>So, how to define a project that should be rewritten because of age or quality?</p>
<p>There are some <strong>red flags</strong> to look out for:</p>
<p><strong>1. Little or no documentation
</strong>When you open a project for the first time you should be able to:</p>
<ul>
<li>Understand what the project is about.</li>
<li>Be able to see the technologies which are used in it.</li>
<li>Have an overview of the project structure or an overview of the main business workflows.</li>
<li>See the team that maintains the project.</li>
<li>See how the project relates to other teams/services.</li>
<li>See how to run the project.</li>
</ul>
<p>At the very least you should see which team maintains the project; then, if you have any of the other questions listed,
you can contact them and ask about every detail you need. If most of the above points are missing, this
is a major <strong>red flag</strong>.</p>
<p><strong>2. No tests
</strong>When you’re finished with the documentation, you have some information to think about. If it is <a href="https://www.fastcompany.com/3036707/five-ways-to-write-better-technical-documentation">good
documentation</a>, you will be able
to tell whether the project has test coverage or not. But if you have no documentation, you should check the project
structure.</p>
<p>If a project is good you will see a “tests” folder or “.spec files” within services or components with some code in
them.</p>
<p>The <strong>red flag</strong> here is if there are no tests, or if folders or files for tests exist but are empty. Even
worse is a situation where there are tests, but they are commented out.</p>
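<p>This particular check is easy to automate. A small, illustrative Python helper might look like the sketch below; the directory names and the <code>.spec</code> file convention are assumptions, so adjust them to your stack:</p>

```python
from pathlib import Path

def has_real_tests(project_root):
    """Apply the "no tests" red flag: look for .spec files or a tests
    directory, and require at least one non-empty test file."""
    root = Path(project_root)
    candidates = [p for p in root.rglob("*.spec.*") if p.is_file()]
    for name in ("tests", "test"):
        test_dir = root / name
        if test_dir.is_dir():
            candidates += [p for p in test_dir.rglob("*") if p.is_file()]
    # Empty test files count as a red flag too, not only missing ones.
    return any(p.stat().st_size > 0 for p in candidates)
```

<p>Running it over a project returns <code>False</code> both when tests are missing entirely and when the test files exist but are empty.</p>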
<p><strong>3. Repository state and activity
</strong>A project should have a <strong>remote</strong> <a href="https://zalando.github.io/">repository</a>. If a project only exists on someone’s machine, that is
not just your everyday <strong>red flag</strong>; it is a <em>huge</em> sign that something is dramatically wrong with the project.</p>
<p>A typical example of a remote repository is a GitHub repo.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2c1f37b5fe3d4cbc671b4056f5260c95471b5a9b_screen-shot-2018-02-27-at-11.57.25.png?auto=compress,format"></p>
<p>When you go into the repository, you should check for the following:
<strong>a) Huge numbers of branches:</strong></p>
<ul>
<li>Developers are not cleaning up the branches they deployed to production.</li>
<li>Too many features developed at the same time.</li>
</ul>
<p><strong>b) No README.md:</strong></p>
<ul>
<li>Without the help of contributors, you won’t be able to run the project or deploy it correctly. Sometimes instead of
a README.md, it will be Wiki page or GitHub Pages; this is ok.</li>
</ul>
<p><strong>c) Number of issues and how fast they are approached:</strong></p>
<ul>
<li>You can check the latest issues (about the last 10) and check their dates and comments flow. If contributors answer
fast that means they are supporting the project and you can easily use their code in your own codebase.</li>
</ul>
<p><strong>d) Last time any changes were made:</strong></p>
<ul>
<li>You can check latest Pull Requests or open issues.</li>
</ul>
<p>Some rules for <a href="https://github.com/zalando/zalando-howto-open-source">open source</a> repositories when a project has
already been in production some time:</p>
<ul>
<li>Small numbers of contributors.</li>
<li>Small numbers of stars and forks.</li>
</ul>
<p>If you follow the above guidelines, you'll find that you can spot <strong>red flags</strong> more easily and hopefully save yourself
some time!</p>
<p><em>Want to work on great projects? <a href="https://jobs.zalando.com/tech/jobs/">Join our team</a> at Zalando Tech.</em></p>Just Run a Game Day2018-03-06T00:00:00+01:002018-03-06T00:00:00+01:00John Colemantag:engineering.zalando.com,2018-03-06:/posts/2018/03/just-run-game-day.html<p>Scaling operational excellence</p><h3><strong>Scaling operational excellence</strong></h3>
<p>Zalando is iterating on its production incident handling process. The previous process had a dedicated Tier One 24/7 team
that coordinated the incident response communication while escalating to the service-owning Tier Two team(s) for
resolution. That has since been streamlined: the service-owning teams now handle their own incident flow, from alert to
post-mortem, to reduce time to resolution.</p>
<p>Zalando’s Customer Data Platform (CDP) is composed of several teams co-located in our <a href="https://jobs.zalando.com/en/?location=Dublin&search=">Dublin
office</a>, each focused on solving specific insights or core data
problems to improve the fashion experience for our customers. Depending on the needs of each problem space and the
team’s life cycle stage, team sizes can range from <a href="https://www.scrum.org/forum/scrum-forum/5759/development-team-size">two to nine</a> people.
Their autonomy manifests as variation in the technology stacks deployed by each team.</p>
<p>The people who build a service are in the best position to support it, and while small teams can be very focused and
effective, the reality is that operating an on-call rota within a newly formed team of perhaps two people is likely not
sustainable and, I suggest, unnecessary.</p>
<p>This post will describe one method we use to increase the pool of people creating production services who are also
comfortable with sharing the on-call support responsibility. I also address how we establish a positive feedback loop to
build, document and operate better systems.</p>
<h3>The Challenge</h3>
<p>Being on-call every second week, either as primary or secondary support, can compromise work-life balance in subtle ways,
e.g. scheduling leave, illness cover, family events, or meeting up with that friend who’s back in town for just one
night.</p>
<p>As “vertically scaling” an individual is not always practical, we can try to horizontally scale the pool of individuals
available to join the on-call rota.</p>
<p>The individual members within a cross-functional team naturally have varying degrees of SRE/SysOp skills, experience,
and confidence in being on-call. We have found the range to be somewhat more pronounced when teams are composed of a mix
of data scientists and engineers who have diverse backgrounds, and thus have different expectations for what supporting
a production service and responding to an incident entails.</p>
<p>Add to this the challenge that every individual participating in the on-call rota can support <em>other teams’ services</em>
and you may see furrowed brows and a drop in confidence on your colleagues’ faces as they wonder how that might pan out
at 3am.</p>
<p>We learned that <strong>Game Days</strong> help calibrate expectations, raise confidence, and scale the pool of on-call support.</p>
<h3>What Is a Game Day?</h3>
<p>In three words: Exercising operational excellence.</p>
<p>In more words: practicing your incident handling process, dogfooding your documented playbooks, auditing your checks and
alerts coverage, literally testing your backups (at least once a year), providing a fire drill for your service.</p>
<p>Take your pick and add your own.</p>
<p>Typically, a Game Day is a window during work hours when failure modes are artificially induced within one or more
system components in a controlled manner. The supporting team then responds.</p>
<p>The specifics can vary by the maturity of the teams, services, and cultures involved. The window may be a recurring
calendar invite or could have the element of surprise. The failure may be created on a sacrificial, staging, or
production environment. The facilitator may be a rotating team member or an external actor.</p>
<p>The purpose is the same regardless: to provide the team with an opportunity to experiment and learn more about their
systems’ failure modes, validate documented recovery steps, and develop mitigation strategies for that 3am alert, whoever
is on-call.</p>
<p>I imagine you don’t want to be left frustrated, or to leave a colleague frustrated, escalating a production incident
because a trivial but non-obvious recovery step is missing from the 24/7 playbook, simply because this is the first time
it has been tested by someone other than the original author.</p>
<h3>How We Started</h3>
<p>As part of a wider operational excellence effort in CDP to prepare for scaling the number of services offered, we
started small within two of the teams. Within my team of two engineers and two data scientists, we joined the on-call
rota, but some of us felt underprepared, apprehensive even, when shadowing others.</p>
<p>With buy-in from the team, we drafted a simple template document listing inputs, a rough execution plan, and outputs to
record successes and opportunities we expected to discover along the way. I scheduled Monday afternoon in our calendar
so the team knew the window, but intentionally omitted further details about how it might unfold. This was intended to
inject some realism into the scenario and maintain an air of <a href="https://i1.wp.com/www.neuronphaser.com/wp-content/uploads/2015/09/grumpy-cat-as-dungeonmaster.jpg">mystery and
suspense</a>.</p>
<p>We wanted to bootstrap the process in a safe and efficient manner, so we used what we already had available where
possible, and as this was the first iteration, opted to use our staging environment to avoid impacting other teams. We
aim to be able to safely run Game Days on our production environment in coordination with stakeholders to surface
brittle dependencies on our services and prevent cascading failures.</p>
<p>Cloning our production <a href="https://github.com/zalando/zmon">ZMON</a> alerts, tagging them as Game Day-specific, and targeting
the staging environment enables them to be quickly toggled before and after future Game Days. The alerts were configured
to page our CDP 24/7 pilot rota through <a href="https://www.opsgenie.com/">OpsGenie</a> at the time. Our staging environment
writes logs into <a href="http://eu.scalyr.com">Scalyr</a> where we had some saved queries.</p>
<p>We use <a href="https://gatling.io/">Gatling</a> to drive load test scenarios and employ this in two forms:</p>
<ul>
<li>A low-intensity 1 req/s “trace” that runs throughout the Game Day window, giving higher resolution than our usual
60-second-interval ZMON checks.</li>
<li>A variety of relatively high-intensity load tests that run for a shorter period, e.g. 2,000 req/s for 30 minutes
against the particular service under test.</li>
</ul>
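Our actual scenarios are written as Gatling simulations in Scala; as a rough illustration of the low-intensity trace idea only, here is a minimal Python sketch. The `probe` callable, rates and durations are hypothetical, not our real setup:

```python
import time

def trace(probe, rate_per_s=1.0, duration_s=10.0):
    """Call `probe()` at a steady rate, collecting (elapsed, result) samples."""
    samples = []
    interval = 1.0 / rate_per_s
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        samples.append((time.monotonic() - start, probe()))
        time.sleep(interval)
    return samples
```

A real trace would issue an HTTP request inside `probe` and feed the samples to a dashboard, giving a continuous low-resolution heartbeat alongside the coarser monitoring checks.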
<p>We had previously put effort into maintaining our section of the joint CDP
<a href="https://landing.google.com/sre/book/chapters/introduction.html#emergency-response-g0sKcpiL">Playbooks</a> written with
<a href="https://www.gitbook.com/">Gitbooks</a> but they could use some shaking out.</p>
<p>We planned two phases to exercise some first-principles root cause analysis: the first not covered by the
playbook, and the second simulating human error and misconfiguration.</p>
<p>Our API stack, a Play app in front of DynamoDB tables that makes calls to a handful of other services, is relatively
simple, and allows some creativity in breaking it in new and convincing ways to keep future Game Days exciting.</p>
<p>For the first scenario, I opted to simply kill the API EC2 instances. Not having more sophisticated tooling prepared, I
ran a rudimentary Bash for loop from my machine which lists and kills running instances in the stack. It is, admittedly,
a contrived yet possible and effective method of inducing a failure.</p>
<p>For the second, I planned to break our <a href="https://github.com/sebdah/dynamic-dynamodb">Dynamic DynamoDB</a> autoscaler, which
we had been using before AWS AutoScaling was available. We had wrapped it in a docker image to add some heartbeat
keep-alive functionality and to pull the configuration from an S3 bucket on startup. This config is under version
control and peer review, but infrequently changed and manually deployed, presenting an opportunity to demonstrate the
effect of a regex typo on production systems which is only visible under load.</p>
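To make that failure mode concrete: Dynamic DynamoDB selects tables to scale via regular expressions in its configuration. A sketch, with a hypothetical table name rather than our real config, of how a one-letter typo silently excludes a table from autoscaling:

```python
import re

# Hypothetical table name and patterns; real config entries differ.
tables = ["prod-article-events"]

good = re.compile(r"^prod-article-events$")
typo = re.compile(r"^prod-article-evemts$")  # an 'n' replaced with an 'm'

assert [t for t in tables if good.match(t)] == ["prod-article-events"]
# The typo matches nothing, so the autoscaler quietly ignores the table --
# invisible until load pushes it past its provisioned throughput.
assert [t for t in tables if typo.match(t)] == []
```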
<h3>First Game Day</h3>
<p>As you might guess, it did not go as planned.</p>
<p>Phase One alerts manifested as zero active members on the ELB and a spike in errors for which there was no playbook. We
worked through the problem with some occasional coaching by me, checking Scalyr logs, examining the AutoScaler history,
finding the EC2 instance console logs, determining the instances were being gracefully terminated and using CloudTrail
to discover the source of the API calls. This took 56 minutes from the first alert to service restoration, with vital
gains for the team in experience and confidence.</p>
<p>After some reflection and a write-up of the post-mortem document, we triggered Phase Two by launching the more intensive
load test to expose the DynamoDB autoscaler configuration typo (replaced an ‘n’ with an ‘m’). This went more smoothly:
the proverbial ice had been broken and the team followed the playbook steps to triage the issue by manually scaling the
table before tracing the issue.</p>
<p>While our playbook for high DynamoDB latency and throttled requests did mention Dynamic DynamoDB autoscaling failure as
a possible cause, there was no specific playbook addressing it. Resorting to first principles, the logs revealed no
activity relating to the affected tables, which narrowed it to the config, and redeploying the autoscaler restored
service. This time it was 27 minutes from first alert to service restoration.</p>
<p>Even aside from the value of discovering what did and did not work well from a technical perspective, the 51% improvement in
restoration time made it a worthwhile exercise. From our discussion afterwards, we were happy to have gained confidence,
established a baseline of experience, and to have calibrated our expectations of what handling an incident actually
involves.</p>
<p>Since this first iteration, we have run two more Game Days within our team, made them monthly recurring events, and we
are working with the other teams in our group to bootstrap, peer-review and improve the process, and build more
sophisticated tooling. Through this and other efforts, we are seeing improvements in the ratio of production services to
the people supporting them. Faster resolutions and collaborative learning have also emerged in our happily less frequent
post-mortems.</p>
<p><em>Want to know more about Zalando Tech, Dublin? Check out <a href="https://www.youtube.com/watch?v=Bg1Gt5nP4bo&t=2s">this
video</a>!</em></p>Data Analysis with Spark2018-03-01T00:00:00+01:002018-03-01T00:00:00+01:00Mohd Nadeem Akhtartag:engineering.zalando.com,2018-03-01:/posts/2018/03/data-analysis-spark.html<p>Apache’s lightning fast engine for data analysis and machine learning</p><h3><strong>Apache’s lightning fast engine for data analysis and machine learning</strong></h3>
<p>In recent years, there has been a massive shift in the industry towards data-oriented decision making backed by
enormously large data sets. This means that we can serve our customers with more relevant, personalized content.</p>
<p>We in the Digital Experience team are tasked with analysing Big Data in order to gather insights and support the product
team with the decision making process. This includes finding our customers’ top-rated articles. We can then organize
outfits related to those items and help customers make choices in the fashion store. Or we can leverage on similar
customer behaviour and suggest an article they might want in future.</p>
<h3>Problem</h3>
<p>As data is rapidly growing, we need a tool that can clean data and train models fast enough. With large datasets,
it can sometimes take days to finish a job, which results in some very frustrated data analysts. Let’s have a look at some
of the problems:</p>
<ul>
<li>Latency while training the data</li>
<li>Limited built-in performance optimization</li>
</ul>
<h3>Why is Spark good for <a href="http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/">data science</a>?</h3>
<p>Focusing on organizing data and analysing it with the help of Spark, first we will try to understand how Spark behaves
“under the hood.”</p>
<ul>
<li>Simple APIs</li>
<li>Fault tolerance</li>
</ul>
<p>Fault tolerance makes it possible to analyse large datasets without the fear of failure, avoiding cases where one node
out of 1,000 fails and the whole operation has to be performed again.</p>
<p>As personalization becomes an ever more important aspect of the Zalando customer journey, we need a tool that enables us
to serve the content in approximate real time. Hence, we decided to use Spark as it retains fault tolerance and
significantly reduces latency.</p>
<p><em>Note:</em> Spark keeps all data immutable and in-memory. It achieves fault tolerance using ideas from functional
programming: recovery works by replaying functional transformations over the original dataset.</p>
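The recovery idea can be sketched in a few lines of plain Python (an analogy, not Spark’s actual implementation): a lost partition is rebuilt by replaying its recorded chain of pure transformations over the original input.

```python
def recompute(source_partition, lineage):
    """Rebuild a lost partition by replaying its transformation lineage."""
    data = source_partition
    for transform in lineage:  # each step is a pure, replayable function
        data = transform(data)
    return data

# A partition derived as map(*2) then map(+1) can always be recomputed:
lineage = [lambda d: [x * 2 for x in d], lambda d: [x + 1 for x in d]]
print(recompute([1, 2, 3], lineage))  # → [3, 5, 7]
```

Because the transformations are pure and the source data immutable, replaying them always yields the same partition, so no intermediate state needs to be checkpointed to disk.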
<p>For the sake of comparison, let’s recap the Hadoop way of working:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6df7c8b1bd1f26174a773f388fd68957b26f6faa_screen-shot-2018-02-26-at-12.10.53.png?auto=compress,format"></p>
<p>Hadoop saves intermediate states to disk and communicates over the network. If we consider the logistic regression of an
ML (<a href="http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/">machine
learning</a>) model, each iteration’s state is saved back to disk. The process is very slow.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3a5a3c3a07fb118a971a6bbf72ecd78a7d14e5f8_screen-shot-2018-02-26-at-12.12.28.png?auto=compress,format"></p>
<p>In the case of Spark, it works mostly in-memory and tries to minimize data transportation over a network, as seen below:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ea78c9f6167f2c7767724b69c4776a55dbcca1c2_screen-shot-2018-02-26-at-12.15.40.png?auto=compress,format"></p>
<p>Spark is powerful with operations like logistic regression where multiple iterations to train the data are required.</p>
<p><em>Note:</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/34d99d3efb5e467bedaa85ff3702e1b560d2b8c2_screen-shot-2018-02-26-at-12.25.12.png?auto=compress,format"></p>
<p>Spark’s laziness (on transformations) and eagerness (on actions) is how Spark optimises network communication through its
programming model. To support this, Spark defines transformations and actions on Resilient Distributed Datasets (RDDs).
Let’s take a look:</p>
<p><strong>Transformations:</strong> They are lazy. Their resultant RDD is not immediately computed, e.g. map, flatMap.</p>
<p><strong>Actions:</strong> They are eager. Their result is immediately computed, e.g. collect, take(10).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/53665d6d2e70cd22891177b6d825fd4084a27957_screen-shot-2018-02-26-at-12.27.15.png?auto=compress,format"></p>
<p>The execution of filters is deferred until a “take” action is applied. What’s important here is that Spark does not run
the filter over all the logs: the filter executes only when the “take” action is called, and stops as soon as 10 error
logs have been found.</p>
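The lazy/eager split can be mimicked in plain Python with generators. This is an analogy, not Spark itself, and the names are illustrative:

```python
def filter_errors(lines):
    # "Transformation": building the generator does no work yet (lazy).
    return (line for line in lines if "ERROR" in line)

def take(iterable, n):
    # "Action": eagerly consumes the iterator, stopping after n items.
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == n:
            break
    return result

logs = (f"ERROR {i}" if i % 2 == 0 else f"INFO {i}" for i in range(1_000_000))
first_ten = take(filter_errors(logs), 10)  # scans ~20 lines, not a million
```

As with Spark, nothing is scanned until `take` runs, and scanning stops the moment the tenth match is found.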
<p>Long story short, we know that latency makes a big difference and wastes a lot of a data analyst’s time. In-memory
computation significantly lowers latency, and Spark is smart enough to optimize on the basis of actions.</p>
<p>The figure below shows the hierarchy of how Spark functions, coordinated by the Spark context:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/015cffe325d1541fec127229da7f1fda1fc4d4cf_screen-shot-2018-02-26-at-12.30.44.png?auto=compress,format"></p>
<p>Spark is organized in a master/workers topology. In the context of Spark, the driver program is a master node whereas
the executor nodes are the workers. Each worker node runs the same task and returns the results to the master node. The
resource distribution is handled by a cluster manager.</p>
<p>A Spark programming model is a set of processes running on a cluster.</p>
<p>All these processes are coordinated by a driver program:</p>
<ul>
<li>Runs the code that creates the SparkContext, creates RDDs, and sends off transformations and actions.</li>
</ul>
<p>The processes that run the computation and store data of your application are executors:</p>
<ul>
<li>Return computed data to the driver.</li>
<li>Provide in-memory storage for cached RDDs.</li>
</ul>
<p>For Big Data processing, the most common form of data is key-value pairs. In fact, in the 2004 MapReduce research paper
the designers state that key-value pairs are a <a href="https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf">key
choice</a> in designing
MapReduce. Spark enables us to project such complex data types down to key-value pairs as a Pair RDD.</p>
<p><em>Useful:</em> Pair RDDs allow you to act on each key in parallel or regroup data across the network. They also provide
additional methods such as groupByKey(), reduceByKey(), and join().</p>
<p>The data is distributed over different nodes, and operations like groupByKey shuffle it over the network.</p>
<p>We know reshuffling data over the network is expensive. I’ll explain shortly why the data gets reshuffled at all.</p>
<p>Let’s take an example:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ef9433d40f894342fdc842a5f0f4d929abd507b8_screen-shot-2018-02-26-at-12.34.26.png?auto=compress,format"></p>
<p><em>Goal:</em> Calculate how many articles and how much money is spent by each individual over the course of a month.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/37a083231113ac9fbb9f7eed4884ee0d14643261_screen-shot-2018-02-26-at-12.35.31.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f40e2b55e9919aae71afc2d06ff0995e54a1d406_screen-shot-2018-02-26-at-12.36.00.png?auto=compress,format"></p>
<p>Here, we can see that groupByKey shuffles the data over the network. If that’s not absolutely required, we shouldn’t do
it: we can use reduceByKey instead of groupByKey and reduce the data flowing over the network.</p>
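A plain-Python sketch of why reduceByKey ships less data (a simplification of Spark’s behaviour, with made-up purchase records): values are combined locally within each partition first, so only one record per key per partition has to cross the network.

```python
def reduce_by_key(partitions, combine):
    # Map side: pre-aggregate within each partition before any shuffle.
    per_partition = []
    for partition in partitions:
        acc = {}
        for key, value in partition:
            acc[key] = combine(acc[key], value) if key in acc else value
        per_partition.append(acc)
    # Shuffle/reduce side: merge the already-small per-partition results.
    merged = {}
    for acc in per_partition:
        for key, value in acc.items():
            merged[key] = combine(merged[key], value) if key in merged else value
    return merged

# (articles bought, money spent) per person, spread over two partitions
partitions = [
    [("Ada", (1, 20.0)), ("Bo", (2, 35.0)), ("Ada", (1, 15.0))],
    [("Ada", (2, 60.0))],
]
totals = reduce_by_key(partitions, lambda a, b: (a[0] + b[0], a[1] + b[1]))
# Only 3 combined records cross the "network" instead of all 4 raw ones.
```

With groupByKey, every raw record for a key would be shuffled before aggregation; here, the map-side combine does most of the work in place.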
<h3>Optimizing with Partitioners</h3>
<p>There are a few different kinds of partitioners available:</p>
<ol>
<li>Hash partitioners</li>
<li>Range partitioners</li>
</ol>
<p>Partitioning can bring enormous performance gains, especially in the shuffling phase.</p>
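As a rough sketch (not Spark’s exact implementation), the two kinds differ only in how they map a key to a partition index:

```python
def hash_partition(key, num_partitions):
    # HashPartitioner: the same key always lands in the same partition.
    return hash(key) % num_partitions

def range_partition(key, boundaries):
    # RangePartitioner: sorted boundary keys split the key space into
    # ordered partitions, e.g. ["g", "n", "t"] yields four of them.
    for i, boundary in enumerate(boundaries):
        if key < boundary:
            return i
    return len(boundaries)

assert hash_partition("Ada", 4) == hash_partition("Ada", 4)
assert range_partition("apple", ["g", "n", "t"]) == 0
assert range_partition("zebra", ["g", "n", "t"]) == 3
```

Pre-partitioning pays off when the same keys are joined or reduced repeatedly, because co-located keys need no further shuffle.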
<h3>Spark SQL for Structured data</h3>
<p>SQL is widely used for analytics, but it’s a pain to connect data processing pipelines like Spark or Hadoop to an SQL
database. Spark SQL not only brings advanced database optimisations, but also seamlessly intermixes SQL queries with
Scala.</p>
<p>Spark SQL is a component to the Spark stack. It has three main goals:</p>
<ul>
<li>High performance, achieved by using techniques from the database.</li>
<li>Supports relational data processing.</li>
<li>Supports new data sources like JSON.</li>
</ul>
<h3>Summary</h3>
<p>In this article, we covered how Spark can be optimized for data analysis and machine learning. We discussed how latency
becomes the bottleneck for large datasets, as well as the role of in-memory computation, which enables the data
scientist to perform real-time analysis.</p>
<p>The highlights of Spark functionality that make life easier:</p>
<ul>
<li>Spark SQL for structured data helps execute queries either in-memory or persisted on disk.</li>
<li>Spark ML for classifying data with different models, such as logistic regression.</li>
<li>Spark pair RDDs (key-value pairs) help with data exploration and analysis.</li>
<li>Partition-aware optimization that reduces network shuffling.</li>
</ul>
<p>We believe this will take personalization to a whole new level, thus improving the Zalando user journey.</p>
<p><em>Discuss Spark in more detail with <a href="https://twitter.com/mohdnadeem">Nadeem on Twitter</a>. Keep up with all Zalando Tech
job openings <a href="https://jobs.zalando.com/tech/jobs/">here</a>.</em></p>Zalando @ FOSDEM2018-02-22T00:00:00+01:002018-02-22T00:00:00+01:00Paul Adamstag:engineering.zalando.com,2018-02-22:/posts/2018/02/fosdem-not-average-conference.html<p>Why FOSDEM is not your average conference</p><h3><strong>Why FOSDEM is not your average conference</strong></h3>
<p>I could get cheeky with semantics and point out that the “M” in <a href="https://fosdem.org/2018/">FOSDEM</a> stands for “Meeting”.
But I’ll play nice and focus instead on the specifics of the event itself. FOSDEM has been running since 2001. In that
time, it has grown to become <em>the</em> open source community event for Europe. Over a two-day event, thousands of attendees
descend upon the ULB in Brussels to attend what is, in reality, a collection of conferences.</p>
<h3>A Conference of Conferences</h3>
<p>The “primary” FOSDEM conference is constructed from keynotes and “main tracks,” of which there are eight. So it’s
already pretty big. In addition, there are the “Dev Rooms”; independently curated mini-conferences on specific topics
ranging from geospatial, to retrocomputing, to virtualization. For the FOSDEM newcomer, the choice can be daunting.</p>
<p>Once upon a time, I did my part for open source by introducing the world to my <a href="http://baggerspion.net/2017/02/fosdem-2017/">personal FOSDEM survival
guide</a>. The “Law of Limited Participation” continues to be my personal
FOSDEM mantra: <strong>Don’t go to talks</strong>. If you insist on going to talks, especially the more popular ones, I highly
recommend actually attending the slot <em>before</em> your choice to ensure you get a seat. Otherwise, you might still be
queuing when the speaker you want to see has already started taking questions.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6168e5849361304dd695688768ae8ae96683510b_screen-shot-2018-02-22-at-11.46.29.png?auto=compress,format"></p>
<p>Speaking of talks you really wanted to see…</p>
<h3>Zalando Speaks at FOSDEM</h3>
<p>Zalando did not come here to play. At this year’s FOSDEM we presented three talks on a wide variety of topics:</p>
<ul>
<li>“ <a href="https://fosdem.org/2018/schedule/event/nakadi/">Nakadi Event Broker</a>,” Lionel Montrieux (Software Engineer).
<strong>Watch it <a href="https://video.fosdem.org/2018/H.2215/nakadi.webm">here</a>.</strong></li>
<li>“ <a href="https://fosdem.org/2018/schedule/event/blue_elephant_on_demand_postgres_kubernetes/">Blue elephant on-demand: Postgres +
Kubernetes</a>,” Jan Mußler
(Database as Service), Oleksii Kliukin (Database Engineer). <strong>Watch it
<a href="https://video.fosdem.org/2018/H.1302/blue_elephant_on_demand_postgres_kubernetes.webm">here</a>.</strong></li>
<li>“ <a href="https://fosdem.org/2018/schedule/event/documentjs_to_document_a_styleguide_and_source_code/">Automating style guide
documentation</a>,” Ferit
Topcu (Software Engineer). <strong>Watch it
<a href="http://bofh.nikhef.nl/events/FOSDEM/2018/UD2.119/documentjs_to_document_a_styleguide_and_source_code.webm">here</a>.</strong></li>
</ul>
<p>Every Zalando talk was “sold out”. We care about giving back to open source at Zalando and teaching others about the
work we do; our open source projects are an important aspect of that. It was great to see so much interest! (Thanks for
queuing for so long, folks!)</p>
<h3>Perfecting The Imperfect</h3>
<p>FOSDEM is not a conference. I’m not about to brand it “unique” but I certainly do not know any other event like it. As
it has grown (and it’s grown fast!) the organizers, the volunteers, the sponsors and, most importantly, the attendees,
have grown with it.</p>
<p>Attendance is free. The content is of a high quality. The attendees are as much part of the experience as the speakers.
The beer is cheap (the Club Mate is expensive). The fries have mayo. O’Reilly sells books. Debian sells t-shirts.
Everyone has stickers.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/08366806c916cc8fabf4326e8edae7130e60fe66_screen-shot-2018-02-22-at-11.55.23.png?auto=compress,format"></p>
<p>I was proud to see Zalando be a visible part of FOSDEM's crazy mix this year and, as we ramp up our voice around open
source in 2018, I’m looking forward to seeing what we can contribute to this event in 2019.</p>
<p><em>Come work with people like <a href="https://twitter.com/therealpadams">Paul</a>. Have a look at our <a href="https://jobs.zalando.com/tech/jobs/">jobs
page</a>.</em></p>
<p>FOSDEM video music by
<a href="https://www.google.com/url?q=https://www.bensound.com&sa=D&ust=1519298620629000&usg=AFQjCNH81gwgjZ1t_QAd9N8yiXSmgELwXg">https://www.bensound.com</a></p>Innovation in Digital Experience2018-02-20T00:00:00+01:002018-02-20T00:00:00+01:00Humberto Coronatag:engineering.zalando.com,2018-02-20:/posts/2018/02/innovation-digital-experience.html<p>Multi-functional teams make for a greater customer journey</p><h3><strong>Multi-functional teams make for a greater customer journey</strong></h3>
<p>When I started in Zalando Tech, I hadn’t worked with a product manager before, and I had probably never seen a UX
designer, a UI designer, a researcher or a business developer before either. My world was data science, more
specifically, personalization and recommender systems. In this isolated bubble, data scientists often thought we could
solve all problems without help, but in the last two years, I came to understand why we need to stop thinking that way.
What follows is my journey of building a product, following the principles of discovery, definition, design, and
delivery.</p>
<p>In 2016, just before our annual <a href="https://engineering.zalando.com/posts/2016/12/hack-week-5-is-live.html">Hack Week</a>
event, I found myself trying to find Christmas presents for my two sisters, but I had no idea about their fashion taste,
what their favourite colors were, or which brands they preferred. Long story short, with three days at the hack event
and a really small team (of two people), we hacked a product to solve that problem. This personalization product –as yet
under wraps– allows customers to discover and explore fashion content in a newly personalized way, putting a smooth and
intuitive customer experience first. It got really good first impressions and we received a <a href="https://engineering.zalando.com/posts/2016/12/the-finish-line--hack-week-5-awards-and-more.html">Hack Week
Award</a>.</p>
<p>I really wanted to bring this product to our customers, so I started talking with our <a href="https://corporate.zalando.com/en/innovation/innovation">Innovation
Lab</a> to be part of <a href="https://corporate.zalando.com/en/innovation/grassroots-tech-innovation"><em>Slingshot</em>, our incubation program</a>.
Together with Slingshot, we started building a multi-functional team, did all the discovery and definition work we
needed to, and pitched the idea to senior management, who not only decided to sponsor our Slingshot round, but also gave
us the go-ahead to start building the product! At that moment, it felt like a dream come true.</p>
<p>We built a small team of really great people with all the functions we thought we needed: a UX designer, UX researcher,
UI designer, mobile engineer, product manager, the design
<a href="https://engineering.zalando.com/posts/2018/01/how-to-talk-about-design.html">sprint</a> master, and me. We talked to at
least 15 different teams to learn about our customers, similar projects, and future collaborations. For the first time
ever, I got to design and deliver something without writing a single line of code, and most importantly, we put it
before the eyes of real customers in just four days.</p>
<p>During these two weeks, I learned that dreams don’t just happen; you work really hard to achieve them. Being
responsible for a product for the first time was really challenging. Some days we felt we made no progress, or
discovered others had tried similar ideas and failed, but we followed the <a href="https://engineering.zalando.com/posts/2016/11/the-sprint-exposed--how-we-use-it-at-zalando.html">Design Sprint
Framework</a>, made
decisions, and kept moving forward. I attribute our success to having highly-diverse teams with a range of functions,
backgrounds and experiences. We had people from three different <a href="https://jobs.zalando.com/tech/locations/">Zalando
locations</a> (
<a href="https://engineering.zalando.com/posts/2018/01/faces-behind-fashion-mnist.html">Berlin</a>,
<a href="https://engineering.zalando.com/posts/2017/12/helsinki-100-employee.html">Helsinki</a>,
<a href="https://engineering.zalando.com/posts/2017/10/zalando-smart-product-platform.html">Dublin</a>) and six different nationalities.</p>
<p>After a successful Slingshot round, we created a multi-functional team in Helsinki to build this product, and my role
was product manager. Even with no previous experience, <em>not</em> taking this challenge didn’t cross my mind. In the team, we
followed the 4D principle to build this product. So, from a customer-centric perspective, we engaged in <em>discovery</em>,
<em>definition</em>, <em>design</em> and finally, <em>delivery</em>. Something that seems linear on paper is, in actual fact, cyclical, so
while we worked <em>forwards</em> to build our first feature, we had cycles within the journey where we iterated on
discovery-definition-design for new features and the broader product vision.</p>
<p>During this process and in ongoing projects, I continue to learn about team dynamics, how to communicate with
stakeholders, how to work with people doing other roles in the organization beyond engineering, and many other matters.
Every job seems easier when you look at it from the outside. Having the opportunity to work in a diverse and
multi-functional team, I understand everyone’s role much better now. While still being a data scientist at heart, this
process taught me how to be a better one; by focusing on the right problems, finding collaborators, and always keeping a
customer-obsessed mindset in my work.</p>
<p>This year I was part of a team that won another Hack Week award. This time I was not the person leading the vision, but
I am proud to see more people in Zalando growing their ideas and getting the support to bring them to life. You will see
both products really soon; all part of Zalando’s <a href="https://corporate.zalando.com/en/innovation/research-zalando">digital
experience</a>.</p>
<p><em>Interested to hear more? Humberto will be giving a talk at our <a href="https://jobs.zalando.com/tech/locations/?gh_src=4n3gxh1#helsinki">Helsinki Tech
Hub</a> on Thursday March 1st at our Zalando Helsinki
Play Night on “Platform and Personalization in Product Management.” Grab one of the few remaining seats
<a href="https://www.meetup.com/Zalando-Tech-Event-Helsinki/events/247509995/">here</a>.</em></p>Five Minutes from Machine Learning to RESTful API2018-02-15T00:00:00+01:002018-02-15T00:00:00+01:00Elvin Valievtag:engineering.zalando.com,2018-02-15:/posts/2018/02/connexion-zalando-open-source.html<p>The benefits of Connexion: Zalando’s open source API-First framework</p><h3><strong>The benefits of Connexion: Zalando’s open source API-First framework</strong></h3>
<p>In this article, I will show how quick and simple it can be to create a RESTful API for a machine learning model using
Zalando’s open source Swagger/OpenAPI First framework called Connexion. <a href="https://connexion.readthedocs.io/en/latest/">Official
documentation</a> describes Connexion as follows: “Connexion is a
framework on top of <a href="http://flask.pocoo.org/">Flask</a> that automagically handles HTTP requests based on <a href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md">OpenAPI 2.0
Specification</a> (formerly known as Swagger
Spec) of your API described in <a href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#format">YAML
format</a>.”</p>
<p>That means you define your API using a Swagger definition, and Connexion maps the endpoints to your Python functions.
Connexion guarantees that your API works as you defined it.</p>
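For instance, the <code>operationId: app.post_predictions</code> in the spec below routes POST /predictions to a plain Python function in <code>app.py</code>. A sketch with a stub model; the request and response field names here are assumptions, since the spec’s <code>Query</code> definition is not shown in full:

```python
# app.py -- Connexion calls this for POST /predictions
# (operationId: app.post_predictions in the Swagger spec).

class _StubModel:
    """Stands in for the pickled scikit-learn pipeline from step 1."""
    def predict(self, texts):
        return ["sport" for _ in texts]

MODEL = _StubModel()  # real app: MODEL = pickle.load(open("model.pkl", "rb"))

def post_predictions(query):
    # `query` is the JSON body, already validated against the Swagger schema.
    texts = [item["text"] for item in query]
    return [{"category": category} for category in MODEL.predict(texts)]
```

Connexion wires the route from the YAML spec at startup; no manual Flask routing code is needed.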
<p>Connexion “automagically” takes care of the following tasks for you:</p>
<ul>
<li>Provides a web Swagger Console UI for live documentation, letting you explore the API’s endpoints interactively</li>
<li>OAuth 2 token-based authentication</li>
<li>API versioning</li>
<li>Automatic serialization of payloads</li>
<li>Validates requests and endpoint parameters automatically, based on your specification</li>
</ul>
<p>This is a great way to get your machine learning model working in production with very little effort. To get
started with Connexion, we will follow these steps:</p>
<p><strong>1. Training the Classifier</strong></p>
<p>For the sake of example, let’s first create a simple scikit-learn classifier. We are going to train a very simple news
classifier using the “20 newsgroups” data set.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/05160cb8669931e919845f54036181c8193f0db3_article_1.png?auto=compress,format"></p>
<p>After training the model we will be pickling it to be used by our REST API.</p>
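A minimal sketch of step 1 (the screenshot above shows the original code). The post trains on the 20 newsgroups data set via <code>sklearn.datasets.fetch_20newsgroups</code>; to keep this self-contained, a tiny inline corpus stands in, so the texts and categories here are purely illustrative:

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; the post uses fetch_20newsgroups instead.
texts = [
    "the goalkeeper saved a penalty in the final match",
    "the striker scored a hat trick in the league game",
    "the rocket launched a satellite into orbit",
    "astronauts on the station photographed the moon from orbit",
]
labels = ["sport", "sport", "space", "space"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Pickle the trained pipeline so the REST API can load it later.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Pickling the whole pipeline (vectorizer plus classifier) means the API only has to call <code>predict</code> on raw text, with no separate preprocessing step.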
<p><strong>2. Creating the REST API</strong></p>
<p>Getting started with Connexion is very easy. A sample application with nice documentation can be found in this
repository: <a href="https://github.com/hjacobs/connexion-example">https://github.com/hjacobs/connexion-example</a></p>
<p><strong>2.1 Installing Connexion</strong></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>connexion
</code></pre></div>
<p><strong>2.2 Defining our API</strong></p>
<div class="highlight"><pre><span></span><code><span class="nx">swagger</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="m m-Double">2.0</span><span class="err">'</span>
<span class="nx">info</span><span class="p">:</span>
<span class="w"> </span><span class="nx">title</span><span class="p">:</span><span class="w"> </span><span class="nx">News</span><span class="w"> </span><span class="nx">classifier</span>
<span class="w"> </span><span class="nx">version</span><span class="p">:</span><span class="w"> </span><span class="s">"0.1"</span>
<span class="nx">consumes</span><span class="p">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">application</span><span class="o">/</span><span class="nx">json</span>
<span class="nx">produces</span><span class="p">:</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">application</span><span class="o">/</span><span class="nx">json</span>
<span class="nx">paths</span><span class="p">:</span>
<span class="w"> </span><span class="o">/</span><span class="nx">predictions</span><span class="p">:</span>
<span class="w"> </span><span class="nx">post</span><span class="p">:</span>
<span class="w"> </span><span class="nx">operationId</span><span class="p">:</span><span class="w"> </span><span class="nx">app</span><span class="p">.</span><span class="nx">post_predictions</span>
<span class="w"> </span><span class="nx">summary</span><span class="p">:</span><span class="w"> </span><span class="nx">Predicts</span><span class="w"> </span><span class="nx">categories</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">given</span><span class="w"> </span><span class="nx">news</span><span class="w"> </span><span class="nx">articles</span>
<span class="w"> </span><span class="nx">parameters</span><span class="p">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="nx">query</span>
<span class="w"> </span><span class="k">in</span><span class="p">:</span><span class="w"> </span><span class="nx">body</span>
<span class="w"> </span><span class="nx">schema</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="nx">array</span>
<span class="w"> </span><span class="nx">items</span><span class="p">:</span>
<span class="w"> </span><span class="err">$</span><span class="nx">ref</span><span class="p">:</span><span class="w"> </span><span class="err">'#</span><span class="o">/</span><span class="nx">definitions</span><span class="o">/</span><span class="nx">Query</span><span class="err">'</span>
<span class="w"> </span><span class="nx">responses</span><span class="p">:</span>
<span class="w"> </span><span class="mi">200</span><span class="p">:</span>
<span class="w"> </span><span class="nx">description</span><span class="p">:</span><span class="w"> </span><span class="nx">Returns</span><span class="w"> </span><span class="nx">predicted</span><span class="w"> </span><span class="nx">categories</span>
<span class="w"> </span><span class="nx">schema</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="nx">array</span>
<span class="w"> </span><span class="nx">items</span><span class="p">:</span>
<span class="w"> </span><span class="err">$</span><span class="nx">ref</span><span class="p">:</span><span class="w"> </span><span class="err">'#</span><span class="o">/</span><span class="nx">definitions</span><span class="o">/</span><span class="nx">Prediction</span><span class="err">'</span>
<span class="nx">definitions</span><span class="p">:</span>
<span class="w"> </span><span class="nx">Prediction</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="nx">object</span>
<span class="w"> </span><span class="nx">required</span><span class="p">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">category</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">text</span>
<span class="w"> </span><span class="nx">properties</span><span class="p">:</span>
<span class="w"> </span><span class="nx">category</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">description</span><span class="p">:</span><span class="w"> </span><span class="nx">Predicted</span><span class="w"> </span><span class="nx">category</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">news</span><span class="w"> </span><span class="nx">article</span>
<span class="w"> </span><span class="nx">example</span><span class="p">:</span><span class="w"> </span><span class="s">"talk.politics.misc"</span>
<span class="w"> </span><span class="nx">readOnly</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">text</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">description</span><span class="p">:</span><span class="w"> </span><span class="nx">query</span><span class="w"> </span><span class="nx">text</span>
<span class="w"> </span><span class="nx">example</span><span class="p">:</span><span class="w"> </span><span class="s">"Ronaldo scored 3 goals against Napoli"</span>
<span class="w"> </span><span class="nx">readOnly</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">Query</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="nx">object</span>
<span class="w"> </span><span class="nx">required</span><span class="p">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">text</span>
<span class="w"> </span><span class="nx">properties</span><span class="p">:</span>
<span class="w"> </span><span class="nx">text</span><span class="p">:</span>
<span class="w"> </span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">description</span><span class="p">:</span><span class="w"> </span><span class="nx">text</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">predict</span>
</code></pre></div>
<p>We have an endpoint named <em>/predictions</em>. It will receive a list of Query models as input and produce a list of
Prediction models. We save this file as <em>‘swagger.yaml’</em>.</p>
<p><strong>2.3 Importing Connexion and registering our Swagger file</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">connexion</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">connexion</span><span class="o">.</span><span class="n">App</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span><span class="n">app</span><span class="o">.</span><span class="n">add_api</span><span class="p">(</span><span class="s1">'swagger.yaml'</span><span class="p">)</span>
</code></pre></div>
<p><strong>2.4 Running the application behind a simple server using <a href="http://www.gevent.org/">gevent</a></strong></p>
<div class="highlight"><pre><span></span><code>if __name__ == '__main__':
app.run(port=8080, server='gevent')
</code></pre></div>
<p><strong>2.5 Importing our classifier</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">sklearn.externals</span> <span class="kn">import</span> <span class="n">joblib</span>
<span class="n">classifier</span> <span class="o">=</span> <span class="n">joblib</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'./classifier/model.pkl'</span><span class="p">)</span>
</code></pre></div>
<p><strong>2.6 Defining our prediction endpoint</strong></p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">post_predictions</span><span class="p">(</span><span class="n">query</span><span class="p">)</span><span class="err">:</span>
<span class="w"> </span><span class="n">predictions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">[]</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="nl">query</span><span class="p">:</span>
<span class="w"> </span><span class="nc">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">item</span><span class="o">[</span><span class="n">'text'</span><span class="o">]</span>
<span class="w"> </span><span class="n">category</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">classifier</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="o">[</span><span class="n">text</span><span class="o">]</span><span class="p">)</span><span class="o">[</span><span class="n">0</span><span class="o">]</span>
<span class="w"> </span><span class="n">predictions</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="err">{</span><span class="ss">"category"</span><span class="err">:</span><span class="w"> </span><span class="n">category</span><span class="p">,</span><span class="w"> </span><span class="ss">"text"</span><span class="err">:</span><span class="w"> </span><span class="nc">text</span><span class="err">}</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">predictions</span>
</code></pre></div>
<p>As you can see here, the method name corresponds to the <em>“operationId”</em> property in our swagger definition. And that’s all.
After running our app with the command:</p>
<div class="highlight"><pre><span></span><code>./app.py
</code></pre></div>
<p>...we can now test our API with a simple curl request:</p>
<div class="highlight"><pre><span></span><code><span class="err">➜</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="nx">curl</span><span class="w"> </span><span class="o">--</span><span class="nx">request</span><span class="w"> </span><span class="nx">POST</span><span class="w"> </span>\
<span class="o">--</span><span class="nx">url</span><span class="w"> </span><span class="nx">http</span><span class="p">:</span><span class="c1">//localhost:8080/predictions \</span>
<span class="o">--</span><span class="nx">header</span><span class="w"> </span><span class="err">'</span><span class="nx">content</span><span class="o">-</span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="nx">application</span><span class="o">/</span><span class="nx">json</span><span class="err">'</span><span class="w"> </span>\
<span class="o">--</span><span class="nx">data</span><span class="w"> </span><span class="err">'</span><span class="p">[{</span><span class="s">"text"</span><span class="p">:</span><span class="w"> </span><span class="s">"Angela Merkel just walked into her fourth term as chancellor of Germany.Her party, the Christian Democrats (CDU), picked up 32.5 percent of the votes in Sunday's election, according to the first exit polls issued at 6 pm German local time."</span><span class="p">}]</span><span class="err">'</span>
<span class="p">[</span>
<span class="p">{</span>
<span class="s">"category"</span><span class="p">:</span><span class="w"> </span><span class="s">"talk.politics.misc"</span><span class="p">,</span>
<span class="s">"text"</span><span class="p">:</span><span class="w"> </span><span class="s">"Angela Merkel just walked into her fourth term as chancellor of Germany.Her party, the Christian Democrats (CDU), picked up 32.5 percent of the votes in Sunday's election, according to the first exit polls issued at 6 pm German local time."</span>
<span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
<p>And there we are! With a few lines of code we created a running REST API for our machine learning model.</p>
<p>We can experiment further and explore other capabilities of Connexion.</p>
<p><strong>Using Swagger UI:</strong></p>
<p>If you open your browser and go to <a href="http://localhost:8080/ui/">http://localhost:8080/ui/</a> you will be able to see the Swagger UI.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1426ca50d12acf42f53a7146b7471435ea4f1525_article_2.png?auto=compress,format"></p>
<p>Here you can play around with and test your API: send sample requests, try out input validation, and so on.</p>
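<p>Under the hood, Connexion checks each request body against the schemas in <em>swagger.yaml</em> and rejects invalid input before our handler ever runs. As a rough, hand-rolled sketch of the rule it enforces for our array of Query objects (this is an illustration, not Connexion’s actual implementation):</p>

```python
def is_valid_query_list(body):
    """Stand-in for the schema check: the body must be a list of
    objects, each carrying a required string 'text' property."""
    if not isinstance(body, list):
        return False
    return all(
        isinstance(item, dict) and isinstance(item.get("text"), str)
        for item in body
    )

print(is_valid_query_list([{"text": "Ronaldo scored 3 goals against Napoli"}]))  # True
print(is_valid_query_list([{"txt": "required field misspelled"}]))               # False
```

<p>With Connexion, none of this has to be written by hand: the spec itself is the single source of truth for validation.</p>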
<p><strong>Using a different Server Backend:</strong></p>
<p>By default, Connexion uses the Flask server, but we can also use <a href="http://www.tornadoweb.org/en/stable/">Tornado</a> or gevent as
a backend. To use Tornado:</p>
<div class="highlight"><pre><span></span><code>app.run(port=8080, server='tornado')
</code></pre></div>
<p><strong>OAuth 2 Authentication and Authorization:</strong></p>
<p>If we set the TOKENINFO_URL environment variable or include ‘x-tokenInfoUrl’ in our swagger file, Connexion will secure our
endpoints. More information on this can be found in this
<a href="https://github.com/zalando/connexion/tree/master/examples/oauth2">repo</a>.</p>
<p><strong>Conclusion</strong></p>
<p>By writing the API spec first, you can get your REST API up and running with little effort using Connexion. Additionally,
it enables features like automatic request validation, OAuth 2 token-based authentication, JSON
serialization (if your specification defines that an endpoint returns JSON) and API versioning.</p>
<p><em>This piece was originally published on
<a href="https://medium.com/@elvin.valiev/5-minutes-from-machine-learning-to-rest-api-e8c6e508a370">Medium</a>. Want to work with
people like Elvin? Check out our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">jobs</a> page!</em></p>
<p><strong>Resources:</strong></p>
<p>Source code of the example: <a href="https://github.com/elvinx/connexion-example">https://github.com/elvinx/connexion-example</a><br>
Link to the Connexion GitHub repo: <a href="https://github.com/zalando/connexion">https://github.com/zalando/connexion</a><br>
Connexion official documentation: <a href="https://connexion.readthedocs.io/en/latest/">https://connexion.readthedocs.io/en/latest/</a></p>Cross-Lingual End-to-End Product Search with Deep Learning2018-02-08T00:00:00+01:002018-02-08T00:00:00+01:00Han Xiaotag:engineering.zalando.com,2018-02-08:/posts/2018/02/search-deep-neural-network.html<p>How We Built the Next Generation Product Search from Scratch using a Deep Neural Network</p><h3><strong>How We Built the Next Generation Product Search from Scratch using a Deep Neural Network</strong></h3>
<p>Product search is one of the key components in an online retail store. A good product search can understand a user’s
query in any language, retrieve as many relevant products as possible, and finally present the results as a list in
which the preferred products should be at the top, and the less relevant products should be at the bottom.</p>
<p>Unlike text retrieval (e.g. Google web search), products are structured data. A product is often described by a list of
key-value pairs, a set of pictures and some free text. In the developers’ world, <a href="http://lucene.apache.org/solr/">Apache
Solr</a> and <a href="https://www.elastic.co/">Elasticsearch</a> are known as de-facto solutions for
full-text search, making them a top contender for building e-commerce product searches.</p>
<p>At the core, Solr/Elasticsearch is a <em>symbolic information retrieval (IR) system</em>. Mapping queries and documents to a
common string space is crucial to the search quality. This mapping process is an NLP pipeline implemented with <a href="http://lucene.apache.org/core/6_2_1/core/org/apache/lucene/analysis/Analyzer.html">Lucene
Analyzer</a>. In this post, I will
reveal some drawbacks of such a symbolic-pipeline approach, and then present an end-to-end way of building a product
search system from query logs using Tensorflow. This deep learning based system is less prone to spelling errors,
leverages underlying semantics better, and scales out to multiple languages much easier.</p>
<h3><strong>Recap: Symbolic Approach for Product Search</strong></h3>
<p>Let’s first do a short review of the classic approach. Typically, an information retrieval system can be divided into
three tasks: <strong>indexing</strong>, <strong>parsing</strong> and <strong>matching</strong>. As an example, the next figure illustrates a simple product
search system:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/96c3241266029744b470f7914ef27e2e8e620cea_screen-shot-2018-01-25-at-18.12.13.png?auto=compress,format"></p>
<ol>
<li><strong>indexing</strong>: storing products in a database with attributes as keys, e.g. brand, color, category;</li>
<li><strong>parsing</strong>: extracting attribute terms from the input query, e.g. red shirt -> {"color": "red", "category":
"shirt"};</li>
<li><strong>matching</strong>: filtering the product database by attributes.</li>
</ol>
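<p>To make these three steps concrete, here is a toy sketch in Python (the product data and attribute lexicon are invented; real systems like Solr/Elasticsearch use inverted indexes rather than a linear scan):</p>

```python
# 1. indexing: products stored in a database with attributes as keys
products = [
    {"id": 1, "brand": "nike", "color": "red", "category": "shirt"},
    {"id": 2, "brand": "adidas", "color": "red", "category": "shirt"},
    {"id": 3, "brand": "nike", "color": "black", "category": "shoe"},
]

# A hand-maintained lexicon mapping query terms to attribute key-value pairs.
KNOWN_ATTRIBUTES = {
    "red": ("color", "red"),
    "black": ("color", "black"),
    "shirt": ("category", "shirt"),
    "shoe": ("category", "shoe"),
    "nike": ("brand", "nike"),
}

def parse(query):
    """2. parsing: extract attribute terms from the input query."""
    return dict(KNOWN_ATTRIBUTES[t] for t in query.split() if t in KNOWN_ATTRIBUTES)

def match(attrs):
    """3. matching: filter the product database by attributes."""
    return [p for p in products if all(p.get(k) == v for k, v in attrs.items())]

print(match(parse("red shirt")))  # products 1 and 2
```

<p>Note that step 2 is exactly where the NLP pipeline lives: every query term must be mapped to a known attribute string, which is the fragility discussed next.</p>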
<p>Many existing solutions such as Apache Solr and Elasticsearch follow this simple idea. Note that, at their core, they are
<em>symbolic IR systems</em> that rely on NLP pipelines for getting an effective string representation of the query and product.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/92afea015ceac700157ede958ca88b1fe8c1330e_screen-shot-2018-01-25-at-18.14.10.png?auto=compress,format"></p>
<p><strong>Pain Points of a Symbolic IR System</strong></p>
<ol>
<li><strong>The NLP pipeline is fragile and doesn’t scale out to multiple languages
</strong>The NLP Pipeline in Solr/Elasticsearch is based on the Lucene Analyzer class. A simple analyzer such as
StandardAnalyzer would just split the sequence by whitespace and remove some stopwords. Quite often you have to extend
it by adding more and more functionalities, which eventually results in a pipeline as illustrated in the figure below.</li>
</ol>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/513b207dac2cea1a589525a83ad94acd2bc4a160_c37a7fd4.png?auto=compress,format"></p>
<p>While it looks legit, my experience is that such NLP pipelines suffer from the following drawbacks:</p>
<ul>
<li>The system is fragile. As the output of every component is the input of the next, a defect in the upstream component
can easily break down the whole system. For example, canyourtoken izer split thiscorrectly?</li>
<li>Dependencies between components can be complicated. A component can take input from and output to multiple components,
forming a directed acyclic graph. Consequently, you may have to introduce some <em>asynchronous mechanisms</em> to reduce
the overall blocking time.</li>
<li>It is not straightforward to improve the overall search quality. An improvement in one or two components does not
necessarily improve the end-user search experience.</li>
<li>The system doesn’t scale out to multiple languages. To enable cross-lingual search, developers have to rewrite those
language-dependent components in the pipeline for every language, which increases the maintenance cost.</li>
</ul>
<p><strong>2. Symbolic Systems do not Understand Semantics without Hard Coding
</strong>A good IR system should understand that a <em>trainer</em> is a <em>sneaker</em> by using some semantic knowledge. No one likes hard coding this
knowledge, especially you machine learning guys. Unfortunately, it is difficult for Solr/Elasticsearch to understand any
acronym/synonym unless you implement the <a href="https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html">SynonymFilter
class</a>,
which is basically a rule-based filter. This severely restricts the generalizability and scalability of the system, as
you need someone to maintain a hard-coded language-dependent lexicon. If one can represent query/product by a vector in
a space learned from actual data, then synonyms and acronyms could easily be found in the neighborhood without hard
coding.</p>
<p><strong>Neural IR System
</strong>The next figure illustrates a neural information retrieval framework, which looks pretty much the same as its symbolic
counterpart, except that the NLP pipeline is replaced by a deep neural network and the matching job is done in a learned
common space.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e0b97d5bf5eba677b9b1feae692dfd44f542af6d_dd4c4faa.png?auto=compress,format"></p>
<p><strong>End-to-End Model Training
</strong>There are several ways to train a neural IR system. One of the most straightforward (but not necessarily the most
effective) ways is <em>end-to-end</em> learning. Namely, your training data is a set of query-product pairs feeding on the
top-right and top-left blocks in the last figure. All the other blocks are learned from data. Depending on the
engineering requirements or resource limitations, one can also fix or pre-train some of the components.</p>
<p><strong>Where Do Query-Product Pairs Come From?
</strong>To train a neural IR system in an end-to-end manner, you need some associations between query and product such as the
query log. This log should contain what products a user interacted with after typing a query. Typically, you can fetch
this information from the query/event log of your system. After some work on segmenting, cleaning and aggregating, you
can get pretty accurate associations. In fact, any user-generated text can be good association data. This includes
comments, product reviews, and crowdsourcing annotations.</p>
<p><strong>Neural Network Architecture
</strong>The next figure illustrates the architecture of the neural network. The proposed architecture is composed of multiple
encoders, a metric layer, and a loss layer. First, input data is fed to the encoders which generate vector
representations. In the metric layer, we compute the similarity of a query vector with an image vector and an attribute
vector, respectively. Finally, in the loss layer, we compute the difference of similarities between positive and
negative pairs, which is used as the feedback to train encoders via backpropagation.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/85c1201d8af2795b4bce4a5656572f5fa33c4bf0_fe003c7f.png?auto=compress,format"></p>
<p><strong>Query Encoder
</strong>Here we need a model that takes in a sequence and outputs a vector. Besides the content of a sequence, the vector
representation should also encode language information and be resilient to misspellings. The character-RNN (e.g. LSTM,
GRU, SRU) model is a good choice. By feeding RNN character by character, the model becomes resilient to misspelling such
as adding/deleting/replacing characters. The misspelled queries would result in a similar vector representation as the
genuine one. Moreover, as European languages (e.g. German and English) share some Unicode characters, one can train
queries from different languages in one RNN model. To distinguish the words with the same spelling but different
meanings in two languages, such as German <em>rot</em> (color red) and English <em>rot</em>, one can prepend a special character to
indicate the language of the sequence, e.g. 🇩🇪 rot and 🇬🇧 rot.</p>
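<p>The input preparation described above can be sketched as follows (the token ids here are invented; the actual encoder feeds these ids character by character into an RNN such as an LSTM or GRU):</p>

```python
# Hypothetical language markers prepended to every character sequence.
LANG_TOKENS = {"de": 0, "en": 1}
CHAR_OFFSET = len(LANG_TOKENS)  # shift character ids past the special tokens

def encode_query(query, lang):
    """Prepend a language id, then map each character to an integer id."""
    return [LANG_TOKENS[lang]] + [CHAR_OFFSET + ord(c) for c in query]

# The same surface form yields distinct sequences per language, so
# German 'rot' (the color red) and English 'rot' stay distinguishable.
assert encode_query("rot", "de") != encode_query("rot", "en")
assert encode_query("rot", "de")[1:] == encode_query("rot", "en")[1:]
```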
<p><strong>Image Encoder
</strong>The image encoder rests on purely visual information. The RGB image data of a product is fed into a multi-layer
convolutional neural network based on the ResNet architecture, resulting in an image vector representation in
128-dimensions.</p>
<p><strong>Attribute Encoder
</strong>The attributes of a product can be combined into a sparse one-hot encoded vector. It is then supplied to a four-layer,
fully connected deep neural network with steadily diminishing layer sizes. Activations are made nonlinear by standard
ReLUs, and dropout is applied to address overfitting. The output yields an attribute vector representation in 20
dimensions.</p>
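<p>The sparse input to this encoder can be sketched like this (the attribute vocabulary is invented for illustration):</p>

```python
# Each known (attribute, value) pair gets one dimension in the input vector.
ATTRIBUTE_VOCAB = [
    ("brand", "nike"), ("brand", "adidas"),
    ("color", "black"), ("color", "red"),
    ("category", "shirt"), ("category", "shoe"),
]
INDEX = {pair: i for i, pair in enumerate(ATTRIBUTE_VOCAB)}

def one_hot_attributes(product):
    """Combine a product's attributes into one sparse one-hot encoded vector."""
    vec = [0.0] * len(ATTRIBUTE_VOCAB)
    for key, value in product.items():
        if (key, value) in INDEX:
            vec[INDEX[(key, value)]] = 1.0
    return vec

vec = one_hot_attributes({"brand": "nike", "color": "black", "category": "shoe"})
# exactly three positions are hot, one per attribute
```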
<p><strong>Metric & Loss Layer
</strong>After a query-product pair goes through all three encoders, one can obtain a vector representation of the query, an
image representation and an attribute representation of the product. It is now the time to squeeze them into a common
latent space. In the metric layer, we need a similarity function which gives higher value to the positive pair than the
negative pair. To understand how a similarity function works, I strongly recommend you read my other blog post on
<a href="https://hanxiao.github.io/2017/11/08/Optimizing-Contrastive-Rank-Triplet-Loss-in-Tensorflow-for-Neural/">“Optimizing Contrastive/Rank/Triplet Loss in Tensorflow for Neural Information
Retrieval”</a>. It
also explains the metric and loss layer implementation in detail.</p>
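<p>As a minimal illustration of such a similarity function and ranking loss, here is a simplified stand-in for the contrastive/triplet losses discussed in that post (the margin value is arbitrary):</p>

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_loss(query_vec, pos_vec, neg_vec, margin=0.2):
    """Hinge on the similarity gap: zero loss once the positive pair
    scores at least `margin` higher than the negative pair."""
    return max(0.0, margin - cosine(query_vec, pos_vec) + cosine(query_vec, neg_vec))

# A well-separated triplet incurs no loss; a swapped one is penalised.
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # 0.0
```

<p>Backpropagating this difference through the three encoders is what pulls queries and their matching products into the same region of the latent space.</p>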
<p><strong>Inference
</strong>For a neural IR system, doing inference means serving search requests from users. Since products are updated regularly
(say once a day), we can pre-compute the image representation and attribute representation for all products and store
them. During the inference time, we first represent user input as a vector using query encoder; then iterate over all
available products and compute the metric between the query vector and each of them; finally, sort the results.
Depending on the stock size, the metric computation part could take a while. Fortunately, this process can be easily
parallelized.</p>
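<p>The inference loop above boils down to a brute-force nearest-neighbour search; a toy sketch with made-up two-dimensional product vectors:</p>

```python
# Pre-computed product representations (invented values for illustration;
# the real system stores 128-d image and 20-d attribute vectors).
product_vectors = {
    "red shirt":  [0.9, 0.1],
    "black shoe": [0.1, 0.9],
    "red dress":  [0.8, 0.3],
}

def rank_products(query_vec):
    """Score every stored product against the query vector, best first."""
    def score(item):
        return sum(q * p for q, p in zip(query_vec, item[1]))
    return [name for name, _ in sorted(product_vectors.items(),
                                       key=score, reverse=True)]

print(rank_products([1.0, 0.0]))  # ['red shirt', 'red dress', 'black shoe']
```

<p>Since each product is scored independently, this loop parallelises trivially across cores or machines.</p>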
<p><strong>Qualitative Results
</strong>Here, I demonstrated (cherry-picked) some results for different types of query. It seems that the system goes in the
right direction. It is exciting to see that the neural IR system is able to correctly interpret named-entity, spelling
errors and multilinguality without any NLP pipeline or hard-coded rule. However, one can also notice that some top
ranked products are not relevant to the query, which leaves quite some room for improvement.</p>
<p>Speed-wise, the inference time is about two seconds per query on a quad-core CPU for 300,000 products. One can further
improve the efficiency by using model compression techniques.</p>
<h3><strong>Query & Top-20 Results</strong></h3>
<p>🇩🇪 <strong>nike</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/244caedb67cdccda88236207c696a3aac5447ca4_eb31253e.png?auto=compress,format"></p>
<p><strong>🇩🇪 schwarz (black)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2d725bf1d99ddca54b2f4438eb4db709344a8f5d_318f4a85.png?auto=compress,format"></p>
<p><strong>🇩🇪 nike schwarz</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8b1e8f3feb678910d07c4850d0ebc6e921673159_a7b2ccf7.png?auto=compress,format"></p>
<p><strong>🇩🇪 nike schwarz shirts</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b9ed139fbe931b51762e514ac9839118068602bd_755d08d2.png?auto=compress,format"></p>
<p><strong>🇩🇪 nike schwarz shirts langarm (long-sleeved)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b89364f01d1c91f92a515d8f3c9f972a0ebf15e9_3a30e440.png?auto=compress,format"></p>
<p><strong>🇬🇧 addidsa (misspelled brand)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/29785d583f1ce82687cc77cb77d275a665908bdb_b2fb3a2a.png?auto=compress,format"></p>
<p><strong>🇬🇧 addidsa trosers (misspelled brand and category)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/89201eac46837318c32154a8ff799b1b98c9a57d_c876f756.png?auto=compress,format"></p>
<p><strong>🇬🇧 addidsa trosers blue shorrt (misspelled brand and category and property)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/38cc53a3a3f7a1436aead598eaa9ae0d67ab2687_3b486353.png?auto=compress,format"></p>
<p><strong>🇬🇧 striped shirts woman</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/33ac8172970137b0fe62e36ebfc4bd505c766d75_559e2c5f.png?auto=compress,format"></p>
<p><strong>🇬🇧 striped shirts man</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7a6d96bcc738e64ca411f4f1a9f220b51ecf0222_3d7a53b7.png?auto=compress,format"></p>
<p><strong>🇩🇪 kleider (dress)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/78b679b4f1609fa67783dce5469287d162f7eb8c_ba08c560.png?auto=compress,format"></p>
<p><strong>🇩🇪 🇬🇧 kleider flowers (mix-language)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/842a44565a6fb40241a59a9c447f8f257265b5dc_0c60e120.png?auto=compress,format"></p>
<p><strong>🇩🇪 🇬🇧 kleid ofshoulder (mix-language & misspelled off-shoulder)</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7768c7d975fe5d43ecdf845a460127b10912e139_9d217d91.png?auto=compress,format"></p>
<p><strong>Summary
</strong>If you are a search developer who is building a symbolic IR system with Solr/Elasticsearch/Lucene, this post should
make you aware of the drawbacks of such a system.</p>
<p>This post should also answer your What?, Why? and How? questions regarding a neural IR system. Compared to the symbolic
counterpart, the new system is more resilient to the input noise and requires little domain knowledge about the products
and languages. Nonetheless, one should not take it as a “Team Symbol” or “Team Neural” kind of choice. Both systems have
their own advantages and can complement each other pretty well. A better solution would be combining these two systems
in a way that we can enjoy all advantages from both sides.</p>
<p>Some implementation details and tricks are omitted here but can be found in my other posts. I strongly recommend readers
to continue with the following posts:</p>
<ul>
<li><a href="https://hanxiao.github.io/2017/11/08/Optimizing-Contrastive-Rank-Triplet-Loss-in-Tensorflow-for-Neural/">“Optimizing Contrastive/Rank/Triplet Loss in Tensorflow for Neural Information
Retrieval.”</a></li>
<li><a href="https://hanxiao.github.io/2017/08/16/Why-I-use-raw-rnn-Instead-of-dynamic-rnn-in-Tensorflow-So-Should-You-0/">“Why I Use raw_rnn Instead of dynamic_rnn in Tensorflow and So Should
You.”</a></li>
</ul>
<p>Last but not least, the open-source project <a href="https://github.com/faneshion/MatchZoo">MatchZoo</a> contains many
state-of-the-art neural IR algorithms. In addition to product search, one may find its application in conversational
chatbot and question-answer systems.</p>
<p><em>To work with great people like Han, have a look at our <a href="https://jobs.zalando.com/tech/jobs/">jobs page</a>.</em></p>Crushing AVRO Small Files with Spark2018-02-06T00:00:00+01:002018-02-06T00:00:00+01:00Ian Duffytag:engineering.zalando.com,2018-02-06:/posts/2018/02/solving-many-small-files-avro.html<p>Solving the many small files problem for AVRO</p><h3><strong>Solving the many small files problem for AVRO</strong></h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a4a45a32fb7066a4db6ec5bfa4416b546f05faff_screen-shot-2018-02-06-at-13.12.57.png?auto=compress,format"></p>
<p>The Fashion Content Platform teams in <a href="https://engineering.zalando.com/posts/2017/10/zalando-smart-product-platform.html">Zalando
Dublin</a> handle large amounts of data
on a daily basis. To make sense of it all, we utilise Hadoop (EMR) on AWS. Within this post, we discuss a system where
a real-time system feeds the data. Due to the variance in data volumes and the period that these systems write to
storage, there can be a large number of small files.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/99b434ce1b34c48c13f0220e6ae4e34c0e3ce632_big-files.jpg?auto=compress,format"></p>
<p>While Hadoop is capable of processing large amounts of data, it typically works best with a small number of large files, and
not with a large number of small files. A small file is one which is smaller than the Hadoop Distributed File System
(HDFS) block size (default 64MB). In MapReduce, every map task handles computation on a single input block. Having many
small files means that there will be a lot of map tasks, and each map task will handle small amounts of data. This
creates a larger memory overhead and slows down the job. Additionally, when using HDFS backed by AWS S3, listing objects
can take quite a long time and even longer when lots of objects exist. [3]</p>
<h3>Known solutions and why they don’t work for AVRO</h3>
<p>This is a well-known problem; there are many utilities and approaches for solving the issue:</p>
<ol>
<li><a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html">s3-dist-cp</a> - This is a utility created
by Amazon Web Services (AWS). It is an adaptation of Hadoop’s
<a href="https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html">DistCp</a> utility for HDFS that supports S3. This
utility enables you to solve the small file problem by aggregating files together using the --groupBy option and by
setting a maximum size using the --targetSize option.</li>
<li><a href="https://github.com/edwardcapriolo/filecrush">Filecrush</a> - This is a highly configurable tool designed for the sole
purpose of “crushing” small files together to solve the small file problem.</li>
<li><a href="https://avro.apache.org/docs/1.8.2/gettingstartedjava.html">Avro-Tools</a> - This supplies many different functions
for reading and manipulating AVRO files. One of these functions is “concat” which works perfectly for merging AVRO
files together. However, it’s designed to be used on a developer’s machine rather than on a large scale scheduled
job.</li>
</ol>
<p>While these utilities exist, they do not work for our use case. The data produced by our system is stored as AVRO
files. These files contain a file header followed by one or more blocks of data. As such, a simple append will not work,
and doing so results in corrupt data. Additionally, the Filecrush utility doesn’t support reading files from AWS S3.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c61c0bf768ab9cc3f655f0921ffc06a4342499db_screen-shot-2018-02-06-at-12.46.50.png?auto=compress,format"></p>
<p>We decided to roll out <a href="https://github.com/imduffy15/spark-avro-compactor">our own solution</a>. The idea was
straightforward: Use <a href="https://spark.apache.org/">Spark</a> to create a simple job to read the daily directory of the raw
AVRO data and re-partition the data using the following equation to determine the number of partitions needed to write
back the larger files:</p>
<p><em>number_of_partitions = input_size / (AVRO_COMPRESSION_RATIO * DEFAULT_HDFS_BLOCK_SIZE)</em></p>
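As a rough illustration, the calculation above can be sketched in Python. The compression ratio and block size below are illustrative assumptions, not the exact values used in the job:

```python
# Sketch of the partition-count calculation described above.
# AVRO_COMPRESSION_RATIO here is an assumed average ratio; the 64 MB block
# size matches the HDFS default mentioned earlier in the post.

AVRO_COMPRESSION_RATIO = 7                   # assumption for illustration
DEFAULT_HDFS_BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB

def number_of_partitions(input_size_bytes: int) -> int:
    """Partitions needed so each output file fills roughly one HDFS block."""
    partitions = input_size_bytes // (AVRO_COMPRESSION_RATIO * DEFAULT_HDFS_BLOCK_SIZE)
    return max(1, partitions)  # never repartition to zero
```

For example, with these assumed constants a 10 GB input directory would be rewritten into a couple of dozen partitions instead of thousands of tiny files.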
<p>Our initial approach used <a href="https://github.com/databricks/spark-avro">spark-avro</a> by Databricks to read in the AVRO files
and write out the grouped output. However, on validation of the data, we noticed an
<a href="https://github.com/databricks/spark-avro/issues/92">issue</a>; the schema in the outputted data was completely mangled.
With no workaround to be found, we reached out to our resident big data guru <a href="https://www.linkedin.com/in/barronpeter/">Peter
Barron</a> who saved the day by introducing us to Spark’s <em>newAPIHadoopFile</em> and
<em>saveAsNewAPIHadoopFile</em> methods which allowed us to read and write GenericRecords of AVRO without modifying the
schema.</p>
<h3>Conclusion</h3>
<p>In a nutshell, we were able to solve the many small files problem in AVRO by writing a Spark job leveraging
the low-level functionality of the <em>Hadoop fs</em> library. Repartitioning files so that future jobs work on bigger
blocks of data improves their speed by decreasing the number of map tasks needed, and reduces the cost
of storage.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/fa88c7d4f6139d697c032f83e849b82bb23a0f84_small-files.jpg?auto=compress,format"></p>
<p><em>We're looking for software engineers and other talents. For details, check out our
<a href="https://jobs.zalando.com/jobs/1000713-software-engineer-api-management/">jobs</a> page.</em></p>
<p><strong>References</strong></p>
<p>[1] Dealing with Small Files Problem in Hadoop Distributed File System, Sachin Bendea and Rajashree Shedge,
<a href="https://www.sciencedirect.com/science/article/pii/S1877050916002581">https://www.sciencedirect.com/science/article/pii/S1877050916002581</a>
[2] The Small Files Problem, Cloudera Engineering Blog,
<a href="https://blog.cloudera.com/blog/2009/02/the-small-files-problem/">https://blog.cloudera.com/blog/2009/02/the-small-files-problem/</a>
[3] Integrating Spark At Petabyte Scale, Netflix,
<a href="https://events.static.linuxfound.org/sites/events/files/slides/Netflix%20Integrating%20Spark%20at%20Petabyte%20Scale.pdf">https://events.static.linuxfound.org/sites/events/files/slides/Netflix%20Integrating%20Spark%20at%20Petabyte%20Scale.pdf</a></p>Rabbit in the Cloud2018-02-01T00:00:00+01:002018-02-01T00:00:00+01:00Waleed Madanattag:engineering.zalando.com,2018-02-01:/posts/2018/02/rabbit-in-the-cloud.html<p>How we deployed RabbitMQ on AWS</p><h3><strong>How we deployed RabbitMQ on AWS</strong></h3>
<p>In an effort to move away from our legacy monolithic service, we decided to take on the challenge of building a new
communication platform based on a microservice architecture, which would be more focused and more easily manageable.
The challenge was exciting and big; we had to make crucial decisions early on, decisions that we would have to live with
for the foreseeable future. One of the most important decisions was how to integrate our microservices once we
created them. One common, obvious choice would be REST APIs. This is usually great, except that in certain parts of the
platform we needed to speed things up, because we were aiming for a fast platform: one that would suffer as little as
possible from the inherent latency and connection limits of HTTP.</p>
<p>Our new shiny platform would need to process several million messages every day, with the guarantee of no lost messages
as well as being able to account for them throughout their journey.</p>
<p>To achieve all that, we decided to use RabbitMQ: it offered us reliable, guaranteed message delivery
through transactions, flexible routing of messages based on routing keys and other conditions, support for high
availability and clustering, and, most of all, the maturity of the project as a whole.</p>
<p>In this article, I will describe the challenges we encountered in bringing RabbitMQ to the cloud.</p>
<h3>A brief background of RabbitMQ</h3>
<p>RabbitMQ is middleware software, first launched in 2007, that supports cross-system integration via message
exchange. It implements the <a href="https://www.amqp.org/">Advanced Message Queueing Protocol (AMQP)</a>, which differs from
other standards such as JMS in that it aims to standardize the wire protocol rather than the development API.</p>
<p>RabbitMQ allows for highly flexible means of defining exchanges and their bindings to queues with multiple routing
options. In RabbitMQ, you publish messages to exchanges and based on your configuration, messages are delivered to the
correct queues. This allows for many interesting and useful scenarios.</p>
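As a toy illustration of the exchange/binding idea (plain Python, not RabbitMQ's API; the `route` function and its arguments are hypothetical), a direct exchange delivers a message to every queue whose binding key exactly matches the message's routing key:

```python
# Toy model of direct-exchange routing: a binding ties a binding key to a
# queue, and a published message is delivered to every queue whose binding
# key equals the message's routing key. Topic exchanges generalise this
# with wildcard patterns; only the exact-match case is shown here.

def route(bindings, routing_key):
    """bindings: list of (binding_key, queue_name); returns matching queues."""
    return [queue for key, queue in bindings if key == routing_key]
```

For example, a message with routing key `order.created` would land in every queue bound with that key, and nowhere else.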
<p>RabbitMQ is built with <a href="http://www.erlang.org/">Erlang</a>, a functional programming language created by Ericsson
back in 1986 to be fault-tolerant and distributed, with hot code replacement, for long-running and
possibly non-stop applications. There’s no need for Erlang knowledge apart from basic syntax for defining system
properties.</p>
<h3>How hard could it be to bring it to the cloud?</h3>
<p>Two words: “high availability”. With a system such as RabbitMQ serving to integrate your microservices, it is more
important than ever.</p>
<p>RabbitMQ is capable of running in high availability mode, but it requires well-known machine host names.
Unfortunately, that proves difficult since EC2 instances come and go all the time, always with a new IP address (unless
you reserve those IPs). In our case, that was a limitation we needed to work around.</p>
<p>We run our cluster on AWS, which limits each account to five Elastic IP addresses per region; in our case, those are
mostly used by other services. Hence, we opted to identify the cluster dynamically by building our own sidekick service to
run alongside the RabbitMQ server.</p>
<p>That sidekick service interfaces with the AWS API and inquires about the available healthy instances running with the
same CloudFormation and same version, and then figures out the oldest instance to designate as the master node. Sidekick
will keep querying the cluster to detect if the current master is dead, in which case the current oldest node is
promoted and so on.</p>
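A minimal sketch of this election rule, assuming the sidekick sees each instance's launch time and health status (the function name and data shapes are hypothetical, not the real sidekick code):

```python
from datetime import datetime

# Hypothetical sketch of the sidekick's master-election rule described
# above: among healthy instances of the same stack and version, the one
# with the earliest launch time is designated master. A tie-break on the
# instance ID keeps the choice deterministic, so every node that runs
# this logic independently agrees on the same master.

def elect_master(instances):
    """instances: list of dicts with 'id', 'launch_time' (datetime), 'healthy'."""
    healthy = [i for i in instances if i["healthy"]]
    if not healthy:
        return None  # no healthy instance: no master can be elected
    return min(healthy, key=lambda i: (i["launch_time"], i["id"]))["id"]
```

When the current master dies, re-running the same election over the remaining healthy instances promotes the now-oldest node, as the post describes.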
<p>We use Pivotal’s (now deprecated) <a href="https://github.com/rabbitmq/rabbitmq-clusterer">rabbitmq-clusterer plugin</a>. Updating
the configuration file utilized by this plugin will force-update the cluster formation for any changes.</p>
<p>Then there is the problem of cluster upgrades. As mentioned earlier, RabbitMQ is built using Erlang and is only
backwards compatible at the minor version level, so if the major version of RabbitMQ and/or Erlang has changed, the
old and new clusters would not be compatible with each other, and all hell breaks loose!</p>
<h3>Dealing with cluster upgrades</h3>
<p>So, we opted for a single solution to enable RabbitMQ’s federation plugin as soon as a new cluster shows up in our AWS
account. With this plugin, we’d instruct the old cluster to setup an upstream towards the new cluster. We’d then have
all our services renew their connection to bind to the new cluster forcing the messages to be transmitted from the old
to the new cluster. Once we were sure all messages were gone, we’d shut down the old cluster, not losing any messages.</p>
<p>Finally, all this runs within Docker containers on EC2 instances, and to have the sidekick and the RabbitMQ server run
alongside each other, we opted to use <a href="http://supervisord.org/">supervisord</a> with a configuration that looks something
like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/65f1f55b431a11679d2d6db3ecf6a5d57a169bd1_screen-shot-2018-01-11-at-11.48.50.png?auto=compress,format"></p>
<h3>Where do we go from here?</h3>
<p>Well, so far this has proved useful to us, and we would like to explore it even further. One thing is for sure:
we would get rid of the deprecated plugin and move to a more stable one that performs node
discovery automatically based on Auto Scaling Groups on AWS.</p>
<p>In essence, we would like to reduce the duties assigned to the sidekick service to the bare minimum and also have it run
independently from the cluster so that we can update it without having to deploy the broker, while also maintaining a
state of the art RabbitMQ cluster in the cloud.</p>
<p>We would also like to have the persistence of queues outlive the existence of the RabbitMQ cluster itself in case of a
total failure of all instances, which is currently another shortcoming we have to worry about.</p>
<p>And finally, we would like to make the upgrade from one version to another as easy as just deploying the new cluster and
then removing the old one. This would help our team perform more seamless deployments and upgrades of the RabbitMQ
cluster.</p>
<h2>Final Words</h2>
<p>We are really satisfied with the features we get out of using RabbitMQ. We have invested enough time to make it work for
us in the cloud. We ended up learning a lot along the way, and we’d like to continue to use it while making it easier to
maintain.</p>
<p><em>Join our awesome tech family! All job opportunities can be seen <a href="https://jobs.zalando.com/tech/jobs/">here</a>!</em></p>Building a Better Tech Radar2018-01-25T00:00:00+01:002018-01-25T00:00:00+01:00Tim Lossentag:engineering.zalando.com,2018-01-25:/posts/2018/01/building-tech-radar.html<p>How Zalando helps its engineering teams navigate the tech landscape</p><h3>How Zalando helps its engineering teams navigate the tech landscape</h3>
<p>Zalando has more than 200 engineering teams, which regularly face tricky technology choices. To help them make good
decisions, we created the <a href="https://zalando.github.io/tech-radar/">Zalando Tech Radar</a> as a "navigation" tool. Inspired
by <a href="https://www.thoughtworks.com/radar">ThoughtWorks</a>, it assigns each technology to one of four rings — Adopt, Trial,
Assess and Hold — which represents the current consensus within Zalando. At the same time, the Tech Radar also serves as
a platform to share experience around relevant technologies and as a public showcase of technology adoption within
Zalando.</p>
<h3>The problem</h3>
<p>How do you <strong>auto-generate</strong> a radar-like chart? At first, this seems trivial. Each technology (or "blip", in radar
speak) is assigned to exactly one quadrant and one ring, resulting in 16 ring segments in total. But arranging blips
inside each segment — in a way that looks "natural" — is surprisingly hard.</p>
<h3>Our old approach</h3>
<p>When we first published the Tech Radar in 2015, we researched how others had solved the layout problem, and found
<a href="https://github.com/bdargan/techradar">radar.js</a> from Brett Dargan. This script uses <a href="https://en.wikipedia.org/wiki/Polar_coordinate_system">polar
coordinates</a>, where each position is defined "by a distance from
a reference point and an angle from a reference direction”. Polar coordinates make it easy to express the boundaries of
ring segments in code. The script utilizes the popular <a href="https://d3js.org/">D3.js</a> library to render
<a href="https://en.wikipedia.org/wiki/Scalable_Vector_Graphics">SVG</a> — a vector graphics format that is supported by modern
browsers.</p>
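A minimal sketch of the conversion between the two coordinate systems (standard trigonometry, not the radar.js code itself):

```python
import math

# Polar coordinates, as used by the radar layout: a blip's position is a
# radius from the center and an angle from a reference direction. Rendering
# to SVG needs Cartesian (x, y); force simulation and ring boundaries are
# easier to express in polar form, so both directions are useful.

def polar_to_cartesian(radius, angle):
    return (radius * math.cos(angle), radius * math.sin(angle))

def cartesian_to_polar(x, y):
    return (math.hypot(x, y), math.atan2(y, x))
```

A ring segment is then simply the set of points whose radius lies between two ring radii and whose angle lies within one quadrant.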
<p>So far, so good, but we still need to arrange blips inside each segment. We tried some heuristics like evenly spacing
the blips, or alternating between two rows. But in the end, the layout either looked very rigid and artificial, had some
blips overlapping others, or both. We fixed this by auto-generating the layout anyway, and then hand-tweaking selected
coordinates.</p>
<h3>A new hope</h3>
<p>After fiddling with polar coordinates for two years, we decided that enough is enough. "This is ridiculous! There has to
be a better way!" And indeed, the <a href="https://github.com/d3/d3-force">d3-force</a> module provides "force-directed graph
layout", where the layout of the graph is determined automatically, by simulating physical forces (like gravitation,
cohesion or collision). Check out this <a href="https://bl.ocks.org/mbostock/95aa92e2f4e8345aaa55a4a94d41ce37">cool example</a>
from Mike Bostock, or this <a href="http://blog.theodo.fr/2015/03/introduction-to-d3js-force-layout/">step-by-step
introduction</a>.</p>
<p>So let's create a force-directed Radar layout! As it turns out, this works really well — except the challenge is now to
keep each blip inside its segment.</p>
<h3>Fencing with invisible blips</h3>
<p>Our first idea was to use two kinds of blips:</p>
<ol>
<li>visible ones (representing technologies), which would be pushed around by forces</li>
<li>invisible ones, which would be placed in fixed positions around the perimeter of each ring segment, and act like a
fence:</li>
</ol>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/237e128065613d0c7eac0390e41742c7de6f600b_fencing.jpg?auto=compress,format"></p>
<p>This approach mostly works except that sometimes, when crowded, blips push each other so hard that one of them slips
through the fence into the neighboring segment. Which completely ruins the visualization, of course, so we had to try
something else.</p>
<h3>The solution</h3>
<p>In the end, the best way to keep blips "inside the box", is exactly that: a bounding box, which is applied on each
simulation step. Actually, after some experimentation we settled on two bounding boxes for each segment: one in
Cartesian space (to keep blips inside their quadrant)...</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/548edf261441e52cf3d159b67e0a0e6ec71a5112_box_cartesian.jpg?auto=compress,format"></p>
<p>... and one in polar space (to keep blips inside their ring):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e8f6161a9fddebfef558a60dffa6243e23745965_box_polar.jpg?auto=compress,format"></p>
<p>So, to recap, we start by placing blips randomly into each ring segment. When we run the simulation, a collision force
ensures that blips stay away from each other; and our bounding boxes make sure that no blip can escape. But, are we
finished?</p>
<p>Almost.</p>
<h3>Repeat after me</h3>
<p>Although the Radar visualization is created on the fly (you can actually see the blips moving, if you watch closely), we
want the result to be identical if the page is reloaded. This is why we use a <a href="https://stackoverflow.com/questions/521295">custom random number
generator</a>; essentially, a function that returns a fixed sequence of
"random<em>ish</em>" numbers. This ensures that the simulation always produces the same result.</p>
<h3>Ta-da!</h3>
<p>And we are done: an auto-generated <a href="https://zalando.github.io/tech-radar/">Radar visualization</a> with "natural" spacing
and without any overlaps, even in crowded ring segments:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/75793f677142d5be65cd880d33eaee13ea5594c5_screen-shot-2018-01-25-at-11.16.16.png?auto=compress,format"></p>
<p>We've even <a href="https://github.com/zalando/tech-radar">released the code</a> on GitHub under MIT license. So feel free to go
ahead and create your own Radar!</p>
<p><em>Work with great minds like Tim. Check out our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1">jobs</a> page.</em></p>Simplicity by Distributing Complexity2018-01-23T00:00:00+01:002018-01-23T00:00:00+01:00Michal Michalskitag:engineering.zalando.com,2018-01-23:/posts/2018/01/simplicity-by-distributing-complexity.html<p>Building an aggregated view of data in the event-driven microservice architecture</p><h3><strong>Building an aggregated view of data in the event-driven microservice architecture</strong></h3>
<p>In the world of microservices, where a domain model gets decomposed into related, but independently handled entities, we
often face the challenge of building an aggregate view of the data that brings together different parts of that model.
While this can already be interesting with “traditional” designs, the move to event-driven architectures can magnify
these difficulties, especially with simplistic event streams.</p>
<p>In this post, I'll describe how we tackled this when building Zalando’s <a href="https://engineering.zalando.com/posts/2017/10/zalando-smart-product-platform.html">Smart Product
Platform</a> (SPP); how we initially got it wrong by
trying to solve all problems in one place, and how we fixed it by "distributing" pieces of that complexity. I'll show
how stepping back and taking a fresh look at the problem can lead to a much cleaner, simpler, more maintainable and less
error-prone solution.</p>
<h3>The Challenge</h3>
<p>The SPP is the IT backbone of Zalando’s business. It consists of many smaller components focused on the ultimate goal of
making articles sellable in Zalando’s online stores. One of the “earliest” stages in the pipeline is the Article
ingestion, built as a part of Smart Product Platform. That’s the part that I, together with my colleagues, am
responsible for.</p>
<p>Long story short, we built a system that allows for a very flexible Article data model based on a few “core” entities
representing the various pieces of data we need in the system. For simplicity, in this blog post, I’m going to limit the
core model to:</p>
<ul>
<li>Product - core entity representing products that Zalando sells, which can form a hierarchy itself (Product being a
child of another Product); all the entities need to be associated – either directly or indirectly – with a single
Product that’s the “root” (topmost entity) of the hierarchy,</li>
<li>Media - photos, videos, etc., associated with a Product,</li>
<li>Enrichments - additional pieces of data associated with either Product or Media entity.</li>
</ul>
<p>Sample hierarchy constructed from the building blocks described above may look like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f37a5ef01d580d415c1d9ae43e7bb66cd72990ba_hierarchy.jpg?auto=compress,format"></p>
<p>Since we <a href="https://engineering.zalando.com/posts/2017/11/why-event-driven.html">decided to adopt</a> a “ <a href="https://engineering.zalando.com/posts/2017/05/platform-engineering-and-third-generation-microservices-in-dublin.html">third generation
Microservices</a>”
architecture focusing on <a href="https://engineering.zalando.com/posts/2017/10/event-first-development---moving-towards-kafka-pipeline-applications.html">“event-first
development”</a>,
we ended up with a bunch of CRUD-like (i) microservices (service per entity) producing ordered streams of events
describing the current “state of the world” to their own Kafka topic. Clients would then build the Article (which is an
aggregate of Product, Media and Enrichment information) by making subsequent calls to all the services, starting from a
Product and then adding other pieces of data as required.</p>
<p>What’s important is that <strong>all the services were ensuring the correct ordering of operations</strong>. For instance, it’s not
possible to create a Media entity associated with a non-existent Product, or a “child” Product whose “parent” Product
wasn’t yet created (implementation detail: we make HEAD requests to the referenced entity’s service REST API to ensure
it).</p>
<p>Initially, we thought that this was enough; we would expose these streams to the consumers, and they would do The Right
Thing™, which is merging the Product, Media and Enrichment data into the aggregated Article view.</p>
<p><em>i) CRU, rather than CRUD - the “D” part was
<a href="https://www.simacan.com/2015/06/25/preserve-historical-data-scala-days/">intentionally</a> left out.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4c56167a5d9e8ebe2695289575e94011c7ae384d_pre-gozer.jpg?auto=compress,format"></p>
<p>This approach, however, had some significant drawbacks, one being that consumers of our data needed the aggregated
Article view; they rarely cared about just the individual bits of information we exposed. This meant that the non-trivial
logic responsible for combining the different streams of data would have to be re-implemented by many teams across
Zalando.</p>
<p>We decided to bite the bullet and solve this problem for everyone.</p>
<h3>Gozer - the reason to cross the streams</h3>
<p>Gozer, as we called it, was an application whose only purpose was to “merge” the inbound
<a href="http://ghostbusters.wikia.com/wiki/Cross_the_Streams">streams</a> of data – initially only Products and Media – and
use them to build an aggregated Article view exposed to the consumers who need it.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/25e9b4ad988a3bc5c54f1b4375fd5617cd8447f2_gozer-idea.jpg?auto=compress,format"></p>
<p>As simple as it sounds, the fact that each service published its own stream, with no ordering guarantees across
different streams, made Gozer’s implementation non-trivial. We knew that entities were created and published in the
correct order, but that didn’t guarantee the order of consumption. To account for this, whenever we consume an entity
that’s associated with another entity not yet received by Gozer (e.g. a Media event for a Product), we fetch the missing
data using the service’s REST API.</p>
<p>Once we have the data, we have to sequence all the events correctly: the “root” Product goes first and the rest of
the hierarchy follows, with each node preceding its children. We use Kafka, so this means making sure that, for all
the entities in the hierarchy, the ID of the “root” Product is used as the partition key. This will become important later.
To do this in a performant way, we need to keep track of the whole hierarchy and its “root” (and some other information,
which I’m going to ignore for simplicity), which added more complexity and a significant performance penalty to the
processing.</p>
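A minimal sketch of this partition-key rule, assuming each entity knows only its direct parent (the helper name and data shape are hypothetical):

```python
# Hypothetical sketch of the sequencing rule: every entity is keyed by the
# ID of the "root" Product at the top of its hierarchy, so all events for
# one hierarchy land on the same Kafka partition, preserving their order.

def root_product_id(entity_id, parents):
    """parents maps an entity ID to its parent's ID (None for a root)."""
    current = entity_id
    while parents.get(current) is not None:
        current = parents[current]
    return current
```

For example, an Enrichment attached to a Media of a child Product resolves, via its chain of parents, to the root Product's ID, and that ID is used as the partition key.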
<p>Then sequenced entities are published in the correct order to an “intermediate” Kafka topic, so in the next step they
can be processed and merged.</p>
<p>This whole logic, extra service calls, local hierarchy tracking and event sequencing, added some complexity to the code,
but at the time we were happy with the outcome. We had reasonably simple REST APIs and a single place handling the merge
complexity. This looked reasonable and quite clean at the time.</p>
<p>Unfortunately, it didn’t stay like this for too long. Soon we added handling of the Enrichments inbound stream and some
other features. This added complexity to the sequencing and merging logic and resulted in even more dependencies on REST
APIs for fetching the missing data. Code was becoming more and more convoluted and processing was becoming slower.
Making changes to the Gozer codebase was becoming a pain.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0c5440872a357654b6ab636915186a294c590644_gozer.jpg?auto=compress,format"></p>
<p>To visualise this complexity, below you can see the interaction diagram that my colleague, <a href="https://engineering.zalando.com/posts/2017/11/why-event-driven.html">Conor
Clifford</a> drew, which describes the creation of a
2-level Product hierarchy with a single Media and a single Enrichment for that Media. Don’t worry if you can’t see the
details; it’s the number of interactions and services involved that matters, showing the scale of the problem:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b07a12e8fa3e128d6d3f6a80365a9d853db9ee18_screen-shot-2018-01-22-at-17.19.56.png?auto=compress,format"></p>
<p>Note that we’re dealing with millions of Products with tens of Media and Enrichments. What you see above is the
unrealistically simplified version. The real issue was much, much bigger.</p>
<p>But that wasn’t the end. As our data model grew, not only were more inbound streams about to appear, but we also started
discussing the need to add outbound streams for other entities in the future.</p>
<p>At this time, a significant amount of Gozer’s code and processing power was dedicated to dealing with issues caused by
the lack of ordering across all the inbound streams, <strong>which was guaranteed at the entity creation time</strong> (remember the
HEAD checks I mentioned earlier?), but lost later. The fact that we were not maintaining this ordering all the way down
to Gozer because of having a stream per entity was causing us significant pain when we had to deal with out-of-order
events.</p>
<p>We realised that <strong>we were giving up a very important property of our system</strong> (ordering) because it was “convenient”
and looked “clean”, <strong>only to introduce a lot of complexity and put significant effort into reclaiming it back
later</strong>.</p>
<p>This was something that we needed to change.</p>
<h3>Vigo to the rescue</h3>
<p>Following the established <a href="http://ghostbusters.wikia.com/wiki/Vigo">naming convention</a>, we decided to rework Gozer by
creating Vigo; an application whose purpose was the same as Gozer’s, but the approach we took this time was
substantially different.</p>
<p>The main difference was that this time we wanted to ensure that the order of events received by Vigo was guaranteed to
be correct. This way Vigo wouldn’t be responsible for “merging streams” and fetching missing data as before. It would
consume entity-related events as they come, and its only purpose would be to produce the “aggregate” event correctly.
This design would have two main benefits:</p>
<ul>
<li>Ordered events mean no “missing” data when events are delivered in an incorrect order, so the application
architecture (sequencing step, additional Kafka topic) and logic (handling of the “missing entity” case, fetching it
via REST API) are simplified,</li>
<li>No external calls are required to fetch missing entities; a massive performance gain.</li>
</ul>
<p>As much as we cared about performance, since we were about to add even more functionality to Gozer, the simplicity
argument was the main driving force behind making the Vigo project happen.</p>
<p>We knew what we wanted Vigo to be, but we had to figure out how to get there; how to create that single, ordered stream of
all the entity-change events.</p>
<p>One could ask, “Why not just make all the services publish to one Kafka topic? This would ensure the ordering and it is
simple to do”:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d6f5fc171158ec28574e18a17f6f73b1e3c0cddc_simplistic-vigo.jpg?auto=compress,format"></p>
<p>Unfortunately, it’s not that simple in our case. I mentioned earlier that all the entities in our system build a
hierarchy and need to be processed in the context of a Product. More precisely, to know what partition key to use for an
entity, we need to know its “root” Product entity ID, which is the very top of the whole hierarchy. That’s where this
approach gets a bit tricky…</p>
<p>Let's consider a Product that has a Media associated with it. That Media has an Enrichment. This Enrichment only 'knows'
which Media it's defined for, but has no 'knowledge' of the Product (and its ID) that the Media belongs to. From the
Enrichment's perspective, to get the Product ID we need, we must either:</p>
<ul>
<li>Make the Enrichment service query the Media service for information about the Product that given Media is assigned
to (meaning that Media would be a “proxy” for Product API),</li>
<li>Make the Enrichment service “understand” this kind of relationship and make it query the Product service directly,
asking for a Product ID for a Media that the Enrichment is assigned to.</li>
</ul>
<p>Both of these solutions sound bad to us: they break encapsulation and leak the details of the “global” design all over
the system. Services would become responsible for things that, we believe, they shouldn’t be. They should only
“understand” the concept of the “parent entity” and they should only interact directly with the services that are
responsible for their parents.</p>
<p>This leads us to the third option; a bit more complex than the simplistic, ideal approach described above, but still
significantly cleaner than what we had before in Gozer.</p>
<p>This complexity wouldn’t completely disappear, of course. We still have to:</p>
<ul>
<li>enforce the ordering of events across different streams,</li>
<li>ensure entities are processed in the context of their parents.</li>
</ul>
<p>To achieve the above, our services needed to become a bit smarter. They would need to first <strong>consume the outbound
streams of the services responsible for entities that depend on them</strong> (e.g. Product stream would consume Media and
Enrichment streams) and then <strong>publish the received entities into a single, ordered Kafka topic</strong> (partitioned by
Product ID, because it’s the “root” entity) <strong><em>after</em> the entity they’re associated with</strong>.</p>
<p>This approach can be “decomposed” and presented in a more abstract form as a service X consuming N inbound streams
(containing “dependent” entities), and multiplexing the events received with the entities it’s responsible for (X) into
a single, ordered outbound topic.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9b0e5a3025a16e69fc42e67e0c506cae90ada81f_fragment.jpg?auto=compress,format"></p>
<p>This service’s outbound topic may then become an inbound topic for another service, and so on, which means that these
small blocks can be composed into a more complex structure that maintains the ordering across all the entities while
still allowing each service to process entities in the context of their parents.</p>
<p>Putting all the building blocks together, the final design looked like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/904fc08fa883617d535cc3c5e8e6d733be94f85d_sequencing.jpg?auto=compress,format"></p>
<p>with Product service’s outbound queue containing something like:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f0e527545b825d55795b3623a384c63a2f5fdc69_state-of-the-queue.jpg?auto=compress,format"></p>
<p>This queue contains all the entities in the correct order, so they can be consumed and processed by Vigo “as is”,
without considering the case of missing data and making any external calls.</p>
<p>While the “big picture” looks more complex right now it’s important to remember that a <strong>single engineer will rarely
(almost never) deal with all that complexity at once</strong> in their daily development work as it will usually be done within
the boundaries of a single service. Before Vigo, Gozer was a single place where all this complexity (and more!) was
accumulated, sitting and waiting for an unsuspecting engineer to come and get swallowed.</p>
<p>Also, do you remember the interaction diagram I showed you earlier? This is <strong>the same</strong> interaction <strong>after</strong> making
the discussed changes:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/88a5af5c60dda417dbe10e2d972401c82f6a5332_screen-shot-2018-01-22-at-17.51.25.png?auto=compress,format"></p>
<p>Again, don’t worry about the details - it’s about the number of interactions and services involved. The difference
should be apparent.</p>
<h3>Was it worth it?</h3>
<p>As I was hopefully able to show you, we removed a lot of complexity from the “aggregating” service, but it came at a
price. We had to add some complexity to other services in our Platform; this is how we coined the term “<strong>simplicity by
distributing complexity</strong>”. While we think there’s no “silver bullet” solution here, the benefits of the second design
(Vigo) make it superior to the original solution (Gozer) for at least a few reasons:</p>
<ul>
<li>It’s easier to understand; fewer cases to handle, fewer special cases, and fewer external dependencies make it
easier to reason about the service and create a mental model of it.</li>
<li>It’s easier to test - there’s less code, fewer possible scenarios to test overall, and no REST services to take
into account.</li>
<li>It’s easier to reason about and debug - monitoring a couple of consumers with cross-dependencies and making external
calls is much more challenging than doing the same for a single one.</li>
<li>More extensible and composable - adding new data flows and streams becomes a much smaller undertaking.</li>
<li>It’s more resilient - again, no external REST services to call means that it’s less likely that a problem with other
services will stop us from aggregating the data that’s waiting to be aggregated.</li>
</ul>
<p>What’s worth noting is that the last point (resiliency) is true for the system as a whole as well. These REST calls
weren’t moved anywhere; they simply disappeared: the work is now handled by moving data through Kafka (which we already
have a hard dependency on).</p>
<p>What we noticed is that while complexity grouped in a single place tends to “multiply” (ii), a similar amount of
complexity spread across many parts of the system is easier to handle and only “adds up”.</p>
<p>This only applies when the complexity is spread by design, put where it “logically” belongs; not just randomly (or
even worse: accidentally) thrown all over the place!</p>
<p>“Distributing the complexity” is not a free lunch. In the same way that a microservice architecture distributes a
monolith’s complexity into smaller, self-contained services at the price of general operational overhead, our approach
massively reduced the pain caused by the complexity of a single service, yet added a few small moving pieces in a
couple of other places in the system.</p>
<p>Overall: yes, we think it was worth it.</p>
<p><em>ii) Of course this mathematical / numerical interpretation assumes that complexity has to be greater than 1</em></p>
<p><em>Check out some of the amazing jobs with people like Michal in our <a href="https://jobs.zalando.com/tech/jobs/?gh_src=4n3gxh1&location=Dublin">Dublin Tech
Hub</a>!</em></p>Drawn Together2018-01-18T00:00:00+01:002018-01-18T00:00:00+01:00Adrian Dampctag:engineering.zalando.com,2018-01-18:/posts/2018/01/how-to-talk-about-design.html<p>How to talk about design in the agile world</p><h3><strong>How to talk about design in the agile world</strong></h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0b9c269492567dcc276106aa42f7078d8134530b_screen-shot-2018-01-12-at-13.29.30.png?auto=compress,format"></p>
<h3>How we improved design communication in the Retail Ops Team</h3>
<p>With an agile and lean approach, most of us here at Zalando changed the way we build digital products. Design processes
also evolved, with designers usually working alongside cross-functional product teams.</p>
<p>But, at first, one thing did not change much: how we talk about design.</p>
<p>Many organizations still believe that a designer is a salesman; that having the ability to convince others that their
ideas (products) are ‘good’ or ‘great’ is the key skill. Yes, it is crucial for digital agencies, but you need a
different approach when you are working in a cross-functional team. You must be a great <strong>Connector</strong>, focused on
teamwork.</p>
<p>So how do we do it?</p>
<h3>Convince others to participate in design</h3>
<p>Let’s face it: designers do not own the design process.</p>
<p>In agile/lean-oriented teams, if you try to keep the design to yourself, you will be challenged or ignored.</p>
<p>To prevent that, ask others to join you in the design process. <strong>Instead of being the owner, be the host of
the design process.</strong></p>
<p>How does it work in our teams? All team members (developers, product specialists, customer care, etc.) can participate
in research, design workshops or propose ideas and changes. Designers, Product Owners, and Developers — everyone can
participate in the design process. But there is a catch; everything is prepared and driven by UX specialists.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/509875c7c6d4d696e45afa0716ef72a3773504b2_article-pic1.png?auto=compress,format"></p>
<p>This cross-functional approach will enrich design solutions with better business or tech input. What is more, it is
much easier for team members to accept the design outcome — it is their work as well.</p>
<p>Unfortunately, it’s not as easy as it sounds.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/61f4ec0f6143695b4c79681f0e703ef10eeea0a9_article-pic2.png?auto=compress,format"></p>
<p><em>Collaborative Design: Day 1</em></p>
<p>Managing the design process properly is hard. Managing it with non-designers is even harder.</p>
<p>People try to solve things their own way, get distracted by other activities, do not understand some of the design
methods, or do not believe in the efficiency of those methods.</p>
<p>How can you deal with it?</p>
<h3>Establish Design Rules</h3>
<p>First, define your design process. Document it in a clear way and share it with your teammates. Do the same with your
design toolbox — create a list of your methods with a clear description of every activity.</p>
<p>Ensure that they understand and accept the way you want to approach design. If there are any doubts — be open to
dialogue and finding a common ground.</p>
<h3>Make Clear Who Is Responsible for the Final Decision</h3>
<p>Be honest about it.</p>
<p>Do not pretend that your decisions are democratic if they are not. It will not solve any problems and may lead to
frustration in the future.</p>
<p>Usually, the CEO or PO is responsible for a product and its potential failure so they make the final decision. Yes, it
might be disappointing, annoying, even unfair. But this can, in fact, improve your position so long as you've had the
opportunity to make yourself heard.</p>
<p>In a previous role of mine, we were dealing with a Product Owner who was changing design decisions without informing the
team. We accepted that — as a PO — he had the right to the final decision, but we still wanted some clarity and
transparency. We asked him for a confirmation email with all his changes, which in this instance worked! The PO
preferred to find a solution that was accepted by the whole team. We felt heard, our input was considered and decisions
were made with this in mind.</p>
<h3>Focus on Informal Communication</h3>
<p>Keep the number of scheduled meetings to a minimum. People usually have too many appointments on their calendars
already. Replace meetings with short, single topic, informal chats.</p>
<p>Do not keep design artifacts to yourself. Share them with the whole team as soon as possible, and allow them to check
them whenever they want to.</p>
<p>Be physically available for the rest of the team. Communication between people increases drastically if they are able to
make eye contact with you. Be open to having a discussion whenever teammates need it. Yes, it can be inconvenient at
times, but it will help you earn the team’s trust.</p>
<h3>Draw</h3>
<p>Try to make your communication visual. Drawing is a universal language, understood by people with different
backgrounds.</p>
<p>Designers are considered visualization experts, which helps you keep control of the design process.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/74d75ac8357fcf2e7f9b7201360558c1596fb4b5_article-pic3.jpg?auto=compress,format"></p>
<p><em>Wall behind my desk — we are trying to create visual artifacts for</em> every <em>design discussion</em></p>
<p>The best way to introduce this change is to demonstrate it yourself. Do not talk about design without having something to draw with in your
hand. When a team member suggests any idea, answer: “Okay, let’s draw it!” You can visualize every problem, process or
idea (including “non-visual” topics, like voice interfaces).</p>
<p>Finally, enrich your design toolbox by techniques that require team drawing (for example, a Design Studio Workshop or
Storyboarding).</p>
<h3>Separate People from Their Ideas</h3>
<p>It's a common scenario:</p>
<p>People in a team have different ideas about how to solve a problem. Everyone has a tendency to think that their ideas
are the best. They get attached to them. Things get emotional and people want to show that they are right.</p>
<p>When it happens, teams have a tendency to support ideas that:</p>
<ul>
<li>belong to a person with a higher role in the organization (HiPPO — highest paid person’s opinion),</li>
<li>are presented better.</li>
</ul>
<p>To avoid this situation, you need to separate people and opinions.</p>
<p>Start by drawing. When you move the idea from a person’s mouth to a sheet of paper, the solution becomes more
independent. People sometimes realize that their idea on paper is not so attractive anymore.</p>
<p>If you have a little more time, you can try a solution promoted in a book I recommend, <a href="https://www.thesprintbook.com/">The
Sprint</a>. The author proposes that everyone prepares their design proposal separately.
Afterwards, all ideas are presented and validated without revealing the author.</p>
<h3>Define Clear Discussion Goals</h3>
<p>Our design team started design critique sessions. At the beginning it was a disaster — every design was torn apart by
hundreds of comments, all of which were more or less accurate. It was not only unpleasant but it was also ineffective.
We generated long lists of errors without too many ideas on how to solve any of them.</p>
<p>We decided to put more structure in our design discussions. Here's what we do now: before a talk, we write down:</p>
<ul>
<li>What we want to achieve with this design,</li>
<li>What kind of feedback is expected / What should the outcome of the discussion be.</li>
</ul>
<p>Sounds simple, but at the time, it was a big change — our discussions became more structured, efficient, and now have a
clear outcome. It also helps to quickly drop all discussions that are not leading us to a desirable conclusion.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5e7cddf0a8c58413bf1431964979c1740d5e9f6b_screen-shot-2018-01-15-at-12.00.10.png?auto=compress,format"></p>
<p><em>Typical feedback session</em></p>
<h3>Do Not Criticize</h3>
<p>When we focus on generating solutions, we do not criticize at all. Others are then encouraged to share even the craziest
idea. It also makes the process much more efficient.</p>
<p>This is something we learned from a Google Team we collaborated with from Hamburg on the <a href="https://corporate.zalando.com/en/newsroom/en/press-releases/zalando-launches-gift-finder-assistant-app-google-assistant">Google Assistant Gift
Finder</a>
project. During a workshop, we focused on generating as many different ideas as possible and chose around ten that were
the most promising. They suggested we comment only on the things we liked and just ignore the rest.</p>
<p>We had a similar task a few weeks earlier, but we included criticisms. In the end, we had similar outcomes but it took
three times longer.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7637e73914b7bde66813955ea2e0b3147e378a80_article-pic5.png?auto=compress,format"></p>
<p>When you focus on the positive and negative sides of every idea, it often starts a long discussion. Yes, sometimes it
helps to rule out unrealistic or naive ideas, but with proper processes, those ideas will be ruled out anyway.</p>
<h3>Change Critique into Questions</h3>
<p>It is hard to ignore doubts when you are trying to choose the best option. Choices become clearer when we ask
questions.</p>
<p>Example: Instead of “This button should not be here,” we ask, “Why did we decide to place this button here?”</p>
<p>Sounds subtle, but it changes a lot. It starts the discussion. The presenter must explain the reason behind the
decisions. If we are not sure about an explanation, we can go deeper with further questions.</p>
<p>The goal is to reveal the whole process behind decisions. We unearth not only potential errors but also illustrate the
process that leads to choices. And if we know this, it is much easier to come up with proper solutions and fix
problems.</p>
<p>What is more, if you are preparing a design you need to be prepared to explain “why”. It makes our solutions more mature
and well thought out.</p>
<h3>Conclusion</h3>
<p>What did we learn from our experiments with communication in design? Our design meetings require less time and have a
clearer outcome. It is easier to communicate and reach an agreement with other designers, developers, and product
specialists.</p>
<p>Proper design communication also limits uncertainty. When the team believes that design decisions are right, work is
more efficient. This leads to providing a better product at the end; a win-win situation, you could say.</p>
<p><em>Be part of a dynamic design team. Have a look at our <a href="https://jobs.zalando.com/tech/jobs/ux/">job openings</a>.</em></p>The Faces Behind the Fashion-MNIST2018-01-11T00:00:00+01:002018-01-11T00:00:00+01:00Nana Yamazakitag:engineering.zalando.com,2018-01-11:/posts/2018/01/faces-behind-fashion-mnist.html<p>The Faces Behind the Fashion-MNIST</p><h3><strong>We talk to Han and Kashif from Zalando Research</strong></h3>
<p><em>Employer Branding Specialist Data Science, Nana Yamazaki catches up with the team using literal fashion icons in Deep
Learning.</em></p>
<p><strong>Tell us about Fashion-MNIST. What did you want to accomplish?</strong>
<a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST</a> is a freely available dataset of Zalando articles that
most importantly has the same format as the MNIST dataset. Just for context, the <a href="http://yann.lecun.com/exdb/mnist/">MNIST
dataset</a> from the late ‘90s consists of 60,000 28x28 grayscale handwritten single
digits (0 to 9) which has somehow become the "Hello World" dataset in Machine Learning, and especially in Deep Learning
(DL).</p>
<p>The MNIST dataset is somewhat "simple" with images looking like these emojis: 0️⃣, 3️⃣, 6️⃣, etc., which means modern DL
techniques these days are able to recognize unseen digits with accuracies of 99%! The size of the dataset, however,
means that researchers and practitioners of DL can quickly get started or prototype ideas with this dataset, even if it
does not reflect modern DL problems.</p>
<p>Thus, our goal for Fashion-MNIST was to have more "complicated" images such as, 👢, 👟, 👜, etc., but with the same size
convenience of MNIST so that the data more closely reflects the problems we deal with in Deep Learning.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/01936e633c87d48b1a4f4fca368003764a778a59_2017-12-07-han-xiao--kashif-rasul-0147.jpg?auto=compress,format"></p>
<p><em>The faces behind, Fashion-MNIST, Han Xiao and Kashif Rasul</em></p>
<p><strong>How did you react to the amount of attention it got, and why do you think it blew up like that?
</strong>We had most of the interaction with users on our GitHub repository, where we have around 2,780 stars as of today. We
requested people to run benchmarks on their particular method and send us the comparison with the MNIST dataset, which
generated a lot of interest as well. <a href="https://www.kaggle.com/datasets">Kaggle Datasets</a> is another place where the
community added this dataset; it has been the most popular dataset there for a few months now.</p>
<p>There are a number of reasons for its popularity. Scientific utility aside, we think the fashion images are somewhat
more playful and easily relatable to those starting out than more "serious" sounding datasets like national flags.</p>
<p>We and the DL community also wrote helper functions to import this dataset in all the most popular DL frameworks, which
reduced the barrier to entry for this dataset even further.</p>
<p><strong>Where do you see it impacting Zalando?
</strong>At Zalando we heavily use fashion image data for a number of products, and although this dataset is only a small
subset of the whole Zalando inventory, and is tiny and gray, it does help (at least we hope) in letting our developers
and scientists test out ideas quickly before tackling the whole Zalando images data at full color resolution.</p>
<p>The dataset has also helped in promoting Zalando, and it’s always nice to hear about our colleagues being approached at
conferences and meetups because they know of the dataset.</p>
<p><strong>Tell us a bit about working at Zalando Research
</strong>Zalando Research is a great place to work because of the team we have and the way we work collaboratively. We try to
achieve a balance between scientific or academic impact and business impact, where the work we do ends up being used in
a production setting at Zalando. The fact we spend our days experimenting, learning, teaching, failing, and
investigating makes it a great place to grow as an individual as well as a team, which is wonderful.</p>
<p>Want to work with an exciting team like this? You
<a href="https://jobs.zalando.com/jobs/690125-research-scientist-machine-learning-ai/">can!</a></p>Why We Do Scala in Zalando2018-01-09T00:00:00+01:002018-01-09T00:00:00+01:00Javier Arrietatag:engineering.zalando.com,2018-01-09:/posts/2018/01/why-we-do-scala.html<p>Leveraging the full power of a functional programming language</p><h3><strong>Leveraging the full power of a functional programming language</strong></h3>
<p>In Zalando Dublin, you will find that most engineering teams are writing their applications using Scala. We will try to
explain why that is the case and the reasons we love Scala.</p>
<p>This content is coming both from my own experience and the team I'm working with in building the new Zalando Customer
Data Platform.</p>
<h3>How I came to use Scala</h3>
<p>I have been working with JVM for the last 18 years. I find there is a lot of good work making the Java Virtual Machine
very efficient and very fast, utilizing the underlying infrastructure well.</p>
<p>I feel comfortable debugging complex issues, such as identifying those caused by garbage collection, and improving our code
to alleviate the pauses (see <a href="https://mechanical-sympathy.blogspot.ie/2013/07/java-garbage-collection-distilled.html">Martin Thompson’s blog
post</a> or <a href="https://shipilev.net/jvm-anatomy-park/">Aleksey Shipilёv’s JVM
Anatomy Park</a>).</p>
<p>I liked Java. I didn’t mind the boilerplate code too much if it didn’t get in the way of expressing the intent of the
code. However, what bugged me was the amount of code required to encourage immutability and not having lambdas to
transform collections.</p>
<p>At the end of 2012, I had to design a service whose only mission was to back up files from customer mobile devices
(think a cloud backup service). It was a simple enough service, accepting bytes from the customer device (using a REST
API) and writing them to disk. We were using the Servlet API, and the system was working well. However, as the devices
were mobile phones and the upload bandwidth wasn’t very high, the machines were mostly idle, waiting for the buffers to
fill up. Unfortunately, we couldn't scale up. When the system reached a few hundred workers, it would start to quickly
degrade due to excessive context switching.</p>
<p>We were using <a href="https://netty.io/">Netty</a> in other components, but the programming model of callbacks wasn’t
something I wanted to introduce, as composing callbacks becomes very complex very quickly.</p>
<h3>Introducing Scala</h3>
<p>I had been looking at Scala for some years and started to look into async HTTP frameworks. I liked
<a href="http://spray.io/">spray</a> because it allowed us to use it at a high level or low level depending on our requirements. It
also wasn’t a framework that forced us into adapting everything to it. I created a quick proof of concept and was amazed
at the conciseness of the code and how efficient it was (spray is optimised for throughput, not latency), being able to
handle thousands of concurrent uploads with a single core. Previously we were bound by CPU because of all the context
switching, but with spray, we managed to overcome this and become limited mostly by IO.</p>
<p>From that point on I decided I wanted to learn Scala and Functional Programming. I finished the <a href="https://www.coursera.org/specializations/scala">Coursera Functional
Programming in Scala</a> course, started writing my side projects in Scala
and tried to find a position in a company that worked with Scala.</p>
<p>I evaluated other FP languages like Clojure, but I like strongly typed languages as my experience is that the systems
written with them are easier to maintain in the long term. I also looked at Haskell, but I felt more confident with a
JVM compiled language that could use all the existing Java libraries.</p>
<p>The first thing I fell in love with was Monad composition to define the program (or subprogram) as a series of stages
composed in a for-comprehension. It is a very convenient way to model asynchronous computations using Future as a Monad
(I know that Future is not strictly a Monad, but from our code’s point of view we can assume it is; see
<a href="https://stackoverflow.com/questions/27454798/is-future-in-scala-a-monad">https://stackoverflow.com/questions/27454798/is-future-in-scala-a-monad</a>)</p>
<p>We will see some examples below; including a snippet here:</p>
<div class="highlight"><pre><span></span><code><span class="n">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">customer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">findCustomer</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
<span class="w"> </span><span class="n">segment</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">getCustomerSegment</span><span class="p">(</span><span class="n">customer.id</span><span class="p">)</span>
<span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">promotionEmail</span><span class="p">(</span><span class="n">customer</span><span class="p">,</span><span class="w"> </span><span class="n">segment</span><span class="p">)</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">sendEmail</span><span class="p">(</span><span class="n">address</span><span class="p">,</span><span class="w"> </span><span class="n">email</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">yield</span><span class="w"> </span><span class="n">result</span>
</code></pre></div>
<p>It becomes natural and straightforward to define the different stages of computation as a pipeline in a
for-comprehension, layering your program into components, each responsible for its own steps of that pipeline.</p>
<p>(We could also run them in parallel using an Applicative instead of a Monad)</p>
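<p>A minimal sketch of that remark, with hypothetical lookups (and assuming cats on the classpath, as in the snippets
below): in a for-comprehension the second Future is only created after the first completes, whereas the
Applicative-style <code>mapN</code> combines two independent Futures that are already running:</p>

```scala
import cats.implicits._
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical, independent lookups
def findCustomer(address: String): Future[String]       = Future("customer-42")
def getCustomerSegment(address: String): Future[String] = Future("segment-A")

// Monadic: getCustomerSegment only starts once findCustomer has completed
val sequential: Future[(String, String)] = for {
  customer <- findCustomer("jane@example.com")
  segment  <- getCustomerSegment("jane@example.com")
} yield (customer, segment)

// Applicative: both Futures are created up front and run concurrently;
// mapN only combines their results
val parallel: Future[(String, String)] =
  (findCustomer("jane@example.com"), getCustomerSegment("jane@example.com"))
    .mapN((customer, segment) => (customer, segment))
```

<p>With Scala’s eager Futures the difference only matters when real work happens inside them; with a lazy effect type
the monadic/applicative distinction is even sharper, but the shape of the code stays the same.</p>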
<h3>Now in Zalando</h3>
<p>Zalando is a big company, currently with over 1,900 engineers working here. As we have mentioned in previous blog posts,
we are empowered to use the technologies we choose to build our systems, so the teams pick the language, libraries,
components and tools. As you can see on our public <a href="https://zalando.github.io/tech-radar/">Tech Radar</a>, Scala is one of
our core languages, with several Scala libraries like Akka and Play! in use.</p>
<p>So I cannot say how Zalando teams are working in Scala. Some teams are deep into the Functional Programming side while
others are using the language mostly as a “better Java”; adopting lambdas, case classes and pattern matching to make
the code more concise and understandable.</p>
<p>But I can talk about how people are using the language in the Dublin office where the services and data pipelines are
written mostly in Scala: How our team that is developing the Customer Data Platform is using Scala, what libraries we
are using and what we like about Scala.</p>
<h3>Things we love about Scala</h3>
<h3>Types</h3>
<p>We love types. Types help us understand what we are dealing with. String or Int can often be meaningless; we don’t want
to mix a Customer Password with a Customer Name, Email Address, etc. We want to know what a given value is.</p>
<p>For this we are currently using two different approaches:</p>
<ul>
<li>Tagged types: Using shapeless @@ we decorate the primitive type with the tag we want to attach.</li>
<li>Value classes: Using a single-attribute case class that extends AnyVal, so that the compiler tries to remove the
boxing/unboxing whenever it can. This is useful when we want to override toString; for example, we may want to redact
sensitive customer data when it goes into logs.</li>
</ul>
<p>Here you have a simple example of both (full code
<a href="https://scastie.scala-lang.org/javierarrieta/bb7Y1OouRbaus3K01oagZw/2">here</a> ):</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">java.util.UUID</span>
<span class="kn">import</span> <span class="nn">shapeless.tag</span><span class="o">,</span> <span class="nn">tag.</span><span class="o">@@</span>
<span class="kn">import</span> <span class="nn">cats.data.Validated</span><span class="o">,</span> <span class="nn">Validated._</span>
<span class="nb">object</span> <span class="n">model</span> <span class="p">{</span>
<span class="n">final</span> <span class="n">case</span> <span class="k">class</span> <span class="nc">Password</span> <span class="n">private</span><span class="p">(</span><span class="n">value</span><span class="p">:</span> <span class="n">String</span><span class="p">)</span> <span class="n">extends</span> <span class="n">AnyVal</span> <span class="p">{</span>
<span class="n">override</span> <span class="k">def</span> <span class="nf">toString</span> <span class="o">=</span> <span class="s2">"***"</span>
<span class="p">}</span>
<span class="nb">object</span> <span class="n">Password</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span> <span class="n">Validated</span><span class="p">[</span><span class="n">String</span><span class="p">,</span> <span class="n">Password</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span> <span class="n">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">candidate</span> <span class="k">if</span> <span class="n">candidate</span><span class="o">.</span><span class="n">isEmpty</span> <span class="o">||</span> <span class="n">candidate</span><span class="o">.</span><span class="n">length</span> <span class="o"><</span> <span class="mi">8</span> <span class="o">=></span> <span class="n">Invalid</span><span class="p">(</span><span class="s2">"Minimum password length has to be 8"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">valid</span> <span class="o">=></span> <span class="n">Valid</span><span class="p">(</span><span class="n">Password</span><span class="p">(</span><span class="n">valid</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">sealed</span> <span class="n">trait</span> <span class="n">UserIdTag</span>
<span class="nb">type</span> <span class="n">UserId</span> <span class="o">=</span> <span class="n">UUID</span> <span class="o">@@</span> <span class="n">UserIdTag</span>
<span class="nb">object</span> <span class="n">UserId</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="n">UUID</span><span class="p">)</span> <span class="o">=</span> <span class="n">tag</span><span class="p">[</span><span class="n">UserIdTag</span><span class="p">](</span><span class="n">v</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">sealed</span> <span class="n">trait</span> <span class="n">EmailAddressTag</span>
<span class="nb">type</span> <span class="n">EmailAddress</span> <span class="o">=</span> <span class="n">String</span> <span class="o">@@</span> <span class="n">EmailAddressTag</span>
<span class="nb">object</span> <span class="n">EmailAddress</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span> <span class="n">Validated</span><span class="p">[</span><span class="n">String</span><span class="p">,</span> <span class="n">EmailAddress</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span> <span class="n">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">invalid</span> <span class="k">if</span> <span class="err">!</span><span class="n">invalid</span><span class="o">.</span><span class="n">contains</span><span class="p">(</span><span class="s2">"@"</span><span class="p">)</span> <span class="o">=></span> <span class="n">Invalid</span><span class="p">(</span><span class="n">s</span><span class="s2">"$invalid is not a valid email address"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">valid</span> <span class="o">=></span> <span class="n">val</span> <span class="n">tagged</span> <span class="o">=</span> <span class="n">tag</span><span class="p">[</span><span class="n">EmailAddressTag</span><span class="p">](</span><span class="n">valid</span><span class="p">);</span> <span class="n">Valid</span><span class="p">(</span><span class="n">tagged</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
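<p>To show what the tagged-type approach buys us at a call site, here is a small, self-contained sketch (a hypothetical
<code>OrderId</code>, with shapeless on the classpath as in the snippet above): the tag exists only at compile time, so
there is no runtime boxing, yet mixing up identifiers becomes a compile error:</p>

```scala
import java.util.UUID
import shapeless.tag
import shapeless.tag.@@

object TaggedExample {
  // Hypothetical tag, built with the same technique as UserId above
  sealed trait OrderIdTag
  type OrderId = UUID @@ OrderIdTag

  def cancelOrder(id: OrderId): String = s"cancelling order $id"

  val id: OrderId = tag[OrderIdTag](UUID.randomUUID())
  cancelOrder(id)                     // compiles: id carries the OrderId tag
  // cancelOrder(UUID.randomUUID())   // rejected at compile time: a plain UUID is not an OrderId
}
```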
<h3>Function Composition</h3>
<p><strong>Monads/applicatives</strong></p>
<p>One of my favourite features of Scala is how easy and elegant it is to compose functions to create more complex ones.
The most common way of doing this is using Monads inside a for-comprehension.</p>
<p>This way we can run several operations sequentially and obtain a result. As soon as any of the operations fail, the
comprehension will exit with that failure.</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">promotionEmail</span><span class="p">(</span><span class="nl">customer</span><span class="p">:</span><span class="w"> </span><span class="n">Customer</span><span class="p">,</span><span class="w"> </span><span class="nl">segment</span><span class="p">:</span><span class="w"> </span><span class="n">CustomerSegment</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Email</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="vm">???</span>
<span class="n">def</span><span class="w"> </span><span class="n">sendEmail</span><span class="p">(</span><span class="nl">address</span><span class="p">:</span><span class="w"> </span><span class="n">EmailAddress</span><span class="p">,</span><span class="w"> </span><span class="nl">message</span><span class="p">:</span><span class="w"> </span><span class="n">Email</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">Unit</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="vm">???</span>
<span class="n">def</span><span class="w"> </span><span class="n">findCustomer</span><span class="p">(</span><span class="nl">address</span><span class="p">:</span><span class="w"> </span><span class="n">EmailAddress</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">Customer</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="vm">???</span>
<span class="n">def</span><span class="w"> </span><span class="n">getCustomerSegment</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">CustomerId</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">CustomerSegment</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="vm">???</span>
<span class="n">def</span><span class="w"> </span><span class="n">sendPromotionalEmail</span><span class="p">(</span><span class="nl">address</span><span class="p">:</span><span class="w"> </span><span class="n">EmailAddress</span><span class="p">)(</span><span class="n">implicit</span><span class="w"> </span><span class="nl">ec</span><span class="p">:</span><span class="w"> </span><span class="n">ExecutionContext</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">Unit</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">customer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">findCustomer</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
<span class="w"> </span><span class="n">segment</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">getCustomerSegment</span><span class="p">(</span><span class="n">customer</span><span class="p">.</span><span class="n">id</span><span class="p">)</span>
<span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">promotionEmail</span><span class="p">(</span><span class="n">customer</span><span class="p">,</span><span class="w"> </span><span class="n">segment</span><span class="p">)</span>
<span class="w"> </span><span class="k">result</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sendEmail</span><span class="p">(</span><span class="n">address</span><span class="p">,</span><span class="w"> </span><span class="n">email</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="w"> </span><span class="n">yield</span><span class="w"> </span><span class="k">result</span>
<span class="err">}</span>
</code></pre></div>
<p>For a full example see <a href="https://scastie.scala-lang.org/javierarrieta/fKFi3ENAQwOEaoZEf3h6FQ/1">here</a>.</p>
<p>If what you want to do is evaluate several functions in parallel and collect all the errors or the successful results,
you can use an Applicative Functor. This is very common when doing validations of a complex entity, where we can present
all the detected errors in one go to the client.</p>
<div class="highlight"><pre><span></span><code><span class="n">type</span><span class="w"> </span><span class="n">ValidatedNel</span><span class="o">[</span><span class="n">A</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Validated</span><span class="o">[</span><span class="n">NonEmptyList[String</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">A</span><span class="err">]</span>
<span class="n">final</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">Customer</span><span class="p">(</span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="n">Name</span><span class="p">,</span><span class="w"> </span><span class="nl">email</span><span class="p">:</span><span class="w"> </span><span class="n">EmailAddress</span><span class="p">,</span><span class="w"> </span><span class="nl">password</span><span class="p">:</span><span class="w"> </span><span class="n">Password</span><span class="p">)</span>
<span class="w"> </span><span class="k">object</span><span class="w"> </span><span class="n">Customer</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="nl">email</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="nl">password</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ValidatedNel</span><span class="o">[</span><span class="n">Customer</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">Apply</span><span class="o">[</span><span class="n">ValidatedNel</span><span class="o">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">map3</span><span class="p">(</span><span class="n">Name</span><span class="p">(</span><span class="n">name</span><span class="p">),</span><span class="w"> </span><span class="n">EmailAddress</span><span class="p">(</span><span class="n">email</span><span class="p">),</span><span class="w"> </span><span class="n">Password</span><span class="p">(</span><span class="n">password</span><span class="p">))(</span><span class="n">Customer</span><span class="p">.</span><span class="n">apply</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span>
</code></pre></div>
<p>For a full example see <a href="https://scastie.scala-lang.org/javierarrieta/bcVl0ve2TbqmzaU7SuqowA/1">here</a>.</p>
<p>Another combination is a simple function composition using compose or andThen, or if the arguments don’t match
completely, using anonymous functions to combine them.</p>
<div class="highlight"><pre><span></span><code>final case class Customer(id: CustomerId, address: EmailAddress, name: Name)
val findCustomer: EmailAddress => Customer = ???
val sendEmail: Customer => Either[String, Unit] = ???
val sendCustomerEmail: EmailAddress => Either[String, Unit] = findCustomer andThen sendEmail
</code></pre></div>
<p>For a full example see <a href="https://scastie.scala-lang.org/javierarrieta/fTwwNwDnTjK7AyoobWFNVA">here</a>.</p>
<h3>Referential Transparency</h3>
<p>We like being able to reason about a computation using the substitution model: in a referentially transparent
computation you can always substitute a function call with the result of executing the function with those
parameters.</p>
<p>This simplifies enormously the understanding of a complex system by understanding the components (functions) that
together compose the system.</p>
<div class="highlight"><pre><span></span><code>def sq(x: Int): Int = x * x
assert(sq(5) == 5 * 5)
</code></pre></div>
<p>The previous example might be too basic, but I hope it suffices to make the point. You can always replace a call to
the <em>sq</em> function with its result, and there is no difference between the two. This is also very helpful when
testing your program. For more detail, see the Wikipedia article
<a href="https://en.wikipedia.org/wiki/Referential_transparency">here</a>.</p>
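<p>To see the contrast, here is a small counter-example of our own (the names are ours, not from the original post): a function that is <em>not</em> referentially transparent, because it reads and mutates shared state, so substituting a call with its result changes the program's meaning.</p>

```scala
// `next` is NOT referentially transparent: it depends on, and mutates,
// shared state, so two calls with the same (empty) arguments differ.
var counter = 0
def next(): Int = { counter += 1; counter }

val a = next() + next() // 1 + 2 = 3

counter = 0
val n = next()
val b = n + n // substituting `n` back as `next()` would suggest 3, but b = 1 + 1 = 2

assert(a == 3 && b == 2) // the two "equivalent" rewrites disagree
```

<p>With the referentially transparent <em>sq</em> above, both rewrites would always agree; that is exactly the property the substitution model relies on.</p>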
<p>One of the most common things that breaks referential transparency, apart from global mutable state, is throwing
exceptions up the call stack. Throwing exceptions can seem very powerful, but it makes it difficult to reason about
your program when composing functions.</p>
<h3>Monad Transformers</h3>
<p>One of the caveats of using effects (effects are orthogonal to the value type you get: for instance, <em>Future</em>
is the effect of asynchrony, <em>Option</em> the effect of optionality, <em>Iterable</em> that of repeatability…) is
that they become extremely cumbersome when composing more than two operations, or when composing and nesting them.</p>
<p>One of the most popular solutions is using a Monad Transformer that allows us to stack two Monads in one and use them as
if they were a standard Monad. You can visit <a href="https://blog.buildo.io/monad-transformers-for-the-working-programmer-aa7e981190e7">this blog
post</a> for more detail.</p>
<p>Another option, which we are not going to explore in this post, is to use extensible effects; see
<a href="https://github.com/atnos-org/eff">eff</a> for more detail.</p>
<p>The reader can try this without using a Monad Transformer to see how complex it becomes, even if only composing two
functions.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">scala.concurrent.</span><span class="p">{</span><span class="n">ExecutionContext</span><span class="p">,</span> <span class="n">Future</span><span class="p">}</span>
<span class="kn">import</span> <span class="nn">cats.data._</span>
<span class="kn">import</span> <span class="nn">cats.instances.future._</span>
<span class="n">final</span> <span class="n">case</span> <span class="k">class</span> <span class="nc">Customer</span><span class="p">(</span><span class="nb">id</span><span class="p">:</span> <span class="n">CustomerId</span><span class="p">,</span> <span class="n">email</span><span class="p">:</span> <span class="n">EmailAddress</span><span class="p">,</span> <span class="n">fullName</span><span class="p">:</span> <span class="n">String</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">findCustomer</span><span class="p">(</span><span class="n">email</span><span class="p">:</span> <span class="n">EmailAddress</span><span class="p">):</span> <span class="n">EitherT</span><span class="p">[</span><span class="n">Future</span><span class="p">,</span> <span class="n">Throwable</span><span class="p">,</span> <span class="n">Customer</span><span class="p">]</span> <span class="o">=</span>
<span class="n">EitherT</span><span class="p">[</span><span class="n">Future</span><span class="p">,</span> <span class="n">Throwable</span><span class="p">,</span> <span class="n">Customer</span><span class="p">](</span><span class="n">Future</span><span class="o">.</span><span class="n">successful</span><span class="p">(</span><span class="n">Right</span><span class="p">(</span><span class="n">Customer</span><span class="p">(</span><span class="s2">"id"</span><span class="p">,</span> <span class="s2">"me@example.com"</span><span class="p">,</span> <span class="s2">"John Doe"</span><span class="p">))))</span>
<span class="k">def</span> <span class="nf">sendEmail</span><span class="p">(</span><span class="n">recipient</span><span class="p">:</span> <span class="n">EmailAddress</span><span class="p">,</span>
<span class="n">subject</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
<span class="n">content</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span> <span class="n">EitherT</span><span class="p">[</span><span class="n">Future</span><span class="p">,</span> <span class="n">Throwable</span><span class="p">,</span> <span class="n">Unit</span><span class="p">]</span> <span class="o">=</span>
<span class="n">EitherT</span><span class="p">[</span><span class="n">Future</span><span class="p">,</span> <span class="n">Throwable</span><span class="p">,</span> <span class="n">Unit</span><span class="p">](</span><span class="n">Future</span><span class="o">.</span><span class="n">successful</span> <span class="p">{</span>
<span class="n">println</span><span class="p">(</span><span class="n">s</span><span class="s2">"Sending promotional email to $recipient, subject: '$subject', content: '$content'"</span><span class="p">)</span>
<span class="n">Right</span><span class="p">(())</span>
<span class="p">})</span>
<span class="k">def</span> <span class="nf">promotionSubject</span><span class="p">(</span><span class="n">fullName</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span> <span class="n">String</span> <span class="o">=</span> <span class="n">s</span><span class="s2">"Amazing promotion $fullName, only for you"</span>
<span class="k">def</span> <span class="nf">promotionContent</span><span class="p">(</span><span class="n">fullName</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span> <span class="n">String</span> <span class="o">=</span> <span class="n">s</span><span class="s2">"Click this link for your personalised promotion $fullName..."</span>
<span class="k">def</span> <span class="nf">sendPromotionEmailToCustomer</span><span class="p">(</span><span class="n">email</span><span class="p">:</span> <span class="n">EmailAddress</span><span class="p">)(</span><span class="n">implicit</span> <span class="n">ec</span><span class="p">:</span> <span class="n">ExecutionContext</span><span class="p">):</span> <span class="n">EitherT</span><span class="p">[</span><span class="n">Future</span><span class="p">,</span> <span class="n">Throwable</span><span class="p">,</span> <span class="n">Unit</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">{</span>
<span class="n">customer</span> <span class="o"><-</span> <span class="n">findCustomer</span><span class="p">(</span><span class="n">email</span><span class="p">)</span>
<span class="n">subject</span> <span class="o">=</span> <span class="n">promotionSubject</span><span class="p">(</span><span class="n">customer</span><span class="o">.</span><span class="n">fullName</span><span class="p">)</span>
<span class="n">content</span> <span class="o">=</span> <span class="n">promotionContent</span><span class="p">(</span><span class="n">customer</span><span class="o">.</span><span class="n">fullName</span><span class="p">)</span>
<span class="n">result</span> <span class="o"><-</span> <span class="n">sendEmail</span><span class="p">(</span><span class="n">customer</span><span class="o">.</span><span class="n">email</span><span class="p">,</span> <span class="n">subject</span><span class="p">,</span> <span class="n">content</span><span class="p">)</span>
<span class="p">}</span> <span class="k">yield</span> <span class="n">result</span>
<span class="p">}</span>
<span class="kn">import</span> <span class="nn">ExecutionContext.Implicits.global</span>
<span class="n">sendPromotionEmailToCustomer</span><span class="p">(</span><span class="s2">"me@example.com"</span><span class="p">)</span>
</code></pre></div>
<p>For full example see <a href="https://scastie.scala-lang.org/javierarrieta/NnR2YgmjTxGKeqyCsTPvhA/4">here</a></p>
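<p>For comparison, here is a sketch of the same flow written directly against <em>Future[Either[...]]</em>, without <em>EitherT</em> (this is our own illustration with signatures simplified to plain <em>String</em>s, not code from the post). Even with only two effectful steps, every stage must pattern match on the <em>Either</em> before continuing:</p>

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

final case class Customer(id: String, email: String, fullName: String)

def findCustomer(email: String): Future[Either[Throwable, Customer]] =
  Future.successful(Right(Customer("id", email, "John Doe")))

def sendEmail(to: String, subject: String): Future[Either[Throwable, Unit]] =
  Future.successful(Right(println(s"Sending '$subject' to $to")))

// Without EitherT, the short-circuiting that the for-comprehension gave us
// for free has to be written out by hand at every step.
def sendPromotion(email: String)(implicit ec: ExecutionContext): Future[Either[Throwable, Unit]] =
  findCustomer(email).flatMap {
    case Left(err)       => Future.successful(Left(err))
    case Right(customer) => sendEmail(customer.email, s"Promotion for ${customer.fullName}")
  }

val result = Await.result(sendPromotion("me@example.com"), 5.seconds)
```

<p>With three or four steps, each one nested inside the previous match, this style quickly becomes unreadable, which is exactly what the transformer avoids.</p>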
<h3>Typeclasses</h3>
<p>We like decoupling structure from behaviour. Some structures might encompass some business functionality, but we
should not, as we did in OO, try to define everything we think a class can do as methods.</p>
<p>For this, we adopt the Typeclass pattern where we define laws for a given behaviour and then implement the particular
functionality for a given class. Classic examples are Semigroup, Monoid, Applicative, Functor, Monad, etc.</p>
<p>As a simple example, we wanted to be able to serialise/deserialise data types on the wire. For this we defined two
simple Typeclasses:</p>
<div class="highlight"><pre><span></span><code><span class="n">trait</span><span class="w"> </span><span class="n">BytesEncoder</span><span class="o">[</span><span class="n">A</span><span class="o">]</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="nl">a</span><span class="p">:</span><span class="w"> </span><span class="n">A</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="k">Array</span><span class="o">[</span><span class="n">Byte</span><span class="o">]</span>
<span class="err">}</span>
<span class="n">trait</span><span class="w"> </span><span class="n">BytesDecoder</span><span class="o">[</span><span class="n">A</span><span class="o">]</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="nl">arr</span><span class="p">:</span><span class="w"> </span><span class="k">Array</span><span class="o">[</span><span class="n">Byte</span><span class="o">]</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Either</span><span class="o">[</span><span class="n">Throwable, A</span><span class="o">]</span>
<span class="err">}</span>
</code></pre></div>
<p>All we need to do for every type we want to serialise is implement those interfaces. Wherever we need a serialiser
for an <em>A</em>, we require an implicit <em>BytesEncoder[A]</em>, and we provide that instance when we instantiate
the user of the serialiser.</p>
<p>For example, if we have a CustomerEntity that we want to write to the wire using protobuf, we provide:</p>
<div class="highlight"><pre><span></span><code><span class="n">implicit</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="nl">customerEntityProtoDecoder</span><span class="p">:</span><span class="w"> </span><span class="n">BytesEncoder</span><span class="o">[</span><span class="n">CustomerEntity</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">BytesEncoder</span><span class="o">[</span><span class="n">CustomerEntity</span><span class="o">]</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">override</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="nl">ce</span><span class="p">:</span><span class="w"> </span><span class="n">CustomerEntity</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ProtoMapper</span><span class="p">.</span><span class="n">toProto</span><span class="p">(</span><span class="n">ce</span><span class="p">).</span><span class="n">toByteArray</span>
<span class="err">}</span>
</code></pre></div>
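<p>As a minimal, self-contained sketch of the pattern (our own simplified example, not the production code; the pipe-delimited encoding stands in for protobuf), the call site simply demands the implicit instance and the compiler wires it in:</p>

```scala
// Typeclass: behaviour defined separately from the data structure.
trait BytesEncoder[A] {
  def apply(a: A): Array[Byte]
}

final case class CustomerEntity(id: String, name: String)

// An illustrative instance; a real one would go through protobuf.
implicit val customerEntityEncoder: BytesEncoder[CustomerEntity] =
  (ce: CustomerEntity) => s"${ce.id}|${ce.name}".getBytes("UTF-8")

// A generic "wire writer" that works for any A with an encoder in scope.
def serialise[A](a: A)(implicit enc: BytesEncoder[A]): Array[Byte] = enc(a)

val bytes = serialise(CustomerEntity("42", "Jane"))
```

<p>Adding wire support for a new type then means writing one implicit value, with no change to the type itself or to <em>serialise</em>.</p>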
<h3>Folds/merges</h3>
<p>We use <em>Either</em> extensively, so we need to fold both cases to get a uniform response from them. We can use a
fold to merge both cases into a common response to the client, for instance an <em>HttpResponse</em>.</p>
<p>We also want to consolidate all the errors from a <em>Validated[NonEmptyList[String],T]</em> into a common response
by folding the <em>NonEmptyList</em> into a String or another adequate type.</p>
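<p>A small sketch of both folds (our own simplified example: plain <em>String</em>s stand in for a real <em>HttpResponse</em>, and a plain <em>List</em> stands in for the <em>NonEmptyList</em> of errors):</p>

```scala
// Fold an Either's two cases into one uniform response type.
def toResponse(result: Either[String, String]): String =
  result.fold(err => s"400 Bad Request: $err", ok => s"200 OK: $ok")

// Consolidate several validation errors into a single message.
def consolidate(errors: List[String]): String = errors.mkString("; ")

val ok  = toResponse(Right("customer created"))
val bad = toResponse(Left(consolidate(List("name is empty", "email is invalid"))))
```

<p>The client always receives one shape, regardless of which branch the computation took.</p>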
<h3>Conclusion</h3>
<p>As you can see, we are very happy using Scala and are finding our way into the more advanced topics on its
functional programming side. And there is nothing stopping you from adopting these useful features step by step to get
up to speed and experience the benefits of using an FP language.</p>
<p>Learn more about opportunities at Zalando by visiting our <a href="https://jobs.zalando.com/tech/jobs/">jobs page.</a></p>Rock Solid Kafka and ZooKeeper Ops on AWS2018-01-04T00:00:00+01:002018-01-04T00:00:00+01:00Ricardo De Cillotag:engineering.zalando.com,2018-01-04:/posts/2018/01/rock-solid-kafka.html<p>Reducing ops effort while maintaining Kafka and Zookeeper</p><h3><strong>Reducing ops effort while maintaining Kafka and Zookeeper</strong></h3>
<p>This post is targeted at those looking for ways to reduce ops effort while maintaining Kafka and Zookeeper
deployments on AWS, while also improving their availability and stability. In a nutshell, we are going to explain how
using Elastic Network Interfaces can improve on a straight out-of-the-box setup.</p>
<p>We will examine how Kafka and Zookeeper react to instance terminations and their subsequent replacement by newly
launched instances.</p>
<p>For this example, we'll consider Zookeeper to have been deployed with Exhibitor, which is a popular choice since it
facilitates instance discovery and automatic ensemble formation by sharing configuration over S3 buckets. For
simplicity we're considering a three-instance setup, but in real life five instances would be recommended.</p>
<p>The picture below shows the initial state of our cluster.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/33128199ba2862e7b2da622b4eecf5e7327e0f96_screen-shot-2018-01-02-at-14.21.18.png?auto=compress,format"></p>
<p>As we can see in the image, the Zookeeper instances learn about each other through self-advertised IP addresses on
the S3 bucket. Kafka, in turn, learns about the Zookeeper instances through the 'zookeeper.connect' property provided
to each broker.</p>
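<p>For illustration, a broker's configuration might look like this (the addresses are made up for the example):</p>

```properties
# Each broker lists every ZooKeeper node by IP; these addresses must stay
# valid, which is exactly what breaks when an instance is replaced.
zookeeper.connect=10.0.1.10:2181,10.0.2.10:2181,10.0.3.10:2181
```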
<h3>Is it safe to terminate a single Zookeeper instance?</h3>
<p>Let's examine what happens when AWS terminates one Zookeeper instance.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1501277d88616e1d284d1d8b02d9c7de7159da14_screen-shot-2018-01-02-at-14.23.19.png?auto=compress,format"></p>
<p>The following happens:</p>
<ul>
<li>Zookeeper keeps serving, since the Quorum condition is preserved with two out of three instances still available.</li>
<li>All Kafka connections served by the terminated instance are closed and automatically re-opened on both of the two
remaining instances.</li>
<li>When Kafka tries to open a connection on the terminated broker, it will fail but it will automatically try on the
next one in the list.</li>
</ul>
<p>Summarising, even though the system is missing one instance from the original composition, everything keeps working as
expected.</p>
<ul>
<li>If you use Exhibitor's <a href="https://github.com/soabase/exhibitor/wiki/Features#automatic-instance-management">instance add/removal
feature</a>, be aware that Exhibitor might automatically remove the terminated instance from the S3 configuration,
causing a rolling restart of the remaining two Exhibitors, which in turn could cause instability in the ensemble and
some moments of downtime.</li>
</ul>
<h3>Does it recover without any human interaction?</h3>
<p>One common technique to automatically replace terminated instances in AWS is to use Auto Scaling Groups with a
fixed number of instances. There are other techniques, like Amazon EC2 Auto Recovery, which is more interesting but is
not available for all instance types.</p>
<p>In the case of Auto Scaling Groups, a new instance is launched as a reaction to the remaining number of instances
(two) not matching the minimum desired number (three). The newly launched instance will have a different IP address.
This is a very important detail: having a different IP address requires a lot of work to restore the initial state of
the cluster.</p>
<ol>
<li>Exhibitor includes the new IP address in the nodes list configuration.</li>
<li>All Zookeeper instances are restarted in order to reload the new configuration and accept the new instance in the
ensemble.</li>
<li>Kafka configuration ‘zookeeper.connect’ needs to be updated with the new Zookeeper IP address and all Kafka
instances restarted in order to reload this configuration.</li>
</ol>
<p>Considering all these restarts are done one by one, depending on the number of Kafka instances in the cluster, it could
take hours.</p>
<p>Restarting Kafka takes time:</p>
<ol>
<li>Kafka needs to be gracefully stopped so as not to corrupt the indexes, which would require even more time to
launch the process again.</li>
<li>Some publishing requests might fail since it takes some time for producers to find out that those partitions are no
longer served by that broker. It means instability and failures for clients performing synchronous tasks.</li>
<li>All the data ingested by the other replicas while the broker is down would have to be copied, which takes several
minutes depending on the load.</li>
<li>Leadership needs to be switched back to the broker after restart, in order to preserve a well-balanced load on the
cluster. This process needs to be closely watched, because restarting the next broker before the previous came back
and replicated all data would mean that some partitions could go offline.</li>
<li>Leadership switch again causes some moments of instability.</li>
</ol>
<h3>How to do it better</h3>
<p>What if Kafka could reload ‘zookeeper.connect’ without restarts? Not an option: <a href="https://issues.apache.org/jira/browse/KAFKA-1229">this feature is not
available</a>, and discussions about implementing it have been stuck since January 2017.</p>
<p>What if we used domain names for ‘zookeeper.connect’ instead of IP addresses? Also not an option. <a href="https://github.com/apache/zookeeper/pull/150">Zookeeper client
library resolves domain names only once</a>.</p>
<p>What if we could have static IP addresses that don't change?</p>
<p>After some investigation, we came up with this idea of using Elastic Network Interfaces, which have a static IP address,
and attaching them to instances.</p>
<p>This would avoid the need to update configurations and consequently, avoid restarting Kafka and Zookeeper.</p>
<h3>How Elastic Network Interfaces work</h3>
<ol>
<li>First we create a pool of Network Interfaces and tag them accordingly, so that Zookeeper instances are able to
filter them later and automatically attach an available one. Be aware that Network Interfaces are bound to the
subnet of a specific Availability Zone.</li>
<li>Then we launch Zookeeper instances, which during initialization execute the following <a href="https://gist.github.com/rcillo/1a64d757bf3ebaffcb3c71eb95607f1f">script to attach a Network
Interface</a> and configure it properly.</li>
</ol>
<p>Let's go through the instance termination scenario we analyzed before, but this time with Network Interfaces and static
IP addresses.</p>
<ol>
<li>Each Zookeeper node communicates through an attached Network Interface (white box)</li>
</ol>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e1f76f96399a31d153fcffa6a6d387c9f5d69866_screen-shot-2018-01-02-at-15.40.27.png?auto=compress,format"></p>
<ol start="2">
<li>One Zookeeper instance is terminated, but the attached Network Interface is not.</li>
</ol>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e1f76f96399a31d153fcffa6a6d387c9f5d69866_screen-shot-2018-01-02-at-15.40.27.png?auto=compress,format"></p>
<ol start="3">
<li>A new Zookeeper instance with a different IP address is launched to replace the old one, but it uses the same
Elastic Network Interface, so the exposed IP address does not change. This is very important, because it means that
Kafka no longer needs to be restarted to reach the newly launched Zookeeper instance.</li>
</ol>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c5285309196896109078f108635232d2a447bdaa_screen-shot-2018-01-02-at-15.49.28.png?auto=compress,format"></p>
<p>This technique helped us reduce the amount of time spent on operations and raised the bar in terms of stability and
availability, which users of our Kafka cluster greatly appreciate.</p>
<p>Learn more about <a href="https://jobs.zalando.com/tech/jobs/">tech jobs at Zalando.</a></p>
AngularConnect 20172017-12-21T00:00:00+01:002017-12-21T00:00:00+01:00Lora Vardarovatag:engineering.zalando.com,2017-12-21:/posts/2017/12/angularconnect-2017.html<p>Highlights and Takeaways from Europe’s Largest Angular Conference</p><p><strong>Highlights and Takeaways from Europe's Largest Angular Conference</strong></p>
<p>Just a week after <a href="https://blog.angular.io/version-5-0-0-of-angular-now-available-37e414935ced">Angular Version 5 was
released</a>, I had the pleasure to attend
AngularConnect 2017. AngularConnect is Europe's largest Angular conference. It is a multi-track conference so I could
not attend all the sessions, but the talks I saw were amazing and offered great content.</p>
<p>AngularConnect 2017 allowed me to connect with some of the world’s leading Angular experts and learn best practices from
them. During the diversity lunch on day one and the community lunch on day two, I also had the opportunity to take part
in discussions about diversity and inclusion, and community building. There was also ample opportunity to network with
other engineers and exchange ideas between talks in the conference's common areas. It was great to find out that many
of the attendees know and shop at Zalando!</p>
<h3><strong>My Highlights</strong></h3>
<p><strong>Performance</strong>
Performance was one of the biggest topics at the conference. Performance is a topic I am particularly interested in as
performance optimization increases consumer engagement. During the <a href="https://www.youtube.com/watch?v=9mVfjLhUQl0">Day One
Keynote</a>, <a href="https://www.angularconnect.com/2017/speakers/#igor-minar">Igor
Minar</a> presented lots of interesting insights on how Angular’s
performance was improved. In addition, <a href="https://www.angularconnect.com/2017/speakers/#minko-gechev">Minko Gechev</a>
offered lots of tips on improving the performance of Angular applications in his talk “ <a href="https://www.youtube.com/watch?v=ukErcJDjR_Y">Purely
Fast</a>”. If you only watch one talk from the conference, this is it. Head
over to his
<a href="http://blog.mgechev.com/2017/11/11/faster-angular-applications-onpush-change-detection-immutable-part-1/">blog</a> to read
an in-depth explanation of the content he presented at AngularConnect.</p>
<p><strong>Upgrade
</strong>Upgrading legacy AngularJS applications is still an important topic as over 100,000 websites still use AngularJS
according to <a href="https://trends.builtwith.com/javascript/Angular-JS-v1">builtwith.com</a>. This is a topic I was particularly
interested in as we still have some legacy AngularJS applications. I attended the panel on Migration (ngUpgrade, etc).
One point that business often misses when considering whether to upgrade from AngularJS to Angular is how difficult it
is to attract talent. The main takeaway was that if businesses want to be competitive and be seen as attractive
employers they need to invest in modernizing their legacy applications.</p>
<p><a href="https://www.angularconnect.com/2017/speakers/#asim-hussain">Asim Hussain</a>’s talk “ <a href="https://www.youtube.com/watch?v=JxDuEwLfeGc">From Donkey to Unicorn: a New
Approach to AngularJS Migration</a>” is a must-see for everyone looking to
migrate their applications from AngularJS to Angular. He explains that if you already have a good architecture, the
ngUpgrade approach should work smoothly. He also presented an iframe approach with which you don’t have to worry at all
about your technical debt. The exciting thing about the iframe approach is that you could use it to migrate to/from
other frameworks as well.</p>
<p><strong>Accessibility
</strong>“ <a href="https://www.youtube.com/watch?v=5o1U9ENzh_g">Accessibility Through the Eyes of a Deaf Professional</a>” by <a href="https://www.angularconnect.com/2017/speakers/#svetlana-kouznetsova">Svetlana
Kouznetsova</a> reminded us of the importance of
accessibility and taking into account the needs of people with disabilities. As she said: “<em>Diversity and inclusion
means nothing without accessibility.”</em> The takeaway is to remember to make our applications accessible. It is a small
effort for the developer but has a big impact on people with disabilities.</p>
<p><strong>Testing
</strong><a href="https://www.angularconnect.com/2017/speakers/#jan-molak">Jan Molak</a> shared his knowledge on “ <a href="https://www.youtube.com/watch?v=CocG0FBFJLU">Testing Angular Apps at
Scale</a>”. He presented an overview of what
<a href="http://serenity-js.org/">Serenity/JS</a> is and how it helps make the design of end-to-end tests scalable. It helps you to
write your tests in a declarative way. In my opinion, this is the future of end-to-end tests design and I look forward
to seeing this approach in other testing frameworks.</p>
<p><strong>Progressive Web Apps
</strong>“ <a href="https://www.youtube.com/watch?v=wFjw0DM1ui4">Automatic Progressive Web Apps Using Angular Service Worker</a>” by
<a href="https://www.angularconnect.com/2017/speakers/#maxim-salnikov">Maxim Salnikov</a> demonstrated how easy it is to build
Progressive Web Apps (PWAs) with the Angular Service Worker. <a href="https://blog.angular.io/angular-5-1-more-now-available-27d372f5eb4e">Angular CLI
1.6</a>, which was released at the beginning of
December 2017 together with Angular 5.1, supports Angular Service Worker. An interesting point that was brought up
during the conference was about customer conversion rates and whether PWAs affected the native apps users. The takeaway
is that PWAs radically improve customer engagement and do not negatively affect the use of native apps.</p>
<p>Luckily all talks are online, so if you did not get a chance to attend, you can watch them on the <a href="https://www.youtube.com/playlist?list=PLAw7NFdKKYpGUpg7JJ8-PJNMdlrOnmZtN">AngularConnect
YouTube channel</a>. A list of all sessions
including slides is available <a href="https://www.angularconnect.com/2017/sessions/">here</a>.</p>
<p>See also: <a href="https://corporate.zalando.com/en/newsroom/en/stories/all-systems-go">Zalando at RecSys 2017.</a></p>Surviving Data Loss2017-12-19T00:00:00+01:002017-12-19T00:00:00+01:00Nina Hanzlikovatag:engineering.zalando.com,2017-12-19:/posts/2017/12/backing-up-kafka-zookeeper.html<p>Backing up Apache Kafka and Zookeeper to S3</p><h3><strong>Backing up Apache Kafka and Zookeeper to S3</strong></h3>
<p><strong>What is Apache Kafka?</strong></p>
<p>Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications.
It is horizontally scalable, fault-tolerant, and wicked fast. It runs in production in many companies.</p>
<p>Backups are important with any kind of data. Apache Kafka lowers this risk of data loss with replication across brokers.
However, it is still necessary to have protection in place in the event of user error.</p>
<p>This post will demo and introduce tools that our small team of three at <a href="https://jobs.zalando.com/en/">Zalando</a> uses to
backup and restore Apache Kafka and Zookeeper.</p>
<h3>Backing up Apache Kafka</h3>
<p><strong>Getting started with Kafka Connect
</strong>Kafka Connect is a framework for connecting Kafka with external systems. Its purpose is to make it easy to add new
systems to scalable and secure stream data pipelines.</p>
<p>By using a <a href="https://github.com/spredfast/kafka-connect-s3">connector by Spredfast.com</a>, backing up and restoring the
contents of a topic to S3 becomes a trivial task.</p>
<p><strong>Demo
Download the prerequisites</strong>
Check out the following repository:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/imduffy15/kafka-env.git
</code></pre></div>
<p>It contains a docker-compose file for bringing up Zookeeper, Kafka, and Kafka Connect locally.</p>
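<p>To give an idea of what such a compose file involves, here is a minimal illustrative sketch. The image names, versions, ports and environment variables below are assumptions for illustration, not the actual file from that repository:</p>

```yaml
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper     # image/tag assumed for illustration
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
  kafka-connect:
    image: confluentinc/cp-kafka-connect
    depends_on: [kafka]
    ports: ["8083:8083"]
    volumes:
      # extra connector jars dropped here are picked up on startup
      - ./kafka-connect/jars:/etc/kafka-connect/jars
```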
<p>Kafka Connect will load all jars put in the <em>./kafka-connect/jars</em> directory. Go ahead and download the Spredfast.com
<em>kafka-connect-s3.jar</em></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>wget<span class="w"> </span><span class="s2">"http://dl.bintray.com/iduffy/maven/com/spredfast/kafka/connect/s3/kafka-connect-s3/0.4.2-zBuild/kafka-connect-s3-0.4.2-zBuild-shadow.jar"</span><span class="w"> </span>-O<span class="w"> </span>kafka-connect-s3.jar
</code></pre></div>
<p><strong>Bring up the stack
</strong>To boot the stack, use <em>docker-compose up</em></p>
<p><strong>Create some data
</strong>Using the Kafka command line utilities, create a topic and a console producer:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>kafka-topics<span class="w"> </span>--zookeeper<span class="w"> </span>localhost:2181<span class="w"> </span>--create<span class="w"> </span>--topic<span class="w"> </span>example-topic<span class="w"> </span>--replication-factor<span class="w"> </span><span class="m">1</span><span class="w"> </span>--partitions<span class="w"> </span>1Created<span class="w"> </span>topic<span class="w"> </span><span class="s2">"example-topic"</span>.$<span class="w"> </span>kafka-console-producer<span class="w"> </span>--topic<span class="w"> </span>example-topic<span class="w"> </span>--broker-list<span class="w"> </span>localhost:9092>hello<span class="w"> </span>world
</code></pre></div>
<p>Using a console consumer, confirm the data is successfully written:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>kafka-console-consumer<span class="w"> </span>--topic<span class="w"> </span>example-topic<span class="w"> </span>--bootstrap-server<span class="w"> </span>localhost:9092<span class="w"> </span>--from-beginning<span class="w"> </span>hello<span class="w"> </span>world
</code></pre></div>
<p><strong>Backing-up
</strong>Create a bucket on S3 to store the backups:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>aws<span class="w"> </span>s3api<span class="w"> </span>create-bucket<span class="w"> </span>--create-bucket-configuration<span class="w"> </span><span class="nv">LocationConstraint</span><span class="o">=</span>eu-west-1<span class="w"> </span>--region<span class="w"> </span>eu-west-1<span class="w"> </span>--bucket<span class="w"> </span>example-kafka-backup-bucket
</code></pre></div>
<p>Create a connector configuration for the backup task and submit it to Kafka Connect:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>cat<span class="w"> </span><span class="s"><< EOF > example-topic-backup-tasks.json</span>
<span class="s">{</span>
<span class="s"> "name": "example-topic-backup-tasks",</span>
<span class="s"> "config": {</span>
<span class="s"> "connector.class": "com.spredfast.kafka.connect.s3.sink.S3SinkConnector",</span>
<span class="s"> "format.include.keys": "true",</span>
<span class="s"> "topics": "example-topic",</span>
<span class="s"> "tasks.max": "1",</span>
<span class="s"> "format": "binary",</span>
<span class="s"> "s3.bucket": "example-kafka-backup-bucket",</span>
<span class="s"> "value.converter": "com.spredfast.kafka.connect.s3.AlreadyBytesConverter",</span>
<span class="s"> "key.converter": "com.spredfast.kafka.connect.s3.AlreadyBytesConverter",</span>
<span class="s"> "local.buffer.dir": "/tmp"</span>
<span class="s"> }</span>
<span class="s">}</span>
<span class="s">EOF</span>
curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s2">"Content-Type: application/json"</span><span class="w"> </span>-H<span class="w"> </span><span class="s2">"Accept: application/json"</span><span class="w"> </span>-d<span class="w"> </span>@example-topic-backup-tasks.json<span class="w"> </span>/api/kafka-connect-1/connectors
</code></pre></div>
<p>(Check out the <a href="https://github.com/spredfast/kafka-connect-s3/blob/master/README.md">Spredfast documentation</a> for more
configuration options.)</p>
<p>After a few moments the backup task will begin. By listing the Kafka Consumer groups, one can identify the consumer
group related to the backup task and query for its lag to determine if the backup is finished.</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>kafka-consumer-groups<span class="w"> </span>--bootstrap-server<span class="w"> </span>localhost:9092<span class="w"> </span>--listNote:<span class="w"> </span>This<span class="w"> </span>will<span class="w"> </span>only<span class="w"> </span>show<span class="w"> </span>information<span class="w"> </span>about<span class="w"> </span>consumers<span class="w"> </span>that<span class="w"> </span>use<span class="w"> </span>the<span class="w"> </span>Java<span class="w"> </span>consumer<span class="w"> </span>API<span class="w"> </span><span class="o">(</span>non-ZooKeeper-based<span class="w"> </span>consumers<span class="o">)</span>.
connect-example-topic-backup-task
$<span class="w"> </span>kafka-consumer-groups<span class="w"> </span>--describe<span class="w"> </span>--bootstrap-server<span class="w"> </span>localhost:9092<span class="w"> </span>--group<span class="w"> </span>connect-example-topic-backup-tasks
</code></pre></div>
<p>Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).</p>
<div class="highlight"><pre><span></span><code><span class="n">TOPIC</span> <span class="n">PARTITION</span> <span class="n">CURRENT</span><span class="o">-</span><span class="n">OFFSET</span> <span class="n">LOG</span><span class="o">-</span><span class="kr">END</span><span class="o">-</span><span class="n">OFFSET</span> <span class="n">LAG</span> <span class="n">CONSUMER</span><span class="o">-</span><span class="n">ID</span> <span class="n">HOST</span> <span class="n">CLIENT</span><span class="o">-</span><span class="n">ID</span>
<span class="n">example</span><span class="o">-</span><span class="n">topic</span> <span class="mi">0</span> <span class="mi">1</span> <span class="mi">1</span> <span class="mi">0</span> <span class="n">consumer</span><span class="o">-</span><span class="mi">5</span><span class="o">-</span><span class="n">e95f5858</span><span class="o">-</span><span class="mi">5</span><span class="n">c2e</span><span class="o">-</span><span class="mi">4474</span><span class="o">-</span><span class="n">bab9</span><span class="o">-</span><span class="mi">8</span><span class="n">edfa722db21</span> <span class="o">/</span><span class="mf">172.22</span><span class="p">.</span><span class="mf">0.4</span> <span class="n">consumer</span><span class="o">-</span><span class="mi">5</span>
</code></pre></div>
<p>The backup is completed when the lag reaches 0. On inspecting the S3 bucket, a folder of the raw backup data will be
present.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7096f62ecc37e1773be9b8b1c424e83241505343_screen-shot-2017-12-14-at-13.47.06.png?auto=compress,format"></p>
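<p>The completion check can be scripted by summing the LAG column of the describe report. A minimal sketch, using a sample report modelled on the output above in place of a live <em>kafka-consumer-groups</em> call:</p>

```shell
# Sum the LAG column of a `kafka-consumer-groups --describe` report;
# a total of 0 means the backup sink has caught up with the topic.
report='TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
example-topic 0 1 1 0 consumer-5-e95f5858 /172.22.0.4 consumer-5'

# Skip the header row (NR > 1) and accumulate column 5 (LAG).
total_lag=$(echo "$report" | awk 'NR > 1 { sum += $5 } END { print sum + 0 }')
echo "total lag: $total_lag"   # the backup is complete once this reaches 0
```

<p>Against a running cluster, the same pipeline works with the live describe output piped in instead of the sample variable.</p>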
<p><strong>Restoring
</strong>Let’s destroy all of the containers and start fresh:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>rm<span class="w"> </span>-f<span class="w"> </span>-v
$<span class="w"> </span>docker-compose<span class="w"> </span>up
</code></pre></div>
<p>Re-create the topic:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>kafka-topics<span class="w"> </span>--zookeeper<span class="w"> </span>localhost:2181<span class="w"> </span>--create<span class="w"> </span>--topic<span class="w"> </span>example-topic<span class="w"> </span>--replication-factor<span class="w"> </span><span class="m">1</span><span class="w"> </span>--partitions<span class="w"> </span>1Created<span class="w"> </span>topic<span class="w"> </span><span class="s2">"example-topic"</span>.
</code></pre></div>
<p>Create a source with Kafka Connect:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>cat<span class="w"> </span><span class="s"><< EOF > example-restore.json</span>
<span class="s">{</span>
<span class="s">"name": "example-restore",</span>
<span class="s">"config": {</span>
<span class="s">"connector.class": "com.spredfast.kafka.connect.s3.source.S3SourceConnector",</span>
<span class="s">"tasks.max": "1",</span>
<span class="s">"topics": "example-topic",</span>
<span class="s">"s3.bucket": "example-kafka-backup-bucket",</span>
<span class="s">"key.converter": "com.spredfast.kafka.connect.s3.AlreadyBytesConverter",</span>
<span class="s">"value.converter": "com.spredfast.kafka.connect.s3.AlreadyBytesConverter",</span>
<span class="s">"format": "binary",</span>
<span class="s">"format.include.keys": "true"</span>
<span class="s">}</span>
<span class="s">}</span>
<span class="s">EOF</span>
curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s2">"Content-Type: application/json"</span><span class="w"> </span>-H<span class="w"> </span><span class="s2">"Accept: application/json"</span><span class="w"> </span>-d<span class="w"> </span>@example-restore.json<span class="w"> </span>/api/kafka-connect-1/connectors
</code></pre></div>
<p>(Check out the <a href="https://github.com/spredfast/kafka-connect-s3/blob/master/README.md">Spredfast documentation</a> for more
configuration options.)</p>
<p>The restore process should begin; after some time, the ‘hello world’ message will appear when you run the Kafka console
consumer again:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>kafka-console-consumer<span class="w"> </span>--topic<span class="w"> </span>example-topic<span class="w"> </span>--bootstrap-server<span class="w"> </span>localhost:9092<span class="w"> </span>--from-beginning
hello<span class="w"> </span>world
</code></pre></div>
<h3>Backing Up Zookeeper</h3>
<p><strong>Zookeeper’s role
</strong>Newer versions of Kafka (>= 0.10.x) use ZooKeeper in a small but still important coordination role. When
new Kafka Brokers join the cluster, they use ZooKeeper to discover and connect to the other brokers. The cluster also
uses ZooKeeper to elect the controller and track the controller epoch. Finally and perhaps most importantly, ZooKeeper
stores the Kafka Broker topic partition mappings, which tracks the information stored on each broker. The data will
still persist without these mappings, but won't be accessible or replicated.</p>
<p><strong>Exhibitor
</strong><a href="https://github.com/soabase/exhibitor/wiki">Exhibitor</a> is a popular supervisor system for ZooKeeper. It provides a
number of housekeeping facilities for ZooKeeper as well as exposing a nice UI for exploring the stored data. It also
provides some <a href="https://github.com/soabase/exhibitor/wiki/Backup-Provider">backup</a> and
<a href="https://github.com/soabase/exhibitor/wiki/Restore-UI">restore</a> capabilities for ZooKeeper out of the box. However, we
should make sure we understand these features before relying on them.</p>
<p>On AWS, we run our Exhibitor brokers under the same stack. In this setup, where the stack’s auto-scaling group
controls when any of the Exhibitor instances are removed, it is relatively easy for multiple (or even
all) Exhibitor brokers to be terminated at the same time. That’s why for our tests we set up an Exhibitor cluster and
connected Kafka to it. We then indexed all the Kafka znodes to create an exhibitor backup. Finally, we tore down the
Exhibitor stack and re-deployed it with the same config.</p>
<p>Unfortunately, after re-deploy, while the backup folder was definitely in S3, the new Exhibitor appliance did not
recognise it as an existing index. With a bit of searching we found that this is actually the <a href="https://github.com/soabase/exhibitor/issues/343">expected
behaviour</a> and the suggested solution is to read the S3 index and apply
changes by hand.</p>
<p><strong>Backing-up
</strong>Creating, deleting and re-assigning topics in Kafka is an uncommon occurrence for us, so we estimated that a daily
backup task would be sufficient for our needs.</p>
<p>We came across <a href="https://github.com/mhausenblas/burry.sh">Burry</a>. Burry is a small tool which allows for snapshotting and
restoring of a number of system critical stores, including ZooKeeper. It can save the snapshot dump locally or to
various cloud storage options. Its backup dump is also conveniently organized along the znode structure making it very
easy to work with manually if need be. Using this tool we set up a daily cron job on our production to get a full daily
ZooKeeper snapshot and upload the resultant dump to an S3 bucket on AWS.</p>
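<p>Concretely, this boils down to a single crontab entry. The schedule, paths and wrapper script below are illustrative assumptions rather than our production configuration, and the exact Burry invocation is documented in its README:</p>

```shell
# Hypothetical crontab entry: every night at 03:00, run a wrapper script
# that invokes Burry to snapshot ZooKeeper and upload the dump to S3.
0 3 * * * /opt/backup/zookeeper-snapshot.sh >> /var/log/zookeeper-backup.log 2>&1
```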
<p><strong>Restoring
</strong>Conveniently, Burry also works as a restore tool using a previous ZooKeeper snapshot. It will try to recreate the full
snapshot znode structure and znode data. It also tries to be careful to preserve existing data, so if a znode exists it
will not overwrite it.</p>
<p>But there is a catch. Some of the Kafka-created znodes are ephemeral and expected to expire when the Kafka Brokers
disconnect. Currently Burry snapshots these as any other znodes, so restoring to a fresh Exhibitor cluster will recreate
them. If we were to restore Zookeeper before restarting our Kafka brokers, we'd restore from the snapshot of the
ephemeral znodes with information about the Kafka brokers in our previous cluster. If we then bring up our Kafka
cluster, our new broker node IDs, which must be unique, would conflict with the IDs restored from our Zookeeper backup.
In other words, we'd be unable to start up our Kafka brokers.</p>
<p>We can easily get around this problem by starting our new Zookeeper and new Kafka clusters before restoring the
Zookeeper content from our backup. By doing this, the Kafka brokers will create their ephemeral znodes, and the
Zookeeper restore will not overwrite these, and will go on to recreate the topic and partition assignment information.
After restarting the Kafka brokers, the data stored on their persisted disks will once again be correctly mapped and
available, and consumers will be able to resume processing from their last committed position.</p>
<p>Be part of Zalando Tech. We're <a href="https://jobs.zalando.com/tech/jobs/">hiring!</a></p>Constant Gardening2017-12-14T00:00:00+01:002017-12-14T00:00:00+01:00Fausto Sanninotag:engineering.zalando.com,2017-12-14:/posts/2017/12/constant-gardening.html<p>Effective management is not an end goal, but a process.</p><p><strong>How effective management is a continuing story of growth</strong></p>
<h3>Producers’ Style</h3>
<p>One of the things I struggled the most with in the past year was identifying the best way to lead my teams. I worked a
lot on myself, observed my peers, and tried to learn from my leads, but in the end, I ran into the well-known
dilemma: task-focused or people-focused management, which one is best?</p>
<p><strong>Task-focused</strong> management combines strong analytical skills with an intense motivation to move forward and solve
problems. <strong>People-focused</strong> management combines skills like communication and empathy. Teams are heterogeneously
comprised of people with different needs and behaviours but based on my experience, they too eventually sort themselves
into one of two macro categories:</p>
<ul>
<li><strong>Result-oriented -</strong> Team members who love organized work, take responsibility, are generally self-confident,
challenging and energetic. They pay attention to details and processes, and tend to be reliable and conscientious.</li>
<li><strong>Relationship-oriented -</strong> Team members who naturally focus on relationships, take care of others’ feelings, are
good at building cohesion, and tend to be warm, diplomatic, and approachable.</li>
</ul>
<h3>Your personal style</h3>
<p>What is our personal style and what happens when we apply the right approach, but with the <em>wrong</em> team?</p>
<p>Task-focused management is an attitudinal and behavioral approach where the leader, manager or supervisor focuses
primarily on getting the work done. When this style is applied on result-oriented teams, high standards are maintained
and with great efficiency. Team members rely on the structure and have good time management due to clear and sensible
deadlines.</p>
<p>However, task-oriented management can lead to bottlenecks, reducing autonomy and creativity, and fostering
dissatisfaction. This can have a negative effect on a company’s products as well, since it tends to kill innovation.
Leaders who centralize all process or insert themselves too aggressively into decision making can overwhelm and
suffocate team members with quieter personalities.</p>
<p>Applied to relationship-oriented teams, task-focused management can be perceived as uncaring: too many processes to
follow, no room for dialogue, top-down decision making, and the perception of autocratic organization. Working under
intense scheduling and excessive task orientation can bring the company culture down. People who are self-motivated
could become rebels in this kind of environment and it can affect the rate of employee retention in the medium/short
term.</p>
<p>By contrast, a people-oriented management style tends to energize people because it makes them feel appreciated. This
kind of leader cares about tasks and schedules, but believes that work culture is more important. People-oriented
management makes people feel that they make a difference in the company. Each decision is shared and accepted following
a totally transparent process.</p>
<p>Where this style falls down is that managers often invest so much time on relationship building through meetings and
team-building exercises that delays occur. Some relationship-oriented leaders allow employees autonomy to the
extent that tasks may not be completed on time. Moreover, when applied to a result-oriented team, the lack of guidance,
direction and organization can be a point of frustration and even stress.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1008fca091fe9310bae176120e67f0c5073796e8_constant-gardening-oct-17-airshow.jpg?auto=compress,format"></p>
<p><em>There is no one 'right way' to manage, but a balance of multiple aspects</em></p>
<h3>Finding Balance</h3>
<p>We require leaders who try to balance both approaches. Some of us are naturally more task-focused: we tend to be good at
decision making, problem solving and delegating. And some of us are naturally inclined to focus on empathy, listening,
including others and encouraging cooperation.</p>
<p>Neither approach is better than the other. Starting with our natural management approach, we should strive to
incorporate the strengths of the other style. People-focused managers can improve their decision making, while
task-focused managers can try to be more empathetic for example. We add to our natural style new skills training,
mentoring, experimenting, coaching or self-study, thus complementing the natural management style we already have.</p>
<p>The fundamental message is that you show the team you really care about them and their wellbeing. We should work as
gardeners instead of craftsmen, as Nobel Prize-winning economist Friedrich August von Hayek advised in his 1974
acceptance speech.</p>
<p>When we create something we have a sense of control; we have a plan and an end goal. When the craftsman builds the
table, it's built. Being a gardener is different. You have to create the environment, tending to the plants and knowing
when to leave them alone. You have to make sure the environment is fertile for everything you want to grow (different
plants have different needs), and even after the harvest you aren’t done. You need to turn the earth and, in essence,
start again. There is no end state if you want something to grow. For both result-oriented and people-oriented teams,
there is no ‘completed table’, only constant growth and, with conscious effort, improvement.</p>
<p><em>Fausto Sannino is a project manager at Zalando Payments.</em></p>
<p><em>Want to be part of a dynamic team? We’re <a href="https://jobs.zalando.com/tech/jobs/">hiring</a>!</em></p>Introducing: Helsinki’s 100th Employee2017-12-07T00:00:00+01:002017-12-07T00:00:00+01:00Vivi Brooketag:engineering.zalando.com,2017-12-07:/posts/2017/12/helsinki-100-employee.html<p>In conversation with Full Stack Engineer, Maksim Ekimovskii</p><h3><strong>In conversation with Full Stack Engineer, Maksim Ekimovskii</strong></h3>
<p>Yesterday, Finland celebrated the 100th anniversary of its independence. To join our <a href="https://engineering.zalando.com/posts/2015/08/hello-helsinki.html">Helsinki
hub’s</a> celebrations, we spoke to the 100th employee,
Maksim Ekimovskii. A full stack engineer and <a href="https://vimeo.com/229080865">passionate videographer</a>, Maksim tells us
about his journey with Zalando so far.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e6db91340235e0d4f04c58bf6423c642cbf626db_maksim-helsinki.jpg?auto=compress,format"></p>
<p><em>Chillin' out Maksim, relaxin' all cool.</em></p>
<p><strong>Tell us a little about yourself.
</strong>My name is Maksim Ekimovskii and my native city is Severodvinsk in Northern Russia.</p>
<p>All my life I’ve been passionate about tech, arts and sports of any kind. By day I’m a full stack engineer and by night
I’m a software artist, videographer and music lover.</p>
<p><strong>Why did you decide to relocate to Finland?
</strong>I had a long journey to Finland. I studied in St. Petersburg, but after graduating I moved to Hong Kong to work for a
local startup. At that time, I got my first international experience and it sparked my interest to live abroad long
term.</p>
<p>Why Finland? It’s interesting because I had opportunities to continue my career in three different locations: come back
and stay in St. Petersburg, move to the United States or try Finland. The companies were equally interesting to me, so I
was really deliberating what to do. I started my research about quality of life in different cities and set the goal to
move to the city from the top 15. I checked such things as health care, public services, safety, financial stability,
job market, etc. After a while I figured out that Helsinki was a clear winner among the cities I could get a working
contract in and moreover I’ve been there many times, really liked the environment and knew what to expect.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/734881bc1fd35230254dd0ae7c5cb65f002a9308_image3.jpg?auto=compress,format"></p>
<p><em>Zalando Tech – Helsinki hub</em></p>
<p><strong>When did you first join Zalando? What were your first weeks like?
</strong>I joined in the beginning of July 2017 as the 100th employee, so now I’ve been in the company for 5 months. My first
weeks felt like I was in the right place; I could find many challenges, a lot of opportunities and passionate people
around, which got me excited to work here. I felt welcomed, made my first friends, and overall had a smooth intro to the
company.</p>
<p><strong>What kind of projects are you working on?
</strong>I work in Team Perimeter, which is a new team in the Helsinki Tech Hub and we integrate partners into the Zalando
Platform. We take end-to-end responsibility of the entire integration process and develop new service pages for the
Fashion Store, which allows our partners to connect their customers to fashion.</p>
<p><strong>What’s been a personal high in your role so far?
</strong>Dynamics. Company dynamics, team dynamics and project dynamics. My personal high so far is realizing the impact I can
have, developing new soft and hard skills, working with people of different mindsets and ideas you can learn from and
grow.</p>
<p><strong>How does Zalando fit in the tech community in Helsinki?
</strong>It’s really nice to see how well Zalando fits in the tech community in Helsinki. Sometimes there are so many events
going on here I miss some. It’s quite regular that we host a tech meetup in our office or my proactive colleagues
participate and give some talks here and there.</p>
<p>I personally gave a talk at the Helsinki docker meetup and had an internal Zalando presentation just recently. I saw
that people are really interested in what we are doing and how, and now I have an exclusive “Zelsinki On-Air” speaker
sticker at the back of my laptop.</p>
<p>Join our Helsinki team. We're <a href="http://zln.do/2iCcHRu">hiring</a>!</p>A Recipe for Kafka Lag Monitoring2017-12-05T00:00:00+01:002017-12-05T00:00:00+01:00Mark Kellytag:engineering.zalando.com,2017-12-05:/posts/2017/12/recipe-for-kafka-lag-monitoring.html<p>A closer look at the ingredients needed for ultimate stability</p><h3><strong>A closer look at the ingredients needed for ultimate stability</strong></h3>
<p><em>This is part of a series of posts on Kafka. See <a href="https://engineering.zalando.com/posts/2017/11/real-time-ranking-kafka.html">Ranking Websites in Real-time with Apache Kafka’s Streams
API</a> for the first post in the series.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6b01f59fb7f262c93c5a902ea858a718bc93c1e6_screen-shot-2017-12-04-at-12.19.57.png?auto=compress,format"></p>
<p>Remora is a small application for monitoring Kafka consumer lag. Because many teams were already deploying it to their
production environments, open sourcing it made sense. Here, I’ll go through what streaming is, why it is useful, and the
motivation behind this monitoring application.</p>
<p>Streaming architectures have become an important architecture pattern at Zalando. To have fast, highly available and
scalable systems to process data across numerous teams and layers, having a good streaming infrastructure and monitoring
is key.</p>
<p>Without streaming, the system is not reactive to changes. An older style manages changes through incremental
batches. Batch jobs, such as crons, can be hard to manage: you have to keep track of a large number of jobs, each taking
care of a shard of data, and this can also be hard to scale. Having a streaming component, such as Kafka, provides a
centralised, scalable resource that makes your architecture reactive.</p>
<p>Some teams use cloud infrastructure such as AWS Kinesis or SQS. Zalando chose Kafka to achieve better throughput and
to use frameworks like AKKA Streams.</p>
<p>Monitoring lag is important; without it we don’t know where the consumer is relative to the size of the queue. An
analogy might be piloting a plane without knowing how many more miles are left on your journey. Zalando has trialled:</p>
<ul>
<li>Burrow, which has <a href="https://github.com/linkedin/Burrow/wiki/Known-Issues">performance issues</a>,</li>
<li>Kafka Lag Monitor, which is geared more towards Apache Storm,</li>
<li><a href="https://github.com/yahoo/kafka-manager">Kafka manager</a>, which is more of a UI tool.</li>
</ul>
<p>We needed a simple, independent application that could scrape metrics from a URL and place them into our metrics system.
But what should this application include? I put on my apron and played chef with the range of ingredients available to
us.</p>
<p>Scala is big at Zalando; a majority of our developers know AKKA or Play. At the time, our team was designing
systems using AKKA HTTP with an actor pattern. It is a very light, stable, asynchronous framework. Could we just
wrap the Kafka command line tools in a light Scala framework? Potentially, it could take less time and be more stable. <em>Sounds
reasonable</em>, we thought, <em>let’s do that</em>.</p>
<p>The ingredients for ultimate stability were as follows: a handful of Kafka java command line tools with a pinch of <a href="http://doc.akka.io/docs/akka-http/current/scala/http/">AKKA
Http</a> and a hint of an <a href="http://doc.akka.io/docs/akka/current/scala/general/actor-systems.html">actor
design</a>. Leave to code for a few days and take
out of the oven. Lightly garnish with a performance test to ensure stability, throw in some docker. Deploy, monitor,
alert, and top it off with a beautiful graph to impress. Present <a href="https://github.com/zalando-incubator/remora">Remora</a>
to your friends so that everyone may have a piece of the cake, no matter where in the world you are!</p>
<p>Bon <em>app</em>-etit!</p>Running Kafka Streams applications in AWS2017-11-30T00:00:00+01:002017-11-30T00:00:00+01:00Nina Hanzlikovatag:engineering.zalando.com,2017-11-30:/posts/2017/11/running-kafka-streams-applications-aws.html<p>Second in our series about the use of Apache Kafka’s Streams API by Zalando</p><h3><strong>Second in our series about the use of Apache Kafka’s Streams API by Zalando</strong></h3>
<p>This is the second in a series about the use of Apache Kafka’s Streams API by Zalando, Europe’s leading online fashion
platform. See <a href="https://engineering.zalando.com/posts/2017/11/real-time-ranking-kafka.html">Ranking Websites in Real-time with Apache Kafka’s Streams API for the first post in the
series.</a></p>
<p>This piece was first published on <a href="https://www.confluent.io/blog/running-kafka-streams-applications-aws/">confluent.io</a></p>
<h3>Running Kafka Streams applications in AWS</h3>
<p>At Zalando, Europe’s leading online fashion platform, we use Apache Kafka for a wide variety of use cases. In this blog
post, we share our experiences and lessons learned to run our real-time applications built with Kafka’s Streams API in
production on Amazon Web Services (AWS). Our <a href="https://engineering.zalando.com/">team at Zalando</a> was an early adopter
of the Kafka Streams API. We have been using it since its initial release in Kafka 0.10.0 in mid-2016, so we hope you
find this hands-on information helpful for running your own use cases in production.</p>
<h3>What is Apache Kafka’s Streams API?</h3>
<p>The <a href="https://kafka.apache.org/documentation/streams/">Kafka Streams API</a> is available as a Java library included in
Apache Kafka that allows you to build real-time applications and microservices that process data from Kafka. It allows
you to perform <em>stateless</em> operations such as filtering (where messages are processed independently from each other), as
well as <em>stateful</em> operations like aggregations, joins, windowing, and more. Applications built with the Streams API are
elastically scalable, distributed, and fault-tolerant. For example, the Streams API guarantees fault-tolerant data
processing with <a href="https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/">exactly-once
semantics</a>, and it
processes data based on event-time i.e., when the data was actually generated in the real world (rather than when it
happens to be processed). This conveniently covers many of the production needs for mission-critical real-time
applications.</p>
<p>An example of how we use Kafka Streams at Zalando is the aforementioned use case of <a href="https://www.confluent.io/blog/ranking-websites-real-time-apache-kafkas-streams-api/">ranking websites in real-time to
understand fashion trends</a>.</p>
<h2>Library Upgrades of Kafka Streams</h2>
<p>Largely due to our early adoption of Kafka Streams, we encountered many teething problems in running Streams
applications in production. However, we stuck with it due to how easy it was to write Kafka Streams code. In our early
days of adoption, we hit various issues around stream consumer groups rebalancing, issues with getting locks on the
local RocksDB after a rebalance, and more. These eventually settled down and sorted themselves out in the 0.10.2.1
release (April 2017) of the Kafka Streams API.</p>
<h3>I/O</h3>
<p>After upgrading to 0.10.2.1, our Kafka Streams applications were mostly stable, but we would still see what appeared to
be random crashes every so often. These crashes occurred more frequently on components doing complex stream
aggregations. We eventually discovered that the actual culprit was AWS rather than Kafka Streams: on AWS, General Purpose
SSD (GP2) EBS volumes operate using I/O credits. The AWS pricing model allocates a baseline read and write IOPS
allowance to a volume, based on the volume size. Each volume also has an IOPS burst balance, to act as a buffer if the
base limit is exceeded. Burst balance replenishes over time but as it gets used up, the reading and writing to disks
starts getting throttled to the baseline, leaving the application with an EBS that is very unresponsive. This ended up
being the root cause of most of our issues with aggregations. When running Kafka Stream applications, we had initially
assigned 10 GB disks as we didn’t foresee much storage being used on these boxes. However, under the hood, the
applications performed lots of read/write operations on the RocksDBs which resulted in I/O credits being used up, and
given the size of our disks, the I/O credits were not replenished quickly enough, grinding our application to a halt. We
remediated this issue by provisioning Kafka Streams applications with much larger disks. This gave us more baseline
IOPS, and the burst balance was replenished at a faster rate.</p>
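<p>The credit mechanics described above can be sketched as a rough model. The numbers below follow AWS's published gp2 behaviour at the time (3 IOPS per GiB baseline with a 100 IOPS floor, bursts up to 3,000 IOPS, and a 5.4 million I/O credit bucket); treat them as illustrative rather than a billing-accurate simulation.</p>

```python
# Rough model of gp2 I/O credit mechanics: a volume earns credits at its
# baseline rate and spends them at the actual I/O rate, so sustained load
# above baseline eventually drains the bucket and the volume is throttled.

BURST_BUCKET = 5_400_000   # initial I/O credit balance
BURST_IOPS = 3_000         # maximum burst rate

def baseline_iops(volume_gib):
    """gp2 baseline: 3 IOPS per GiB, with a 100 IOPS floor."""
    return max(100, 3 * volume_gib)

def seconds_until_throttled(volume_gib, workload_iops):
    """How long a sustained workload can run before burst credits run out."""
    base = baseline_iops(volume_gib)
    if workload_iops <= base:
        return float("inf")   # the bucket never drains
    drain_per_second = min(workload_iops, BURST_IOPS) - base
    return BURST_BUCKET / drain_per_second

# A 10 GiB volume (100 IOPS baseline) under a sustained 1,000 IOPS RocksDB
# workload drains its credits in 5.4M / 900 = 6,000 seconds (~1h40m); a
# 1,000 GiB volume has a 3,000 IOPS baseline and is never throttled here.
small_volume = seconds_until_throttled(10, 1000)
large_volume = seconds_until_throttled(1000, 1000)
```

<p>This is exactly why provisioning larger disks fixed the crashes: the bigger volume both raises the baseline and replenishes the bucket faster.</p>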
<h3>Monitoring</h3>
<h4>EBS Burst Balance</h4>
<p>Our <a href="https://github.com/zalando/zmon">monitoring solution</a> polls <a href="https://aws.amazon.com/cloudwatch/">CloudWatch</a>
metrics and pulls back all AWS exposed metrics. For the issue outlined, the most important of these is <a href="https://aws.amazon.com/blogs/aws/new-burst-balance-metric-for-ec2s-general-purpose-ssd-gp2-volumes/">EBS burst
balance</a>. As
mentioned above, in many cases applications that use Kafka Streams rely on heavy utilization of locally persisted
RocksDBs for storage and quick data access. This storage is persisted on the instance’s EBS volumes and generates a high
read and write workload on the volumes. GP2 disks were used in preference to provisioned IOPS disks (IO1) since these
were found to be much more cost-effective in our case.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/63b7f3790416f6933fb1f0faf4058067ee033ec5_screen-shot-2017-11-28-at-10.12.44.png?auto=compress,format"></p>
<h3>Fine-tuning your application</h3>
<p>With upgrades in the underlying Kafka Streams library, the Kafka community introduced many improvements to the
default stream configuration. In previous, less stable iterations of the client library, we spent a lot of time
tweaking config values such as <em>session.timeout.ms</em>, <em>max.poll.interval.ms</em>, and <em>request.timeout.ms</em> to
achieve some level of stability.</p>
<p>With new releases we found ourselves discarding these custom values and achieving better results. However, some timeout
issues persisted on some of our services, where a service would frequently get stuck in a rebalancing state. We noticed
that reducing the <em>max.poll.records</em> value for the stream configs would sometimes alleviate issues experienced by these
services. From partition lag profiles we also saw that the consuming issue seemed to be confined to only a few
partitions, while the others would continue processing normally between re-balances. Ultimately we realised that the
processing time for a record in these services could be very long (up to minutes) in some edge cases. Kafka has a fairly
large maximum offset commit time before a stream consumer is considered dead (five minutes), but with larger message
batches of data, this timeout was still being exceeded. By the time the processing of the record was finished, the
stream was already marked as failed and so the offset could not be committed. On rebalance, this same record would once
again be fetched from Kafka, would fail to process in a timely manner and the situation would repeat. Therefore for any
of the affected applications, we introduced a processing timeout, ensuring there was an upper bound on the time taken by
any of our edge cases.</p>
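<p>A minimal sketch of that mitigation, in Python for illustration (the actual services were JVM applications): bound per-record processing time so a single pathological record cannot stall the consumer past Kafka's rebalance deadline. The function and handler names here are hypothetical.</p>

```python
# Bound per-record processing: run the handler in a worker thread and give
# up on the record if it exceeds the timeout, instead of blocking the
# consumer loop until Kafka marks it dead and triggers a rebalance.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)

def process_with_timeout(record, handler, timeout_s):
    """Return handler(record), or None if processing exceeds timeout_s."""
    future = _pool.submit(handler, record)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None   # skip the edge case; the offset can still be committed

fast = process_with_timeout("record", lambda r: r.upper(), timeout_s=1.0)
slow = process_with_timeout("record", lambda r: time.sleep(0.5) or "late",
                            timeout_s=0.05)
```

<p>The key design point is that the upper bound is enforced by the application, not by tuning Kafka's timeouts ever higher, which would only delay failure detection for genuinely dead consumers.</p>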
<h3>Monitoring</h3>
<h4>Consumer Lag</h4>
<p>By looking at the metadata of a Kafka Consumer Group, we can determine a few key metrics. How many messages are being
written to the partitions within the group? How many messages are being read from those partitions? The difference
between these is called lag; it represents how far the Consumers lag behind the Producers.</p>
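<p>The definition above reduces to a small computation: for each partition, lag is the log-end offset (how far producers have written) minus the consumer group's committed offset (how far consumers have read). This sketch is illustrative, not Remora's actual code.</p>

```python
# Consumer lag per partition: log-end offset minus committed offset.
# A partition absent from the committed map is treated as unread (offset 0).

def consumer_lag(log_end_offsets, committed_offsets):
    """Return (lag per partition, total lag) for a consumer group."""
    lag = {
        partition: log_end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in log_end_offsets
    }
    return lag, sum(lag.values())

# Partition 0 is fully caught up; partition 1 is 150 messages behind.
per_partition, total = consumer_lag(
    log_end_offsets={0: 1000, 1: 500},
    committed_offsets={0: 1000, 1: 350},
)
```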
<p>The ideal running state is that lag is a near zero value. At Zalando, we wanted a way to monitor and plot this to see if
our streams applications are functioning.</p>
<p>After trying out a number of consumer lag monitoring utilities, such as <a href="https://github.com/linkedin/Burrow">Burrow</a>,
<a href="https://github.com/srotya/kafka-lag-monitor">Kafka Lag Monitor</a> and <a href="https://github.com/yahoo/kafka-manager">Kafka
Manager</a>, we ultimately found these tools either too unstable or a poor fit for
our use case. From this need, our co-worker, <a href="https://www.linkedin.com/in/mark-kelly-a0040319/">Mark Kelly</a>, built a
small utility called <a href="https://github.com/zalando-incubator/remora">Remora</a>. It is a simple HTTP wrapper around the Kafka
consumer group “describe” command. By polling the Remora HTTP endpoints from our monitoring system at a set time
interval, we were able to get good insights into our stream applications.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0657c1f7f1f785c956b00fef4da8f42eb2177b73_screen-shot-2017-11-28-at-10.19.28.png?auto=compress,format"></p>
<h3>Memory</h3>
<p>Our final issue was due to memory consumption. Initially we somewhat naively assigned very large heaps to the Java
virtual machine (JVM). This was a bad idea because Kafka Streams applications utilize a lot of off-heap memory when
configured to use RocksDB as their local storage engine, which is the default. By assigning large heaps, there wasn’t
much free system memory. As a result, applications would eventually come to a halt and crash. For our applications we
use m4.large instances; we assign 4 GB of RAM to the heap and usually utilize about 2 GB of it, leaving the remaining
4 GB of RAM free for off-heap and system usage. This puts overall system memory utilization at around 70%. Additionally, we would
recommend reviewing the <a href="https://docs.confluent.io/current/streams/developer-guide.html#memory-management">memory
management</a> section of Confluent’s
Kafka Streams documentation as customising the RocksDB configuration was necessary in some of our use cases.</p>
<h3>Monitoring</h3>
<h4>JVM Heap Utilization</h4>
<p>We expose JVM heap utilization using <a href="http://metrics.dropwizard.io/3.2.3/">Dropwizard metrics</a> via HTTP. This is polled
on an interval by our <a href="https://github.com/zalando/zmon">monitoring solution</a> and graphed. Many of our applications are
fairly memory intensive, with in-memory caching, so it was important for us to be able to see at a glance how much
memory was available to the application. Additionally, due to the relative complexity of many of our applications, we
wanted to have easy visibility into garbage collection in the systems. Dropwizard metrics offered a robust, ready-made
solution for these problems.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/70340f1bec18a2a7fd17aaf5cc5ee4fe34f02a06_screen-shot-2017-11-28-at-10.20.51.png?auto=compress,format"></p>
<h3>CPU, System Memory Utilization and Disk Usage</h3>
<p>We run <a href="https://github.com/prometheus/node_exporter">Prometheus node exporter</a> on all of our servers; this exports lots
of system metrics via HTTP. Again, our monitoring solution polls this on interval and graphs them. While the JVM
monitoring provided a great insight into what was going on in the JVM, we needed to also have insight into what was
going on in the instance. In general, most of the applications we ended up writing had a much greater network and memory
overheads than CPU requirements. However, in many failure cases we saw, our instances were ultimately terminated by
auto-scaling groups on failing their health checks. These health checks would fail because the endpoints became
unresponsive due to high CPU loads as other resources were used up. While this was usually not due to high CPU use in
the application processing itself, it was a great symptom to capture and dig further into where this CPU usage was
coming from. Disk monitoring also proved very valuable, particularly for Kafka Streams applications consuming from
topics with a large partitioning factor and/or doing more complex aggregations. These applications store a fairly large
amount of data (200MB per partition) in RocksDB on the host instance, so it is very easy to accidentally run out of
space. Finally, it is also good to monitor how much memory the system has available as a whole since this was frequently
directly connected to CPU loads saturating on the instances, as briefly outlined above.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dc9a5fd2ff7ef3d9a2927f7a7a58bffaf84e6195_screen-shot-2017-11-28-at-10.23.49.png?auto=compress,format"></p>
<h3>Conclusion: The Big Picture of our Journey</h3>
<p>As mentioned in the beginning, our team at <a href="https://engineering.zalando.com/">Zalando</a> has been using the Kafka
Streams API since its initial release in Kafka 0.10.0 in mid-2016. While it wasn’t a smooth journey in the very
beginning, we stuck with it and, with its recent versions, we now enjoy many benefits: our productivity as developers
has skyrocketed, writing new real-time applications is now quick and easy with the Streams API’s very natural
programming model, and horizontal scaling has become trivial.</p>
<p>In the next article of this series, we will discuss how we are backing up Apache Kafka and Zookeeper to Amazon S3 as
part of our disaster recovery system.</p>
<h3>About Apache Kafka’s Streams API</h3>
<p>If you have enjoyed this article, you might want to continue with the following resources to learn more about Apache
Kafka’s Streams API:</p>
<ul>
<li><a href="https://kafka.apache.org/documentation/streams/">Get started with the Kafka Streams API</a> to build your own
real-time applications and microservices.</li>
<li>Walk through our <a href="https://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html">Confluent tutorial for the Kafka Streams API with
Docker</a> and play with our
<a href="https://github.com/confluentinc/kafka-streams-examples">Confluent demo applications</a>.</li>
</ul>
<p>Join our ace tech team. We’re <a href="https://jobs.zalando.com/tech/jobs/">hiring!</a></p>Real-time Ranking with Apache Kafka’s Streams API2017-11-23T00:00:00+01:002017-11-23T00:00:00+01:00Hunter Kellytag:engineering.zalando.com,2017-11-23:/posts/2017/11/real-time-ranking-kafka.html<p>Using Apache and the Kafka Streams API with Scala on AWS for real-time fashion insights</p><h3><strong>Using Apache and the Kafka Streams API with Scala on AWS for real-time fashion insights</strong></h3>
<p><a href="https://www.confluent.io/blog/ranking-websites-real-time-apache-kafkas-streams-api/">This piece was originally published on
confluent.io</a></p>
<h3>The Fashion Web</h3>
<p>Zalando, Europe’s leading online fashion platform, cares deeply about fashion. Our mission statement is to, “Reimagine
fashion for the good of all”. To reimagine something, first you need to understand it. The Dublin Fashion Insight Centre
was created to understand the “Fashion Web” – what is happening in the fashion world beyond the borders of what’s
happening within Zalando’s shops.</p>
<p>We continually gather data from fashion-related sites. We bootstrapped this process with a list of relevant sites from
our fashion experts, but as we scale our coverage, and add support for multiple languages (spoken, not programming), we
need to know what are the next “best” sites.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/63fb0106e79a7db22606f43697f8ff96781b14c2_screen-shot-2017-11-23-at-09.04.33.png?auto=compress,format"></p>
<p>Rather than relying on human knowledge and intuition, we needed an automated, data-driven methodology to do this. We
settled on a modified version of Jon Kleinberg’s <a href="http://www.cs.cornell.edu/home/kleinber/auth.pdf">HITS algorithm</a>.
HITS (Hyperlink Induced Topic Search) is also sometimes known as <em>Hubs and Authorities</em>, which are the main outputs of
the algorithm. We use a modified version of the algorithm, where we flatten to the domain level (e.g., Vogue.com) rather
than on the original per-document level (e.g., http://Vogue.com/news).</p>
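<p>The flattening step can be sketched in a few lines: every per-document URL collapses to its host, so the link graph counts edges between sites rather than pages. The helper name is illustrative; the real pipeline is described in the Kafka Streams section below.</p>

```python
# Flatten a document URL to its domain, dropping any "www." prefix, so that
# all pages of a site map to the same node in the link graph.
from urllib.parse import urlparse

def to_domain(url):
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

# Both per-document URLs flatten to the same domain-level node:
a = to_domain("http://www.vogue.com/news/article-123")
b = to_domain("https://vogue.com/fashion")
```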
<h3>HITS in a Nutshell</h3>
<p>The core concept in HITS is that of Hubs and Authorities. Basically, a <em>Hub</em> is an entity that points to lots of other
“good” entities. An <em>Authority</em> is the complement; an entity pointed to by lots of other “good” entities. The entities
here, for our purposes, are web sites represented by their domains such as Vogue.com or ELLE.com. Domains have both Hub
and Authority scores, and they are separate (this turns out to be important, which we’ll explain later).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/72a78c389b07bb1fc9c7ee53828456a5c12d45ec_screen-shot-2017-11-23-at-09.06.39.png?auto=compress,format"></p>
<p>These Hub and Authority scores are computed using an <a href="https://en.wikipedia.org/wiki/Adjacency_matrix">adjacency matrix</a>.
For every domain, we mark the other domains that it has links to. This is a directed graph, with the direction being who
links to whom.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dee6f4e0f3c55baf0f0784b9ab974a9896bbd0ee_screen-shot-2017-11-23-at-09.07.27.png?auto=compress,format"></p>
<p><em>(Image courtesy of <a href="http://faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture15.html">http://faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture15.html</a>)</em></p>
<p>Once you have the adjacency matrix, you perform some straightforward matrix calculations to calculate a vector of Hub
scores and a vector of Authority scores as follows:</p>
<ul>
<li>Sum across the columns and normalize; this becomes your Hub vector</li>
<li>Multiply the Hub vector element-wise across the adjacency matrix</li>
<li>Sum down the rows and normalize; this becomes your Authority vector</li>
<li>Multiply the Authority vector element-wise down the adjacency matrix</li>
<li>Repeat</li>
</ul>
<p>An important thing to note is that the algorithm is iterative: you perform the steps above until eventually you reach
convergence—that is, the vectors stop changing—and you’re done. For our purposes, we just pick a set number of
iterations, execute them, and then accept the results from that point. We’re mostly interested in the top entries, and
those tend to stabilize pretty quickly.</p>
<p>So why not just use the raw counts from the adjacency matrix? The beauty of the HITS algorithm is that Hubs and
Authorities are mutually supporting—the better the sites that something points at, the better a Hub it is; similarly,
the better the sites that point to something, the better an Authority it is. That is why the iteration is necessary: it
bubbles the good stuff up to the top.</p>
<p>(Technically, you don’t have to iterate. There’s some fancy matrix math you can do instead with calculating the
eigenvectors. In practice, we found that when working with large, sparse matrices, the results didn’t turn out the way
we expected, so we stuck with the iterative, straightforward method.)</p>
<h3>Common Questions</h3>
<p><strong>What about non-fashion domains? Don’t they clog things up?
</strong>Yes, in fact, on the first run of the algorithm, sites like Facebook, Twitter, Instagram, et al. were right up at the
top of the list. Our Fashion Librarians then curated that list to get a nice list of fashion-relevant sites to work
with.</p>
<p><strong>Why not PageRank?
</strong>PageRank needs nearly complete information on the web, along with the resources required to gather it. We only have
outgoing link data on the domains that are already in our working list. We need an algorithm that is robust in the face
of partial information.</p>
<p>This is where the power of the separate Hub and Authority scores comes in. Given the information for our seeds, they
become our list of Hubs. We can then calculate the Authorities, filter out our seeds, and have a ranked list of stuff we
don’t have. Voilà! Problem solved, even in the face of partial knowledge.</p>
<h3>But wait, you said Kafka Streams?</h3>
<p>Kafka was already part of our solution, so it made sense to try to leverage that infrastructure and our experience using
it. Here’s some of the thinking behind why we chose to go with Apache Kafka’s® <a href="https://kafka.apache.org/documentation/streams/">Streams
API</a> to perform such real-time ranking of domains as described above:</p>
<ul>
<li>It has all the primitives necessary for MapReduce-style computation: the “Map” step can be done with <em>groupBy</em> &
<em>groupByKey</em>, the “Reduce” step can be done with <em>reduce</em> & <em>aggregate</em>.</li>
<li>Streaming allows us to have real-time, up-to-date data.</li>
<li>The focus stays on the data. We’re not thinking about distributed computing machinery.</li>
<li>It fits in naturally with the functional style of the rest of our application.</li>
</ul>
<p>You may wonder at this point, “Why not use MapReduce? And why not use tools like Apache Hadoop or Apache Spark that
provide implementations of MapReduce?” Given that <a href="https://research.google.com/archive/mapreduce.html">MapReduce</a> was
invented originally to solve this type of ranking problem, why not use it for the very similar type of computation we have
here? There are a few reasons we didn’t go with it:</p>
<ul>
<li>We’re a small team, with no previous Hadoop experience, which rules Hadoop out.</li>
<li>While we do run Spark jobs occasionally in batch mode, it is a high infrastructure cost to run full-time if you’re
not using it for anything else.</li>
<li>Initial experience with Spark Streaming, snapshotting, and recovery didn’t go smoothly.</li>
</ul>
<h3>How We Do It</h3>
<p>For the rest of this article we are going to assume at least a basic familiarity with the Kafka Streams API and its two
core abstractions, KStream and KTable. If not, there are plenty of
<a href="https://kafka.apache.org/documentation/streams/">tutorials</a>, <a href="https://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams/">blog
posts</a> and
<a href="https://github.com/confluentinc/kafka-streams-examples">examples</a> available.</p>
<h3>Overview</h3>
<p>The real-time ranking is performed by a set of Scala components (groups of related functionality) that use the Kafka
Streams API. We deploy them via containers in AWS, where they interact with our Kafka clusters. The Kafka Streams API
allows us to run each component, or group of components, in a distributed fashion across multiple containers depending
on our scalability needs.</p>
<p>At a conceptual level, there are three main components of the application. The first two, the Domain Link Extractor and
the Domain Reducer, are deployed together in a single JVM. The third component, the HITS Calculator and its associated
API front end, is deployed as a separate JVM. In the diagram below, the curved bounding boxes represent deployment
units; the rectangles represent the components.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c303691a684e3c828770464c604439ac2a2be874_screen-shot-2017-11-23-at-09.15.28.png?auto=compress,format"></p>
<p>Data, in the form of s3 URLs to stored web documents, comes into the system on an input topic. The Domain Link Extractor
loads the document, extracts the links, and outputs a mapping from domain to associated domains, for that document.
We’ll drill into this a little bit more below. At a high level, we use the <em>flatMap</em> KStream operation. We use <em>flatMap</em>
rather than <em>map</em> to simplify the error handling—we’re not overly concerned with errors, so for each document we either
emit one new stream element on success, or zero if there is an error. Using <em>flatMap</em> makes that straightforward.</p>
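<p>That <em>flatMap</em> error-handling idiom can be modeled on plain lists: each document yields one element on success and zero on failure, so errors simply vanish from the stream instead of needing a separate error path. The extractor below is a hypothetical stand-in, not the real Domain Link Extractor.</p>

```python
# flatMap-style error handling: wrap a fallible per-document function so it
# returns a list of zero or one results, then flatten across all documents.

def extract_links(doc):
    """Hypothetical extractor; raises on malformed input."""
    if "links" not in doc:
        raise ValueError("malformed document")
    return (doc["domain"], doc["links"])

def safe_extract(doc):
    try:
        return [extract_links(doc)]   # one element on success
    except ValueError:
        return []                      # zero elements on error

docs = [
    {"domain": "vogue.com", "links": ["elle.com"]},
    {"corrupt": True},                 # silently dropped
]
stream = [out for doc in docs for out in safe_extract(doc)]
```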
<p>These mappings then go to the Domain Reducer, where we use <em>groupByKey</em> and <em>reduce</em> to create a KTable. The key is the
domain, and the value in the table is the union of all the domains that the key domain has linked to, across all the
documents in that domain. From the KTable, we use <em>toStream</em> to convert back to a KStream and from there to an output
topic, which is log-compacted.</p>
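<p>The Domain Reducer step can be sketched as a fold over plain tuples: group the (domain, linked-domains) mappings by key and reduce each group by set union, mirroring what <em>groupByKey</em> and <em>reduce</em> maintain in the KTable. Names and data are illustrative.</p>

```python
# Per-domain reduce by set union: each document from a domain contributes
# its outgoing links, and the table keeps the union across all documents.

def reduce_by_domain(mappings):
    table = {}
    for domain, linked in mappings:
        table[domain] = table.get(domain, set()) | set(linked)
    return table

# Two documents from vogue.com contribute the union of their links:
table = reduce_by_domain([
    ("vogue.com", {"elle.com"}),
    ("vogue.com", {"harpersbazaar.com"}),
    ("elle.com", {"vogue.com"}),
])
```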
<p>The final piece of the puzzle is the HITS Calculator. It reads in the updates to domains, keeps the mappings in a local
cache, uses these mappings to create the adjacency matrix, and then performs the actual HITS calculation using the process
described above. The ranked Hubs and Authorities are then made available via a REST API.</p>
<h3>The flexibility of Kafka’s Streams API</h3>
<p>Let’s dive into the Domain Link Extractor for a second, not to focus on the implementation, but as a means of exploring
the flexibility that Kafka Streams gives.</p>
<p>The current implementation of the Domain Link Extractor component is a function that calls four more focused functions,
tying everything together with a Scala <em>for</em> comprehension. This all happens in a KStream <em>flatMap</em> call. Interestingly
enough, the monadic style of the <em>for</em> comprehension fits in very naturally with the <em>flatMap</em> KStream call.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9a3212007c4bc0604c76197a42b30398c43fab79_screen-shot-2017-11-23-at-09.19.08.png?auto=compress,format"></p>
<p>One of the nice things about working with Kafka Streams is the flexibility that it gives us. For example, if we wanted
to add information to the extracted external links, it is very straightforward to capture the intermediate output and
further process that data, without interfering with the existing calculation <em>(shown below)</em>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8ee2a514335bf6ef6e689fc1e5e6cbe9aa2b2092_screen-shot-2017-11-23-at-10.01.46.png?auto=compress,format"></p>
<h3>In Summary</h3>
<p>For us, Apache Kafka is useful for much more than just collecting and sharing data in real-time; we use it to solve
important problems in our business domain by building applications on top. Specifically, Kafka’s Streams API enables us
to build real-time Scala applications that are easy to implement, elastically scalable, and that fit well into our
existing, AWS-based deployment setup.</p>
<p>The programming style fits in very naturally with our already existing functional approach, allowing us to quickly and
easily tackle problems with a natural decomposition of the problem into flexible and scalable microservices.</p>
<p>Given that much of what we’re doing is manipulating and transforming data, we get to stay close to the data. Kafka
Streams doesn’t force us to work at too abstract a level or distract us with unnecessary infrastructure concerns.</p>
<p>It’s for these reasons we use Apache Kafka’s Streams API in the Dublin Fashion Insight Centre as one of our go-to tools
in building up our understanding of the Fashion Web.</p>
<h3>About Apache Kafka’s Streams API</h3>
<p>If you have enjoyed this article, you might want to continue with the following resources to learn more about Apache
Kafka’s Streams API:</p>
<ul>
<li><a href="https://kafka.apache.org/documentation/streams/">Get started with the Kafka Streams API</a> to build your own
real-time applications and microservices.</li>
<li>Walk through our <a href="https://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html">Confluent tutorial for the Kafka Streams API with
Docker</a> and play with our
<a href="https://github.com/confluentinc/kafka-streams-examples">Confluent demo applications</a>.</li>
</ul>
<p><em>Interested in working in our team? Get in touch: <a href="https://jobs.zalando.com/tech/jobs/">we’re hiring</a>.</em></p>Why Event Driven?2017-11-21T00:00:00+01:002017-11-21T00:00:00+01:00Conor Cliffordtag:engineering.zalando.com,2017-11-21:/posts/2017/11/why-event-driven.html<p>Zalando is using an event-driven approach for its new Fashion Platform. Conor Clifford examines why</p><h3><strong>Zalando is using an event-driven approach for its new Fashion Platform. Conor Clifford examines why</strong></h3>
<p>In a <a href="https://engineering.zalando.com/posts/2017/10/event-first-development---moving-towards-kafka-pipeline-applications.html">recent
post</a>, I wrote
about how we went about building the core Article services and applications, of Zalando’s new Fashion Platform, with a
strong <strong>event first</strong> focus. That new platform also has a strong overall event-driven focus, rather than a more
“traditional” service-oriented approach.</p>
<p>The concept of “event-driven” is not a new one; indeed, it has been quite well covered in recent years.</p>
<p>In this post, we look at why we are using an event-driven approach to build the new Fashion Platform in Zalando.</p>
<h3>Serving Complexity</h3>
<p>A “traditional” service/microservice architecture will be composed of many individual services, each with different
responsibilities. Each service will likely have several, probably many, clients; each interacting with the service to
fetch data as needed. And these clients may be services to other clients, etc.</p>
<p>Various clients will have different requirements for the data they are fetching, for example:</p>
<ul>
<li>Regular high frequency individual primary key fetches</li>
<li>Intermittent, yet regular, large bulk fetches</li>
<li>Non-primary key based queries/searches, also with varieties of frequencies and volumes</li>
<li>All the above with differing expectations/requirements around response times, and request throughputs</li>
</ul>
<p>Over time, as such systems grow and become more complex, the demands on each of these services grow, both in terms of
new functional requirements, as well as operational demands. From generally easy beginnings, growth can lead to
ever-increasing complexity of the overall system, resulting in systems that are difficult to operate, scale, maintain
and improve over time.</p>
<p>While there are excellent tools and techniques for dealing with and managing these complexities, these target the
symptoms, not the underlying root causes.</p>
<p>Perhaps there is a better way.</p>
<h3>Inversion of flow</h3>
<p>The basic underlying concept here is to <strong>invert</strong> this traditional flow of information: to change from a top-down,
request-oriented system to one where data flows from the bottom up, with changes to data causing new snapshot events to
be published. These changes propagate upwards through the system, being handled appropriately by a variety of client
subsystems along the way.</p>
<p>Rather than fetching data on demand, clients requiring the data in question can process it appropriately for their own
needs, at their own pace. That processing can mean transforming, merging and producing new events, or building an
appropriate local persisted projection of the data, e.g. a high-speed key-value store for fast lookups, populating an
analytical database, maintaining a search cluster, or even maintaining a corpus of data for various data science/machine
learning activities. In fact, there can and will be clients that do a combination of such activities around the
event data.</p>
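<p>As a deliberately simplified sketch of this inversion, the following Python models an append-only stream of snapshot events and a client that builds its own local key-value projection at its own pace. All names here are illustrative only; a real platform would use a durable log such as Kafka rather than in-memory lists.</p>

```python
from dataclasses import dataclass

@dataclass
class SnapshotEvent:
    """A full snapshot of an entity's state, published on every change."""
    offset: int
    key: str
    payload: dict

class EventStream:
    """Minimal append-only stream; a real system would use a durable log."""
    def __init__(self):
        self.events = []

    def publish(self, key, payload):
        self.events.append(SnapshotEvent(len(self.events), key, payload))

    def read_from(self, offset):
        return self.events[offset:]

class KeyValueProjection:
    """A client-local projection: the latest snapshot per key, for fast lookups."""
    def __init__(self):
        self.store = {}
        self.position = 0  # how far this client has consumed, at its own pace

    def catch_up(self, stream):
        for event in stream.read_from(self.position):
            # Later snapshots supersede earlier ones for the same key.
            self.store[event.key] = event.payload
            self.position = event.offset + 1

stream = EventStream()
stream.publish("article-1", {"name": "Sneaker", "price": 99})
stream.publish("article-1", {"name": "Sneaker", "price": 79})

projection = KeyValueProjection()
projection.catch_up(stream)
assert projection.store["article-1"]["price"] == 79
```

<p>Note that the producer never knows who its consumers are: a second client could consume the same stream into a search index or an analytical database without any change to the producer.</p>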
<h3>On-Demand Requests Are Easy</h3>
<p>Building a client that pulls data on demand from a service would appear the easier thing to do, with clients being free
to just fetch data directly as needed. From the client perspective, it can even appear that there is no obvious benefit
to an event-driven approach.</p>
<p>However, with a view to the wider platform ecology (many clients, many services, lots of data, etc.), the traditional
“pull-based” approach will lead to much more <strong>complex</strong> and problematic systems, leading to a variety of challenges:</p>
<ul>
<li><strong>Operation and Maintenance</strong> - core services in pull-based systems grow to serve more and more clients over time;
clients with different access requirements (PK fetches, batch fetches, periodic "fetch the world" cycles, etc.). As
the number and types of such clients grow, operating and maintaining such core services becomes ever more complex
and difficult.</li>
<li><strong>Software Delivery</strong> - as clients of core services grow, so too will the list of requirements around different
access patterns and capabilities of the underlying data services (e.g. inclusion of batch fetches, alternative
indexing, growing request loads, competing prioritizations, etc.). This workload has a strong tendency to ultimately
swamp the delivery teams of core services, to the detriment of delivering new business value. In addition to the
service's team, the client teams themselves would also be dependent on new/changed functionality in the services to
allow them to move forward.</li>
<li><strong>Runtime Complexity</strong> - outages and other such incidents in "pull" based environments can have dramatic impacts.
A core service outage would essentially break any client fetching data "on demand". Multiple dependent applications
can be brought down by an outage in a single underlying service. There can also be interesting dynamics on recovery
of such services, with potential thundering herds, etc., causing repeating outages during this recovery, prolonging,
or worse, further degrading, the impact of the original incident. Even without outages, the complexity of systems
built around a request/response approach makes forecasting and predicting load growth difficult; modelling the
interplay of many different clients with different request patterns is hard, and attempting to forecast growth for
each of them becomes a real challenge.</li>
</ul>
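<p>The recovery dynamics above deserve a small illustration. One common mitigation for a post-outage thundering herd (a general technique, not something specific to Zalando's platform) is to have clients retry with exponentially growing, randomised delays, so they do not all reconnect at the same instant:</p>

```python
import random

def backoff_with_jitter(attempt, base=0.5, cap=30.0):
    """Retry delay in seconds: exponential in the attempt number, capped,
    and randomised so recovering clients spread out their reconnects
    instead of stampeding the service the moment it comes back."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Five clients on their fifth attempt each pick a different delay,
# spreading the reconnect load over up to 8 seconds.
delays = [backoff_with_jitter(attempt=4) for _ in range(5)]
assert all(0 <= d <= 8.0 for d in delays)
```

<p>As the post notes, though, such tools target the symptoms; the event-driven approach below aims at the root cause.</p>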
<p>By evolving to an event-driven system, there are many advantages over these, and other aspects:</p>
<ul>
<li><strong>Operationally</strong> - since clients receive information about changes, they can react instantly and
appropriately themselves. As the throughput of data is driving the system, the performance/load characteristics are
much more predictable (and testable). There is no non-determinism caused by the interplay of multiple clients, etc.
In general, handing data over via event streams allows for much looser coupling of clients and services, resulting
in simpler systems.</li>
<li><strong>Delivery</strong> - With the ability to access complete data from the event streams, clients are no longer blocked by the
service teams' delivery backlog; they are completely free to move forward themselves. Similarly, service delivery
team backlogs are not overloaded by requests for serving modifications/alterations, etc., and are thus freed up to
directly deliver new business value.</li>
<li><strong>Outages</strong> - With clients receiving data changes and handling these locally, an outage of the originating service
essentially means clients work with some stale data (the data that would have changed during that outage),
typically a much less invasive and problematic issue. In many cases, where clients depend on data that changes
infrequently, if at all, this is not an issue at all once the projection is established.</li>
<li><strong>Greater Runtime Simplicity</strong> - with data passing through event streams, and clients consuming these streams as
they need, the overall dynamic of the system becomes more predictable/less complicated.</li>
</ul>
<h3>Tradeoffs</h3>
<p><em>“Time is an illusion. Lunchtime doubly so.” - Douglas Adams</em></p>
<p>There's no such thing as a free lunch. There’s likely more work up front in establishing such an architectural shift,
as well as other concerns:</p>
<ul>
<li><strong>Client Burden</strong> - There will be an additional burden on clients in a purely event-driven system, with those
clients having to implement and operate local persisted projections of the event stream(s) for their purposes.
However, a non-trivial part of this extra work is offset by removing the work (development and operational) of
integrating with a synchronous API and all the details that entails: dealing with authentication, rate limiting,
circuit breakers, outage mitigations, etc. There is also less work involved in not having to provide an API that is
100% purpose-built. In addition, the logic to maintain such local snapshot projections is straightforward (e.g. write an
optionally transformed value to a “key value” store for fast local lookups).</li>
<li><strong>Source of Truth</strong> - A common concern with having locally managed projections of the source data is that there is a
loss of the single source of truth that the originating service represents in the traditional model. By following an
“<strong>Event First</strong>” approach, with the <strong>event stream</strong> itself being <strong>the primary source of truth</strong>, and by allowing
<em>only</em> changes from the event stream itself to cause changes to any of the local projections, the <em>source of truth</em>
is kept true.</li>
<li><strong>Traditional Clients</strong> - there may be clients that are not in a position to deal with a local data projection (e.g.
clients that only require a few requests processed, clients that facilitate certain types of custom/external
integrations, etc.) In such cases, there may be a need to provide a more traditional “request-response” interface.
These, though, can be built using the same foundations, i.e. a custom data projection and a dedicated new service
using it to address these clients’ needs. We need to ensure that any clients looking to fit the “traditional”
model are appropriate candidates to do so, and care should be taken to resist the temptation to implement the “easy”
solution rather than the correct one.</li>
</ul>
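<p>The “event stream as the primary source of truth” discipline can be made concrete with a small sketch: if local projections take no direct writes and are mutated only by applying stream events, then replaying the stream from the beginning deterministically reproduces any projection, e.g. after a client loses its local store. The function below is purely illustrative, not part of any real service:</p>

```python
def apply_events(events):
    """Build a key-value projection purely from an ordered event stream.
    No other code path writes to the projection, so replaying the same
    stream always reproduces exactly the same state."""
    store = {}
    for key, payload in events:
        store[key] = payload  # a newer snapshot supersedes the older one
    return store

events = [
    ("article-1", {"price": 99}),
    ("article-2", {"price": 120}),
    ("article-1", {"price": 79}),
]

live = apply_events(events)
rebuilt = apply_events(events)  # e.g. recovery after losing the local store
assert live == rebuilt
assert live["article-1"]["price"] == 79
```

<p>The moment a projection accepts writes from anywhere other than the stream, this replay property, and with it the single source of truth, is lost.</p>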
<h3>Conclusion</h3>
<p>In the modern era of building growing systems, with many hundreds of engineers and dozens of teams all trying to move
fast and deliver excellent software with key business value, there is a need for less fragile solutions.</p>
<p>In this post, we have looked at moving away from a more regular “service”-oriented architecture towards one driven by
event streams. There are many advantages, but they come with their own set of challenges. And of course, there is no
such thing as a silver bullet.</p>
<p>In the next post, I will look at some of the lessons we have learned building the system, and some practices that should
be encouraged.</p>
<p>If you are interested in working with these types of systems and challenges, <a href="https://jobs.zalando.com/jobs/823362-software-engineer-smart-product-platform/">join us. We’re
hiring!</a></p>Do We Really Need UI Tests?2017-11-16T00:00:00+01:002017-11-16T00:00:00+01:00Vadym Kukhtintag:engineering.zalando.com,2017-11-16:/posts/2017/11/do-we-really-need-ui-testing.html<p>Two brothers examine the pros and cons of UI testing</p><h3><strong>Two brothers examine the pros and cons of UI testing</strong></h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4fd507fb406a584dd96052be66da87f69965462b_screen-shot-2017-11-13-at-10.23.06.png?auto=compress,format"></p>
<p>Based on their different experiences in <a href="https://corporate.zalando.com/en/innovation/partner-solutions">Partner
Solutions</a> and <a href="https://zms.zalando.com/">Zalando Media
Solutions</a> respectively, we speak to frontend developers, Vadym Kukhtin and Oleksandr Kukhtin
about their opposing opinions on UI testing.</p>
<p><strong>The Case Against UI Testing - Vadym</strong></p>
<p><strong>TL;DR: It depends on preference, but I believe that UI testing isn’t required in every instance.</strong></p>
<p>In my experience, it is a Sisyphean task to force developers to write even basic unit tests, never mind UI and E2E
tests. Only Spartans led by Leonidas can achieve UI and E2E testing.</p>
<p>Of course, the case for UI testing is more complex than a simple “good vs bad” dichotomy. For example, the scale and
scope of the app should be taken into account. If the app is small or short-term, most probably UI tests aren’t
required. If it’s a monster project that needs to be covered as much as possible, then unit and E2E tests are required.</p>
<p>In a real-world app, any interaction should change some state of the app, whether it’s a click, a hover or any custom
event. With unit tests, the developer can test internal component or service functionality; with E2E tests, the developer
can test common component interactions, connections to third-party services, and API and backend functionality.</p>
<p><strong>Example</strong>:</p>
<ul>
<li><strong>Use case:</strong> User should be able to login using OAuth and see “Hello” board.</li>
<li><strong>App</strong>:</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7f19dee884257d0ba56e29692121e1a0499ba783_screen-shot-2017-11-16-at-11.39.42.png?auto=compress,format"></p>
<ul>
<li><strong>Test case</strong>:</li>
<li><strong>Unit test:</strong> Checks which functions inside <strong>LoginComponent</strong> were triggered when the mock user clicks on the
“Login” button. If the “login()” function is triggered and the router state changed to “/hello-board”, we can discern
that everything is working fine.</li>
<li><strong>E2E test</strong>: Checks if the user really clicked on the correct button, if OAuth returned the correct data, and if
current location state in the browser contains the “hello-board”</li>
</ul>
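<p>The unit-test case above can be sketched in a few lines. The snippet below is a language-agnostic illustration written in Python: <code>LoginComponent</code>, <code>login()</code> and the router are hypothetical stand-ins for the real frontend code, and the OAuth call is replaced by a mock:</p>

```python
class Router:
    """Hypothetical router; only tracks the current location state."""
    def __init__(self):
        self.state = "/login"

    def navigate(self, path):
        self.state = path

class LoginComponent:
    """Hypothetical stand-in for the UI component under test."""
    def __init__(self, router, auth):
        self.router = router
        self.auth = auth  # injected so the test can mock OAuth

    def login(self):
        # On successful authentication, route the user to the "Hello" board.
        if self.auth():
            self.router.navigate("/hello-board")

# The unit test: mock OAuth success, simulate the click, check the route.
router = Router()
component = LoginComponent(router, auth=lambda: True)
component.login()
assert router.state == "/hello-board"
```

<p>An E2E test would instead drive a real browser against a real (or sandboxed) OAuth provider, which is exactly where most of the time cost discussed below comes from.</p>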
<p>This process can be incredibly time consuming, with developers spending time writing the tests and mocking all
dependencies. For small or short-term apps, we have to ask ourselves: is the time worth it?</p>
<p>My answer would be no.</p>
<p><strong>The Case for UI Testing - Oleksandr</strong></p>
<p><strong>TL;DR: In most cases I think we do need to write UI tests.</strong></p>
<p>Let’s start with a small illustration. You start a new project and a month later have a nicely working app. You then
decide to change one component in your structure. It is only a UI component, so you know that the logic hasn't changed. The
change itself works, but now the app has some errors: it looks like you forgot about some style changes and your
component now looks awkward. So, you fix the styles, deploy the changes, sit back and relax. But <em>now</em>, you have
unwanted style issues in another component. So you do the same thing again and deploy the changes. Ideally, UI tests can
identify this kind of problem.</p>
<p>Like most things in life, UI tests have advantages and disadvantages. Some argue they take too much time, but this time
can be considered an investment, safeguarding against any unwanted games of “tennis” as seen in the example above.
Testing UI helps us better understand our code, and what it actually should render.</p>
<p>Yes, testing is complicated. Complex UI logic is pretty hard to test, but not impossible. The problem is that the
process brings a lot of trouble for the advantages gained: you need to write a lot of tests, but as a result even small
changes (which happen often with UI) force you to rewrite a large part of them.</p>
<p><strong>In Conclusion
</strong>The biggest takeaway from our discussion is that UI testing cannot be simply filed away under “good” or “bad”. In some
circumstances – such as small apps or short-term projects – testing may not be the best use of time. In others, testing
is a must for maintaining the integrity of the components and saving time in the future.</p>
<p>Got an opinion on UI testing and want to bring it to a dynamic team? Get in touch. We’re
<a href="https://jobs.zalando.com/tech/">hiring.</a></p>Dedicated Ownership for Teams at Zalon2017-11-09T00:00:00+01:002017-11-09T00:00:00+01:00Jan Helwichtag:engineering.zalando.com,2017-11-09:/posts/2017/11/dedicated-ownership-for-teams-at-zalon.html<p>Agile Lead and Software Engineer at Zalon, Jan Helwich on how to work well</p><h3>Agile Lead and Software Engineer at Zalon, Jan Helwich on how to work well</h3>
<p>At the beginning of 2017, we at Zalon decided to enable our teams to work in what we believe is the most effective and
efficient way. At the heart of this restructuring process, we assigned cross-functional teams to business goals or user
needs <em>only</em> and let them take full responsibility for solving these problems. This transfers ownership and
responsibility to the teams, and allows them to focus solely on the topics at hand. We first evaluated the idea by
testing it with a very small team and are currently rolling it out to all our teams.</p>
<p>The teams now ideate, define features, build prototypes for user tests if needed, set up A/B tests, implement and bring
the features live, and also monitor impact to decide if further work is needed. To be able to do this the teams need to
be fully cross-functional, meaning they must consist of UX, UI, Producer (role comparable to Scrum Master), Product and
Business personnel, plus all engineers required to bring a feature to life.</p>
<p>This approach is not new. For example, Spotify is working in a similar way. But there is <a href="https://www.infoq.com/news/2016/10/no-spotify-model">no common Spotify
model</a> and no approach is “one size fits all”. Spotify itself is
very unique in setup and state, as it evolved quite fast and was an extremely flexible startup from the very beginning.</p>
<p>In this article we’d like to shed some light on the learnings and best practices that we’ve acquired from taking our
first steps in this direction.</p>
<h3>The reasoning</h3>
<p>I would consider our former practices as fairly agile here at Zalon. For any business problem, Business and Product
would start discussing potential ideas; at some point UX would be included. Together, they’d ideate and expand upon them
by prototyping and user testing. Engineers were included as needed. This resulted in a relatively rough specification
that was then planned in the backlog of a team. We completed the final specification during implementation to avoid
common <a href="https://en.wikipedia.org/wiki/Waterfall_model">Waterfall</a> problems.</p>
<p>This approach worked decently for us but had some major drawbacks:</p>
<ul>
<li>Our engineers still often felt like code monkeys, as the ideation process happened before (sometimes way before)
actual implementation, and the engineers included in ideation were not always the ones implementing the feature.</li>
<li>The risk of producing waste was high, as Product and UX often worked on things that were never implemented or had to
change a lot during implementation.</li>
</ul>
<h3>How we want to work</h3>
<p>The basic ideas we wanted to hold on to were already present in the <a href="http://agilemanifesto.org/">Agile Manifesto</a> from
roughly 15 years ago; in the <a href="http://agilemanifesto.org/principles.html">Principles</a> to be more specific. Here are the
areas that steered our vision:</p>
<ul>
<li>Business people and developers must work together daily throughout the project,</li>
<li>The most efficient and effective method of conveying information to and within a development team is face-to-face
conversation,</li>
<li>Build projects around motivated individuals. Give them the environment and support they need, and trust them to get
the job done.</li>
</ul>
<p>In my experience, the last point is often the hardest. Why? Because it requires a vast amount of trust from leadership in
teams. We try to work within an organization that trusts teams to do what they have been hired to do. Implementing this
philosophy in other workplaces or companies is another thing altogether.</p>
<p>Our experience with this new approach showed that transparency and trust are crucial. One example in how we saw benefits
here was when we calculated roughly how much a sprint costs. Using this information the teams could make informed
decisions, i.e. “Is it worth spending more time on implementing a certain feature if the estimated outcome is X?”</p>
<p>One other very important part of this approach is team setup. Here is an overview of what we consider important for our
teams:</p>
<ul>
<li><strong>Cross functional</strong></li>
</ul>
<p>It is important that a team is able to solve the problem at hand, from ideation to release of the feature. Our typical
team consists of Product, UX, UI, Frontend and Backend Engineer(s), plus one or two associated people from business.</p>
<ul>
<li><strong>Long living</strong></li>
</ul>
<p>The team should work together for a certain time, to allow them to really grow as a team and get to the
performing phase of team building.</p>
<ul>
<li><strong>Self-organizing</strong></li>
</ul>
<p>The teams <a href="https://www.infoq.com/articles/what-are-self-organising-teams">self organize</a>.</p>
<ul>
<li><strong>Size matters</strong></li>
</ul>
<p>Studies show that teams should not be too big. This is due to the communication overhead becoming a problem.</p>
<ul>
<li><strong>Business should be part of the team, not only through product</strong></li>
</ul>
<p>The gain from useful information transferred from one area to another is not to be underestimated. Until now,
colleagues from the business side were associated with a team only for the duration of a feature “project”. This might be a
disadvantage when it comes to team building.</p>
<h3>What we’ve done so far</h3>
<p>So which process are we using? As a department we are still learning a lot every day. We are sure that we have yet to
cover every possible case. Regardless, we’re still hoping to share what we’ve distilled from our recent experience in
order to grow and learn.</p>
<p>As said, for us it is most important that teams can fully focus on the topics at hand. But how do you get all the small
stuff done? We alternate between focus phases and so-called miscellaneous phases. In a miscellaneous sprint, the team
“focuses” on smaller improvements, bugs, and the monitoring of new features. The latter is important as the team has
usually brought features live beforehand, so it is crucial to be prepared for upcoming issues or bugs.</p>
<p>Note: It should be obvious that in a focus phase, critical issues should also be addressed by teams. Experience shows
that engineers sometimes have <a href="http://tracks.roojoom.com/r/451">slack time</a> where they are also able to get smaller
improvements done.</p>
<p>The general process is as follows:</p>
<ol>
<li>Business, in cooperation with Product, prepares One Pagers, which are documents that explain the user need or a
business goal, and the KPIs that shall be improved. Also, an estimate of impact is given, so that they can be
prioritised in an informed way.</li>
<li>Kickoff meeting(s), which might take place before the actual start of the focus phase, so that Business and Product
can explain the One Pager to the team and answer questions.</li>
<li>Ideation phase (if necessary → 0-2 sprints): (i) Brainstorming potential solutions (ii) Testing ideas eg. by
prototypes with user tests, A/B tests etc (iii) Tracking: Are all KPIs tracked in a way we can measure success?
Implement as necessary.</li>
<li>Implementation phase: (i) Implementing the outcomes of the ideation (ii) Setup of reporting (iii) Bringing features
live, A/B test etc</li>
</ol>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/58e1a5fdf7a0d5ee993d6d2a1d54dd31f5f826a3_screen-shot-2017-11-07-at-17.42.08.png?auto=compress,format"></p>
<h3>Learnings</h3>
<p><strong>Process
</strong>We are not dogmatic in any way about these process steps. In some cases it might make sense to start implementing
certain features during the ideation phase. A small ideation about an unforeseen topic might also happen in the
implementation step.</p>
<p>For some teams it works better to have the kickoff and even some ideation meetings before the actual focus sprint
starts.</p>
<p><strong>Sprints and meetings
</strong>Our teams work in one or two week sprints. One of the most important meetings related to this setup is the review
meeting. Here, the team presents the results of their work to everyone interested. This can be wireframes, prototypes,
or diagrams showing how things are planned, and of course working software as well. The reviews are usually fruitful as
here the team can celebrate successes and discuss with their stakeholders how to continue or what next to tackle based
on the current results.</p>
<p><strong>Ideation
</strong>In the ideation phases it is crucial to meet often, as there is still a lot of uncertainty and the team needs to
communicate and coordinate next steps often. Once every one to two days works for us. One team even extended their
stand-up to up to 30 minutes. We are currently considering the option to start with two full workshop days for bigger
topics.</p>
<p>Our teams also found it crucial to write down what they had decided upon during these meetings, and also what they
decided <em>not</em> to do and why. When a team works on a problem for a longer period, it otherwise happens incredibly
often that the same ideas are re-evaluated once again.</p>
<p><strong>Leadership
</strong>When embarking on a new method of working and collaborating, it is important to support the team undertaking the
change. For example, a lot of engineers may be very unfamiliar with ideating rather than “just” implementing defined
tickets. It must be made clear that their input is appreciated and needed throughout that time. Also, keep an eye on and
support the team when going through team building <a href="https://www.mindtools.com/pages/article/newLDR_86.htm">phases</a>.</p>
<p><strong>Decision making
</strong>Coming to decisions in a timely manner might be challenging for some teams. Supporting the team here might be
necessary, e.g. by <a href="http://agileorganizations.io/#consentdecisionmaking">consent decision making</a>. Sometimes, input from
Business and Product can accelerate decision making if the team is indecisive.</p>
<p><strong>KPIs
</strong>Try to focus on as few KPIs as possible, preferably one at a time.</p>
<p><strong>Rules
</strong>For <a href="https://engineering.zalando.com/posts/2017/11/agile-fails.html">some teams</a>, defining rules allows for structure and guidance,
so everyone is on the same page and Business personnel understand how technical members of the team work.</p>
<p><strong>Capacity and slack time
</strong>It is important to note that this requires a fair amount of capacity from the associated Business person, as they are
usually responsible for providing a lot of background data, preparing reports, etc. Our recommendation currently would
be to have one person with a main focus, or nearly all of their capacity, working with the team. We know this is often
not possible and are open to alternatives.</p>
<p>Teams and leaders must also be aware that developers have slack time here and there. There are two things I would like
to stress here. Firstly, it is important that the developers are aware that the focus topic at hand must always have
first priority. That might also mean they have to support other team members, optimize some processes or check if all
the required data is available, thus having a hand in completing more “uncommon” development tasks.</p>
<p>Secondly, having some slack time for engineers is <a href="http://www.jamesshore.com/Agile-Book/slack.html">part of many agile
methods</a> for good reasons. They should use this time for clean-ups,
getting rid of technical debt, research, and so on.</p>
<h3>Conclusion</h3>
<p>For us, letting teams take over full ownership of goals is already proving advantageous. Overall, teams are very
satisfied when it comes to focus, engagement and collaboration. Most important is that the results of the first focus
areas are “good” to “very good”, e.g. an NPS of up to 70, out of the box, for one new tool for our stylists.</p>
<p>For leadership, it is vital to communicate goals and expectations clearly, and to support the teams in working as a
team and in making difficult decisions without explicitly telling them what to do.</p>
<p>I hope the learnings we shared here are helpful for some teams wanting to implement a similar approach. If you have any
remarks, questions or additions please contact me on <a href="https://twitter.com/janhelwich?lang=en">Twitter</a> or via
<a href="mailto:jan.helwich@zalando.de">email</a>.</p>
<p>Want to bring your skills to the table? Have a look at our <a href="https://jobs.zalando.com/tech/">jobs page.</a></p>Zalando Wins Big in Dublin2017-11-07T00:00:00+01:002017-11-07T00:00:00+01:00Justin Lawlertag:engineering.zalando.com,2017-11-07:/posts/2017/11/datsci-awards-2017.html<p>Ana Peleteiro Ramallo takes ‘Data Scientist of the Year’ award at the DatSci’s</p><h3>Ana Peleteiro Ramallo takes ‘Data Scientist of the Year’ award at the DatSci’s</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8d4dca1cbcb7b64904bf0c3487251386293f5a64_screen-shot-2017-11-06-at-09.28.25.png?auto=compress,format"></p>
<p>There was a great turnout at the <a href="https://www.datsciawards.ie/">Dublin DatSci Awards</a> at Dublin’s Croke Park, with Data
Scientists from across companies, universities, startups and the public sector attending. Zalando Dublin had finalists
in two major award categories, backed up with two tables for support. Ana, Senior Data Scientist at Zalando Dublin and
one of the founding members of the Dublin data science team, ended up taking home the top prize – <a href="https://www.datsciawards.ie/awards/data-scientist-of-the-year/">Data Scientist of the
Year</a> – much to the celebrations of the Dublin office.
Zalando’s second entry for the Fashion Content Platform (FCP) team were finalists in the competition for the <a href="https://www.datsciawards.ie/awards/best-data-science-sme/">Small and
Medium-size Enterprise (SME) award</a>.</p>
<p>A special thanks goes to <a href="http://www.ceadar.ie">CeADAR</a> and <a href="https://www.nextgeneration.ie">Next Generation</a> for
supporting the DatSci Awards.</p>
<h2>The DatSci Awards</h2>
<p>The explosion of data and data science is changing all of our worlds. Data Scientists in Ireland are at the cutting edge
of this work.</p>
<p>The data science community in Ireland is extremely collaborative, with companies working with universities and <a href="https://engineering.zalando.com/posts/2016/04/dublin-data-science-tour.html">data
science meetups running nearly every week</a>; usually hosted
by companies.</p>
<p>The DatSci Awards give recognition to those doing outstanding work in this area, taking into account their background,
the current work they’re doing and what knowledge sharing they’re taking part in with the community.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/74ce25cbd5593229eff77c4cd84a0cbbee2d387c_screen-shot-2017-11-06-at-09.31.20.png?auto=compress,format"></p>
<h2>Ana Peleteiro Ramallo - Data Scientist of the Year</h2>
<p>Ana has a long history in Data Science. She had worked in academia for many years before moving into FinTech, and then
into her current role at Zalando in Dublin.</p>
<p>Ana sets a high bar for how data science is done in Zalando, bringing best research practices from her background.</p>
<p>When <a href="https://corporate.zalando.com/en/newsroom/en/stories/my-first-day-zalando-ana-data-scientist">Ana joined the Dublin
team</a>, they were still
deciding what the most impactful products they could build were going to be.</p>
<p>Ana was always asking questions: What ways can data science add value to Zalando? How can data be leveraged in new ways?
How can sense be made out of all the data being generated in the fashion ecosystem online? What new <a href="https://engineering.zalando.com/posts/2017/01/sapphire-deep-learning-upskilling.html">technologies should
we be investigating</a>?</p>
<h3>“Without machine learning, we could never keep up with the amount of fashion resources that are available.” - <strong>Ana Peleteiro Ramallo</strong></h3>
<p>With her team, Ana has been shipping groundbreaking products in Deep Learning for Natural Language Processing (NLP) and
Knowledge Extraction. It’s not possible to scale data science without engineering. Ana has been building and leading
multidisciplinary teams that include both data scientists and engineers working together on data science products that
scale to millions of documents a day, on top of making an impact for the business.</p>
<p>Outside Zalando, Ana has always been a key contributor to the Data Scientist community, presenting at meetups and
conferences. She is a Women in Tech advocate, having given talks at several relevant venues and mentoring at the <a href="https://www.linkedin.com/pulse/wiml-2016-ana-peleteiro-ramallo">Women
in Machine Learning workshop</a> in Barcelona last
December.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b0f8288076e534856cacdb5599cff3875125e966_screen-shot-2017-11-07-at-11.55.21.png?auto=compress,format"></p>
<h3>Fashion Content Platform - Finalists for SME Award</h3>
<p>In the age of vast amounts of fashion data, fashion blogs and social media influencers, predicting trends is a hard
problem.</p>
<p>Zalando is Europe’s largest online fashion platform. Wrong buying decisions can be expensive. Traditionally, fashion
purchasing and trendspotting were done with manual research; offline at shows, but also online with blogs and social
media. Manual research is limited by the time it takes. Data at scale has never before influenced fashion decisions.</p>
<p>Zalando’s <a href="http://insights.zalando.com/">Fashion Insights Explorer</a> (FIE) and <a href="https://www.collabary.com">Collabary</a>
products are the outcomes of the work done in the Zalando Dublin Fashion Content Platform (FCP) team. These products
bring data-driven insights to marketers and purchasers.</p>
<p>These products represent just two views of the fashion landscape generated by the FCP backend. This backend can be
extended in the future to bring new tools to market, with minimum development effort.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/de90bc1f095310a04eb3e41ce56153034fb28967_screen-shot-2017-11-07-at-11.53.37.png?auto=compress,format"></p>
<p>The work done in the FCP team has led directly to decisions made for putting together the Spring/Summer 2018 season at
Zalando. Fashion buyers and trend forecasters are using data-driven software to make decisions through the FIE. Fashion
brands are finding the right social media influencers for marketing campaigns through Collabary.</p>
<p>From Zalando Dublin, Hunter Kelly and Martina Naughton were finalists in the Small to Medium Enterprise award,
representing a team of over twenty data scientists and engineers.</p>
<p>For the SME award, judges were looking for a solution to a real business problem, using the latest in data science in
new ways.</p>
<p>To understand fashion in machine learning, some of the technical challenges taken on by the FCP team were:</p>
<ul>
<li><strong>Data models</strong> - built on structured data sets. Quality structured data is not easy to get in fashion. The FCP team
built these datasets with <a href="https://engineering.zalando.com/posts/2017/07/closing-the-data-quality-loop.html">crowdsourcing
platforms</a>.</li>
<li><strong>Unstructured data</strong> - online fashion data is stored in HTML blogs and social media posts. It’s messy. It’s not
standardized. Parsing this data is not trivial.</li>
<li><strong>Deep Learning</strong> - the data science behind the insights extraction uses the latest deep learning algorithms and
libraries; Named Entity Relationships, text classification and other machine learning methods.</li>
<li><strong>Scaling for production</strong> -
<a href="https://engineering.zalando.com/posts/2017/05/platform-engineering-and-third-generation-microservices-in-dublin.html">microservices</a>
and the latest in streaming technologies are used to parse several million documents a day.</li>
</ul>
<p>FCP engineers and data scientists are also key contributors to the community - speaking <a href="https://www.confluent.io/kafka-summit-sf17/real-time-document-rankings-kafka-streams/inv">at
conferences</a>,
<a href="https://www.meetup.com/Zalando-Tech-Events-Dublin/">meetups</a>, contributing <a href="https://zalando.github.io/">to open
source</a>, and blogging about their knowledge and experiences.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/949615443b8030bfe971c3164794f77c25e35797_screen-shot-2017-11-06-at-09.47.18.png?auto=compress,format"></p>
<h2>The Future of Data Science at Zalando</h2>
<p>The work hasn’t stopped here. The Dublin teams are building out their products with next generation data science and
engineering technologies.</p>
<p>Data is being used to solve new business cases, to understand and serve customers better. New business units are using
data throughout their work, from logistics to payments and beyond.</p>
<p>Want to join our award-winning data team? Check out our <a href="https://jobs.zalando.com/tech/">jobs page.</a> We're hiring.</p>Agile Fails2017-11-02T00:00:00+01:002017-11-02T00:00:00+01:00Samir Hannatag:engineering.zalando.com,2017-11-02:/posts/2017/11/agile-fails.html<p>"To be pressure-proof, a process needs to become second nature." We look at learning from and overcoming the issues that slow our teams down.</p><h3>Learning from and overcoming the issues that slow our teams down</h3>
<p>As Agile Coaches we meet a lot of teams, engineers, product people, leads and managers throughout our daily work. Not
all of them are agile experts and we understand that there are some misconceptions about “agile”. When we encounter a
misconception several times, we see it as a pattern that we can overcome.</p>
<p>We’ll take a closer look at some of the misconceptions surrounding “agile”, or what we refer to as #agilefails. You
might even recognize some of them from your own organization.</p>
<p><strong>1. #Agilefail: “Process is bypassed”
</strong>Teams often see their processes bypassed. For example, under pressure, stakeholders may directly push their needs on
the team instead of using the existing planning sessions. Or perhaps management asks for (push) reporting instead of
using a review meeting. In practice, this amounts to disrespecting the process.</p>
<p>To be pressure-proof, a process needs to become second nature. This can be achieved by reinforcing the social dynamics
between the team, the stakeholders and management. From our experience, it takes between six and nine iterations for a
process to become routine, mirroring the forming, storming, norming and performing phases (assuming the team goes
through these phases without disruption).</p>
<p>What can you do to build a routine?</p>
<ul>
<li>Build your process collaboratively with your stakeholders and management</li>
<li>Reinforce the connections within this group (team, stakeholders and management)</li>
<li>Protect your team from disruptive change (new hire, team split, complete new setup) during the six to nine iteration
period</li>
<li>Monitor and refine your process (using retrospectives)</li>
</ul>
<p>As a result your process will become a routine.</p>
<p><strong>2. #Agilefail: “No more meetings”
</strong>This is a sentence we hear a lot in coaching sessions with teams. “We want to work, we do not want meetings.”</p>
<p>Let’s use empirical data to make a decision here. Take a look at all of the Scrum meetings or activities involved in a
weekly sprint: Backlog Grooming (1h), Planning (4h), Demo (1h), Retro (1h) and Standup (1.5h). Altogether we end up with
around eight and a half hours spent, which equals around 20% of our work time.</p>
<p>We observed that if you do not spend this time in well-ordered, organized meetings, you will need to spend more time
in unorganized meetings and ad-hoc alignments. Without proper planning, you spend more time planning small bits and pieces
throughout the week. Manage and optimize your time so that organizing does not exceed 20%. Structured meetings with
a clear purpose mean less meeting time in the long run (not fewer meetings).</p>
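<p>A quick sanity check of the arithmetic above, assuming a 40-hour working week (which is what the quoted ~20% figure implies):</p>

```python
# Scrum meeting hours per one-week sprint, as listed above.
meetings_hours = {
    "Backlog Grooming": 1.0,
    "Planning": 4.0,
    "Demo": 1.0,
    "Retro": 1.0,
    "Standup": 1.5,  # roughly 15-20 minutes per day over a week
}
total = sum(meetings_hours.values())  # 8.5 hours
share = 100 * total / 40              # 21.25%, i.e. "around 20%"
print(f"{total} hours per sprint, {share:.2f}% of a 40h week")
```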
<p><strong>3. #Agilefail: “More people = more outcome (team setup)”
</strong>Sometimes we hear the following about teams and their workflow: “Teams need to grow in order to get their work done
and speed up” or “Agile means constant change and therefore it is okay to change teams very often.”</p>
<p>It is a common misunderstanding that adding more people to a team will increase performance. If you look at team
dynamics ( <a href="https://en.wikipedia.org/wiki/Tuckman%27s_stages_of_group_development">e.g. the team phases defined by
Tuckman</a>), you see that every team goes through a
“natural” development. Every time you add a new member, team development basically starts back at zero. This means that
every newly added team member will slow your team down, bringing it back to a “storming” phase.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9109dde6625cfdf7a8de435d6df419ee5c6740c7_screen-shot-2017-10-23-at-13.45.43.png?auto=compress,format"></p>
<p>Secondly, team size has a natural peak, where communication, alignment and the number of members are in balance. If you
exceed this size, the extra alignment will cost more time than the additional person contributes.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8ba6433547deffdf17561310ab2be54bee186847_screen-shot-2017-10-23-at-13.47.02.png?auto=compress,format"></p>
<p>The <a href="https://less.works/less/structure/teams.html">LeSS Framework</a> gives us four indications for a well structured team:</p>
<ol>
<li>Dedicated teams</li>
<li>Cross-functional teams</li>
<li>Co-located teams</li>
<li>Long-lived teams</li>
</ol>
<p><strong>4. #Agilefail: “Output instead of outcome”
</strong>Some members of our department pointed out that teams which become more and more efficient, but focus mainly on
themselves, can be problematic. In the end, they somewhat closed-mindedly steer “agile” toward their own velocity
(“We get faster and faster”). This is a failure we have also seen a few times ourselves.</p>
<p>“Agile” focuses not on getting faster and faster as a team (output), but on delivering constant value to the customer
(outcome). There is no bigger waste than delivering valueless software, but in a very efficient way. “Agile” makes us
deliver what customers need and enables us to learn from their feedback. It is more than a tool to purely optimize the
workflow of your working items. It is vital for an efficient and satisfied team.</p>
<p>Interested in more around agility at Zalando? Get in touch with us at <a href="mailto:agility-coaching@zalando.de">agility-coaching@zalando.de</a> to share your
thoughts and feedback, or be part of an agile team by checking out our <a href="https://jobs.zalando.com/tech/">jobs page.</a></p>Reattaching Kafka EBS in AWS2017-10-26T00:00:00+02:002017-10-26T00:00:00+02:00Andrey Dyachkovtag:engineering.zalando.com,2017-10-26:/posts/2017/10/reattaching-kafka-ebs-in-aws.html<p>Stop worrying about losing your Apache Kafka broker without copying a large amount data.</p><p>At Zalando we’ve created <a href="https://github.com/zalando/nakadi">Nakadi</a>, a distributed event bus that implements a RESTful
API abstraction on top of Kafka-like queues. It helps to provide an available, durable, and fault tolerant
publish/subscribe messaging system for simple microservices communication.</p>
<p>A Kafka cluster can grow to hold a huge amount of data on its disks. Hosting Kafka requires supporting instance
termination (on purpose, or just because the “cloud provider” decided to terminate the instance), which in our case
introduces a node with no data: the whole cluster then has to be rebalanced to evenly distribute the data among the
nodes, taking hours of data copying. Here, we are going to talk about how we avoided rebalancing after node termination
in a Kafka cluster hosted on AWS.</p>
<p>In the beginning, at least when I joined, our Kafka cluster had the following configuration:</p>
<ul>
<li>5 Kafka brokers: m3.medium 2TB gp2</li>
<li>Replication factor 3 and min insync replicas 2</li>
<li>3 Zookeeper nodes: m3.medium</li>
<li>Ingest 250GB per day</li>
</ul>
<p>Nowadays, the cluster is much bigger:</p>
<ul>
<li>15 Kafka brokers: m4.2xlarge 8TB st1</li>
<li>Replication factor 3 and min insync replicas 2</li>
<li>3 Zookeeper nodes: i3.large</li>
<li>Ingest 5TB per day and egress 30TB per day</li>
</ul>
<h3>Problem</h3>
<p>The above setup results in a number of problems that we are looking to solve, such as:</p>
<p><strong>Loss of instance</strong></p>
<p>AWS is able to terminate or restart your instance without notifying you in advance. Once that has happened,
the broker has lost its data, which means it has to copy the log from the other brokers. This is a pain point, because
it takes hours to accomplish.</p>
<p><strong>Changing instance type</strong></p>
<p>The load is growing, and at some point the decision is made to upgrade the AWS instance type to sustain it.
This can be a major issue in terms of time as well as availability. It correlates with the first issue, but is a
different scenario for losing broker data.</p>
<p><strong>Upgrading a Docker image</strong></p>
<p>Zalando has to follow certain compliance guidelines, which is supported by using services like Senza and Taupage. These
in turn have requirements of their own: Docker images are immutable and cannot be replaced once the
instance is running. To upgrade an image, one has to relaunch the instance, hence copying a lot of data from other Kafka
brokers.</p>
<p><strong>Kafka cluster upgrade</strong></p>
<p>Upgrading your Kafka version (or maybe downgrading it) requires building a different image which holds new
parameters for downloading that Kafka version. This again requires terminating the instance, involving data copying.</p>
<p>When the cluster is quite small, rebalancing is pretty fast, taking around 10-20 minutes. However, the bigger
the cluster, the longer it takes. A rebalance of our current cluster has taken about 7
hours when one broker was down.</p>
<h3>Solution</h3>
<p>Our Kafka brokers were already using attached EBS volumes: additional volumes located elsewhere in the AWS
data center and connected to the instance over the network, providing durability, availability and more disk space.
The <a href="https://aws.amazon.com/ebs">AWS documentation</a> states: "Amazon Elastic Block Store (Amazon EBS) provides
persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud."</p>
<p>The only tiny issue here is that instance termination would bring the EBS volume down together with the instance,
introducing data loss for one of the brokers. The figure below shows how it was organized:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/17445b2ad643da351e47e7cf06b75bb2bb77f7b3_ebs-1.png?auto=compress,format"></p>
<p>The solution we found was to detach the EBS volume from the machine before terminating the instance and attach it to the
new running instance. There is one small detail here: it is better to terminate Kafka gracefully, in order to decrease
the startup time. If you terminated Kafka in a “dirty” way without stopping it, it would rebuild the log index from
scratch, which takes a lot of time depending on how much data is stored on the broker.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e59d16579ab0b77681fa7f61674146027f0c32a4_ebs-crash-now-1.png?auto=compress,format"></p>
<p>In the diagram above we see that the instance termination does not touch any EBS volume, because it was safely detached
from the instance.</p>
<p>Losing a broker without detaching its EBS volume (terminating it together with the instance) introduces under-replicated
partitions on the other brokers that hold the same partitions. Recovering from that state via rebalancing takes around 6-7
hours. If, during the rebalance, other brokers holding the same partitions go down, partitions go offline and it is no
longer possible to publish to them. It is better not to lose any other broker.</p>
<p>Reattachment of EBS volumes is possible to accomplish using the AWS Console by clicking buttons there, but to be honest
I have never done it myself. Our team went about automating it with Python scripts and the <a href="https://boto3.readthedocs.io/en/latest/">Boto 3
library</a> from AWS.</p>
<p>The scripts are able to do the following:</p>
<ul>
<li>Create a broker with attached EBS volume</li>
<li>Create a broker and attach an existing EBS volume</li>
<li>Terminate a broker, detaching the EBS volume beforehand</li>
<li>Upgrade a Docker image reusing the same EBS volume</li>
</ul>
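<p>As an illustration, the detach-then-terminate and reattach steps can be sketched with Boto 3 roughly as follows. The region, device name and the ids are hypothetical placeholders, and this is a simplified sketch, not the real automation, which lives in the Senza/Taupage-aware scripts linked below.</p>

```python
"""Sketch of the detach/terminate/reattach flow for a Kafka broker's
EBS data volume. Region, device name and ids are hypothetical."""

def _ec2():
    import boto3  # imported lazily so the sketch reads without AWS set up
    return boto3.client("ec2", region_name="eu-central-1")

def terminate_broker(instance_id: str, volume_id: str) -> None:
    ec2 = _ec2()
    # Detach the data volume first, so terminating the instance
    # cannot take the Kafka log directory down with it.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.terminate_instances(InstanceIds=[instance_id])

def attach_to_new_broker(instance_id: str, volume_id: str) -> None:
    ec2 = _ec2()
    # Reattach the preserved volume at the same device name, so the
    # new broker finds its log intact and skips hours of copying.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device="/dev/xvdk")
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

<p>Kafka should be stopped gracefully on the old broker before <code>terminate_broker</code> is called, for the log-index reason discussed above.</p>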
<p><a href="https://github.com/zalando-nakadi/bubuku/tree/master/instance_control">Kafka instance control scripts</a> can be found in
our GitHub account, where their usage is described. Basically, these are one-line commands which consume
<a href="https://github.com/zalando-nakadi/bubuku/blob/master/instance_control/bubuku-1.json">configuration</a> for the cluster,
so that parameters do not have to be passed to the scripts by hand. Remember, we use Senza and Taupage, so the scripts are a bit Zalando
specific, but can be adapted quite quickly with very little effort.</p>
<p>However, it’s important to note that the instance could have a kernel panic or hardware issues while running the
broker. AWS Auto Recovery helps address this kind of issue. In simple terms, it is a feature that allows an EC2 instance
to recover after a network connectivity, hardware or software failure. What does recovery mean here? The instance
is rebooted after the failure, preserving many parameters of the impaired instance, among them EBS volume
attachments. This is exactly what we need!</p>
<p>In order to turn it on, just create a CloudWatch alarm on StatusCheckFailed_System with the recover action, and you are
all set. The next time the instance hits a failure scenario it will be rebooted, preserving the attached EBS volume, which
helps avoid data copying.</p>
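<p>A hedged Boto 3 sketch of such an alarm; the alarm name, period and evaluation settings below are illustrative choices, not the exact values we use.</p>

```python
"""Sketch: enable EC2 Auto Recovery for a broker by alarming on
StatusCheckFailed_System. Instance id and alarm name are hypothetical."""

def enable_auto_recovery(instance_id: str, region: str = "eu-central-1") -> None:
    import boto3  # imported lazily so the sketch reads without AWS set up
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        AlarmName=f"kafka-broker-recover-{instance_id}",
        Namespace="AWS/EC2",
        MetricName="StatusCheckFailed_System",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=2,  # two consecutive minutes of failed checks
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        # The built-in recover action reboots onto healthy hardware,
        # keeping the instance id, private IP and EBS attachments.
        AlarmActions=[f"arn:aws:automate:{region}:ec2:recover"],
    )
```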
<h3>Impact</h3>
<p>Our team no longer worries about losing a Kafka broker, as it can be recovered in a number of minutes without copying
data and wasting money on traffic. It only takes 2 hours to upgrade 15 nodes of a Kafka cluster and it just so happens
that it is 42x faster than our previous approach.</p>
<p>In the future, we plan to add this functionality directly to our <a href="https://github.com/zalando-nakadi/bubuku">Kafka
supervisor</a>, which will allow us to completely automate our Kafka cluster
upgrades and failure scenarios.</p>
<p>Have any feedback or questions? Find me on Twitter at <a href="https://twitter.com/a_dyachkov">@a_dyachkov</a>.</p>Zalando's Smart Product Platform2017-10-24T00:00:00+02:002017-10-24T00:00:00+02:00Alberto Alvareztag:engineering.zalando.com,2017-10-24:/posts/2017/10/zalando-smart-product-platform.html<p>What's the craic? We'll tell you: it's our SPP and it happens where fashion meets tech in our Dublin hub.</p><h3><strong>Fashion meets tech in our Dublin hub</strong></h3>
<p>At the Fashion Insights Centre in Dublin, one of the core tech products being developed is the Smart Product Platform
(SPP). The fashion products we sell are the fundamental building blocks of what we do as a business. How to manage and
represent these products and their associated data in today's competitive fashion marketplace is challenging. Fashion is
everywhere, at once global and local, something that helps us identify with others but also something deeply personal. A
thorough understanding of these products and their associated data is vital in delivering an engaging customer
experience.</p>
<h3>Fashion Data in My Language</h3>
<p>Shopping, as we know, can be a deeply frustrating experience. Options run in spectrums from formal to casual, head to
toe, expensive to affordable, and many more. So how can customers articulate what they want and see items or looks that
are relevant to them? In essence, how can people truly dress <em>well</em>?</p>
<p>Perhaps you are a dedicated follower of fashion and understand fashion in terms of the latest designers, trends and
styles that may be appearing this season on the catwalks of London, New York or Milan. Or maybe, while you like to keep
up with the latest trends, you don’t understand fashion in the language of the catwalk. You just know what you like. Or
perhaps you’re not even sure what you like at the moment; you’re just looking for something new and are seeking
inspiration.</p>
<p>It all starts with the product data, which is where SPP comes in. How we understand fashion, how we describe it, the
words we use, the language we speak, and the things that matter to us are all personal. Likewise, while we may all share
a common understanding of themes, the approach to finding the fashion we want to wear, whatever the motivation or
occasion may be, will be different from person to person. A few examples:</p>
<ul>
<li><strong>Physical characteristics</strong> - What category of product am I looking for: a dress, jeans, shoes? What
color is this? I know what color I like, but can I describe it? What about the material? What is it made of? Is brand
important to me?</li>
<li><strong>Fashion information</strong> - Am I shopping for a trend or style? What occasion am I shopping for? Do I care
if it is sustainably manufactured? I saw celebrity ‘X’ wearing a fantastic dress; I wonder where I can get that?</li>
</ul>
<p>To connect with our customers we truly have to match our products to human thinking. We must understand customers’
stories, and encourage and enable a transformation from low confidence ‘please inspire me’ encounters to truly
materialized fashion confidence.</p>
<p>To match the individual with fashion, we need data. Article data. Fashion product data. Data that helps describe and
understand the fashion we sell.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4f2c385b8efc58679bbc0423285e77f202af83cf_screen-shot-2017-10-23-at-08.39.57.png?auto=compress,format"></p>
<h3>Smart Products</h3>
<p>Smart Product Platform (SPP) is Zalando’s new product platform which enables the collection, management and exposure of
fashion product data across the organization, at scale. SPP is a bespoke combination of Product Information Management
(PIM) and Master Data Management (MDM) systems engineered 100% in-house to satisfy Zalando’s current and future product
management requirements.</p>
<p>We intend to increase the amount of contextual product fashion data we store, and to do this at scale, in order to help
us and our customers understand our products better. Consumers of this data can enhance their business use cases by
deriving value and insights from this product data, as well as enriching it with new information. SPP provides a
foundational infrastructure to store, organize and deliver product information from multiple sources.</p>
<p>Our considerations during the initial product and technology phases were:</p>
<ul>
<li><strong>Scalability</strong> - We’re facing exponential increases in product data (both in terms of the number of products in our
assortment and in terms of the breadth of product data we store), so we need to ensure we can scale, both production
and consumption wise.</li>
<li><strong>Flexibility</strong> - We need an underlying hyper-flexible data model to ensure we can meet our customer needs quickly
when it comes to managing and modeling data to support new business use cases.</li>
<li><strong>Quality</strong> - Validating product data and ensuring it is complete, correct and consistent. This is done
in conjunction with our Fashion Librarians’ (data stewards’) curation guidelines.</li>
<li><strong>Accessibility</strong> - Ease of data onboarding, and data exposure and consumption are cornerstones of SPP.</li>
<li><strong>Next generation architecture</strong> - We built all the underlying infrastructure on a microservice architecture
connected via
<a href="https://engineering.zalando.com/posts/2017/10/event-first-development---moving-towards-kafka-pipeline-applications.html">event-driven</a>
streaming interfaces to enable the distribution of data to our consumers.</li>
</ul>
<h3>Product Platform</h3>
<p>SPP’s purpose is founded around its core value unit: the product. As an internal data platform, we connect multiple data
producers (supply side) with product consumers (demand side), looking for ways to maximize their offerings and product
value.</p>
<p>To meet the needs of our data suppliers (currently our wholesale organization, extending to our partner brands and merchants), our
data consumers (our fashion stores and other consumer-facing applications) and, most importantly, our customers, we need
to ensure that our data ingestion and data update processes are effective and efficient.</p>
<p>As well as ensuring data supplies can onboard data easily, we need to ensure the same ease of access for all our data
consumers. With millions of products and thousands of attributes, we offer rich data sets for other (internal and
external) value added services such as product recommendations, personalization, advertising, logistics and so on.</p>
<p>More and more, it’s the product and associated data that is becoming a key source of differentiation.</p>
<h3>Discover, Understand, Decide</h3>
<p>While accurate and consistent metadata about physical attributes (What size is it? What color is it? What is it made
of?) are vital to helping us understand a product, it is the effective enriching or augmenting of our product data with
additional information over and above these traditional physical product attributes that truly provides business value:
more conversions, higher customer engagement and a better customer experience.</p>
<p>When we talk about enrichment, we mean attaching associated relevant content to products:</p>
<ul>
<li><strong>Information (tags, descriptions) -</strong> What trend or style is it part of? What occasions would you wear it for? What
celebrity/influencer is wearing it? Where is it being worn? Where is it popular? And so on.</li>
<li><strong>Media (images, video, AR/VR) -</strong> Images are at the heart of the shopping experience. Video is increasingly
important. Augmented and Virtual Reality are considerations also.</li>
<li><strong>Content -</strong> Links, editorial content, social media.</li>
</ul>
<p>This content could be implicit as is the case with, say, personalization, or explicit e.g. discovery.</p>
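<p>As a purely hypothetical illustration (SPP's real data model is far richer and not public), an enriched product record along the lines described above might combine physical attributes with attached enrichment:</p>

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an enriched product record; all field names
# and the example values are illustrative, not SPP's actual schema.
@dataclass
class Product:
    sku: str
    # Physical characteristics
    category: str
    color: str
    material: str
    brand: str
    # Enrichment attached on top of the physical attributes
    tags: list[str] = field(default_factory=list)     # trends, occasions
    media: list[str] = field(default_factory=list)    # image/video URLs
    content: list[str] = field(default_factory=list)  # editorial links

dress = Product(sku="ZA-123", category="dress", color="emerald",
                material="silk", brand="ExampleBrand",
                tags=["summer-wedding", "boho"])
```

<p>A consumer of such records could then filter on enrichment tags, such as occasions or trends, in addition to the classic physical facets.</p>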
<p>Thanks to SPP, we can now effectively store and expose this additional fashion context. With increases in additional
fashion knowledge we can:</p>
<ul>
<li>Help customers <strong>find</strong> the products they are searching for</li>
<li>Improve our customers’ <strong>understanding</strong> of our products</li>
<li>Allow our customers to make more <strong>informed</strong> purchase decisions</li>
</ul>
<p>Fashion is a journey and an experience, both inspirational and aspirational. A journey where data plays a key role.</p>
<h3>Data, Data, Data</h3>
<p>Considering the dynamic nature of the fashion world, the prominence of fast fashion and new avenues of discovery and
promotion (e.g. social media, fashion influencers) it is clear that both the breadth and depth of (consistent and
accurate) product data and fast time-to-market are key drivers for the success of any fashion product platform.</p>
<p>This additional data, or fashion context, lays the foundation to establish the product, the fashion article, as a new
source of competitive advantage. The more and more relevant data we add, the better our customers can understand our
products and ultimately make more informed buying decisions. We want to be there for our customers for every occasion,
present and future. In discovery and product understanding, and in leveraging data that makes dressing <em>well</em> more
achievable to everyone.</p>
<p>Got something to say about data? Come <a href="https://jobs.zalando.com/tech/">join us</a>. We're hiring.</p>All Systems Go2017-10-19T00:00:00+02:002017-10-19T00:00:00+02:00Humberto Coronatag:engineering.zalando.com,2017-10-19:/posts/2017/10/recsys-2017.html<p>Zalando flies the fashion flag at Recsys 2017. Research engineer, Humberto Corona reports back.</p><h3>Zalando Flies the Fashion Flag at RecSys 2017</h3>
<p>RecSys, the annual ACM Recommender Systems Conference held its 11th session this year in the gorgeous city of Como,
Italy. As part of our platform strategy, it’s vital that we fully engage with the wider tech community, and so we
brought a full team to soak up the great learnings and bring some of our own. With over 620 attendees, and a program of
46 scientific papers and 12 industry papers, the ACM Recommender Systems conference lived up to its reputation;
delivering solid content, great organization and an open and pleasant atmosphere, where we presented and discussed our
work with hundreds of peers.</p>
<p>In the last few years, some topics have started to get more and more attention within the community. This year,
<a href="http://www.ec.tuwien.ac.at/rectour2017/">tourism</a>, <a href="https://kidrec.github.io/">kids</a> and
<a href="http://healthrecsys.ur.de/">health</a> had their own workshops, and fashion was one of the most talked-about topics at the
conference. We were delighted to be part of the conversation and share our knowledge in one of the most exciting
emerging topics represented at the event.</p>
<p>Zalando Principal Research Engineer, Antonino Freno presented his
<a href="https://dl.acm.org/citation.cfm?doid=3109859.3109897">paper</a> showcasing our learnings and challenges when <a href="https://engineering.zalando.com/posts/2016/12/recommendations-galore-how-zalando-tech-makes-it-happen.html">creating
real world recommender
systems</a> for our
large-scale platform. The paper highlights the fact that only a small part of these challenges are of an algorithmic
nature. Instead, most technical problems usually arise from operational constraints, such as cost and complexity of
system maintenance.</p>
<p>During the <a href="https://lsrs2017.wordpress.com/">Large Scale Recommender Systems Workshop</a> (LSRS), with my experience as a
Research Engineer, I presented the architecture for understanding customer intent, which powers some of the personalized
elements that you see in the Zalando platform. At the
<a href="https://sites.google.com/edu.haifa.ac.il/tempreasoninginrs/">TempRec</a> workshop, <a href="https://research.zalando.com/welcome/team/sebastian-heinz/">Sebastian
Heinz</a>, a Research Scientist from <a href="https://research.zalando.com/">Zalando
Research</a> presented <a href="https://arxiv.org/pdf/1708.07347v1.pdf">our findings</a> on using
LSTM-based models for fashion recommendations, drawn from sales data of 100,000 frequent Zalando shoppers.</p>
<p>Fashion was heavily represented this year in general with four talks in the main conference that looked at <a href="http://dl.acm.org/citation.cfm?id=3109926">deep
learning</a>, <a href="http://dl.acm.org/citation.cfm?id=3109891">size recommendation</a>,
<a href="http://dl.acm.org/citation.cfm?id=3109897">practical lessons from production systems</a>, and <a href="http://dl.acm.org/citation.cfm?id=3109929">outfit
recommendations</a>. The fashion recommendations community is growing and the
synergy between industry and academia is getting stronger. Being part of the burgeoning industry is deeply important to
us at Zalando, so we look forward to seeing what <a href="https://recsys.acm.org/recsys18/">RecSys 2018</a> has in store.</p>
<p>Want to know more about our Recommender Systems team? Check out their in depth article, <a href="https://engineering.zalando.com/posts/2016/06/feature-extraction-science-or-engineering.html">“Feature Extraction, Science or
Engineering?”</a>. Or to be
part of our team at Zalando Tech, have a look at our <a href="https://jobs.zalando.com/en/tech">jobs page</a>. <em>For more about Humberto,
have a look at his <a href="https://corporate.zalando.com/en/newsroom/en/stories/what-difference-decade-makes-humberto-corona-data-scientist">story</a>.</em></p>A Plea For Small Pull Requests2017-10-17T00:00:00+02:002017-10-17T00:00:00+02:00Paulo Renato Campos de Siqueiratag:engineering.zalando.com,2017-10-17:/posts/2017/10/a-plea-for-small-pull-requests.html<p>Size matters when we're talking about Pull Requests, especially when it comes to best practices in software development.</p><p>Pull Requests (PRs) are the norm today when it comes to common software development practices in teams. It is the right
way to submit code changes so that your peers can check them out, add in their thoughts and help you create the best
code you can - i.e. PRs allow us to easily introduce code review to our development process and enable a great deal of
teamwork, while also decreasing the number of bugs our software contains.</p>
<p>There are several aspects we can talk about when it comes to Pull Requests and code review. In this post, I'm
specifically concerned with the <em>size</em> of PRs, although I'll briefly touch on other points as well. Other dimensions you
could think about include having a good description of what is being done and why, and being sure that the Pull Request
only changes <em>one</em> thing and one thing only, i.e. it is independent and self-contained.</p>
<p>On a personal note, I think Pull Requests nowadays are so important that I even use them on projects where I work alone
so that I can have automated checks applied before deciding to merge into master. It allows me that extra opportunity to
catch errors before it is too late. In GitHub for example, this generates a nice visual summary of the checks performed.
And yes, you could do this straight into your branches, but using PRs is easier and more organized. You could for
example easily decline and close a PR, and document why you did it. <a href="https://github.com/jcranky/lojinha/pull/84">In this PR in one of my pet
projects</a> for example, you can see <a href="https://www.codacy.com/">Codacy</a>,
<a href="https://travis-ci.org/">Travis CI</a> and <a href="https://codecov.io/">CodeCov</a> checking my code before I merge it to master.</p>
<p>Having said the above, it is way too easy to get carried away when developing and you may end up adding several small
things at once - be it features or fixes, or simply some refactoring in the same PR - thus making it quite large and
hard to read. And don't get me wrong: crafting small, self-contained and useful Pull Requests is not easy! Good
developers don't create big PRs because they are lazy: sometimes it is hard to see the value in going the extra mile to
break up what has already become way too big.</p>
<p>Another aspect to consider here is related to <em>git commit</em> good practices in general. Having small Pull Requests
also helps to keep individual commits small and focused, which is very valuable when maintaining code. Let’s illustrate
this point with an example that happened to me recently.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a380a517ecac72b2eb841fccf1e34016a04b8a64_selection_001.png?auto=compress,format"></p>
<h3><strong>Can I revert this?</strong></h3>
<p>I was investigating a bug, something that used to work well and that simply stopped working out of the blue. After some
time and investigation, I found that the relevant code was simply removed, and that we didn't notice it beforehand
because of yet another bug. Obvious solution: go through the <em>git history</em> and just <a href="https://git-scm.com/docs/git-revert">git
revert</a> the deleted code. Except that I couldn't find any commit related to it.</p>
<p>After further investigation I finally found the commit that removed the files - but it was a commit that also did
several other unrelated things. <em>git revert</em> was no longer an option, especially due to the rest of the code that had
been changed at this point, and I ended up having to manually add the files myself. The total time spent with this
became way more than it could have been.</p>
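<p>Incidentally, git can at least speed up finding such a deletion. Below is a minimal, self-contained sketch (using a throwaway repository and a made-up file name) of how <em>git log --diff-filter=D</em> lists only the commits that deleted a given path:</p>

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q -b master              # -b assumes git >= 2.28
git config user.email demo@example.com
git config user.name demo

echo 'important logic' > feature.txt
git add feature.txt; git commit -qm 'add feature'

git rm -q feature.txt
git commit -qm 'remove feature (buried in a big change)'

# --diff-filter=D restricts the log to commits that deleted the path,
# which pinpoints the removal even long after the fact:
git log --diff-filter=D --oneline -- feature.txt
```

<p>If the deleting commit did nothing else, a plain <em>git revert</em> of it brings the file back - which is exactly why focused commits matter.</p>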
<h3><strong>Why are big Pull Requests a problem?</strong></h3>
<p>The first and most important thing to note here is our limited human capacity to hold knowledge in our heads. There is
a limit to how much information you can keep and correlate at once while weighing all its consequences for the rest of
the system, or even for external / client systems. The limit obviously varies from person to person, but it exists for
everyone. And when working in a team, you have to set the bar low enough that everyone can work at the same level.</p>
<p>When you are reviewing a Pull Request, you have to keep some things in mind, such as:</p>
<ul>
<li>What are the new components being created?</li>
<li>How do they interact with existing components?</li>
<li>Is there code being deleted? If so, should it really be deleted?</li>
<li>Are the new components really necessary? Perhaps you already have something in the current code base that solves the
problem? Or something that could be generalized and applied to both places?</li>
<li>Do you see new bugs being introduced?</li>
<li>Is the general design OK? Is it consistent with the rest of the project's design?</li>
</ul>
<p>There are quite a few things to check. How much of that can you keep in your mind while you are reviewing code? This
gets harder the more lines of code there are to be checked.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c6e17f98da26a11155d5b02b33727b1946b9ac79_5432.png?auto=compress,format"></p>
<p>So back to small PRs. While all of this has little to no impact on automated checks and builds, it can have a huge
impact when it comes to code review. With that in mind, let’s go through at least a few ideas you can use to
escape the type of situation where you don't really feel like breaking your PR into smaller pieces - but should
nonetheless. There is no black magic here: we will just use some nice git commands in a way that helps us achieve our
goal. Perhaps you will even discover a few things you didn't know before!</p>
<h3><strong>Sort your imports</strong></h3>
<p>I prefer sorting imports in alphabetical order, but the actual criterion doesn't matter, as long as the whole team uses
the same one. This practice can be easily automated and avoids generating a diff when two developers add the same
import in different positions. It also completely eliminates duplicated imports generated by merges.</p>
<p>Sometimes this will also avoid conflicts where two developers remove or add unrelated imports in the same position and
git doesn't know what to do about them. Sorting imports makes them naturally mergeable.</p>
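<p>This merge-friendliness is easy to demonstrate. The following throwaway-repository sketch (all file and branch names are made up) has two branches each insert a different import at its sorted position, after which git merges them without conflict:</p>

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q -b master              # -b assumes git >= 2.28
git config user.email demo@example.com
git config user.name demo

# A sorted import list shared by both developers:
printf 'import a.A\nimport m.M\nimport z.Z\n' > Imports.scala
git add Imports.scala; git commit -qm 'sorted imports'

# Developer one inserts an import at its sorted position...
git checkout -qb dev-one
printf 'import a.A\nimport b.B\nimport m.M\nimport z.Z\n' > Imports.scala
git commit -qam 'add import b.B'

# ...and developer two inserts a different one elsewhere:
git checkout -q master
git checkout -qb dev-two
printf 'import a.A\nimport m.M\nimport x.X\nimport z.Z\n' > Imports.scala
git commit -qam 'add import x.X'

# The insertions land in different spots, so the merge is automatic:
git merge -q --no-edit dev-one
cat Imports.scala                  # now contains all five imports
```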
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a654a4447d2ee57c4dafacf30caf476e5c31d970_selection_002.png?auto=compress,format"></p>
<h3><strong>Avoid frequent formatting changes</strong></h3>
<p>This happens a lot, especially if you don't use code formatter tools like <a href="http://scalameta.org/scalafmt/">scalafmt</a> or
<a href="https://github.com/scala-ide/scalariform">scalariform</a> (or whatever is available for your language of choice).
Sometimes, you may see a blank line you don't like. Or you don't see a blank where you believe it should be. You simply
go on and delete or add it. This means yet another line change that goes into your PR.</p>
<p>This is not only about PR size. This small change has a big chance of creating conflicts if you ever have to
update your PR before merging. Another developer might legitimately change a certain code point and you now have to very
carefully check if a change was only cosmetic and thus can be ignored, or if there was something real there to consider.
More than once I've seen features simply vanish because of this kind of thing.</p>
<p>If you really want to make some formatting changes, do so, but send it as a separate PR that can be merged as soon as
possible, and independent of any features. And consider automating this task as well.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4cccf69a53d3d168ff0d32fe76e7ea5ca7129b1d_img_20170906_164916.jpg?auto=compress,format"></p>
<h3><strong>Allow reviewers the time to review</strong></h3>
<p>This is a little meta, but important nonetheless: resist the urge to want your code merged right away. I suffer from
this myself from time to time, especially when we have some <em>very</em> small PRs. Still, the reviewers should be allowed
time to work. If you did a good job of making it small and self-contained, and added a good description to the PR body,
you will likely get some speedy feedback.</p>
<p>To better explain this it is worth quoting a teammate, who once said: “Sometimes it feels like we are asking for thumbs,
not for reviews.”</p>
<p>If you sense something like this is happening, you should stop. You are probably rushing the review process, which will
only result in some stress and badly reviewed code. My rule of thumb is to <em>not ask for a thumbs up</em>, quite literally.
Every time I catch myself doing so I stop and rephrase, asking for <em>a review</em> instead.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a120e633cf5099a9eec2ac1615b2230783713e5b_selection_0011.png?auto=compress,format"></p>
<h3><strong>Advanced and powerful: manipulating your sources with Git</strong></h3>
<p>Now for the more complex (and perhaps interesting) practices. What follows will require you to have at least an
intermediate understanding of <em>git</em>, and a prerequisite of not being afraid of <em>git rebase</em>. As a side note, I say this
because most of us are afraid of it (<em>git rebase</em>) when we first begin learning. This is only temporary though, until
you fully realize the power it gives you.</p>
<p>Let's now think of the following scenario. You are working on a feature, and suddenly notice that some kind of <em>side
change</em> is required. Something not strictly related to the feature itself, but that would be of great help for your task. You
might then get the urge to simply go on and do it, together with your current feature code.</p>
<h3><strong>Side changes with Git Stash</strong></h3>
<p>See the problem already? If you simply do it, the PR for your feature will get bigger. It will also now contain one (or
more) extra concerns, meaning that the reviewers would have to verify this as well.</p>
<p>Instead, you should send this side change as a new PR. There are a few different ways to do this properly with <em>git</em>,
but the easiest is to use <em>git stash</em>. What this does is hide your current changes and let you work with a clean
workspace. Then you can switch to a new branch, implement the side changes and submit the PR.</p>
<p>With that, your teammates can start reviewing these changes immediately while you are still working on the feature
itself. Moreover, they will also be able to leverage those changes in their own code - who said that these changes would
be useful only for you? And finally, it also gives your colleagues the opportunity to point out problems sooner rather
than later. Perhaps something is incompatible with someone else's work, or another developer had just started to make
the same kind of changes and now doesn't have to do anything. You can work together to achieve an even better result. Not
to mention that this should be a small PR, and so quite easy to review.</p>
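<p>The whole dance can be sketched end-to-end in a throwaway repository (all branch and file names are hypothetical; in a real project you would push <em>side-change</em> and open the PR where indicated):</p>

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q -b master              # -b assumes git >= 2.28
git config user.email demo@example.com
git config user.name demo
echo base > app.txt; git add app.txt; git commit -qm 'initial commit'

# Mid-feature, with uncommitted work in progress...
git checkout -qb my-feature
echo 'half-done feature' >> app.txt

# ...a side change becomes necessary: stash, branch, commit.
git stash
git checkout -qb side-change master
echo helper > helper.txt
git add helper.txt; git commit -qm 'add helper needed by upcoming feature'
# (here you would run: git push origin side-change  and open the PR)

# Back to the feature; the uncommitted work is restored intact:
git checkout -q my-feature
git stash pop
grep 'half-done feature' app.txt
```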
<p>After the PR is sent, you can recover your work with <em>git stash pop</em>: back on your feature branch, it restores
your changes so you can continue working. But now there is yet another problem: how to deal with the fact that your side
changes are probably not merged yet?</p>
<p>First, the problem in principle is not that big. The side changes are in their own commit, and thus your main changes
are completely isolated. If at any time you get feedback and have to update the PR you just sent, you can always stash
your current changes again. Again, see the <a href="https://git-scm.com/docs/git-stash">git stash documentation</a> for more
information on how this works.</p>
<p>Second, it might be that your PR with the side changes will simply be accepted as is and merged. In this case, it is
quite easy to get your feature branch up-to-date. A <em>git rebase master</em> (or whatever branch your teams merge to) should
do the trick. This is probably the easiest (and safest) variation of <em>git rebase</em> you can use. See the <a href="https://git-scm.com/docs/git-rebase">git rebase
documentation here</a>.</p>
<p>Finally, some pointers for the most complex case. You may find that you will have to fix many things on your side
changes PR. Also, at this point you may have already made a few commits towards the feature you are implementing. You
can use your imagination here and a nifty combination of git features to solve your problem. For example, you could try
the following steps:</p>
<ul>
<li>Wait for the side changes PR to be merged to master</li>
<li>Update your master: <em>git pull</em></li>
<li>Create a new branch, based on master: <em>git checkout -b my-new-branch</em></li>
<li>Go to your feature branch and carefully use <em>git log</em> to find which commits you used for the feature</li>
<li>Go back to the new branch</li>
<li>Use <em>git cherry-pick</em> to move over the commits you found with <em>git log</em></li>
</ul>
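<p>The steps above can be sketched in a throwaway repository as follows (branch names and commit messages are made up; <em>master..my-feature</em> selects exactly the feature commits to carry over):</p>

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q -b master              # -b assumes git >= 2.28
git config user.email demo@example.com
git config user.name demo
echo base > app.txt; git add app.txt; git commit -qm 'initial commit'

# A feature branch with two commits we want to carry over:
git checkout -qb my-feature
echo one >> app.txt; git commit -qam 'feature: step one'
echo two >> app.txt; git commit -qam 'feature: step two'

# Meanwhile, the side-change PR has been merged into master:
git checkout -q master
echo helper > helper.txt; git add helper.txt; git commit -qm 'merge side change'

# Recreate the feature on top of the updated master:
git checkout -qb my-new-branch
picks=$(git log --reverse --format=%H master..my-feature)
git cherry-pick $picks
git log --oneline
```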
<p>See the <a href="https://git-scm.com/docs/git-cherry-pick">git cherry-pick documentation here</a>. Notice that you can also
cherry-pick a range of commits, instead of one by one, if you prefer. You can even build on the commit you already sent
as a new PR, perhaps in a temporary branch where you add your feature code on top of it.</p>
<p>As you can see, git is a very powerful tool and offers you many ways to solve your problems.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f5c6b4a198de2281ab9a48c3aca5cc66f82f2753_selection_0021.png?auto=compress,format"></p>
<h3><strong>Splitting up code into multiple PRs</strong></h3>
<p>The next scenario is that moment when you’ve already gotten too excited with your code and couldn't stop, and ended up
with a huge pile of changes to throw at your peers' heads. In this case, it can be very easy to simply go and say
something like:</p>
<p>“Sorry for the big PR. I could split it into smaller pieces but it would take too long.”</p>
<p>Let's go through some ideas to avoid this scenario by applying a little effort and splitting up your work.</p>
<p>First off, if you have well-crafted, individual commits, those can be turned easily into PRs with <em>git cherry-pick</em>. You
can simply write down which commits you want to submit as new PRs, move to a new branch and bring those commits over
with <em>git cherry-pick</em>. You can combine this with <em>git stash</em> to make it easier to deal with uncommitted code, like
described above.</p>
<p>One small drawback is that sometimes your changes depend on each other and you might have to wait for the
first one to be merged before you can really send the second one. On the other hand, if the first commit is small,
chances are that it will be approved quickly, as we have already mentioned.</p>
<p>The whole process might not be too pleasant for you at first, but will definitely help the rest of the team. A small tip
that might sound obvious is to "pre-wire" your PRs: go to your peers and let them know that those PRs are coming and
what they are about. This will help them review your code faster.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5b88c0e70e00d1d688164c4b3ae65cf313964a05_selection_003.png?auto=compress,format"></p>
<h3><strong>A note about failure</strong></h3>
<p>It might all be beautiful on paper, but in reality this is not always possible. Even if you follow the tips presented
here, you may still end up with big PRs from time to time. The critical point is that, when this happens, it should:</p>
<ul>
<li>Be a conscious decision, not an accident;</li>
<li>Be as small as possible, i.e., you applied at least some of the tips above;</li>
<li>Be an exception, not the rule.</li>
</ul>
<p>Remember: this is all about teamwork. Some things might make you a little slower, especially until you get into the
right frame of mind, but it will make the whole team faster in the long run, and will also increase the chances of bugs
being caught during code review. A final plus is that knowledge-sharing will also be better, since there is less to
learn on each PR, and team members can ask more questions without being afraid of turning the review process into an
endless discussion.</p>
<p>If you have read everything up until this point, then perhaps you are interested in reading even more. Here are some
further interesting references around the subject:</p>
<ul>
<li><a href="https://chromium.googlesource.com/chromium/src/+/master/docs/cl_respect.md">Respectful Changes</a></li>
<li><a href="https://chromium.googlesource.com/chromium/src/+/master/docs/cr_respect.md">Respectful Code Reviews</a></li>
<li><a href="https://github.com/blog/1943-how-to-write-the-perfect-pull-request">How to write the perfect pull request</a></li>
<li><a href="https://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/">11 proven practices for more effective, efficient peer code
review</a></li>
</ul>
<p>What do you think? Do you have other techniques that you think could help in creating small and effective PRs? Or do you
disagree that this is necessary? Let me know via Twitter at <a href="https://twitter.com/jcranky">@jcranky</a>.</p>Event First Development - Moving Towards Kafka Pipeline Applications2017-10-10T00:00:00+02:002017-10-10T00:00:00+02:00Conor Cliffordtag:engineering.zalando.com,2017-10-10:/posts/2017/10/event-first-development---moving-towards-kafka-pipeline-applications.html<p>Read how we went about building our primarily event driven system for better access to data for users of our Smart Product Platform.</p><h3>A Challenge</h3>
<p>Shortly after joining Zalando, I, along with a small number of other new colleagues (in a newly opened Dublin office),
was entrusted with the task of building an important part of the new Fashion Platform - in particular, the core services
around the Article data of Zalando. This task came with several interesting challenges, not least of which was ensuring
the new platform provided not just sufficient capacity/throughput for existing workloads, but also had capacity for
longer term growth - not just in terms of data volumes/throughput, but also with the number, and types, of users of that
data. The aim here was the democratization of data for all potential users on the new platform.</p>
<h3>Decision Time</h3>
<p>It was decided that this new platform would be primarily an <em>event driven</em> system - with data changes being streamed to
consumers. These consumers would subscribe, receive, and process the data appropriately for their own needs -
essentially inverting the flow of data, from the <em>traditional</em> “pull” based architectures, to a “push” based approach.
With this, we were looking to strongly prompt a wide adoption of a “<a href="https://engineering.zalando.com/posts/2017/05/platform-engineering-and-third-generation-microservices-in-dublin.html">third generation
microservices</a>”
architecture.</p>
<p>In an event driven system it is important that the outbound events themselves have at least equal importance to the data
being managed by the system. The primary responsibility of the system is not just to manage the data, but also ensure a
fully correct, and efficient, outbound event stream, as it is this event stream that is the primary source of data for
the majority of clients of this system.</p>
<p>Starting with an <strong>API First</strong> approach, the event structure and definition were treated as much a part of the system’s
API as the more traditional HTTP API being designed. Beyond just the definition of the events (as part of the API), key
focus was placed on ensuring both correctness of the events (compared to any stored data, in addition to the sequence of
changes made to that data), as well as efficient publishing of the stream of events. This <strong>Event First</strong> approach meant
that any decisions around design or implementation were taken always with correctness, and efficiency, of the outbound
event stream in primary focus.</p>
<p>Initially, we built a quick prototype of the data services - primitive CRUD-type services, with synchronous HTTP APIs,
each interacting directly with a simple (dedicated) PostgreSQL database as the operational store for the data. Outbound
events were generated after completion of DB updates.</p>
<p>For this prototype, a very simple HTTP-based mockup of an event delivery system was used, while we decided on the real
eventing infrastructure that would be used.</p>
<p>Not only did this prototype allow us to quickly exercise the APIs (in particular the event definitions) as they were
being constructed, it also allowed us to quickly identify several shortfalls with this type of synchronous service
model, including:</p>
<ul>
<li>Dealing with multiple networked systems, especially around ensuring correct delivery of outbound events for every
completed data change</li>
<li>Ensuring concurrent modifications to the same data entities are correctly sequenced, guaranteeing correct outbound
event sequenced delivery</li>
<li>Effectively supporting a variety of data-providing client types, from live low-latency clients through to
high-volume bulk-type clients.</li>
</ul>
<h3>Throw away your prototypes</h3>
<p>With these limitations in mind, we worked at moving from this synchronous service approach to an <em>asynchronous</em>
approach, processing data using an <strong>Event Sourcing</strong> model. At the same time, we progressed in our selection of an
eventing platform, and were looking strongly at <strong>Apache Kafka</strong> - the combination of high throughput, guaranteed
ordering, <em>at least once</em> delivery semantics, strong partitioning, natural <em>backpressure handling</em>, and <em>log compaction</em>
capability were a winning combination for dealing with the outbound events.</p>
<p>With this selection of <strong>Kafka</strong> as the outbound event platform, it was also a natural selection for the inbound data
processing. Using Kafka for the inbound event source, the logic for processing the data became a relatively simple event
processing engine. Much of the feature set that was valuable for outbound event processing was equally as valuable for
the inbound processing:</p>
<ul>
<li><strong>High throughput allowing for fast data ingestion</strong> - HTTP submissions getting transformed to <em>inbound events</em>
published to an internal topic - even with high <strong>acknowledge</strong> settings for publishing these events, submission
times are generally in the order of single digit milliseconds per submitted event. By allowing clients to submit
data, with fast, guaranteed, <em>accepted</em> responses, clients can safely proceed through their workload promptly -
allowing for greater flow of information in general through the wider system.</li>
<li><strong>Guaranteed ordering</strong> - moving processing to event processing on a guaranteed ordered topic removed a lot of
complexity around <strong>concurrent modifications</strong>, as well as cross-entity validations, etc.</li>
<li><strong>At least once delivery -</strong> With any network-oriented service, modelling data changes to be <strong>idempotent</strong> is an
important best practice - it allows reprocessing the same request/event (in cases of retries, or in the case of <em>at
least once delivery</em>, repeated delivery.) Having this semantic in place for both the inbound event source, as well
as the outbound event topic, actually allowed the event processing logic to use coarse grained retries around
various activities (e.g. database manipulations, accessing remote validation engines, audit trail generations, and
of course, outbound event delivery.) Removing the need for complex transaction handling allowed for much simpler
logic, and as such, higher throughput in the nominal case.</li>
<li><strong>Natural Backpressure handling</strong> - with <strong>Kafka’s</strong> “pull” based semantics, clients process data at their own
rate - there are no complex feedback/throttling interactions required for clients to implement.</li>
<li><strong>Partitioning</strong> - using <strong>Kafka’s</strong> partitioning capabilities, the internal event source topics can be subdivided
logically - some careful thought to select an appropriate partitioning key was required for some data services
(especially those with interesting cross-entity validation requirements), but once partitioned, it allowed the
processing logic of the application to be <strong>scaled</strong> effectively horizontally, as each partition can be processed
without any involvement with any data in the other partitions.</li>
</ul>
<p>There were also several additional benefits to the use of <strong>Kafka</strong> for the event sources, including:</p>
<ul>
<li>As it was already a selected platform for the outbound events, there was no additional technology required for
<em>Event Source</em> processing - the one tool was more than sufficient for both tasks - immediately <strong>reducing</strong>
<strong>operational burden</strong> by avoiding different technologies for the two cases.</li>
<li>Using the same technology for <em>Event Source</em> processing as well as <em>Outbound Event Delivery</em> led to a <strong>highly
composable architecture</strong> - one application’s <em>Outbound event stream</em> became another application’s inbound <em>Event
Source</em>. In conjunction with judicious use of <strong>Kafka’s</strong> <strong>Log Compacted Topics</strong>, to act as a complete snapshot,
bringing in new applications “later” was not a problem.</li>
<li>By building a suite of <strong>asynchronous</strong> services and applications all around an event sourcing and delivery data
model, identifying bottlenecks in applications became much simpler - monitoring the <strong>Lag</strong> processing the <em>event
source</em> for any given application allows bottlenecks to be much clearer - allowing us to quickly direct efforts to
the hotspots without delay.</li>
<li>Coordinating event processing, retries, etc. - it was possible to minimise the interaction with underlying
operational databases to <strong>just the data being processed</strong> - no large transactional handling, no additional advisory
(or otherwise) locking, no secondary “messaging” queue tables, etc. This allowed much simpler optimisation of these
datastores for the key operational nature of the services in question.</li>
<li>Processing applications could be, and several already have been, refactored opaquely to process <strong>Batches</strong> of
events - allowing for many efficiencies that come with batch processing (e.g. bulk operations within databases,
reduced network costs, etc.) - this could be done naturally with <strong>Kafka</strong> as the client model directly supports
event batches. Adding batch processing in this way ensures all applications get the benefits of batch processing
without impacting client APIs (forcing clients to create batches), and also without loss of low latency under
“bursty” loads.</li>
<li>Separation of client data submissions from data processing allows for (temporary) disabling of the processing
engines without interrupting client data requests - this allows for a far less intrusive operational model for these
applications.</li>
<li>A <strong>coarse grained</strong> event sourcing model is much more amenable to a heterogeneous technology ecosystem - using “the
right tool for the job” - for example, PostgreSQL for operational datastores, Solr/ElasticSearch for
search/exploratory accesses, S3/DynamoDB for additional event archival/snapshotting, etc. - all primed from the
single eventing platform.</li>
</ul>
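<p>As a concrete illustration of the log-compacted topics mentioned above (the topic name, partition count and replication factor are purely illustrative, not the platform's actual configuration), such a topic can be created with Kafka's stock CLI:</p>

```shell
# Create a log-compacted topic: Kafka retains at least the latest
# event per key, so the topic doubles as a complete snapshot that
# late-joining applications can replay from the beginning.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic article-events \
  --partitions 12 \
  --replication-factor 3 \
  --config cleanup.policy=compact
```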
<h3>Today, and Moving Forward</h3>
<p>Today, we have a suite of close to a dozen loosely coupled event driven services and applications - all processing data
asynchronously, composed via event streams. These applications and services, built on a standard set of patterns are
readily operated, enhanced and further developed, by anyone in our larger, and still growing, team. As new requirements
and opportunities come up around these applications, and the very data itself, we have strong confidence and capability
in growing this system as appropriate.</p>
<p>If you find the topics in this post interesting, and would enjoy these types of challenges, <a href="https://jobs.zalando.com/jobs/823362-software-engineer-smart-product-platform/">come join
us</a> - we're hiring!</p>On the Road to Full Stack Responsibility2017-10-04T00:00:00+02:002017-10-04T00:00:00+02:00Team Alphatag:engineering.zalando.com,2017-10-04:/posts/2017/10/on-the-road-to-full-stack-responsibility.html<p>Learnings from a team's journey to making their product foolproof, regardless of team switches or roles.</p><p>Programming is hard, and being part of an engineering team is even harder. Depending on requirements, cross-functional
teams are rarely staffed with an even mix of frontend and backend engineers. Teams are not stable either, nor do all
members have the same amount of experience. People come and go, but software stays on, so we need to buckle up and
maintain it.</p>
<h3>Retrospective</h3>
<p>One year ago we started a new project within Zalando Category Management, which is the branch of Zalando that looks
after all of our fashionable apparel and accessories. We had to implement a new system to support the reorganization of
Zalando Buyers into new, more autonomous teams, to enable them to work more effectively.</p>
<p>When we developed a Minimum Viable Product (MVP), neither of our backend developers could support or add new
features to our frontend. Due to project workload, our backend developers couldn't collaborate with our frontend
developers, nor did they have any visibility into its progress. Therefore, to address these concerns we decided to introduce
full-stack responsibility – and we failed! We failed because of several factors:</p>
<ul>
<li>The frontend stack was too sophisticated for the tasks we had to complete (Angular 2 Beta + Angular CLI + ngrx
store);</li>
<li>User stories were not feature-focused, but instead role-focused (separate backend and frontend stories);</li>
<li>It was hard to dive into frontend development on a daily basis.</li>
</ul>
<p>Once again, we face the issue that some frontend engineers switch teams or roles, but the original team is still
responsible for all the products that have been developed. We have since decided to become responsible end-to-end as a
team, independent of individual team members or engineering roles.</p>
<h3>What has changed since?</h3>
<p>We learned from our previous experience that we have to decide on the instrument we use as a team, as well as share
knowledge early and often. This is why we took a two-week sprint to evaluate two popular frameworks (Angular and React)
which allowed us to make an informed decision this time around.</p>
<p>We also challenged our Product Specialist to provide us with feature-oriented user stories so we can break them down
into smaller subtasks containing frontend and backend parts. This allows us to have truly full-stack user stories,
covering both frontend and backend, which leads us to work together and share knowledge. All in all, this
leads to a better product.</p>
<p>Finally, we introduced a “health check” in our sprint planning to track if we still work as one team. Every two weeks
during sprint planning we ask ourselves: “Are we still one team?” We check our backlog and ask if the whole team is
satisfied with the scope for the next sprint. Then, based on our criteria, we define the status of the health check and
see if any immediate action is needed or if we are progressing towards our goal. It reminds us of issues we have as a
team and keeps our commitment high in order to solve them.</p>
<h3>It’s getting personal</h3>
<p>When taking on the task of introducing end-to-end responsibility, we surveyed the whole team and looked for answers to a
specific question:</p>
<p><strong>What is the single most important thing YOU wish to take care of to make our full-stack initiative a success, and why
is it so important?</strong></p>
<p>Check out some of our answers below. Do you agree?</p>
<p><em>"That no one is afraid of changing code anywhere in our stack. Which also means we don't have single points of
failure."</em></p>
<p><em>"Having good documentation about 'Where to start?' and 'What architecture, tools?' are we using. I think most of the
time developers of one domain are just overwhelmed with where to start when you want to write code, do a bug fix or add
a small feature. For example, if you want to contribute to a Play-Scala project as a frontend developer you don't know
where to change things, what the structure of the project is, which things you have to keep in mind if you do an API
change etc. It is the same when you ask a Java backend developer to add a new component to an AngularJS application. I
think what could help the most is something that good open source projects are doing:</em></p>
<ul>
<li><em>Provide a great README as an overview to the project</em></li>
<li><em>Provide Checklists and Guidelines for Contributors to describe shortly what a user would need to do if he/she wants
to add a new component, a new API endpoint etc."</em></li>
</ul>
<p><em>"Understanding that cross-functional teams are equally responsible members for each part of their system. While there
might be only frontend expertise or backend expertise in the team, from the responsibility aspect it doesn't have any
impact. Decisions, discussions and changes should be discussed independently from the roles of a frontend developer or
backend developer. Increase the expertise of backend developers in the frontend and vice versa to make them more
impactful in discussions. They would feel more responsible and feel a stronger ownership if they could bring up valuable
arguments in the discussions. In collaboration with product, we should send at least one backend developer also to
product-related discussions to avoid knowledge silos."</em></p>
<p><em>“To make sure that people with different backgrounds actually work together and practice pair programming. In my
opinion, this is crucial to succeed and also to understand other ways of working.”</em></p>
<p>We’re just starting on this full stack journey. If you’re interested in how we progress, follow us to know more! The
official Zalando Tech Twitter account is <a href="https://twitter.com/ZalandoTech">here</a>.</p>A State-of-the-Art Method for Generating Photo-Realistic Textures in Real Time2017-09-27T00:00:00+02:002017-09-27T00:00:00+02:00Urs Bergmanntag:engineering.zalando.com,2017-09-27:/posts/2017/09/a-state-of-the-art-method-for-generating-photo-realistic-textures-in-real-time.html<p>Our Zalando Research team give an overview of the latest work in image generation using machine learning.</p><p>This blog post gives an overview of the latest work in image generation using machine learning at Zalando Research. In
particular, we show how we advanced the state-of-the art in the field by using deep neural networks to produce
photo-realistic high resolution texture images in real-time.</p>
<p>In the spirit of Zalando’s embrace of open source, we've published two consecutive papers at world-class machine
learning conferences (<a href="https://arxiv.org/abs/1611.08207">[Jetchev et al. 2016]</a> and
<a href="http://proceedings.mlr.press/v70/bergmann17a.html">[Bergmann et al. 2017]</a>), and the
source code (<a href="https://github.com/zalandoresearch/spatial_gan">SGAN</a> and
<a href="https://github.com/zalandoresearch/psgan">PSGAN</a>) to reproduce the research is also available on GitHub.</p>
<h3>State-of-the-art in Machine Learning</h3>
<p>It’s all over town. Machine learning, and in particular deep learning, is the new black. And justifiably so: not only do
vast datasets and raw computational GPU power contribute to this fact, but also the influx of brilliant people
dedicating their time to the topic has accelerated the progress in the field.</p>
<h3>Computer Vision and Machine Learning</h3>
<p>Computer vision methods are very popular in Zalando’s research lab, where we constantly work on improving our
classification and recommendation methods. This type of business-relevant research aims to discriminate articles
according to their visual properties, which is what most people expect from computer vision methods. However, the recent
deep learning revolution has made a great step towards generative models - models that can create novel content and
images.</p>
<h3>Generative Adversarial Networks</h3>
<p>The typical approach in machine learning is to formulate a so-called loss function, which basically quantifies a
distance of the output of a model to samples from a dataset. The model parameters can then be optimized on this dataset
by minimizing a loss function. For many datasets this is a viable approach - for image generation, however, it fails.
The problem is that nobody knows how to plausibly measure the distance of a generated image to a real one - standard
measures, which typically assume isotropic Gaussianity, do not correspond to our perception of image distances. However,
how do humans know how to perceive this distance? Could it be that the answer is in the image data itself?</p>
<p>In 2014, Ian Goodfellow published a brilliant idea [Goodfellow et al. 2014] which strongly indicates that it seems to
be in the data: he proposed to learn the loss function in addition to the model. But how can this be done?</p>
<p>The key inspiration comes from game theory. We have two different networks. First, a generative model (‘generator
network’) takes noise as input and should transform it into valid images. Second, a discriminator network is added,
which should learn the loss function. The two networks then enter a game in which they compete: the discriminator
network tries to tell if an image is from the generator network or a real image, while the generator tries to be as good
as possible in fooling the discriminator network into believing that it produces real images. Due to the competitive
nature of this setup, Ian called this approach <strong>Generative Adversarial Networks (GANs)</strong>.</p>
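<p>As a rough illustration (plain NumPy with hypothetical helper names, not code from any of the papers discussed), the two competing objectives can be written as binary cross-entropy losses: the discriminator wants to score real images as 1 and generated ones as 0, while the generator wants its outputs scored as 1.</p>

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(x) -> 1 for real images and D(G(z)) -> 0 for fakes."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Generator wants the discriminator to believe its samples are real."""
    return -np.mean(np.log(d_fake))

# A discriminator that catches the generator gives it a high loss;
# a fooled discriminator (scores near 1 on fakes) gives it a low loss.
print(generator_loss(np.array([0.1])))  # generator is being caught: high
print(generator_loss(np.array([0.9])))  # generator fools the discriminator: low
```

<p>Training alternates between lowering the discriminator loss and lowering the generator loss, which is exactly the competitive game described above.</p>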
<p>Since 2014 a lot has happened, in particular GANs have been built with convolutional architectures, called DCGANs (Deep
Convolutional GANs) [Radford et al. 2015]. DCGANs are able to produce convincing and crisp images that resemble, but
are not contained in, the training set - i.e. to some degree you could say the network is creative, because it invents
new images. You could now argue that this is not too impressive, because it is ‘just’ in the style of the training set,
and hence not really ‘creative’. However, let us convince you that it is at least technically spectacular.</p>
<p>Consider that DCGANs learn a probability distribution over images, which are of extremely high dimensionality. As an
example, assume we want to fit a Gaussian distribution to match image statistics. The sufficient statistic (i.e. the
parameters that fully specify it) of a Gaussian distribution is the mean and covariance matrix, which for a color image
of (only) 64x64 pixels would mean that more than 75 million parameters have to be determined. To make this even worse,
it has been known for decades by now that Gaussian statistics are not even sufficient for images - 75 million parameters
are therefore only a lower bound. Hence, as typically fewer than 75 million images are used, it is from a statistical
perspective borderline crazy that DCGANs actually work at all.</p>
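<p>The arithmetic behind the 75 million figure can be checked directly (a back-of-the-envelope sketch, not code from the post):</p>

```python
# Parameter count of a full Gaussian over 64x64 RGB images.
d = 3 * 64 * 64                # dimensionality of a flattened colour image
mean_params = d                # one mean value per pixel channel
cov_params = d * (d + 1) // 2  # independent entries of a symmetric covariance matrix
total = mean_params + cov_params
print(total)                   # 75_515_904 -> more than 75 million
```
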
<h3>Texture synthesis methods</h3>
<p>Textures capture the look and feel of a surface, and they are very important in Computer Generated Imagery. The goal of
texture synthesis is to learn a generating process and sample textures with the "right" properties, corresponding to an
example texture image.</p>
<p>Classical methods include instance-based approaches [Efros et al. 2001] where parts of the example texture are copied
and recombined. Other methods define parametric statistics [Portilla et al. 2000] that capture the properties of the
“right” texture and create images by optimizing a loss function to minimize the distance between the example and the
generated images.</p>
<p>However, both of these methods have a big drawback: they are slow to generate images, taking as much as 10 minutes for a
single output texture of size 256x256. This is clearly too slow for many applications.</p>
<p>More recent work [Gatys et al. 2015] uses filters of pretrained deep neural networks to define powerful statistics of
texture properties. It yields textures of very high quality, but it comes with the disadvantage of high computational
cost to produce a novel texture, due to the optimization procedure involved (note that there has been work to short-cut
the optimization procedure more recently).</p>
<p>Besides the computational speed issue, there are a few other issues plaguing texture synthesis methods. One of them is
the failure to accurately reproduce textures with periodic patterns. Such textures are important both in nature -- e.g.
the scales of a fish -- and for human fashion design, e.g. the regular patterns of a knitted material. Another issue
needing improvement is the ability to handle multiple example textures and learning a texture process with properties
reflecting the diverse inputs. The methods mentioned above cannot flexibly model diverse texture classes in the process
they learn.</p>
<h3>Spatial Generative Adversarial Networks (SGANs)</h3>
<p>Our own research into generative models and textures allowed us to solve many of the challenges of the existing texture
synthesis methods and constitute a new state of the art for texture generation algorithms.</p>
<p>The key insight we had in Spatial Generative Adversarial Networks (SGANs) [Jetchev et al. 2016] is that in texture
images, the appearance is the same everywhere. Hence, a texture generation network needs to reproduce the data
statistics only locally, and, when we ignore alignment issues (see PSGAN below for how to fix this), can generate
far-away regions of an image independent of each other. But how can this idea be implemented in a GAN?</p>
<p>Recall that in GANs a randomly sampled input vector, e.g. from a uniform distribution, gets transformed into an image by
the generative network. We extend this concept to sampling a whole tensor Z, or spatial field of vectors. This tensor Z
is then transformed by a fully convolutional network to produce an image X. A fully convolutional network consists of
exclusively convolutional layers, i.e. layers in which neuronal weights are shared over spatial positions. The images
this network produces have therefore local statistics that are the same everywhere.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/472c699d2d6f0b0e9e0348449c49ba9c9745b102_figure1zresearch1.png?auto=compress,format"></p>
<p><em>Figure 1: The left column shows images used to train the SGAN algorithm. The algorithm analyses the structure in these
images and then can produce arbitrary large images with the same texture as in the input images. The results are shown
in the right column.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dd57cc0c1a133b5716d9d90f99843f8c64ac0994_figure2zresearch2.png?auto=compress,format"></p>
<p><em>Figure 2: In the left column several images are provided to train the SGAN algorithm. The output on the right side then
mixes properties of the input images in such a way that the output contains properties of all images smoothly
interpolated. While this is a nice property of SGANs, it is not in general desirable (PSGAN improves this by allowing
sampling of different textures). The top row example mixed 10000 different
<a href="http://www.robots.ox.ac.uk/~vgg/data/flowers/102/">flowers</a> for the final result, of which only 3 are shown top left.</em></p>
<p>In a standard, non-convolutional fully-connected network, the addition of new neurons at a layer implies the addition of
weights that connect to this neuron. In a convolutional layer, this is not the case, as the weights are shared across
positions. Hence, the spatial extent of a layer can be changed by simply changing the inputs to the layer. In a
fully-convolutional network, a model can be trained on a certain image size, but then rolled-out to a much larger size.
In fact, given the texture assumption above, i.e. that the generation process of the image at different locations is
independent (given a large enough distance), generation of images of arbitrary larger size is possible. The only
(theoretical) constraint is computation time. These resulting images locally resemble the texture of the original image
on which the model was trained, see <em>Figure 1</em>. This is a key point where the research goes beyond the literature, as
standard GANs are bound to a fixed size, and producing globally coherent, uncannily large images remains a standing
challenge.</p>
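<p>The roll-out property can be sketched with a toy single-channel convolution (illustrative NumPy only, standing in for a real fully-convolutional generator): because the weights are shared over positions, the very same kernel applied to a larger input field simply yields a larger output image.</p>

```python
import numpy as np

def conv2d_valid(z, kernel):
    """Minimal single-channel 'valid' 2D convolution (illustrative, not fast)."""
    kh, kw = kernel.shape
    h, w = z.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))  # shared weights: our whole toy "generator"

small = conv2d_valid(rng.uniform(size=(8, 8)), kernel)    # "training" size
large = conv2d_valid(rng.uniform(size=(64, 64)), kernel)  # rolled out, same weights
print(small.shape, large.shape)  # (6, 6) (62, 62)
```

<p>No new parameters appear when the input grows; only the spatial extent of the output changes, which is the mechanism that lets SGANs generate arbitrarily large textures.</p>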
<p>Further, as the generator network is a fully-convolutional feed-forward network, and convolutions are efficiently
implemented on GPUs and in current deep learning libraries, image generation is very fast: generation of an image of
size 512x512 takes 0.019 seconds on an nVidia K80 GPU. This corresponds to 50 frames per second! As this is real-time
speed, we built a webcam demo, with which you can observe and manipulate texture generation – <a href="https://github.com/zalandoresearch/spatial_gan">see
here</a>.</p>
<h3>Periodic Spatial GANs (PSGANs)</h3>
<p>The second paper we wrote on the texture topic was published recently at the <a href="https://2017.icml.cc/Conferences/2017/Schedule?showEvent=664">International Conference on Machine
Learning</a>, where we also gave a talk about it. In the
paper, we improved two shortcomings of the original SGAN paper.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cc412399475c0514b0593b28d9ee88d00b497c2d_figure3zresearch3.png?auto=compress,format"></p>
<p><em>Figure 3: Shown are 4 times 3 tiles of generated textures from a Periodic Spatial GAN - PSGAN. Within the tiles the
global dimensions of the generating tensor are set to be identical, yet random in each tile. Local dimensions are random
everywhere. The complete resulting tensor Z is then passed through the generator network, and in a single feed-forward
sweep yields the complete image.</em></p>
<p>The first shortcoming of SGANs is that they always sample from the same statistical process, which means that after
they’re trained, they always produce the same texture. When the network is trained on a single image, it produces
texture images that correspond to the texture that was in this image. However, if it was trained on a set of images, it
produces a texture image that mixes the original texture images in a single texture in the outputs, see <em>Figure 2</em>.
Often, though, we’d rather have the network produce an output image that resembles one of the training images - and
not all simultaneously. This means that the generation process needs to have some global information, that encodes which
texture to generate. We achieved this by setting a few dimensions of each vector in the spatial field of vectors Z to be
identical across all positions - instead of randomly sampling it as in the previous section. These dimensions are hence
globally identical, and we therefore call them global dimensions. <em>Figure 3</em> shows an image that resulted from a model
trained with this idea on many pictures of snake skins from the <a href="https://www.robots.ox.ac.uk/~vgg/data/dtd/">Describable Textures
Dataset</a>. <em>Figure 4</em> shows how the flower example looks when explicitly
learning the diverse flower image dataset, and that this is totally different behaviour than the example in <em>Figure 2</em>.
In addition to training on a set of texture images, the model can also be trained on clip-outs of one larger image,
which itself does not have to be an image of textures. Generating images with the model will then result in textures
that resemble the local appearance of the original image.</p>
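<p>A minimal sketch of how such a Z tensor could be assembled (NumPy with hypothetical dimension sizes, not the PSGAN code): local dimensions are sampled independently per spatial position, while the global dimensions are sampled once per image and broadcast to every position.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
h, w = 4, 4                  # spatial field of latent vectors
d_local, d_global = 20, 10   # hypothetical sizes of the two parts

z_local = rng.uniform(-1, 1, size=(h, w, d_local))       # independent per position
g = rng.uniform(-1, 1, size=(d_global,))                 # sampled once per image
z_global = np.broadcast_to(g, (h, w, d_global)).copy()   # identical everywhere
z = np.concatenate([z_local, z_global], axis=-1)         # full Z tensor

# Every spatial position carries the same global code, selecting one texture:
assert np.allclose(z[0, 0, d_local:], z[3, 2, d_local:])
```
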
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4b09aef41d3f4474808122a5027cd038bcdd9ede_figure4zresearch4.png?auto=compress,format"></p>
<p><em>Figure 4: 3x3 tiles from the Flower dataset, illustrating how PSGAN can automatically detect the various types of input
images it gets, and can learn a texture generating process that flexibly represents these distinct different texture
examples.</em></p>
<p>An interesting property of GANs is that a small change in the input vector Z results in a small change of the output
image. Moving between two points in Z-space hence morphs one image smoothly into another one [Radford et al. 2015]. In
our model, we can take this property one step further, because we have a spatial field of inputs: we can interpolate the
global dimension in Z in space. <em>Figure 5</em> shows that this produces smooth transitions between the learned textures in
one image.</p>
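<p>Interpolating the global dimensions in space could be sketched as follows (an illustrative bilinear blend of four hypothetical corner codes, not the published implementation): each position's global code is a weighted mix of the four corners, so neighbouring positions get nearly identical codes and the textures morph smoothly.</p>

```python
import numpy as np

def interpolate_global(corners, h, w):
    """Bilinearly blend four corner global codes over an h x w spatial field."""
    tl, tr, bl, br = corners  # each of shape (d_global,)
    out = np.empty((h, w, tl.shape[0]))
    for i in range(h):
        for j in range(w):
            a, b = i / (h - 1), j / (w - 1)   # relative vertical / horizontal position
            top = (1 - b) * tl + b * tr
            bottom = (1 - b) * bl + b * br
            out[i, j] = (1 - a) * top + a * bottom
    return out

rng = np.random.default_rng(2)
corners = [rng.uniform(-1, 1, size=5) for _ in range(4)]
field = interpolate_global(corners, 16, 16)  # exact corner codes at the 4 corners
```
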
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bae6ae029aeb2c7aa05f5a9c27bc5e663ed26f31_figure5zresearch5.png?auto=compress,format"></p>
<p><em>Figure 5: Illustration of the learned “manifold” property of the PSGAN. The system is trained on a whole set of DTD
snake skins. Simple interpolation in the global variables of the Z-tensor yields an output image that smoothly morphs
between textures (here 4 independent ones are sampled in the four corners of the image). Note that the texture looks
locally plausible everywhere.</em></p>
<p>Second, many textures contain long-range dependencies. In particular, in periodic textures the structure changes at
well-defined length scales - the periods - thus the generation process is not independent of other positions. However,
we can make it independent by handing information about the phase of our periodic structure to the local generation
processes. We did this by adding simple sinusoids of a given periodicity, so-called plane-waves (see <em>Figure 6</em>), to our
input Z. The wavenumbers that determine the periodicity of the sinusoids were learned as a function of the current
texture (using multi-layer perceptrons), i.e. as a function of the global dimensions. This allows the network to learn
different periodic structures for diverse textures. <em>Figure 7</em> shows generated images learned on two different input
images for various methods: text and a honeycomb texture. PSGAN is the only method which manages to generate images
without artifacts. Note in particular that the other two neural based methods (SGAN and Gatys’ method) scramble the
honeycomb pattern. Interestingly, our simulations indicate that the periodic dimensions also helped stabilize the
generation process of non-periodic patterns. Our interpretation of this observation is that it helps to anchor
generation processes to a coordinate system.</p>
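<p>The plane-wave idea can be illustrated with a single sinusoidal dimension (a NumPy sketch using a fixed, hand-chosen wavenumber rather than one learned by a multi-layer perceptron): every position in the latent field receives the phase of a shared periodic structure.</p>

```python
import numpy as np

def periodic_dims(h, w, k1, k2, phase=0.0):
    """Plane wave sin(k1*i + k2*j + phase) over an h x w latent grid."""
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.sin(k1 * i + k2 * j + phase)

period = 8
wave = periodic_dims(32, 32, k1=2 * np.pi / period, k2=0.0)
# The structure repeats exactly every `period` positions along the first axis:
assert np.allclose(wave[:-period], wave[period:])
```

<p>Since each local generation process sees this phase information, far-apart positions can stay aligned to the same period without communicating directly.</p>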
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/210cb2da6d6bfe0a82338bb721a778ab1da47411_figure6zresearch6.png?auto=compress,format"></p>
<p><em>Figure 6: Transforming a coordinate system with sinusoidal functions with learned wavenumbers allows the network to
flexibly learn plane waves and represent periodic patterns accurately.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/97ded2b8e69a64ea6792eb98cf1f16de7ab55337_figure7zresearch7.png?auto=compress,format"></p>
<p><em>Figure 7: This figure illustrates the superior ability of the PSGAN to handle periodic textures. Other methods either
fail completely, or have occasional artifacts (Efros et al.), while PSGAN produces globally coherent periodic textures.</em></p>
<h3>Discussion and Outlook</h3>
<p>As a wrap-up, in this blog post we have given an overview of how we extended current methods in generative image
modeling to allow for very fast creation of high-resolution texture images.</p>
<p>The method exceeds the state of the art in the following ways:</p>
<ul>
<li>Scalable generation of arbitrarily large texture images</li>
<li>Learning texture processes that represent various texture image classes</li>
<li>Flexible sampling of diverse textures and blending of them into novel textures</li>
</ul>
<p>Please check out our <a href="https://research.zalando.com/welcome/mission/publications/">research papers</a> for more
details.</p>
<p>So far, this is basic research with a strong academic focus. In the more long term perspective, one of several potential
products could be a virtual wardrobe, which could be used to assess how Zalando’s customers will look in a desired
article, e.g. a dress. Will it fit? How will I look in it? Solutions to these questions will very likely become a
reality in the future of e-commerce online shopping. We already have academic results that get closer to this use case
and a paper will be published soon at the “Computer Vision for Fashion” workshop of the <a href="http://iccv2017.thecvf.com/">International Conference on
Computer Vision</a>.</p>
<p>Stay tuned!</p>
<h3>References</h3>
<p><strong>[Bergmann et al. 2017]</strong> Urs Bergmann and Nikolay Jetchev and Roland Vollgraf.</p>
<p>Learning Texture Manifolds with the Periodic Spatial GAN. Proceedings of The 34th International Conference on Machine
Learning, ICML 2017.</p>
<p><strong>[Efros et al. 2001]</strong> Alexei A. Efros and William T. Freeman. Image quilting for texture synthesis and transfer. In
Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH, 2001.</p>
<p><strong>[Gatys et al. 2015]</strong> Leon Gatys, Alexander Ecker, and Matthias Bethge. Texture synthesis using convolutional neural
networks. In Advances in Neural Information Processing Systems 28, 2015.</p>
<p><strong>[Goodfellow et al. 2014]</strong> Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing
Systems 27, 2014.</p>
<p><strong>[Jetchev et al. 2016]</strong> Nikolay Jetchev, Urs Bergmann and Roland Vollgraf. Texture Synthesis with Spatial Generative
Adversarial Networks. Adversarial Learning Workshop at NIPS 2016.</p>
<p><strong>[Portilla et al. 2000]</strong> Javier Portilla and Eero P. Simoncelli. A parametric texture model based on joint
statistics of complex wavelet coefficients. Int. J. Comput. Vision, 40(1), October 2000.</p>
<p><strong>[Radford et al. 2015]</strong> Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with
deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.</p>Zalando Dublin Welcomes Their 100th Employee2017-09-26T00:00:00+02:002017-09-26T00:00:00+02:00Justin Lawlertag:engineering.zalando.com,2017-09-26:/posts/2017/09/zalando-dublin-welcomes-their-100th-employee.html<p>We've hit an important milestone at our Fashion Insights Centre in Dublin, Ireland!</p><p>In September 2017, Zalando Dublin welcomed their 100th employee, Joe Maguire.</p>
<p>Joe previously worked as an intern in summer 2016 at the Zalando Dublin offices, and comes from a background of
electrical engineering. Joe picked up coding in Python and also worked with APIs during his time as an intern.</p>
<p>Joe has good memories of his previous time at Zalando, developing a calendar app and giving everyone a run for their
money at pool. We sat down to chat about his first Zalando experiences, projects, and what brought him back to the Grand
Canal Quay offices.</p>
<p><strong>How did you get to apply to Zalando?</strong></p>
<p><strong>Joe:</strong> My brother works here, so it was a referral. I came in for a chat over coffee, which was great getting that.</p>
<p><strong>Do you remember your first day as an intern?</strong></p>
<p><strong>Joe:</strong> Everyone was very friendly. I met a load of new faces. I got my computer and desk, and Darrell, my team lead,
started showing me the ropes.</p>
<p><strong>What was the office like back then?</strong></p>
<p><strong>Joe:</strong> The Dublin office was a lot smaller then, only about 50 people working there at the time. And we were in the
old offices on the Grand Canal in Dublin, an old converted attic warehouse.</p>
<p>We moved to the new offices halfway through the summer to where we are now.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/684d8aa1c7bc5027c3063216d005f243106b3763_dublin_team_old_office.jpg?auto=compress,format"></p>
<p><strong>What was the first project you worked on?</strong></p>
<p><strong>Joe:</strong> I was putting together a calendar display app for meeting rooms. Python based, taking data from Google Calendar
and the meetup APIs, all running on a Raspberry Pi. So this was all new for me - Python, using APIs.</p>
<p>We never got it launched at the time because of WIFI problems with the Raspberry Pi. I hope to get it working maybe now.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f40bbbd73d7a2f9734addbb26ad108f8b80bc103_screenshotui_1.jpg?auto=compress,format"></p>
<p><strong>What surprised you most about the experience as an intern?</strong></p>
<p><strong>Joe:</strong> Just what a great experience it was. I learned so much, not having any development experience before. I was
working closely with my team lead, who gave me all the help I needed, building out the app.</p>
<p>I got a bit of help from the brother as well.</p>
<p><strong>What brought you back to Zalando?</strong></p>
<p><strong>Joe:</strong> As a first job, you wouldn’t get a much better experience than at Zalando. In other companies, I mightn't be
getting as much hands-on experience. In Zalando, you’re working on the latest technologies -
<a href="https://engineering.zalando.com/posts/2017/05/platform-engineering-and-third-generation-microservices-in-dublin.html">microservices</a>,
<a href="https://engineering.zalando.com/posts/2017/01/sapphire-deep-learning-upskilling.html">data science</a> and working on the core Zalando
platform, the Smart Product Platform.</p>
<p>And the people are great. I get on well with everyone. I knew about 60 people when I left a year ago, and most of them
are still here now.</p>
<p><strong>How’s it been starting back full time so far?</strong></p>
<p><strong>Joe:</strong> Good. I’m just getting set up so far. I haven’t dived into anything too deep yet. It’s different from last
year, of course, I’m working on Zalando products now, and understanding what all the different teams are doing. Last
year I didn’t get to work on the products.</p>
<p><strong>I hear you're not a bad pool player?</strong></p>
<p><strong>Joe:</strong> <em>(Laughing)</em> Ahem. No comment.</p>Zalando Fulfillment Solutions and our FAST Replenishment Algorithm2017-09-21T00:00:00+02:002017-09-21T00:00:00+02:00Jan Schulztag:engineering.zalando.com,2017-09-21:/posts/2017/09/zalando-fulfillment-solutions-and-our-fast-replenishment-algorithm.html<p>Better availability of products is regarded as extremely important, which is where Zalando Fulfillment Solutions comes in.</p><p>At Zalando, we are constantly looking into ways to widen our assortment, in depth and width. This is to make sure that
all fashion items are available anywhere and at any time for our customers. Our Partner Program helps to bring this
vision to life. Through the Partner Program, brands and retailers can integrate their own e-commerce stock into the
Zalando Fashion Store and ship their products directly from their own warehouse to Zalando customers.</p>
<p>Following this, we not only want to offer the best and freshest assortment to customers, but a frictionless shopping
experience throughout the whole process – including delivery and returns. We are constantly improving our service
proposition and also want our partners to fulfill the high standards that our customers are used to – standards that
some partners often struggle with due to limited logistics capabilities for certain markets.</p>
<p>With Zalando Fulfillment Solutions (ZFS), we’re now able to help our partners in the Partner Program with these
challenges and offer up our logistics expertise, taking over all logistics processes from inbound to pick, pack,
shipping plus returns. But better availability of products is regarded as extremely important – not only for Zalando to
offer the best assortment, but also for our partners to further grow their business. With Zalando Fulfillment Solutions
we are able to provide our current and future brand partners with highly customized and reliable solutions, enabling
them to sell their merchandise through our platform and without having to worry about logistics concerns.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f42f21399494643c6bf952245e230dbad6fdab57_screen-shot-2017-06-27-at-13.30.39.png?auto=compress,format"></p>
<p>Zalando Fulfillment Solutions addresses different target groups - smaller brands and retailers, as well as bigger
partners, by using synergies and the one parcel principle: More than half of the orders of an item from our Partner
Program also contain an article from Zalando Wholesale. With all items from Partner Program and Wholesale in our Zalando
warehouse, we can simplify the process for all parties involved, meaning customers no longer receive two different
parcels, but one combined package, with shipping costs being shared with our partner. This is not only more efficient
but also more profitable overall. However, bigger partners still sell their products via different channels and prefer
full flexibility for their inventory. This introduces the idea of replenishment, with Zalando wanting to enable its
partners to replenish the right amount of fashion items to reduce:</p>
<ul>
<li>Lost sales, due to insufficient inventory</li>
<li>Inventory holding costs, due to too much inventory</li>
</ul>
<p>To deliver on this we have developed the FAST Replenishment Algorithm, which serves ZFS partners with recommendations on
what fashion items need to be replenished and in what quantity. In the following post, we address the challenges in the
proposition, key product features, and possible improvements for future iterations.</p>
<h3>Challenges and opportunities</h3>
<p>In short, we face two main challenges in the project: The forecasting of demand and the delivery of operational
excellence with our FAST supply.</p>
<p>Supply comes in two flavours: The ZFS partner’s replenishment and returns from customers. Both are by far not
deterministic with regards to:</p>
<ul>
<li>The quantities our partner actually replenishes: In some cases, partners can have insufficient inventory units to
follow the recommended quantity.</li>
<li>The lead-time between when the partner has received the replenishment recommendation and when their replenished
inventory units are available for sale.</li>
<li>The quantities and lead-time of customer return.</li>
</ul>
<p>Demand forecasting can be seen as even more challenging, for reasons such as:</p>
<ul>
<li>Fashion is seasonal, meaning a fashion article’s life cycle is short (&lt; 180 days) and continues to get shorter
(with fast fashion having a 28 day cycle).</li>
<li>Demand steering with promotions (advertisements) while inventory management works on SKU level (named “article
sample”, size, or EAN).</li>
<li>Demand forecasting of fashion-type products is described as being a problem of high uncertainty, high volatility and
impulsive buying behavior. Several authors advise against forecast demand for these products, but instead build an
agile supply chain that can <a href="http://www.emeraldinsight.com/doi/abs/10.1108/09590550410546188">satisfy demand as soon as it
occurs</a>.</li>
</ul>
<p>Replenishment planning is always integer planning and thus presents another challenge. It’s impossible to replenish the
fraction of fashion item demand required for your intended days of coverage. Therefore it’s crucial to verify, for each
demand pattern, the impact of rounding up, rounding down, or proportionally rolling the dice.</p>
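<p>The three rounding options can be sketched as follows (an illustrative helper, not the production algorithm; “proportionally rolling the dice” is implemented here as stochastic rounding, which is unbiased in expectation):</p>

```python
import math
import random

def round_quantity(q, mode, rng=random):
    """Turn a fractional replenishment quantity into an integer one."""
    if mode == "down":
        return math.floor(q)  # risks lost sales
    if mode == "up":
        return math.ceil(q)   # risks excess holding cost
    if mode == "stochastic":  # unbiased in expectation
        frac = q - math.floor(q)
        return math.floor(q) + (1 if rng.random() < frac else 0)
    raise ValueError(mode)

rng = random.Random(42)
q = 7.3
print(round_quantity(q, "down"), round_quantity(q, "up"))  # 7 8
samples = [round_quantity(q, "stochastic", rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # close to 7.3 on average
```

<p>Which variant wins depends on the demand pattern, which is exactly why the post recommends verifying the impact per pattern.</p>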
<h3>Key solution concepts</h3>
<p><strong>FAST replenishment</strong></p>
<p>A FAST supply chain gives us a powerful strategic advantage. FAST is a reference to the speed of replenishment, which
can be broken down into the following steps:</p>
<ol>
<li>Zalando calculates a replenishment recommendation</li>
<li>Our partner coordinates their inventory availability and replenishment shipping schedules</li>
<li>Zalando receives the replenishment</li>
</ol>
<p>A high replenishment process speed means a shorter replenishment lead-time, and therefore lower
inventory levels needed to fulfill customer demand.</p>
<p>Currently, FAST is implemented as a weekly inventory review. Zalando, together with its ZFS partner BESTSELLER, is able
to execute replenishment with a one-week cycle-time. Other ZFS partners aim to shorten their cycle-times as well.</p>
<p>The key contributions here are clear wins for both sides: fewer out-of-stock situations mean higher sales, while the
partnership yields lower inventory costs and thus higher margins.</p>
<p><strong>How agile product development helped the process</strong></p>
<p>Agile product development is a perfect fit for data-driven product development, especially when the product is a
replenishment algorithm.</p>
<p>In order to start quick and learn fast, our Logistics Algorithms team focused on continuous interactions between the
customer, our ZFS Partner, and Zalando, organised into weekly build, measure, and learn cycles.</p>
<p>The Logistics Algorithms team was able to successfully contribute real business value within one week by radically
focusing on the problem and reducing the scope in order to build an MVP.</p>
<p>This was done with a script that created a CSV file with the ZFS partner’s SKUs and a “recommended” replenishment
quantity. Such a minimal “recommendation” immediately raises the question of how to assess the quality of any “ZFS Partner
Replenishment Algorithm”, and therefore what to measure. The Logistics Algorithms team started with some
<a href="http://www.business2community.com/product-management/6-important-inventory-kpis-can-make-break-warehouse-01479733#oMDhwWEsXdcjjt1y.97">standard inventory control
KPIs</a>
as a basis for this.</p>
<p>In order to build quickly, the Logistics Algorithms team used <a href="https://www.continuum.io/">Anaconda</a> as their data science
platform.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/91d34743ffcfffcd26e4ed76aab3dda66a955053_anacondaplatform.png?auto=compress,format"></p>
<p>From the open data science pillar, Python and Jupyter Notebooks were used to collaborate and share results, including
data science models and visualizations, as well as to reproduce results and govern the ZFS replenishment algorithm
product as a whole.</p>
<p>On the data front, the team used standard ODBC connectivity to extract, transform and load sales, inventory
and article data from Zalando’s <a href="http://www.exasol.com/">EXASOL</a>. <a href="https://www.postgresql.org/">Postgres</a> is our
standard for data storage.</p>
<p><strong>Demand forecast</strong></p>
<p>Any type of replenishment is based on forecasting the demand of items. The quality of the demand forecast is defined as
the <a href="https://www.researchgate.net/profile/Robert_Fildes/publication/257026708_Measuring_Forecasting_Accuracy_The_Case_Of_Judgmental_Adjustments_To_Sku-Level_Demand_Forecasts/links/5632795608aefa44c36851ed.pdf">forecast
accuracy</a>,
which depends on the level of detail and the time horizon. Our FAST replenishment algorithm requires SKU-level demand
forecast for a time horizon of about one to two weeks. One great method to assess the demand forecast quality is
benchmarking your performance within the industry. <a href="https://ibf.org/">The Institute of Business Forecasting and
Planning</a> provides such benchmarks for the short term; their <a href="http://www.prevedere.com/forecasting-removing-blinders">one-month
outlook</a> reported:</p>
<ul>
<li>Aggregate forecasts with an average error rate between 10.4% and 15%</li>
<li>SKU-level forecasts with a staggering error range of 27% to 37.7%.</li>
</ul>
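<p>Such error rates are typically computed as a mean absolute percentage error (MAPE). A minimal sketch, on made-up demand numbers rather than real Zalando data:</p>

```python
def mape(actual, forecast):
    """Mean absolute percentage error, skipping zero-demand periods."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / a for a, f in pairs) / len(pairs)

# Illustrative weekly SKU demand vs. forecast (made-up numbers)
actual = [120, 80, 100, 60]
forecast = [100, 90, 110, 75]
error = mape(actual, forecast)  # roughly a 16% error rate
```

<p>Comparing a number like this per SKU against the industry benchmarks above gives a quick sense of where a forecast stands.</p>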
<p>Forecast errors on high-volume SKUs cause greater issues for a business than errors on slower-moving ones. If a stock-out is
caused by low forecast accuracy on a fast mover, it has a huge impact on sales volume and profitability. Where
low forecast accuracy causes overstocking instead, too much working capital is tied up in inventory and extra
warehousing costs accrue.</p>
<p><strong>Forecasting methods</strong></p>
<p>To forecast the demand, our Logistics Algorithms team applied standard quantitative methods such as a naive moving
average with several lookback times (7 days, 14 days, 28 days, 42 days), as well as simple <a href="http://www.bauer.uh.edu/gardner/Exp-Sm-1985.pdf">exponential
smoothing</a> based on historic sales data on an SKU-level. The demand
forecasts for new articles perform best on a higher aggregation level with article configuration, brand or category. The
team also applied the principle of combining several reasonable forecasting methods, which yielded more accuracy
overall.</p>
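<p>In Python, the team’s language here, the two baseline methods can be sketched as follows. Parameter values are illustrative, not the team’s tuned settings:</p>

```python
def moving_average_forecast(sales, lookback=28):
    """Naive moving average: forecast the next period as the mean of
    the last `lookback` observations (e.g. 7, 14, 28 or 42 days)."""
    window = sales[-lookback:]
    return sum(window) / len(window)

def exponential_smoothing_forecast(sales, alpha=0.3):
    """Simple exponential smoothing: recent observations get
    exponentially more weight via level = alpha*sale + (1-alpha)*level."""
    level = sales[0]
    for sale in sales[1:]:
        level = alpha * sale + (1 - alpha) * level
    return level
```

<p>Combining several such forecasts, for instance by averaging their outputs, is the simple ensembling idea the paragraph above refers to.</p>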
<h3>Key product features</h3>
<p>For ZFS partners, features are configurable and include sales channels, replenishment cycle-time, and inventory
cycle-time at the service level. We also automatically monitor partners’ current inventory
on hand in order to detect stock-outs. Historic sales data is also taken into account.</p>
<p>Stock-up recommendations on the SKU level are based on demand pattern segmentation, with each segment using the
forecast method that delivers the best forecast accuracy for it.</p>
<h3>How do we further improve the service?</h3>
<p>To speed up the supply chain even more, our ZFS FAST Replenishment Algorithm must incorporate check-point events along
the supply chain. This could look like the following:</p>
<ol>
<li>When our ZFS partner acknowledges replenishments</li>
<li>When Zalando accepts replenishments inbound</li>
<li>When ZFS partners ship replenishments</li>
<li>When Zalando receives replenishments</li>
<li>When Zalando stores replenishments</li>
</ol>
<p>When the supply chain is controlled, Zalando and its ZFS partner are enabled to move from a weekly periodic review to a
continuous review while processing multiple replenishment cycles in parallel.</p>
<h3>Outlook</h3>
<p>The Zalando platform is an operating system for the fashion world, with multiple ways of integrating all sorts of
fashion contributors and stakeholders. Our logistics services enable the platform, and ZFS is merely one example of how
we cater to specific stakeholder needs. We see ZFS as supporting the growth of our Partner Program by meeting high
delivery standards and supporting one of our core values: To make the fashion experience as frictionless as possible.</p>
<p>Currently, Zalando supports ZFS from only one dedicated warehouse. In the future, ZFS will be rolled out to multiple
warehouses, which means the FAST Replenishment Algorithm must consider multi-warehouse allocation for ZFS inventory.</p>
<p>We expect an increase in the level of organisational and technology maturity as the next iteration of this service: From
manual execution and supervision (build, measure, learn) to an even more automated approach. In the end, we aim to
enable partners to further build up their business, becoming the go-to digital strategy for their growth. We see further
partners and further countries being added to increase scope and scale our solution.</p>IT-Compliance in the 21st Century2017-09-11T00:00:00+02:002017-09-11T00:00:00+02:00Nicolas Brauntag:engineering.zalando.com,2017-09-11:/posts/2017/09/it-compliance-in-the-21st-century.html<p>Read about how our world-class team manages the biggest challenge in Europe.</p><p>When combing through state-of-the-art articles about IT-Compliance Management, it is easy to see that its importance is
being highlighted more now than ever. With Zalando being a platform currently available in 15 different markets, we have
an array of interesting and exciting challenges to face in terms of regulations, best practices, and standards. Finding
creative solutions to these varied tasks is a great driver of innovation in the IT-Compliance field.</p>
<p>At Zalando, our challenges regarding the technology landscape, regulations, and the people required to make the magic
happen all elevate IT-Compliance to exciting heights. Need an example? In order to ensure compliant software development,
we implemented an <a href="https://zappr.opensource.zalan.do">open source agent</a> that enforces guidelines for GitHub
repositories. It automatically checks pull requests before they are merged. This kind of work is what makes
IT-Compliance all the more impressive.</p>
<h3>The Challenge</h3>
<p>So what’s so challenging about dealing with IT-Compliance in the 21st century? First, there is the <strong>IT landscape</strong>
itself. Second, there are <strong>regulations</strong>. Third, we must always include the <strong>people</strong> involved. And last but not
least, there is the <strong>company</strong> that wishes to remain competitive in the market. Finding the sweet spot of a
well-balanced alignment between these factors is the key to success.</p>
<p><strong>IT Landscape</strong></p>
<p>Modern IT systems are as complex as they are diverse. This is related to the emergence and usage of countless
technologies and programming languages. Rapid, continuous enhancements of existing technologies make the
landscape profoundly volatile. Along with this comes the “modern engineering mindset”: agile, curious, experiment-happy,
and willing to take risks. Both aspects cross-fertilize each other and strengthen the use of ever new technologies. On
top of that, Zalando grants engineers a high degree of <a href="https://engineering.zalando.com/posts/2016/08/radical-agility-study-notes.html">development
freedom</a>. The logical consequence is a regularly
changing way of developing software and bringing it to production, which also fuels rapid change.</p>
<p><strong>Regulations</strong></p>
<p>Regulations can be vague and technology is changing rapidly, as noted above. This means that quite often, regulations
can’t keep the pace. From a technical perspective, part of the dilemma now becomes the question: how to adequately
address vague regulations?</p>
<p><strong>People</strong></p>
<p>Rational people understand the need for being compliant. However, they face natural business-driven constraints such as
time pressure and delivery stress. Under these circumstances, engineers tend to avoid undesired overhead. One of the
most frequently asked questions is: “why do I have to do that?”. Understanding and clarifying the “why” (in both
directions) is an indispensable prerequisite. Afterwards, addressing the constraints (e.g. offering frictionless
compliance tooling) while deliberately sharpening an engineer's mindset and raising awareness is the most challenging
mission.</p>
<p><strong>Company</strong></p>
<p>Zalando is a multi-billion dollar business with the fastest growing technology engineering group in Europe. In fact,
it’s one of the fastest growing European companies with a transition period from startup to IPO in 6 years. How do we
find the healthy balance between investment and return-of-investment? How do you even measure IT-Compliance costs
anyway? How can you guarantee IT-Compliance in a company of this size and scope?</p>
<h3>Managing IT-Compliance in the 21st Century</h3>
<p>IT-Compliance of the modern age has to cope with all the challenges listed above and more. It’s as simple as this:
nobody knows how to achieve “100% IT-Compliance”. Yet certainty needs to be brought into this sea of uncertainty, while
the assessment procedures of yearly IT audits are often far from transparent. In order to adequately address the aforementioned
challenges, we identified two building blocks: “<strong>Strategic Focus</strong>” and “<strong>Division of Powers</strong>”.</p>
<p><strong>Strategic Focus</strong></p>
<p>Strategic Focus ensures that the unit stays on their game in terms of objectives and strategy. All teams are involved in
setting and evolving the vision, goals, and progress of our work. Focus topics are identified as change management, data
classification, and access management. Having defined the “<em>what to do?</em>” we then define the “<em>how to get there?</em>” via a
maturity model and by mapping each focus topic to it. The model consists of several maturity levels that can be thought
of as well-defined evolutionary plateaus towards achieving service excellence. In the end, the “<em>when to reach
maturity?</em>” is stated by putting a concrete timeline on top of each focus topic in accordance with its current maturity
level.</p>
<p><strong>Division of Powers</strong></p>
<p>Theory (legislative power) and practice (executive power) are merged into one indivisible unit, which serves both the inside
and the outside - neither arrogant nor dictating, and with a clear guideline of consolidated, unified communication.
Important in the overall concept is that the executive power - although acting as an internal supervisory committee -
neither appears nor is perceived as a judiciary. The judiciary enters the “game” early enough in the form of
audit companies or the internal revision department.</p>
<p>Instead, we strive for closely involving employees in all matters of compliance and taking their concerns seriously.
Feedback is our most valuable asset, highly appreciated and always taken into consideration. Another important piece of
the puzzle is the support of both legislative and executive powers via close collaboration with a dedicated engineering
team.</p>
<p><strong>Legislative Power: ITC Foundation</strong></p>
<p>This unit deals with <em>Scoping and Narrowing</em> of IT-Compliance requirements. Risk-based rules are identified along the
focus topics and communicated to the relevant engineering units. A close collaboration with our stakeholders is
essential. Main credo: not against them - with them! This credo is also reflected in the provisioning of exciting,
innovative IT-Compliance trainings and bootcamps, around topics such as resolving violations, or understanding our Rules
of Play in quiz-like or gamified formats. Moreover, individual consultancy services and support channels complete the
task area.</p>
<p><strong>Executive Authority: IT Internal Controls</strong></p>
<p>This unit implements <em>Measuring and Monitoring</em> solutions. Main credo: uncover violations before the auditors find them!
For this purpose, control measures are defined and executed along the focus topics. Reasonable reporting of results to
stakeholders is a critical endeavor, and where necessary it results in professional escalation management (a shared
activity with ITC Foundation).</p>
<p><strong>Computerized Support: ITC Engineering</strong></p>
<p>A third technical unit fulfills <em>Remediating and Automating</em> tasks. Dedicated tooling supports legislative and executive
powers as well as customers in their daily work. The primary goal here is to realize the highest possible automation of
manual processes. Monitoring activities are supported by implementing a reliable visualization of violations (e.g. in
form of IT-Compliance dashboards). Tooling is evaluated in aspects of compliant usage and - if applicable - integrated
into a “Compliance Radar” (analog to <a href="https://zalando.github.io/tech-radar/">Zalando’s Tech Radar</a>). In addition, the
unit takes over the important task of supporting all stakeholders in understanding the complex IT landscape and the
offered tooling itself.</p>
<h3>Conclusion</h3>
<p>As you can see, there is a lot involved in the area of IT-Compliance and a lot of factors to consider. When analysing
what contributes to these various factors, finding smart solutions to meet regulations and standards in a large,
versatile Tech environment - like you find at Zalando - is actually one of the biggest challenges in Europe.</p>
<p><a href="https://jobs.zalando.com/jobs/784435-senior-itcompliance-manager/">Where to learn, grow and succeed better than here?</a></p>InnerSource Do’s and Don’ts out of Dortmund2017-08-30T00:00:00+02:002017-08-30T00:00:00+02:00Martin Schwitallatag:engineering.zalando.com,2017-08-30:/posts/2017/08/innersource-dos-and-donts-out-of-dortmund.html<p>Communication is the key to success when it comes to InnerSource across teams, offices, and locations</p><p>At Zalando, we have many teams working on their respective systems and domains where they operate like small startups.
They take responsibility for these systems and have a feeling of ownership, constantly engaging with other teams to
solve bigger challenges and enable new features.</p>
<p>Sometimes during these alignments and feature requests, priorities collide and we can’t support others. In these
situations, we aim for an InnerSource approach to address the required changes. This means that much like open source
projects, the requesting team must implement the necessary changes in other components and the owner of these components
has to review and approve the changes. This way, the impact on the owning team is minimal and the requesting team is not
blocked.</p>
<p>Team Sokoban in Dortmund is responsible for Inventory Management at Zalando and we’ve had the opportunity to do a lot of
InnerSourcing over the last months. We wanted to share some of our experiences and best practices, which can mostly be
applied to usual open source InnerSourcing methods or sometimes even to day-to-day development.</p>
<h3>Do respond quickly</h3>
<p>If someone creates an issue, a pull request, or just has a question while implementing a feature, a quick response is
always valuable. It can get very frustrating if you have to ask multiple times to get an answer or if you feel that you
are being ignored. Sometimes all that is needed is to say that you will address their concerns after your meetings or
just explain that you are currently occupied by other work.</p>
<h3>Do define contribution and project setup guidelines</h3>
<p>Every project should have guidelines for setup and how to contribute to it. It prevents unnecessary rounds of questions
and also eases frustration while opening a pull request just to see that you still have to fix your commit messages,
check style, or any other changes that might be required for a merge.</p>
<h3>Do align on the implementation design early and raise concerns</h3>
<p>Depending on the scope of your change, it is a good idea to align on your implementation idea. Ask yourself: what could
the implications for the application be? Do you need new dependencies? Will you need to change a certain process?
Raise your concerns early on to align on the design for both sides, so that there are no negative surprises for
either team during the pull request or after merging. If you don’t align, you risk changing your implementation
multiple times after the pull request.</p>
<h3>Do small pull requests</h3>
<p>I think this is self-explanatory and should always be the way to go. Small pull requests make it much easier for the
reviewer to analyze the impact of the change.</p>
<h3>Don’t hold others hostage during pull requests</h3>
<p>Although you own the code and application and are the one who needs to approve the changes, you should not misuse
this power during a pull request and hold the requester hostage. If a requested change is based on your own preference, you
should ask yourself if the change is really needed or if the current state of the project is fine as is. In the end,
you always have the power to change the code afterwards. This also means avoiding nitpicking, because
it leads to frustration on both sides.</p>
<h3>Don’t do team discussions on a pull request</h3>
<p>Sometimes it may happen that your code is reviewed by multiple people from one team with different opinions, which leads
to big discussions and confusion. Keep them off the pull request and align in person to solve these situations.</p>
<p>As a countermeasure, we invented the so-called “InnerSource Buddy”. When another team wants to InnerSource a
change into one of our components, we assign someone from our team as the InnerSource Buddy, who acts as the go-to
person for the other team. It is their responsibility to answer questions, dedicate some time to reviewing, and so
on. Whenever the Buddy speaks, it should be on behalf of the team, not merely a personal opinion. With this system, we
are able to reduce confusion and frustration during the InnerSource process and our experience shows that both sides
were happy with the end result.</p>
<p>It is always important to appreciate one another. In the end, one team is enabling you to continue your work and the
other team is doing work for you, so as not to negatively impact your own priorities. Therefore, communication is the
key to success when it comes to InnerSource across teams, offices, and locations.</p>Spaghetti and Marshmallows at Zalando: An Exercise to Inspire Deep Learning2017-08-24T00:00:00+02:002017-08-24T00:00:00+02:00Massimiliano Fattorussotag:engineering.zalando.com,2017-08-24:/posts/2017/08/spaghetti-and-marshmallows-at-zalando.html<p>Encourage teams to experience simple yet profound lessons in collaboration, innovation, and creativity.</p><p>Some months ago I had the opportunity, with two fellow Zalandos, to organize the “Dortmund 5PM”; a gathering across all
Dortmund teams, scheduled once a month on Fridays in our local event space. We want to foster further cross-team
collaboration between individuals, making these meetings a memorable experience for all.</p>
<p>We opted for running <a href="https://www.tomwujec.com/design-projects/marshmallow-challenge/">The Marshmallow Challenge</a>; a
funny design exercise that encourages teams to experience simple yet profound lessons in collaboration, innovation, and
creativity.</p>
<p>The challenge is really simple: Teams of four to five people have to build the tallest free-standing structure out of 20
sticks of spaghetti, 1 yard of tape, 1 yard of string, and 1 marshmallow (which must stand on the top) in 20 minutes.
Easy, right?!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/44bc40d4b18681e30a500b081ce7ef473095495c_imag0756.jpg?auto=compress,format"></p>
<p>Around 50 team members joined the challenge and the results were quite varied. According to <a href="https://www.tomwujec.com/design-projects/marshmallow-challenge/">Tom
Wujec</a>, creator of the exercise, there are some
patterns behind how teams perform, which he explained further during his <a href="https://www.ted.com/talks/tom_wujec_build_a_tower">TED
Talk</a>. See below for the takeaways we noted based on running the
challenge in Dortmund.</p>
<p><strong>Education doesn’t matter</strong>. Engineering is a mindset, a mental attitude or inclination to solve problems with what is
provided to us.</p>
<p><strong>Prototyping matters</strong>. Successful teams are the ones that naturally start with the marshmallow and add the sticks to
it, rather than executing the plan with almost no time to fix the design once they place the marshmallow on top.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/187b5764a3c9fb7eca2d019cdb17ce5a6d880ac4_img_0439.jpg?auto=compress,format"></p>
<p><strong>Hidden assumptions are everywhere</strong>. The marshmallow is nothing more than a metaphor for hidden assumptions. Each
project or product has its own marshmallow: real customer needs, the cost of the product, service performance, or
dependencies across teams. Prove them early and often – that’s the mechanism that leads to effective innovation.</p>
<p><strong>Visual metaphors are a powerful instrument</strong>. Months after running this experiment, it’s quite impressive how some
colleagues are still pointing out a “marshmallow” when they refer to hidden assumptions.</p>
<p>For all Producers and Project Managers out there, I recommend you try out such a design exercise with your team or your
company – you can have a lot of fun while engaging in deep learning activities and improving perspectives to your
workflow.</p>Data For All: An Introduction to Product Analytics at Zalando2017-08-16T00:00:00+02:002017-08-16T00:00:00+02:00Christoph Luetke Schelhowetag:engineering.zalando.com,2017-08-16:/posts/2017/08/data-for-all-an-introduction-to-product-analytics-at-zalando.html<p>Embedding a true data and experimentation culture at Zalando.</p><p>As Zalando continues taking steps towards becoming a fully-fledged platform, we want to move fast, validate the ways
that our big strategic moves pay off, and capture the full value of our products by continuous optimization. To this
end, we wanted to ensure that we’re bringing data-informed decision making to the forefront of our processes by
establishing a true data and experimentation culture that could ultimately become a competitive advantage in today’s
fast-changing world.</p>
<p>Zalando has always been a data-driven company and analytics has been one of our key success factors. We believe that
much of the success (or failure) of a product rides on data, and on how it is used. This brought about the following
question: How can we elevate Zalando to the next level of data-informed decision making? This is how the Product
Analytics department came to life.</p>
<h3><strong>Purpose</strong></h3>
<p>The purpose of the Product Analytics team is to embed a true data and experimentation culture at Zalando to empower
smart decision making.</p>
<p>What do we mean by true data and experimentation culture?</p>
<ul>
<li>Our Business Units are aligned around key metrics that are rooted in our most important business priorities. Success
is defined by a set of well-proven metrics which individual teams own and contribute to.</li>
<li>Every team can access the data they need from various data sources and with high data quality. Setting up tracking
is easy as well as assessing the data quality. Understanding user behavior based on A/B tests is quick and teams are
always running multiple experiments at the same time.</li>
<li>Every team can draw the right insights from their data. Teams have the ability and skills to learn from and make
decisions informed by data. Advanced analytics helps them discover problems and opportunities, plus focus on the
right developments.</li>
<li>Decision making is not influenced by compromises, personal biases or egos, but only insights.</li>
</ul>
<h3><strong>How can we get there?</strong></h3>
<p>To make data-informed decision making an easy and effective routine, and establish a data and experimentation culture,
we focus on 1.) building a self-service infrastructure for experimentation, tracking, analytics 2.) ensuring common data
governance, and 3.) enabling and educating all teams throughout Zalando.</p>
<ul>
<li><strong>Self-service infrastructure for tracking, experimentation, and analytics:</strong> Data analysis and experimentation
should be fast and easy. Only true self-service tools are truly scalable given the size of our organization today.</li>
<li><strong>Common data governance:</strong> With nearly 200 teams producing and consuming data events, there’s a growing need to
ensure event tracking completeness and correctness and to allow for the easy compatibility of data.</li>
<li><strong>Enablement and education:</strong> As we want to move fast, all teams must be enabled and empowered in data informed
product development; e.g. from building a rationale around new features up to iterative testing and optimization at
the end of the product lifecycle. We expect a certain data and experimentation affinity from everybody and want to
embed a data-driven culture everywhere. In order to get there, we want to guide teams and help them be more rigorous
by embedding an expert analyst role into teams.</li>
</ul>
<h3><strong>Department structure and competencies</strong></h3>
<p>The Product Analytics department was created as a hybrid organization of central teams and team-embedded product
analysts. The central teams provide world-class tools and knowledge in the domains of Economics, Tracking,
Experimentation, Journey Modelling, and Digital &amp; Process Analytics. Product Analysts would also be embedded into teams
varying from our Fashion Store, Data, and Logistics areas to focus on insight-driven product development. They play an
instrumental part in all steps of the product lifecycle (“discover - define - design - deliver”) and can support
insights-based decision making by performing the following tasks:</p>
<ul>
<li><strong>Understand user and customer behavior</strong>: Develop in-depth analytical understanding for what drives growth for the
product and how it can be improved, thus inspiring product work.</li>
<li><strong>Measure and monitor product progress:</strong> Analysts help to define target KPIs for the team and ensure that Product
Specialists and Product Owners develop ownership of them. At the same time, they facilitate access to the key
target KPIs and other relevant data. They establish methods to monitor short-term progress and long-term product
health. When KPIs change, embedded analysts explore the underlying reasons and are able to provide context for these
changes.</li>
<li><strong>Prove if product ideas work</strong>: In the context of value creation, especially for new features, embedded analysts
play an essential role by gathering and formulating analytical evidence that supports all phases in the product
lifecycle, from discovery to rollout. Data must justify why we do what we do.</li>
<li><strong>Drive product optimization</strong>: From a value capturing point of view, embedded analysts drive optimization
iterations for existing features until they reach a local maximum.</li>
<li><strong>Ensure data quality</strong>: Product Analysts create awareness about data quality within the teams where they are
embedded. They have the responsibility of defining the specifications of the data to be generated by their teams,
monitoring its quality and making sure the team addresses any quality-related issues they are responsible for.</li>
<li><strong>Improve data literacy:</strong> Analysts drive the data mindset in their teams, educate and guide in terms of analytical
methodology – they are enablers for any data leading to product decisions.</li>
</ul>
<h3><strong>What the future holds</strong></h3>
<p>Ultimately, we want the magic of data-informed product development to happen in every team, guided by team-embedded
Product Analysts and empowered by central teams with best in class self-service tools and methodologies. By adopting
processes that ensure data-informed decision making is taking place, our teams can build better products and iterate
faster than ever.</p>
<p>Opinions are great to start a discussion, but we win on insights from user behavior. We prove strong hypotheses with
relentless and granular attention to data and KPIs driving our decisions. We believe in high frequency experimentation
and iterations to create the best possible experience for customers and all other players in the ecosystem.</p>
<p>It’s our vision that every product decision – be it the discovery or rollout of a new product; be it on the
customer-facing, brand, core platform or intermediary side – is backed by analytical insights and rigorous impact
testing. Thereby, we’re building a solid foundation for the next big learning curve in analytics: Artificial
Intelligence and Machine Learning. We’ll be revealing more about our plans and learnings in upcoming articles.</p>
<p>Interested in Product Analytics possibilities at Zalando? <a href="https://jobs.zalando.com/jobs/679847-digital-data-analyst/">We’re
hiring.</a></p>How Tech Candidate Feedback Helped Improve our Candidate Net Promoter Score2017-08-14T00:00:00+02:002017-08-14T00:00:00+02:00Dr. Magdalena Masluk-Mellertag:engineering.zalando.com,2017-08-14:/posts/2017/08/how-tech-candidate-feedback-helped-improve-our-candidate-net-promoter-score.html<p>Read about the combination of actions that led to an incredible improvement of our cNPS.</p><p>Almost two years ago, the Talent Acquisition Team (Technology) at Zalando began their work on Candidate Experience (CX)
research. We were determined to learn our tech candidates’ needs and expectations, and identify the factors that
influence their experience the most. This research allows us to identify how we can deliver a superb and meaningful
candidate experience and thereby continue to strengthen our employer brand, securing a steady flow of strong candidates.</p>
<h3><strong>The Net Promoter Score</strong></h3>
<p>To quantify the candidate experience, we chose the Net Promoter Score (NPS) method. This method has been used for years
by our colleagues in Marketing to measure the level of Zalando’s customer experience.</p>
<p>In the context of recruiting, we speak about the Candidate Net Promoter Score (cNPS). The candidate experience survey
question we ask then is: “How likely are you to recommend your friends and acquaintances to apply at Zalando?”
Respondents check their answer on a scale from 0 to 10, where 0 is "I strongly do not recommend" and 10 is "I strongly
recommend".</p>
<p>Those who answered 9 or 10 are referred to as "Promoters", those who answered 7 or 8 as "Passives", and those who
answered between 0 and 6 as "Detractors". An NPS score is calculated by subtracting the percentage of Detractors from
the percentage of Promoters.</p>
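<p><em>Aside:</em> the scoring rule above is easy to express in a few lines of Python. This is purely illustrative and not part of any survey tooling mentioned in this article:</p>

```python
def nps(scores):
    """Net Promoter Score from a list of 0-10 survey answers.

    Promoters answer 9-10, Detractors 0-6; Passives (7-8) only
    dilute the percentages.
    """
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)
```

<p>For example, <code>nps([10, 10, 9, 7, 3])</code> is 40.0: 60% Promoters minus 20% Detractors.</p>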
<h3><strong>cNPS Score</strong></h3>
<p>We were quite satisfied with our first ever cNPS score. Candidates enjoyed the personal contact with our recruiters as
well as their friendliness and professionalism.</p>
<p>For areas such as the interview process, interviewer interaction, and process speed, we received some criticism. We
embarked on the journey to action some of the improvement suggestions that were given to us. Here are a few of the
initiatives that we’ve put into place.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9cadea84af0b0ea6fe68abdf6caec70b6bb74121_improvement-suggestions.jpg?auto=compress,format"></p>
<h3><strong>Interview process</strong></h3>
<p>First of all, we launched an online Coding Challenge as part of our recruitment process for Software Engineering roles.
This way we could ensure a fair and consistent evaluation of the coding skills of candidates.</p>
<p>Second, we increased the number of tech interviewers, clustering them into interviewer groups and thereby enabling
more efficient scheduling of interviews. This also allowed for a number of possible interviewer replacements, which
resulted in fewer interview cancellations.</p>
<p>Third, we helped tech candidates better understand what we would be evaluating in interviews and helped them prepare. We
sent them a link to an <a href="https://tech.zalando.com/blog/how-to-prepare-for-your-zalando-tech-interview/">inspiring
article</a> by Eric Bowman, our VP
Engineering, while candidates for leadership positions received Leadership Interview Guidelines.</p>
<h3><strong>Interviewer interaction</strong></h3>
<p>In order to improve interviewer interaction, we introduced interviewer training consisting of in-class courses and
interview shadowing. Throughout this training, we discuss the impact of interviewers on the candidate experience, e.g.
the importance of being on time, of asking relevant questions, and creating an environment enabling candidates to
present themselves best.</p>
<h3><strong>Process speed</strong></h3>
<p>To tackle the issue of process speed, we streamlined the coordination of the interviews and encouraged recruiters to
have weekly alignment meetings with Hiring Managers to discuss each candidate and their recruitment progress. This way,
most of our candidates could receive timely feedback and status updates.</p>
<h3><strong>Outlook</strong></h3>
<p>Over time, the combination of our actions has led to an incredible improvement of the cNPS. Comparing year on year,
the cNPS score improved by 19 points between the end of HY1 2016 and HY1 2017.</p>
<p>Our goal for HY2 is to continue increasing the NPS score by listening to our tech candidates. They are the best source
of ideas that are worth implementing.</p>
<p>Once you, as our candidate, receive the invitation to complete our candidate survey, please do not hesitate to provide
us with your feedback. We value all responses, from satisfied to less satisfied tech candidates.</p>
<p>If you have any questions about our survey design and our further experiences with conducting CX research, feel free to
get in contact via email at <a href="mailto:magdalena.masluk-meller@zalando.de">magdalena.masluk-meller@zalando.de</a>, I would love to hear from you.</p>Community Group Hug: Techspert Loves Open Source2017-08-02T00:00:00+02:002017-08-02T00:00:00+02:00Natali Vlatkotag:engineering.zalando.com,2017-08-02:/posts/2017/08/techspert-loves-open-source.html<p>Touching on issues of diversity, non-code contributions, and bigger market players in open source.</p><p>Zalando’s foray into open source many years ago has yielded <a href="https://engineering.zalando.com/posts/2017/03/an-open-source-pulse-check-at-zalando-for-2017.html">some amazing
projects</a> and unearthed a community
that is proud of its open achievements. Contributing to open source and giving back to the community is crucial, and big
players like Zalando can do a lot more to cultivate sustainability in the community as a whole.</p>
<p>Inviting other influential aficionados to discuss all things open source was a no-brainer for the latest Zalando
Techspert Panel, which saw the likes of Jan Lehnardt, VP at <a href="https://twitter.com/CouchDB">Apache CouchDB</a> and co-creator
of <a href="https://twitter.com/hoodiehq">Hoodie</a>, and <a href="https://twitter.com/claus__m">Claus Matzinger</a>, Developer Relations at
<a href="https://crate.io/">Crate.io</a>, join Zalando’s Open Source Evangelist <a href="https://twitter.com/LauritaApplez">Lauri Apple</a>
along with moderator <a href="https://twitter.com/therealpadams">Dr. Paul Adams</a>, one of our newly minted Engineering Leads in
Search and Personalization.</p>
<p>We sat down with this awesome foursome to get a taste of a panel that touched on issues of diversity, contributions
outside of code, and bigger market players in the world of open source.</p>
<p><strong>Zalando: What kind of support do you think is healthy and valuable for future open source development? For example,
being part of a foundation, using Patreon, or dedicating money and resources to the cause?</strong></p>
<p><strong>Lauri:</strong> I think of this a bit organically, as types of work-related support that can come from anywhere:
documentation edits, code review, keeping the general OSS infrastructure alive. It's a lot to ask from the general
population to support all of that. Projects that are critical to companies keeping things running sometimes have one or
two maintainers tops, meaning more resources need to be thrown behind those projects. So, while I know that foundations
can help, it's not the only means of support that can be offered.</p>
<p><strong>Claus:</strong> The types of support offered should not only be limited to money – you want to be increasing adoption. This
could be anything from someone doing a podcast about an open source project, or a YouTube channel that aims at
increasing adoption and spreading the word. This would help a lot of projects attract more users with a more diverse
skill-set; with more users you attract a higher level of contributors. In the end, we don't know what a contribution can
lead to before someone jumps on board; thus adoption is key.</p>
<p><strong>Lauri:</strong> Say you're able to get documentation support – that can actually help you increase users because someone on a
project team can write and tell the world why they should use the project. UX and design assistance are other forms of
help that can transform a project. Expanding the diversity of people that are considered open source contributors and
keeping that in mind is important here.</p>
<p><strong>Jan:</strong> I want to riff on what both Claus and Lauri have said; some of the open source projects that the world relies
upon these days amount to public infrastructure, and we should start looking at funding these things. This
isn't by way of Patreon, or collecting money individually, or in foundations, but taxpayer money. The support for
individuals doing open source and getting more diverse contributors involved should also count on having people that
know how to deal with people: For example, a good coder may not be good at certain people problems that come up when you
work with a diverse group. Having resources that allow you to handle such issues or that teach you to step away and let
others deal with these matters is a valuable lesson.</p>
<p><strong>Paul:</strong> I think for me the crucial thing is that communities need to have an almost business-like understanding of
their needs (even though I hesitate to use the term). Money can be helpful if you've ascertained that your project’s
needs are financial. Having people involved and getting people to do things that are useful for the community is more
valuable 99% of the time over someone handing over cash. That is how we build sustainability.</p>
<p>However, there will always be elements where cold hard cash is just required, for example, when you want to organize a
conference. So I think the vital thing for communities looking to have a stable, self-sustained life is really
focusing on that 99% – what are the opportunities within the community and how do we build a diverse community around
those needs to ensure that it's all self-sustained? Sure, things like Kickstarter and Patreon have a positive effect,
but ultimately the community as a group of people must work out what its needs are and appropriately recruit into those
needs.</p>
<p><strong>Jan:</strong> To add to that, this panel group spoke previously about the ideas of open source being useful for other
communities as well. It would be nice if we figured out what made projects sustainable and then other communities that
are not about code could use the same ideas, that way the tooling that is used amongst open source projects (that are
themselves open source) can be utilized in other ways. There could be a larger societal movement there around doing
things more sustainably outside of corporations or any other traditional forms of getting projects off the ground.</p>
<p><strong>Zalando: Getting a bit personal now, what is something about your past open source life that you'd like to change or
do differently?</strong></p>
<p><strong>Paul:</strong> EVERYTHING. My open source life began very technically and morphed into more of what the industry would call
"Engineering Management", but at a community level. This includes time spent on my Ph.D, time spent as an undergraduate,
and well... are there things I would do differently? Yeah. But for me the important thing is what open source enabled
for me, which is this whole career path – it <em>was</em> my career – and I've never had a job without some kind of open source
angle to it. And I think that's OK. So, when you ask a question like "Are there things you would do differently?" then
the answer is yes, and it doesn't matter. Those things I did incorrectly, I learned from them, and I can transfer those
learnings into my day-to-day work and into other communities where they allowed me to perform better. All of those
failings I've had I'm proud of. I continue to screw things up every day and I'm proud of those as well.</p>
<p><strong>Jan:</strong> This is a fascinating question. The thing that comes to mind is: What stays with you when you do open source?
It’s the relationships with the people that you’ve had. So if I had known, I would focus less on being annoyed at stuff,
in discussions, at technology, versus making sure that I'm good with the people I want to work with. I would also be
quicker at recognizing the people that I <em>don't</em> want to work with.</p>
<p><strong>Claus:</strong> I became involved in open source as a user mostly, and only recently started contributing. To that end, I
think if I could do it again I would start contributing earlier – start early and fail a bunch of times trying to
contribute and get the hang of things.</p>
<p><strong>Lauri:</strong> Well, I wouldn't have thrown away the paintings that I made of
<a href="https://en.wikipedia.org/wiki/Richard_Stallman">RMS</a> and <a href="https://en.wikipedia.org/wiki/Eric_S._Raymond">Eric S.
Raymond</a> before I moved to Berlin! <em>(laughs)</em> I'll do it again, I'll just
paint it again. It'll be better. <em>(high five)</em>.</p>
<p>The second thing is that I feel in the past I was rather oblivious to this whole world; I went to law school, and I
ended up doing Intellectual Property as my focus. I don’t remember us discussing open source in class, so this world
was a bit foreign to me. At around the time that I started painting RMS and Eric S., I was working with these really
brilliant guys that joined Mozilla as their UI team, and I regret that I didn't dig deeper into this world. I think at
the time I just didn't know how I would contribute, except with paintings <em>(more laughter)</em>.</p>
<hr>
<p>Interested in what the next Zalando Techspert Panel will discuss? Keep your eyes peeled on the <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/">Zalando Tech Meetup
Page</a> or our Twitter account
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a> for all the details.</p>The Purpose of JWT: Stateless Authentication2017-07-26T00:00:00+02:002017-07-26T00:00:00+02:00Jan Brennenstuhltag:engineering.zalando.com,2017-07-26:/posts/2017/07/the-purpose-of-jwt-stateless-authentication.html<p>The true purpose of JSON Web Tokens and a comparison of two approaches in authentication.</p><p><strong>JSON Web Tokens, or just JWTs (pron. [ˈdʒɒts]),</strong> are the new fancy kids around the block when it comes to
transporting proofs of identity within an untrusted environment like the Web. In this article, I will describe the true
purpose of JWTs. I will compare classical, stateful authentication with modern, stateless authentication. And I will
explain why it is important to understand the fundamental difference of both approaches.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9b61b2b68ab55fe9baf00eb0f92a9536ad431ae8_jsonwebtoken.png?auto=compress,format"></p>
<p>While there are many good articles available that describe specific aspects, <a href="https://dev.to/neilmadden/7-best-practices-for-json-web-tokens">best
practices</a>, or single use cases of JWTs, the bigger
picture is often missing. The actual problem that JWT specs try to solve is just not part of most discussions. With JWTs
gaining in popularity however, that missing knowledge of the fundamental ideas of JSON Web Token leads to serious
questions like:</p>
<ul>
<li><a href="https://stackoverflow.com/questions/21871029/logout-invalidate-a-jwt">How to invalidate a JWT</a>,</li>
<li><a href="https://stackoverflow.com/questions/26739167/jwt-json-web-token-automatic-prolongation-of-expiration">How to prolong a JWT's expiration
date</a> or</li>
<li><a href="https://stackoverflow.com/questions/41865108/why-should-i-use-jwt-not-simple-hashed-token">Why should I use JWT, not a simple hashed
token</a>.</li>
</ul>
<p>This article is not about symptoms, but the purpose of JWT which actually is: <strong>Getting rid of stateful
authentication!</strong></p>
<h3>Stateful Authentication</h3>
<p>In the old days of the Web, authentication was a pure stateful affair. With a centralized overlord entity being
responsible for tokens, the world was fairly simple:</p>
<ul>
<li>Tokens are issued and stored in a single service for future checking and revocation,</li>
<li>Clients and resource servers know a single point of truth for token verification and information gathering.</li>
</ul>
<p>This worked rather well in a world of integrated systems (some might call them legacy app, mothership or simply
<a href="https://tech.zalando.com/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">Jimmy</a>), when servers
rendered frontends and dependencies existed on e.g. package-level and not between independently deployed applications.</p>
<p>In a world where applications are composed of a flock of autonomous microservices, however, this stateful authentication
approach comes with a couple of serious drawbacks:</p>
<ul>
<li>Basically no service can operate without having a synchronous dependency towards the central token store,</li>
<li>The token overlord becomes an infrastructural bottleneck and single point of failure.</li>
</ul>
<p>Eventually, both facts oppose the fundamental ideas of <a href="https://www.jbspeakr.cc/microservice-evolution/">microservice
architectures</a>. Stateful authentication introduces not just another
dependency for all your single-purpose services (network latency!) but also makes them heavily rely on it. Without the
token overlord being available (even for just a couple of seconds) everything is doomed. This is why a different
approach is required: <strong>Stateless Authentication!</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a631896314d16cc909e5663d57a5ee07b7ed6370_stateful-vs-stateless.png?auto=compress,format"></p>
<h3>Stateless Authentication</h3>
<p>Stateless authentication describes a system/process that enables its components to decentrally verify and introspect
tokens. This ability to delegate token verification allows us to (partly) get rid of the direct coupling to a central
token overlord and in that way enables state transfer for authentication. Having worked in stateless authentication
environments for several years, the benefits in my eyes are clearly:</p>
<ul>
<li>Less latency through local, decentralized token verification,</li>
<li>Custom authorization fallbacks due to local token interpretation,</li>
<li>Increased resilience by removed network overhead.</li>
</ul>
<p>Also, stateless authentication absolves you of the need to keep track of issued tokens, and for that reason
removes state (and hence storage) dependencies from your system.</p>
<p>The antiquated, heavyweight token overlord converges <a href="https://auth0.com/blog/stateless-auth-for-stateful-minds/">to yet another
microservice</a> being mainly responsible for issuing tokens.
All of this comes in handy, especially when your world mainly consists of single-page applications or mobile clients and
services that primarily communicate using RESTful APIs.</p>
<p>“Using a JWT as a bearer for authorization, you can statelessly verify if the user is authenticated by simply checking
if the expiration in the payload hasn’t expired and if the signature is valid.” — <a href="http://jonatan.nilsson.is/stateless-tokens-with-jwt/">Jonatan
Nilsson</a></p>
<p>One popular way to achieve stateless authentication is defined in <a href="https://tools.ietf.org/html/rfc7523">RFC 7523</a> and
leverages the OAuth 2.0 Authorization Framework ( <a href="https://tools.ietf.org/html/rfc6749">RFC 6749</a>) by combining it with
server-signed JSON Web Tokens ( <a href="https://tools.ietf.org/html/rfc7519">RFC 7519</a>, <a href="https://tools.ietf.org/html/rfc7515">RFC
7515</a>). Instead of storing the token-to-principal relationship in a stateful
manner, signed JWTs allow decentralized clients to securely store and validate access tokens without calling a central
system for every request.</p>
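<p>To make the mechanics concrete, here is a minimal HS256 sketch in Python using only the standard library. It is illustrative only: production systems should rely on a vetted JWT library and typically on asymmetric signatures (e.g. RS256 per RFC 7515), so that resource servers only ever hold a public key:</p>

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # JWTs use unpadded, URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Issue a compact 'header.payload.signature' token signed with HS256."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> dict:
    """Stateless check: validate signature and expiry locally, no token store."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims
```

<p>A resource server holding the verification key can accept or reject any incoming token without a round trip to the issuer, which is exactly the decoupling described above.</p>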
<p>With tokens not being opaque but locally introspectable, clients can also retrieve additional information (if present)
about the corresponding identity directly from the token without needing to call another remote API.</p>
<h3>Stateful vs. Stateless</h3>
<p>Nowadays in a Web that is mainly characterized by a wide-spread transition from monolithic legacy apps to decoupled
microservices, a centralized token overlord service can be described as an additional burden. The purpose of JWT is to
obviate the need for such a centralistic approach.</p>
<p>However, there again is no silver bullet and JWTs aren’t Swiss Army knives. Stateful authentication has its righteous
place. If you really need a central authentication system (e.g. to fulfil restrictive auditing requirements) or if you
simply don’t trust people <a href="https://www.chosenplaintext.ca/2015/03/31/jwt-algorithm-confusion.html">or libraries</a> to
correctly verify your JWTs, a stateful overlord approach is still the way to go and there is nothing wrong with it.</p>
<p>In my opinion, you probably shouldn’t mix both approaches. To shortly answer the questions above:</p>
<ul>
<li>There is no way of invalidating/revoking a JWT (and <a href="https://www.dinochiesa.net/?p=1388">I don’t see the point</a>),
except if you just use it as yet another random string within a stateful authenticating system.</li>
<li>There is no way of altering an issued JWT, so prolonging its expiration date is likewise not possible.</li>
<li>You could use JWTs if they really help you in solving your issues. You don’t have to use them. You can also keep
your opaque tokens.</li>
</ul>
<p>If you have further comments regarding the purpose of JWT or if you think I missed something important, do not hesitate
to drop me a message <a href="https://www.twitter.com/jbspeakr">via Twitter</a>. I also appreciate feedback and further discussion.
Thanks!</p>
<p><em>This article was originally published on Jan's blog
<a href="https://www.jbspeakr.cc/purpose-jwt-stateless-authentication/">here</a>.</em></p>Closing the Data-Quality Loop2017-07-18T00:00:00+02:002017-07-18T00:00:00+02:00Hunter Kellytag:engineering.zalando.com,2017-07-18:/posts/2017/07/closing-the-data-quality-loop.html<p>Producing high quality validation corpora without the traditional time and cost inefficiencies.</p><p>To be able to measure the quality of some of the machine learning models that we have at Zalando, “Golden Standard”
corpora are required. However, creating a “Golden Standard” corpus is often laborious, tedious and time-consuming.
Thus, a method is needed to produce high quality validation corpora but without the traditional time and cost
inefficiencies.</p>
<h3><strong>Motivation</strong></h3>
<p>As the Zalando Dublin Fashion Content Platform (FCP) continues to grow, we now have many different types of machine
learning models. As such, we need high quality labelled data sets that we can use to benchmark model performance and
evaluate changes to the model. Not only do we need such data sets for final validation, but going forward, we also need
methods to acquire high-quality labelled data sets for training models. This is becoming particularly clear as we start
working on models for languages other than English.</p>
<p>Creating a “Golden Standard” corpus generally requires a human being to look at something and make some decisions. This
can be quite time consuming, and ultimately quite costly, as it is often the researcher(s) conducting the experiment
that end up doing the labelling. However, the labelling tasks themselves don't always require much prior knowledge, and
could be done by anyone reasonably computer literate. In this era of crowdsourcing platforms such as <a href="https://www.mturk.com/mturk/welcome">Amazon's
Mechanical Turk</a> and <a href="http://www.crowdflower.com">CrowdFlower</a>, it makes sense to
leverage these platforms to try to create these high quality data sets at a reasonable cost.</p>
<h3><strong>Background</strong></h3>
<p>Back when we first created our English language Fashion Classifier, we bootstrapped our labelled data by using the (now
defunct) <a href="http://www.dmoz.org">DMOZ</a>, also known as the Open Directory Project. This was a site where volunteers, since
1998, were hand categorizing websites and webpages. A web page could live under one or more "Categories". Using a
snapshot of the site, we took any web pages/sites that had a category that contained the word "fashion" anywhere in its
name. This became our “fashion” dataset. We then also took a number of webpages and sites from categories like "News",
"Sports", etc, to create our “non-fashion” dataset.</p>
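<p>That bootstrapping step amounts to a simple category filter. A sketch of the idea in Python (the data shape and function name are hypothetical, not the actual pipeline code):</p>

```python
def split_by_category(pages):
    """Partition (url, categories) pairs with the DMOZ bootstrap rule:
    any category name containing 'fashion' puts the page in the fashion
    set; everything else goes to the non-fashion set."""
    fashion, non_fashion = [], []
    for url, categories in pages:
        if any("fashion" in c.lower() for c in categories):
            fashion.append(url)
        else:
            non_fashion.append(url)
    return fashion, non_fashion
```

<p>As the evaluation later in this article shows, labels derived this way turned out to carry roughly 20% noise on the fashion side.</p>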
<p>Taking these two sets of links, and with the assumption that they would be noisy, but "good enough", we generated our
data sets and went about building our classifier. And from all appearances, the data was "good enough". We were able
to build a classifier that performed well on the validation and test sets, as well as on some small, hand-crafted sanity
test sets. But now, as we circle around, creating classifiers in multiple languages and for different purposes, we want
to know:</p>
<ul>
<li>What is our data processing quality, assessed against real data?</li>
<li>When we train a new model, is this new model better? In what ways is it better?</li>
<li>How accurate were our assumptions regarding "noisy but good enough"?</li>
<li>Do we need to revisit our data acquisition strategy, to reduce the noise?</li>
</ul>
<p>And of course, the perennial question for any machine learning practitioner:</p>
<ul>
<li>How can I get more data??!?</li>
</ul>
<h3><strong>Approach</strong></h3>
<p>Given that Zalando already had a trial account with CrowdFlower, it was the natural choice of crowdsourcing platform to
go with. With some help from our colleagues, we were able to get set up and understand the basics of how to use the
platform.</p>
<h3><strong>Side Note: Crowdsourcing is an adversarial system</strong></h3>
<p>Rather than bog down the main explanation of the approach with too many side notes, it is worth mentioning up-front that
crowdsourcing should be viewed as an <em>adversarial system</em>.</p>
<p>CrowdFlower "jobs" work on the idea of "questions", and the reviewer is presented with a number of questions per page.
On each page there will be one "test question", which you must supply. As such, the test questions are viewed as
ground truth and are used to ensure that the reviewers are maintaining a high enough accuracy (configurable) on their
answers.</p>
<p>Always remember, though, that a reviewer wants to answer as many questions as quickly as possible to maximize their
earnings. They will likely only skim the instructions, if they look at them at all. It is important to consider
accuracy thresholds and to design your jobs such that they cannot be easily gamed. One step that we took, for example,
was to put all the links through a URL shortener ( <a href="https://github.com/retnuh/bulk-url-shortener">see here</a>), so that
the reviewer could not simply look at the url and make a guess; they actually had to open up the page to make a
decision.</p>
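<p>The shortener step boils down to replacing each URL with an opaque token. A minimal sketch of the idea (the linked bulk-url-shortener is a separate tool; the domain and names here are hypothetical):</p>

```python
import secrets

def anonymize_links(urls, base="https://short.example/"):
    """Map opaque short links to the original URLs so reviewers must
    open the page rather than guess the label from the URL text."""
    return {base + secrets.token_urlsafe(8): url for url in urls}
```

<p>Only the short links are shown in the job; the mapping is kept server-side to resolve answers back to the original URLs.</p>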
<h3><strong>Initial Experiments</strong></h3>
<p>We created a very simple job that contained 10 panels with a link and a dropdown, as shown below.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/179ac8bb932707077ae3f28e8e76d54c5ac1f545_initialrunpanel.png?auto=compress,format"></p>
<p>We had a data set of hand-picked links to use as our ground-truth test questions, approximately 90 fashion links, and 45
non-fashion links. We then also picked some of the links we had from our DMOZ data set, and used those to run some
experiments on. Since this was solely about learning how to use the platform, we didn't agonize over this data set; we
just picked 100 nominally fashion links, and 100 nominally non-fashion links, and uploaded those as the data to use for
the questions.</p>
<p>We ran two initial experiments: the first one we had tried to use some of the more exotic, interesting "Quality Control"
settings that CrowdFlower makes available, but we found that the number of "Untrusted Judgements" was far too high
compared to "Trusted Judgements". We simply stopped the job, copied it and launched another.</p>
<p>The second of the initial experiments proved quite promising: we got 200 links classified, with 3 judgements per link
(so 600 trusted judgements in total). The classifications from the reviewers matched the DMOZ labels pretty closely.
All the links where the DMOZ label and the CrowdFlower reviewers disagreed were examined; there was one borderline case
that was understandable, and the rest were actually indicative of the noise we expected to see in the DMOZ labels.</p>
<h3><strong>Key learnings from initial experiments:</strong></h3>
<ul>
<li>Interestingly, we really overpaid on the first job. Dial down the costs until after you've run a few experiments.
If the “Contributor Satisfaction” panel on the main monitoring page has a “good” (green) rating, you’re probably
paying too much.</li>
<li>Start simple. While it is tempting to play with the advanced features right from the get-go, don't. They can cause
problems with your job running smoothly; only add them in if/when they are needed.</li>
<li>You can upload your ground truth questions directly rather than using the UI, see these <a href="https://success.crowdflower.com/hc/en-us/articles/202702985-How-to-Create-Test-Questions#manual">CrowdFlower
docs</a> for more
information.</li>
<li>You can have extra fields in the data you upload that aren't viewed by the reviewer at all; we were then able to use the
CrowdFlower UI to quickly create pivot tables and compare the DMOZ labels against the generated labels.</li>
<li>You can get pretty reasonable results even with minimal instructions.</li>
<li>Design your job such that "bad apples" can't game the system.</li>
<li>It's fast! You can get quite a few results in just an hour or two.</li>
<li>It's cheap! You can run some initial experiments and get a feeling for what the quality is like for very little.
Even with our "massive" overspend on the first job, we still spent less than $10 total on our experiments.</li>
</ul>
<h3><strong>Data Collection</strong></h3>
<p>Given the promising results from the initial experiments, we decided to proceed and collect a "Golden Standard" corpus
of links, with approximately 5000 examples from each class (fashion and non-fashion). Here is a brief overview of the
data collection process:</p>
<ul>
<li>Combine our original DMOZ link seed set with our current seed set</li>
<li>Use this new seed set to search the most recent <a href="http://commoncrawl.org/">CommonCrawl</a> index to generate candidate
links</li>
<li>Filter out any links that had been used in the training or evaluation of our existing classifiers</li>
<li>Sample approximately 10k links from each class: we intentionally sampled more than the target number to account for
inevitable loss</li>
<li>Run the sampled links through a URL shortener to anonymize the urls</li>
<li>Prepared the data for upload to CrowdFlower</li>
</ul>
<h4><strong>Final Runs</strong></h4>
<p>With data in hand, we wanted to make some final tweaks to the job before running it. We fleshed out the instructions
(not shown) with examples and more thorough definitions, even though we realized they would not be read by many. We
upped the minimum accuracy from 70% to 85% (as suggested by CrowdFlower). Finally, we adjusted the text in the actual
panels to explain what to do in borderline or error cases.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/06fb1fc715007fb577abb8f19e2eb3fba41e31fd_finalrunpanel.png?auto=compress,format"></p>
<p>We ran a final experiment against the same 200 links as in the previous experiments. The results were very similar, if
not marginally better than the previous experiment, so we felt confident that the changes hadn't made anything worse.
We then incorporated the classified links as new ground truth test questions (where appropriate) into the final job.</p>
<p>We launched the job, asking for 15k links from a pool of roughly 20k. Why 15k? We wanted 5k links from each class; we
were estimating about 20% noise on the DMOZ labels. We also wanted a high level of agreement, so links that had 3/3
reviewers agreeing. From the previous experiments, we were getting unanimous agreement on about 80% of the links seen.
So 10k + noise + agreement + fudge factor + human predilection for nice round numbers = 15k.</p>
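<p>That back-of-the-envelope estimate can be written out explicitly; the 20% noise and 80% agreement figures are the estimates quoted above:</p>

```python
target = 2 * 5_000        # 5k clean links wanted per class
label_noise = 0.20        # estimated share of mislabelled DMOZ links
unanimity = 0.80          # share of links with unanimous 3/3 agreement

# Links to request so the clean, unanimous subset still hits the target
needed = target / ((1 - label_noise) * unanimity)
print(round(needed))      # 15625, rounded down to a nice 15k
```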
<p>We launched the job in the afternoon; it completed overnight and the results were ready for analysis the next morning,
which leads to...</p>
<h3><strong>Evaluation</strong></h3>
<p>How does the DMOZ data compare to the CrowdFlower data? How good was "good enough"?</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/06a3a1bea71f9c9001a59ae1792d29455dad63e6_table-1.png?auto=compress,format"></p>
<p>We can see two things, right away:</p>
<ol>
<li>
<p>The things in DMOZ that we assumed were mostly not fashion, were, in fact, mostly not fashion. 1.5% noise is pretty
acceptable.</p>
</li>
<li>
<p>Roughly 22% of all our DMOZ "Fashion" links are not fashion. This is pretty noisy, and indicates that it was worth
all the effort of building this properly labelled "Golden Standard" corpus in the first place! There is definitely
room for improvement in our data acquisition strategy.</p>
</li>
</ol>
<p>Now, those percentages change if we only take into account the links where all the reviewers were in agreement; the
noise in the fashion set drops down to 15%. That's still pretty noisy.</p>
<p>So what did we end up with, for use in the final classifier evaluations? Note that the total numbers don't add up to
15k because we simply skipped links that produced errors on fetching, 404s, etc.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cefa4a36aefbab3713358c1d82c31269156ba838_table-2.png?auto=compress,format"></p>
<p>This shows us that, similar to the initial experiments, we had unanimous agreement roughly 80% of the time.</p>
<p><em>Aside: It's interesting to note that both the DMOZ noise and the number of links where opinions were split work out to
about 20%. Does this point to some deeper truth about human contentiousness? Who knows!</em></p>
<p>So what should we use to do our final evaluation? It's tempting to use the clean set of data, where everyone is in
agreement. But on the other hand, we don't want to unintentionally add bias to our classifiers by only evaluating them on
clean data. So <a href="https://www.youtube.com/watch?v=vgk-lA12FBk">why not both?</a> Below are the results of running our old
baseline classifier, as well as our new slimmer classifier, against both the "Unanimous" and "All" data sets.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d10335d92e4bc6229260aa76fbdc76b5e4098eb5_table-3.png?auto=compress,format"></p>
<p>Taking a look at our seeds and comparing that to the returned links, we find that 4,023 of the 15,000 are links in the
seed set, with the following breakdown when we compare against nominal DMOZ labels:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/352fc11d194872430eb6047e15b2ac90f6d195af_table-4.png?auto=compress,format"></p>
<h3><strong>Key Takeaways</strong></h3>
<ul>
<li>Overall, the assumption that the DMOZ was "good enough" for our initial data acquisition was pretty valid. It
allowed us to move our project forward without spending a lot of time agonizing over labelled data.</li>
<li>The DMOZ data <strong>was</strong> quite noisy, however, and could lead to misunderstandings about the actual quality of our
models if used as a "Golden Standard".</li>
<li>Crowdsourcing, and CrowdFlower, in particular, can be a viable way to accrue labelled data quickly and for a
reasonable price.</li>
<li>We now have a "Golden Standard" corpus for our English Fashion Classifier against which we can measure changes.</li>
<li>We now have a methodology for creating not only "Golden Standard" corpora for measuring our current data processing
quality, but a method that can be extended to create larger data sets that can be used for training and validation.</li>
<li>There may be room to improve the quality of our classifier by using a different type of classifier, one that is more
robust in the face of noise in the training data (since we've established that our original training data was quite
noisy).</li>
<li>There may be room to improve the quality of the classifier by creating a less noisy training and validation set.</li>
</ul>
<h3><strong>Conclusion</strong></h3>
<p>Machine Learning can be a great toolkit to use to solve tricky problems, but the quality of data is paramount, not just
for training but also for evaluation. Not only here in Dublin, but all across Zalando, we’re beginning to reap the
benefits of affordable, high quality datasets that can be used for training and evaluation. We’ve just scratched the
surface, and we’re looking forward to seeing what’s next in the pipeline.</p>
<p>If you're interested in the intersection of microservices, stream data processing and machine learning, <a href="https://jobs.zalando.com/?location=Dublin&search=">we're
hiring</a>. Questions or comments? You can find me on Twitter at
<a href="https://twitter.com/retnuh">@retnuH</a>.</p>Complex Event Generation for Business Process Monitoring using Apache Flink2017-07-13T00:00:00+02:002017-07-13T00:00:00+02:00Hung Changtag:engineering.zalando.com,2017-07-13:/posts/2017/07/complex-event-generation-for-business-process-monitoring-using-apache-flink.html<p>We look at the design, implementation, and generation of complex events.</p><p>While developing Zalando’s real-time business process monitoring solution, we encountered the need to generate complex
events upon the detection of specific patterns of input events. In this blog post we describe the generation of such
events using Apache Flink, and share our experiences and lessons learned in the process. You can read more on why we
have chosen Apache Flink over other stream processing frameworks here: <a href="https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/">Apache Showdown: Flink vs.
Spark</a>.</p>
<p>This post is aimed at those familiar with stream processing in general and who have had first experiences working with
Flink. We recommend Tyler Akidau’s blog post <a href="https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101">The World Beyond Batch: Streaming
101</a> to understand the basics of stream processing,
and Fabian Hueske’s <a href="https://flink.apache.org/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache
Flink</a> for the specifics of Flink.</p>
<h3>Business Processes</h3>
<p>To start off, we would like to offer more context on the problem domain. Let’s begin by having a look at the business
processes monitored by our solution.</p>
<p>A <strong>business process</strong> is, in its simplest form, a chain of correlated events. It has a start and a completion event.
See the example depicted below:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1505b80e0ee2e6eb767b8c2ab99cab1ac9f14259_order_created__all_parcels_shipped.png?auto=compress,format"></p>
<p>The <strong>start event</strong> of the example business process is ORDER_CREATED. This event is generated inside Zalando’s platform
whenever a customer places an order. It could have the following simplified JSON representation:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="s">"event_type"</span><span class="p">:</span><span class="w"> </span><span class="s">"ORDER_CREATED"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"event_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="s">"occurred_at"</span><span class="p">:</span><span class="w"> </span><span class="s">"2017-04-18T20:00:00.000Z"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"order_number"</span><span class="p">:</span><span class="w"> </span><span class="mi">123</span>
<span class="p">}</span>
</code></pre></div>
<p>The <strong>completion event</strong> is ALL_PARCELS_SHIPPED. It means that all parcels pertaining to an order have been handed
over for shipment to the logistic provider. The JSON representation is therefore:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="s">"event_type"</span><span class="p">:</span><span class="w"> </span><span class="s">"ALL_PARCELS_SHIPPED"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"event_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">11</span><span class="p">,</span>
<span class="w"> </span><span class="s">"occurred_at"</span><span class="p">:</span><span class="w"> </span><span class="s">"2017-04-19T08:00:00.000Z"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"order_number"</span><span class="p">:</span><span class="w"> </span><span class="mi">123</span>
<span class="p">}</span>
</code></pre></div>
<p>Notice that the events are correlated on <strong>order_number</strong>, and also that they occur in order according to their
<strong>occurred_at</strong> values.</p>
<p>So we can monitor the time interval between these two events, ORDER_CREATED and ALL_PARCELS_SHIPPED. If we specify a
threshold, e.g. 7 days, we can tell for which orders the threshold has been exceeded and then can take action to ensure
that the parcels are shipped immediately, thus keeping our customers satisfied.</p>
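<p>The threshold check itself reduces to comparing two event timestamps. A minimal sketch in plain Java (illustrative names and simplified event shapes, not part of the monitoring solution itself):</p>

```java
import java.time.Duration;
import java.time.Instant;

// Checks whether the interval between a start event and a completion event
// (e.g. ORDER_CREATED and ALL_PARCELS_SHIPPED for one order_number)
// exceeds a configured threshold such as 7 days.
public class ThresholdCheck {
    public static boolean exceeded(Instant start, Instant completion, Duration threshold) {
        return Duration.between(start, completion).compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        Instant orderCreated = Instant.parse("2017-04-18T20:00:00Z");
        Instant allParcelsShipped = Instant.parse("2017-04-19T08:00:00Z");
        // Only 12 hours elapsed, well under the 7-day threshold.
        System.out.println(exceeded(orderCreated, allParcelsShipped, Duration.ofDays(7))); // prints false
    }
}
```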
<h3>Problem Statement</h3>
<p>A <strong>complex event</strong> is an event which is inferred from a pattern of other events.</p>
<p>For our example business process, we want to infer the event ALL_PARCELS_SHIPPED from a pattern of PARCEL_SHIPPED
events, i.e. generate ALL_PARCELS_SHIPPED when all distinct PARCEL_SHIPPED events pertaining to an order have been
received within 7 days. If the received set of PARCEL_SHIPPED events is incomplete after 7 days, we generate the alert
event THRESHOLD_EXCEEDED.</p>
<p>We assume that we know beforehand how many parcels we will ship for a specific order, thus allowing us to determine if a
set of PARCEL_SHIPPED events is complete. This information is contained in the ORDER_CREATED event in the form of an
additional attribute, e.g. <em>"parcels_to_ship": 3.</em></p>
<p>Furthermore, we assume that the events are emitted in order, i.e. the <strong>occurred_at</strong> timestamp of ORDER_CREATED is
smaller than all of the PARCEL_SHIPPED’s timestamps.</p>
<p>Additionally, we require the complex event ALL_PARCELS_SHIPPED to have the timestamp of the last PARCEL_SHIPPED event.</p>
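<p>Taken together, the inference rule for a single order can be sketched in plain Java (names and event shapes are simplified illustrations, not our Flink implementation):</p>

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Decides which complex event to emit for one order, once either all expected
// PARCEL_SHIPPED events have arrived or the 7-day window has elapsed.
public class ComplexEventDecision {
    public record ComplexEvent(String type, Instant occurredAt) {}

    public static ComplexEvent infer(int parcelsToShip, List<Instant> parcelShippedTimes,
                                     Instant windowEnd) {
        if (parcelShippedTimes.size() >= parcelsToShip) {
            // Complete: ALL_PARCELS_SHIPPED carries the timestamp of the
            // last PARCEL_SHIPPED event, as the specification requires.
            Instant last = parcelShippedTimes.stream().max(Comparator.naturalOrder()).orElseThrow();
            return new ComplexEvent("ALL_PARCELS_SHIPPED", last);
        }
        // Incomplete after 7 days: raise the alert event instead.
        return new ComplexEvent("THRESHOLD_EXCEEDED", windowEnd);
    }

    public static void main(String[] args) {
        List<Instant> shipped = List.of(
                Instant.parse("2017-04-19T08:00:00Z"),
                Instant.parse("2017-04-19T09:00:00Z"));
        // Both expected parcels shipped: emits ALL_PARCELS_SHIPPED at 09:00.
        System.out.println(infer(2, shipped, Instant.parse("2017-04-25T20:00:00Z")));
    }
}
```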
<p>The raw specification can be represented through the following flowchart:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/303f8595f80d78c0f6b72555e7acbd2f5054f216_ceg-flowchart1.png?auto=compress,format"></p>
<p>We process all events from separate Apache Kafka topics using Apache Flink. For a more detailed look of our architecture
for business process monitoring, <a href="https://www.slideshare.net/ZalandoTech/stream-processing-using-apache-flink-in-zalandos-world-of-microservices-reactive-summit/33">please have a look
here</a>.</p>
<h3>Generating Complex Events</h3>
<p>We now have all the required prerequisites to solve the problem at hand, which is to generate the complex events
ALL_PARCELS_SHIPPED and THRESHOLD_EXCEEDED.</p>
<p>First, let’s have an overview on our Flink job’s implementation:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e71070614b2798f2448d06139c8eb4594c70dce1_ceg_blog_post_photo.jpg?auto=compress,format"></p>
<ol>
<li>Read the Kafka topics ORDER_CREATED and PARCEL_SHIPPED.</li>
<li>Assign watermarks for event time processing.</li>
<li>Group together all events belonging to the same order, by keying by the correlation attribute, i.e. order_number.</li>
<li>Assign TumblingEventTimeWindows to each unique order_number key with a custom time trigger.</li>
<li>Order the events inside the window upon trigger firing. The trigger checks whether the watermark has passed the
biggest timestamp in the window. This ensures that the window has collected enough elements to order.</li>
<li>Assign a second TumblingEventTimeWindow of 7 days with a custom count and time trigger.</li>
<li>Fire by count and generate ALL_PARCELS_SHIPPED or fire by time and generate THRESHOLD_EXCEEDED. The count is
determined by the "parcels_to_ship" attribute of the ORDER_CREATED event present in the same window.</li>
<li>Split the stream containing events ALL_PARCELS_SHIPPED and THRESHOLD_EXCEEDED into two separate streams and write
those into distinct Kafka topics.</li>
</ol>
<p>The simplified code snippet is as follows:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// 1</span>
<span class="nx">List</span><span class="w"> </span><span class="nx">topicList</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">new</span><span class="w"> </span><span class="nx">ArrayList</span><span class="p"><>();</span>
<span class="nx">topicList</span><span class="p">.</span><span class="nx">add</span><span class="p">(</span><span class="s">"ORDER_CREATED"</span><span class="p">);</span>
<span class="nx">topicList</span><span class="p">.</span><span class="nx">add</span><span class="p">(</span><span class="s">"PARCEL_SHIPPED"</span><span class="p">);</span>
<span class="nx">DataStream</span><span class="w"> </span><span class="nx">streams</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">env</span><span class="p">.</span><span class="nx">addSource</span><span class="p">(</span>
<span class="w"> </span><span class="nx">new</span><span class="w"> </span><span class="nx">FlinkKafkaConsumer09</span><span class="p"><>(</span><span class="nx">topicList</span><span class="p">,</span><span class="w"> </span><span class="nx">new</span><span class="w"> </span><span class="nx">SimpleStringSchema</span><span class="p">(),</span><span class="w"> </span><span class="nx">properties</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">flatMap</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">JSONMap</span><span class="p">())</span><span class="w"> </span><span class="c1">// parse Strings to JSON</span>
<span class="c1">// 2-5</span>
<span class="nx">DataStream</span><span class="w"> </span><span class="nx">orderingWindowStreamsByKey</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">streams</span>
<span class="w"> </span><span class="p">.</span><span class="nx">assignTimestampsAndWatermarks</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">EventsWatermark</span><span class="p">(</span><span class="nx">topicList</span><span class="p">.</span><span class="nx">size</span><span class="p">()))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">keyBy</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">JSONKey</span><span class="p">(</span><span class="s">"order_number"</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">window</span><span class="p">(</span><span class="nx">TumblingEventTimeWindows</span><span class="p">.</span><span class="nx">of</span><span class="p">(</span><span class="nx">Time</span><span class="p">.</span><span class="nx">days</span><span class="p">(</span><span class="mi">7</span><span class="p">)))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">trigger</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">OrderingTrigger</span><span class="p"><>())</span>
<span class="w"> </span><span class="p">.</span><span class="nx">apply</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">CEGWindowFunction</span><span class="p"><>());</span>
<span class="c1">// 6-7</span>
<span class="nx">DataStream</span><span class="w"> </span><span class="nx">enrichedCEGStreams</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">orderingWindowStreamsByKey</span>
<span class="w"> </span><span class="p">.</span><span class="nx">keyBy</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">JSONKey</span><span class="p">(</span><span class="s">"order_number"</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">window</span><span class="p">(</span><span class="nx">TumblingEventTimeWindows</span><span class="p">.</span><span class="nx">of</span><span class="p">(</span><span class="nx">Time</span><span class="p">.</span><span class="nx">days</span><span class="p">(</span><span class="mi">7</span><span class="p">)))</span>
<span class="w"> </span><span class="p">.</span><span class="nx">trigger</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">CountEventTimeTrigger</span><span class="p"><>())</span>
<span class="w"> </span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">ReduceFunction</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">v1</span><span class="p">,</span><span class="w"> </span><span class="nx">v2</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nx">v2</span><span class="p">);</span><span class="w"> </span><span class="c1">// always return last element</span>
<span class="c1">// 8</span>
<span class="nx">enrichedCEGStreams</span>
<span class="w"> </span><span class="p">.</span><span class="nx">flatMap</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">FilterAllParcelsShipped</span><span class="p"><>())</span>
<span class="w"> </span><span class="p">.</span><span class="nx">addSink</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">FlinkKafkaProducer09</span><span class="p"><>(</span><span class="nx">Config</span><span class="p">.</span><span class="nx">allParcelsShippedType</span><span class="p">,</span>
<span class="w"> </span><span class="nx">new</span><span class="w"> </span><span class="nx">SimpleStringSchema</span><span class="p">(),</span><span class="w"> </span><span class="nx">properties</span><span class="p">)).</span><span class="nx">name</span><span class="p">(</span><span class="s">"sink_all_parcels_shipped"</span><span class="p">);</span>
<span class="nx">enrichedCEGStreams</span>
<span class="w"> </span><span class="p">.</span><span class="nx">flatMap</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">FilterThresholdExceeded</span><span class="p"><>())</span>
<span class="w"> </span><span class="p">.</span><span class="nx">addSink</span><span class="p">(</span><span class="nx">new</span><span class="w"> </span><span class="nx">FlinkKafkaProducer09</span><span class="p"><>(</span><span class="nx">Config</span><span class="p">.</span><span class="nx">thresholdExceededType</span><span class="p">,</span>
<span class="w"> </span><span class="nx">newSimpleStringSchema</span><span class="p">(),</span><span class="w"> </span><span class="nx">properties</span><span class="p">)).</span><span class="nx">name</span><span class="p">(</span><span class="s">"sink_threshold_exceeded"</span><span class="p">);</span>
</code></pre></div>
<h3>Challenges and Learnings</h3>
<p><strong>The firing condition for CEG requires ordered events</strong></p>
<p>As per our problem statement, we need the ALL_PARCELS_SHIPPED event to have the event time of the last PARCEL_SHIPPED
event. The firing condition of the CountEventTimeTrigger thus requires the events in the window to be in order, so we
know which PARCEL_SHIPPED event is last.</p>
<p>We implement the ordering in steps 2-5. As each element arrives, the keyed state stores the biggest timestamp seen
so far. At the registered time, the trigger checks whether the watermark is greater than this biggest timestamp. If so,
the window has collected enough elements for ordering. We ensure this by letting the watermark progress only to the
earliest timestamp among all events. Note that ordering events is expensive in terms of the size of the window state,
which keeps them in-memory.</p>
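<p>Stripped of Flink's trigger API, the core firing condition is a single comparison. A standalone sketch (illustrative names, timestamps as epoch milliseconds):</p>

```java
// A keyed window may only be sorted and emitted once the watermark has
// passed the biggest event timestamp seen in it, i.e. no event that belongs
// before the last collected one can still arrive.
public class OrderingCondition {
    public static boolean readyToFire(long watermark, long maxTimestampInWindow) {
        return watermark > maxTimestampInWindow;
    }

    public static void main(String[] args) {
        System.out.println(readyToFire(100L, 90L)); // prints true: safe to order and fire
        System.out.println(readyToFire(80L, 90L));  // prints false: earlier events may still arrive
    }
}
```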
<p><strong>Events arrive in windows at different rates</strong></p>
<p>We read our event streams from two distinct Kafka topics: ORDER_CREATED and PARCEL_SHIPPED. The former is much bigger
than the latter in terms of size. Thus, the former is read at a slower rate than the latter.</p>
<p>Events arrive in the window at different speeds. This impacts the implementation of the business logic, particularly the
firing condition of the OrderingTrigger. It waits for both event types to reach the same timestamps by keeping the
smallest seen timestamp as the watermark. The events pile up in the windows’ state until the trigger fires and purges
them. Specifically, if events in the topic ORDER_CREATED start from January 3rd and the ones in PARCEL_SHIPPED
start from January 1st, the latter will pile up and only be purged after Flink has processed the former at January
3rd. This consumes a lot of memory.</p>
<p><strong>Some generated events will be incorrect at the beginning of the computation</strong></p>
<p>We cannot have an unlimited retention time in our Kafka queue due to finite resources, so events expire. When we start
our Flink jobs, the computation will not take into account those expired events. Some complex events will either not be
generated or will be incorrect because of the missing data. For instance, missing PARCEL_SHIPPED events will result in
the generation of a THRESHOLD_EXCEEDED event, instead of an ALL_PARCELS_SHIPPED event.</p>
<p><strong>Real data is big and messy. Test with sample data first</strong></p>
<p>At the beginning, we used real data to test our Flink job and reason about its logic. We found its use inconvenient and
inefficient for debugging the logic of our triggers. Some events were missing or their properties were incorrect. This
made reasoning unnecessarily difficult for the first iterations. Soon after, we implemented a custom source function,
simulated the behaviour of real events, and investigated the generated complex events.</p>
<p><strong>Data is sometimes too big for reprocessing</strong></p>
<p>The loss of the complex events prompts the need to generate them again by reprocessing the whole Kafka input topics,
which for us hold 30 days of events. This reprocessing proved to be infeasible for us. Because the firing condition for
CEG needs ordered events, and because events are read at different rates, our memory consumption grows with the time
interval of events we want to process. Events pile up in the windows’ state and await the watermark progression so that
the trigger fires and purges them.</p>
<p>We used AWS EC2 t2.medium instances in our test cluster with 1GB of allocated RAM. We observed that we can reprocess, at
most, 2 days' worth of events without TaskManager crashes due to OutOfMemory exceptions. Therefore, we implemented
additional filtering on earlier events.</p>
<h3>Conclusion</h3>
<p>Above we have shown you how we designed and implemented the complex events ALL_PARCELS_SHIPPED and
THRESHOLD_EXCEEDED. We have shown how we generate these in real-time using Flink’s event time processing capabilities.
We have also presented the challenges we’ve encountered along the way and have described how we met those using Flink’s
powerful event time processing features, i.e. watermark, event time windows and custom triggers.</p>
<p>Advanced readers will be aware of the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/libs/cep.html">CEP
library</a> Flink offers. When we started
with our use cases (Flink 1.1) we determined that they could not be easily implemented with it. We believed that full
control of the triggers gave us more flexibility when refining our patterns iteratively. In the meantime, the CEP
library has matured and in the upcoming Flink 1.4 it will also support <a href="https://issues.apache.org/jira/browse/FLINK-6418">dynamic state changes in CEP
patterns</a>. This will make implementations of use cases similar to ours
more convenient.</p>
<p>If you have any questions or feedback you’d like to share, please get in touch. You can reach us via e-mail:
<a href="mailto:hung.chang@zalando.de">hung.chang@zalando.de</a> and <a href="mailto:mihail.vieru@zalando.de">mihail.vieru@zalando.de</a>.</p>The Modern Architecture of Search2017-06-27T00:00:00+02:002017-06-27T00:00:00+02:00Alaa Elhadbatag:engineering.zalando.com,2017-06-27:/posts/2017/06/the-modern-architecture-of-search.html<p>Discussing the components needed to solve the problems of IR in web platforms.</p><p>Information Retrieval (IR) systems are a vital component in the core of successful modern web platforms, and Zalando
understands their importance incredibly well.</p>
<p>The main goal of IR systems is to provide a communication layer that enables customers to establish a retrieval dialogue
with underlying data.</p>
<p>The immense explosion of unstructured data drives modern search applications to go beyond just fuzzy string matching, to
invest in deep understanding of user queries through interpretation of user intention in order to respond with a
relevant result set.</p>
<p>The modern architecture of search is a design of a data-driven IR system that covers the following:</p>
<ul>
<li>Data ingestion pipelines from various sources</li>
<li>Data retrieval and the lifecycle of a user search query</li>
<li>Machine-learned relevance ranking</li>
<li>Personalized search</li>
<li>Search performance tracking and quality assessment</li>
</ul>
<p>At the recent Berlin Buzzwords conference this month, we discussed the components needed to build an ecosystem that is
designed to solve the problems of IR in web platforms. What role can Machine Learning play in search relevancy? How can
natural language processing help provide a solid understanding of search phrases? How can data drive a personalized
search experience? And finally, what are the challenges of maintaining such a complex system?</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/db5e75c13ebd99d4719a793e9a28778c1560c953_architecture.png?auto=compress,format"></p>
<p>Watch as we reveal those answers and more below.</p>PostgreSQL in a time of Kubernetes2017-06-21T00:00:00+02:002017-06-21T00:00:00+02:00Jan Mußlertag:engineering.zalando.com,2017-06-21:/posts/2017/06/postgresql-in-a-time-of-kubernetes.html<p>Read about how we created a useful set of projects for operating PostgreSQL on Kubernetes.</p><p>A lot of time has passed at Zalando since the first services were started backed by PostgreSQL 9.0-rc1. Despite the
adoption of other technologies, PostgreSQL
<a href="https://www.slideshare.net/try_except_/goto-2013whyzalandotrustsinpostgre-sql20131018">remains</a> the preferred
relational database for most engineers around. You can follow some of the developments around PostgreSQL <a href="https://tech.zalando.com/blog/?tags=postgresql">on the
blog</a> and also on GitHub where we share most of our PostgreSQL-related
tooling.</p>
<p>Let’s start with a quick look at PostgreSQL on AWS. When Zalando Tech began its transition to AWS, the
<a href="https://stups.io/">STUPS</a> landscape and tooling was created. For the ACID team (the database engineering team), the
most relevant changes were that applications had to run in Docker and that EC2 instances might be slightly less reliable
than what we were used to.</p>
<p>At scale and in the cloud, automation is key. The ACID team started the work on
<a href="https://github.com/zalando/patroni">Patroni</a>, today Zalando’s most popular open source GitHub project, to take care of
PostgreSQL deployments and manage high availability, among other valuable features. The next step was
<a href="https://github.com/zalando/spilo">Spilo</a>, packaging Patroni and PostgreSQL into a single Docker image and providing
guidance on how to deploy database clusters using AWS CloudFormation templates.</p>
<p>Today teams have the choice of deploying PostgreSQL either with AWS RDS or Spilo. We are convinced that Spilo is a more
flexible solution, providing more control to teams, although often the one-click RDS service is more compelling. We feel
that our own PostgreSQL solution gives us more control and more flexibility, but this is not always required.</p>
<p>However automated our deployment became, we did not focus on the last step, which is automating the initial request for
a cluster. Somewhere between the team wanting a PostgreSQL database and the database team creating it was still a
ticketing system. This had to change. Initial work on a REST service to trigger Spilo deployments on plain AWS/EC2 was
scrubbed in favor of a new solution using Kubernetes, believing that this is the future platform to run on and benefiting
from its feature set, notably a stable API and declarative deployment descriptions. Kubernetes today runs on various
cloud providers, opening the solution up to a bigger target audience with less lock-in.</p>
<h3>Current status</h3>
<p>Let’s take a look at what we are currently developing and working on as open source products. First, we will briefly
touch on the PostgreSQL operator and its tiny user interface and then look into the pg_view web version.</p>
<p>Kubernetes provides so-called third party objects, allowing us to store YAML documents within Kubernetes itself and act
upon their changes. Using those third party objects to describe PostgreSQL clusters, we started working on the operator
that picks up the YAML definitions and transforms them into Kubernetes resources needed to run and expose PostgreSQL
clusters to our engineers. This concept will later allow us to easily configure and provision PostgreSQL into production
environments with a common deployment pipeline that relies solely on the Kubernetes API, basically triggering PostgreSQL
cluster setup from engineers committing to Git.</p>
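<p>For illustration, a cluster definition of the rough shape such an operator consumes could look like the following. The field names here are indicative only, not the operator's exact schema; see the postgres-operator repository for the real one.</p>

```yaml
# Illustrative sketch only: a minimal PostgreSQL cluster definition of the
# kind the operator could pick up and turn into Kubernetes resources.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-example-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 5Gi
  postgresql:
    version: "9.6"
```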
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5bb124543023a757c41dc7d56822892fa464b22d_postgreskubernetes1.png?auto=compress,format"></p>
<p>Writing a YAML file is pretty easy, but it turned out that a user interface which gets engineers a cluster even more
quickly was a good idea and less error prone. Thus, we wrote a very small RiotJS user interface for engineers to create PostgreSQL
clusters and provide them with feedback on how far the cluster creation is progressing. As one basically only works
against the nice Kubernetes API, this was not much work in the end.</p>
<p>The next thing we learned is that once you have a UI, engineers create clusters with incomplete or tiny
misconfigurations, forcing us to quickly add the first features for changing the cluster configuration and to test
the idea of the operator in production. Making a change means updating the third party object and letting the
postgres-operator update the Kubernetes resources.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/66ec4046df61f0f1e3c210b257e91e4ac9d2dbff_postgreskubernetes2.png?auto=compress,format"></p>
<p>Thanks to the work done in Patroni, tackling deployment, configuration, failover and recovery from S3 for example, the
deployment of a database is only a part of what users expect from PostgreSQL as a service. Maintenance and monitoring
are equally essential and most likely to require more work and attention.</p>
<h3>Monitoring</h3>
<p>Earlier we released our console tool <a href="https://github.com/zalando/pg_view">pg_view</a> to monitor PostgreSQL clusters in
“real-time”; however, by its nature it required users to have SSH access to the machine, something no longer possible
and not desirable for every engineer. When we discussed the options, not everyone was immediately on board with the idea
of moving this to a web-based solution, but <a href="https://github.com/CyberDem0n/bg_mon">one of our engineers</a> had already done
the heavy lifting: a custom PostgreSQL extension lingering in his GitHub repos already provided all the metric and
query data via a single HTTP endpoint. We quickly implemented a tiny prototype UI showing the same data previously visible
in the terminal and, as we received good feedback on the idea, decided to stick with it.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7c0e27798a0bfacbf72d90368320a02dbdf46f82_postgreskubernetes3.png?auto=compress,format"></p>
<p>While this provided the critical insights into running queries and system metrics, we also reworked the
<a href="https://github.com/zalando/zmon">ZMON</a>-based coverage of our PostgreSQL clusters. ZMON checks track the basic metrics
one expects from AWS instances/Kubernetes Pods: CPU and memory, along with storage metrics and monitoring for free disk
space. Additionally, we also started to track PostgreSQL internal metrics from tables and indexes to give engineers a
better impression on how tables and indexes were growing, as well as how and where sequential scan or index scan
patterns changed over time.</p>
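<p>The sequential-scan vs. index-scan tracking mentioned above can be sketched as a simple check over the counters PostgreSQL exposes in its <code>pg_stat_user_tables</code> view (columns <code>seq_scan</code> and <code>idx_scan</code>). The rows below are made-up sample data standing in for a live query result; the threshold is an arbitrary illustration, not a ZMON default.</p>

```python
SAMPLE_ROWS = [
    # (table_name, seq_scan, idx_scan) as reported by pg_stat_user_tables
    ("orders", 12, 48000),
    ("audit_log", 9500, 120),
]

def seq_scan_heavy(rows, threshold=0.5):
    """Return tables where sequential scans dominate beyond `threshold`."""
    flagged = []
    for name, seq, idx in rows:
        total = seq + idx
        if total and seq / total > threshold:
            flagged.append(name)
    return flagged

print(seq_scan_heavy(SAMPLE_ROWS))  # -> ['audit_log']
```

<p>Tracking how this ratio shifts over time is what tells a team that, say, a dropped or unused index has silently turned point lookups into full-table scans.</p>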
<h3>What’s in it for you?</h3>
<p>We have already open sourced the <a href="https://github.com/zalando-incubator/postgres-operator">operator</a> and are investing
more time to improve its feature set as we speak. Shortly, we will also release the user interface for “creating” the
third party resources to trigger PostgreSQL clusters. Our pg_view web version will also arrive soon.</p>
<p>From our point of view, the above creates a very useful set of projects around operating PostgreSQL on Kubernetes. Keep
an eye out for new repositories in the Zalando <a href="https://github.com/zalando-incubator">Incubator</a>, or contact me via
Twitter at <a href="https://twitter.com/JanMussler">@JanMussler</a> if you have further questions. Interested in joining us? <a href="https://jobs.zalando.com/jobs/570376-database-engineer-postgresql-cassandra/">We're
hiring</a>.</p>Quantitative UX Research – How Can it Complement our Customer Insights?2017-06-15T00:00:00+02:002017-06-15T00:00:00+02:00Dr. Franziska Susanne Rothtag:engineering.zalando.com,2017-06-15:/posts/2017/06/quantitative-ux-research--how-can-it-complement-our-customer-insights.html<p>Our UX research team has started to uncover the possibilities that quantitative UX research offers.</p><p>Most people associate the term UX research with qualitative methods, for example, interviews with a small number of
participants. These interviews are used to discover things such as customer problems, usability issues with a product,
and customer journeys. Often, we concentrate on observable behavior by watching and interviewing customers while they
actually use the product or prototype.</p>
<p>Observable behavior is the most important aspect, because what users say and what they do can be two quite different
things. The focus on behavior is also the reason why UX researchers are the ones asking the “why” questions: Why do
users behave in a certain way, is it because they do not understand how it works? Is it because they do not want to do
it? Is it because they expect something different? Is it because it does not solve a problem they are facing? To answer
these questions, we keep the abilities, motivations, and experiences of the individual in mind. For example: A person
motivated to search for a specific product will behave differently on-site than a person motivated to search for
inspiration. However, UX research offers more than just qualitative methods; big opportunities lie in combining this
qualitative approach with quantitative methods and thinking. But what does such “quantitative UX research” entail and
how can it help us understand more of our customers’ behavior?</p>
<h3><strong>Diving deeper into quantitative UX research</strong></h3>
<p>To start answering this question, I would like to give an example through a project we conducted in the User Research
Team where we combined qualitative and quantitative methodology. We wanted to better understand how users experience the
Editorial section of our Fashion Store and how we can improve the section to make it more inspiring and engaging for
users. For this, we first invited users to our in-house Research Lab, watched them use the Editorial section and its
subparts, and asked questions about their expectations, problems on-site, and overall understanding of the content and
copy. We expected different reactions to our site based on whether a person considers him-/herself highly fashion
competent or not (i.e., considers themselves knowledgeable in the area). Therefore, the recruiting of participants
focussed on diversity regarding this personality trait.</p>
<p>Based on the results from our qualitative interviews, we had new research questions and hypotheses that we wanted to
explore and test further, e.g.: Users with a high fashion competence use other sources for fashion inspiration compared
to users with a low fashion competence. For this, we used a quantitative survey.</p>
<p>In the survey, we measured fashion competence again and looked at our users’ understanding of the Editorial section as
well as their awareness of inspirational content on Zalando. This helped us to not only validate some of our findings
from our interviews, but also gain additional insights into how much and what influence fashion competence really has on
the perception and consumption of inspirational content.</p>
<p>In the future, we could also rely more on on-site data in such a study. In a survey, behavioral data is by definition
not as precise, because it is based on the memory of the participants. However, pairing survey data on
emotions/motivations/intents/personality traits that might influence behavior and actual on-site behavior data could
generate completely new knowledge for areas such as helping to develop personalization features or tailoring them for
shopping intent. We are working on making this possible as we speak.</p>
<h3><strong>What does this tell us?</strong></h3>
<p>The example of our editorial research shows that quantitative UX research uses the same mindset as qualitative UX
research: Focusing on behavior (how did they use what) while keeping individual traits in mind (e.g., their fashion
competence, emotions, motivations etc.). Quantitative UX doesn’t merely look at “how much”, but also offers answers on
the question of “why”. This is mainly done through combining data on behavior (either in the past or in the moment) with
insights on the person. However, in quantitative UX research we do this with standardized measures and in bigger
samples. Here, we are not looking at three or five participants any more, but at 50 participants or more. This, in the
end, merits analyses that use descriptive as well as inductive statistics, enabling us to reliably test hypotheses that
were generated in our qualitative research.</p>
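<p>The "inductive statistics" step can be illustrated with a two-sample Welch t statistic, computed here with only the standard library. The scores are invented for illustration (e.g. an awareness rating for low- vs. high-fashion-competence groups), not data from the actual study.</p>

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se = math.sqrt(va / len(a) + vb / len(b))                # standard error of the difference
    return (ma - mb) / se

# Invented awareness ratings for two groups of participants
low_competence = [3.1, 2.8, 3.5, 3.0, 2.9, 3.2]
high_competence = [4.0, 4.3, 3.9, 4.1, 4.4, 3.8]
print(round(welch_t(low_competence, high_competence), 2))
```

<p>With samples of 50+ participants, a statistic like this (plus the matching p-value from a t distribution) lets the team reliably accept or reject hypotheses generated in the qualitative interviews, rather than eyeballing differences.</p>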
<p>Other research methods that are typical for quantitative UX research, apart from surveys, are those associated with
remote testing (e.g., card sorting and unmoderated task based tests). Here, small scale A/B tests (meaning: comparison
between different variants) are also possible to compare customer impressions of different product versions before they
are finalized, for example. While such tests are not as reliable as a classic A/B test when it comes to statistical power
(i.e., how likely it is to discover a significant effect), sample size, and measuring KPIs like Conversion Rate, they
can be a low-key solution for comparing different mock-ups or prototypes early on. Furthermore, existing solutions that are
not yet live in every country can be shown to customers without actually launching them.</p>
<p>Remote testing studies also offer the possibility to ask customers about their experience, granting additional
insights into why one version performed better than the other. This is something we have recently done by conducting such
comparisons with French customers. We researched their perception of different sizing help features (i.e., size chart,
size recommendation, etc.) that were not yet live in France. This way, we were able to establish a ranking of which
feature was considered most and least helpful, as well as the reasons for this opinion.</p>
<h3><strong>Final thoughts</strong></h3>
<p>Insights from quantitative UX research complement the picture of the customer that we gain through qualitative UX
research, sometimes by adding numbers on phenomena we saw before, and sometimes by enabling comparisons between
customers or products. It can also build new bridges to market research or A/B testing by speaking a more similar
language in ours and their results. However, there are also risks involved: Getting lost in numbers instead of really
listening to what the customer says while using the product.</p>
<p>The UX research team at Zalando has only just started to uncover all the possibilities that quantitative UX research
offers. We are excited about the road ahead. If you have any questions about our thoughts and processes, or would like
to contribute your own ideas regarding quantitative UX research, feel free to get in contact via email at
<a href="mailto:franziska.susanne.roth@zalando.de">franziska.susanne.roth@zalando.de</a>, I would love to hear from you.</p>Signalling Your Jenkins Build Status with a Mini USB Traffic Light2017-06-08T00:00:00+02:002017-06-08T00:00:00+02:00Julian Heisetag:engineering.zalando.com,2017-06-08:/posts/2017/06/signalling-your-jenkins-build-status-with-a-mini-usb-traffic-light.html<p>Raise awareness of your failing builds, like we are, with this handy tutorial.</p><p>As part of an effort to increase developer awareness of quality, we wanted to draw attention to the fact that you should
have healthy CI builds. The normal procedure revolved around emails sent to the individuals who broke the build with
their last commit. With almost all of us used to receiving a lot of email-noise throughout the day, this is not a
channel where you can expect an immediate reaction.</p>
<p>We wanted some means to alert a team of their failing build that is prominent and not prone to getting lost in the
background. We figured it should be some kind of hardware - maybe an alert light or flashing LED or something… something
that is able to draw your attention, but not be too annoying at the same time.</p>
<p><strong>Finding the right solution</strong></p>
<p>First drafts considered <a href="https://blink1.thingm.com/">blink(1)</a> - a small LED that is plugged into a USB slot and then
controlled via an API. It can display a whole bunch of colours and flash at different rhythms. The downside is that it
is very small and thus not prominent enough to reliably catch the attention of a whole team.</p>
<p>There were also discussions about single color lamps and alert lights, but we found them too annoying or just plain
boring. The <a href="https://github.com/codedance/Retaliation">Retaliation</a> approach with a USB rocket launcher is hilarious,
though a bit too difficult to set up and maintain for the number of teams in our company.</p>
<p>So, we came up with the Traffic Light that changes its color based on the outcome of one or more Jenkins jobs: green for
passing, red for failing (now, who would have thought that?). As there wasn’t anything similar available at the time, we
built it ourselves. The result is an <a href="https://github.com/zalando-incubator/build-status-traffic-light">open source
release</a> that natively supports Jenkins and Travis CI,
but it’s also able to draw data from other endpoints by means of Regex Evaluation.</p>
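<p>The core decision the Traffic Light makes can be sketched in a few lines: reduce the statuses of all watched CI jobs to a single lamp color. The status strings below are assumptions loosely modeled on Jenkins result names, not the project's exact values.</p>

```python
def light_color(job_statuses):
    """Red if any job failed, yellow if any is still building, else green."""
    if any(s == "FAILURE" for s in job_statuses):
        return "red"
    if any(s == "BUILDING" for s in job_statuses):
        return "yellow"
    return "green"

print(light_color(["SUCCESS", "FAILURE", "SUCCESS"]))  # -> red
```

<p>The "regex evaluation" mode mentioned above would simply map whatever a custom endpoint returns onto these same status values before this aggregation step.</p>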
<p><strong>How did we go about it?</strong></p>
<p>If you are interested in setting up the Traffic Light for your team, there is a comprehensive guide in the
<a href="https://github.com/zalando-incubator/build-status-traffic-light/blob/master/readme.md">README</a>. The current package is
supposed to run as a daemon on Linux systems. Most of the installation is handled automatically, but one of the
dependencies has to be built manually (covered in the setup guide). Once you have it running, you can add your Jenkins
(and other) jobs easily by providing a JSON file.</p>
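<p>As a rough idea of what such a JSON job file might look like, here is a hypothetical example parsed with the standard library. The schema (keys, URLs) is invented for illustration; the project README documents the real format.</p>

```python
import json

raw = """
{
  "jobs": [
    {"name": "team-api-master", "url": "https://jenkins.example.org/job/team-api-master"},
    {"name": "team-ui-master",  "url": "https://jenkins.example.org/job/team-ui-master"}
  ]
}
"""
jobs = json.loads(raw)["jobs"]
print([j["name"] for j in jobs])  # -> ['team-api-master', 'team-ui-master']
```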
<p>For our internal rollout we used a Raspberry Pi 2 to run the daemon on Raspbian, because the Pi has low energy
consumption and runs independently from our developers’ workstations. The total cost per Traffic Light is about 80€
overall (including the Raspberry Pi).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/00bbc4a66572e91b970fb721e6f41a38c2d5f06c_img_20160630_155142.jpg?auto=compress,format"></p>
<p><strong>Summary and results</strong></p>
<p>With the introduction of the Traffic Light for certain developer teams, we intended to pique curiosity and eventually
attract attention for solid automated software testing. Additionally we wanted to reward teams who already made an
effort towards testing.</p>
<p>Was the project worth it? Absolutely! For a price tag of under 100€ we put together a device that helps raise awareness
for failing builds. Almost everybody introduced to the idea asked us how they could get their own Traffic Light.</p>
<p>If you have any further questions regarding the project, feel free to reach out via email to <a href="mailto:julian.heise@zalando.de">julian.heise@zalando.de</a>.</p>Behind Project Deadlines: Estimations for a Shared Understanding2017-05-30T00:00:00+02:002017-05-30T00:00:00+02:00Fausto Sanninotag:engineering.zalando.com,2017-05-30:/posts/2017/05/behind-project-deadlines-estimations-for-a-shared-understanding.html<p>Managing stakeholder expectations for the good of your team and project.</p><p>Years of project management has taught me that one of the keys to a project’s success is managing stakeholder
expectations. The only medium we have to influence expectations is effective communication.</p>
<p>Autonomy allows teams to unleash their creativity and solve hard problems by being self-organized. It demands an
extraordinary degree of commitment but also a strong sense of accountability, which often puts pressure on teams
that are asked to provide forecasts of their work. I noticed this especially with brand new teams, who go so far as to
refuse to provide estimations of their efforts. The reason? To avoid making promises that are hard to keep.</p>
<h3>A common misunderstanding</h3>
<p>A new team, with a completely new project to implement, starts providing really high estimates to try and temper the
expectations of the Product Owner. Iteration after iteration, no value is delivered and the team decides to stop using
estimation to avoid uncomfortable situations after their alignment meeting.</p>
<p>An opposite example yields the same result:</p>
<p>The team is asked to provide estimates for upcoming work. A well defined scope and high team maturity spreads optimism,
which influences estimates to be too low, even without an explicit deadline in the project. During the iteration it
turns out that the effort required is actually more than estimated.</p>
<p>A team member might start feeling uncomfortable, and in order to reduce the time, sacrifices the quality of their work.
The outcome is an announced iteration dedicated entirely to bug fixing or refactoring. Any increase in time or reduction
in scope is seen as a defeat. It leads to a vicious cycle which pushes people to have an aversion towards estimations
and, sprint after sprint, the planning meeting becomes the worst appointment of the week.</p>
<p>In both cases, at a certain point in the story, someone will state that their software development methodology is not
Agile but only a variant of developer micromanagement.</p>
<p>A misunderstanding on the purpose of the estimation tool can indeed disrupt a team’s harmony and productivity: From one
side the team doesn't understand the real purpose of estimating a piece of work, while on the customer side it can
create unrealistic expectations.</p>
<h3>The tool</h3>
<p>I believe that estimations are a powerful tool – especially if you work in a complex and super challenging environment.
Estimations establish a shared understanding of the user story between product and development parties, and within the
whole team. It is THE decision-making tool.</p>
<p>If one person considers a task small and another large, they are most probably not on the same page. Whether you go
for solution A or solution B, it must be understood which box the problem needs to be packed in: a letter envelope or a big IKEA
box?</p>
<p>Creating expectation is just a natural consequence of the process, but it is essential to share a common mindset of the
ambition level that your stakeholders wish to reach. A crucial appointment is the "user story grooming session" (or
backlog refinement), when the team can re-estimate their stories as soon as they realize an assumption is wrong.</p>
<p>This activity should occur on a regular basis and could be an officially scheduled meeting or an ongoing activity. In my
experience, it has helped teams a lot. Some of the activities that occur during these sessions include:</p>
<ul>
<li>Removing user stories that no longer appear relevant</li>
<li>Creating new user stories in response to newly discovered needs</li>
<li>Re-assessing the relative priority of stories</li>
<li>Assigning estimates to stories which have yet to receive one</li>
<li>Correcting estimates in light of newly discovered information</li>
<li>Splitting user stories which are high priority but too coarse grained to fit in an upcoming iteration</li>
</ul>
<p>The Product Owner then clearly defines and updates the Acceptance Criteria and indicates or reassesses their
expectations.</p>
<h3>Conclusion</h3>
<p>I strongly believe that even in the first stages of a project, when everything is highly unpredictable because of
requirements, use cases, planning, staffing, and other unclear variables, estimations should occur. No matter the
estimation technique, the sooner you receive the first data points and can test your assumptions, the faster the
variability in the project diminishes.</p>
<p>If you’re interested to know more or share your experience you can get in touch with me via Twitter at
<a href="https://twitter.com/FaustoSannino">@FaustoSannino</a>.</p>Platform Engineering and Third Generation Microservices in Dublin2017-05-26T00:00:00+02:002017-05-26T00:00:00+02:00Bill de hÓratag:engineering.zalando.com,2017-05-26:/posts/2017/05/platform-engineering-and-third-generation-microservices-in-dublin.html<p>Closing the gap between systems of record and intelligence is a team vision.</p><p>The Zalando Dublin Technology Hub <a href="https://twitter.com/zalandotech/status/590795702613188609">formed in 2015</a> in part to
research and develop products and services using data science. There’s ongoing collaboration with our colleagues in
Germany and Finland - recently, over twenty data science researchers and engineers from Dublin attended a two-day
internal conference in Berlin. They presented on topics ranging from multi-lingual analysis of fashion content, image
tagging with <a href="https://www.tensorflow.org/">TensorFlow</a>, to infrastructure for model hierarchies. Dublin Zalandos are
also involved in local meetups and talks. As a result we’re now best known in the Dublin community for our data science
work and culture.</p>
<p>The part we talk about less is building more and more critical elements of the <a href="https://blog.zalando.com/en/blog/how-zalando-becoming-online-fashion-platform-europe">Zalando
Platform</a>. Platform is central to
the company’s thinking — recently Zalando hosted <a href="https://vizions.berlin/">Vizions</a>, the first conference of its kind
in Europe to focus on platform business models. Just as our work on machine learning focuses on <a href="https://news.greylock.com/the-new-moats-53f61aeac2d9">systems of engagement
and intelligence</a>, our platform engineering work focuses on the
systems of record.</p>
<p>Our platform work started small. Initially, a few engineers began looking at fashion article data models and services.
Today that has blossomed into a set of teams developing for a broad range of customer needs and we’re continuing to grow
our platform teams.</p>
<h3>Platform Engineering</h3>
<p>Building a platform involves more than engineering and scaling. Platforms are also about providing leverage to others.
We want teams and customers to work at higher levels and innovate on problems. In Dublin this has led us to take a
considered approach to what we call platform engineering.</p>
<p>Platform teams are autonomous, own their impact, purpose, and technical direction. They also establish a technology
vision to support customer and product goals. Teams thus concentrate on building what matters, but also think long term,
to provide a sustainable technical runway. Quality is not a surface element — what we have to build, we have to build
well and be proud of. So while we iterate and learn on design details, we can’t always justify <a href="https://www.youtube.com/watch?v=5bJi7k-y1Lo&feature=youtu.be&t=324">80/20
thinking</a>. In some cases what we need to build are
almost givens — functions like products, articles, customers and categories are not speculative.</p>
<p>We’re still learning how best to organise the platform and data science teams. An approach that’s working so far is
establishing broad product areas contributed to by many teams. In combination with a <a href="https://tech.zalando.com/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">microservices
architecture</a>, it’s
proving a viable way to spin up new teams to solve new problems and maintain consistent team sizes. It lets us strike a
balance between exploring new technology options that give us leverage over problems, and using what we know works well
— cells of teams gossip knowledge and experiences faster, which increases our learning rate. Finally, it avoids
individuals needing to work on one thing, for too long. Technical depth is critical, but new challenges are important
for personal development and maintaining energy.</p>
<p>The mix of engineering for scale and performance targets, while moving quickly to enable future growth, is rewarding and
challenging work. Arguably it’s unusual — large companies that set up remote offices often work at an impressive scale,
but sometimes with a narrow scope for ownership and impact.</p>
<h3>Third Generation Microservices</h3>
<p>Building a new suite of services is an opportunity to learn from the past and examine things from first principles.
We’re fortunate to have engineers that have worked on microservices already, giving us a strong knowledge basis to draw
from.</p>
<p>An observation we’ve made is that synchronous request-response systems require a raft of supporting subsystems. They
tend to need specific incident management processes to handle aspects like fanout latency or partial failure modes. They
also can be tricky to compose. We’re finding that a more asynchronous, event driven, message passing style works well
in many cases. Services don’t always need to call each other. In fact, they often don’t need to know about each other at
all. Instead, they can work on incoming events and data handed off by other services in a functional manner. This is
based on another observation — the kinds of problems we need to solve are dominated by access to and processing of data,
especially data about what’s changing.</p>
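<p>The style described above can be sketched as a service that never calls its peers, only reacts to change events handed to it and updates its own view of the data. The event shapes and field names here are invented for illustration.</p>

```python
def handle(event, store):
    """Apply one article change event to a local, service-owned view of the data."""
    if event["type"] == "article_created":
        store[event["sku"]] = event["attributes"]
    elif event["type"] == "article_deleted":
        store.pop(event["sku"], None)
    return store

# Replaying a stream of events rebuilds the service's state from scratch.
store = {}
for ev in [
    {"type": "article_created", "sku": "AB123", "attributes": {"color": "blue"}},
    {"type": "article_created", "sku": "CD456", "attributes": {"color": "red"}},
    {"type": "article_deleted", "sku": "AB123"},
]:
    handle(ev, store)
print(sorted(store))  # -> ['CD456']
```

<p>Because each service consumes the stream independently, adding a new consumer does not require the producer to know about it, which is exactly the decoupling the paragraph above describes.</p>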
<p>And so today, most of our teams are working with data streams and have adopted functional programming. In this approach,
data streaming tools and techniques move from the edge of the system where they have become popular for data
integration, and capture for systems like lakes and warehouses, to become a central part of the service fabric.</p>
<p>You might call this approach <em>third generation microservices</em>. While it’s not exactly a new approach, it’s not
widespread — the industry state of the art is still centered around request-response systems, focusing on resources and
entities, rather than what’s happening and changing in the system. First generation services were coarse grained, often
tied to a business unit or tier. Second generation services are fine grained, arranged around request/response calls,
optimising for organisational velocity; they are what many would consider the “microservices” state of the art today, but
we think an industry transition will soon occur.</p>
<p>For us, it’s early days — we have a lot to learn about doing this at the scale and speed we want. But unifying the
worlds of microservices and data streams to achieve our goals is something we’re excited to make progress on.</p>
<h3>Closing the Loop</h3>
<p>Closing the gap between systems of record and systems of intelligence is a vision of the teams in Dublin.</p>
<p>For <a href="https://tech.zalando.com/blog/research-roles-at-zalando-research/">researcher engineers and data scientists</a>, data
streams allow the results of machine learning experiments and models to integrate more easily with platform datasets.
Making platform data available via well defined event streams and moving away from batch/periodic data dumps also
improves data accessibility for research scientists. They also enable data scientists to build their own service APIs
and user interfaces on top of data streams. Ultimately, this is why we take a platform mindset — data and research
scientists can work faster and establish value at higher levels if they can rely on well engineered platform primitives
that give them access to the information they need.</p>
<p>For platform engineering, we see value in using typed functional techniques and data streams, as a complement to our
microservices architecture, <a href="https://tech.zalando.com/blog/zalando-tech-radar/">technology radar</a>, and <a href="https://github.com/zalando/restful-api-guidelines">API
guidelines</a>. These approaches simplify systems operations, enable
closer to real time data processing, and allow easier service composition.</p>
<p>Perhaps the best outcome of organising around data and event streams has been greater knowledge exchange between
software engineering and data science. These disciplines often work in isolation in the industry, but they have much to
learn from each other.</p>
<h3>Conclusion</h3>
<p>Zalando is a multi-billion dollar business with the fastest growing technology engineering group in Europe. In Dublin
it’s been a challenge and a delight to work on larger problems and make broader contributions. We’re just getting
started, and we’re excited about what’s next.</p>
<p><em>If you're interested in the intersection of microservices, stream data processing and machine learning,</em> <em>we're
hiring.</em></p>Hack Around The Clock – Hack Night @ Zalando Hamburg2017-05-17T00:00:00+02:002017-05-17T00:00:00+02:00Zalando Tech Community Managerstag:engineering.zalando.com,2017-05-17:/posts/2017/05/hack-around-the-clock--hack-night--zalando-hamburg.html<p>Hacking has a long tradition and it’s not merely limited to writing lines of code.</p><p>Here at Zalando Tech, hacking has a long tradition and it’s not merely limited to writing lines of code. Back in the
day, when there were no blinds to protect our screens from the reflection of the sun, staff covered the office windows
with paper bags. This pragmatism is what makes working at Zalando unique.</p>
<p>With our first Hack Night in Hamburg, a similar approach could be felt. We faced a certain challenge and our concept
around Hack Night held a pragmatic solution.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e96f7d5ddf6724bbf60f3ff21a7d35f784aea263_img_1775.jpg?auto=compress,format"></p>
<p>The challenge: Our Hamburg Teams weren’t able to participate in <a href="https://tech.zalando.com/blog/hack-week-5-is-live/">Hack
Week</a> last year but wanted to hack just like their colleagues.</p>
<p>The solution: We invited all teams working on adtech topics in Berlin to Hamburg and hosted a hackathon for two days in
our newly opened Tech Hub. The teams had 30 hours to work on challenges around our advertising products. They were also
supported by 25 business students from the Hamburg Media School.</p>
<p>In the end, the teams came up with projects that looked quite advanced for the short amount of time they had on their
hands. Not only that, they also got to know each other across office locations and their collaboration efforts grew all
the more for it. But don’t take our word for it, see for yourself:</p>
<p>Going forward, we will iterate and adapt the Hack Night structure and hopefully bring it to every Tech Office in the
future, powered by their respective expert technologies. Keep your eyes peeled for what might be our next challenge.</p>Personalization For The Good Of All2017-05-16T00:00:00+02:002017-05-16T00:00:00+02:00Antti Aaltotag:engineering.zalando.com,2017-05-16:/posts/2017/05/personalization-for-the-good-of-all.html<p>Building an advanced personalization system by creating a fully curated experience.</p><p>In the Fall of 2016, five senior software engineers found themselves in cold water. Not only was the freezing Finnish
winter approaching fast, but the vision for their team was enough to give chills: revolutionize the whole
personalization approach for customers shopping on Zalando. The bar was set high. The user stories were endless. The key
metrics set for product evaluation bewildered even the most analytically minded developer.</p>
<p>On the other hand, this was a greenfield project–the dream of any programmer. This meant that Team Picasso, who look
after content personalization, would have significant influence on how the end product would look. They were also
given free rein on all technological choices, with few dependent systems to worry about. The young and cozy <a href="https://tech.zalando.com/locations/#helsinki">Zelsinki
office</a> proved to be a peaceful and distraction-free nest to focus purely
on the engineering challenges this project presented, although spontaneous Nerf wars tended to occur in the late
afternoon.</p>
<p>The journey was a twisting maze of little passages, all different and varied. These new hires had travelled on average
5,000 kilometers from all around the world, from different cultures, speaking different languages, with nobody being
able to pronounce anyone’s name exactly right. Nevertheless, linguistic problems didn’t prevent our team from spending
hours arguing about requirements and solutions. These debates often developed emotional undertones when everyone
advocated for their brainchild. It’s natural for people to identify and become attached to their ideas, and now they
were openly poking holes in each other’s designs.</p>
<p>During the course of development, something remarkable happened: people started to detach themselves from their babies
and rejoice when being proven incorrect. After all, you thought you had the perfect solution to the problem at hand, and
now you were handed a better one. What an opportunity for learning!</p>
<p>Armed with this strong drive for technological excellence and fact-based decisions, the team started work on content
personalization.</p>
<h3>The Product</h3>
<p>The Zalando Core Platform consists of a plethora of small components, and content personalization is a vital piece of
it. The previous iterations of the Fashion Store’s recommendation system were mostly focused on products and brands. Now
we wanted to extend this concept to all content, including blog posts, hand-curated collections, and advertising.</p>
<p>Team Picasso was tasked with building a microservice called Content Broker. Content Broker provides various
customer-facing applications with a single endpoint for retrieving a dynamically configured selection of A/B tested and
personalized content. This collection of content will be carefully selected to maximize certain KPIs, be it basket size,
user retention, or plain old click-through rate.</p>
<p>The dynamic configuration allows modifications to the content selection and ordering logic during runtime, without
re-deploying any application components. The configuration also makes it possible to implement various business rules
for content selection in different contexts, for example, by different applications or demographics.</p>
<p>To verify that the content selection works as intended, Content Broker provides tooling for easy A/B testing between
configurations. Zalando already had excellent tooling for A/B testing–of course, implemented as yet another
microservice–which could be incorporated into personalization products, too. Using these battle-tested solutions ensures
the tests are free from sampling errors and other issues in statistical analysis.</p>
<p>The Content Broker is a Zalando Core Platform component, which means that it will be used by both Zalando internal
applications and third-party applications. The first version went live in February 2017 on the Zalando website, and the
work continues with extended configuration options, more content sources, and some very interesting new
applications.</p>
<p>If you’d like to ask any questions, feel free to get in touch via Twitter at
<a href="https://twitter.com/aaltoantti">@aaltoantti</a>.</p>Detecting List Items Observed by User2017-05-10T00:00:00+02:002017-05-10T00:00:00+02:00Sergii Zhuktag:engineering.zalando.com,2017-05-10:/posts/2017/05/detecting-list-items-observed-by-user.html<p>Improve your knowledge of RxJava and RecyclerView APIs with this tutorial.</p><p>Scrollable sets of items are one of the main UI elements of every app. Quite often a business wants to know if a user
has viewed and perceived a specific item. To answer that, we need to figure out whether the user spent enough time on
the item to actually take in its content. Let’s build an Android solution using RecyclerView and RxJava.</p>
<h3>Why and what?</h3>
<p>Our team was given the following requirement: identify which item of the <em>RecyclerView</em> list was viewed and perceived by
the user. Perceived in this context means that the user held the item in the viewport for at least 250 milliseconds. The
image below illustrates this with an example.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5fd3ddd1db05f0d878171fd664daf568c00a63ff_medium-image-tracking.png?auto=compress,format"></p>
<p>Technically, this means we need to send <em>“list item id## was viewed”</em> tracking events to the analytics SDK (it can be
Firebase, Google Analytics, etc.) based on a few conditions. Below I have formalized the requirements we need to meet to
implement this logic:</p>
<ul>
<li>Distinct: skip the event when the visible item set is equal to the one that has just been processed. The use case
here is multiple callbacks from a single swipe gesture;</li>
<li>Timeout: fire the event only after a specific timeout, 250ms in our case;</li>
<li>Skip previous event if a distinct event has arrived before the timeout: a previous tracking event should be skipped
if the user hasn’t held the item for the defined timeout and scrolled to another list item;</li>
<li>Reset: reset the state of the logic defined above in case the current <em>Activity</em> is stopped. We need this to track
the view again when our user comes back.</li>
</ul>
<h3>RecyclerView and visible items</h3>
<p>The <em>RecyclerView</em> itself is only a structure to provide a limited window to the list of items. To measure, position and
determine visibility state, we need to use the
<a href="https://developer.android.com/reference/android/support/v7/widget/RecyclerView.LayoutManager.html">LayoutManager</a>
abstract class. One of the most common implementations is a
<a href="https://developer.android.com/reference/android/support/v7/widget/LinearLayoutManager.html">LinearLayoutManager</a>. It
makes your <em>RecyclerView</em> look and feel like a good old <em>ListView</em>. To achieve basic list item visibility detection, we
can go with these two methods to be called on every scroll:</p>
<div class="highlight"><pre><span></span><code>int findFirstCompletelyVisibleItemPosition()
int findLastCompletelyVisibleItemPosition()
</code></pre></div>
<p>To detect scroll events in <em>RecyclerView</em> we need to add a scroll listener,
<a href="https://developer.android.com/reference/android/support/v7/widget/RecyclerView.OnScrollListener.html#onScrolled%28android.support.v7.widget.RecyclerView,%20int,%20int%29">RecyclerView.OnScrollListener</a>,
which provides us with the <em>onScrolled()</em> callback. The annoying thing about this callback is that it is called
multiple times during a single swipe gesture performed by the user.</p>
<p>However, these classes don’t provide us with information about how long a user was looking at the current item. We need
to do this on our own.</p>
<h3>Approach #1: Scroll callbacks and visible items state</h3>
<p>The most obvious way to detect items perceived by the user is to track the scroll state and mark list items as
“viewed”. To do this, you attach a timestamp to every item, set when the item enters the viewport. When the item leaves
the viewport, you check the elapsed time and trigger tracking if needed. Additionally, you need to keep the set of
currently visible items to compare against those that have appeared or disappeared after a scroll event.</p>
<p>However, this means you would only be able to catch the “view” event when the user scrolls the item out, not
immediately when the timeout (250 ms in our case) fires. Moreover, you need a separate trick to “force” the tracking
when the current <em>Activity</em> is stopped (forcing it in the <em>onStop()</em> callback rather than on scroll).</p>
<p>Another trade-off of this pattern is the number of <em>ScrollListener</em> callbacks you need to process for every
swipe. This becomes an issue because each callback triggers a check of visible items and timeouts, which might impact
app performance.</p>
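<p>To make the bookkeeping of Approach #1 concrete, here is a framework-free sketch in plain Java (all names are ours, not from any Android API; a real implementation would feed it the positions returned by the <em>LayoutManager</em> and fire the analytics event where indicated). It deliberately shares the drawback described above: a view is only reported once the item has left the viewport.</p>

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ViewportTracker {
    private static final long THRESHOLD_MS = 250;

    private final Map<Integer, Long> enteredAt = new HashMap<>(); // position -> entry time
    private final Set<Integer> tracked = new HashSet<>();         // positions already reported

    // Called with the currently visible item positions after each scroll event.
    public void onVisibleItemsChanged(Set<Integer> visibleNow, long nowMs) {
        // Items that just left the viewport: report them if they stayed long enough.
        for (Map.Entry<Integer, Long> e : new HashMap<>(enteredAt).entrySet()) {
            if (!visibleNow.contains(e.getKey())) {
                if (nowMs - e.getValue() >= THRESHOLD_MS) {
                    tracked.add(e.getKey()); // the analytics event would be fired here
                }
                enteredAt.remove(e.getKey());
            }
        }
        // Items that just entered the viewport: remember when they appeared.
        for (Integer position : visibleNow) {
            enteredAt.putIfAbsent(position, nowMs);
        }
    }

    public Set<Integer> trackedItems() {
        return tracked;
    }

    public static void main(String[] args) {
        ViewportTracker tracker = new ViewportTracker();
        tracker.onVisibleItemsChanged(Set.of(1, 2), 0);
        tracker.onVisibleItemsChanged(Set.of(2, 3), 100); // item 1 left after only 100 ms
        tracker.onVisibleItemsChanged(Set.of(4), 500);    // items 2 and 3 left after >= 250 ms
        System.out.println(tracker.trackedItems());       // item 1 is not among the tracked items
    }
}
```

<p>Passing the current time in as a parameter, rather than calling <em>System.currentTimeMillis()</em> inside, keeps the sketch deterministic and easy to test.</p>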
<h3>Approach #2: Scroll callbacks and RxJava Subscribers</h3>
<p>While discussing Approach #1, my colleague Simon Percic suggested that <em>RxJava</em> could solve this problem in a
more elegant way. Indeed, we can implement event bus functionality using
<a href="http://reactivex.io/documentation/subject.html">PublishSubject</a> and post a new event to observe each time the list item
appears in the viewport. To achieve the timeout effect and to not track the same item several times, we can use
filtering operators available in Rx.</p>
<p>To isolate this piece of logic from the main code, we put it into a separate <em>TrackingBus</em> class with all the
required callbacks inside. This class should be instantiated in the <em>onResume()</em> callback of the target
<em>Activity/Fragment</em> and unsubscribed in <em>onPause()</em>.</p>
<p>Below is the set of filters we used to meet the requirements:</p>
<ul>
<li><a href="http://reactivex.io/documentation/operators/distinct.html">distinctUntilChanged</a> to skip equal events in case of
multiple scroll callbacks;</li>
<li><a href="http://reactivex.io/documentation/operators/debounce.html">throttleWithTimeout/debounce</a> to pass an event with a
delay and drop the current event if another event arrives before the timeout.</li>
</ul>
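<p>To see what these two operators buy us, here is a framework-free sketch in plain Java (class and method names are ours, not Rx APIs) combining a distinct check with a debounce built on a <em>ScheduledExecutorService</em>. It is a simplification of the real operators: for instance, it compares against the last posted event rather than the last emitted one.</p>

```java
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class Debouncer<T> {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Consumer<T> downstream;
    private final long timeoutMs;
    private T last;
    private ScheduledFuture<?> pending;

    public Debouncer(Consumer<T> downstream, long timeoutMs) {
        this.downstream = downstream;
        this.timeoutMs = timeoutMs;
    }

    // Post a view event; it is delivered only if it differs from the previous
    // one (distinct) and no newer event arrives within the timeout (debounce).
    public synchronized void post(T event) {
        if (Objects.equals(event, last)) return;     // distinctUntilChanged
        last = event;
        if (pending != null) pending.cancel(false);  // drop the superseded event
        pending = scheduler.schedule(() -> downstream.accept(event),
                timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Equivalent of unsubscribing in onPause().
    public void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        Debouncer<String> bus =
                new Debouncer<>(e -> System.out.println("tracked: " + e), 250);
        bus.post("items 1-3");
        bus.post("items 1-3"); // duplicate: skipped
        bus.post("items 2-4"); // supersedes "items 1-3" before the timeout fires
        Thread.sleep(400);     // only "tracked: items 2-4" is printed
        bus.shutdown();
    }
}
```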
<p>Our bus itself requires the following setup:</p>
<ul>
<li>Keep the <a href="http://reactivex.io/documentation/subject.html">PublishSubject</a> instance to apply filters on view events
and fire the tracking callback. You can use <a href="https://github.com/JakeWharton/RxRelay/">PublishRelay</a> as well;
unlike a <em>Subject</em>, it has no terminal state, so it cannot be shut down by <em>onComplete()</em> or <em>onError()</em>;</li>
<li>Keep the <a href="http://reactivex.io/RxJava/javadoc/rx/Subscription.html">Subscription</a> instance to unsubscribe and avoid
leaks when <em>Activity/Fragment</em> is not visible any more.</li>
</ul>
<h3>Complete solution: View Tracking Bus with RxJava</h3>
<p>The code snippet below illustrates the <em>RxJava</em> solution we developed. Check the
<a href="https://github.com/sergiiz/GroceryStore">GroceryStore</a> project from GitHub to see the complete demo project.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">java.util.concurrent.TimeUnit</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">rx.Subscription</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">rx.functions.Action1</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">rx.subjects.PublishSubject</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">rx.subjects.Subject</span><span class="p">;</span>
<span class="n">public</span> <span class="k">class</span> <span class="nc">ThrottleTrackingBus</span> <span class="p">{</span>
<span class="n">private</span> <span class="n">static</span> <span class="n">final</span> <span class="nb">int</span> <span class="n">THRESHOLD_MS</span> <span class="o">=</span> <span class="mi">250</span><span class="p">;</span>
<span class="n">private</span> <span class="n">Subject</span> <span class="n">publishSubject</span><span class="p">;</span>
<span class="n">private</span> <span class="n">Subscription</span> <span class="n">subscription</span><span class="p">;</span>
<span class="n">private</span> <span class="n">final</span> <span class="n">Action1</span> <span class="n">onSuccess</span><span class="p">;</span>
<span class="n">public</span> <span class="n">ThrottleTrackingBus</span><span class="p">(</span><span class="n">final</span> <span class="n">Action1</span> <span class="n">onSuccess</span><span class="p">,</span>
<span class="n">final</span> <span class="n">Action1</span> <span class="n">onError</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">onSuccess</span> <span class="o">=</span> <span class="n">onSuccess</span><span class="p">;</span>
<span class="n">this</span><span class="o">.</span><span class="n">publishSubject</span> <span class="o">=</span> <span class="n">PublishSubject</span><span class="o">.</span><span class="n">create</span><span class="p">();</span>
<span class="n">this</span><span class="o">.</span><span class="n">subscription</span> <span class="o">=</span> <span class="n">publishSubject</span>
<span class="o">.</span><span class="n">distinctUntilChanged</span><span class="p">()</span>
<span class="o">.</span><span class="n">throttleWithTimeout</span><span class="p">(</span><span class="n">THRESHOLD_MS</span><span class="p">,</span> <span class="n">TimeUnit</span><span class="o">.</span><span class="n">MILLISECONDS</span><span class="p">)</span>
<span class="o">.</span><span class="n">subscribe</span><span class="p">(</span><span class="n">this</span><span class="p">::</span><span class="n">onCallback</span><span class="p">,</span> <span class="n">onError</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">postViewEvent</span><span class="p">(</span><span class="n">final</span> <span class="n">VisibleState</span> <span class="n">visibleState</span><span class="p">)</span> <span class="p">{</span>
<span class="n">publishSubject</span><span class="o">.</span><span class="n">onNext</span><span class="p">(</span><span class="n">visibleState</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">unsubscribe</span><span class="p">()</span> <span class="p">{</span>
<span class="n">subscription</span><span class="o">.</span><span class="n">unsubscribe</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">private</span> <span class="n">void</span> <span class="n">onCallback</span><span class="p">(</span><span class="n">VisibleState</span> <span class="n">visibleState</span><span class="p">)</span> <span class="p">{</span>
<span class="n">onSuccess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">visibleState</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">static</span> <span class="k">class</span> <span class="nc">VisibleState</span> <span class="p">{</span>
<span class="n">final</span> <span class="nb">int</span> <span class="n">firstCompletelyVisible</span><span class="p">;</span>
<span class="n">final</span> <span class="nb">int</span> <span class="n">lastCompletelyVisible</span><span class="p">;</span>
<span class="n">public</span> <span class="n">VisibleState</span><span class="p">(</span><span class="nb">int</span> <span class="n">firstCompletelyVisible</span><span class="p">,</span>
<span class="nb">int</span> <span class="n">lastCompletelyVisible</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">firstCompletelyVisible</span> <span class="o">=</span> <span class="n">firstCompletelyVisible</span><span class="p">;</span>
<span class="n">this</span><span class="o">.</span><span class="n">lastCompletelyVisible</span> <span class="o">=</span> <span class="n">lastCompletelyVisible</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">TODO</span> <span class="n">please</span> <span class="n">implement</span> <span class="n">equals</span> <span class="ow">and</span> <span class="n">hashCode</span><span class="p">,</span> <span class="n">required</span> <span class="k">for</span> <span class="n">the</span> <span class="n">distinction</span> <span class="n">logic</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The logic behind this code is the following. Each <em>RecyclerView</em> scroll event calls the <em>postViewEvent()</em>
method, which puts the provided <em>VisibleState</em> onto the bus. Because the bus applies <em>distinctUntilChanged</em>,
it won’t propagate a new <em>VisibleState</em> that is equal to the previous one. Because it also applies a
<em>throttle</em>, an event is dropped if another one arrives right after it. Only when no new event arrives within 250
ms is the event propagated down the chain, and in <em>onCallback()</em> we finally call the provided function to track
the <em>VisibleState</em>.</p>
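<p>A side note on the TODO in the snippet above: <em>equals()</em> and <em>hashCode()</em> matter because <em>distinctUntilChanged</em> relies on value equality to skip duplicate events. One possible implementation, shown here as a standalone class for brevity:</p>

```java
import java.util.Objects;

// Value equality for VisibleState: two states are equal when they describe
// the same range of completely visible item positions.
public class VisibleState {
    final int firstCompletelyVisible;
    final int lastCompletelyVisible;

    public VisibleState(int firstCompletelyVisible, int lastCompletelyVisible) {
        this.firstCompletelyVisible = firstCompletelyVisible;
        this.lastCompletelyVisible = lastCompletelyVisible;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VisibleState)) return false;
        VisibleState that = (VisibleState) o;
        return firstCompletelyVisible == that.firstCompletelyVisible
                && lastCompletelyVisible == that.lastCompletelyVisible;
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstCompletelyVisible, lastCompletelyVisible);
    }

    public static void main(String[] args) {
        System.out.println(new VisibleState(0, 3).equals(new VisibleState(0, 3))); // true
        System.out.println(new VisibleState(0, 3).equals(new VisibleState(1, 3))); // false
    }
}
```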
<h3>Feedback welcome!</h3>
<p>I hope this post improved your knowledge of RxJava and RecyclerView APIs. Feel free to use this ready-to-go solution for
scrolled items tracking and suggest your improvements. You can find me on Twitter at
<a href="https://twitter.com/sergiizhuk">@sergiizhuk</a>.</p>How to Dress Code – The Creation of Fashion for Tech2017-05-08T00:00:00+02:002017-05-08T00:00:00+02:00Constanze Bilogantag:engineering.zalando.com,2017-05-08:/posts/2017/05/how-to-dress-code.html<p>We dress code. The brand for tech enthusiasts, programmers, and startup whizkids.</p><p>There are certain things that simply shouldn’t go together. Yet, miraculously, they do: Think about peanut butter and
jelly, Laurel and Hardy or, more on topic, fashion and tech.</p>
<p>Both of these departments are part of what make up Zalando’s DNA and are essential to our daily work. Both also have
clear opinions of the other, which make a collaboration between the two not the most seamless of tasks.</p>
<p>Our platform is powered by tech, and it was about time that the forces of the company unite to make fashion more
inclusive to those who work behind the scenes and build our fashion-as-a-service platform: <strong>Techies</strong>. With that in
mind, we came together to create a fashion line that lives up to their expectations, meets their style criteria and
makes them look good.</p>
<p>The fact that a clothing line specially made for tech was on the horizon had our entire tech department at
Zalando excited about the endeavour. They might not all be interested in fashion, but they still wear clothes.</p>
<p>Surveys, focus groups, MVPs and think tanks followed and gave birth to a holistic, loved, basic urban fashion line made
by zLabels for Tech: <strong>We dress code.</strong> A clothing line of smart styles, clean design, some gadgets and 100%
functionality – all the things that, according to our research, matter to techies.</p>
<p>Not only did the creation of <strong>We dress code.</strong> unite the worlds of fashion and tech, but it also brought together
different people and personalities from all over Zalando: Designers, UX, Product, Tech Employer Branding, all types of
engineers, community managers (just to name a few), with the Creative Department ultimately taking care of launching the
brand with a BANG.</p>
<p>This project was an exciting and challenging task for creatives like us, who usually think and create for fashion in the
context of glossy visuals and fashion shoppers in mind. The approach with <strong>We dress code.</strong> was a different one: How do
we speak Tech, a language that’s not our native tongue? Focus groups with in-house Techies provided us with input en
masse: Models? No, thanks. Fashion communication? Doesn’t reach them.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/227ffb34f7e39949baf95746afebf3d927c71704_wdc-environment.003.jpeg?auto=compress,format"></p>
<p>Tech talked, we listened. We created an awesome stage for the launch of <strong>We dress code.</strong> featuring influencers, with
claims that really speak Tech (“buttondown=validObject;”), a fun web series about Tech life, on top of highlighting the
quirks of wearing <strong>We dress code.</strong> The posting of wild posters all over Berlin to create maximum impact was the icing
on the cake – all without drawing from the aforementioned clichés.</p>
<p>Our new brand, <strong>We dress code.</strong>, is not only for our in-house Techies, the initial inspiration for all our efforts,
but for tech enthusiasts, programmers and startup whizkids from all over Europe.</p>
<p>See the full collection now at <a href="http://www.wedresscode.com">www.wedresscode.com</a>.</p>Selenium Conf Gets a Dose of Zalenium2017-05-04T00:00:00+02:002017-05-04T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2017-05-04:/posts/2017/05/selenium-conf-gets-a-dose-of-zalenium.html<p>Our Zalando developers recently presented Zalenium to the world in Austin, Texas.</p><p><a href="https://tech.zalando.com/blog/zalenium-a-disposable-and-flexible-selenium-grid-infrastructure/">Zalenium</a> is an open
source software extension to scale up and down your local grid dynamically with Docker containers. It uses
Docker-Selenium to run tests in Firefox/Chrome, and when a different browser is needed, tests get redirected to a cloud
testing service.</p>
<p>At the latest Selenium Conf, Zalando presented Zalenium to the world, with Diego Molina and Leo Gallucci giving their
rundown to Selenium developers and enthusiasts from around the world.</p>
<p>We know how complicated it is to have a stable grid, and how hard it is to maintain over time with enough capabilities
to cover most browsers and platforms. Watch Diego and Leo’s presentation below to hear how the tool was developed and
how test suites run faster on local Firefox/Chrome nodes, utilising the cloud testing service we pay for in a smarter
way.</p>Taking a Walk in a Producer's Shoes2017-04-26T00:00:00+02:002017-04-26T00:00:00+02:00Irina Nistortag:engineering.zalando.com,2017-04-26:/posts/2017/04/taking-a-walk-in-a-producers-shoes.html<p>Every Producer has their own style at Zalando, but the agile mindset is what unites them.</p><p>When I started working as a <a href="https://tech.zalando.com/blog/an-introduction-to-the-producer-role-at-zalando/">Producer</a>,
my new team let me know that they feel motivated when they have an 80-20 week: 80% programming and 20% learning. My
challenge as a Producer is to help my team be organized well enough to focus on the right priorities, while limiting
their daily interruptions. To get up to speed with the <a href="https://tech.zalando.com/blog/an-introduction-to-the-producer-role-at-zalando/">Producer
role</a>, I asked two of my fellow
Producers, Sascha and Sophie, to talk about the methods and practices they use in their teams.</p>
<p>After speaking with both extensively, it turns out that they have similar experiences: Both have been with Zalando for
the past four years, both have backgrounds in Quality Assurance, which made them a perfect fit for the Producer role.</p>
<p>We sat together and identified some of the topics that are important for our work as Producers. In my experience so far,
every Producer has their own style, but what unites them is the agile mindset they study unquenchably either
individually, in guilds, or with our internal Agile Coaches.</p>
<h3>Sophie’s view on team process</h3>
<p>At Zalando, things change rapidly. Five new members joined Sophie’s team recently and this team change directly showed
that their way of working together needed to be reviewed. For advice, Sophie met with one of our <a href="https://tech.zalando.com/blog/how-agile-coaches-scale-continuous-improvement/">Agile
Coaches</a> who offered her support. She
plans to change the team’s way of working by introducing a board that reflects the team workflow and its general setup.
She also asked other Producers for further insights into their experiences with introducing a board. She has started to
write down and visualize the new process with a simple workflow and is now ready to present it to the team.</p>
<h3>OKRs</h3>
<p>OKRs are Objectives and Key Results and if you want to learn about them, you’d better read <a href="https://www.amazon.com/Radical-Focus-Achieving-Important-Objectives-ebook/dp/B01BFKJA0Y"><em>Radical Focus</em> by Christina
Wodtke</a>. This method helps
teams visualize their goals, plans, and also highlights the importance of always keeping these goals in focus. I use a
whiteboard to visualize the four Radical Focus quadrants: OKRs, intention of the current sprint, health of the team, and
upcoming opportunities or threats. On a weekly basis, my team rates each quadrant and reviews what was achieved in the
previous week, followed by what is next in focus.</p>
<h3>Retrospectives</h3>
<p>The Retrospective is used to collect data about what went well or badly during <a href="https://tech.zalando.com/blog/the-sprint-exposed--how-we-use-it-at-zalando/">the
sprint</a>, cluster it, generate
insights, and check what could be done to improve processes or workflow. Retrospectives are at the heart of our
continuous improvement culture and I make sure I conduct one at the end of each sprint. I try to diversify the type of
moderation techniques that I use and take my inspiration from <a href="https://plans-for-retrospectives.com">Retromat</a>.</p>
<h3>Scrum of Scrums</h3>
<p>In our Scrum of Scrums meeting, all the Producers of our department meet twice a week to align on the progress of our
teams. We ensure that all stakeholders (Technical Project Managers, Producers, and other teams) are informed about their
tasks and our team’s progress, in order to run our projects smoothly and efficiently. We also use this opportunity to
keep track of the progress of other projects which are important for each team.</p>
<h3>Friday Demos in Sascha’s team</h3>
<p>On Fridays we hold Friday Demos, where we watch with admiration the progress our engineers have made throughout the
week. If there are no new features to show, we use the time to share knowledge or skills we’ve picked up along the way.
This meeting is not open to all, but only to the delivery team, our Team Lead, and our Product Specialist. You could
also think of our Friday Demo as a review meeting, where we receive feedback from our stakeholders and check if we match
their requirements. Last, but not least, we open our Kudo Box and read aloud the submitted <a href="https://management30.com/product/kudo-cards/">Kudo
Cards</a> before handing them over to the respective team member. We have had
the Kudo Box system in place for approximately six months and we’re really happy with how it has fostered a culture of
feedback and recognition.</p>
<h3>The Producer Guild</h3>
<p>The Producer Guild is an internal group that meets every two weeks to discuss how we can evolve the Producer role
further. We come up with ideas to share knowledge we have gained and evolve overall best practices for Producers
throughout the Tech department. As an example, we have something called the Session Market, where Producers offer guest
seats in their meetings or workshops to allow fellow Producers to find <a href="https://en.wikipedia.org/wiki/Job_shadow">shadowing
partners</a> over the course of a few days.</p>
<h3>Want more? Get in touch!</h3>
<p>We hope the above information has given you a deeper insight into what Producers across our department do for their
teams and the overall organization of services at Zalando. If you are curious and want to know more, check out our
<a href="https://www.linkedin.com/groups/8566668">LinkedIn Group</a> or follow us on Twitter:
<a href="https://twitter.com/IarinaIrina">@IarinaIrina</a> and <a href="https://twitter.com/WittigSascha">@WittigSascha</a>.</p>Achieving 3.2x Faster Scala Compile Time2017-04-19T00:00:00+02:002017-04-19T00:00:00+02:00Eric Torreborretag:engineering.zalando.com,2017-04-19:/posts/2017/04/achieving-3.2x-faster-scala-compile-time.html<p>Reducing compilation time of Scala programs can be challenging, but we got some great help.</p><p>For the Glitch team who looks after Quality Control at Zalando, the year 2017 started with a new resolution: Make our
compile times faster. Toward the end of 2016, the Glitch team experienced a steady increase in compilation time on their
project. In just one month, the compile time doubled and it was hard to understand why. This was clearly hampering the
team’s productivity, and a number of strategies were attempted to reduce the long compile time. For instance, some
improvement was obtained by removing wildcard imports, better code modularization, and by making some implicit values
explicit. However, it was still taking too long to compile, and the actual root cause was far from clear. In this
article, we describe how <strong>by engaging with</strong> <strong>Triplequote, we were able to obtain a 3.2x compilation time speedup</strong>.</p>
<h3>The problem</h3>
<p>The Glitch team is currently working on the delivery of an application called “Quala” (for “QUALity Assessment”). This
application enables the Content Team to check the quality of products that merchants want to sell on the Zalando
platform: Are the descriptions correct? Are the images of good quality? Does the product come with the right washing
machine instructions?</p>
<p>The backend of Quala is called “tinbox” and is written in <a href="http://www.scala-lang.org/">Scala</a>, using many type-intensive
libraries such as <a href="https://github.com/milessabin/shapeless">Shapeless</a>, <a href="https://github.com/circe/circe">Circe</a>,
<a href="https://github.com/zalando/grafter">Grafter</a>, and <a href="https://github.com/http4s/rho">http4s/rho</a>. One important design
goal behind these libraries is to reduce boilerplate by letting the Scala compiler generate as much ceremony code as
possible. However, the downside is that compile time can increase substantially. Unexpected interactions between macros
and implicit search can lead to an exponential growth of compilation time, and it is usually difficult to understand if
the long compile time is symptomatic of a deeper problem. This pushed us to get in touch with Triplequote, a Swiss
company that promises to relieve Scala teams of long compile times.</p>
<p>At the beginning of February, the Triplequote team joined the Glitch team in their Zalando office for three days. The
mission included the following goals:</p>
<ul>
<li>Evaluate <a href="https://triplequote.com/hydra">Triplequote Hydra</a>.</li>
<li>Identify areas for compilation-speed improvements.</li>
</ul>
<p>Let’s see how the problem was tackled.</p>
<h3>Methodology</h3>
<p>The first task was to collect metrics to objectively compare against. As the tinbox project uses
<a href="http://www.scala-sbt.org/">Sbt</a> as a build tool, it is meaningful to record both the “cold” and “warm” compile time.
The terminology “cold” and “warm” refers to the state of the JVM that an application is running on.</p>
<p>When launching an application, the JVM starts by loading the required classes and interpreting the code, and as it runs
it starts just-in-time compiling and optimizing the code paths that are taken most often. We call a JVM that isn’t
optimized yet a “cold” JVM, while an optimized one is referred to as “warm”. Because Sbt is used to compile the
project, you can warm-up the JVM by entering the <a href="http://www.scala-sbt.org/0.13/docs/Running.html#sbt+shell">Sbt interactive
shell</a> and executing a few full compiles. In fact, you will
immediately notice that your sources take considerably more time to compile the first time, and that’s indeed because
the JVM is initially “cold”. The reason why it’s interesting to collect both “cold” and “warm” compile time is that when
sources are compiled on a Continuous Integration (CI) server, we usually observe a “cold” compile time. When a developer
compiles on their own machine, he or she will observe a “warm” compile time. There are clear productivity benefits in
reducing both “cold” and “warm” compile time. After all, the less one needs to wait, the more productive one can be.</p>
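<p>The warm-up effect is easy to observe on any JVM workload, not only the Scala compiler. The toy program below (written in plain Java for simplicity; the timings are machine-dependent and this is not a rigorous benchmark) times the same work on a cold and then on a warm JVM:</p>

```java
// Tiny illustration of "cold" vs "warm" JVM timings: the same work tends to
// get faster once the JIT has compiled the hot path.
public class WarmupDemo {
    static long work() {
        long sum = 0;
        for (int i = 0; i < 2_000_000; i++) sum += i % 7;
        return sum;
    }

    public static void main(String[] args) {
        long coldStart = System.nanoTime();
        work();                               // first run: interpreted / cold
        long coldMs = (System.nanoTime() - coldStart) / 1_000_000;

        for (int i = 0; i < 50; i++) work();  // let the JIT warm up

        long warmStart = System.nanoTime();
        work();                               // same work on a warm JVM
        long warmMs = (System.nanoTime() - warmStart) / 1_000_000;

        System.out.println("cold: " + coldMs + " ms, warm: " + warmMs + " ms");
    }
}
```

<p>On most machines the second measurement is noticeably smaller, for the same reason a second full compile in the Sbt shell is faster than the first.</p>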
<p>All experiments were run on a Macbook Pro (Retina, late 2013), 16G, Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz, using
Scala 2.12.1 and Java 1.8.0_112, and giving Sbt 4G of memory (using the JVM flag -Xmx4G).</p>
<h3>Initial State</h3>
<p>The chart below reports the time in seconds that it takes to compile all tinbox sources (both main and test). Take a
look at how compilation time improves as the JVM warms up.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/401c64aecdd367500255a99c585a6fec5f1fd4f0_initial-main-test-chart1.png?auto=compress,format"></p>
<p>We now have the coarse-grained numbers we will compare our work against. Let’s start our journey by discussing how
much speedup we could obtain by just using the Hydra Scala parallel compiler, <em>without making any change</em> to the tinbox
codebase.</p>
<h3>Evaluating Triplequote Hydra</h3>
<p>Using Hydra on a Scala project is simple, as it consists of just adding the sbt-hydra plugin to the project/plugins.sbt.
After this small change, all of the project’s sources are compiled in parallel using Hydra, utilizing four workers. We
chose to work with four cores because modern developer machines have four physical cores. Hydra can use more cores if
available.</p>
<p>The next chart visually compares the tinbox project’s compile time performance when using the vanilla Scala 2.12.1
versus Hydra.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6dfe648c2dfd866f852fb20dbdc2b7bdc1278821_tinbox-main-test-chart2.png?auto=compress,format"></p>
<p>If we compare the best full compile time result with the vanilla Scala 2.12.1 (64 seconds) against the best result with
Hydra (26 seconds), we see that <strong>using Hydra yields a 2.66x compile time speedup</strong> with a warm JVM.</p>
<p>Furthermore, the <strong>cold compile time performance is considerably improved. In fact, the cold compile time when using
Hydra is shorter than the warm compile time when the vanilla Scala 2.12.1 compiler is used!</strong></p>
<p>After evaluating Hydra we moved on to the second goal, which consisted of identifying areas for single-threaded
compilation-speed improvements.</p>
<h3>Improving single-threaded compilation time</h3>
<p>To improve single-threaded compile performance, it was paramount to gain greater insight into what the Scala compiler
does. We had to be able to answer questions such as:</p>
<ol>
<li>How much time does each compiler phase take?</li>
<li>Which sources take the longest to compile?</li>
<li>What work is the compiler doing when compiling a single source? (this is especially relevant for sources that take
more time than expected to compile).</li>
</ol>
<p>But before going any further, let’s take a quick detour and briefly touch on the Scala compiler architecture.</p>
<h3>The Scala Compiler Architecture</h3>
<p>The Scala compiler is made up of many phases. Each phase takes as its input an Abstract Syntax Tree (AST) and returns a
new, transformed AST. To see the Scala compiler phases, just pass the flag -Xshow-phases when invoking scalac.</p>
<div class="highlight"><pre><span></span><code><span class="o">$</span><span class="w"> </span><span class="n">scalac</span><span class="w"> </span><span class="o">-</span><span class="n">Xshow</span><span class="o">-</span><span class="n">phases</span>
<span class="w"> </span><span class="n">phase</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="n">description</span>
<span class="w"> </span><span class="o">----------</span><span class="w"> </span><span class="o">--</span><span class="w"> </span><span class="o">-----------</span>
<span class="w"> </span><span class="n">parser</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">parse</span><span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="n">into</span><span class="w"> </span><span class="n">ASTs</span><span class="p">,</span><span class="w"> </span><span class="n">perform</span><span class="w"> </span><span class="n">simple</span><span class="w"> </span><span class="n">desugaring</span>
<span class="w"> </span><span class="n">namer</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="n">resolve</span><span class="w"> </span><span class="n">names</span><span class="p">,</span><span class="w"> </span><span class="n">attach</span><span class="w"> </span><span class="n">symbols</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">named</span><span class="w"> </span><span class="n">trees</span>
<span class="n">packageobjects</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="nb">load</span><span class="w"> </span><span class="n">package</span><span class="w"> </span><span class="n">objects</span>
<span class="w"> </span><span class="n">typer</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">meat</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">potatoes</span><span class="p">:</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">trees</span>
<span class="w"> </span><span class="n">patmat</span><span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="n">translate</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">expressions</span>
<span class="n">superaccessors</span><span class="w"> </span><span class="mi">6</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="n">super</span><span class="w"> </span><span class="n">accessors</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">traits</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">nested</span><span class="w"> </span><span class="n">classes</span>
<span class="w"> </span><span class="n">extmethods</span><span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="n">extension</span><span class="w"> </span><span class="n">methods</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">inline</span><span class="w"> </span><span class="n">classes</span>
<span class="w"> </span><span class="n">pickler</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="n">serialize</span><span class="w"> </span><span class="n">symbol</span><span class="w"> </span><span class="n">tables</span>
<span class="w"> </span><span class="n">refchecks</span><span class="w"> </span><span class="mi">9</span><span class="w"> </span><span class="n">reference</span><span class="o">/</span><span class="n">override</span><span class="w"> </span><span class="n">checking</span><span class="p">,</span><span class="w"> </span><span class="n">translate</span><span class="w"> </span><span class="n">nested</span><span class="w"> </span><span class="n">objects</span>
<span class="w"> </span><span class="n">uncurry</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="n">uncurry</span><span class="p">,</span><span class="w"> </span><span class="n">translate</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">anonymous</span><span class="w"> </span><span class="n">classes</span>
<span class="w"> </span><span class="n">fields</span><span class="w"> </span><span class="mi">11</span><span class="w"> </span><span class="n">synthesize</span><span class="w"> </span><span class="n">accessors</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">fields</span><span class="p">,</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="n">bitmaps</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">lazy</span><span class="w"> </span><span class="n">vals</span>
<span class="w"> </span><span class="n">tailcalls</span><span class="w"> </span><span class="mi">12</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="n">tail</span><span class="w"> </span><span class="n">calls</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">jumps</span>
<span class="w"> </span><span class="n">specialize</span><span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="err">@</span><span class="n">specialized</span><span class="o">-</span><span class="n">driven</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="n">specialization</span>
<span class="w"> </span><span class="n">explicitouter</span><span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">refs</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">outer</span><span class="w"> </span><span class="n">pointers</span>
<span class="w"> </span><span class="n">erasure</span><span class="w"> </span><span class="mi">15</span><span class="w"> </span><span class="n">erase</span><span class="w"> </span><span class="n">types</span><span class="p">,</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="n">interfaces</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">traits</span>
<span class="w"> </span><span class="n">posterasure</span><span class="w"> </span><span class="mi">16</span><span class="w"> </span><span class="n">clean</span><span class="w"> </span><span class="n">up</span><span class="w"> </span><span class="n">erased</span><span class="w"> </span><span class="n">inline</span><span class="w"> </span><span class="n">classes</span>
<span class="w"> </span><span class="n">lambdalift</span><span class="w"> </span><span class="mi">17</span><span class="w"> </span><span class="n">move</span><span class="w"> </span><span class="n">nested</span><span class="w"> </span><span class="n">functions</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="n">level</span>
<span class="w"> </span><span class="n">constructors</span><span class="w"> </span><span class="mi">18</span><span class="w"> </span><span class="n">move</span><span class="w"> </span><span class="n">field</span><span class="w"> </span><span class="n">definitions</span><span class="w"> </span><span class="n">into</span><span class="w"> </span><span class="n">constructors</span>
<span class="w"> </span><span class="n">flatten</span><span class="w"> </span><span class="mi">19</span><span class="w"> </span><span class="n">eliminate</span><span class="w"> </span><span class="n">inner</span><span class="w"> </span><span class="n">classes</span>
<span class="w"> </span><span class="n">mixin</span><span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="n">mixin</span><span class="w"> </span><span class="n">composition</span>
<span class="w"> </span><span class="n">cleanup</span><span class="w"> </span><span class="mi">21</span><span class="w"> </span><span class="n">platform</span><span class="o">-</span><span class="n">specific</span><span class="w"> </span><span class="n">cleanups</span><span class="p">,</span><span class="w"> </span><span class="n">generate</span><span class="w"> </span><span class="n">reflective</span><span class="w"> </span><span class="n">calls</span>
<span class="w"> </span><span class="n">delambdafy</span><span class="w"> </span><span class="mi">22</span><span class="w"> </span><span class="n">remove</span><span class="w"> </span><span class="n">lambdas</span>
<span class="w"> </span><span class="n">jvm</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="n">generate</span><span class="w"> </span><span class="n">JVM</span><span class="w"> </span><span class="n">bytecode</span>
<span class="w"> </span><span class="n">terminal</span><span class="w"> </span><span class="mi">24</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">last</span><span class="w"> </span><span class="n">phase</span><span class="w"> </span><span class="n">during</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">compilation</span><span class="w"> </span><span class="n">run</span>
</code></pre></div>
<p>As you can see, each Scala source has to go through 24 phases before binaries are produced. Of course, some phases take
more time than others to execute. In particular, the typer phase is known to often take 30%+ of the whole compile time,
as it takes care of typechecking, which is a fundamental operation in a statically typed language such as Scala.</p>
<h3>Gaining insights</h3>
<p>We said we needed to gain visibility into what the compiler is doing, but how can we do so? The bad news is that there
are few, if any, tools available today that can help with this task. The good news is that Triplequote is developing
tooling to address this problem. All metrics reported in this section were obtained using Triplequote tooling.</p>
<p>The first question that needed to be answered was: <em>How much time does each compiler phase take?</em></p>
<p>This question is interesting because the time per phase gives us a broad view on whether there might be opportunities to
speed up compilation. The histogram below gives a high-level view of the time (in milliseconds) consumed by each phase
for compiling main and test sources (with a cold JVM).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c480f3c4140cdb7d813e3d55e35c9637fdc72813_chart-3.png?auto=compress,format"></p>
<p>The one phase that you should pay attention to is <em>typer</em>, as in both cases it takes more than 34 seconds to execute.
This means typechecking accounts for more than 60% of the whole compile time, which is definitely atypical. Because the
tinbox project uses several type-intensive libraries, it is not entirely surprising that typechecking
sources takes time. However, it was remarkable that the typechecking time for test sources was so long, considering
there were fewer than 5k LOC. Hence, the decision to take a closer look at the test sources.</p>
<h3>Investigating tests</h3>
<p>To direct our efforts, we needed to know which test source took the most to typecheck. With the help of Triplequote
tooling, we collected the following statistics:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/44ef903ee95d8a3cdeb708b77f7d58e24294ced5_screen-shot-2017-04-19-at-10.24.12.png?auto=compress,format"></p>
<p>ConfigsRouteSpec.scala is the test source file that took the longest to compile. What’s stunning is that
ConfigsRouteSpec.scala contains only 56 lines of code, for a total of two unit tests. How could such a small source take
so long to typecheck?</p>
<p>We needed more visibility into what the Scala compiler was doing. The next table reports two insightful metrics we
collected on ConfigsRouteSpec.scala:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/738ae195ffc8ee9289f3fa7609744745dc287fcc_screen-shot-2017-04-19-at-10.21.38.png?auto=compress,format"></p>
<p>The problem was evident: The many macro expansions were responsible for ConfigsRouteSpec.scala’s long typechecking time.
To understand whether this was normal or not, we had to have a look at the code generated by the triggered macros.</p>
<h3>Macro generated code</h3>
<p>To see the code generated by macros, we can simply inspect the AST of the source ConfigsRouteSpec.scala after the typer
phase (after typer, all code generated by macros is present in the AST). To print the AST after typer, we use the Scala
compiler option -Xprint:typer.</p>
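<p>In an sbt build, the same flag can be enabled through scalacOptions (a build.sbt fragment; the output is verbose, so it’s best removed once the investigation is done):</p>

```scala
// build.sbt fragment: print every compilation unit's AST after the typer
// phase, which exposes the code produced by macro expansion.
scalacOptions += "-Xprint:typer"
```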
<p>As expected, the amount of code generated by macros into the AST was substantial. In particular, we noticed that all
macro code was injected into the following helper method:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">route</span><span class="p">(</span><span class="nl">configDetails</span><span class="p">:</span><span class="w"> </span><span class="k">Option</span><span class="o">[</span><span class="n">ConfigDetails</span><span class="o">]</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigsRoute</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">configure</span><span class="o">[</span><span class="n">ConfigsRoute</span><span class="o">]</span><span class="p">(</span><span class="n">ApplicationConfig</span><span class="p">.</span><span class="n">test</span><span class="p">)</span>
</code></pre></div>
<p>You don’t need to understand what the method does. What’s interesting is the definition of the configure method:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">configure</span><span class="o">[</span><span class="n">A</span><span class="o">]</span><span class="p">(</span><span class="nl">c</span><span class="p">:</span><span class="w"> </span><span class="n">ApplicationConfig</span><span class="p">)(</span><span class="n">implicit</span><span class="w"> </span><span class="nl">r</span><span class="p">:</span><span class="w"> </span><span class="n">ConfigReader</span><span class="o">[</span><span class="n">A</span><span class="o">]</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">A</span>
<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</code></pre></div>
<p>Note that configure takes an additional, implicit parameter that needs to be filled in by the compiler. What was
intriguing is that the Scala compiler synthesized this value using macros instead of using an existing value in the
<strong>implicit scope</strong>, and that was why the source file took so long to typecheck.</p>
<p>The interesting part in the implicit scope is the ConfigsRoute companion object:</p>
<div class="highlight"><pre><span></span><code><span class="k">object</span><span class="w"> </span><span class="n">ConfigsRoute</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">implicit</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="nl">reader</span><span class="p">:</span><span class="w"> </span><span class="n">ConfigReader</span><span class="o">[</span><span class="n">ConfigsRoute</span><span class="o">]</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">createReader</span>
<span class="err">}</span>
</code></pre></div>
<p>As you can see, there is an implicit definition that can be used to create ConfigReader[ConfigsRoute] implicit value
instances. But why wasn’t this implicit picked up?</p>
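<p>To illustrate the mechanism with simplified stand-in types (ConfigReader and ConfigsRoute below are sketches, not the project’s actual code): an implicit defined in a companion object is found through the implicit scope without any import, while an implicit brought into the local scope, e.g. via an import, takes precedence over it:</p>

```scala
// Minimal sketch of implicit resolution; these types are simplified
// stand-ins for the article's types, not the real project code.
trait ConfigReader[A] { def read(c: String): A }

case class ConfigsRoute(config: String)

object ConfigsRoute {
  // Lives in the companion object, hence in the *implicit scope* of
  // ConfigReader[ConfigsRoute]: the compiler finds it without any import.
  implicit val reader: ConfigReader[ConfigsRoute] =
    new ConfigReader[ConfigsRoute] { def read(c: String) = ConfigsRoute(c) }
}

object Demo extends App {
  def configure[A](c: String)(implicit r: ConfigReader[A]): A = r.read(c)

  // Resolves ConfigsRoute.reader from the implicit scope. Had a competing
  // implicit (e.g. a macro-based derivation) been imported into this local
  // scope, it would have won instead, which is what happened in the article.
  println(configure[ConfigsRoute]("test").config)
}
```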
<p>Before digging deeper into the problem, we tested whether passing the argument explicitly would have an impact on
compilation time:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">route</span><span class="p">(</span><span class="nl">configDetails</span><span class="p">:</span><span class="w"> </span><span class="k">Option</span><span class="o">[</span><span class="n">ConfigDetails</span><span class="o">]</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigsRoute</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">configure</span><span class="o">[</span><span class="n">ConfigsRoute</span><span class="o">]</span><span class="p">(</span><span class="n">ApplicationConfig</span><span class="p">.</span><span class="n">test</span><span class="p">)(</span><span class="n">ConfigsRoute</span><span class="p">.</span><span class="n">reader</span><span class="p">)</span>
</code></pre></div>
<p>With this small change, the compilation time of ConfigsRouteSpec.scala was <strong>drastically reduced to 99ms</strong>, which is 56x
faster than it was initially!</p>
<p>While great, the above is not an ideal solution, as no one likes to pass implicit values explicitly. In other words, we
treated the symptom but not the cause. In fact, we would like the Scala compiler to find and use the ConfigsRoute.reader
implicit value, instead of synthesizing one with expensive macros. So why wasn’t the Scala compiler
injecting the desired implicit value?</p>
<p>The answer turned out to be simple: The problem was that the expensive macros used to synthesize a
ConfigReader[ConfigsRoute] instance were imported into the local scope via a package object. Hence, the Scala compiler
couldn’t do anything else other than use these macros to create an implicit value instance of
ConfigReader[ConfigsRoute].</p>
<p>Armed with this knowledge, the solution to the problem consisted of ensuring that the macro code that was previously
triggered would no longer be accessible from the ConfigsRouteSpec.scala source. The consequence of this refactoring is
that the Scala compiler would now look for an implicit ConfigReader[ConfigsRoute] value in the implicit scope of
ConfigsRoute, and it would find ConfigsRoute.reader as expected. Therefore, the implementation of the route
method could be reverted to its original state <em>without losing the obtained compile time speedup</em>.</p>
<p>It’s worth mentioning that while the solution to this compile time inefficiency was relatively simple, it would have
been impossible to know where to focus our efforts without adequate diagnostic tooling. It’s Triplequote’s intention to
integrate diagnostic tooling into Hydra, and hence automate the process of detecting compile time inefficiencies.</p>
<p>It was now time to re-run a full project compile and compare the compilation time for our current optimized state versus
the initial state.</p>
<h3>Optimized State</h3>
<p>The chart below visually compares the tinbox project’s compile time performance prior to and after implementing the
discussed code optimization.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c6a64f2dfbac5e0140428873034ffa0c7d812fc4_chart-4.png?auto=compress,format"></p>
<p><strong>Single-threaded compile time has improved by 17% on a cold JVM and 37% on a warm JVM.</strong></p>
<h3>Optimized State with Triplequote Hydra</h3>
<p>Finally, we wanted to check that the initial speedup obtained with Hydra was still there after having optimized
single-threaded compilation time. Hence, we ran once more the same experiments, but this time using Hydra.</p>
<p>The next chart visually compares the compile time performance when using the vanilla Scala 2.12.1 versus Hydra.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cc2711e4be6d6dcd25901e98fb2f0e580274de5f_optimized-tinbox-chart5.png?auto=compress,format"></p>
<p><strong>Notice how using Hydra yields a 2x compile time speed-up with a warm JVM</strong>. And cold compile time performance is 33%
faster when using Hydra.</p>
<p>This all looks very promising and we now need to validate those good results by deploying Hydra in the team and on the
Continuous Integration server. In particular we are checking:</p>
<ul>
<li>If we can confirm the productivity gains across the day, which is a mix of cold/hot compilations, either full or
incremental</li>
<li>If there is still a benefit to running Hydra on machines having only 2 cores with hyperthreading</li>
<li>If Hydra is robust and doesn’t break on new code structures we introduce</li>
<li>How we can collaborate to make better diagnostic tools to better understand the performance bottlenecks and how they
evolve as the project grows</li>
</ul>
<h3>Summary</h3>
<p>Reducing the compilation time of Scala programs can be challenging, but with the help of Triplequote we have obtained a
drastic speedup. Using Hydra yielded a 2.66x compile time speedup for free on the initial tinbox codebase. This is
impressive, as all we had to do was add an sbt plugin to our build.</p>
<p>Moreover, thanks to their expertise and advanced tooling, we were able to pinpoint compile time inefficiencies that
would otherwise have gone unnoticed. By detecting single-threaded inefficiencies and using Hydra to parallelize
compilation, the tinbox project compiles now 3.2x faster!</p>
<p>If you’re interested in finding out more about our compilation time improvements, I’d be happy to chat! Get in touch via
Twitter at <a href="https://twitter.com/etorreborre">@etorreborre</a>.</p>Adapting to the Mobile Consumer2017-04-18T00:00:00+02:002017-04-18T00:00:00+02:00Kristina Walcker-Mayertag:engineering.zalando.com,2017-04-18:/posts/2017/04/adapting-to-the-mobile-consumer.html<p>Our Mobile Apps Production Manager shares how Zalando refashioned its approach to mobile.</p><p>It’s 2017 and mobile is mainstream. Last year, 59% of German customers <a href="http://www.slideshare.net/ING/world-on-the-move-for-mobile-banking?ref=https://www.ezonomics.com/ing_international_surveys/mobile_banking_2016/">shopped using their mobile
device</a>.
The whole consumer experience has changed dramatically over the last decade. Where ten years ago, shoppers researched
products online before purchasing in-store, these days consumers prefer a more holistic experience; one where
researching, comparing, and buying are not fragmented pieces, but one convenient whole.</p>
<p>When I first started at Zalando in 2014, our mobile app was mainly an engagement tool people used to browse. Even as
recently as three years ago, we didn’t believe people would shop mainly on mobile. Our app played a supporting role with
little to no unique mobile content.</p>
<p>Several months later, the number of people shopping by mobile began to increase drastically. A transformation was
happening in consumer culture. Devices were growing better and faster, while social media ushered in a new mode of
fashion inspiration. Content creators on Facebook, Instagram and YouTube became just as important as fashion magazines.
People were plugged into fashion 24/7, and therefore retailers had to be too.</p>
<p>We realised we had to change how we looked at our service. The shift from desktop to mobile was happening much faster
than that from offline to online shopping. We had to transform our approach rapidly. ‘Mobile first’ became our
objective.</p>
<h3>Becoming Mobile First</h3>
<p>To embrace the changing culture of the mobile consumer, we revised how we at Zalando understood mobile.</p>
<p>We created strategic workstreams with various Zalando teams, implementing the m-bassador programme where teams sent a
‘mobile ambassador’ to disruption workshops and training sessions. Testing stations were distributed throughout our
offices where staff could interact with devices showcasing new ways to engage and retain mobile customers. In Autumn
2015, we held a <a href="https://tech.zalando.com/blog/why-zalando-is-celebrating-mobile-first-day/">#MobileFirst</a> day with
speakers from Uber, Facebook, Google, and others. We wanted to create a buzz to inspire and enable staff to go back to
their teams and change their daily work, putting mobile first.</p>
<h3>Teachings</h3>
<p>We learned mobile users mustn’t feel as though they’re using an adaptation of the desktop screen, but a separate,
sophisticated application which caters to their needs and preferences. For example, we built scrollable lookbooks and
created videos which are only available on mobile. We did a complete overhaul of the home screen and dramatically
changed our PDP. We rethought the entire mobile customer experience, adapting browse and shop functionalities. Discovery
was made possible on mobile where it wasn’t previously a feature.</p>
<p>Companies who are slow to offer their customers the whole journey, from the initial contact to aftercare, are at risk of
being left behind. It’s clear we must minimise the reasons why customers go back to desktop shopping. Honing the client
journey is an ongoing process: integrating online returns, speeding things up, offering more convenient service,
introducing innovative payment functionalities, etc. If shoppers want a comprehensive experience from start to finish,
we need to support them.</p>
<p>That mobile sales now outstrip desktop sales in Zalando speaks to the success of ‘mobile first’ thinking, and just how
beneficial it is to work with both tech and commercial teams. Discovering the true mobile experience: flexibility,
speed, touch, and play, is challenging but vital if we want to keep up with consumer needs.</p>
<h3>Tomorrow’s Mobile</h3>
<p>There’s lots to look forward to. Big Data is intriguing. The opportunities it offers retailers are incredible. Zalando’s
aim is to be the number one destination for fashion, and to do this we need to continue to personalise our customer
experience. Big Data will allow us to do that. Recommending big labels isn’t rocket science, but understanding customers
and offering new brands that are truly related is really special. People get amped up about personalized music or movie
recommendations. They feel that certain platforms get them. There’s a feeling, for example, that <a href="https://www.wired.com/2016/08/spotifys-latest-algorithmic-playlist-full-favorite-new-music/">Spotify’s
algorithms</a> are totally in
tune with the customer, and able to anticipate their wants, from the biggest bands down to undiscovered artists.
Similarly, <a href="https://www.wired.com/2013/08/qq_netflix-algorithm/">Netflix</a> is lauded for its sophisticated
recommendations that keep viewers tuned-in. If we can emulate this anticipation and customer knowledge with fashion,
we’re on the right track.</p>
<p>Yes, mobile is a remote device, but it’s bringing us closer to our customers, and that’s where we want to be.</p>Parallel Computing with Scala2017-04-13T00:00:00+02:002017-04-13T00:00:00+02:00Mohd Nadeem Akhtartag:engineering.zalando.com,2017-04-13:/posts/2017/04/parallel-computing-with-scala.html<p>What can we do to solve problems that don’t fit on a single core CPU?</p><p>Data keeps growing, and with it the size and complexity of the computation problems we need to solve. What should we
do about problems that don’t fit on a single-core CPU or can’t be solved in reasonable time?</p>
<p><strong>Parallel programming</strong> not only solves these problems but also reduces computation time. All computers are
<strong>multi-core processors</strong> these days, even your iPhone 4s has two cores. In this article, I will look over the possible
scenarios where we can apply parallel computing and the factors affecting computation time. We will also look into the
benchmarking of parallel programs.</p>
<p><em>Parallel computing is a type of computation where different computations can be performed at the same time.</em></p>
<p><em>Basic principle:</em> The problem can be divided into subproblems, each of which can be solved simultaneously.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2d0650500b1748fc351a5e8db9742e25a16da79d_fotorcreated2.jpg?auto=compress,format"></p>
<p>Many of us struggle to say <strong>how parallel programming differs from concurrent programming</strong>. Let’s take a step back and
understand the concept and reasoning behind parallel computation.</p>
<p><strong>Parallel computation:</strong></p>
<ul>
<li>Optimal use of parallel hardware to execute computations quickly</li>
<li>Division into subproblems</li>
<li>Main concern: speed</li>
<li>Mainly used for: algorithmic problems, numeric computation, Big Data processing</li>
</ul>
<p><strong>Concurrent programming:</strong></p>
<ul>
<li>Multiple executions may or may not start at the same time</li>
<li>Main concerns: convenience, better responsiveness, maintainability</li>
</ul>
<p>There are different levels of parallelism e.g. bit level, instruction level, and task level. For the purpose of this
article, we will focus on task level parallelism.</p>
<p>Before getting into parallel programming problems, let’s first understand how the hardware behaves.</p>
<p>Processor vendors like IBM and Oracle put multiple CPU cores on the same processor chip, each of which is capable of
executing a separate instruction stream:</p>
<ul>
<li>Multi-core processor</li>
<li>Symmetric multiprocessors</li>
<li>Graphic processing unit</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0a8268b3c67acf7ccb69dd9d333292eeb8db7584_screen-shot-2017-04-14-at-12.38.27.png?auto=compress,format"></p>
<p>Parallel programming is much harder than sequential programming; it makes a developer’s life harder. However, the speed
at which results can be delivered is a big plus.</p>
<p>The operating system and the JVM are the underlying environments that make parallel tasks possible. As we know, two
different processes don’t share memory, which is a blocker for running a task in parallel across them. But <em>each process
contains multiple independent concurrent units called threads</em>, and threads share the same memory address space.</p>
<h3>Task Level parallelism</h3>
<p>Task-level parallelism means executing separate instruction streams in parallel.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c7ef035e6b5cc6c7cc7ab8a38ae770d3ae29678c_screen-shot-2017-04-14-at-12.35.43.png?auto=compress,format"></p>
<p>In the signature of parallel execution, taskA and taskB are passed by name, which means they are not evaluated sequentially at the call site.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bb04c0a90fef180436bd3774fad7d5e4088bce26_screen-shot-2017-04-14-at-12.39.01.png?auto=compress,format"></p>
<p>Here you can see the method ‘task’, which is scheduled on a separate thread:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c7e7fdb9ba8ac510988326441a7836ca942556af_screen-shot-2017-04-14-at-12.39.28.png?auto=compress,format"></p>
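<p>A <code>parallel(taskA, taskB)</code> construct like the one pictured can be approximated on the JVM with an executor. The following Java sketch mirrors the construct’s behaviour only in spirit; it is an assumption for illustration, not the actual course library implementation:</p>

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelDemo {
    // A minimal parallel(taskA, taskB): submit both tasks to a pool and
    // wait for both results before returning.
    static <A, B> Object[] parallel(Callable<A> taskA, Callable<B> taskB) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<A> fa = pool.submit(taskA); // taskA runs on one thread
            Future<B> fb = pool.submit(taskB); // taskB on another
            return new Object[] { fa.get(), fb.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] r = parallel(() -> 21 + 21, () -> "done");
        System.out.println(r[0] + " " + r[1]); // prints: 42 done
    }
}
```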
<p>Let’s look at an example:</p>
<p>Find the total number of ways change can be made from a specified list of coins for a specified amount of money.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d2c9b110869acee13d1d3959e9a5aab39f9f5469_screen-shot-2017-04-14-at-12.40.09.png?auto=compress,format"></p>
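<p>For readers who cannot see the screenshot, the sequential version of <code>countChange</code> can be sketched in Java (the original is Scala; the signature here is illustrative):</p>

```java
class CountChange {
    // Count the ways to make `money` using coin denominations from index
    // `from` onwards: either use coins[from] again, or skip it for good.
    static int countChange(int money, int[] coins, int from) {
        if (money == 0) return 1;                        // exact change
        if (money < 0 || from == coins.length) return 0; // dead end
        return countChange(money - coins[from], coins, from)
             + countChange(money, coins, from + 1);
    }

    public static void main(String[] args) {
        // 4 = 1+1+1+1 = 1+1+2 = 2+2, so there are three ways
        System.out.println(countChange(4, new int[] { 1, 2 }, 0)); // prints 3
    }
}
```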
<p>This method “countChange” runs sequentially and produces a result in ‘<strong>82.568 ms</strong>’:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a10de2430f72fd0a94cdd57ed0ada7276592612c_screen-shot-2017-04-14-at-12.40.35.png?auto=compress,format"></p>
<p>The parallel version runs in ‘<strong>48.686 ms</strong>’, a speedup of 1.695×.</p>
<p>What hardware are we using for this example? The MacBook Pro "Core i5":</p>
<ul>
<li>Processor 2.4 GHz (dual independent processor "cores" on a single silicon chip)</li>
<li>Memory 8GB</li>
</ul>
<p><strong>Of course you can’t parallelise everything.</strong> Let’s try to understand the cost of splitting and combining the data
structures, such as arrays.</p>
<p>Assume we have a <strong>four core processor</strong> and an array of size 100, over which we have to perform a <strong>filter</strong> operation.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c853774c2ce3db98ec6d533e50b21e0ceeb2a83a_screen-shot-2017-04-14-at-12.40.59.png?auto=compress,format"></p>
<p>At the leaf level of the parallel reduction tree, the four processes traverse the N elements in N/4 computational steps and perform
filtering. Processes P0 and P1 each produce an array and, in a similar way, so do P2 and P3.</p>
<p>In the next step, each pair of arrays is combined in N/2 computational steps. Finally, at the root level, we have
to combine two arrays, which takes N computational steps.</p>
<p>Summing up all the computational steps, we get N/4 + N/2 + N = 7N/4, which is more than the N steps the sequential version needs.</p>
<p>The overall complexity of this combine step is O(n + m), which is not efficient: combining takes as much time as the
filtering itself. Let’s define the combine operation in order to understand why.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/89ea7a02f6caef26d0a68bc30618149d6908b0c7_screen-shot-2017-04-14-at-12.41.22.png?auto=compress,format"></p>
<p>As we can see, it takes 2n + 2m = 2(n + m) steps. In complexity analysis constants are ignored, hence O(n + m).</p>
<p>Compare how common data structures behave:</p>
<ul>
<li>Hash tables compute the hash code of the elements; lookup, insert, and delete operations take constant time, O(1)</li>
<li>Balanced trees take O(log n) to reach any node</li>
<li>Mutable linked lists can have O(1) concatenation</li>
</ul>
<p>With this we can see that it’s possible to combine some data structures in trivial time. For parallel
computation to pay off, the combine operation should run in O(log n + log m) time.</p>
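<p>The linked-list case is worth illustrating: with a tail pointer, splicing one mutable singly linked list onto another is a constant-time operation, in contrast to the O(n + m) array combine analysed above. A Java sketch, with class and method names chosen for illustration:</p>

```java
class ConcatList {
    static class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node head, tail;
    int size;

    void append(int value) {
        Node n = new Node(value);
        if (head == null) { head = n; } else { tail.next = n; }
        tail = n;
        size++;
    }

    // Constant-time combine: a single pointer update, regardless of
    // how long either list is, unlike copying two arrays in O(n + m).
    void concat(ConcatList other) {
        if (other.head == null) return;
        if (head == null) { head = other.head; } else { tail.next = other.head; }
        tail = other.tail;
        size += other.size;
    }
}
```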
<h3>Why should we benchmark parallel programs?</h3>
<p>Performance benefits are the main reason why we write parallel programs in the first place, which is why benchmarking
them is even more important than benchmarking sequential programs.</p>
<p><strong>Factors affecting performance:</strong></p>
<ul>
<li>The number of processors</li>
<li>Latency while accessing memory and throughput</li>
<li>Processor speed</li>
<li>Cache behavior</li>
<li>Garbage collection</li>
<li>JIT compilation</li>
<li>Thread scheduling</li>
</ul>
<p><strong>ScalaMeter</strong> is a benchmarking and performance regression testing framework for the JVM, and a great tool for this
purpose. It computes performance metrics for parts of a program: running time, memory
footprint, network traffic, disk usage, and latency.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8c2a53b9e2cc20a40b6ec7ce6879e34f009e8411_screen-shot-2017-04-14-at-12.41.47.png?auto=compress,format"></p>
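<p>A key idea that ScalaMeter automates is separating warm-up runs from measured runs. A crude, hand-rolled Java approximation of that idea is shown below, purely as an illustration; ScalaMeter itself additionally controls garbage collection and applies statistical analysis:</p>

```java
class CrudeBench {
    // Warm up first so the JIT compiles the hot path, then time the
    // steady state and return the mean per run in milliseconds.
    static double meanMillis(Runnable task, int reps) {
        for (int i = 0; i < reps; i++) task.run(); // warm-up runs (discarded)
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) task.run(); // measured runs
        return (System.nanoTime() - start) / (double) reps / 1e6;
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        System.out.println("mean ms: " + meanMillis(() -> java.util.Arrays.sort(data), 50));
    }
}
```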
<h3>Summary</h3>
<p>In this article I have covered the basic principle of parallel programming, the role of hardware resources, complexity,
and benchmarking. I would like to highlight a few important points which usually pop up while designing parallel
computation algorithms.</p>
<ul>
<li>The limit of parallelisation is bound by the number of cores</li>
<li>Use a threshold to fall back to sequential execution, even when you have enough cores, in order to avoid parallelism overhead</li>
</ul>
<p>Then there’s the question: when should you NOT use parallelised computation?</p>
<ul>
<li>If you do not have enough parallel resources: whether it is the hardware cores and their abstraction through a thread
pool, or something less obvious like memory bandwidth.</li>
<li>If the overhead of parallelisation exceeds the benefit: even if you have enough cores, there is overhead in doing
computation in parallel. This includes, among other things, spawning the tasks and scheduling them.</li>
</ul>
<p>Either of these conditions alone can make parallel execution less efficient than its sequential counterpart, so a
robust algorithm should ensure that neither issue comes up.</p>
<p>Consider the case when the array has only 100 elements and we have to perform an addition of the elements in parallel on
several cores. Here the overhead of parallelising the computation is not worth it, irrespective of how many cores you
have. There is simply not enough data.</p>
<p>On the other hand, if we have an array with 1000 elements and we have to perform filtering or any other operation upon
it, then parallelising the computation becomes worth it.</p>
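<p>The threshold idea above is exactly what Java’s fork/join framework encourages: below a sequential cut-off, a task stops forking and just loops. A sketch of a parallel array sum, where the threshold value is an illustrative guess to be tuned by benchmarking:</p>

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelSum extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // illustrative cut-off, tune by measurement

    private final int[] a;
    private final int lo, hi;

    ParallelSum(int[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) { // small range: plain sequential loop
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        ParallelSum left = new ParallelSum(a, lo, mid);
        ParallelSum right = new ParallelSum(a, mid, hi);
        left.fork();                          // left half runs in parallel
        return right.compute() + left.join(); // right half on this thread
    }

    static long sum(int[] a) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(a, 0, a.length));
    }

    public static void main(String[] args) {
        int[] a = new int[1_000_000];
        java.util.Arrays.fill(a, 1);
        System.out.println(sum(a)); // prints 1000000
    }
}
```

Below the threshold the cost of forking and scheduling would exceed the work itself, which is exactly the 100-element situation described above.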
<p>I hope to keep exploring this area and am working on coming up with a tutorial or workshop. In the meantime if you have
questions, you can reach me through Twitter at <a href="https://twitter.com/mohdnadeem">@mohdnadeem</a>.</p>
<h2>Improving Swift Compilation Times from 12 to 2 Minutes</h2>
<p>2017-04-12 · Joao Nunes</p>
<p>Module optimization and cutting compilation times for a better app experience overall.</p>
<p>With our Fleek app growing and new features being introduced, its compile times have started to become a real challenge.
We recently discovered that compiling the app, or just making a minor change, would take approximately 12 minutes.</p>
<p>We wanted to cut this time dramatically to improve the customer experience overall, as well as our work overhead. In
this article, I’ll show how we managed to decrease it to just 2 minutes.</p>
<p>In Xcode, we have the option to select three optimization levels: <em>None</em>, <em>Fast</em> and <em>Fast, Whole Module Optimization</em>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/db5d3110d4b42f3dec754f4df7af561a89e298f6_swifttime1.png?auto=compress,format"></p>
<p>Using <em>Whole Module Optimization</em> makes compilation very fast. But choosing <em>Fast</em> or <em>Fast, Whole Module Optimization</em>
won’t allow a developer to debug the application. If we select one of the above, compile the app, and then try to debug
the app, we see this in the console:</p>
<div class="highlight"><pre><span></span><code><span class="n">App</span><span class="w"> </span><span class="n">was</span><span class="w"> </span><span class="n">compiled</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">optimization</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">stepping</span><span class="w"> </span><span class="n">may</span><span class="w"> </span><span class="n">behave</span><span class="w"> </span><span class="n">oddly</span><span class="p">;</span><span class="w"> </span><span class="n">variables</span><span class="w"> </span><span class="n">may</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">available</span><span class="o">.</span>
</code></pre></div>
<p>So how can we activate the whole module optimization without activating the -O flag? The solution to this problem is
easy and was found via <a href="https://news.ycombinator.com/item?id=13214431">this post on Hacker News</a>. Xcode will not
allow us to complete this task via the UI options. Thus, we need to add it as a User-Defined Setting.</p>
<p>To do so, open the target build settings and set your <em>Optimization Level</em> to <em>None</em> in Debug configuration. Next, add a
new setting called <em>SWIFT_WHOLE_MODULE_OPTIMIZATION</em> and set it to <em>YES</em>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/23e571139b9b8d364619a0539013890793afe68c_swifttime2.png?auto=compress,format"></p>
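<p>If you manage build settings in xcconfig files rather than the Xcode UI, the same pair of settings would look like this (the file name <code>Debug.xcconfig</code> is hypothetical; assign the file to your Debug configuration):</p>

```
// Debug.xcconfig (hypothetical file name)
// Keep debugging possible while enabling whole module optimization.
SWIFT_OPTIMIZATION_LEVEL = -Onone
SWIFT_WHOLE_MODULE_OPTIMIZATION = YES
```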
<p>And you’re done. With this technique, the app was able to compile from a clean state in about 2:26 mins.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0d84c1679ff721a9d373df7b5dac8392c67a7bc6_swifttime3.png?auto=compress,format"></p>
<p>Changing a file that most of the app depends on now takes only 1 minute instead of the full recompilation time.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1febabb732a5effa262d9ab6d6ee448a3abad791_swifttime4.png?auto=compress,format"></p>
<p>What does this mean for Fleek?</p>
<p>As an iOS team making changes every day to our code, this improvement was important for better time management and cost
efficiency of our work, on top of the overall customer experience. Let’s say we complete 20 compilations a day. This means
that we saved approximately 26 hours of compilation time in a day, the equivalent of about three extra developers.</p>
<p>More information on whole module optimization can be found <a href="https://swift.org/blog/whole-module-optimizations/">here</a>.
If you are curious about how to measure compile times, I have used <a href="https://github.com/RobertGummesson/BuildTimeAnalyzer-for-Xcode">this project on
GitHub</a> in the past with much success.</p>
<h2>Tech Destination: Berlin</h2>
<p>2017-04-10 · Marc Lamik</p>
<p>Attracting great talent is key to growing Berlin's tech ecosystem, and we're up to the challenge.</p>
<p>In terms of branding, Berlin does very well. It’s been known as a haven for gifted fugitives or creators – or both –
since the roaring twenties, and while the city’s squat skyline remains low, the ceiling for success is inordinately high.
Berlin’s startup scene is a constant
<a href="https://techcrunch.com/2016/07/07/how-berlin-can-become-europes-no-1-tech-hub/">newsmaker</a>, as its growth rockets and
it’s inevitably compared to older tech hubs like London or Silicon Valley.</p>
<p>The work hard, party hard capital comes with more than a few myths, even in the platform industry. That steaming cup of
coffee on your co-worker’s desk on a Monday morning might very well suggest they’ve come straight to the office from
Berghain. It happens. It’s a reputation the city largely enjoys, and a handy conversation starter at international
conferences; that you’re more likely to find a couple of beers in the kitchen of a Berlin startup than containers of
<a href="https://www.nytimes.com/2015/05/25/technology/in-busy-silicon-valley-protein-powder-is-in-demand.html?_r=0">protein
shakes</a>.</p>
<p>I first moved to Berlin eight years ago, seeking somewhere I could involve myself deeply in startup culture. I wasn’t
disappointed. A series of early movers such as brands4friends brought the first of some very talented people to the
city; Zalando followed, together with Soundcloud. The late 00s marks the time Berlin recognised itself as a startup hub
and started growing its community in earnest.</p>
<p>As with any ecosystem, there are advantages and disadvantages to living, working and creating in Berlin. Affordability,
lifestyle, and diversity hang in a constantly changing balance with complex paperwork and a tug-o-war for the best
talent. But we’re not worried: Berlin’s agile, driven nature is as inexhaustible as its partygoers.</p>
<h3>Benefits</h3>
<p>Berlin is affordable. Londoners are <a href="https://www.theguardian.com/cities/2015/sep/23/housing-trap-how-berlin-avoid-following-london-pricey-footsteps">reported to
spend</a>
72% of their income on rent, while in Berlin the figure drops to about half that; this when London pays startup
employees only <a href="http://www.ifse.de/uploads/media/IFSE_Booming_Berlin_English.pdf">$3,000 more per year than Berlin</a>. The
work/life balance is also kinder than in other famous tech hubs, especially as regards families. The three years of
kindergarten before school are free, the education system is good, and pleasant, family-friendly boroughs are still
close to the city centre.</p>
<p>Over <a href="http://europeanstartupmonitor.com/fileadmin/esm_2016/report/ESM_2016.pdf">30% of startup employees</a> in Germany are
foreign nationals, while approximately <a href="https://www.statistik-berlin-brandenburg.de/pms/2016/16-02-11.pdf">621,000 of Berlin’s 3.5 million
residents</a> do not hold a German passport. These
numbers make for a vibrant, international city which ups the attraction for potential incoming talent. The diversity of
personnel is reflected in the spectrum of industries represented. Berlin has some interesting fintech startups like
<a href="https://n26.com/">N26</a> and <a href="https://www.finleap.com/">Finleap</a>, but it’s not a fintech city. Soundcloud is one of the
most exciting platforms to emerge from Europe, but creative services don’t dominate the scene. We have a vibrant gaming
background with the likes of <a href="https://www.wooga.com/">Wooga</a>, as well as strong e-commerce and service platforms like
<a href="https://www.home24.de/">Home24</a> and <a href="https://www.deliveryhero.com/">Delivery Hero</a>.</p>
<p>For younger companies, the ecosystem is equally rich. Incubator concepts like Betahaus and Factory enable businesses to
come here, start fairly small and work together in a vibrant, creative atmosphere. The abundance of meetups,
discussions, workshops, and pitching events allow for valuable connections and a healthy, supportive community.</p>
<p>Although we’re still trying to join the dots between tech hubs within Europe and beyond, the hunger and potential for
connection is huge. Berlin enjoys a close relationship with Stockholm, comparable in terms of maturity, business
mentality and the types of startups to emerge. Soundcloud’s connection to New York and Zalando’s tech hubs in Dublin and
Helsinki also highlight the benefits that come from linking cities and focusing on relationships additional to Silicon
Valley.</p>
<h3>Challenges</h3>
<p>Like any tech hub, business in Berlin doesn’t come without challenges. Germany has a well-founded reputation for being
somewhat enamoured with bureaucracy, especially when compared to other tech hubs in Europe. In a <a href="http://www.reuters.com/article/us-britain-eu-berlin-idUSKCN12K0HB">recent survey by a
German startup monitor</a>, over half the founders
questioned said the government’s grasp of startup requirements was “unsatisfactory” or “deficient”. Coupled with the
absolute necessity of German language proficiency and the complicated tax rules, young international startups have a lot
of leg work to do before they place their order for crates of Club Mate.</p>
<p>Competition for talent is getting tougher year on year, especially as large companies like Amazon are drawn to the city
and inevitably seek out the most capable people. For young startups, this unfortunately means a much harder fight for
personnel. To ensure Berlin remains attractive internationally, we need to support potential founders or talent, and
significantly simplify the setup process. The more great people we get here, the better the whole ecosystem becomes.</p>
<h3>Future of Berlin</h3>
<p>It’s tough to anticipate Berlin’s future given how much the ecosystem has changed in just the last eight years. What we
are seeing recently is greater confidence from venture capitalists. <a href="http://www.ey.com/Publication/vwLUAssets/EY-Start-up-Barometer-2016/$FILE/EY-Start-up-Barometer-2016.pdf">Ernst & Young
reported</a> that
€2.145 billion were invested in Berlin’s startups in 2015– €1.254 billion more than in 2014; a figure that ranked the
city as Europe’s biggest venture capital beneficiary. Cherry Ventures raised a €150 million fund last year, and many
more former founders of the Berlin area who made their exits are now reinvesting in the local scene; an absolutely vital
component of maturing any ecosystem.</p>
<p>Greater connectivity between hubs, more talent, and a streamlined setup process are all areas we’d like to improve in
the coming years, but as Berlin pushes harder and diversifies more, we’re likely to see these challenges shrink against
the city’s ever-growing strength and world-famous dynamism.</p>Nine Tips for Planning User Research in Foreign Countries2017-04-06T00:00:00+02:002017-04-06T00:00:00+02:00Carina Kuhrtag:engineering.zalando.com,2017-04-06:/posts/2017/04/nine-tips-for-planning-user-research-in-foreign-countries.html<p>Check out our advice for planning field research in different countries for Zalando overall.</p><p>Zalando is active in 15 European markets. In addition to an array of other user testing activities, in 2016 we performed
field research in France, UK, Italy, Switzerland and Germany to get a better understanding of the differences in
consumer behavior and the local needs of our customers. We have experimented with different set-ups on how to conduct
local interviews and after successfully organising 77 of them, we have the following advice to offer about planning
field research in different countries.</p>
<h3>The set-up of the research</h3>
<p>As an in-house UX research team, our ambition is always to do as much research on our own and outsource as little as
possible to agencies or freelancers. We believe this most effectively embeds insights into the organization and is
therefore the more sustainable approach. Furthermore, it is well in line with our objective to become a truly
customer-centric company.</p>
<p>We reached out to all of our market teams within Zalando and put together experts from different departments to join us
on our 2-3 day field research trips.</p>
<p>This is what our field trips looked like overall:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f4dc0b30d5858f09994819097569db5f0c6a845d_planninguserresearch.png?auto=compress,format"></p>
<h3>The goal of the research: Discovery</h3>
<p>In comparison to our usability tests -- which aim at evaluating concrete features -- we took a more exploratory approach
here and wanted to learn about the characteristics of local shopping behavior. To prepare for the research trip we
consulted our market research, customer satisfaction, and customer care departments. Based on this we formed assumptions
about our local customers’ pain points.</p>
<p>Our aim for every field trip was not only confirming or rejecting our assumptions, but more importantly to derive a
customer journey map that shows us all the pain points we need to fix or create solutions for.</p>
<p>Thus, the discussions that we had with customers were focussed on their general (online and offline) fashion shopping
behavior and their perception of our service, as well as the service that competitors offer. With every customer we
conducted a 1.5 hour insight interview about general shopping behavior, but also included a little on-site test of
Zalando and local competitors.</p>
<p>This is what we learned about the execution of field trips with up to 16 colleagues.</p>
<h3>Learnings</h3>
<p><strong>Include a variety of cities and areas in your research</strong>
Once we narrowed down the markets we wished to focus on for this exercise, deciding on specific countries was not a
tough decision. However, choosing a place for our field research in these countries turned out to be more complicated.
Should we go where most of our active customers are? Or should we rather interview non-customers or even the ones that
used Zalando once and never came back? Should we go to the really big cities or focus on the countryside? Which areas
are most and least representative for their country? What about travelling connections to and from Berlin, and shouldn’t
we choose a city with a good public transport system? After all, we were 12-16 Zalando colleagues with a limited amount
of time for a research trip.</p>
<p>In the end, the cities we chose were well-thought through compromises of each of the above mentioned considerations.</p>
<p>Some things that we learned along the way were that we wanted to increase the diversity of answers and get a better
picture of the needs of urban as well as rural customers. After the first two field trips, we made sure we included
different cities and more rural areas in our research.</p>
<p><strong>Work with customer-facing colleagues to recruit participants</strong>
Because of time constraints, we worked together with external partners who scheduled the customer interviews for us.
They provided a good service, but we had to give up control over the quality of the participants. In one case, our
recruiting partner was not able to find enough customers for us and we spontaneously decided to ask people on the street
if they would have time for a coffee and a discussion about online shopping.</p>
<p>In the future, we want to work more closely together with our own customer-facing departments and set up our own
recruiting processes.</p>
<p><strong>Get your colleagues on board</strong>
For us it was extremely beneficial to invite colleagues from a variety of departments to our research trip. Not only did
we find a lot of native speakers in our local market teams that made it possible to conduct interviews in the local
language, but we were also exposed to a more holistic view of local market needs from a brand, commercial, operations
and customer care perspective.</p>
<p>Tip: If your company has a customer care team, try to convince some of your customer care agents to join the research.
They are trained to speak to customers and naturally do a very good job in conducting interviews. If your service
operates in different countries, you will most probably also find native speakers in customer care.</p>
<p><strong>Give colleagues a crash course in UX research and manage expectations</strong>
Of course, not everybody was trained in research and conducting user interviews. For some of our colleagues this was the
first time they had participated in something like this. We spent some time on a user research crash course, explaining
to everybody involved about why we do what we do, how it is done, what to pay attention to and finally, what to avoid in
an interview. We also planned interview practice sessions and provided feedback. This helped our colleagues to improve
their technical interview skills and also made them more confident about meeting customers in real life.</p>
<p><strong>Prepare all the material and send reminders</strong>
The more lean and efficient you plan your research to be, the better you need to prepare it. Since we travelled with a
group of 12-16 colleagues, we needed to make the research trip itself as short as possible to save resources. We split
up in groups of three and conducted the interviews in parallel. This way, we were able to conduct up to 18 interviews in
two days and our colleagues only needed to spend 2.5 days of their work week on tour.</p>
<p>We did not want to leave anything to chance, so we tried to equip everybody with everything they needed:</p>
<ul>
<li>We set up calendar invites for the research teams with the address, phone number, and a short description of the
interviewee</li>
<li>We provided everybody with an English and national version of the interview guide</li>
<li>We set up Google folders and templates to upload photos and interview notes directly after the interviews</li>
<li>We sent reminders to our colleagues to upload their interview notes</li>
<li>We created a WhatsApp group to answer urgent questions and share early interesting customer quotes</li>
</ul>
<p><strong>Plan the commute beforehand</strong>
Navigating an unknown city can be extremely stressful, especially when you are a bigger group of people and under time
pressure. Often, our first interviews would start a few hours after arrival at the airport, so we would make sure that
we looked up all the addresses beforehand and planned our commutes thoroughly. It sometimes took up to 2 hours (London)
to get to an in-home interview and we did not want to leave anything to chance.</p>
<p><strong>Book tables for dinner and debrief</strong>
With tight schedules of three interviews per day, there was often little or no time to gather and discuss interview
insights in between, so we always made sure that we had some time in the evening to go out for dinner and debrief. We
usually updated each other by introducing the person we’d interviewed to the rest of the group and giving a summary of
what we learned about their special characteristics and shopping behavior during the interview.</p>
<p><strong>Take pictures</strong>
Taking pictures is something that you often forget as a researcher because you are so immersed in the interview. For
later buy-in of decision makers, but also for a general illustration of research insights, it is extremely helpful to
have photos of the interviewees. This makes them more real and creates empathy.</p>
<p><strong>Share the results widely</strong>
Research trips of this scale raise high expectations among colleagues in the respective teams. In the beginning we
underestimated how many parties would benefit from our insights and made the mistake of only sharing the insights with
the people that were involved. When we started to invite more people to our presentations and share results more openly
throughout the organization, we noticed a positive influence on the interest in user research and an increase of buy-in
for what we do.</p>
<p>Take these tips to heart and take home -- like we did -- great insights about your international customers. Use them to
build products that your customers love.</p>
<p>If you’d like more information about our user research techniques and tips, find me on Twitter at
<a href="https://twitter.com/careeeeena">@careeeeena</a>.</p>
<h2>Crafting a Digital Fashion Vocabulary</h2>
<p>2017-04-04 · Christoph Lange</p>
<p>Fashion and tech make a powerful partnership, you just need to know the lingo.</p>
<p>Companies involved in the business of fashion like Zalando have shifting priorities, and technology has become more
important than ever. Technology at its heart is incredibly granular – highly detailed, and composed of many complex
parts. This description can also be applied to fashion: Trends, seasons, fabrics, textures – the entire industry is
astonishing in its scope and depth.</p>
<p>Contextualising the world of fashion to create the most relevant digital footprint is at the core of our work when it
comes to fashion insights. How can we map the trajectories of trends and stay relevant? How can companies be technically
innovative while being en vogue and design-led in terms of usability and relevance? The way we communicate this mindset
is key.</p>
<p>Consumers often struggle to describe the different types of fashion items they’re looking for, and while this is a very
promising area of research, we need to also ensure we’re the experts when it comes to terminology and language. Software
can combine what consumers tell us with what we already understand about them, but it’s equally
important to stay on trend to account for fashion’s fast-paced nature. This means expanding your range beyond mere
words, including images, styles, and behaviour.</p>
<p>Fashion recommendations and trend forecasting have a technical underbelly that smart companies are already taking
advantage of. To do so, they’re bringing together the worlds of data science and fashion to collect and create a digital
fashion lexicon. What is the difference between a chevron stripe and pinstripe? How can e-commerce players recognise
whether a certain curve in the jeans trend is on an upward or downward trajectory? Zalando is putting together and
updating physical and digital fashion glossaries to better support their engineering teams and bolster research in the
realm of trend forecasting.</p>
<p>By bringing in industry veterans of the fashion world and pairing them with product, engineering, and data science
professionals, innovations such as image recognition can be optimised by deep learning to deliver truly relevant content
for consumers of fashion. For example, we’ve integrated a Fashion Librarian into our Fashion Insights Centre in Dublin,
who has utilised her background in the fashion industry to clarify fashion terminology for entity recognition.
Explaining the difference between 'style' and 'look' clarified the focus for our data scientists and guided their
research in a precise direction. It enabled them to understand and develop a better grasp of trend forecasting and the
trajectory of trends.</p>
<p>Understanding the fashion industry and its transient nature is integral to forecasting, where we must stay on top of
what is happening on a daily basis. This is one of the most important roles that a Fashion Librarian plays for Zalando –
their constant pulse check on the latest in fashion ensures we’re kept in the e-commerce game. Following on, the
translation and contextualisation of this information educates our engineers, whose job it is to create the best tools
and services for our customers and the greater e-commerce world.</p>
<p>Fashion and technology make a powerful partnership. Their relationship has created a much wider understanding of both
industries from either side, opening up opportunities that weren’t even foreseeable five years ago. In the realm of
fashion recommendations and trend forecasting, subtle differences can make or break your attempts to innovate, which is
why we’re setting the groundwork via a well-crafted digital fashion vocabulary. By looking after the smallest of
details, you’ll be making the most fashionable of impressions – you just need to know the lingo.</p>An Open Source Pulse Check at Zalando for 20172017-03-30T00:00:00+02:002017-03-30T00:00:00+02:00Natali Vlatkotag:engineering.zalando.com,2017-03-30:/posts/2017/03/an-open-source-pulse-check-at-zalando-for-2017.html<p>Zalando depends on open source technologies to exist, so where are we at with some of our projects?</p><p>Zalando depends on open source technologies to exist. Take a look at our Tech Radar and you’ll see PostgreSQL, Kafka,
React, and many other household-name projects listed there. We also depend heavily on our own <a href="https://github.com/zalando">open source
software</a> — and in the last year, a growing number of developers and teams from other
companies have begun to do the same. Many of our projects are now co-developed by Zalandos and talented devs from
RedHat, Mozilla, and more.</p>
<p>Here’s a quick pulse check on the open source projects we’ve built and are passionate about, which have also garnered
some great feedback and support from the wider community.</p>
<h3>Connexion</h3>
<p>In mid 2016 we were lucky enough to <a href="https://tech.zalando.com/blog/connexion-interview-with-tony-tam/">sit down with Swagger creator Tony
Tam</a>, who is a big fan of
<a href="https://github.com/zalando/connexion">Connexion</a>, our Swagger/OpenAPI First framework for Python on top of Flask.
<em>“There have been a few efforts to do a true design-first implementation of REST APIs and Zalando has been right on the
leading edge of that movement”, says Tam. “It has been a delight for the development community to see a large retailer
putting efforts into an open source framework.”</em></p>
<p>Tam’s praise has not gone unnoticed, with Connexion remaining one of our strongest open source projects to date with an
active group of regular contributors. We’re currently looking for help with additional ways to handle OAuth 2
authentications, overriding default validation error messages, and documentation for response handling, passing
arguments to functions, and so on. A <a href="https://waffle.io/zalando/connexion">Waffle Board</a> for issues has been set up to
show a better overview and further information.</p>
<h3>SwiftMonkey</h3>
<p>Our open source dark horse comes at you in the form of <a href="https://github.com/zalando/SwiftMonkey">SwiftMonkey</a>, a
framework for performing randomised UI testing of iOS apps. It also contains a related framework called SwiftMonkeyPaws
that provides a visualization of the generated events, increasing the usefulness of your testing. Released in November
2016, the framework picked up its first external pull request not long after, and has surpassed 650 GitHub stars.</p>
<p>We’re incredibly proud of the grassroots support SwiftMonkey has earned, which is a testament to creator <a href="https://github.com/DagAgren?tab=activity">Dag Ågren’s
efforts</a> around essential open source resources: A great README, contributor
guidelines, and implementation of the <a href="http://contributor-covenant.org/version/1/4/">Contributor Covenant</a>.</p>
<h3>Skipper</h3>
<p>An HTTP router, <a href="https://github.com/zalando/skipper">Skipper</a> has been built with Go and acts as a reverse proxy with
support for custom route definitions. It can also be used out of the box or have custom filters and predicates added.
Largely inspired by <a href="https://github.com/vulcand/vulcand">Vulcand</a>, it’s a core component of Zalando’s <a href="https://www.mosaic9.org/">Project
Mosaic</a>, a set of services and libraries that support a microservice-style architecture for
large scale websites, or what we like to call microservices for the frontend.</p>
<p>What does Skipper do? It identifies routes based on a request’s properties, such as path, method, host and headers, then
routes each request to the configured server endpoint. It also allows the modification of requests and responses with
filters that are independently configured for each route. To get started, all you need is <a href="https://golang.org/">the latest version of
Go</a>… so go!</p>
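<p>Concretely, a Skipper route in its eskip format pairs predicates with filters and a backend. The sketch below is illustrative only (the route name, header, and backend URL are invented), but it mirrors the match-then-forward behavior described above:</p>

```
// match GET requests to /hello, tag them, and forward to the backend
hello: Path("/hello") && Method("GET")
  -> setRequestHeader("X-Passed-Through", "skipper")
  -> "https://backend.example.org";
```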
<p>We’ve also recently begun using Skipper as an <a href="https://github.com/zalando-incubator/kubernetes-on-aws/issues/169">ingress for
Kubernetes</a>.</p>
<h3>Patroni</h3>
<p><a href="https://github.com/zalando/patroni">Patroni</a> is the Zalando template that allows you to create your own customized,
high availability solution using Python and – for maximum accessibility – a distributed configuration store like
<a href="https://zookeeper.apache.org/">ZooKeeper</a>, <a href="https://github.com/coreos/etcd">etcd</a> or
<a href="https://github.com/hashicorp/consul">Consul</a>. A recent development in the project will be exciting for Kubernetes
users: Patroni is currently being reworked to be useful for teams running Kubernetes on top of Google Compute Engine,
with a <a href="https://github.com/kubernetes/charts/tree/master/incubator/patroni">Helm Chart</a> available that lets users deploy
a five-node Patroni cluster using a Kubernetes PetSet.</p>
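<p>To give a feel for how little is needed to get started, here is a minimal Patroni configuration sketch wiring the template to etcd (values are placeholders; consult the Patroni documentation for the full YAML schema):</p>

```yaml
scope: demo-cluster            # cluster name shared by all members
restapi:
  listen: 0.0.0.0:8008
  connect_address: 127.0.0.1:8008
etcd:
  host: 127.0.0.1:2379         # the distributed configuration store
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 127.0.0.1:5432
  data_dir: /var/lib/postgresql/data
```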
<p><a href="https://patroni.readthedocs.io/en/latest/">Thorough documentation?</a> Check. Easily navigable? Check again. Patroni’s
maintainers have been sure to provide the wider community with as much information and supporting resources for their
project and associated technologies as possible, making this repository a stalwart in our open source catalogue. Simple,
easy-to-grasp <a href="https://github.com/zalando/patroni/blob/master/docs/CONTRIBUTING.rst">contributing guidelines</a> are also
available, along with incredibly detailed <a href="https://github.com/zalando/patroni/releases">release notes</a> for past and
current versions.</p>
<h3>Friboo</h3>
<p>Looking for a lightweight utility library for writing microservices in Clojure, with support for
<a href="http://swagger.io/">Swagger</a> and <a href="https://oauth.net/">OAuth?</a> Look no further than
<a href="https://github.com/zalando/friboo">Friboo</a>. Another project following the <a href="https://tech.zalando.com/blog/crafting-effective-microservices-in-python/">API First
approach</a>, our Zalando Clojure community
maintains this project with gusto, allowing you to first define your API in a portable, language-agnostic format, and
then implement it (with the help of <a href="https://github.com/sarnowski/swagger1st">swagger1st</a>).</p>
<p>Friboo also has a great <a href="https://leiningen.org/">Leiningen</a> template to make using it simpler, with the service focusing
on project automation and declarative configuration. Friboo’s <a href="https://github.com/zalando/friboo#components">Hystrix
dashboard</a> is another ace feature in case you have compliance
requirements to follow. <a href="https://github.com/Netflix/Hystrix">Hystrix</a>, as many might know, is a Netflix creation.</p>
<p>Maintainers are ready to welcome new contributors with open arms. Check out the <a href="https://github.com/zalando/friboo/issues">Issues
Tracker</a> to see where your input would be welcomed.</p>
<h3>Grafter</h3>
<p>To bring this roundup full circle we have <a href="https://github.com/zalando/grafter">Grafter</a>, a library to configure and wire
Scala applications with <a href="http://etorreborre.github.io/specs2/">specs2</a> creator Eric Torreborre at the heart of the
project. There are <a href="https://github.com/adamw/macwire">many libraries</a> or
<a href="http://www.cakesolutions.net/teamblogs/2011/12/19/cake-pattern-in-depth">approaches</a> for going about <a href="https://en.wikipedia.org/wiki/Dependency_injection">dependency
injection</a> in Scala. Grafter goes back to basics on dependency
injection by <em>just using constructor injection</em>: No reflection, no xml, no annotations, no inheritance or self-types.</p>
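<p>Grafter itself is Scala, but the core idea – plain constructor injection – is language-agnostic. A minimal sketch in Python (all names invented for illustration): dependencies are ordinary constructor arguments, wired explicitly at the application's entry point, with no reflection or annotations involved:</p>

```python
# Constructor injection in miniature: components declare their
# dependencies as constructor parameters and nothing else.
class Database:
    def __init__(self, url):
        self.url = url

class UserService:
    def __init__(self, db):
        # the dependency is handed in, never looked up or constructed here
        self.db = db

    def greeting(self):
        return "connected to " + self.db.url

# Wiring is explicit and happens once, at the entry point.
service = UserService(Database("postgres://localhost/app"))
```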
<p>Constructive feedback is incredibly welcome on the project. How is it better or worse than other libraries? Is the core
model more approachable than other libraries? What could be improved? Another example of first-class documentation,
Grafter lays out step-by-step application configuration, the creation of your first component, as well as making
singletons. For interested Scala-heads, open issues for the library can be <a href="https://github.com/zalando/grafter/issues">found
here</a>.</p>
<p>--</p>
<p>Zalando’s open source culture continues to evolve, with many of our learnings reflected in our <a href="https://github.com/zalando/zalando-howto-open-source">how-to
guidelines</a>. Get in touch on Twitter with
<a href="https://twitter.com/LauritaApplez">@LauritaApplez</a> to chat all things open source and <a href="https://github.com/zalando">contribute
today</a>!</p>HMM PySpark Implementation: A Zalando Hack Week Project2017-03-29T00:00:00+02:002017-03-29T00:00:00+02:00Sergio Gonzalez Sanztag:engineering.zalando.com,2017-03-29:/posts/2017/03/hmm-pyspark-implementation-a-zalando-hack-week-project.html<p>Training models using larger amounts of data for a great Zalando Hack Week project.</p><p>Every year, Zalando’s <a href="https://tech.zalando.com/blog/hack-week-5-is-live/">Hack Week</a> gives us the opportunity to join
together in cross-disciplinary teams to solve a wide variety of problems (you can check this year’s amazing winners
<a href="https://tech.zalando.com/blog/the-finish-line--hack-week-5-awards-and-more/">here</a>). The projects come from any point
of the organization and we are encouraged to band together with other employees across locations and business units.</p>
<p>For our 2016 edition of Hack Week, we implemented a PySpark version of Hidden Markov Model (HMM). HMM has been
extensively used in a wide variety of problems such as speech recognition, stock market prediction or behavioural
analysis.</p>
<p>Our approach was based on the Python library <em>hmmlearn</em>. It provides a complete set of tools for multinomial (discrete),
Gaussian and Gaussian mixture HMMs. <em>hmmlearn</em> was previously part of the well-known machine learning library
<a href="http://scikit-learn.org/stable/">scikit-learn</a>. This implementation is limited by two major factors:
<ul>
<li>Memory: it constrains the amount of data we can use for training. Traditionally, the datasets used for training an
HMM contained just a few thousand samples and could easily fit on a single machine. Nowadays, we cannot
limit ourselves to just one single machine if we want to harness the power of Big Data.</li>
<li>CPU: as we will see later, training an HMM model is a very expensive process. If we want to integrate HMM into live
systems, we need to reduce the time required to update our models.</li>
</ul>
<p>Since prediction using HMM is easily parallelizable (the model can be applied independently to each input sequence), we
focused our efforts on training. An HMM model is trained using the Expectation-Maximization (EM) algorithm. During the
expectation (E) step, the current model parameters are used to compute the expected value of the likelihood function. In
the maximization (M) step, the model parameters are updated to maximize the likelihood function.</p>
<p>The EM is an iterative algorithm. The initial value of the model’s parameters may be chosen randomly. Then, on each
iteration, the parameters are recomputed and the likelihood function increases. EM always converges but unfortunately,
it is not possible to guarantee that the algorithm will find the global maximum. Since each iteration uses the values of
the parameters computed in the previous one, it is not possible to parallelize the overall process. Our efforts were
thus aimed to distribute the computations required on each one of the iterations.</p>
<p>Assuming the input sequences are stored in a PySpark data frame, each distributed iteration is made up of the following
steps:</p>
<p><strong>1.</strong> The values of the transition matrix, emission probabilities and start probabilities are broadcasted:</p>
<div class="highlight"><pre><span></span><code>transmat_broadcast = sparkSession.sparkContext.broadcast(self.transmat_)
emissionprob_broadcast = sparkSession.sparkContext.broadcast(self.emissionprob_)
startprob_broadcast = sparkSession.sparkContext.broadcast(self.startprob_)
</code></pre></div>
<p><strong>2.</strong> Complete the E step. It can be parallelized easily by computing the necessary stats independently for each input
sequence (using the map method of the RDD).</p>
<div class="highlight"><pre><span></span><code>partial_rdd = df.rdd.map(lambda row: _process_sequence(row['sequence']))
</code></pre></div>
<p>It consists of the following steps:</p>
<ul>
<li>Compute the log-likelihood according to the distribution of the input sequences (multinomial in our case).</li>
</ul>
<!-- -->
<div class="highlight"><pre><span></span><code>framelogprob = np.log(emissionprob_broadcast.value)[:, X].T
n_samples, n_components = framelogprob.shape
</code></pre></div>
<ul>
<li>Calculate the forward and backward lattice.</li>
</ul>
<!-- -->
<div class="highlight"><pre><span></span><code># _BaseHMM._do_forward_pass()
fwdlattice = np.zeros((n_samples, n_components))
_hmmc._forward(n_samples, n_components,
               log_mask_zero(startprob_broadcast.value),
               log_mask_zero(transmat_broadcast.value),
               framelogprob, fwdlattice)
logprob = logsumexp(fwdlattice[-1])

# _BaseHMM._do_backward_pass()
bwdlattice = np.zeros((n_samples, n_components))
_hmmc._backward(n_samples, n_components,
                log_mask_zero(startprob_broadcast.value),
                log_mask_zero(transmat_broadcast.value),
                framelogprob, bwdlattice)
</code></pre></div>
<ul>
<li>Calculate the posteriors.</li>
</ul>
<!-- -->
<div class="highlight"><pre><span></span><code>log_gamma = fwdlattice + bwdlattice
log_normalize(log_gamma, axis=1)
posteriors = np.exp(log_gamma)
</code></pre></div>
<ul>
<li>Compute the changes on the transition matrix, emission and start probabilities using the previously obtained values.</li>
</ul>
<!-- -->
<div class="highlight"><pre><span></span><code># accumulate sufficient statistics variables
# - start probabilities
start = posteriors[0]
# - transition probabilities
if n_samples > 1:
    lneta = np.zeros((n_samples - 1, n_components, n_components))
    _hmmc._compute_lneta(n_samples, n_components, fwdlattice,
                         np.log(self.transmat_),
                         bwdlattice, framelogprob, lneta)
    trans = np.exp(logsumexp(lneta, axis=0))
else:
    trans = np.zeros((n_samples - 1, n_components, n_components))
# - emission probabilities
obs = np.zeros((n_components, self.n_features))
for t, symbol in enumerate(X):
    obs[:, symbol] += posteriors[t]
return [start.tolist(), trans.tolist(), obs.tolist(), float(logprob)]
</code></pre></div>
<p>Aggregate the partial results of the previous step.</p>
<div class="highlight"><pre><span></span><code>aggregation = partial_df.groupBy().agg(
    array(*[sum(col("start").getItem(i)) for i in range(self.n_components)]).alias("start"),
    array(*[
        array(*[sum(col("trans").getItem(i).getItem(j)) for j in range(self.n_components)])
        for i in range(self.n_components)]).alias("trans"),
    array(*[
        array(*[sum(col("obs").getItem(i).getItem(j)) for j in range(self.n_features)])
        for i in range(self.n_components)]).alias("obs"),
    sum(col("logprob")).alias("logprob"),
    count('*')
).collect()[0]
</code></pre></div>
<p><strong>3.</strong> Complete the M step. The values of the transition matrix, emission probabilities and start probabilities are
updated with the aggregated values of the stats computed in the previous step.</p>
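<p>For the multinomial case, this update amounts to normalizing the aggregated statistics into probability distributions. A minimal NumPy sketch (variable and function names are illustrative, not taken from the actual implementation):</p>

```python
import numpy as np

def m_step(start_agg, trans_agg, obs_agg):
    """Turn aggregated E-step statistics into updated model parameters."""
    # each quantity is normalized so that probabilities sum to one
    startprob = start_agg / start_agg.sum()
    transmat = trans_agg / trans_agg.sum(axis=1, keepdims=True)
    emissionprob = obs_agg / obs_agg.sum(axis=1, keepdims=True)
    return startprob, transmat, emissionprob
```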
<p><strong>4.</strong> Convergence monitor. If the update of the model parameters is smaller than the training tolerance, or if we have
reached the maximum number of iterations, we stop; otherwise we go back to the first step.</p>
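<p>One way to sketch such a monitor is to record the total log-likelihood of each iteration and compare consecutive values against the tolerance (a hedged illustration; names and default values are invented, not from the actual implementation):</p>

```python
def should_stop(logprob_history, tol=1e-2, max_iter=100):
    """Decide whether the EM loop should terminate.

    logprob_history holds the total log-likelihood of every completed
    iteration; EM guarantees these values are non-decreasing.
    """
    if len(logprob_history) >= max_iter:
        return True  # iteration budget exhausted
    if len(logprob_history) >= 2:
        # stop once the likelihood gain falls below the tolerance
        return (logprob_history[-1] - logprob_history[-2]) < tol
    return False  # not enough iterations yet to judge convergence
```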
<p>Our distributed HMM implementation was tested on EMR 5.2.0 using Python 3.4 and Spark 2.0.2. We compared the training
phase of the standalone version of the algorithm on an m4.2xlarge instance versus the distributed version running on a
cluster made up of 15 m4.2xlarge nodes. The training data was made up of integer sequences drawn from a multinomial
distribution. The training times were computed using datasets containing 1,000 sequences and up to 10 million sequences.</p>
<p>The figure below contains the results we obtained from our experiments. The standalone version was only faster on the
1,000-sequences dataset, where the cost of communication between executors overwhelmed the gain of using distributed
computing. The standalone version hit its limits on the 500,000-sequences dataset, with a training time of 91.6
minutes. Our distributed implementation completed the training in only 6.4 minutes! This is a speedup
factor of 14.3 (close to 15, the maximum we could reach since we were using 15 nodes in our cluster). We
also experimented with very large datasets. The distributed algorithm was able to train a model on a dataset containing
10 million sequences in 102 minutes.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c6a76cde467cb53361a5980d504a35691c761ab4_chart.png?auto=compress,format"></p>
<p>Thanks to our HMM PySpark implementation, we can now train models using larger amounts of data. It translates into a
better understanding of the requirements of our customers and into services tailored to their needs. Our next steps are:
(i) explore how we can share our project with the open source community and (ii) extend the implementation to other
emission probabilities (e.g. Gaussian or Poisson). And… we are already thinking about new cool ideas for this year’s
Hack Week!</p>
<p>If you have some questions about our HMM PySpark implementation, please feel free to reach out via Twitter at
<a href="https://twitter.com/S_Gonzalez_S">@S_Gonzalez_S</a>. We’d be happy to hear from you.</p>
<hr>
<p><strong>Footnotes:</strong></p>
<ol>
<li>
<p>Rabiner, L. and Juang, B., 1986. An introduction to hidden Markov models. IEEE ASSP magazine, 3(1), pp.4-16.</p>
</li>
<li>
<p>Moon, T.K., 1996. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6), pp.47-60.</p>
</li>
</ol>Deep Learning in Production for Predicting Consumer Behavior2017-03-22T00:00:00+01:002017-03-22T00:00:00+01:00Matthias Rettenmeiertag:engineering.zalando.com,2017-03-22:/posts/2017/03/deep-learning-in-production-for-predicting-consumer-behavior.html<p>We are excited to see how user experiences in e-commerce will evolve to personalized encounters.</p><p>At <a href="https://tech.zalando.com/blog/zalando-adtech-lab-hamburg/">Zalando adtech lab</a> in Hamburg, machine learning drives
many of our production systems to build great user experiences. Our most recent product requires precise estimates of
future interests of Zalando consumers based on their history of interacting with the fashion platform. For example, we
want to predict a consumer's interest in ordering selected fashion articles.</p>
<p>We set ourselves the goal to build a powerful and versatile prediction tool that not only fits the task at hand, but is
also ready for future product developments. Deep learning approaches have many advantages over traditional techniques,
making them a great fit for our requirements. <a href="http://bioinf.jku.at/publications/older/2604.pdf">Recurrent neural networks
(RNNs)</a> in particular are a promising candidate to provide the
methodological backbone for an e-commerce experience that gets more and more personalized.</p>
<p>We have developed a <a href="http://mlrec.org/2017/papers/paper2.pdf">deep learning</a> system <a href="https://tech.zalando.com/blog/deep-learning-for-understanding-consumer-histories/">based on
RNNs</a> and put it into production.
Like most new technologies, bringing deep learning into production has its challenges. In the following article, we want
to share our experiences and the choices we have made along the way in bringing this product to life.</p>
<h3>Deep learning</h3>
<p>At Zalando, we are convinced about the potential of deep learning and the value it can add to our products, as well as
to the consumer experience. <a href="https://tech.zalando.com/blog/zalando-launches-research-lab/">Zalando Research</a> was
launched recently, consisting of a group of research scientists that explore
<a href="https://c4209155-a-62cb3a1a-s-sites.googlegroups.com/site/nips2016adversarial/WAT16_paper_16.pdf">novel</a> deep
<a href="https://kddfashion2016.mybluemix.net/kddfashion_finalSubmissions/Fashion%20DNA%20Merging%20Content%20and%20Sales%20Data%20for%20Recommendation%20and%20Article%20Mapping.pdf">learning</a>
solutions.</p>
<p>While not a silver bullet for all scenarios, existing deep learning techniques are already beneficial today. Recurrent
neural networks (RNNs) offer several advantages for our product. Most prominently, they operate directly on sequences of
data and thus are a perfect fit for modeling consumer histories. Time-intensive human feature engineering is no longer
required. Instead, we can focus on building a flexible and versatile model that can be easily extended to new types of
input data and applied to a variety of prediction tasks. In general, learning from raw data can help to avoid
limitations when placing <a href="https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow">too much confidence</a> in human domain
modeling.</p>
<p>Furthermore, demand for <a href="https://arxiv.org/pdf/1606.03490.pdf">explaining</a> the
<a href="https://arxiv.org/pdf/1602.04938.pdf">predictions</a> of machine learning models is <a href="https://arxiv.org/abs/1606.08813">increasing
strongly</a>. RNNs can be helpful in providing explanations as they make it easy to
directly relate event sequences to predicted probabilities.</p>
<p>These advantages convinced us to investigate an RNN prototype. Based on our positive experiences we decided to switch to
RNNs altogether, leaving behind traditional techniques such as logistic regression and random forests, which had
been part of our stack beforehand.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2e48238be0de136d12875c44bf32d5d87ba48c8a_rnn.png?auto=compress,format"></p>
<h3>Moving to production</h3>
<p>Major companies have adopted deep learning, alongside machine learning in general, as a major
<a href="https://backchannel.com/how-google-is-remaking-itself-as-a-machine-learning-first-company-ada63defcb70#.38cx2jcgr">strategy</a>
for product development. The <a href="http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf">detection of handwritten zip codes</a>
is one of the earliest success stories of deep learning, which was followed by many popular applications in recent years
like <a href="https://techcrunch.com/2013/06/12/how-googles-acquisition-of-dnnresearch-allowed-it-to-build-its-impressive-google-photo-search-in-6-months/">photo search at
Google</a>,
the <a href="https://blogs.technet.microsoft.com/machinelearning/2016/11/29/empowering-developers-with-ai-deep-learning/">Skype
translator</a>,
or multiple production systems at
<a href="https://www.re-work.co/blog/deep-learning-in-production-at-facebook-andrew-tulloch-video-presentation">Facebook</a>. While
these big companies use deep learning in production with great success, best practices in the wider community are still
<a href="https://de.slideshare.net/agibsonccc/wrangleconf-big-data-malaysia-2016?next_slideshow=1">evolving</a>. The step from
research prototypes to <a href="http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf">prediction</a> models <a href="https://tech.zalando.com/blog/what-is-hardcore-data-science--in-practice/">in
production</a> is challenging, <a href="https://tech.zalando.com/blog/scalable-fraud-detection-fashion-platform/">not only for
RNNs</a>, but for machine learning in general.</p>
<p>In our fashion context, a major challenge is that fashion seasons change, popular brands and articles come and go, and a
large number of new articles are entering the Zalando platform every day. This necessitates frequent re-training and
model deployment. In addition, deep learning approaches have their own set of <a href="https://cdn.oreillystatic.com/en/assets/1/event/179/Lessons%20learned%20from%20deploying%20the%20top%20deep%20learning%20frameworks%20in%20production%20Presentation.pdf">challenges when moving to
production</a>,
like computing time, GPU usage and robustness of optimization.</p>
<p>Moving deep learning machinery into production requires regular data aggregation, model training and prediction tasks.
We decided in favor of a modular, separated approach for maximum flexibility and efficiency.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d873c6bc2426861f2ecd84aa93087527e588ac85_deeplearningdataflow.png?auto=compress,format"></p>
<h3>Data Preparation</h3>
<p>Before any machine learning is applied, data has to be gathered and organized to fit the input format of the machine
learning model. Our raw data consists of tracking data which is collected as an event-stream from the fashion store and
saved to AWS S3.</p>
<p>A side remark: Our models are based on the histories of anonymous profiles. (That is, we do not use customer data.) For
ease of readability, we speak of consumers instead, but anonymous profiles are what we really refer to.</p>
<p>Months' worth of event-stream data have to be compiled into consumer histories that can be inserted directly into our
RNNs for training and prediction. We accomplish this with a data processing pipeline based on Apache Spark. The
aggregation jobs run daily on AWS EMR and are scheduled using AWS Data Pipeline: once yesterday's raw data is
available in S3, a new cluster is spawned to transform the newly available data. The output is again written to S3,
where it can be picked up by the succeeding tasks.</p>
<h3>Training</h3>
<p>Training models in production requires efficiency and stability. To limit the number of parameters, we decided to start
off with a simple but powerful RNN architecture with a single LSTM-layer. We implemented the model in
<a href="https://github.com/Element-Research/rnn">Torch</a>, together with scripts for training and prediction. Due to
data-aggregation, the input data for the RNNs is reduced to multiple gigabytes. Hence, training can be achieved on a
single machine without the necessity of a distributed approach. We further enhance efficiency by making use of the
computational power of GPUs. Torch <a href="https://github.com/torch/cutorch">integrates the Cuda framework</a> of Nvidia for GPU
computing and provides good support for switching between CPU and GPU computing.</p>
<p>We optimized our training code for the GPU setting and achieved several times the performance of CPUs. Current
model training with several million consumer histories takes about two hours on a single GPU. Further improvements can
be achieved by re-training from previous models instead of starting from scratch. At the moment, we use an
in-house GPU cluster for training, but we are working on moving to the new p-instances available in AWS EC2.</p>
<p>After training, the models are validated on independent test data, using metrics like AUC and data likelihood. Recording
these metrics allows us to monitor stability and enables us to prevent uploading models that do not achieve satisfactory
validation performance.</p>
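<p>As an illustrative sketch of such a gate (the metric names and threshold values below are hypothetical, not the ones used in the actual pipeline):</p>

```javascript
// Decide whether a freshly trained model may be uploaded, based on
// validation metrics recorded after training. Metric names and
// threshold values are illustrative only.
function shouldUploadModel(metrics, thresholds) {
  // Reject if any required metric is missing or below its threshold.
  return Object.keys(thresholds).every(function (name) {
    return typeof metrics[name] === 'number' && metrics[name] >= thresholds[name];
  });
}

var thresholds = { auc: 0.75, logLikelihood: -0.6 };
console.log(shouldUploadModel({ auc: 0.82, logLikelihood: -0.45 }, thresholds)); // true
console.log(shouldUploadModel({ auc: 0.70, logLikelihood: -0.45 }, thresholds)); // false
```

<p>A gate like this keeps an underperforming model from ever replacing the one currently serving predictions.</p>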
<h3>Prediction</h3>
<p>For the current stage of our product, predictions are carried out on demand for batches of data. This simplifies the
task, as no real-time prediction system is required. Instead, predictions are scheduled and performed at regular
intervals for batches of data. Computing predictions is less computationally demanding and can thus be handled with
regular CPUs. Calculations for several million consumer histories take about 20-30 minutes on a single machine.</p>
<p>Similar to the data-aggregation tasks, we compute predictions on AWS EC2 instances and use AWS data-pipelines for
scheduling. The models used for prediction, which have been trained on our in-house cluster, are stored in S3. The
models are picked up, together with the input data, by worker machines running dockerized Torch environments. These
environments are configured to perform the prediction, validation and other post-processing steps required for our case.
Again, the resulting output is stored in S3.</p>
<p>During these processing steps, we closely monitor input data as well as prediction results. Key statistics, like the
number of data points and the distribution of variables and targets, help to detect major changes in the incoming data
distributions. In addition, we track prediction performance by checking various metrics on validation data. General
heuristics like the difference between actual and predicted targets provide sanity checks for model health. Alongside
these operational stats, more abstract business metrics are captured. These business metrics allow us to understand how
our model supports Zalando in delivering value.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9c12c3022c770787bc18b7b7445f5cca4a962115_prediction_monitoring_blurred.png?auto=compress,format"></p>
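<p>A minimal version of such a sanity check, comparing actual and predicted targets, could look like this (the function is a generic illustration, not the production code):</p>

```javascript
// Mean absolute difference between predicted and actual targets;
// a sudden jump in this value hints at model or data drift.
function meanAbsoluteError(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error('inputs must be non-empty and of equal length');
  }
  var sum = 0;
  for (var i = 0; i < actual.length; i++) {
    sum += Math.abs(actual[i] - predicted[i]);
  }
  return sum / actual.length;
}

console.log(meanAbsoluteError([1, 0, 1, 1], [0.9, 0.2, 0.7, 1.0])); // ≈ 0.15
```

<p>Tracking a handful of such simple statistics over time is often enough to catch a broken upstream data feed before it affects model quality.</p>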
<h3>Future</h3>
<p>The system is live, serving consumers on Zalando today. Our first experiences with operating the live system are
positive, both in terms of performance and robustness. Overall, the required efforts for putting deep learning into
production have not been much greater than with other machine learning products. The fact that offline predictions are
sufficient, however, greatly reduced the complexity of the system. Despite this being the case, we are eager to extend
our system to real-time scenarios. For now, our next step is to move model-training to the cloud for a more stable and
scalable solution.</p>
<p>Another ongoing topic is the addition of new data sources as input to our RNNs. With the additional inputs we seek to
increase prediction accuracy as well as to extend our product to a wider range of use cases. We are in the midst of
creating a data ingestion layer that couples various data sources, like article databases and fashion insights, with our
RNN system.</p>
<p>Beyond our use case, we are excited to see how user experiences in e-commerce will evolve to truly personalized
encounters. RNN production systems are a promising technique to enable this fascinating trend.</p>
<h3>Credits</h3>
<p>Lorenz Knies, Gunnar Selke, <a href="https://www.linkedin.com/in/matthias-rettenmeier-957b9756/">Matthias Rettenmeier</a> and
<a href="https://www.linkedin.com/in/tobias-lang-6948b4118">Tobias Lang</a> are designing and engineering deep learning systems at
Zalando adtech lab; Michael Gravert drives the first product at Zalando that makes use of these deep learning systems.</p>Practical Challenges For RxJava Learners2017-03-21T00:00:00+01:002017-03-21T00:00:00+01:00Sergii Zhuktag:engineering.zalando.com,2017-03-21:/posts/2017/03/practical-challenges-for-rxjava-learners.html<p>If you're an Android developer, you can learn RxJava by coding with this useful guide.</p><p><a href="https://github.com/ReactiveX/RxJava">RxJava</a> is a valuable part of the Java developer toolset and the number one
language improvement framework for Android developers. Many of us want to learn it better and read blogs and sources,
but often lack the practice needed to consolidate that knowledge. See below for how you can challenge yourself with coding
tasks and improve your practical RxJava skills.</p>
<h3>Test Driven Learning</h3>
<p><a href="http://agiledata.org/essays/tdd.html">Test Driven Development (TDD)</a> has become a significant part of development
culture; everyone is aware of it, even if they’re not completely following it. I think adopting this idea to learn new
frameworks and libraries makes absolute sense.</p>
<p>What if you have a set of cases with acceptance criteria? With TDD, you can not only read another blog post or code
snippet, but also quickly test your understanding of the problem. It basically means that you can code a solution and
get an immediate right/wrong answer.</p>
<p>This approach requires a minimal amount of configuration and dependencies. You just check out the code, import it
into your IDE, and run the unit tests. At first, all of these tests will fail. What you need to do now is implement the
proper logic to make them green and thereby pass the challenge. Pretty simple, isn’t it?</p>
<h3>Learn RxJava by coding</h3>
<p>I’ve built a set of simple code challenges to learn RxJava using JUnit tests as acceptance criteria. For now they are
focused on some basic concepts described in the classic blog series for RxJava beginners - <a href="http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/">Grokking RxJava by Dan
Lew</a>. Another pretty useful resource which has helped me was
the <a href="https://github.com/Froussios/Intro-To-RxJava">Intro to RxJava guide</a> at GitHub.</p>
<p>The context of the challenge is as follows. The class <em>Country</em> with the fields “currency”, “population” and “name” is
provided. You have an interface <em>CountriesService</em> with a set of methods you should implement in the class
<em>CountriesServiceSolved</em> to make unit tests pass. Obviously, you should use RxJava and meet the interface contract.</p>
<p>Current cases don’t cover any Android topics, so the project doesn’t contain an Android module, only plain old Java.</p>
<h3>Current code challenge implementation</h3>
<p><strong>Dependencies:</strong></p>
<ul>
<li>RxJava 2.0.5</li>
<li>JUnit 4.12</li>
</ul>
<p><strong>Reactive types covered:</strong></p>
<ul>
<li><a href="http://reactivex.io/documentation/observable.html">Observable</a>: the heart of Rx, a class that emits a stream of
data or events</li>
<li><a href="http://reactivex.io/documentation/single.html">Single</a>: a version of an <em>Observable</em> that emits a single item or
fails</li>
<li><a href="https://github.com/ReactiveX/RxJava/wiki/What%27s-different-in-2.0#maybe">Maybe</a>: lazy emission pattern, can
emit 1 or 0 items or an error signal</li>
</ul>
<p><strong>Operators covered:</strong></p>
<ul>
<li><a href="http://reactivex.io/documentation/operators/map.html">map</a>: transforms the items by applying a function to each
item</li>
<li><a href="http://reactivex.io/documentation/operators/flatmap.html">flatMap</a>: takes the emissions of one <em>Observable</em> and
returns merged emissions in another <em>Observable</em> to take its place</li>
<li><a href="http://reactivex.io/documentation/operators/filter.html">filter</a>: emits only those items that pass a criteria
(predicate test)</li>
<li><a href="http://reactivex.io/documentation/operators/skip">skip</a> / <a href="http://reactivex.io/documentation/operators/take">take</a>:
suppresses or takes the first <em>n</em> items</li>
<li><a href="http://reactivex.io/documentation/operators/all">all</a>: determines whether all items meet some criteria</li>
<li><a href="http://reactivex.io/documentation/operators/reduce">reduce</a>: applies a function to each item sequentially, and
emits the final value. For example, it can be used to sum up all emitted items</li>
<li><a href="http://reactivex.io/documentation/operators/to.html">toMap</a>: collects the items emitted by an <em>Observable</em>
into a <em>Map</em></li>
<li><a href="https://github.com/ReactiveX/RxJava/wiki/What%27s-different-in-2.0#test-operator">test</a>: returns <em>TestObserver</em>
with current <em>Observable</em> subscribed</li>
<li><a href="http://reactivex.io/documentation/operators/timeout.html">timeout</a>: to handle timeouts, e.g. deliver some fallback
data</li>
</ul>
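<p>Several of these operators have direct analogues on plain JavaScript arrays. As a rough, non-reactive illustration of their semantics (no RxJava involved, and the numbers are made up):</p>

```javascript
// Non-reactive analogue of map / filter / reduce semantics,
// using plain JavaScript arrays instead of RxJava streams.
var populations = [82, 10, 5, 67]; // millions, illustrative numbers

var result = populations
  .map(function (p) { return p * 1e6; })             // map: transform each item
  .filter(function (p) { return p > 20e6; })         // filter: keep items passing a predicate
  .reduce(function (sum, p) { return sum + p; }, 0); // reduce: fold all items into one value

console.log(result); // 149000000
```

<p>The RxJava versions behave the same way per item; the difference is that an <em>Observable</em> pushes items over time instead of holding them all in memory.</p>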
<p><strong>Testing approach:</strong></p>
<ul>
<li>The set of test cases is defined in a separate Java file</li>
<li>As a “receiver” of emitted test events we use <em>TestObserver</em>. It records events and allows us to make assertions
about them</li>
<li>All tests will fail when you take them from the master branch of the repository. This is expected behaviour. You
should make the tests pass by implementing the logic in the <em>CountriesServiceSolved</em> class</li>
</ul>
<h3>Ready, Steady, Go!</h3>
<p>Try out the code challenge in the repository <a href="https://github.com/sergiiz/RxBasicsKata">here</a>. I’m looking forward to
hearing your feedback and would be happy to add new test cases. Also, feel free to submit your own pull requests and add
more challenges. You can find me on Twitter at <a href="https://twitter.com/sergiizhuk">@sergiizhuk</a> if you have further
questions.</p>Linting and ESLint: Write Better Code2017-03-16T00:00:00+01:002017-03-16T00:00:00+01:00Ferit Topcutag:engineering.zalando.com,2017-03-16:/posts/2017/03/linting-and-eslint-write-better-code.html<p>Why should you spend the time required to lint your code? Up your code quality game right here.</p><p>Since joining Zalando, I have had the opportunity to dive into some open source projects like
<a href="http://eslint.org/">ESLint</a>, a pluggable JavaScript linter. Here is my take on what ESLint is, a brief description of
linting in general, and why it is so important.</p>
<h3>What is linting?</h3>
<p>Generally speaking, linting is a form of static code analysis and therefore part of white-box testing. The main purpose
of <a href="http://softwaretestingfundamentals.com/white-box-testing/">white-box testing</a> is to analyse the internal structure
of components or a system. To make sense of this, a developer with knowledge of the written code defines rules or
expectations about how a component should behave (unit tests), or how it should be structured (linting).</p>
<p>In modern web development, this describes the process (and tools) of applying rules against a codebase and flagging code
structures that violate these rules. Rules can vary from code <a href="http://eslint.org/docs/rules/#stylistic-issues">styling</a>
rules, so code appears to be written by one person, to much more complex rules (e.g.
<a href="http://eslint.org/docs/rules/#ecmascript-6">here</a>). Even fixing these issues is part of modern JavaScript
<a href="http://eslint.org/docs/user-guide/command-line-interface#fix">linting</a>.</p>
<h3>Why should you lint your code?</h3>
<p>Linting code is already an established part of any (popular) JavaScript project and, in my opinion, has a lot of
benefits such as:</p>
<ul>
<li>Readability</li>
<li>Pre-code review</li>
<li>Finding (syntax) errors before execution</li>
</ul>
<p>Being able to define a set of styling rules increases the readability of our code, as the codebase looks like it was
written by <a href="http://www.slideshare.net/ThoughtWorks/tech-leadskillsfordevelopers/10-ProgrammingPPeoplePProcessPProgrammingPeopleProcess">“one
person”</a>.
This is important because software engineers often move from codebase to codebase within projects, meaning a lot
of people become involved. A common set of rules makes it easier to really understand what the code is doing.</p>
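<p>For example, a team could pin down such a shared rule set in a small configuration file. The sketch below uses real core ESLint rules (<em>semi</em>, <em>quotes</em>, <em>no-unused-vars</em>), but the specific choices are arbitrary:</p>

```javascript
// .eslintrc.js — a minimal shared rule set; the rule choices here
// are arbitrary examples, not a recommended configuration.
const config = {
  env: { browser: true, es6: true },
  rules: {
    semi: ['error', 'always'],   // require semicolons
    quotes: ['error', 'single'], // enforce single quotes
    'no-unused-vars': 'warn'     // flag unused variables
  }
};

module.exports = config;
```

<p>Once such a file is checked into the repository, every team member's editor and CI run applies the same rules.</p>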
<p>Furthermore, linting rules help to improve code reviews, as linting already acts as a pre-code review, catching all
the basic issues such as syntax errors, incorrect naming, the tabs vs. spaces debate, etc. This increases the value of
code reviews, as reviewers are then more willing to check the implementation rather than complain about syntax
errors.</p>
<h3>ESLint in a nutshell</h3>
<p><a href="http://eslint.org/">ESLint</a> is an open source, JavaScript linting utility originally created by Nicholas C. Zakas. Code
linting is a type of static analysis that is frequently used to find problematic patterns or code that doesn’t adhere to
certain style guidelines. There are code linters for most programming languages, and compilers can sometimes incorporate
linting into the compilation process.</p>
<p>ESLint is a CLI tool built around two other open source projects. One is
<a href="https://github.com/eslint/espree">Espree</a>, which creates an Abstract Syntax Tree (AST) from JavaScript code. Based on
this AST, ESLint uses another project called <a href="https://github.com/estools/estraverse">Estraverse</a>, which provides
traversal functions for an AST. During the traversal, ESLint emits an event for each visited node, where the node type is
the event name, e.g. <em>FunctionExpression</em> or <em>WithStatement</em>. ESLint rules are therefore just functions subscribing to
the node types they want to check.</p>
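<p>A minimal rule in this shape is just an object whose <em>create</em> function returns visitors keyed by node type. The sketch below is illustrative (the rule name, message and mock context are made up; it mirrors ESLint's rule API but is not taken from ESLint's source):</p>

```javascript
// A tiny ESLint-style rule: report every `with` statement.
// Rule name, message and the mock context below are illustrative.
var noWithRule = {
  meta: { type: 'problem' },
  create: function (context) {
    return {
      // Called for every WithStatement node during traversal.
      WithStatement: function (node) {
        context.report({ node: node, message: 'Unexpected with statement.' });
      }
    };
  }
};

// Exercise the rule with a mock context (no ESLint install needed).
var reports = [];
var visitor = noWithRule.create({ report: function (r) { reports.push(r.message); } });
visitor.WithStatement({ type: 'WithStatement' });
console.log(reports); // [ 'Unexpected with statement.' ]
```

<p>When run by ESLint itself, the <em>WithStatement</em> function would be invoked for every matching node encountered during the Estraverse-based traversal.</p>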
<p>It is also important to note that ESLint doesn’t support new language features until they reach Stage 4 of the <a href="https://tc39.github.io/process-document/">proposal
process of TC39</a>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/beba80be1e67db99ef8397122c644a46e23b9f06_1-wi-3pi1main3yczn7rz_xq.png?auto=compress,format"></p>
<p>For further explanation, I will use the following simple script which generates an AST for the given JavaScript code.</p>
<div class="highlight"><pre><span></span><code><span class="k">var</span><span class="w"> </span><span class="n">espree</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'espree'</span><span class="p">);</span>
<span class="k">var</span><span class="w"> </span><span class="n">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="k">var</span><span class="w"> </span><span class="n">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">`</span><span class="n">let</span><span class="w"> </span><span class="n">array</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="s1">'b'</span><span class="p">];</span>
<span class="err">`</span><span class="p">;</span>
<span class="k">var</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">espree</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">code</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ecmaVersion</span><span class="p">:</span><span class="w"> </span><span class="mi">6</span>
<span class="p">});</span>
<span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="s2">"writing ast to ast.json"</span><span class="p">)</span>
<span class="n">fs</span><span class="o">.</span><span class="n">writeFile</span><span class="p">(</span><span class="s2">"ast.json"</span><span class="p">,</span><span class="w"> </span><span class="n">JSON</span><span class="o">.</span><span class="n">stringify</span><span class="p">(</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="nb nb-Type">null</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">err</span><span class="p">;</span>
<span class="p">});</span>
</code></pre></div>
<h3>Abstract Syntax Tree</h3>
<p>The <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a> is an abstract representation of your code structure. In
the JavaScript world, this abstract representation is defined by the <a href="https://github.com/estree/estree">ESTree project</a>
and is the starting point if you want to understand JavaScript ASTs or want to build or extend your own. I highly
recommend reading the <a href="https://github.com/estree/estree/blob/master/es5.md">ES5 AST</a> grammar documentation, as it helps
to understand the different types of nodes represented in the AST.</p>
<p>Each node in the AST describes a specific grammar construct in your code. As you can see below, our one line of
JavaScript code already produces an AST with 5 nodes, and each node contains additional information, shown in the JSON
representation below. This graph is just one example of how a tree representation of our code might look (I would also
interpret the values in <em>ArrayExpression</em> as individual nodes, but developers are free to choose themselves).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/07c8d4ade583434a93dfd17e05ced57ee8bb4904_1-wgkuszz56p2--hbndcbttg.png?auto=compress,format"></p>
<p>Visualizing our AST (with <a href="http://resources.jointjs.com/demos/javascript-ast">http://resources.jointjs.com/demos/javascript-ast</a>)</p>
<p>As you can see, even the AST of just one line of code in a simple tree representation contains a lot of information.</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"Program"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">23</span><span class="p">,</span>
<span class="w"> </span><span class="s">"body"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"VariableDeclaration"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">22</span><span class="p">,</span>
<span class="w"> </span><span class="s">"declarations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"VariableDeclarator"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">21</span><span class="p">,</span>
<span class="w"> </span><span class="s">"id"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"Identifier"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="p">,</span>
<span class="w"> </span><span class="s">"name"</span><span class="p">:</span><span class="w"> </span><span class="s">"array"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s">"init"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"ArrayExpression"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">21</span><span class="p">,</span>
<span class="w"> </span><span class="s">"elements"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"Literal"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">13</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">14</span><span class="p">,</span>
<span class="w"> </span><span class="s">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="s">"raw"</span><span class="p">:</span><span class="w"> </span><span class="s">"1"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"Literal"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">15</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">16</span><span class="p">,</span>
<span class="w"> </span><span class="s">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span>
<span class="w"> </span><span class="s">"raw"</span><span class="p">:</span><span class="w"> </span><span class="s">"2"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"type"</span><span class="p">:</span><span class="w"> </span><span class="s">"Literal"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"start"</span><span class="p">:</span><span class="w"> </span><span class="mi">17</span><span class="p">,</span>
<span class="w"> </span><span class="s">"end"</span><span class="p">:</span><span class="w"> </span><span class="mi">20</span><span class="p">,</span>
<span class="w"> </span><span class="s">"value"</span><span class="p">:</span><span class="w"> </span><span class="s">"b"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"raw"</span><span class="p">:</span><span class="w"> </span><span class="s">"'b'"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="s">"kind"</span><span class="p">:</span><span class="w"> </span><span class="s">"let"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="s">"sourceType"</span><span class="p">:</span><span class="w"> </span><span class="s">"script"</span>
<span class="p">}</span>
</code></pre></div>
<h3>What kind of information do we find in the AST?</h3>
<p>As previously mentioned, one line of JavaScript code contains a vast amount of information for an AST. Every entry in
the AST is a <a href="https://github.com/estree/estree/blob/master/es5.md#node-objects">Node object</a>, consisting of a ‘type’
property and a <em>SourceLocation Object</em>.</p>
<p>The type property is a string representing the different Node variants in the AST. The <em>SourceLocation Object</em> consists
of a start and end property. The type, start, and end properties provide us with information about the structure of our
code. Generally, a JavaScript AST consists of
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements">Statements</a>,
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators">Expressions</a> and Declarations (for ES5),
which doesn’t sound like much, but these have a lot of variations. For those interested in ES2015 and beyond, see the
following:</p>
<ul>
<li><a href="https://github.com/estree/estree/blob/master/es2015.md#patterns">Patterns</a></li>
<li><a href="https://github.com/estree/estree/blob/master/es2015.md#classes">Classes</a></li>
<li><a href="https://github.com/estree/estree/blob/master/es2015.md#modules">Modules</a></li>
<li><a href="https://github.com/estree/estree/blob/master/es2015.md#template-literals">Template Literals</a></li>
</ul>
<h3>Traversing the AST</h3>
<p>Now that we have generated the AST for our JavaScript code, what can we do next? Traverse it! As great as the AST is, the
exciting part starts with traversing it and analysing the information it holds. For traversing the AST, ESLint relies on
the Estraverse project, which walks over the AST generated by Espree.</p>
<p>Estraverse provides a traverse function which invokes the visitor callbacks we pass it for each node. In these
callbacks we can subscribe to specific types of nodes and perform our analysis. For example, if we want to make sure
that all literal definitions in our array declaration are integers, we could check for that with the sample script below:</p>
<div class="highlight"><pre><span></span><code><span class="k">var</span><span class="w"> </span><span class="n">estraverse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'estraverse'</span><span class="p">);</span>
<span class="k">var</span><span class="w"> </span><span class="n">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="n">fs</span><span class="o">.</span><span class="n">readFile</span><span class="p">(</span><span class="s1">'./ast.json'</span><span class="p">,</span><span class="w"> </span><span class="s1">'utf-8'</span><span class="p">,</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="n">throw</span><span class="w"> </span><span class="n">err</span><span class="p">;</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSON</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">ast</span><span class="p">);</span>
<span class="w"> </span><span class="n">estraverse</span><span class="o">.</span><span class="n">traverse</span><span class="p">(</span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">enter</span><span class="p">:</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">node</span><span class="p">,</span><span class="w"> </span><span class="n">parent</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">type</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s2">"Literal"</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">parent</span><span class="o">.</span><span class="n">type</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s2">"ArrayExpression"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">Number</span><span class="o">.</span><span class="n">isInteger</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">value</span><span class="p">)){</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">stop</span><span class="w"> </span><span class="n">traversal</span><span class="w"> </span><span class="n">when</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">Literal</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">number</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">estraverse</span><span class="o">.</span><span class="n">VisitorOption</span><span class="o">.</span><span class="n">Break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">leave</span><span class="p">:</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">node</span><span class="p">,</span><span class="w"> </span><span class="n">parent</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">//</span><span class="n">nothing</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">now</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">});</span>
<span class="p">});</span>
</code></pre></div>
<p>Furthermore, the traverse functionality of
<a href="https://github.com/estools/estraverse/blob/master/estraverse.js#L459">Estraverse</a> uses the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor design
pattern</a>, which allows us to execute functions for each visited node of
the AST instead of traversing the whole AST first and performing operations afterwards. Debugging the <a href="https://github.com/estools/estraverse/blob/master/estraverse.js#L459">traverse
function</a> of Estraverse shows that it traverses the tree depth-first:
it walks to the end of a branch before it begins backtracking. The Visitor pattern, combined with depth-first search, allows ESLint
to trigger events the moment it enters or leaves an AST node.</p>
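<p>The enter/leave ordering of such a depth-first visitor can be illustrated without the library at all. The following is a simplified stand-in for estraverse, not its real implementation (in particular, the uniform <em>children</em> field is an assumption for illustration; real ASTs use per-node-type visitor keys):</p>

```javascript
// Simplified sketch of an estraverse-style depth-first visitor:
// "enter" fires on the way down, "leave" on the way back up.
function traverse(node, visitor) {
  visitor.enter(node);
  for (const child of node.children || []) {
    traverse(child, visitor);
  }
  visitor.leave(node);
}

// A tiny tree standing in for an AST.
const tree = {
  type: "Program",
  children: [
    { type: "ArrayExpression", children: [{ type: "Literal", children: [] }] },
  ],
};

const order = [];
traverse(tree, {
  enter: (node) => order.push("enter:" + node.type),
  leave: (node) => order.push("leave:" + node.type),
});

// Depth-first: the Literal is entered and left before its ancestors are left.
console.log(order.join(","));
```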
<h3>Plugins</h3>
<p>The architecture of ESLint is pluggable: if you want to create new rules for specific problems, frameworks, or use
cases, the recommended path is to develop an ESLint plugin rather than opening an issue and requesting changes to the
core. This is a powerful opportunity to build far more specialised code analysis than ESLint's built-in rules provide,
and since plugins are plain npm modules, they are easy to publish and share.</p>
<p>Thanks to the pluggable architecture, it is easy to use already existing plugins or rules for different frameworks,
libraries, or companies, e.g. <a href="https://github.com/yannickcr/eslint-plugin-react">eslint-plugin-react</a>,
<a href="https://github.com/vuejs/eslint-plugin-vue">eslint-plugin-vue</a> or
<a href="https://www.npmjs.com/package/eslint-config-airbnb">eslint-config-airbnb</a>.</p>
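<p>As a taste of what such a plugin contains: an ESLint rule is an object with <em>meta</em> information and a <em>create</em> function that returns handlers per AST node type. The sketch below packages one hypothetical rule; the rule name, message, and reliance on a <em>parent</em> pointer are illustrative assumptions, not an existing plugin:</p>

```javascript
// Hypothetical rule "no-float-in-array": report non-integer number literals
// inside array expressions. The object shape follows ESLint's rule format.
const noFloatInArray = {
  meta: {
    type: "problem",
    docs: { description: "disallow non-integer literals in array expressions" },
  },
  // ESLint calls create() once per linted file and invokes the returned
  // handlers for each matching AST node type during traversal.
  create(context) {
    return {
      Literal(node) {
        const inArray = node.parent && node.parent.type === "ArrayExpression";
        if (inArray && typeof node.value === "number" && !Number.isInteger(node.value)) {
          context.report({ node, message: "Unexpected non-integer literal." });
        }
      },
    };
  },
};

// A plugin is just an npm module exporting its rules under `rules`.
module.exports = { rules: { "no-float-in-array": noFloatInArray } };
```

Consumers would then enable the rule in their ESLint configuration, e.g. as `"myplugin/no-float-in-array": "error"` (plugin and rule names here are made up).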
<h3>Wrap-Up</h3>
<p>While most of us use ESLint and other linting tools in our JavaScript projects, we rarely look at how these projects
work internally, the challenges behind them, or how they give us the means to write good quality code. I
highly recommend every JavaScript developer play around with these tools: only a select few people
really work on them, yet they are used everywhere. A few projects doing similar work are
<a href="https://github.com/jlongster/prettier">Prettier</a>, <a href="https://github.com/babel/babel-eslint">babel-eslint</a>, and <a href="https://github.com/facebook/flow/tree/master/src/parser">The Flow
Parser</a>.</p>
<p>As a next step, I’d like to create a small ESLint plugin tutorial so interested developers can see how to create
their own custom ruleset for different use cases, e.g. proposing rules for the core of ESLint. If you have any
suggestions or ideas for an ESLint plugin you’d like to see, grab me on Twitter at
<a href="http://www.twitter.com/Fokusman">@Fokusman</a>.</p>One-click Deployments for iOS Apps using Xcode 8 and More2017-03-14T00:00:00+01:002017-03-14T00:00:00+01:00Fotis Dimanidistag:engineering.zalando.com,2017-03-14:/posts/2017/03/one-click-deployments-for-ios-apps.html<p>Exploring tools to provide a seamless Continuous Delivery experience for your iOS team.</p><p>The macOS Server 5.2 is a new fruit. It was released (almost) in parallel with Xcode 8 and might come as no surprise
that it is the minimum version required by Xcode 8. The release also covers new territory, most notably the name
change: say goodbye to OS X Server, as you now have macOS Server. But the changes go beyond that: while not mentioned
in the changelog, the good old “_xcsbuildd” user is now gone.</p>
<p>So what was this “_xcsbuildd” user? It was the system user that your bots ran under, up to OS X Server 5.1.x. It led to
a few inconveniences and some cryptic behaviour, but thankfully that is all you need to know nowadays. Instead of the “_xcsbuildd”
user, you can now pick one of your existing macOS users or even create a new one, which is far more convenient.</p>
<p>In this article we will explore a way to set up an Xcode Server along with <a href="https://fastlane.tools/">Fastlane</a> tools to
provide a seamless Continuous Delivery experience for your iOS team. We will address some complexities such as having
multiple apps (with different bundle IDs), bumping the build numbers automatically, and submitting it all on iTunes
Connect (ITC). We will try to keep things as simple and straightforward as possible.</p>
<p>This process was created during my everyday duties in the iOS team of <a href="https://www.zalando-lounge.com/">Zalando Lounge</a>.
We have two main apps (<a href="https://itunes.apple.com/app/zalando-lounge-fashion-shopping/id1004381470">Zalando Lounge</a>,
<a href="https://itunes.apple.com/app/zalando-prive-vendite-private/id1110685353">Zalando Privé</a>) which are identical, but
uploaded to the App Store as different binaries for localization reasons. In addition, each one of our apps comes in
three flavours (Staging, Live, AppStore) which point to different backend environments and use different API keys. We
have a single Xcode project for all of these and one target for each app flavour (six in total). Deploying six apps to
ITC every few days is annoying and time consuming. Continuous Delivery to the rescue.</p>
<p>For the sake of simplicity, in this article we will use two app targets instead of six, named <em>MY_SCHEME1</em> and <em>MY_SCHEME2</em>
respectively. You can easily replicate the existing lanes for as many targets as you want.</p>
<h3>Section 1: Xcode Server</h3>
<p>The first thing we need to care about is setting up an Xcode server. We assume you are working locally on the server
machine. See below for the steps involved.</p>
<ul>
<li>Install macOS Server and enable Xcode Server. When prompted for a user, create a new one named ‘xcodeserver’.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f6479641c836922a7d1f867bed679a3d112f424e_xcodeservercreateuser.png?auto=compress,format"></p>
<ul>
<li>Follow the wizard and when done, logout of macOS and login as the newly created ‘xcodeserver’ user.</li>
<li>Make sure you have an email address for your new user. An iCloud address is fine.</li>
<li>Create a separate account in developer.apple.com and ITC. We will use these accounts to package and distribute our
app. The user needs to have the ‘app manager’ role under ITC.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/82afd687fbb5b2178bec9bbecb41c56a8646ea42_appmanagerrole.png?auto=compress,format"></p>
<p>At this stage, we can leave Xcode Server be. We now move over to Fastlane.</p>
<h3>Section 2: Fastlane</h3>
<p>What is Fastlane? It’s a feature-rich toolkit that helps iOS (and Android) developers automate many of the tasks
involved in app deployment.</p>
<ol>
<li>Install Fastlane following the instructions on the <a href="https://github.com/fastlane/fastlane#installation">public
repository</a>. We need version 1.107.0 as the minimum, since we are
using commands that were added very recently.</li>
<li>Take special care with the ‘update_fastlane’ command that we use, as it requires <a href="https://docs.fastlane.tools/actions/#update_fastlane">special
setup</a>.</li>
<li>Open a terminal and ‘cd’ into your project’s root folder.</li>
<li>Now create a ‘fastlane’ folder with a ‘Fastfile’ and ‘Appfile’ inside. The contents of our files are the following:</li>
</ol>
<!-- -->
<div class="highlight"><pre><span></span><code><span class="c1">### Appfile ###</span>
<span class="c1"># Source: https://gist.githubusercontent.com/fotiDim/68f2614a7cb37bb0595473039c33b348/raw/50ad741b90eab9254714148c53907ce5ee9382f7/Appfile</span>
<span class="n">team_name</span><span class="w"> </span><span class="s2">"MY_TEAM_NAME"</span>
<span class="n">team_id</span><span class="w"> </span><span class="s2">"MY_TEAM_ID"</span>
<span class="n">for_platform</span><span class="w"> </span><span class="p">:</span><span class="n">ios</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">for_lane</span><span class="w"> </span><span class="p">:</span><span class="n">MY_SCHEME1</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">apple_id</span><span class="w"> </span><span class="n">MY_APPLE_ID1_STAGING</span>
<span class="w"> </span><span class="n">app_identifier</span><span class="w"> </span><span class="s2">"MY.BUNDLE.ID1.STAGING"</span>
<span class="w"> </span><span class="n">end</span>
<span class="w"> </span><span class="n">for_lane</span><span class="w"> </span><span class="p">:</span><span class="n">MY_SCHEME2</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">apple_id</span><span class="w"> </span><span class="n">MY_APPLE_ID1_LIVE</span>
<span class="w"> </span><span class="n">app_identifier</span><span class="w"> </span><span class="s2">"MY.BUNDLE.ID1.LIVE"</span>
<span class="w"> </span><span class="n">end</span>
<span class="n">end</span>
<span class="n">apple_dev_portal_id</span><span class="w"> </span><span class="s2">"MY_DEV_PORTAL_EMAIL"</span><span class="w"> </span><span class="c1"># Your Apple email address</span>
<span class="n">itunes_connect_id</span><span class="w"> </span><span class="s2">"MY_ITC_EMAIL"</span><span class="w"> </span><span class="c1"># Your iTunes Connect email address</span>
<span class="c1">### Fastfile ###</span>
<span class="c1"># Source: https://gist.githubusercontent.com/fotiDim/68f2614a7cb37bb0595473039c33b348/raw/50ad741b90eab9254714148c53907ce5ee9382f7/Fastfile</span>
<span class="n">fastlane_version</span><span class="w"> </span><span class="s2">"1.107.0"</span><span class="w"> </span><span class="c1"># Minimum required fastlane version. Find your version with: gem list fastlane"</span>
<span class="n">default_platform</span><span class="w"> </span><span class="p">:</span><span class="n">ios</span>
<span class="n">platform</span><span class="w"> </span><span class="p">:</span><span class="n">ios</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="c1">########## Before ##########</span>
<span class="w"> </span><span class="n">before_all</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">update_fastlane</span>
<span class="w"> </span><span class="n">reset_git_repo</span><span class="p">(</span><span class="n">force</span><span class="p">:</span><span class="w"> </span><span class="bp">true</span><span class="p">)</span><span class="w"> </span><span class="c1"># Ensure that no artifacts are left when you build the app.</span>
<span class="w"> </span><span class="n">ensure_git_status_clean</span>
<span class="w"> </span><span class="n">git_pull</span>
<span class="w"> </span><span class="n">increment_build_number</span><span class="p">({</span>
<span class="w"> </span><span class="n">build_number</span><span class="p">:</span><span class="w"> </span><span class="n">latest_testflight_build_number</span><span class="p">(</span>
<span class="w"> </span><span class="n">initial_build_number</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="c1"># Sets the build number to given value if no matching uploaded build is found</span>
<span class="w"> </span><span class="n">app_identifier</span><span class="p">:</span><span class="w"> </span><span class="s1">'MY.BUNDLE.ID1.STAGING'</span><span class="w"> </span><span class="c1"># One of our app targets gets to be the guide for build versions. It is expected to be always uploaded first.</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="n">end</span>
<span class="w"> </span><span class="n">before_each</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">clean_build_artifacts</span>
<span class="w"> </span><span class="n">clear_derived_data</span>
<span class="w"> </span><span class="n">end</span>
<span class="c1">########## On Error ##########</span>
<span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="n">do</span><span class="w"> </span><span class="o">|</span><span class="n">lane</span><span class="p">,</span><span class="w"> </span><span class="n">exception</span><span class="o">|</span>
<span class="w"> </span><span class="n">reset_git_repo</span><span class="p">(</span><span class="n">force</span><span class="p">:</span><span class="w"> </span><span class="bp">true</span><span class="p">)</span>
<span class="w"> </span><span class="n">end</span>
<span class="c1">########## Lanes ##########</span>
<span class="w"> </span><span class="n">desc</span><span class="w"> </span><span class="s2">"Deploy all versions of the app"</span>
<span class="w"> </span><span class="n">lane</span><span class="w"> </span><span class="p">:</span><span class="n">all</span><span class="w"> </span><span class="n">do</span><span class="w"> </span><span class="c1"># This is a convenience lane that executes all other lanes</span>
<span class="w"> </span><span class="n">MY_SCHEME1</span>
<span class="w"> </span><span class="n">MY_SCHEME2</span>
<span class="w"> </span><span class="n">end</span>
<span class="n">desc</span><span class="w"> </span><span class="s2">"Deploy MY_SCHEME1"</span><span class="w"> </span><span class="c1"># Lane for app target 1</span>
<span class="w"> </span><span class="n">lane</span><span class="w"> </span><span class="p">:</span><span class="n">MY_SCHEME1</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">gym</span><span class="p">(</span>
<span class="w"> </span><span class="n">scheme</span><span class="p">:</span><span class="w"> </span><span class="s2">"MY_SCHEME1"</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">testflight</span><span class="p">(</span>
<span class="w"> </span><span class="n">skip_submission</span><span class="p">:</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span>
<span class="w"> </span><span class="n">distribute_external</span><span class="p">:</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span>
<span class="w"> </span><span class="n">app_identifier</span><span class="p">:</span><span class="w"> </span><span class="s2">"MY.BUNDLE.ID1.STAGING"</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">end</span>
<span class="n">desc</span><span class="w"> </span><span class="s2">"Deploy MY_SCHEME2"</span><span class="w"> </span><span class="c1"># Lane for app target 2</span>
<span class="w"> </span><span class="n">lane</span><span class="w"> </span><span class="p">:</span><span class="n">MY_SCHEME2</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">gym</span><span class="p">(</span>
<span class="w"> </span><span class="n">scheme</span><span class="p">:</span><span class="w"> </span><span class="s2">"MY_SCHEME2"</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">testflight</span><span class="p">(</span>
<span class="w"> </span><span class="n">skip_submission</span><span class="p">:</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span><span class="w"> </span><span class="c1"># Refers to submission for external testing (not App Store)</span>
<span class="w"> </span><span class="n">distribute_external</span><span class="p">:</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span>
<span class="w"> </span><span class="n">app_identifier</span><span class="p">:</span><span class="w"> </span><span class="s2">"MY.BUNDLE.ID1.LIVE"</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">end</span>
<span class="c1">########## After ##########</span>
<span class="w"> </span><span class="n">after_all</span><span class="w"> </span><span class="n">do</span>
<span class="w"> </span><span class="n">clean_build_artifacts</span>
<span class="w"> </span><span class="n">clear_derived_data</span>
<span class="w"> </span><span class="n">commit_version_bump</span><span class="p">(</span>
<span class="w"> </span><span class="n">message</span><span class="p">:</span><span class="w"> </span><span class="s2">"Bumping Build Number to: "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Actions</span><span class="o">.</span><span class="n">lane_context</span><span class="p">[</span><span class="n">SharedValues</span><span class="p">::</span><span class="n">BUILD_NUMBER</span><span class="p">],</span><span class="w"> </span><span class="c1"># create a commit with a custom message</span>
<span class="w"> </span><span class="n">xcodeproj</span><span class="p">:</span><span class="w"> </span><span class="s2">"../MY_PROJECT_FOLDER/MY_PROJECT.xcodeproj"</span><span class="p">,</span><span class="w"> </span><span class="c1"># optional, if you have multiple Xcode project files, you must specify your main project here</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">add_git_tag</span><span class="w"> </span><span class="n">tag</span><span class="p">:</span><span class="w"> </span><span class="n">Actions</span><span class="o">.</span><span class="n">lane_context</span><span class="p">[</span><span class="n">SharedValues</span><span class="p">::</span><span class="n">BUILD_NUMBER</span><span class="p">]</span>
<span class="w"> </span><span class="n">push_to_git_remote</span>
<span class="w"> </span><span class="n">end</span>
<span class="n">end</span>
</code></pre></div>
<p>Pay attention to the comments in the file. The key points here are:</p>
<ul>
<li>We let Xcode deal with provisioning profiles and code signing. The main actions we use from Fastlane are Gym, which
builds the app, and Testflight, which uploads the builds to ITC.</li>
<li>Remember to edit the above files with your own credentials and keys. To help you find the places you need to edit
more easily, I have made them start with a capitalised ‘MY’. Whatever starts with ‘MY’ is supposed to be replaced by your
own values. Don’t just copy and paste, as you need to understand what is going on. The good news is that, unlike other
examples, the above is written for maximum readability.</li>
<li>Make sure you commit your changes before each run of the script, as it includes a command that resets the
repository. This is to ensure that no artifacts are left when you build the app.</li>
<li>After you’re done, go to the Fastlane folder you previously created and, using your terminal, execute ‘fastlane all’.
During the first run, the script will occasionally ask you for credentials. After you enter them, they are stored
in the keychain, and from the second run onwards the script should be able to run unattended.</li>
<li>When the script is complete, a git tag is added and pushed to your repository that indicates the exact code base for
the current build number. Build numbers are unique and incremented by one each time you run the script.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e57e8cd5c585c0278a1ee27462aea07996531e41_gittagadded.png?auto=compress,format"></p>
<h3>Section 3: Configure our bots</h3>
<p>After we manage to run and deploy our app using the Fastlane script, it is time to set up an Xcode Server bot so that we
can execute one-click deployments.</p>
<p>As bots can only build a single scheme, we will ignore the bot’s build and instead use the builds generated by Fastlane’s
Gym. As a reminder, we have two targets, so we need two different builds to happen.</p>
<ul>
<li>Create a bot called ‘Deployer Bot’. Point it to your repository and desired branch.</li>
<li>The build configuration should be as shown below. Feel free to enable the <em>analyze</em> or the <em>test</em> action if you
think they are needed.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8e67324a28d1366d8e0dd54c770ade6150409471_buildconfig.png?auto=compress,format"></p>
<ul>
<li>Schedule it to run manually.</li>
<li>In the environment tab you need to set the ‘PATH’ variable to <em>‘/usr/local/bin:$PATH’</em> in order for the bot to be
able to discover the Fastlane binary.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d6ad286dafa8a5f4694cb9eb9c75acd2a0fbb0ed_pathvariable.png?auto=compress,format"></p>
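<p>What that environment setting accomplishes can be sketched in a couple of shell lines. Note that /usr/local/bin as the home of a gem-installed fastlane binary is the typical case, not a guarantee; your setup may differ:</p>

```shell
#!/bin/sh
# Xcode Server bots start with a minimal PATH that omits /usr/local/bin,
# which is where a gem-installed fastlane binary typically lives.
# Prepending it, as done in the bot's environment tab, makes it discoverable.
PATH="/usr/local/bin:$PATH"
export PATH

# The first PATH entry is now /usr/local/bin, so executable lookup
# consults it before the system directories.
first_entry=$(printf '%s' "$PATH" | cut -d: -f1)
echo "$first_entry"
```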
<ul>
<li>Add a Post-Integration trigger.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/48f85e0e5a5709f80d3cf18092d510889047d39c_postintegrationtrigger.png?auto=compress,format"></p>
<ul>
<li>Make it run only on success and on build warnings/static analysis issues. Since bots run in the root folder of your
repository, you need to ‘cd’ into the project folder before running the Fastlane command:</li>
</ul>
<!-- -->
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="ch">#!/bin/sh</span>
<span class="nb">cd</span><span class="w"> </span>ios-app/lounge
fastlane<span class="w"> </span>all
</code></pre></div></td></tr></table></div>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8878e00872ef52b9b64f7ed5a36736cccdb23d9f_fastlanecommand.png?auto=compress,format"></p>
<p>That is it! You can now do the following:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0281e01bab0221ab2a2e55cc989f8e3d1bca863c_integratebot.png?auto=compress,format"></p>
<p>As a closing note, I would like to mention something I wish were better about the process in general. Since bots can
only build one scheme, we have to ignore their build and rebuild our targets with Fastlane instead. This is a
waste of time and raises the question: “do we really need the Xcode bot?”. The answer is probably not, but it is
convenient to have one-click deployment without having to resort to the terminal and remember commands off-hand. Perhaps
in a future Xcode Server version we can hope for bots that either skip the build process or are able to build multiple
schemes.</p>
<p>Happy deploying!</p>How the Zalando iOS App Abandoned CocoaPods and Reduced Build Time2017-02-22T00:00:00+01:002017-02-22T00:00:00+01:00Dmitry Bespalovtag:engineering.zalando.com,2017-02-22:/posts/2017/02/how-the-zalando-ios-app-abandoned-cocoapods-and-reduced-build-time.html<p>Read the takeaways from our adoption of manual dependency management.</p><p>Dependency management doesn’t have to be complicated. The current dependency managers for iOS are
<a href="https://cocoapods.org/">CocoaPods</a>, which is the de facto standard tool,
<a href="https://github.com/Carthage/Carthage">Carthage</a>, and <a href="https://swift.org/package-manager/">Swift Package Manager</a>.
Despite the range of automated solutions, we found at Zalando that manual management brings about better performance and
doesn’t require much maintenance in a big, modular project. In this article, I will describe my experience transitioning
Zalando’s iOS app from CocoaPods to manual dependency management, which in the end resulted in a 40% improvement in
build time and startup time.</p>
<h3>The initial problem</h3>
<p>Although an app such as ours may appear simple and straightforward, the project is quite complex
under the hood. Our main project for building the app is split into several subprojects. Each subproject can import
another. On top of that, each subproject can have third party dependencies.</p>
<p>An illustration of our model of subprojects and the dependencies between them can be seen in the following diagram:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d7aee681eb425c9fdf4d2c86166e7d1f938cab2e_cocoapods1.png?auto=compress,format"></p>
<p>Our model of a project structure: The main app imports many frameworks (represented by blue arrows), some of which
import other frameworks, and most frameworks depend on external third party code, provided by CocoaPods. Due to link
issues, we have to link all pods twice; once for a framework target and once in the main app target (represented by red
dashed arrows).</p>
<p><strong>Our project’s usual clean build time is 5 minutes.</strong> Since our subprojects are interdependent on one another, changes
in one subproject can trigger an avalanche of rebuilding for other subprojects, along with all third party code in the
workspace.</p>
<p>We use third party libraries inside our main project and within subprojects. Our subprojects can import one another,
resulting in chains of up to six nested imports. In other words, there are frameworks importing other frameworks,
which in turn import even more frameworks. And all of them also use third party dependencies.</p>
<p>For the Zalando iOS app, we used CocoaPods for dependency management. Over the past three years of using it, we have
experienced both its advantages and disadvantages.</p>
<p>CocoaPods does a great job automating updates to third party code and integrating them into your main project. You don’t
have to wire up everything manually. When any of your dependencies needs to be updated, it’s just a matter of running a
‘pod update’ command.</p>
<p>While it is very convenient to use with a monolithic project, we found it hard to configure Podfile dependencies for a
modular project structure. Our Podfile was big (230+ lines of code!) and had some tricky configuration, both for target
dependencies and post-install actions. We regularly had issues with build settings that changed due to automatic pod
integration. An additional hassle was updating and installing CocoaPods itself.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1ff043f7ceee2f24538f4dc61ded8d0b65c3a6ba_cocoapods2.png?auto=compress,format"></p>
<p>CocoaPods is a wonderful tool. It allows you to automatically integrate and update dependencies, whilst supporting
recursive dependencies. However, the software comes with a maintenance cost when your project becomes bigger and begins
to use more dependencies. If your project is a mix of Swift and Objective-C, your only choice is to use
dependencies as dynamic libraries, even when those dependencies are written in Objective-C. Additionally, since CocoaPods
integrates the source code of dependencies into your project, it will be recompiled every time and affect build time.</p>
<p>When it comes to a mixed project like ours, with 60% Swift and the rest Objective-C code, CocoaPods <strong>forces all
dependencies to be dynamic frameworks</strong>, even though more than half of them are written in Objective-C. With more than
50 dynamic frameworks, our app’s startup time became very slow due to <strong>slow dynamic library loading during app
startup</strong>.</p>
<p>Another issue was our slow build time. The root cause was including the pods’ source code in the workspace, which
often forces recompilation of the same files even though they very rarely change. Again, with the number of dependencies we use, this
became a problem over time.</p>
<p>So, what are the other alternatives for a project structure like ours and third party dependency management?</p>
<h3>Alternative solutions</h3>
<p>Besides CocoaPods, there are three alternatives when it comes to dependency management: Carthage, Swift Package Manager,
and the manual approach.</p>
<p>Carthage is close to what we want: it compiles frameworks and includes them in your project. You can
even include pre-compiled Objective-C libraries. The downside is that not all CocoaPods libraries are also available in
Carthage.</p>
<p>Of course, we could use CocoaPods for Objective-C and Carthage for Swift dependencies. In fact, we did have a setup like
this going for our project. But it made dependency management complicated. Instead of working with one tool, developers
needed to handle two different tools for the one project. If there were conflicts between the two tools, then they
needed to be resolved, which costs time and resources.</p>
<p>The next tool we assessed was Swift Package Manager. While it also compiles code into modules, so developers
don’t need to rebuild a dependency over and over again, it is a Swift-only tool and thus doesn’t suit our needs.</p>
<p>This leaves us with the manual approach. As it turns out, we needed just a few simple scripts that support the most
common tasks for dependency management. Namely, downloading source code, compiling it into a static library or dynamic
framework and integrating those products into our project. The rest of the work is done via Xcode build configuration
files.</p>
<h3>Our idea for a solution</h3>
<p>After researching the available alternatives, I decided to try out the manual approach. Before performing the task, I
wanted to test its feasibility. For this, I created a small prototype which replicated the complicated setup we had. You
can take a look at the example project <a href="https://github.com/DmitryBespalov/ManualDependencyExample">here</a>.</p>
<p>The prototype consists of the main project and three subprojects. The main project builds an app and imports two
subprojects. One of them is a static library and the other is a dynamic framework importing another dynamic framework.</p>
<p>After playing with the build settings for some time, I learned how to configure dependencies and wrote down the
necessary Xcode configuration files. I also adapted a script from CocoaPods that embeds third party frameworks into the
final app bundle. Great! Now I had an idea that might just work.</p>
<h3>How to manually integrate dependencies</h3>
<p>First off, you’ll need to place all your frameworks and libraries into the respective directories of your project. You
don’t have to import them into the Xcode project itself. All we need to do is specify the path to those directories in
the build settings.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8ed8d90734ca72cbf78b68d5d2a9bee93495af0a_cocoapods3.png?auto=compress,format"></p>
<p>All that is needed is to create a simple directory structure, build configuration file, and a “copy-frameworks” shell
script to integrate the compiled dependencies into the Xcode project.</p>
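<p>Under the assumption of a single <em>Frameworks</em> and a single <em>Libraries</em> directory at the project root (the names here are illustrative, not prescribed), the layout might look like this:</p>

```
MyApp/
├── MyApp.xcodeproj
├── ThirdPartyConfig.xcconfig
├── Frameworks/              # compiled .framework bundles
└── Libraries/
    ├── include/             # header files for the static libraries
    ├── Resources/           # resource bundles, if any
    └── libSomething.a
```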
<p>Once this is completed, you place all of your frameworks into the Frameworks folder and create a configuration file with
the following setting:</p>
<div class="highlight"><pre><span></span><code>ThirdPartyConfig.xcconfig:
FRAMEWORK_SEARCH_PATHS = "$(SRCROOT)/Frameworks"
</code></pre></div>
<p><em>SRCROOT</em> is an Xcode-provided build variable that contains the path to the directory holding your <em>.xcodeproj</em> file.</p>
<p>Next, you’ll need to place all of your libraries into another directory, perhaps called “Libraries”, and specify the
path to it in your build settings</p>
<div class="highlight"><pre><span></span><code>ThirdPartyConfig.xcconfig:
…
LIBRARY_SEARCH_PATHS = "$(SRCROOT)/Libraries"
</code></pre></div>
<p>To be able to import a library into your code, you also need to copy the library’s header files into a folder, say
Libraries/include/, and point Xcode’s header search path at it:</p>
<div class="highlight"><pre><span></span><code>ThirdPartyConfig.xcconfig:
…
HEADER_SEARCH_PATHS = "$(SRCROOT)/Libraries/include"
</code></pre></div>
<p>If your library comes with resources, it’s convenient to package them into a resource bundle and include it into the
“Copy Bundle Resources” build phase. You can place all such resources into a Libraries/Resources/ directory and import
them into your Xcode main project from there.</p>
<p>From here you’ll need to specify which libraries your app is using in linker flags. This is needed in order for your app
to link against the compiled frameworks and third party libraries. To do so, add the setting <em>`-framework "FrameworkName"`</em> for
each framework to OTHER_LDFLAGS; for each library add `-lLibraryName` (note that a library is typically named
`libABC.a` but is specified as `-lABC`, i.e. without the `lib` prefix and the extension).</p>
<div class="highlight"><pre><span></span><code>ThirdPartyConfig.xcconfig:
…
OTHER_LDFLAGS = -framework "…" -l…
</code></pre></div>
<p>Lastly, you’ll need to add a Run Script phase in which you copy all of your third party frameworks into the app bundle.
This is required because your app needs the dynamic libraries at runtime, and they are searched for inside the app
bundle. Xcode doesn’t automatically copy frameworks that are only specified in the OTHER_LDFLAGS build setting, so you’ll
have to complete this step yourself. Luckily, the teams behind CocoaPods and Carthage have already solved this
problem, so we can use their script to copy all the frameworks we have inside the main app. You can find a link to <a href="https://gist.github.com/DmitryBespalov/1f05acb3d58c23710ab0a87fea9e24a4">the
script here</a>.</p>
<p>The final configuration file will look like this:</p>
<div class="highlight"><pre><span></span><code>FRAMEWORK_SEARCH_PATHS = "$(SRCROOT)/Frameworks"
LIBRARY_SEARCH_PATHS = "$(SRCROOT)/Libraries"
HEADER_SEARCH_PATHS = "$(SRCROOT)/Libraries/include"
OTHER_LDFLAGS = -framework "Framework1" -lLibrary1
</code></pre></div>
<p>After creating my prototype, it became clear that the manual approach was a feasible choice – our team decided to scale
it up into our real world project.</p>
<h3>Implementation</h3>
<p>The real work here will require more effort than just building a prototype. To approach the task of replacing CocoaPods
with hand-built libraries and frameworks, I needed a plan. After analyzing the project and its dependencies, I set about
creating diagrams to better illustrate our implementation plan.</p>
<p>The first diagram consisted of subprojects and their interdependencies. Next, I counted incoming and outgoing
dependencies for each subproject we had. All of this helped me to think about how to better merge smaller subprojects
with bigger ones and reduce the scope of configuration work. The latter part of this sequence will be achieved during
the integration of third party dependencies.</p>
<p>Instead of having 20 subprojects, my aim was to have 10. In practice, I was able to reduce this down to only 8
relatively big subprojects. This benefited me later during third party dependency configuration.</p>
<p>I then listed <strong>all of our third party dependencies</strong> from Podfile and Podfile.lock, which allowed me to capture
dependency versions. All in all, we had 61 items on the list.</p>
<p>This analysis really helped. Having a plan or a checklist is a good idea: it covers your bases, ensures no step is
forgotten, and gives you an overview of potential problems – for example, coping with the risk of shipping an unstable
solution by adding test and verification steps.</p>
<h3>The plan</h3>
<p>Below are the steps we defined for the transformation of our project:</p>
<p><strong>Preparation</strong></p>
<ul>
<li>Merge small subprojects with bigger ones or back to the main project</li>
<li>Remove unnecessary dependencies</li>
<li>De-integrate CocoaPods</li>
</ul>
<p><strong>Integrate third party dependencies</strong></p>
<ul>
<li>Manually compile and integrate all dependencies</li>
<li>Configure subprojects and the main app</li>
</ul>
<p><strong>Verify that the app still works</strong></p>
<ul>
<li>App must run on simulator and device (and not crash!)</li>
<li>Tests must succeed for phone and tablet targets</li>
<li>App archiving must work</li>
</ul>
<p>Initially, I thought it would take 2.5 months to complete this plan. In practice, it took 1.5 weeks, each step taking 2
to 3 days.</p>
<p>Planning is an important task in almost any problem solving activity. It allows us to spot potential problems early,
allocate resources more efficiently, and eventually save time during implementation.</p>
<h3>Preparation</h3>
<p>After some careful planning we got started. I de-integrated CocoaPods, merged some subprojects together, downloaded and
compiled dependencies, then integrated them. All in all, it took me 2 working days to accomplish.</p>
<p>I started with the preparation step, as it was the easiest to perform and didn’t require any significant code changes
or testing. I merged smaller subprojects, adjusted import statements referencing those small projects in the code, and
verified that the app was still working. I then de-integrated CocoaPods using the <em>`deintegrate`</em> plugin:</p>
<div class="highlight"><pre><span></span><code>$ for x in */*.xcodeproj; do ( cd "${x%/*}" &amp;&amp; pod deintegrate "${x##*/}" ); done
</code></pre></div>
<p>The command above will go into every directory containing the <em>xcodeproj</em> file and run `pod deintegrate` on each
project.</p>
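<p>The same per-directory loop shape can be sketched on a scratch tree. In this hedged sketch the paths are made up and <code>pod deintegrate</code> is replaced by <code>echo</code>, so it runs without CocoaPods installed:</p>

```shell
# Build a scratch tree mimicking the layout: one .xcodeproj per subproject,
# each one level below the repo root (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/AppA/AppA.xcodeproj" "$demo/LibB/LibB.xcodeproj"
cd "$demo"

# Same loop shape as the command above: cd into each project's directory,
# then act on the project bundle. ${x%/*} strips the last path component,
# ${x##*/} keeps only the last one.
for x in */*.xcodeproj; do
  ( cd "${x%/*}" && echo "would run: pod deintegrate ${x##*/}" )
done
```

The subshell parentheses keep each <code>cd</code> local to one iteration, so the loop does not depend on the previous iteration’s working directory.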
<h3>Integration of third party code</h3>
<p>The next step, which took approximately two days, was the manual integration of third party dependencies. I
started doing everything manually and later automated some tasks using shell scripts.</p>
<p>What I learned from this experience is that many third party library authors do not provide ready-to-use Xcode projects,
which I had to manually create to compile sources in a static library or (for Swift sources) a dynamic framework.</p>
<p>Another interesting aspect was that although some frameworks are provided in a ready-to-use form, they are actually just
static libraries wrapped within a framework bundle – I went about converting such frameworks into static libraries. For
example, the Google Analytics framework depends on other libraries from Google, all of which are distributed as static
libraries packaged in framework bundles. I had to move the binaries out of the frameworks and rename them as static
libraries.</p>
<p>My usual flow of converting a pod into manually wired library went like this: First, I looked up the pod name and
concrete version in the Podfile and Podfile.lock files and searched for it on <a href="http://cocoapods.org">CocoaPods.org</a>.
With these steps, I had access to the pod’s repository.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3397134342c22f153c54001d05d02458e3664b4e_cocoapods5.png?auto=compress,format"></p>
<p>The integration of third party libraries or frameworks has several steps. I began by downloading the source code, then
optionally, creating an Xcode project with a library or framework target. I then compiled the product for simulator and
device. Finally, I copied the produced framework or library to its respective location in the root of the project.</p>
<p>Once done, I cloned the repository and checked out the needed tag or branch. If a project came with an Xcode project
file, I would build a static library or dynamic framework for the Release configuration – both for the simulator and the
device architecture (choose a Simulator destination and hit Cmd+Shift+I, then choose “Generic iOS Device” and hit
Cmd+Shift+I again).</p>
<p>The compiled products ended up inside the Build/Products/Release-iphonesimulator/ and Build/Products/Release-iphoneos/
directories. I then copied the libraries into the main project’s Libraries/${PLATFORM_NAME} folder and did the same with
the frameworks, into the Frameworks/${PLATFORM_NAME} folder. As I mentioned above, I also had to copy all header files
into the include directories for the libraries.</p>
<p>I want to take the opportunity here to make a small side note about Clang modules and how you can make any static
library into such a module.</p>
<h3>Clang modules</h3>
<p>In the code, I prefer to use <a href="http://clang.llvm.org/docs/Modules.html">Clang modules</a>. For example, instead of
#import &lt;Framework/Framework.h&gt; I use @import Framework; which makes use of a precompiled module and results in better
build times for Objective-C. Using modules is also the only way to make a static library accessible from Swift code.</p>
<p>Most of the time when compiling third party Objective-C libraries, I also had to add a special “module.modulemap” file
which would describe the library as a module to the Clang compiler. I would then put this file into the library’s
include directory. The usual modulemap file has the following content:</p>
<div class="highlight"><pre><span></span><code>module FrameworkName {
    umbrella header "FrameworkName.h"
    export *
}
</code></pre></div>
<p>The code above makes sure that all of the classes imported from the umbrella header are accessible in the defined
module.</p>
<p>Now, let’s go back to our transition into manual dependency management.</p>
<h3>Saving third party code in the main repository</h3>
<p>One problem I faced during third party code integration was how to store the source code of a dependency in a project.
Initially, I started to work with git submodules which would store references to remote repositories in the main
repository.</p>
<p>Working with submodules quickly became an issue. The drawback I faced was that all of the changes to third party source
code I was making in order to integrate it into the main project (such as creating an Xcode project and sometimes
adjusting source code to use quoted imports instead of angled) would be lost if not committed back to the remote
repository.</p>
<p>I didn’t want this to happen because the changes were project-specific. One option would be to fork the remote
repository and use it as another remote, then pull upstream changes once there are updates in the original repository.
This option seemed like too much work.</p>
<p>In the end, I simply went with cloning the repository, checking out the revision I needed, and deleting the .git
directory. By doing this, I had the source code and could commit the changes in the main repository I was using.</p>
<h3>Xcode build settings from configuration files</h3>
<p>After I had downloaded, compiled, and moved all of the libraries and frameworks to a central “ThirdParty” directory, I
wrote build configuration files for each subproject as well as the main app.</p>
<p>Each configuration file would have a similar code:</p>
<div class="highlight"><pre><span></span><code>FRAMEWORK_SEARCH_PATHS = …
HEADER_SEARCH_PATHS = …
LIBRARY_SEARCH_PATHS = …
OTHER_LDFLAGS = …
</code></pre></div>
<p>When I wanted to include framework X.framework, I would add -framework "X" to OTHER_LDFLAGS, and when I wanted to
include library libY.a, I would add -lY to OTHER_LDFLAGS. If a library or a framework required other system
frameworks, I would add those here as well.</p>
<p>In order for the build configuration file to be effective, it needs to be added to targets from the “Info” tab in a
project.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/85490fdafa1c6c4615d4922c73ea4890f40ed7ec_cocoapods6.png?auto=compress,format"></p>
<p>In order to apply the build configuration file, you need to add it to your project first. Select your project in the
project navigator, then select the Project’s Info page. Under the “Configurations” menu, select the build configuration
file name for each configuration of your target.</p>
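<p>Since xcconfig files support includes, one way to wire this up – a sketch, with hypothetical file and framework names – is to keep the shared third party settings in one file and include it from each target’s configuration file:</p>

```
// AppTarget.xcconfig — assigned to the app target in the Configurations menu
#include "ThirdPartyConfig.xcconfig"

// $(inherited) is assumed here to pick up the value coming from the included
// file / lower precedence levels, so target-specific flags append rather than
// overwrite; verify this against your Xcode version.
OTHER_LDFLAGS = $(inherited) -framework "SomeFramework"
```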
<p>For the main app target and for all of the test targets I also added a Run Script build phase that copied all of
the third party frameworks into the product bundle. Frameworks are dynamically linked at runtime and are not part of the
app or test bundle, making this step a must: they need to be explicitly copied into the final product. I’ve adapted the
<a href="https://gist.github.com/DmitryBespalov/1f05acb3d58c23710ab0a87fea9e24a4">CocoaPods copy frameworks script</a> to fit this
need, <a href="https://gist.github.com/DmitryBespalov/1f05acb3d58c23710ab0a87fea9e24a4.js">see here</a>.</p>
<p>After all of that and several cycles of fix-and-build, I could successfully compile the app. One of the errors I faced
during the integration was that some libraries were compiled with an incompatible iOS platform version, and some were
compiled with different versions of Swift, requiring me to compile those libraries again. Thankfully, the
<a href="https://en.wikipedia.org/wiki/Linker_(computing)">linker</a> tells you whether there is an error or warning.</p>
<h3>Verifying that the app works</h3>
<p>After getting the app compiling and running, I wanted to make sure it was still working as before, that the tests were
running successfully, and that the app could be archived for distribution. This was a crucial step: had I shipped my
changes to teammates as-is, with the possibility of errors, we would have lost unnecessary time fixing the problems.
Test before you declare your work to be done.</p>
<p>After moving dependencies to a manual integration model, I found that some of the test bundles were crashing. After some
debugging and analysis it was clear that the test bundles couldn’t access the dynamic libraries linked to the app. To fix
this, I had to add a Run Script build phase running the copy-frameworks script to every test bundle, which worked for this
project. I also did some quick bug-bashing to make sure the app worked fine and ran an archiving script on top. You can
find an example project illustrating the manual approach to dependency management on <a href="https://github.com/DmitryBespalov/ManualDependencyExample">GitHub
here</a>.</p>
<h3>Results</h3>
<p>After compiling all of the dependencies, I was pleasantly surprised that this work had good side effects. The first was
a clean build time that decreased by two minutes, dropping from five minutes to three. This was possible because Xcode
no longer needed to recompile all of the dependencies’ source code.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/73b7cbd87bcb2b9b48e5adba52c8e2487c720d79_resultschartdata.png?auto=compress,format"></p>
<p>Another improvement was in the startup time of the app, which decreased from five seconds to three. This came
about because many third party dependencies were converted to static libraries and the number of dynamically linked
frameworks dropped dramatically, meaning our app didn’t need to load those frameworks during startup. This
work was interesting to do and I was glad that it led to improvements both for developers (build time) and our users
(startup time).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f26376d66f86c8923c418daac9f347dace98bd57_cocoapods7.png?auto=compress,format"></p>
<h3>Summary</h3>
<p>To sum up, here are the takeaways from our adoption of manual dependency management:</p>
<ul>
<li>Having a complex project structure with CocoaPods leads to maintenance problems and increased build times</li>
<li>Using CocoaPods in a mixed Swift and Objective-C project leads to using third party dependencies as dynamic
frameworks, despite many dependencies being Objective-C</li>
<li>Most of the dependencies were rarely updated</li>
<li>Neither Carthage (due to limited library availability) nor Swift Package Manager (Swift-only at the time) covered
all of our Objective-C dependencies, so we switched to manual dependency management</li>
<li>The manual approach consists of proper build settings for the compiler and linker to find required libraries and
frameworks, and a shell script to copy third party frameworks into the final app bundle</li>
<li>Transitioning from CocoaPods to manual package management is a complicated task and requires proper planning</li>
<li>Saving the source code of your dependencies in the main repository is a viable alternative to submodules</li>
<li>Switching from CocoaPods to manual package management improved build time and startup time</li>
</ul>
<p>This was our experience at Zalando when we transformed our project from CocoaPods-managed dependencies to manual
dependency management. If you have any questions or would like some help with your own project, reach out via
<a href="https://github.com/DmitryBespalov">GitHub</a> and I’d be happy to lend a hand.</p>
Dress Code: An In-house Style Guide for Zalando’s Solution Center2017-02-21T00:00:00+01:002017-02-21T00:00:00+01:00Gabriel Lovatotag:engineering.zalando.com,2017-02-21:/posts/2017/02/dress-code-an-in-house-style-guide-for-zalandos-solution-center.html<p>How do we work when we Dress Code? Discover our pattern library and style guide.</p><p>Saying the design team in Zalando has grown in the last few years would be an understatement: it has exploded, jumping
from around 10 to over 40 designers in an 18-month period. With this comes many challenges, one of which is making sure
our interfaces stay coherent and cohesive, no matter who designed or coded it.</p>
<p>Dress Code is a pattern library and style guide created by the Brand Solutions Central Services team to address these
issues in the scope of Zalando’s Solution Center – a central repository where our business partners find everything they
need to work with the <a href="https://tech.zalando.com/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">Zalando
Platform</a>. In
this article, we’ll talk about style guides, why we needed one, how we created it, and how it has helped multiple teams,
going above and beyond its original scope.</p>
<h3>What’s a style guide and why use one?</h3>
<p>In the context of <a href="https://en.wikipedia.org/wiki/User_interface">User Interfaces</a>, a style guide is a set of components
-- from basic building blocks such as typefaces, shapes and colors all the way to more complex ones such as buttons,
images, and menus, including their interaction and behaviour -- that allows one or more websites to maintain a sense of
branding, unity, and coherence. Using a style guide brings different benefits to different kinds of people involved with
the website.</p>
<h3>For users</h3>
<p>From a user experience perspective, the main advantage of a unified design style across related websites and
applications is that users can keep using the same mental model; they don’t need to learn a new interface language every
time they open a new app or site. This reduces cognitive load, and helps users be more efficient and feel more confident
during their usage of an app or website.</p>
<h3>For developers</h3>
<p>Developers strive to solve difficult problems, develop efficient algorithms and write quality code. They shouldn’t need
to reinvent the wheel each and every time they build something. A style guide with reusable components gets design
idiosyncrasies and details out of their way. They simply reuse the components of the style guide and focus on assembling
them to solve a greater challenge.</p>
<h3>For designers</h3>
<p>There are two phases to design: one is exploratory, where it pays off to play around and be creative; the other is
production, where it pays off to be straightforward and repetitive. Having a style guide affects both these phases: it
sets positive constraints for the first and speeds up the second.</p>
<h3>A design system for autonomous teams</h3>
<p>Zalando’s Solution Center was started back in 2014 with the idea of creating a unified hub where our business partners
can find all the tools they need to work with Zalando. Professionals from brands, retailers and other partners can use
several web applications to help them in their daily tasks, such as publishing content, advertising, asset management,
managing orders, and much more.</p>
<p>At around the same time, Zalando Tech increased focus on autonomous teams and an organizational structure where small
product teams of around 5-10 people work autonomously, and have full ownership of their project and product.</p>
<p>Creating and growing a hub like the Solution Center to support team autonomy meant that the multiple web applications
within it would be designed by different designers and developed by different development teams, each of which might be
using completely different technologies.</p>
<h3>Before Dress Code</h3>
<p>In the beginning, the Solution Center started with two interconnected applications -- Brand CMS and Brand Analytics --
which were built by two different teams. Together, the UX and UI designers in each team agreed upon common UI patterns
and a common look and feel.</p>
<p>While this might seem enough, it meant that every change or update made by one team had to be communicated back to the
designers and developers of the other team, to ensure consistency.</p>
<p>To improve this situation, <a href="https://www.linkedin.com/in/thomas-nägele-448681b6">Tom Nägele</a>, one of the designers,
created a simple style guide based on <a href="https://fbrctr.github.io/">Fabricator</a>. This was then made available internally
to the teams so they could check if their UI matched the reference. Still, this meant Tom had to tell people whenever he
had updated the guide, and they had to manually change their code to match Tom’s guide.</p>
<h3>First steps</h3>
<p>At this point, work was starting on another application and on the Solution Center portal itself, which would better
connect all the applications. Were the teams to continue the process used at the time, they would each create their own
version of the interface components, on their own codebase, all over again. This would bring the total implementations
of the same components to four.</p>
<p>This is when the idea of having a centralized, deployable style guide started to be taken seriously.</p>
<p>Our first idea was to go for a components API, inspired by <a href="https://rizzo.lonelyplanet.com/styleguide/design-elements/colours">Lonely Planet’s
Rizzo</a>. However, we soon realized this wouldn’t fit
our use case as each different team might be working with different technologies – one team might be using Angular 1,
the other Angular 2, while the third might be using React.</p>
<h3>Evolution</h3>
<p>With the help of very dedicated Frontend Engineer <a href="https://www.linkedin.com/in/rubenbarilani">Ruben Barilani</a>, we
decided to integrate Tom’s demo library directly into his production code so he could stay on top of any changes.</p>
<p>For maximum tech compatibility, we kept Dress Code itself as a set of SCSS files that could be imported, customized and
built by each team, no matter what their technology choice was.</p>
<p>Dress Code was then transferred to GitHub, first as a private repository on GitHub Enterprise. Using tools like
<a href="https://www.npmjs.com/">npm</a> and <a href="http://gulpjs.com/">gulp</a> to set up a build system, Ruben made it easy for other
developers to link Dress Code into their workflows, making sure they would always get the latest Dress Code files in
their own development environment.</p>
<p>Once <a href="https://github.com/zalando/dress-code/">deployed to GitHub</a>, Dress Code quickly became a collaborative,
distributed project. New features came at a quick pace. Tom pushed for organizing the elements into Atoms and Molecules,
following <a href="http://bradfrost.com/blog/post/atomic-web-design/">Brad Frost’s Atomic Design Principle</a>. We all worked on
refactoring much of the SCSS to adhere strictly to <a href="http://getbem.com/">BEM</a> conventions. Sambhav Gore, a new Frontend
Engineer who had just joined Zalando, developed a responsive grid system based on
<a href="https://css-tricks.com/snippets/css/a-guide-to-flexbox/">flexbox</a>. I designed new components, new icons and improved
the existing ones.</p>
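<p>As a rough illustration of the BEM convention mentioned above (the class names here are made up for the example, not actual Dress Code components):</p>

```scss
// Block__element--modifier, written with SCSS's parent selector `&`
.dc-card {
  padding: 16px;

  &__title {        // compiles to .dc-card__title
    font-weight: bold;
  }

  &--highlighted {  // compiles to .dc-card--highlighted
    border: 1px solid gold;
  }
}
```

<p>Keeping selectors flat like this (one class per element, no deep nesting) is what makes the components reusable across teams regardless of their markup structure.</p>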
<h3>Adoption</h3>
<p>Currently, <a href="https://github.com/zalando/dress-code/">Dress Code</a> is used by around a dozen different teams at Zalando,
building products for businesses and employees – and the list keeps growing. Contributions from new people are showing
up more and more often.</p>
<p>We’ve had pull requests from the Advertising Team, the Customer Benefits Team, the Merchant Center team, and others.
Even teams whose products will not be integrated into the Solution Center -- our original scope for Dress Code -- have
started using it as a starting point for their standalone projects.</p>
<p>And those are only the internal teams -- since the project is available to anyone with a GitHub account, many other
people may use it in their own projects.</p>
<p>Could it be useful for you too? Take a look at our <a href="http://zalando.github.io/dress-code/">live Dress Code demo</a> and
check out the <a href="https://github.com/zalando/dress-code/">GitHub project</a> -- we’re always happy to get new contributions!</p>
<p>Find me on Twitter at <a href="https://twitter.com/gabrielhl">@gabrielhl</a> for feedback and questions – I'd love to hear from
you.</p>Riding the Scalawave in 20162017-02-15T00:00:00+01:002017-02-15T00:00:00+01:00Patryk Koryznatag:engineering.zalando.com,2017-02-15:/posts/2017/02/riding-the-scalawave-in-2016.html<p>Scala and Akka are mind-bending in this conference overview from Gdańsk, Poland.</p><p><em>"Do not try and bend the spoon, that's impossible. Instead, only try to realize the truth... there is no spoon. Then
you will see it is not the spoon that bends, it is only yourself."</em></p>
<p>This classic sci-fi quote seems to be quite a fitting summary of the workshop and talks I attended at
<a href="http://www.scalawave.io/">Scalawave</a> last November in Gdańsk, Poland. But instead of the spoon, there's Scala. And let
me tell you, it gives you a lot of occasions to bend your mind. Let's go!</p>
<h3>A workshop workout</h3>
<p>The day before the conference was dedicated to workshops. I chose to participate in type-level (meta)programming using
<a href="https://github.com/milessabin/shapeless">Shapeless</a>. If it sounds complicated to you, then you would be absolutely
right.</p>
<p>Let me deconstruct this workshop title for you: the “type level” part implies that it’s concerned with operating on
the types of the values used by your Scala programs’ computations, as opposed to the regular value level.
“Metaprogramming” literally means a level above your typical programs. So what this boils down to is programming
software that manipulates software – in other words, writing little programs that write other programs. What?</p>
<p>If it sounds completely alien (or philosophical even), you’d be surprised to know that you might’ve been using this
method already without even knowing it. Chances are you’ve used a Scala library that generates serialisation code
(say, for JSON) at compile time. Such libraries use the advanced type system of the Scala language (and/or some macro
magic for specific information not provided by types alone) to generate code that would otherwise have to be written
by hand or via reflection – and no-one wants to write those
JsObjects by hand. These abstractions can also help to overcome some limits of a standard library (e.g. the 22 argument
limit for functions).</p>
<p>Piotr Krzemiński, who led the workshop, took participants on a great four-hour journey through the facilities
available in Shapeless. We began with a quick introduction to the language features that make it possible – namely
implicits, type members, path-dependent types and more – and then moved on to the big league.</p>
<p>The first exercise had participants write some very simple <a href="https://en.wikipedia.org/wiki/Peano_axioms">Peano
arithmetic</a>: defining the natural numbers.[1] The real challenge was to implement
addition and comparisons, which required dusting off some long-unused knowledge from your university days. Oh, and
remember, this is type-level! No recursive functions were used – the actual calculation (ab)uses implicit resolution at
compile time. Inefficient? Of course, but the goal here isn't some twisted form of premature optimisation.</p>
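<p>To make this concrete, here is a minimal sketch of what such type-level Peano arithmetic can look like in plain Scala – an illustration only, not the workshop's actual code; the names (<em>Nat</em>, <em>Sum</em>) are my own, and the “computation” happens entirely inside implicit resolution at compile time:</p>

```scala
object PeanoDemo {
  // numbers are types, not values
  sealed trait Nat
  sealed trait Zero extends Nat
  sealed trait Succ[N <: Nat] extends Nat

  // addition as a type-level relation: Sum[A, B] { type Out } means A + B = Out
  trait Sum[A <: Nat, B <: Nat] { type Out <: Nat }

  object Sum {
    type Aux[A <: Nat, B <: Nat, O <: Nat] = Sum[A, B] { type Out = O }

    // base case: 0 + B = B
    implicit def zeroCase[B <: Nat]: Aux[Zero, B, B] =
      new Sum[Zero, B] { type Out = B }

    // inductive case: Succ(A) + B = Succ(A + B)
    implicit def succCase[A <: Nat, B <: Nat, O <: Nat](
        implicit rest: Aux[A, B, O]): Aux[Succ[A], B, Succ[O]] =
      new Sum[Succ[A], B] { type Out = Succ[O] }
  }

  type One = Succ[Zero]
  type Two = Succ[One]

  // compiles only because the compiler can prove 1 + 1 = 2 by resolving implicits
  val proof: Sum.Aux[One, One, Two] = implicitly[Sum.Aux[One, One, Two]]
}
```

<p>Asking for <em>implicitly[Sum.Aux[One, One, One]]</em> would simply fail to compile – which is the whole point: wrong arithmetic is rejected before the program ever runs.</p>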
<p>The next example demonstrated how the type-level numbers allow you to create a type that also includes information about
the length of a vector. This might actually come in handy if you want to guarantee that a correct number of elements is
passed to and/or returned by your functions.</p>
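<p>A toy version of that idea, sketched in plain Scala rather than with shapeless' actual <em>Sized</em> type (all names here are illustrative): the length lives in the type, so a function can demand exactly two elements and the compiler enforces it.</p>

```scala
object VecDemo {
  // a Succ/Zero encoding of type-level naturals for the length
  sealed trait Nat
  sealed trait Zero extends Nat
  sealed trait Succ[N <: Nat] extends Nat

  // a vector whose element count N is part of its type
  final case class Vec[N <: Nat, +A](toList: List[A]) {
    // prepending grows the type-level length by one
    def +:[B >: A](head: B): Vec[Succ[N], B] = Vec(head :: toList)
  }

  val empty: Vec[Zero, Nothing] = Vec(Nil)

  // this function only accepts vectors of exactly two elements
  def midpoint(v: Vec[Succ[Succ[Zero]], Double]): Double =
    v.toList.sum / 2.0

  // compiles; midpoint(1.0 +: empty) would be a type error
  val ok = midpoint(1.0 +: 3.0 +: empty)
}
```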
<p>The following part of the workshop took us closer to everyday challenges: exploring mechanisms needed for automatic
encoder derivation. Enter
<a href="https://github.com/milessabin/shapeless/blob/master/core/src/main/scala/shapeless/hlists.scala">HList</a> – a heterogeneous
list, which can be seen either as a list of elements of different types (that keeps this type information instead
of being coerced to the least upper bound, like <em>Any</em> or <em>Product</em> with <em>Serializable</em>) or as an arbitrary-length tuple.</p>
<p>This is more powerful than you might think. Using <em>Generic</em>, you can convert a case class to an HList, keeping type
information. You can then add a field name using some type-level trickery and dark magic[2], and you have everything
you need to make your <em>JSON/Protobuf/??? de-/encoder</em> ready to accept any case class without using any accessors or
tightly coupling it to a concrete class.</p>
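<p>Here is a hand-rolled miniature of the <em>HList</em>/<em>Generic</em> idea to show the shape of the trick – real code would use shapeless, which derives the <em>Generic</em> instance with a macro instead of the manual instance written below; all names are illustrative:</p>

```scala
object GenericDemo {
  // a heterogeneous list: each cons cell remembers its element's exact type
  sealed trait HList
  final case class ::[+H, +T <: HList](head: H, tail: T) extends HList
  sealed trait HNil extends HList
  case object HNil extends HNil

  // converts a case class to its HList representation, keeping type information
  trait Generic[A] {
    type Repr <: HList
    def to(a: A): Repr
  }

  final case class Article(name: String, price: Int)

  // shapeless derives this instance automatically; we write it by hand
  implicit val articleGeneric: Generic[Article] { type Repr = String :: Int :: HNil } =
    new Generic[Article] {
      type Repr = String :: Int :: HNil
      def to(a: Article): Repr = ::(a.name, ::(a.price, HNil))
    }

  // an encoder written once against HList works for any case class with a
  // Generic instance -- no accessors, no coupling to a concrete class
  def encode[A](a: A)(implicit g: Generic[A]): String = {
    def go(h: HList): List[String] = h match {
      case ::(head, tail) => head.toString :: go(tail)
      case HNil           => Nil
    }
    go(g.to(a)).mkString("[", ",", "]")
  }

  val json = encode(Article("Tracksuit bottoms", 25))
}
```

<p>Note that <em>encode</em> never mentions <em>Article</em>: give any other case class a <em>Generic</em> instance and it works unchanged.</p>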
<p>However, in my opinion the most impressive part is that there aren't really many macros in play here. Most of the code
is just using regular old Scala syntax, so you're not really changing the syntax (or bending <a href="https://www.scala-lang.org/files/archive/spec/2.11/">the
rules</a>). Instead, you're bending your mind while using whatever the
compiler can already do for you.</p>
<p>A side note: runtime reflection, such as the JVM’s familiar <em>getClass</em>, is also a form of metaprogramming – you’re
literally writing a program that knows about the program that’s running. This approach is different, however, because it
can only introspect code that has already been compiled, loaded, and is currently running. This has performance
implications and is less safe (and, in the case of Java, less elegant if you ask me).</p>
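<p>For contrast, a tiny example of that runtime flavour of metaprogramming – ordinary JVM reflection, nothing Shapeless-specific:</p>

```scala
object ReflectionDemo {
  // the running program inspects its own compiled classes at runtime
  val className: String = "hello".getClass.getName

  // member lookup also happens at runtime: no compile-time safety, some cost
  val fieldCount: Int = classOf[String].getDeclaredFields.length
}
```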
<h3>Getting a good talking to</h3>
<p>The next day was dedicated to talks, taking place in Stary Maneż (Old Manège) – a very impressive venue on its own. The
conference’s first keynote talk was by <a href="https://rolandkuhn.com/">Roland Kuhn</a>, who presented on Distributed Systems and
Composability, mentioning ways of describing distributed computations in a mathematical way, such as pi calculus (think
lambda calculus with channels) and <a href="http://www.scribble.org/">Scribble</a>, a "language to describe application-level
protocols among communicating systems". One of the biggest complaints about distributed applications is the
unpredictability present, due to network errors, asynchronicity and more – these ideas definitely seem like a step in
the right direction. And of course there is <a href="http://doc.akka.io/docs/akka/current/scala/typed.html">Akka Typed</a>,
providing a higher level of type safety.[3]</p>
<p>Next up was Jon Pretty's "Interpolating Strings Like a Boss", exploring the humble string interpolator mechanism in
Scala. There were some surprising conclusions in this talk – did you know that you can actually create a class named
<em>StringContext</em> and provide your own implementation? It turns out that it is just a simple text substitution performed
by the compiler during the desugaring phase. Also, <a href="https://twitter.com/pkoryzna/status/802462211314585600">this</a> can
happen[4]. This shows how simple, everyday features have absolutely crazy corner cases. Throw a few macros into the
mix and you're already getting into mind-bending territory.</p>
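<p>The desugaring is easy to see with a toy interpolator of your own (the <em>sql</em> name below is purely illustrative, not Slick's or ScalikeJDBC's actual API): a literal like <em>sql"select * from $table"</em> is rewritten by the compiler into <em>StringContext("select * from ", "").sql(table)</em>.</p>

```scala
object InterpolatorDemo {
  // any method added to StringContext via an implicit class becomes an interpolator
  implicit class SqlHelper(private val sc: StringContext) extends AnyVal {
    // a real library would build a prepared statement; we just tag the string
    def sql(args: Any*): String = "SQL:" + sc.s(args: _*)
  }

  val table = "articles"
  // desugars to: new StringContext("select * from ", "").sql(table)
  val query = sql"select * from $table"
}
```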
<p>Another talk I would like to mention was given by Jan Pustelnik about <a href="http://www.reactive-streams.org/">Reactive
Streams</a> for fast data processing. This being a Scala conference, I put the talk into the
context of using Akka Streams, which we do a lot here at Zalando Tech. Being familiar with these, the highlight for me
was that stream processing your data is not a new idea at all. In fact, it's almost as old as computers themselves – back
in the day, when RAM was just <a href="https://en.wikipedia.org/wiki/Magnetic-core_memory">a bunch of magnets on wires</a> sewn
together by hand, you didn't really have too much space to spare. Hence, streaming algorithms were the way to go. And in
the words of the author, the talk did "mix obscure algorithms found in dust-covered textbooks with the hottest newest
features from the Streams ecosystem". The old is new again in a sense here, because in the place of spinning tape
streaming the data byte-by-byte, we now have blazingly fast SSDs with gigabit connections. On top of that, rather than a
rack-sized single CPU, we have multicore processors running tens of threads. We also have the luxury of the GraphStage DSL
rather than physically flipping switches or punching holes in cardboard.</p>
<h3>Final thoughts</h3>
<p>But what's so mind-bending here, you ask? Consider how many of the above-mentioned themes seem so simple on the surface:
Akka? Just sending messages! Yes, until you distribute your actors across machines (and it's easy enough to screw up on
one machine if you're even slightly careless). String interpolators? Just intersperse a few strings together,
what’s the big deal? But then you have libraries that use them for much, much more than a nicer <em>printf</em> – Slick and
ScalikeJDBC, for example, use them for creating SQL statements. Akka Streams? You can just map all the things and
call it a day. Yet you can also write your own graph stage to do fan-out, fan-in, stateful flows and more – not to
mention the machinery under the hood that interprets the graph you’ve built and runs it on actors.</p>
<p>Shapeless? Well, Shapeless is mind-bending by definition – you can't really shrug it off that easily.</p>
<p>While you may say that all of this is just layers of abstractions, as old as programming itself, I definitely agree. But
looking at how all of these things are achieved in mostly normal, run-of-the-mill Scala, without any fancy compiler
plugins, or with help of a macro (written in Scala nevertheless) is in my opinion truly mind blowing. Do you agree? Let
me know via Twitter at <a href="https://twitter.com/pkoryzna">@pkoryzna</a>.</p>
<hr>
<p>[1] Using mathematical induction – that is, beginning with the base case, zero, and then defining the other numbers
as successors of each other, i.e. One = Successor(Zero), Two = Successor(One), etc.
[2] Singleton typed symbols generated with macros
[3] A popular joke in some circles was that the flagship product of a company called Typesafe was based on Any =>
Unit method
[4] Anything that you write before your string literal becomes a method invocation on StringContext instance - in this
case, it's calling the equals method on the StringContext</p>Zalenium: A Disposable and Flexible Selenium Grid Infrastructure2017-02-14T00:00:00+01:002017-02-14T00:00:00+01:00Diego Fernando Molina Bocanegratag:engineering.zalando.com,2017-02-14:/posts/2017/02/zalenium-a-disposable-and-flexible-selenium-grid-infrastructure.html<p>Let us help you scale your local grid dynamically with Docker containers via open source software.</p><p>Engineering Productivity is an area in Zalando where our main goal is to help other teams test their products by
providing great tools and services. For UI testing execution, we offer teams a mix of internal and external tools. Some
teams use <a href="https://github.com/elgalu/docker-selenium">docker-selenium</a>, a dockerized Selenium Grid, and other teams use
<a href="https://saucelabs.com/">Sauce Labs</a>.</p>
<p>Each tool has its own advantages and disadvantages, but they fulfill our needs and complement each other. We thought it
would be a great idea to mix them, so we can run tests quickly in Chrome/Firefox, and when we need other browsers, use
Sauce Labs. Chrome and Firefox are the most used browsers for testing at Zalando, while Safari and IE/Edge come
afterwards.</p>
<p>We decided to create <a href="https://github.com/zalando/zalenium">Zalenium</a>, a tool to help you scale your local grid
dynamically with Docker containers. It uses docker-selenium to run your tests in Firefox and Chrome locally, and when
you need a different browser, your tests get redirected to a cloud testing provider.</p>
<p>We know how complicated it is to have a stable Selenium Grid with enough capabilities to cover all browsers and
platforms, and how hard it can be to maintain it over time. With this approach, you can run your UI tests faster in
Firefox and Chrome because they run on a local grid, on a node created from scratch and disposed of
after the test finishes. Whenever you need a capability that cannot be fulfilled by docker-selenium, the test gets
redirected to a cloud testing provider – so far we have integrated <a href="https://saucelabs.com/">Sauce Labs</a>,
<a href="https://www.browserstack.com/">BrowserStack</a> and <a href="https://testingbot.com/">TestingBot</a>.</p>
<p>This highlights Zalenium's main goal: to allow anyone to have a disposable and flexible Selenium Grid infrastructure.</p>
<p>This image shows how Zalenium works conceptually:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite%2Fed47df08-3c62-4cb4-9b27-a523128b3212_how_it_works.gif?auto=compress,format"></p>
<p>Let’s dive into how to use it.</p>
<p>First, pull the following Docker images (this may take a few minutes):</p>
<div class="highlight"><pre><span></span><code>docker pull elgalu/selenium
docker pull dosel/zalenium
</code></pre></div>
<p>Start Zalenium, without any cloud testing provider enabled:</p>
<div class="highlight"><pre><span></span><code><span class="n">docker</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="o">--</span><span class="n">rm</span><span class="w"> </span><span class="o">-</span><span class="n">ti</span><span class="w"> </span><span class="o">--</span><span class="n">name</span><span class="w"> </span><span class="n">zalenium</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">4444</span><span class="p">:</span><span class="mi">4444</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">5555</span><span class="p">:</span><span class="mi">5555</span><span class="w"> </span>\
<span class="w"> </span><span class="o">-</span><span class="n">v</span><span class="w"> </span><span class="o">/</span><span class="k">var</span><span class="o">/</span><span class="n">run</span><span class="o">/</span><span class="n">docker</span><span class="o">.</span><span class="n">sock</span><span class="p">:</span><span class="o">/</span><span class="k">var</span><span class="o">/</span><span class="n">run</span><span class="o">/</span><span class="n">docker</span><span class="o">.</span><span class="n">sock</span><span class="w"> </span>\
<span class="w"> </span><span class="o">-</span><span class="n">v</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">videos</span><span class="p">:</span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">seluser</span><span class="o">/</span><span class="n">videos</span><span class="w"> </span>\
<span class="w"> </span><span class="n">dosel</span><span class="o">/</span><span class="n">zalenium</span><span class="w"> </span><span class="n">start</span>
</code></pre></div>
<p>Start Zalenium with Sauce Labs enabled:</p>
<div class="highlight"><pre><span></span><code><span class="k">export</span><span class="w"> </span><span class="n">SAUCE_USERNAME</span><span class="o">=</span>
<span class="k">export</span><span class="w"> </span><span class="n">SAUCE_ACCESS_KEY</span><span class="o">=</span>
<span class="n">docker</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="o">--</span><span class="n">rm</span><span class="w"> </span><span class="o">-</span><span class="n">ti</span><span class="w"> </span><span class="o">--</span><span class="n">name</span><span class="w"> </span><span class="n">zalenium</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">4444</span><span class="p">:</span><span class="mi">4444</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">5555</span><span class="p">:</span><span class="mi">5555</span><span class="w"> </span>\
<span class="w"> </span><span class="o">-</span><span class="n">e</span><span class="w"> </span><span class="n">SAUCE_USERNAME</span><span class="w"> </span><span class="o">-</span><span class="n">e</span><span class="w"> </span><span class="n">SAUCE_ACCESS_KEY</span><span class="w"> </span>\
<span class="w"> </span><span class="o">-</span><span class="n">v</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">videos</span><span class="p">:</span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">seluser</span><span class="o">/</span><span class="n">videos</span><span class="w"> </span>\
<span class="w"> </span><span class="o">-</span><span class="n">v</span><span class="w"> </span><span class="o">/</span><span class="k">var</span><span class="o">/</span><span class="n">run</span><span class="o">/</span><span class="n">docker</span><span class="o">.</span><span class="n">sock</span><span class="p">:</span><span class="o">/</span><span class="k">var</span><span class="o">/</span><span class="n">run</span><span class="o">/</span><span class="n">docker</span><span class="o">.</span><span class="n">sock</span><span class="w"> </span>\
<span class="w"> </span><span class="n">dosel</span><span class="o">/</span><span class="n">zalenium</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">--</span><span class="n">sauceLabsEnabled</span><span class="w"> </span><span class="bp">true</span>
</code></pre></div>
<p>And that’s it! Check your local <a href="http://localhost:4444/grid/console">grid</a> and run some tests.</p>
<p>After running your tests, you will notice that videos are recorded if you check your local “/tmp/videos” folder. You can
also see the tests running in the <a href="http://localhost:4444/grid/admin/live">live preview</a> feature. Check out further
startup options <a href="https://github.com/zalando/zalenium/blob/master/docs/usage_examples.md">here</a> (like screen sizes, using
BrowserStack or TestingBot, etc.).</p>
<p>With Zalenium, we are giving teams the opportunity to use their resources in a more efficient way. By combining Zalenium
and Sauce Labs, teams can run UI tests during development much faster, and before releasing they can do a final check
using the additional capabilities provided by Sauce Labs. With this integration, our test suites operate with greater
speed, since most of the tests run on local Firefox/Chrome nodes, and we use the cloud testing service we pay for in a
smarter way.</p>
<p><a href="https://github.com/zalando/zalenium">Zalenium</a> is open source, so please feel free to try it and use it as part of your
infrastructure. Do not hesitate to create an issue asking for help or suggest features, or contribute in any way you
would like.</p>
<p>Stay tuned for upcoming features by giving us a star and watching the project on
<a href="https://github.com/zalando/zalenium">GitHub</a>. Contact us on Twitter for further questions at
<a href="https://twitter.com/diegofmolina">@diegofmolina</a> and <a href="https://twitter.com/elgalu">@elgalu</a>.</p>Building a Relay-compatible GraphQL Server2017-02-03T00:00:00+01:002017-02-03T00:00:00+01:00Nikolaus Piccolottotag:engineering.zalando.com,2017-02-03:/posts/2017/02/building-a-relay-compatible-graphql-server.html<p>Experimenting with our technology stack via GraphQL and Relay – ES6 and React knowledge required!</p><p>You’ve probably heard about <a href="http://graphql.org/">GraphQL</a> and <a href="https://facebook.github.io/relay/">Relay</a>, and how they
will change everything we know about data management in our applications. At Zalando, we’re open to experimenting with
and adding to our technology stack, and this is no exception. In this article I’d like to shed some light on Relay’s
internals and build a GraphQL server on top of the existing <a href="https://github.com/zalando/shop-api-documentation">Zalando REST
API</a>, making it compatible with Relay. The application will consist
of a list of articles and an article detail page, featuring recommendations. It is assumed that readers will be familiar
with ES6, React, and have some cursory knowledge of GraphQL.</p>
<h3>A GraphQL refresher</h3>
<p>GraphQL is a query language developed by Facebook ( <a href="https://facebook.github.io/react/blog/2015/05/01/graphql-introduction.html">GraphQL
Introduction</a>, <a href="https://code.facebook.com/posts/1691455094417024/graphql-a-data-query-language/">GraphQL: A data query
language</a>). Though general usage is
possible, it was designed with the needs of UIs in mind and is considered more efficient than REST APIs for this
purpose. If you ever thought that <em>/recommendations?articleIds=1,2,4&include=name,price</em> is not a very RESTful endpoint,
you might want to consider a <a href="http://samnewman.io/patterns/architectural/bff/">BFF</a> with GraphQL.</p>
<p>One of the major differences compared to a REST API is the necessity to define a schema for your data and queries. While
you can do that for REST APIs as well (e.g. with <a href="https://openapis.org/">OpenAPI</a>), you can’t ask for specific data: You
would have to implement the include parameter from the above example yourself. Also, the server may or may not return
data according to the defined schema. This is in stark contrast to GraphQL, where you get exactly what you asked for.</p>
<p>To query an article in our GraphQL server you might submit a query like this:</p>
<div class="highlight"><pre><span></span><code>query {
Article(id: "PU142E04G-Q11") {
name
}
}
</code></pre></div>
<p>And get a response like this:</p>
<div class="highlight"><pre><span></span><code>{
"data": {
"Article": {
"name": "UB - Tracksuit bottoms - black"
}
}
}
</code></pre></div>
<p>Note how the response structure mirrors your query. Also, you can’t query for “everything” or something that is not
contained in the schema:</p>
<div class="highlight"><pre><span></span><code><span class="nx">query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Article</span><span class="p">(</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="s">"some-article-id"</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="w"> </span><span class="s">"errors"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"message"</span><span class="p">:</span><span class="w"> </span><span class="s">"Field \"Article\" of type \"Article\" must have a selection of subfields. Did you mean \"Article { ... }\"?"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"locations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"line"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span>
<span class="w"> </span><span class="s">"column"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
<span class="nx">query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Article</span><span class="p">(</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="s">"some-article-id"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">isPretty</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="w"> </span><span class="s">"errors"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"message"</span><span class="p">:</span><span class="w"> </span><span class="s">"Cannot query field \"isPretty\" on type \"Article\"."</span><span class="p">,</span>
<span class="w"> </span><span class="s">"locations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"line"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span>
<span class="w"> </span><span class="s">"column"</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>With that in mind, let’s first build the schema for a non-Relay GraphQL server, because having GraphQL does not
automatically mean you can use Relay with it. We’ll implement the necessary changes afterwards.</p>
<h3>The schema</h3>
<p>Our data is pretty straightforward as it only deals with articles. We have an article type consisting of a name, preview
image (thumbnail), brand information, a list of proper images, and recommendations, which are a list of articles. Note
that fields without an exclamation mark are nullable, so we can use the same data type for preview and detail purposes.</p>
<div class="highlight"><pre><span></span><code><span class="n">type</span><span class="w"> </span><span class="n">Brand</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">logoUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="err">}</span>
<span class="n">enum</span><span class="w"> </span><span class="n">Gender</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">MALE</span>
<span class="w"> </span><span class="n">FEMALE</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="nc">Image</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">thumbnailUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">smallUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">mediumUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">largeUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">non</span><span class="o">-</span><span class="n">nullable</span><span class="p">,</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">guaranteed</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">exist</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">every</span><span class="w"> </span><span class="n">article</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">thumbnailUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">brand</span><span class="p">:</span><span class="w"> </span><span class="n">Brand</span>
<span class="w"> </span><span class="nl">genders</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Gender</span><span class="o">]</span>
<span class="w"> </span><span class="nl">images</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Image</span><span class="o">]</span>
<span class="w"> </span><span class="nl">recommendations</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Article</span><span class="o">]</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">Query</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">Article</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Article</span>
<span class="w"> </span><span class="nl">Articles</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Article</span><span class="o">]</span>
<span class="err">}</span>
</code></pre></div>
<p>The Query type is a special GraphQL type as it defines entry points to the GraphQL API. You can only query for fields of
the Query type. Here we have defined queries for a list of articles and a specific article.</p>
<h3>The first GraphQL server implementation</h3>
<p>Since our GraphQL server will at first only proxy calls to the Zalando REST API, we can take the schema description from
above and generate a JavaScript representation from it. It’s very convenient — see below for all of the server code
(excluding API calls):</p>
<div class="highlight"><pre><span></span><code><span class="k">const</span><span class="w"> </span><span class="n">express</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">),</span>
<span class="w"> </span><span class="n">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">),</span>
<span class="w"> </span><span class="n">bodyParser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'body-parser'</span><span class="p">),</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="n">use</span><span class="w"> </span><span class="n">graphql</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="n">execution</span>
<span class="w"> </span><span class="p">{</span><span class="n">graphql</span><span class="p">,</span><span class="w"> </span><span class="n">buildSchema</span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'graphql'</span><span class="p">),</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">our</span><span class="w"> </span><span class="n">article</span><span class="w"> </span><span class="n">schema</span>
<span class="w"> </span><span class="n">Schema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">(</span><span class="n">fs</span><span class="o">.</span><span class="n">readFileSync</span><span class="p">(</span><span class="s1">'./data/schema.graphql'</span><span class="p">)),</span>
<span class="w"> </span><span class="n">app</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">express</span><span class="p">(),</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">functions</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">fetch</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">REST</span><span class="w"> </span><span class="n">API</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">transform</span><span class="w"> </span><span class="n">according</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">schema</span>
<span class="w"> </span><span class="n">api</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'./api'</span><span class="p">),</span>
<span class="w"> </span><span class="n">jsSchema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">buildSchema</span><span class="p">(</span><span class="n">Schema</span><span class="p">);</span>
<span class="o">//</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">defines</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">entry</span><span class="w"> </span><span class="n">point</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">GraphQL</span><span class="w"> </span><span class="n">API</span>
<span class="o">//</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">GraphQL</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="n">executor</span><span class="w"> </span><span class="n">will</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="n">resolving</span><span class="w"> </span><span class="n">fields</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">here</span>
<span class="o">//</span><span class="w"> </span><span class="p">(</span><span class="n">how</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">works</span><span class="w"> </span><span class="n">exactly</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">explained</span><span class="w"> </span><span class="n">later</span><span class="p">)</span>
<span class="k">const</span><span class="w"> </span><span class="n">queryResolver</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Article</span><span class="p">:</span><span class="w"> </span><span class="p">({</span><span class="n">id</span><span class="p">})</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">api</span><span class="o">.</span><span class="n">fetchArticle</span><span class="p">(</span><span class="n">id</span><span class="p">),</span>
<span class="w"> </span><span class="n">Articles</span><span class="p">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">api</span><span class="o">.</span><span class="n">fetchArticles</span><span class="p">()</span>
<span class="p">};</span>
<span class="o">//</span><span class="w"> </span><span class="n">GraphQL</span><span class="w"> </span><span class="n">queries</span><span class="w"> </span><span class="n">will</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">plain</span><span class="w"> </span><span class="n">text</span>
<span class="n">app</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="n">bodyParser</span><span class="o">.</span><span class="n">text</span><span class="p">());</span>
<span class="o">//</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="n">use</span><span class="w"> </span><span class="n">POST</span><span class="w"> </span><span class="o">/</span><span class="n">graphql</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">submit</span><span class="w"> </span><span class="n">queries</span>
<span class="n">app</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s1">'/graphql'</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">req</span><span class="p">,</span><span class="w"> </span><span class="n">res</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">req</span><span class="o">.</span><span class="n">body</span><span class="p">;</span>
<span class="w"> </span><span class="n">graphql</span><span class="p">(</span><span class="n">jsSchema</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">,</span><span class="w"> </span><span class="n">queryResolver</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">then</span><span class="p">(</span><span class="n">result</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">res</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">200</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>
<span class="p">});</span>
<span class="n">app</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">process</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">PORT</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="mi">3001</span><span class="p">);</span>
</code></pre></div>
<p>And we can verify it works by executing:</p>
<div class="highlight"><pre><span></span><code>curl -X POST -H "Content-Type: text/plain" -d "{ Articles { name } }" http://localhost:3001/graphql -v
</code></pre></div>
<p>That was quick! If you just want to wrap your REST API in GraphQL, this is already enough. But let’s go a step further
and take a look at Relay.</p>
<h3>Relay introduction</h3>
<p><a href="https://facebook.github.io/relay/">Relay</a> is a client framework designed to work with a compatible GraphQL server. It
makes assumptions about which types exist, how types are named, and which filters are available, which means it won’t
work out of the box with just any GraphQL server. Essentially, you tie React components to “fragments” of GraphQL types,
specifying the fields that you expect. Relay takes this data dependency tree, generates the necessary GraphQL query (note
the singular!), runs it, and distributes the data back to the React components.</p>
<p>What kind of problems does it solve? (We’ll see later which other problems it introduces; everything is a tradeoff.)</p>
<p>First, network access is more efficient with Relay than with plain REST APIs: Relay collects all data
requirements and sends a single query to the server. (I’m not sure if it does this already, but it could also generate
the minimum necessary query, e.g. when you have multiple fragments with overlapping fields on the same resource.) It
also automatically caches fetched data and won’t refetch anything it finds in the cache.</p>
<p>Second, your queries live next to your components. Suppose you want to use another field of a type. Without Relay,
you need to touch the data fetching code (probably <a href="https://github.com/reactjs/redux">Redux</a> actions) and possibly
one or more parent components that distribute this data, and changing props on a parent component might break child
components. With Relay, you add the field to the fragment and can use it directly; that’s it.</p>
<h3>Updating our GraphQL server</h3>
<p>Relay wants to do three things:</p>
<ol>
<li><a href="https://facebook.github.io/relay/graphql/objectidentification.htm">Identify objects</a></li>
<li><a href="https://facebook.github.io/relay/graphql/connections.htm">Navigate large lists</a></li>
<li><a href="https://facebook.github.io/relay/graphql/mutations.htm">Do mutations</a> (not covered as our application is read-only)</li>
</ol>
<p><strong>Identifying objects</strong><br>
Relay wants a single way to query for an object. What’s required is an interface <em>Node</em> with a single field <em>id</em> of
type <em>ID</em>, and a query for it called <em>node</em>. Relay assumes you have globally unique IDs. If you don’t, you can make
them so by concatenating the type name with the id, e.g. <em>article-42</em>. Since we only deal with a single object type (Article),
we can reuse the existing ID.</p>
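<p>If you ever do need to construct global IDs, a pair of small helpers could do it; this is an illustrative sketch (similar in spirit to the <em>toGlobalId</em>/<em>fromGlobalId</em> helpers of the graphql-relay package, which base64-encode a “Type:id” string), not part of our server:</p>

```javascript
// Hypothetical helpers for globally unique IDs: encode "Type:id" as base64.
const toGlobalId = (type, id) =>
  Buffer.from(`${type}:${id}`).toString('base64');

const fromGlobalId = (globalId) => {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  // split only on the first ":" so ids may themselves contain colons
  const [type, ...rest] = decoded.split(':');
  return {type, id: rest.join(':')};
};

const gid = toGlobalId('Article', 'PU142E04G-Q11');
console.log(fromGlobalId(gid)); // {type: 'Article', id: 'PU142E04G-Q11'}
```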
<p>The following changes are necessary to our schema:</p>
<div class="highlight"><pre><span></span><code><span class="n">interface</span><span class="w"> </span><span class="n">Node</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="n">implements</span><span class="w"> </span><span class="n">Node</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">thumbnailUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">brand</span><span class="p">:</span><span class="w"> </span><span class="n">Brand</span>
<span class="w"> </span><span class="nl">genders</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Gender</span><span class="o">]</span>
<span class="w"> </span><span class="nl">images</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Image</span><span class="o">]</span>
<span class="w"> </span><span class="nl">recommendations</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Article</span><span class="o">]</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">Query</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">Articles</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Article</span><span class="o">]</span>
<span class="w"> </span><span class="n">Article</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Article</span>
<span class="w"> </span><span class="n">node</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Node</span>
<span class="err">}</span>
</code></pre></div>
<p>Since the signature and semantics of our Article and node queries are the same, we can use the same code too!</p>
<div class="highlight"><pre><span></span><code><span class="k">const</span><span class="w"> </span><span class="n">queryResolver</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Article</span><span class="p">:</span><span class="w"> </span><span class="p">({</span><span class="n">id</span><span class="p">})</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">api</span><span class="o">.</span><span class="n">fetchArticle</span><span class="p">(</span><span class="n">id</span><span class="p">),</span>
<span class="w"> </span><span class="n">Articles</span><span class="p">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">api</span><span class="o">.</span><span class="n">fetchArticles</span><span class="p">(),</span>
<span class="w"> </span><span class="n">node</span><span class="p">:</span><span class="w"> </span><span class="p">({</span><span class="n">id</span><span class="p">})</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">api</span><span class="o">.</span><span class="n">fetchArticle</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">};</span>
</code></pre></div>
<p>Let’s run a query to test it:</p>
<div class="highlight"><pre><span></span><code><span class="nx">query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">node</span><span class="p">(</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="s">"PU142E04G-Q11"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="w"> </span><span class="s">"data"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"node"</span><span class="p">:</span><span class="w"> </span><span class="nx">null</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s">"errors"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"message"</span><span class="p">:</span><span class="w"> </span><span class="s">"Generated Schema cannot use Interface or Union types for execution."</span><span class="p">,</span>
<span class="w"> </span><span class="s">"locations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"line"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span>
<span class="w"> </span><span class="s">"column"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="s">"path"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s">"node"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>What happened here? This is related to how GraphQL fragments are supposed to work. Suppose we want to query for an
article via the <em>node</em> query and fetch the name. Since an article is also a node, this should be possible. However, we
can’t just add the <em>name</em> field to the query, because it returns a Node and this (interface) type only has an id. What
we can do instead is this:</p>
<div class="highlight"><pre><span></span><code>query {
node(id: "PU142E04G-Q11") {
id
... on Article {
name
}
}
}
</code></pre></div>
<p>We say “if the node returned is of type Article, then take the name from it”. That’s why an interface type has to be
mapped to an object type at runtime. Unfortunately, to my knowledge there is no way to achieve this by
modifying the schema (string) definition; we have to write our JS schema representation by hand. We can then
implement <em>isTypeOf</em> on the Article type:</p>
<div class="highlight"><pre><span></span><code><span class="n">const</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLObjectType</span><span class="p">(</span><span class="err">{</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="s1">'Article'</span><span class="p">,</span>
<span class="w"> </span><span class="nl">interfaces</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Node</span><span class="o">]</span><span class="p">,</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">whatever</span><span class="w"> </span><span class="n">has</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">considered</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">article</span><span class="w"> </span><span class="p">(</span><span class="n">we</span><span class="w"> </span><span class="n">don</span><span class="err">’</span><span class="n">t</span><span class="w"> </span><span class="n">have</span><span class="w"> </span><span class="n">anything</span><span class="w"> </span><span class="k">else</span><span class="p">)</span>
<span class="w"> </span><span class="nl">isTypeOf</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="err">!!</span><span class="k">value</span><span class="p">.</span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="nl">fields</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLNonNull</span><span class="p">(</span><span class="n">GraphQLID</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLString</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">thumbnailUrl</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLString</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">brand</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">Brand</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">genders</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLList</span><span class="p">(</span><span class="n">Gender</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">images</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLList</span><span class="p">(</span><span class="nc">Image</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">recommendations</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLList</span><span class="p">(</span><span class="n">Article</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
<span class="err">}</span><span class="p">);</span>
</code></pre></div>
<p>As you can see, our server stays more or less the same; we simply exchange the buildSchema step for the JS schema we just
built.</p>
<h3>Navigating large lists</h3>
<p>Next up, Relay wants a single way to do pagination. Usually, you have two types of one-to-many relations in
your data: those with a limited number of items, like our article images (we don’t know exactly how many there are, but
usually only a few), and those with unlimited items, like our recommendations (they get worse the more we fetch,
but we can always get more). For the former you use <em>List</em> types, whereas Relay has a special type for the latter:
Connections.</p>
<p>A Connection type in Relay ends with “Connection”, so our recommendations will be of type <em>ArticleConnection</em>. It holds
two fields: <em>edges</em> and <em>pageInfo</em>. An <em>Edge</em> is a sort of intermediate type between the Connection and the type you’re
connecting to, holding a cursor (in the simplest case, the id of a node) and the node itself. Its name needs to end in “Edge”.
Cursors are used to… well, do <a href="https://zalando.github.io/restful-api-guidelines/pagination/Pagination.html#should-prefer-cursorbased-pagination-avoid-offsetbased-pagination">cursor-based
pagination</a>,
as opposed to offset/page-based pagination. The <em>PageInfo</em> type should be straightforward.</p>
<p>We’ll change our schema like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">type</span><span class="w"> </span><span class="n">ArticleConnection</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">pageInfo</span><span class="p">:</span><span class="w"> </span><span class="n">PageInfo</span>
<span class="w"> </span><span class="nl">edges</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">ArticleEdge</span><span class="o">]</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">ArticleEdge</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">node</span><span class="p">:</span><span class="w"> </span><span class="n">Article</span><span class="err">!</span>
<span class="w"> </span><span class="nc">cursor</span><span class="err">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">PageInfo</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">hasNextPage</span><span class="p">:</span><span class="w"> </span><span class="k">Boolean</span><span class="err">!</span>
<span class="w"> </span><span class="nl">hasPreviousPage</span><span class="p">:</span><span class="w"> </span><span class="k">Boolean</span><span class="err">!</span>
<span class="err">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="n">implements</span><span class="w"> </span><span class="n">Node</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">ID</span><span class="err">!</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">thumbnailUrl</span><span class="p">:</span><span class="w"> </span><span class="n">String</span>
<span class="w"> </span><span class="nl">brand</span><span class="p">:</span><span class="w"> </span><span class="n">Brand</span>
<span class="w"> </span><span class="nl">genders</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Gender</span><span class="o">]</span>
<span class="w"> </span><span class="nl">images</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Image</span><span class="o">]</span>
<span class="w"> </span><span class="n">recommendations</span><span class="p">(</span><span class="k">first</span><span class="err">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">,</span><span class="w"> </span><span class="k">last</span><span class="err">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">,</span><span class="w"> </span><span class="k">before</span><span class="err">:</span><span class="w"> </span><span class="n">ID</span><span class="p">,</span><span class="w"> </span><span class="k">after</span><span class="err">:</span><span class="w"> </span><span class="n">ID</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ArticleConnection</span>
<span class="err">}</span>
</code></pre></div>
<p>The arguments on the recommendations field of the Article are also required by Relay, as it otherwise doesn’t have a
way to formulate its pagination queries (“give me 10 items after the one with ID foo”).</p>
<p>Before implementing the desired behavior, let’s think about how we’ll manage recommendations. They require a separate API call,
so we don’t want to execute it every time an article is requested. Relatedly, since recommendations are also
articles, we have to make sure not to end up in an infinite loop of fetching recommendations.</p>
<p>Luckily, there is already a way to achieve this in GraphQL. The way a query works in a nutshell is that for every object
returned, it will call a resolver function for requested attributes, and repeat the process on the returned objects
until only scalar types (Int, String…) are left. The default resolver is a simple lookup (<em>obj[‘attribute’]</em>), but we
can override it! If an attribute is not requested, its resolver function will not be called, thus it’s a perfect fit for
our recommendations.</p>
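<p>To make that concrete, the default resolver amounts to roughly the following; this is a simplified sketch (the real graphql-js implementation also invokes the property if it happens to be a function):</p>

```javascript
// Simplified sketch of graphql-js's default field resolver:
// a plain property lookup by the requested field name.
const defaultResolver = (source, args, context, info) => source[info.fieldName];

// Illustrative sample object; the resolver for a field only runs when
// that field is actually requested, so unrequested fields cost nothing.
const article = {id: 'PU142E04G-Q11', name: 'Sneaker'};
console.log(defaultResolver(article, {}, {}, {fieldName: 'name'})); // 'Sneaker'
```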
<div class="highlight"><pre><span></span><code><span class="n">const</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLObjectType</span><span class="p">(</span><span class="err">{</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="s1">'Article'</span><span class="p">,</span>
<span class="w"> </span><span class="nl">interfaces</span><span class="p">:</span><span class="w"> </span><span class="o">[</span><span class="n">Node</span><span class="o">]</span><span class="p">,</span>
<span class="w"> </span><span class="nl">isTypeOf</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="err">!!</span><span class="k">value</span><span class="p">.</span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="nl">fields</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLNonNull</span><span class="p">(</span><span class="n">GraphQLID</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">name</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLString</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">thumbnailUrl</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLString</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">brand</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">Brand</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">genders</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLList</span><span class="p">(</span><span class="n">Gender</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">images</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">GraphQLList</span><span class="p">(</span><span class="nc">Image</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">recommendations</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">ArticleConnection</span><span class="p">,</span>
<span class="w"> </span><span class="nl">args</span><span class="p">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">first</span><span class="err">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLInt</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="k">last</span><span class="err">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLInt</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="k">before</span><span class="err">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLID</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="k">after</span><span class="err">:</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nl">type</span><span class="p">:</span><span class="w"> </span><span class="n">GraphQLID</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">resolve</span><span class="p">:</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="p">(</span><span class="n">article</span><span class="p">,</span><span class="w"> </span><span class="n">params</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">toConnection</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">helper</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">Connection</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="n">compatible</span><span class="w"> </span><span class="k">structure</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">API</span><span class="w"> </span><span class="k">call</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">toConnection</span><span class="p">(</span><span class="n">api</span><span class="p">.</span><span class="n">fetchRecommendations</span><span class="p">(</span><span class="n">article</span><span class="p">.</span><span class="n">id</span><span class="p">),</span><span class="w"> </span><span class="n">params</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span><span class="p">);</span>
</code></pre></div>
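<p>The <em>resolve</em> function above delegates to a <em>toConnection</em> helper. A minimal sketch of such a helper, handling forward pagination with <em>first</em> and <em>after</em>, could look like this (the cursor scheme, using the item id, is an assumption for illustration, not necessarily what the real helper does):</p>

```javascript
// Sketch of a toConnection helper: wraps an array of items into a
// Relay-style connection ({edges, pageInfo}). Using the item id as the
// cursor is an assumption made for this illustration.
function toConnection(items, params = {}) {
  let slice = items;
  if (params.after) {
    const idx = slice.findIndex(item => item.id === params.after);
    if (idx >= 0) slice = slice.slice(idx + 1);
  }
  const hasNextPage = params.first != null && slice.length > params.first;
  if (params.first != null) slice = slice.slice(0, params.first);
  return {
    edges: slice.map(item => ({cursor: item.id, node: item})),
    pageInfo: {hasNextPage: hasNextPage, hasPreviousPage: false}
  };
}
```

<p>Backward pagination with <em>last</em> and <em>before</em> works analogously from the other end of the list.</p>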
<p>Let’s try it out!</p>
<div class="highlight"><pre><span></span><code>query {
node(id: "PU142E04G-Q11") {
id
... on Article {
name
recommendations(first: 1) {
edges { node { id name } }
}
}
}
}
</code></pre></div>
<p>The server responds with:</p>
<div class="highlight"><pre><span></span><code>{
"data": {
"node": {
"id": "PU142E04G-Q11",
"name": "UB - Tracksuit bottoms - black",
"recommendations": {
"edges": [
{
"node": {
"id": "AD542E0FX-C11",
"name": "Tracksuit bottoms - medium grey heather/black"
}
}
]
}
}
}
}
</code></pre></div>
<p>Note that there is a <a href="https://github.com/graphql/graphql-relay-js">helper library</a> available for these modifications,
but it’s better to understand the mechanics first before reaching for an abstraction.</p>
<h3>The Relay application</h3>
<p>Now that we’re ready, let’s build a client that talks to our new Relay-compatible API. We will have a list of articles:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/41f86097610056809243b4eddaca6903d9e223e0_graphql1.png?auto=compress,format"></p>
<p>And an article detail view that leverages the article list for displaying recommendations:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/10a959033c95f5c4ba21b23479fb284b0091ef9f_graphql2.png?auto=compress,format"></p>
<p>First we’ll build the article list, as we’ll need that component from the outset.</p>
<h3>ArticleList</h3>
<p>ArticleList is a pretty simple component, as it only needs some articles and a flag for whether or not there are more
articles available. There is no point in showing the “load more” button without the aforementioned flag. This is how it
looks without Relay:</p>
<div class="highlight"><pre><span></span><code>class ArticleList extends Component {
  render() {
    const {articles, hasNext} = this.props;
    // markup sketch: one preview per edge plus a "load more" button
    return &lt;div&gt;
      {articles.edges.map((a, i) =&gt; &lt;ArticlePreview key={i} article={a.node} /&gt;)}
      {hasNext ?
        &lt;button onClick={() =&gt; this.props.onLoadMore()}&gt;
          Load more articles
        &lt;/button&gt;
        : null}
    &lt;/div&gt;
  }
}
</code></pre></div>
<p>With Relay, we need to create a container along with the appropriate fragment, since our articles and <em>hasNext</em> are
returned by the backend. Both fields are contained in an <em>ArticleConnection</em>, so we write a fragment for that type.</p>
<div class="highlight"><pre><span></span><code><span class="k">export</span><span class="w"> </span><span class="n">default</span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">createContainer</span><span class="p">(</span><span class="n">ArticleList</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">fragments</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">articles</span><span class="p">:</span><span class="w"> </span><span class="p">(</span><span class="n">vars</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">QL</span><span class="err">`</span>
<span class="w"> </span><span class="n">fragment</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">ArticleConnection</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pageInfo</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">hasNextPage</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">node</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1"># defensive programming in case we pass Relay variables around</span>
<span class="w"> </span><span class="c1"># we need to pull in the ArticlePreview fragment, because we render article previews</span>
<span class="w"> </span><span class="o">$</span><span class="p">{</span><span class="n">ArticlePreview</span><span class="o">.</span><span class="n">getFragment</span><span class="p">(</span><span class="s1">'article'</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="o">...</span><span class="n">vars</span><span class="p">})}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="err">`</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">})</span>
</code></pre></div>
<p>(Note: You might wonder what <em>Relay.QL</em> is exactly. It will be transformed by the
<a href="https://facebook.github.io/relay/docs/guides-babel-plugin.html">babel-relay-plugin</a> at build time to an abstract syntax
tree representing the GraphQL query. That’s the reason why you need to <a href="https://github.com/prayerslayer/zalando-graphql-relay/blob/master/scripts/update-schema.js">export your schema to a JSON
file</a>, so that Relay can do
client-side validation of the query.)</p>
<p>If everything works, our component is guaranteed to receive an <em>articles</em> property containing the data of the
fragment (<em>pageInfo.hasNextPage</em> and <em>edges.node</em>); the Relay container will throw an error otherwise. Of course, we
have to make minor changes to the React component itself to accommodate the new props.</p>
<p>We’ll build the other components with the same logic.</p>
<h3>ArticlePreview</h3>
<p>An article preview consists of only a thumbnail, name, and brand name (and id for navigation purposes).</p>
<div class="highlight"><pre><span></span><code>class ArticlePreview extends Component {
  render() {
    // markup sketch: thumbnail, brand name and article name
    return &lt;div onClick={() =&gt; this.props.onClick &amp;&amp; this.props.onClick(this.props.article.id)}&gt;
      &lt;img src={this.props.article.thumbnailUrl} /&gt;
      {this.props.article.brand.name}
      {this.props.article.name}
    &lt;/div&gt;
  }
}
<span class="k">export</span><span class="w"> </span><span class="n">default</span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">createContainer</span><span class="p">(</span><span class="n">ArticlePreview</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">fragments</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">article</span><span class="p">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">QL</span><span class="err">`</span>
<span class="w"> </span><span class="n">fragment</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">id</span>
<span class="w"> </span><span class="n">name</span>
<span class="w"> </span><span class="n">thumbnailUrl</span>
<span class="w"> </span><span class="n">brand</span><span class="w"> </span><span class="p">{</span><span class="n">name</span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="err">`</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">})</span>
</code></pre></div>
<h3>ArticleDetail</h3>
<p>The detail view features more images at a bigger size as well as recommendations for an article. We will initially show
the first five recommended articles and then show the next five every time the button is clicked. The way this works is
we increment the <em>pageSize</em> variable in the Relay query each time with <em>relay.setVariables</em>. (When you wrap a React
component in a Relay component, <em>relay</em> is automatically set on the props — you then have some <a href="https://facebook.github.io/relay/docs/api-reference-relay-container.html#overview">useful methods
available</a>.)</p>
<div class="highlight"><pre><span></span><code>class ArticleDetail extends Component {
  onLoadMore() {
    this.props.relay.setVariables({
      pageSize: this.props.relay.variables.pageSize + 5
    })
  }
  render() {
    const {article} = this.props;
    // markup sketch; Gallery stands in for whatever image component you use
    return &lt;div&gt;
      &lt;img src={article.brand.logoUrl} /&gt;
      {article.brand.name}
      {article.name}
      &lt;Gallery images={article.images.map(img =&gt; img.largeUrl)} /&gt;
      &lt;ArticleList articles={article.recommendations}
                   onLoadMore={() =&gt; this.onLoadMore()} /&gt;
    &lt;/div&gt;
  }
}
<span class="k">export</span><span class="w"> </span><span class="n">default</span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">createContainer</span><span class="p">(</span><span class="n">ArticleDetail</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">initialVariables</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pageSize</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">fragments</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">article</span><span class="p">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">QL</span><span class="err">`</span>
<span class="w"> </span><span class="n">fragment</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">Article</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span>
<span class="w"> </span><span class="n">thumbnailUrl</span>
<span class="w"> </span><span class="n">brand</span><span class="w"> </span><span class="p">{</span><span class="n">name</span><span class="w"> </span><span class="n">logoUrl</span><span class="p">}</span>
<span class="w"> </span><span class="n">images</span><span class="w"> </span><span class="p">{</span><span class="n">largeUrl</span><span class="p">}</span>
<span class="w"> </span><span class="n">recommendations</span><span class="p">(</span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="o">$</span><span class="n">pageSize</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="o">$</span><span class="p">{</span><span class="n">ArticleList</span><span class="o">.</span><span class="n">getFragment</span><span class="p">(</span><span class="s1">'articles'</span><span class="p">)}</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="err">`</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">});</span>
</code></pre></div>
<h3>GraphQL server</h3>
<p>Previously, we defined our server endpoint to expect a <em>POST</em> request with a <em>text/plain</em> body on <em>/graphql</em>. This works
perfectly fine; however, Relay submits an <em>application/json</em> body with <em>query</em> and <em>variables</em> fields, so we have to
change our server accordingly.</p>
<div class="highlight"><pre><span></span><code>// keep accepting text/plain bodies (curl, REST clients) alongside Relay's JSON
app.use(bodyParser.text({type: 'text/plain'}));
app.use(bodyParser.json());
<span class="n">app</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s1">'/graphql'</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">req</span><span class="p">,</span><span class="w"> </span><span class="n">res</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">,</span>
<span class="w"> </span><span class="n">variables</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">typeof</span><span class="w"> </span><span class="n">req</span><span class="o">.</span><span class="n">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'string'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">req</span><span class="o">.</span><span class="n">body</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">typeof</span><span class="w"> </span><span class="n">req</span><span class="o">.</span><span class="n">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'object'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">req</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">query</span>
<span class="w"> </span><span class="n">variables</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">req</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">variables</span>
<span class="w"> </span><span class="p">}</span>
<span class="n">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">graphql</span><span class="p">(</span><span class="n">Schema</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">,</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="n">variables</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">then</span><span class="p">(</span><span class="n">result</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">res</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">200</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>
<span class="w"> </span><span class="o">.</span><span class="n">catch</span><span class="p">(</span><span class="n">e</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">throw</span><span class="w"> </span><span class="n">e</span><span class="p">;</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">catch</span><span class="w"> </span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">console</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
<span class="w"> </span><span class="n">res</span><span class="o">.</span><span class="n">status</span><span class="p">(</span><span class="mi">500</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">message</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">});</span>
</code></pre></div>
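<p>The body-type branching in the handler above reduces to a small pure function, sketched here (the helper name <em>extractQuery</em> is ours, not part of the original server):</p>

```javascript
// Normalise an incoming request body into the {query, variables} pair that
// graphql() expects: a text/plain body is the query string itself, while an
// application/json body carries explicit query and variables fields.
function extractQuery(body) {
  if (typeof body === 'string') {
    return {query: body, variables: {}};
  }
  if (body && typeof body === 'object') {
    return {query: body.query || '', variables: body.variables || {}};
  }
  return {query: '', variables: {}};
}
```

<p>The route handler can then call <em>extractQuery(req.body)</em> and pass the result straight on, regardless of which client sent the request.</p>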
<p>This way, the endpoint accepts both plain-text queries sent with curl or a REST client and everything that comes from
Relay. Let’s tie it all together in the next step.</p>
<h3>App</h3>
<p>We created our primitives in the previous steps, but how will we switch between list and detail views? How do we fetch
the initial list of articles? We have only defined fragments at this point, so where are all the objects coming from
that they are supposed to work on?</p>
<p>To answer all of these questions, let’s first try to persuade Relay to render <em>anything</em>. The first approach could be
like so:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// in case your Relay server does not run on the same host</span>
<span class="n">Relay</span><span class="p">.</span><span class="n">injectNetworkLayer</span><span class="p">(</span>
<span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">Relay</span><span class="p">.</span><span class="n">DefaultNetworkLayer</span><span class="p">(</span><span class="s">'http://localhost:3001/graphql'</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">credentials</span><span class="p">:</span><span class="w"> </span><span class="s">'cors'</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="p">);</span>
ReactDOM.render(
  &lt;Relay.RootContainer
    Component={ArticleList}
    route={{
      queries: {
        articles: () =&gt; Relay.QL`query { Articles(first: $pageSize) }`
      },
      params: {
        pageSize: 5
      },
      name: 'ArticlesQuery'
    }} /&gt;,
  document.getElementById('app'));
</code></pre></div>
<p>This does not look like much, but a lot of things happen under the hood: We define a query skeleton <em>articles</em>. Relay
will look for a fragment <em>articles</em> in the container passed as Component and automatically insert it into the query sent
to the GraphQL server. If the query succeeds, it will pass the data as the property <em>articles</em> to ArticleList and render
it. In case the query fails, it will retry three times and then ultimately fail. (You can configure the number of
retries, timeouts, and what to render on the RootContainer.) If the data requested is already in the local Relay cache,
it will take it from there and not query the server. In any case, it seems to work:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1532f4ad00bb845e0f1649a8b3db755d6193b775_graphql3.png?auto=compress,format"></p>
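<p>Relay’s network layer handles those retries internally; reduced to a synchronous standalone sketch, the idea is simply this (illustration only, the real implementation is asynchronous and configurable):</p>

```javascript
// Illustration of "retry N times, then ultimately fail": attempt the call
// up to retries + 1 times in total and rethrow the last error if none
// succeeds. Relay's actual network layer does this asynchronously.
function callWithRetries(fn, retries = 3) {
  let lastError;
  let attemptsLeft = retries + 1;
  while (attemptsLeft > 0) {
    attemptsLeft -= 1;
    try {
      return fn();
    } catch (e) {
      lastError = e;
    }
  }
  throw lastError;
}
```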
<p>You might not immediately notice, but the “load more” button is missing because the component does not know what to do
if it were clicked (the <em>onLoadMore</em> property is missing). How do we fix this? Relay’s RootContainer does not pass unknown
properties down to ArticleList. We could probably create a Higher-Order Component that returns an ArticleList with
<em>onLoadMore</em> property, but what should the function execute? We would have to change the page size and trigger a
re-render ourselves, which doesn’t sound like fun. We also can’t change the query parameters from inside the ArticleList
components (Relay variables are only valid locally). Can we make Relay take care of all this? In the end, the whole
point of it is to be in charge of data fetching and re-rendering as necessary.</p>
<p>We can make this happen by creating an intermediate container, our actual application. It will know when to display an
ArticleList or ArticleDetail and change Relay variables accordingly. To make fragment handling easier for the container,
we will introduce an additional query on our server that can return a single article or a list of articles.</p>
<div class="highlight"><pre><span></span><code><span class="k">type</span><span class="w"> </span><span class="nx">Query</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Articles</span><span class="p">(</span><span class="nx">first</span><span class="p">:</span><span class="w"> </span><span class="nx">Int</span><span class="p">,</span><span class="w"> </span><span class="nx">last</span><span class="p">:</span><span class="w"> </span><span class="nx">Int</span><span class="p">,</span><span class="w"> </span><span class="nx">before</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">,</span><span class="w"> </span><span class="nx">after</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">):</span><span class="w"> </span><span class="nx">ArticleConnection</span>
<span class="w"> </span><span class="nx">Article</span><span class="p">(</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">!):</span><span class="w"> </span><span class="nx">Article</span>
<span class="w"> </span><span class="nx">Viewer</span><span class="p">:</span><span class="w"> </span><span class="nx">Viewer</span>
<span class="w"> </span><span class="nx">node</span><span class="p">(</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">!):</span><span class="w"> </span><span class="nx">Node</span>
<span class="p">}</span>
<span class="k">type</span><span class="w"> </span><span class="nx">Viewer</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">article</span><span class="p">(</span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">!):</span><span class="w"> </span><span class="nx">Article</span>
<span class="w"> </span><span class="nx">articles</span><span class="p">(</span><span class="nx">first</span><span class="p">:</span><span class="w"> </span><span class="nx">Int</span><span class="p">,</span><span class="w"> </span><span class="nx">last</span><span class="p">:</span><span class="w"> </span><span class="nx">Int</span><span class="p">,</span><span class="w"> </span><span class="nx">before</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">,</span><span class="w"> </span><span class="nx">after</span><span class="p">:</span><span class="w"> </span><span class="nx">ID</span><span class="p">):</span><span class="w"> </span><span class="nx">ArticleConnection</span>
<span class="p">}</span>
</code></pre></div>
<p>The application can then work on fragments of the Viewer type.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">App</span><span class="w"> </span><span class="k">extends</span><span class="w"> </span><span class="n">React</span><span class="o">.</span><span class="n">Component</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">onShowList</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">setVariables</span><span class="p">({</span>
<span class="w"> </span><span class="n">showDetailPage</span><span class="p">:</span><span class="w"> </span><span class="bp">false</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="n">onLoadMore</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">setVariables</span><span class="p">({</span>
<span class="w"> </span><span class="n">pageSize</span><span class="p">:</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">variables</span><span class="o">.</span><span class="n">pageSize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span>
<span class="w"> </span><span class="n">showDetailPage</span><span class="p">:</span><span class="w"> </span><span class="bp">false</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="n">onNavigate</span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">updateHistory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="bp">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">setVariables</span><span class="p">({</span>
<span class="w"> </span><span class="n">articleId</span><span class="p">:</span><span class="w"> </span><span class="n">id</span><span class="p">,</span>
<span class="w"> </span><span class="n">showDetailPage</span><span class="p">:</span><span class="w"> </span><span class="bp">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="n">render</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">variables</span><span class="o">.</span><span class="n">showDetailPage</span><span class="w"> </span><span class="err">?</span>
<span class="w"> </span><span class="p">:</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">export</span><span class="w"> </span><span class="n">default</span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">createContainer</span><span class="p">(</span><span class="n">App</span><span class="p">,</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">initialVariables</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">articleId</span><span class="p">:</span><span class="w"> </span><span class="n">window</span><span class="o">.</span><span class="n">location</span><span class="o">.</span><span class="n">pathname</span><span class="o">.</span><span class="n">substr</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span>
<span class="w"> </span><span class="n">showDetailPage</span><span class="p">:</span><span class="w"> </span><span class="n">window</span><span class="o">.</span><span class="n">location</span><span class="o">.</span><span class="n">pathname</span><span class="o">.</span><span class="n">substr</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">''</span><span class="p">,</span>
<span class="w"> </span><span class="n">pageSize</span><span class="p">:</span><span class="w"> </span><span class="mi">20</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">fragments</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Viewer</span><span class="p">:</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Relay</span><span class="o">.</span><span class="n">QL</span><span class="err">`</span>
<span class="w"> </span><span class="n">fragment</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">Viewer</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">articles</span><span class="p">(</span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="o">$</span><span class="n">pageSize</span><span class="p">)</span><span class="w"> </span><span class="err">@</span><span class="n">skip</span><span class="p">(</span><span class="k">if</span><span class="p">:</span><span class="w"> </span><span class="o">$</span><span class="n">showDetailPage</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">$</span><span class="p">{</span><span class="n">ArticleList</span><span class="o">.</span><span class="n">getFragment</span><span class="p">(</span><span class="s1">'articles'</span><span class="p">)}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">article</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="o">$</span><span class="n">articleId</span><span class="p">)</span><span class="w"> </span><span class="err">@</span><span class="n">include</span><span class="p">(</span><span class="k">if</span><span class="p">:</span><span class="w"> </span><span class="o">$</span><span class="n">showDetailPage</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">$</span><span class="p">{</span><span class="n">ArticleDetail</span><span class="o">.</span><span class="n">getFragment</span><span class="p">(</span><span class="s1">'article'</span><span class="p">)}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="err">`</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">});</span>
</code></pre></div>
<p>The <em>@skip</em> and <em>@include</em> directives come from the GraphQL specification itself; Relay simply understands them. As you
probably guessed, depending on the directive and the boolean value passed, they include or omit a fragment in the
final GraphQL query. For our application this is important, as we don’t have an articleId when we start on the list view;
without the conditional fragment, the GraphQL server would return an error.</p>
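<p>The effect of the two directives can be sketched in plain JavaScript. This is not Relay’s internal machinery (the <em>buildViewerQuery</em> helper below is hypothetical); it only illustrates how the boolean variables decide which fragments end up in the final query text.</p>
<div class="highlight"><pre><code>// Illustrative sketch: conditional fragment inclusion, as performed by
// @skip(if: ...) and @include(if: ...) in the Relay container above.
function buildViewerQuery({ showDetailPage, pageSize, articleId }) {
  const parts = [];
  // @skip(if: $showDetailPage): fetch the list only on the list view
  if (!showDetailPage) {
    parts.push(`articles(first: ${pageSize}) { ...ArticleList }`);
  }
  // @include(if: $showDetailPage): fetch the article only when an id exists
  if (showDetailPage) {
    parts.push(`article(id: "${articleId}") { ...ArticleDetail }`);
  }
  return `query { viewer { ${parts.join(' ')} } }`;
}

const listQuery = buildViewerQuery({ showDetailPage: false, pageSize: 20, articleId: '' });
console.log(listQuery.includes('article(id:')); // false: no article lookup on the list view
</code></pre></div>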
<p>I glossed over a couple of things, most notably the routing code, but you can take a look at the whole project
<a href="https://github.com/prayerslayer/zalando-graphql-relay">here</a>.</p>
<h3>Wrapping up</h3>
<p>This was the first time I created a Relay application. I’m used to and usually work with the React+Redux combination, so
a few differences stood out for me.</p>
<p>First of all, I really like the declarative data fetching. Do you want to use some more fields of an object? Just add it
to the fragment and go ahead. Everything else is taken care of.</p>
<p>I also like that Relay is smart about fetching data. If I want to have twenty articles after previously having fetched
ten, it already knows to only query for the next ten. The same goes for revisiting detail pages, it just takes what it
has in the cache and doesn’t go to the network at all.</p>
<p>State management was not as easy to grasp. This is usually handled with Redux actions and stores. In this simple case,
it may as well have been a local state of the App component. However, since GraphQL queries have to be generated at
build time, they cannot access component state or properties, so we ended up inserting state management into Relay
(<em>showDetailPage</em> — I feel that Relay shouldn’t care about this). This is where
<a href="https://github.com/relay-tools/react-router-relay">react-router-relay</a> comes into play, as it helps when rendering
different RootComponents. It likely would have helped with my next problem.</p>
<p>A loading indicator (“spinner”) is only shown at application start, because that’s when the RootComponent is rendered.
When switching between ArticleDetails, we don’t have this visual feedback. I initially thought that we could mitigate
this by checking <em>relay.pendingVariables</em>, but it was always null.</p>
<p>To wrap things up, when should you use Relay? As always, it depends. There are a couple of companies <a href="https://github.com/facebook/relay/blob/master/USERS.md">using it
already</a>; however, except for Facebook and Twitter, they’re not
currently on my radar. Relay is a very new technology, which means you won’t find many resources to learn from compared
to more established stacks. Having efficient network access and caching out of the box is quite nice, but there
is an upfront cost (in terms of time) you pay for new technology. Colocated queries and components are also awesome, but
when you lack a Facebook-sized team working on the same API, it might not be worth it. I’d say that Relay is worth
investigating if you can tick a couple of these boxes:</p>
<ul>
<li>Small, independent service/application</li>
<li>Desire to dive into uncharted territory</li>
<li>Many of your UI components work on subsets of data from the same object, like our ArticleDetail and ArticlePreview</li>
<li>Expensive queries on the backend, so that the client making fewer requests pays off there too</li>
<li>Objects exposed by API have many fields, but your client only needs a couple of them (not Relay-specific)</li>
<li>Facebook uses Relay for their mobile page, so it might be worth considering for that specific format</li>
</ul>
<p>And as always, do whatever floats your boat. Send any questions you might have my way via Twitter at
<a href="https://twitter.com/prayerslayer">@prayerslayer</a>.</p>Using Microservices to Power Fashion Search and Discovery2017-02-02T00:00:00+01:002017-02-02T00:00:00+01:00Dmitry Kolesnikovtag:engineering.zalando.com,2017-02-02:/posts/2017/02/using-microservices-to-power-fashion-search-and-discovery.html<p>Focusing on our customer's search solution that targets a consumer facing application.</p><p>Microservices became a design style to define system architectures, purify core business concepts, evolve solutions in
parallel, make things look uniform, and implement stable and consistent interfaces across systems. At Zalando, we’ve
been putting together <a href="https://tech.zalando.com/blog">a series of articles</a> that explain how we are applying
microservice patterns to our applications: <a href="https://tech.zalando.com/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">Fashion
Store</a>,
<a href="https://tech.zalando.com/blog/building-our-own-open-source-http-routing-solution/">Skipper</a>, and
<a href="https://www.mosaic9.org">Mosaic</a>. In this article, I’ll be discussing our principles for microservice development of
stateful solutions such as fashion search and discovery, a promising field that we’re excited to further evolve.</p>
<p>We’ve focused here on the search solution that targets a consumer facing application. It addresses scenarios where
consumers already have either a vague or very specific idea of what they are looking for. The service objective is to
point them in the direction of the most specific and relevant fashion articles as quickly as possible.</p>
<h3>What does the consumer journey look like?</h3>
<p>Let’s start by emphasizing important consumer-oriented use cases addressed by the solution. We will later show how they
are implemented by our microservice architecture. Use cases help us to build functional boundaries as an initial step in
system design and decomposition.</p>
<p><strong>Discovery</strong> provides direct access to product catalogs, using the category tree as the primary entry point. Catalog
discovery is not an idempotent operation for all customers, as the result set is impacted by consumer profile,
preferences, search history, and by recommendation or advertisement systems. The discovery is implicit contextual search
that is triggered programmatically by mobile applications as a response to consumer interaction. For example, when a
customer opens a mobile application, it shows trending products: As soon as a brand/retailer is liked, the mobile
application shows popular products in this context with a higher ranking.</p>
<p><strong>Search</strong> matches documents from fashion catalogs using consumer defined keywords. In other words, the consumer
provides their intent which is then translated into pattern match requirements for the catalog. The pattern match
implies both an exact search term match and fuzzy query adaptation to maximize relevancy.</p>
<p><strong>Refinement</strong> is a process to narrow down a large set of search hits using filters and facets, also known as faceted
navigation. Both concepts are built to analyze and exclude any document that does not meet consumer intent. The
difference between these approaches is the mechanism used to analyse aspects of the content. Filters offer a static set
of dimensions to narrow down content; usually built offline using data mining techniques. Facets are built dynamically,
reflecting active search intent. Because faceted navigation reflects active context, it has to be rebuilt each time a
consumer interacts with search results. This difference is not explicitly exposed to consumers through the user
experience: System design implements distinct functions to handle these aspects. For example, Figure 1 shows the usage
of static and dynamic aspects during the refinement process. Catalog hierarchy is statically configured per application
while brand, color, price, and other facets are dynamically evaluated.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5152fcc5755e43a6432bb736197b5ec1b2eebbda_usingmicroservicestopowerfashionsearchimage1.png?auto=compress,format"></p>
<p><strong>Relevance sorting</strong> provides a collection of algorithms to rank search results depending on the weight of certain
content aspects: Inclusion of scores based on click behavior to identify products that have a higher attractiveness for
customers. Sorting algorithms consider static aspects of the content such as brand, category, and dynamic aspects -
availability, merchant rank, delivery time etc. Configurability is an important requirement for the solution, as
consumer facing applications impact relevance properties of algorithms online using configuration tools. Formally,
relevance is determined by consumer intent, while the overall relevance of content is based on various attributes.</p>
<p><strong>Intent analysis</strong> transforms unstructured consumer intent (e.g. keywords, visual search) into semi-structured queries
using feature extraction techniques. Consumer intent contains relations to well-known fashion entities such as brands,
categories, colors, materials; dependencies between typed words and synonyms, slang, or spelling errors. The structured
query allows us to improve relevance using attribute boosting.</p>
<p><strong>Personalization</strong> is required to efficiently respond to consumer intent. There is a need to engage and create a
relevant fashion experience for each individual consumer. It is all about knowing consumers, getting to know consumer
preferences and emerging trends from both the consumer and the retailer. The successful implementation of personalized
consumer engagement depends on the ability to access consumer behavior, deduct relevant knowledge using implicit or
explicit consumer intent, and accumulate consumer profiles to leverage individual product offerings at earlier phases of
consumer journey.</p>
<h3>Non-functional requirements</h3>
<p><strong>Interoperability:</strong> Consumer facing applications are key stakeholders for our search solution. The major concern is
compatibility and interoperability of interfaces as well as data contracts supplied by the search solution. Core
interfaces are built around a common data model that follows common metadata and provides the baseline for
interoperability between various microservices and its evolution.</p>
<p><strong>Evolution:</strong> As fashion metadata is distributed, our dynamic environment operates with multiple content sources:
Product metadata, availability information, partner content. The solution should ensure the continuity of data access to
consumer facing applications associated with old versions of data contracts. Simultaneously, it should not prevent the
evolution of these contracts to cover new metadata aspects. The evolution must ensure the development of search
architecture that allows for improvements of the indexing algorithm and technological adaptation.</p>
<p><strong>Consistency:</strong> The traditional secondary index approach is used to build the search solution. The smallest units of
atomicity are the snippet and its attributes: the snippet is an opaque data structure used by frontend applications to
visualize search results, while attributes facilitate discovery and refinement. This keeps us aligned with domain driven
design. We cannot guarantee 100% consistency of product metadata within a system-of-records or search indexes due to
<a href="https://en.wikipedia.org/wiki/CAP_theorem">natural limitations of distributed systems</a>. However, the searchable data
consists of frequently changeable attributes such as price and availability. We have defined 30 minutes as the maximum
propagation delay of these attributes.</p>
<p><strong>Latency:</strong> Software development practices consider end-to-end latency as a user-oriented characteristic during the
entire lifecycle of an application. This user-oriented metric shall be defined independently of underlying solutions or
technologies and will be used for quality assessment of the delivered solution. The interactive traffic is the classical
data communication scheme employed by mobile search. This pattern gives a recommendation that interaction between human
and remote equipment should not exceed 1 second. This implies that the latency of the search interface itself
should not exceed 100 ms, due to <a href="https://tech.zalando.com/blog/end-to-end-latency-challenges-for-microservices/">latency challenges in
microservices</a>. The system design
considers infrastructure, used protocols, and other microservices.</p>
<p><strong>Scalability:</strong> Scalability is the system’s ability to handle the required amount of work and its potential to be
enlarged to accommodate growing traffic and data. The technology should not be a limiting factor for business growth or
limit integration of new data sources. The horizontal scalability of the storage layer is a major consideration for
architecture decisions. Interface scalability is achieved by using state of the art principles of microservice design,
but the storage layer requires a sharable infrastructure across multiple product catalogs.</p>
<p><strong>Availability:</strong> Availability here covers both data availability and durability. Availability applies to the
search-and-discovery interfaces, as it defines uptime requirements on the system’s external interfaces: the ratio of the total
time during which a consumer facing application is capable of handling product catalog retrieval. Durability reflects the risks of
losing data during maintenance or downtimes. Durability is the ultimate requirement for catalogs built from event-based
data sources.</p>
<h3>End-to-end search architecture</h3>
<p>Let’s define the search microservice architecture in terms of logical layers. Each layer is built from multiple related
microservices with similar requirements. Layers help us define operational boundaries and address the issues of
stateless/stateful service delivery.</p>
<p>The <strong>storage layer</strong> is a set of stateful components that hold mission critical data. It is built around off-the-shelf
open source technologies, the core part being the <a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF</a> information
retrieval system. The usage of AWS value-added services helps us streamline operational processes and improve data
durability and service availability. The availability of the storage layer has a high impact on customer experience and
sales. Our recovery strategy targets high availability (capacity over provisioning) and automated recovery for faulty
nodes.</p>
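<p>For readers unfamiliar with the underlying retrieval model, here is a minimal TF-IDF scoring sketch. Production engines such as Elasticsearch or SolrCloud implement far more elaborate variants (length normalization, BM25, and so on); this only shows the principle.</p>
<div class="highlight"><pre><code>// Minimal TF-IDF sketch: rare terms score higher than common ones.
function tfidfScore(term, doc, corpus) {
  const tf = doc.filter((t) => t === term).length / doc.length;
  const docsWithTerm = corpus.filter((d) => d.includes(term)).length;
  const idf = Math.log(corpus.length / (1 + docsWithTerm));
  return tf * idf;
}

const corpus = [
  ['red', 'dress'],
  ['blue', 'dress'],
  ['red', 'shoes', 'leather'],
];
// 'leather' appears in one document, 'dress' in two, so 'leather' scores higher
console.log(tfidfScore('leather', corpus[2], corpus) > tfidfScore('dress', corpus[1], corpus)); // true
</code></pre></div>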
<p>The <strong>intake layer</strong> is a traditional layer to <a href="https://en.wikipedia.org/wiki/Extract,_transform,_load">extract, transform, and load
processes</a> – we call it an event-driven content integration
pipe. It is used to extract content from a heterogeneous source using both synchronous and asynchronous communication
patterns. Usually, it reads content either from ground truth business sources hosted by the Zalando platform or content
extensions driven by a consumer facing application. The content transformation stage is a series of purely functional
operations that join data from multiple sources and prepare the smallest unit of atomicity required by search use cases.
The content is asynchronously loaded to microservices at the storage layer, enabling article discovery.</p>
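<p>A minimal sketch of the transform stage, assuming illustrative field names: pure functions normalize product metadata, join it with availability from a second source, and emit the snippet that gets loaded into the storage layer.</p>
<div class="highlight"><pre><code>// Illustrative sketch of the content transformation stage as pure functions.
const normalize = (product) => ({
  id: product.sku,
  title: product.name.trim().toLowerCase(),
  brand: product.brand,
});

// Join availability information from a second data source into the snippet.
const joinAvailability = (stockBySku) => (snippet) => ({
  ...snippet,
  available: (stockBySku[snippet.id] || 0) > 0,
});

const toSnippet = (product, stockBySku) =>
  joinAvailability(stockBySku)(normalize(product));

console.log(toSnippet(
  { sku: 'Z-1', name: '  Leather Boot ', brand: 'Acme' },
  { 'Z-1': 4 },
));
// { id: 'Z-1', title: 'leather boot', brand: 'Acme', available: true }
</code></pre></div>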
<p>The <strong>search layer</strong> is based on consumer-oriented microservices that fulfil the search use cases. We call it “search
business logic” because it solely focuses on the consumer journey. The layer consists of pure, stateless microservices, each adjusted for
the execution of a “single” use case.</p>
<p>A <strong>consumer facing application</strong> is a farm of backend for frontend (BFF) components or reverse proxies that implement
an integration of consumer-oriented use cases with search infrastructure. It is used by us to decompose API evolution
life cycles of applications from service evolution.</p>
<p>Let’s take a closer look at our microservices and their end-to-end collaboration in our system design.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e7495c7629432cbca80af79b832675b9f5d5bf6a_blog---using-microservices-to-power-fashion-search-and-discovery.png?auto=compress,format"></p>
<h3>Interfaces</h3>
<p>The successful development of a large system using microservice patterns requires the definition of interfaces between
functional elements. We learned that traditional REST principles do not suit every microservice involved in fashion search.
We have made a choice to use different API technology according to our needs. Let’s elaborate on these below.</p>
<p><strong>REST</strong> - a standard of communication between microservices using REST principles. The communication pattern follows
typical client-server interaction over HTTP protocol, leveraging JSON as the primary payload.</p>
<p><strong>MQi</strong> - an asynchronous communication protocol that involves message-oriented technologies. This interface indicates a
publisher-subscriber communication paradigm between components. The implementation technique might vary from plain
socket to high-level client library. Microservices often use the REST protocol family to implement asynchronous
communication (for example, AWS SQS, AWS Kinesis, or <a href="https://github.com/zalando/nakadi">Nakadi</a>).</p>
<p><strong>KVi</strong> - an interface for storing, retrieving, and managing associative arrays (hashmap). The interface provides access
to a collection of binary objects a.k.a blobs. These blobs are stored and retrieved using a unique identifier (key).</p>
<p><strong>QLi</strong> - a vendor dependent interface used by the product search and discovery solution. The interface refers to the
REST API defined and implemented either by Elasticsearch or SolrCloud. It provides various primitives to retrieve
information using TF-IDF indexes. The ability to use different TF-IDF implementations is a key feature in our solution.
The storage layer might be seen as the system-of-records for search, thus we trade-off scalability and availability of
indexes versus linguistic and other add-on features. The “hot-swap” of technologies becomes a driver to fulfill customer
demand.</p>
<p><strong>CSi</strong> - a clickstream event publishing interface. It delivers a summary of consumer interaction with a
search-and-discovery solution. The architecture evaluates two possibilities to ship consumer events. State of the art
log management solutions are used for traditional long-term, off-line analytics. Alongside this, interactive protocols
immediately stream consumer behavior using communication sockets to the analytics platform.</p>
<h3>Microservices</h3>
<p>Our <strong>Content Crawler</strong> aggregates, joins, and interleaves content from various independent data sources. The service is
also responsible for normalizing input content according to principles of the Common Data Model. We have designed the
crawler as an asynchronous event streams gateway that implements the
<a href="http://zguide.zeromq.org/php:chapter5#Last-Value-Caching">last-value-caching</a> pattern to build a snapshot of all events
seen in streams. It complicates the design of a microservice but guarantees local data availability. The replay feature
facilitates data loss recovery; it makes the solution independent of data sources and guarantees complete data replay in
minutes.</p>
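<p>The last-value-caching pattern itself is simple to sketch: keep only the newest event per key, so that replaying the cache reconstructs a complete snapshot for a new subscriber. The event shape below is illustrative.</p>
<div class="highlight"><pre><code>// Illustrative last-value cache: one latest event per key.
class LastValueCache {
  constructor() {
    this.latest = new Map();
  }
  // Retain an event only if it is newer than the one already held.
  observe(event) {
    const prev = this.latest.get(event.key);
    if (!prev || event.version > prev.version) {
      this.latest.set(event.key, event);
    }
  }
  // Replay a full snapshot without re-reading the stream from the beginning.
  replay() {
    return [...this.latest.values()];
  }
}

const cache = new LastValueCache();
cache.observe({ key: 'sku-1', version: 1, price: 30 });
cache.observe({ key: 'sku-1', version: 2, price: 25 });
cache.observe({ key: 'sku-2', version: 1, price: 90 });
console.log(cache.replay().length); // 2: one snapshot entry per key
</code></pre></div>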
<p><strong>Content Analysis</strong> extracts content aspects (features). It mines facts about fashion from ingress
content and builds searchable snippets. The snippet is the smallest unit of atomicity used by the solution, with
normalized attributes and their values. Snippets are asynchronously posted to TF-IDF indexes using message queues. The
microservice defines a keyword processing pipeline: the tokenizer splits input documents into collections of tokens and
lower-cases textual documents; stemming normalizes text,
reduces the footprint, and performs stop word eviction; text analysis creates indexable documents, performs metadata
quality analysis, and augments textual data with content aspects (data dimensions such as category, price, brand, etc.).</p>
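<p>The keyword pipeline can be sketched as three composed steps. A production deployment would use the analyzer chains of Elasticsearch or SolrCloud; the stop word list and the suffix-stripping “stemmer” here are deliberately toy-sized.</p>
<div class="highlight"><pre><code>// Illustrative keyword processing pipeline: tokenize, evict stop words, stem.
const STOP_WORDS = new Set(['the', 'a', 'in', 'for']);

const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);
const evictStopWords = (tokens) => tokens.filter((t) => !STOP_WORDS.has(t));
const stem = (tokens) => tokens.map((t) => t.replace(/(ing|es|s)$/, ''));

const analyze = (text) => stem(evictStopWords(tokenize(text)));

console.log(analyze('The red dresses for running'));
// [ 'red', 'dress', 'runn' ]
</code></pre></div>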
<p><strong>Catalog</strong> is a collection of TF-IDF indexes. They are optimized for content retrieval and refinement using consumer
intent. The storage defines a relaxed schema for snippets allowing on-the-fly relevance adjustments depending on
business needs. Catalog scalability and availability are key technology concerns. We aim for elasticity – scalable to
millions of requests per minute.</p>
<p><strong>Fashion metadata</strong> provides core facts about the fashion domain (e.g. hierarchical categories, filters, facets); it
facilitates refinement, semantic analysis, and relevance use cases in the system. The profile is built and enhanced
from data sources using data mining techniques.</p>
<p><strong>Structured Search</strong> implements a discovery use case by offering a platform-wide Structured Search API to browse the
product catalog. It matches snippets (which have an inherent structure) against a structured pattern
supplied by a consumer facing application. The component does not exclude free-text search, but requires a snippet model
with a typed layout of searchable attributes. The consumer application is aware of attribute roles before issuing a
search request. In contrast to traditional databases, structured search concerns document relevance or scoring. The
service translates consumer intent into machine processable signals, building a query using QL interface syntax. The
subsystem performs boosting of relevant attributes to facilitate content sorting and personalization.</p>
<p><strong>Query Planner</strong> implements the search use case by offering a platform-wide Unstructured Search API. The microservice
is responsible for translating search intent into pattern match requirements for TF-IDF indexes. We have built query
planning detached from actual TF-IDF technology. It allows us to deal with more complex input types (e.g. voice,
images), apply the best algorithms connected to input processing (e.g. ambiguity, subjective interpretation, language
specific requirements), and define more complicated processing pipelines required for natural language and voice.</p>
<p><strong>Relevance sorting</strong> is a microservice responsible for the personalization of search content. You can think about it as
a gateway to Zalando platform analytics. We employ two types of relevance sorting techniques: Preprocessing and
postprocessing. Preprocessing is used to rewrite or enhance structured queries with boosting parameters and additional
attributes relevant for given customers. Postprocessing techniques apply re-sorting of search results returned by the
catalog. This helps us to minimize search query diversity towards the TF-IDF engine, and thus to scale it using extensive
caching.</p>
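<p>A sketch of the postprocessing idea: the catalog query stays generic (and therefore cacheable), and personalization re-sorts the returned page per consumer. The boost weights and profile shape are illustrative.</p>
<div class="highlight"><pre><code>// Illustrative postprocessing: re-sort a generic result page per consumer.
function personalizeOrder(results, profile) {
  const boost = (item) =>
    (profile.likedBrands.includes(item.brand) ? 1.0 : 0) +
    (item.available ? 0.5 : 0);
  // Sort by base relevance score lifted by per-consumer boosts.
  return [...results].sort((a, b) => (b.score + boost(b)) - (a.score + boost(a)));
}

const page = [
  { id: 1, brand: 'A', available: true, score: 1.0 },
  { id: 2, brand: 'B', available: true, score: 1.2 },
];
// A consumer who likes brand 'A' sees item 1 first despite its lower base score.
console.log(personalizeOrder(page, { likedBrands: ['A'] }).map((r) => r.id)); // [ 1, 2 ]
</code></pre></div>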
<p><strong>Product Gateway</strong> facilitates consumer facing applications with entire metadata about discovered products. The gateway
enriches this metadata with additional information using online services (e.g. partner stocks, related products, etc).</p>
<p><strong>Consumer Analytics</strong> maps consumer search behaviour to discrete shopping intent. It enables the delivery of relevant
search results, boosting conversion rates and ensuring a personalized experience. Atomic knowledge facts about
consumer behavior map digital footprints, constructing a consumer genome. The consumer genome is a collection of
attributes that depict consumer behavior using dimension and decay properties in a discrete manner. It provides
insights into various facets of the consumer.</p>
<h3>Common Data Model</h3>
<p>The Common Data Model defines generic vocabulary terms (often called object model or meta-model) of layout content into
generic structures. This object model guarantees an interoperability baseline between consumer facing applications and
platform microservices, in the absence of strong content negotiation techniques. The interpretation of data semantics
allows us to build solutions (e.g. discovery) that are isolated from the evolution of the individual data models.
Simultaneously, it is able to aggregate and interleave content from various independent sources. In other words,
interoperability, evolution, and isolation in order to achieve flexible multi-merchant use cases.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ea4ca43873e5209e429362c9dfb79e21c22edc32_blog---using-microservices-to-power-fashion-search-and-discovery-new-cut.png?auto=compress,format"></p>
<p>Domain-level data is often exposed as a collection of nested objects. Ultimately, this is a directed, labeled graph -
nodes are objects, edges correspond to properties; the endpoints of edges are either scalar values or other nodes
(either named embedded or inline anonymous objects). This traditional approach is missing serialization consistency and
operational simplicity. Most importantly, serialized data requires knowledge of domain-level schema to parse and
interpret data.</p>
<p>A simple solution is required to address compatibility and interoperability requirements driven by stakeholders (e.g.
consumer facing applications). The domain-level objects are fragmented into flat structures and packaged into adjacency
lists with the following principles:</p>
<ul>
<li>Each fragment has at least one type, they aggregate common domain properties, and are manageable using an
independent System of Records (SOR)</li>
<li>Each fragment has a reference (globally unique identifier)</li>
<li>Each fragment has properties associated with one or more values. All values are either scalar or references to other
objects. Properties with multiple values are bags (unordered lists).</li>
</ul>
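To illustrate these principles, a single fragment might look like the following JSON sketch (the identifiers, property names, and URN scheme here are hypothetical, not Zalando's actual schema):

```json
{
  "id": "urn:fragment:product:1234",
  "type": ["Product"],
  "name": "Leather sneaker",
  "brand": "urn:fragment:brand:42",
  "colors": ["black", "white"]
}
```

Note that `brand` holds a reference to another fragment rather than a nested object, and `colors` is a bag of scalar values; a set of such fragments forms the adjacency list for the domain-level graph.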
<p>This transforms domain-level data into multi-class instances, splitting management, representation, and provisioning
concerns. It is a foundation for creating machine-interpretable data across the fashion domain.</p>
<p>The chosen model does not enforce any implementation strategy for consumer facing applications or data provisioning
pipelines. However, the product search and discovery solution is built around the Common Data Model which is usable
directly as JSON, but provides a small number of consistently applied principles to build interoperability into Web
services. It solves the integration of JSON objects from different sources (e.g. brand or retailer data sources feeds)
as the data may contain keys that conflict with other data sources.</p>
<h3>Summary</h3>
<p>The depicted Fashion Search solution was materialized during the Zalando Platform Search rebuild project. At first
glance, you might see too many moving parts here, but there is a purpose to the design. Our company applies <a href="https://tech.zalando.com/blog/radical-agility-study-notes/">Radical
Agility</a> as a software development methodology. “We’re
constantly pushing nonstop innovation, creativity, and hard work”. The microservice adoption grants us a “powerful
architectural style that lets us move fast while we keep our complexity low”. We have shown that Fashion Search covers a
broad range of technologies including data mining, natural language processing, consumer analytics, and the development
of fault-tolerant services. This article defines the system decomposition that helped us isolate technological challenges
at the microservice level and apply efficient and innovative product delivery pipelines.</p>
<p>We have also discovered architectural challenges around certain elements of microservice development, such as REST
interfaces and the Common Data Model. We learned that traditional HTTP-based REST principles do not suit every
microservice involved in Fashion Search. We chose different API technologies according to our needs
while still retaining our API First design principle, so that “we’re able to scale as our department and business grows in
scope, evolving our systems in parallel.” Additionally, we also need to guarantee an interoperability baseline between
consumer-facing applications and platform microservices in the absence of strong content negotiation techniques. Thus,
we have defined Common Data Model principles used across microservices within the Fashion Search solution: a
small number of consistently applied principles that build interoperability into microservices.</p>
<p>This architecture helped us deliver ambitious search requirements of serving and retrieving data at a large scale during
our Black Friday event of 2016. If you have any further questions about microservice usage in search and how we use it
at Zalando, you can contact me <a href="mailto:dmitry.kolesnikov@zalando.fi">via email</a>. I’d love to hear from you.</p>Your Lifelike Hologram using Structure and HoloLens for Hack Week2017-01-27T00:00:00+01:002017-01-27T00:00:00+01:00Fotis Dimanidistag:engineering.zalando.com,2017-01-27:/posts/2017/01/your-lifelike-hologram-using-structure-and-hololens-for-hack-week.html<p>More Hack Week fun with VRify, an immersive conference call experience for remote colleagues.</p><p>Zalando recently held its annual <a href="https://tech.zalando.com/blog/hack-week-5-is-live/">Hack Week</a>, during which teams can
brainstorm and work on innovative ideas that bring about real business value. This year I went for something totally
geeky compared to last year, which had me working on a <a href="https://www.wired.de/collection/business/wie-zalando-mithilfe-der-hack-week-erfolgreich-bleiben-will">refugee-related
project</a>. This year’s project is called VRify.</p>
<p>VRify’s purpose is to create an immersive conference call experience that allows you to communicate with your remote
colleagues more efficiently. It brings geographically dispersed team members together by utilizing cutting edge VR and
AR technology to create virtual meetings with realistic 3D avatars and real-time audio. For the scope of this article,
we will only focus on the 3D avatar, which we will refer to from now on as holograms.</p>
<p>So, how can somebody get started with 3D avatars?</p>
<p>First things first, we need to digitize people, and this is possible using 3D scanning. The scans produced will be the
starting point for our holograms. Using a special sensor called <a href="http://structure.io/">Structure</a> and an iPad, we went
about scanning people and creating 3D models out of them. Apart from the hardware, we also needed some software to make
it all work. After running some experiments with <a href="http://skanect.occipital.com/">Skanect</a>, we decided to go for
<a href="http://itseez3d.com/">itSeez3D</a> for it’s ease of use and quality.</p>
<p>The scanning process looked like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e8192eed8674ad294f6189ff0e76e23046565412_vrify_scanning.jpeg?auto=compress,format"></p>
<p>… and as you can see, the results were stunning:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/713e387ceb979d8ca4fe7b141f025698b136e75e_unify3dmodels.png?auto=compress,format"></p>
<p>Once we have our 3D models, it’s almost time to take them into <a href="https://unity3d.com/">Unity</a>, a game development
platform that allows you to create 2D and 3D visuals. However, there was an issue: the imported model had
some artifacts. To fix this, I had to use <a href="http://www.meshlab.net/">Meshlab</a>, an open source system for
processing and editing 3D triangular meshes, with tools for editing, cleaning, healing, inspecting, rendering, texturing
and converting.</p>
<p>In Meshlab, open the .obj file:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/576f44ccf4bc1a1d49b9ad7991f608486de4be8a_vrify_meshlab.png?auto=compress,format"></p>
<p>And then do: <em>Filters → Normals, Curvature and Orientation → Compute Face Normals</em>, finishing off by re-exporting the
.obj file. At this point, we are ready to import into Unity and have a perfect looking model.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/881e0658af1651b26babf3ee3492545d923888b3_vrify_unity.png?auto=compress,format"></p>
<p>Now that we’ve created our model in Unity, we can start building our virtual experience. To make things a bit more
impressive however, we will be achieving this with Augmented Reality and placing our 3D models into the real, physical
world as holograms using Microsoft HoloLens.</p>
<p>HoloLens is an AR or Mixed Reality set of goggles that allows you to pin and see virtual holograms or 2D UWP apps in
real space. Out of the many cool things it can do, such as viewing your favorite TV shows or movies on a virtual screen,
or playing Super Mario on the wall using an Xbox One S controller, we will use its power for the sake of productivity.</p>
<p>In order to develop applications for HoloLens, you first need to <a href="https://developer.microsoft.com/en-us/windows/holographic/install_the_tools">install the
tools</a> required and <a href="https://developer.microsoft.com/EN-US/WINDOWS/HOLOGRAPHIC/holograms_100">configure Unity and
Visual Studio</a> – there is no separate SDK for
HoloLens; holographic app development uses Visual Studio 2015 Update 3 with the Windows 10 SDK (version 1511 or later).
We managed to build our own HoloLens app called ARify and pin and project our previously scanned 3D model in the real
world as a hologram. The result? Almost scary…</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/918406d8ffb7a545e3d335d7e59e3cb7dd94b992_vrify_twins.jpeg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite%2F0c509904-5e87-44d3-ac75-4632853b8d79_vrify_walking.gif?auto=compress,format"></p>
<p>It turns out that we managed to bring the 3D scanning quality into HoloLens quite nicely. The resolution was adequate
and the overall feeling quite realistic. The next step would be to try to rig and animate the model, which we also
managed to do.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite%2Fe0f059c6-a457-4b06-b327-3895bd74c6d5_vrify_animation.gif?auto=compress,format"></p>
<p>We were successful in getting real-time audio working between two users of the app. We had begun looking into animating
lips and audio-lip syncing as well.</p>
<p>During the time of Hack Week, we were able to make the first steps towards building a real-time holographic conference
experience. The results surpassed our expectations, ignited our excitement, and definitely demonstrated how nicely
3D scanning and AR technologies play together. You can see more photos of the Hack Week event, which includes some more
shots of our team’s efforts, over on the <a href="https://www.flickr.com/photos/zalandotech/albums/72157678022374686">Zalando Technology Flickr
page</a>.</p>The Role of UX in Hack Week2017-01-25T00:00:00+01:002017-01-25T00:00:00+01:00Carina Kuhrtag:engineering.zalando.com,2017-01-25:/posts/2017/01/the-role-of-ux-in-hack-week.html<p>When UX-savvy people are involved in Hack Week, design-related problems get solved.</p><p>Zalando’s annual <a href="https://tech.zalando.com/blog/hack-week-5-is-live/">Hack Week</a> is the perfect opportunity for
engineers, product specialists, UX designers, and user researchers to experiment and play around with ideas. Projects
range in scope and focus on improving the customer experience of Zalando, but also rethink the way we work together
across our tech department. Many Hack Week projects, especially when UX-savvy people are involved, not only solve
design-related problems but also apply user centered design methods in the development process.</p>
<p>For the project Alexandria, a team of six UX designers and researchers focused on a problem that many UX professionals
know all too well from working in agile contexts: The sharing of user insights. “We want user testing and its results to
be effective, collaborative, and accessible” says Thomas, UX Researcher here at Zalando and the initiator of the
project.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/97c7be97742a43a83f819e7ebecaa366c2997f50_31431720640_cde6b2b296_o.jpg?auto=compress,format"></p>
<p>Zalando has its own team of UX researchers and a userlab that hosts weekly usability tests. In the analysis sessions
that take place afterwards, product teams collaborate using post-its and canvases to transform observations into
insights. Afterwards, the challenge lies in bringing this to a format that can be shared with other teams and
stakeholders. The tool that the Alexandria project conceived would allow us to document all insights uncovered during
the user research process and serve as a library and archive for the whole company.</p>
<p>The team conducted a design sprint during Hack Week to come up with a user journey that explains in detail how
Alexandria should work and what kind of functionalities it needs. Jana, Interaction Designer at Zalando explains: “One
of the ideation techniques that we use is called ’Crazy Eights’. It’s a great way to generate a lot of different ideas
in a compressed time period. Each team member draws eight sketches in a very short amount of time: 40 seconds per
drawing. This technique really forces you to scrape the bottom of the barrel of your ideas.”</p>
<p>Hack Week isn’t the only time we’re able to work on challenges that come up through our daily work, but it serves as a
great outlet to try new ways of collaboration. Gloria, Interaction Designer at Zalando, who used Hack Week to work with
a team looking at Zalando gift vouchers, says: “For me, Hack Week is a way to closely work with new people and teach
them UX methods like prototyping. Now I co-design wireframes with my product specialist in Axure. This happens rarely in
my day-to-day work.”</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f3fc37066f9227d725008174352e6a564542353e_31016052793_34c508fbbb_o.jpg?auto=compress,format"></p>
<p>For the <a href="https://medium.com/@janhorwell/part-1-2-building-a-google-home-hack-23ea25debca1#.j1e2cw2p6">In-home Stylist
project</a>, which utilized
the Google Home device, UX Researcher Franziska and Product Specialist Janet tried out new UX methods they were always
curious about. Based on their developed personas and their <a href="https://strategyn.com/jobs-to-be-done/">jobs-to-be-done
research</a>, they wanted to come up with a solution that helped people find the
right style when they get dressed. For Janet, this project was a great way to learn more about UX for her own work:
“Products should be built with UX research throughout development, continuously learning and improving.”</p>
<p>UX is a growing field and we’re excited to explore it further in 2017. We’d love to hear from you if you’re keen to
delve into the UX world via the realm of fashion and e-commerce – take a look at our <a href="https://tech.zalando.com/jobs/ux/">jobs
page</a> for more information.</p>About Akka Streams2017-01-19T00:00:00+01:002017-01-19T00:00:00+01:00Ivan Yurchenkotag:engineering.zalando.com,2017-01-19:/posts/2017/01/about-akka-streams.html<p>Pipeline processing on a high level with Akka Streams and our Helsinki Engineering Team.</p><p>In many computer programs, the whole logic (or a vast part of it) is essentially the step-by-step processing of data. Of
course, this includes the situation when we iterate over the data and just execute the processing logic on every piece
of it. However, there are a couple of complications here:</p>
<ul>
<li>The processing logic may be quite complex, with various aggregations, merging, routing, error recoveries, etc.</li>
<li>We might want to execute steps asynchronously, for instance, to take advantage of multiprocessor machines or to perform I/O</li>
<li>Asynchronous execution of data processing steps inherently involves buffering, queues, congestion, and other matters,
which are really difficult to handle properly (read <a href="http://ferd.ca/handling-overload.html">“Handling Overload” by Fred
Hébert</a>)</li>
</ul>
<p>Therefore, it is sometimes a good idea to express this logic at a high level using some kind of framework, without the
need to implement the (possibly complex) mechanics of asynchronous pipeline processing. This was one of the rationales
behind frameworks like <a href="https://en.wikipedia.org/wiki/Apache_Camel">Apache Camel</a> or <a href="https://en.wikipedia.org/wiki/Storm_(event_processor)">Apache
Storm</a>.</p>
<p>Actor systems like Erlang’s or <a href="http://akka.io/">Akka</a> are fairly well suited for building robust asynchronous data pipelines.
However, they are quite low-level by themselves, so writing such pipelines can be tiresome. Newer versions of Akka
include the possibility of doing pipeline processing at quite a high level, called Akka Streams. Akka Streams grew
out of the <a href="http://www.reactive-streams.org/">Reactive Streams initiative</a> and implement a streaming interface on top of
the Akka actor system. In this post I would like to give a short introduction to this library.</p>
<p>We will need a Scala project with two dependencies:</p>
<div class="highlight"><pre><span></span><code><span class="s">"com.typesafe.akka"</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="s">"akka-actor"</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="s">"2.4.14"</span><span class="p">,</span>
<span class="s">"com.typesafe.akka"</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="s">"akka-stream"</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="s">"2.4.14"</span>
</code></pre></div>
<h3>Akka Streams basics</h3>
<p>In Akka Streams, we represent data processing in the form of data flowing through an arbitrarily complex graph of processing
stages. Stages have zero or more inputs and zero or more outputs. The basic building blocks are <em>Source</em>s (one output),
<em>Sink</em>s (one input) and <em>Flow</em>s (one input and one output). Using them, we can build arbitrarily long linear pipelines. An
example of a stream with just a <em>Source</em>, one <em>Flow</em> and a <em>Sink</em>:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>helloWorldStream:<span class="w"> </span>RunnableGraph[NotUsed]<span class="w"> </span>=
<span class="w"> </span>Source.single("Hello<span class="w"> </span>world")
<span class="w"> </span>.via(Flow[String].map(s<span class="w"> </span>=><span class="w"> </span>s.toUpperCase()))
<span class="w"> </span>.to(Sink.foreach(println))
</code></pre></div>
<p>I think the idea is quite obvious: a single string value goes from its <em>Source</em> through a mapping stage
<em>Flow[String].map</em> and ends up in a <em>Sink</em> that <em>println</em>s its input.</p>
<p>We can also use some syntactic sugar and write the mapping more simply:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>helloWorldStream:<span class="w"> </span>RunnableGraph[NotUsed]<span class="w"> </span>=
<span class="w"> </span>Source.single("Hello<span class="w"> </span>world")
<span class="w"> </span>.map(s<span class="w"> </span>=><span class="w"> </span>s.toUpperCase())
<span class="w"> </span>.to(Sink.foreach(println))
</code></pre></div>
<p>However, if we execute this code, nothing will be printed. Here lies the border between streams description and streams
execution in Akka Streams. We have just created a <em>RunnableGraph</em>, which is kind of a blueprint, and any other
(arbitrary complex) streams are only blueprints as well.</p>
<p>To execute, materialize (in Akka Streams’ terms) them, we need a <em>Materializer</em> — a special tool that actually runs
streams, allocating all resources that are necessary and starting all the mechanics. It is theoretically possible to
have any kind of <em>Materializer</em>, but out of the box, the library includes only one, <em>ActorMaterializer</em>. It executes
stream stages on top of Akka actors.</p>
<div class="highlight"><pre><span></span><code><span class="nx">implicit</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="nx">actorSystem</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ActorSystem</span><span class="p">(</span><span class="s">"akka-streams-example"</span><span class="p">)</span>
<span class="nx">implicit</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="nx">materializer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ActorMaterializer</span><span class="p">()</span>
</code></pre></div>
<p>Now, having this <em>Materializer</em> implicitly accessible in the scope, we can materialize the stream:</p>
<div class="highlight"><pre><span></span><code>helloWorldStream.run()
</code></pre></div>
<p>It will print <strong>HELLO WORLD</strong> to the console.</p>
<p>We can do this as many times as we like, and the result will be the same — blueprints are immutable.
There are many more interesting stages out of the box:</p>
<ul>
<li>Various <em>Source</em>s (like <em>Source.fromIterator</em>, <em>Source.queue</em>, <em>Source.actorRef</em>, etc.)</li>
<li>Various <em>Sink</em>s (like <em>Sink.head</em>, <em>Sink.fold</em>, <em>Sink.actorRef</em>, etc.)</li>
<li>Various <em>Flow</em>s (like <em>Flow.filter</em>, <em>Flow.fold</em>, <em>Flow.throttle</em>, <em>Flow.mapAsync</em>, <em>Flow.delay</em>, <em>Flow.merge</em>,
<em>Flow.broadcast</em>, etc.); many of them are available via a simple DSL (like <em>.map</em>, <em>.filter</em>, etc.)</li>
</ul>
<p>Check out the <a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stages-overview.html">overview of built-in stages and their semantics
page</a> in the documentation.</p>
<p>The cool thing about Akka Streams building blocks is that they are reusable and composable. Here is an example of
compositions taken from the <a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-composition.html">Modularity, Composition and Hierarchy
page</a> in the documentation:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/54e9486a657ec3f67015c6b89007598262d4018a_compose_composites1.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0f79690ad4ec31c4a347df734e4ebfbe376fc694_compose_nested_flow1.png?auto=compress,format"></p>
<h3>Materialized values and kill switches</h3>
<p>One of the concepts that deserve attention here is materialized values. Let us rewrite the last line of code just a
little:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>materializedValue:<span class="w"> </span>NotUsed<span class="w"> </span>=<span class="w"> </span>helloWorldStream.run()
</code></pre></div>
<p>I just added a <em>val</em> with the type <em>NotUsed</em>, the same <em>NotUsed</em> we have just seen as a type parameter of the
<em>RunnableGraph</em> earlier. As we can see, a value of this type is created and returned during the materialization
— any materialization. Run a stream five times and you get five completely independent materialized values.</p>
<p>Materialized values (and their type) originate in a <em>Source</em> and are propagated through all stages of a stream to a
<em>Sink</em>. We can modify this behaviour and create other materialized values.</p>
<p>But what is interesting in this <em>NotUsed</em>? Little, in fact, so let us make it more useful:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>helloWorldStream:<span class="w"> </span>RunnableGraph[Future[Done]]<span class="w"> </span>=
<span class="w"> </span>Source.single("Hello<span class="w"> </span>world")
<span class="w"> </span>.map(s<span class="w"> </span>=><span class="w"> </span>s.toUpperCase())
<span class="w"> </span>.toMat(Sink.foreach(println))(Keep.right)
val<span class="w"> </span>doneF:<span class="w"> </span>Future[Done]<span class="w"> </span>=<span class="w"> </span>helloWorldStream.run()
doneF.onComplete<span class="w"> </span>{
<span class="w"> </span>case<span class="w"> </span>Success(Done)<span class="w"> </span>=>
<span class="w"> </span>println("Stream<span class="w"> </span>finished<span class="w"> </span>successfully.")
<span class="w"> </span>case<span class="w"> </span>Failure(e)<span class="w"> </span>=>
<span class="w"> </span>println(s"Stream<span class="w"> </span>failed<span class="w"> </span>with<span class="w"> </span>$e")
}
</code></pre></div>
<p>Here we replaced <em>to</em> with <em>toMat</em>. <em>toMat</em> allows a materialized value provided by a <em>Sink</em> to be used. In this case,
the materialized value of <em>Sink.foreach</em> is <em>Future[Done]</em>, a <em>Future</em> that completes with <em>Success[Done]</em> when a
stream finishes (its materialization, to be precise) successfully, and with <em>Failure</em> when it fails. <em>Done</em> is just a
signalling object that carries no information. We can think of a materialized value as a kind of external
handle to a materialized stream.</p>
<p><em>toMat</em> takes an additional parameter, <em>combine</em>, a function that combines two materialized values: one from the
previous stage and one from the current stage. There are four predefined functions: <em>Keep.left</em> (used by default,
<a href="https://github.com/akka/akka/blob/b8e76586391fdf4c5013bde9198f032906f0b8c1/akka-stream/src/main/scala/akka/stream/scaladsl/Source.scala#L60">check the
implementation</a>),
<em>Keep.right</em>, <em>Keep.both</em> and <em>Keep.none</em>. It is, of course, possible to use a custom function with arbitrary combining
logic.</p>
<p>This is a good place to introduce another useful concept — kill switches. This is an object used externally to stop the
materialization of a stream. Let us bring one into the code and demonstrate materialized values a little more:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>helloWorldStream:<span class="w"> </span>RunnableGraph[(UniqueKillSwitch,<span class="w"> </span>Future[Done])]<span class="w"> </span>=
<span class="w"> </span>Source.single("Hello<span class="w"> </span>world")
<span class="w"> </span>.map(s<span class="w"> </span>=><span class="w"> </span>s.toUpperCase())
<span class="w"> </span>.viaMat(KillSwitches.single)(Keep.right)
<span class="w"> </span>.toMat(Sink.foreach(println))(Keep.both)
val<span class="w"> </span>(killSwitch,<span class="w"> </span>doneF):<span class="w"> </span>(UniqueKillSwitch,<span class="w"> </span>Future[Done])<span class="w"> </span>=
<span class="w"> </span>helloWorldStream.run()
killSwitch.shutdown()
//<span class="w"> </span>or
killSwitch.abort(new<span class="w"> </span>Exception("Exception<span class="w"> </span>from<span class="w"> </span>KillSwitch"))
</code></pre></div>
<p>The new thing here is <em>viaMat</em>, a full version of <em>via</em> (in the same way as <em>toMat</em> is a full version of <em>to</em>), which
gives more control over materialized values. Another stage we added is <em>KillSwitches.single</em>, which creates one kill
switch per materialization (not shared between materializations) as its materialized value. We use <em>Keep.right</em> to
preserve it and pass it downstream, and <em>Keep.both</em> to preserve both the <em>KillSwitch</em> and the <em>Future[Done]</em>.</p>
<p><a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-composition.html">The documentation</a> has a good illustration for
this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f3eb9230fac85be716948e8867a578b6b6c9cccf_compose_mat1.png?auto=compress,format"></p>
<h3>Back-pressure</h3>
<p>In real-world systems, it is not uncommon that a producer of data is faster than a consumer at some point in the data
processing pipeline. There are several ways to deal with this. First, we can buffer incoming data on the
consumer side, but this leads to memory consumption problems (including out-of-memory errors) if the consumer is
consistently slower and the data volume is big enough. Second, we can drop messages on the consumer side, which, of course, is
not always acceptable.</p>
<p>There is a technique called back pressure, where the idea is to provide a mechanism for consumers to signal to
producers how much data they can accept at the present moment. This might be done in the form of NACK, negative
acknowledgement (the consumer refuses a piece of data and signals this to the producer), or in the form of
requests (the consumer explicitly tells the producer how much data it is ready to accept). Akka Streams uses
the second option.</p>
<p>Users of Akka Streams rarely see back pressure mechanics. However, you can explicitly control it while implementing your
own stages. For instance, if a source is made of an actor, the actor will receive <em>Request(n: Long)</em> messages, which
means “I am ready to receive n more elements”.</p>
<p>Here is an illustration of this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bae6c4abc39d885aee7f6364668ec6226da9ccee_back_pressure_illustration.png?auto=compress,format"></p>
<p>The producer had previously accumulated the consumer’s demand of 2. It has just sent one message, so the demand
decreased from 2 to 1. Meanwhile, the consumer has sent a request for another 3 messages. The consumer’s demand
accumulated in the producer and will increase by 3 when the request arrives.</p>
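The demand accounting described above can be sketched in a few lines of plain Scala. This is an illustrative model only, not Akka's internal implementation, and all names (`Producer`, `request`, `tryEmit`, `outstanding`) are ours:

```scala
// Illustrative model of request-based back pressure; not Akka's
// internal implementation, just the demand-counter idea.
final class Producer {
  private var demand: Long = 0L

  // The consumer signals: "I am ready to receive n more elements".
  def request(n: Long): Unit = demand += n

  // The producer may emit an element only while there is outstanding demand.
  def tryEmit(): Boolean =
    if (demand > 0) { demand -= 1; true } else false

  def outstanding: Long = demand
}

object BackPressureDemo extends App {
  val producer = new Producer
  producer.request(2)               // consumer requests 2 elements
  assert(producer.tryEmit())        // producer sends one; demand: 2 -> 1
  producer.request(3)               // another request arrives; demand: 1 -> 4
  assert(producer.outstanding == 4)
}
```

This mirrors the illustration: sending one element decreases the accumulated demand from 2 to 1, and a further request for 3 raises it to 4.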
<p>Akka Streams are back-pressured by default, but it is possible to alter this behaviour. For example, we can add a fixed-size
buffer with different strategies:</p>
<div class="highlight"><pre><span></span><code>stream.buffer(100, OverflowStrategy.dropTail)
</code></pre></div>
<p>In this case, up to 100 elements will be collected, and on the arrival of the 101st, the youngest buffered element will be dropped.
There are some more strategies: <em>dropHead</em> (like <em>dropTail</em> but drops the oldest element), <em>dropBuffer</em> (drops the whole
buffer), <em>dropNew</em> (drops the element that has just arrived), <em>backpressure</em> (normal back pressure), and <em>fail</em> (fails the stream).</p>
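To make the <em>dropTail</em> semantics concrete, here is a plain-Scala sketch of such a bounded buffer. It is only a model (Akka's real buffer lives inside the stream interpreter), and the names `DropTailBuffer`, `offer`, and `elements` are ours:

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch of the dropTail overflow strategy; a model only,
// not Akka's actual buffer implementation.
final class DropTailBuffer[A](capacity: Int) {
  private val buf = ArrayBuffer.empty[A]

  // Keep at most `capacity` elements; on overflow, drop the youngest
  // buffered element to make room for the new one.
  def offer(a: A): Unit = {
    if (buf.size == capacity) buf.remove(buf.size - 1)
    buf += a
  }

  def elements: List[A] = buf.toList
}

object DropTailDemo extends App {
  val buffer = new DropTailBuffer[Int](3)
  (1 to 4).foreach(buffer.offer)
  // 1, 2, 3 fill the buffer; offering 4 drops the youngest element (3)
  assert(buffer.elements == List(1, 2, 4))
}
```

The other strategies differ only in which element is discarded (or whether the stream back-pressures or fails instead).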
<h3>Practical example</h3>
<p>I have been using Akka Streams quite intensively over the last few months. One of the tasks was to consume events from
<a href="https://github.com/zalando/nakadi">Nakadi</a> (a RESTful API to a distributed Kafka-like event bus), store them in AWS
DynamoDB, send them to AWS SQS, and save a just-processed event offset in DynamoDB as well.</p>
<p>Events must be processed sequentially, but only within one stage (e.g. an older event cannot be written to the database
after a newer one), for two reasons:</p>
<ul>
<li>The system is idempotent: an event can be processed multiple times without harm (though this is a waste of
resources);</li>
<li>Newer events have a higher priority than older ones.</li>
</ul>
<p>Nakadi provides a RESTful API, i.e. it can be used over HTTP. It responds with an infinite HTTP response containing one event
batch in JSON format per line (<em>application/stream+json</em>).</p>
<p>Akka HTTP — another part of Akka — is tightly integrated with Akka Streams. It uses Akka Streams for sending and
processing HTTP requests.</p>
<p>Let us see the code (very simplified):</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>http<span class="w"> </span>=<span class="w"> </span>Http(actorSystem)
val<span class="w"> </span>nakadiConnectionFlow<span class="w"> </span>=<span class="w"> </span>http.outgoingConnectionHttps(
<span class="w"> </span>nakadiSource.uri.getHost,<span class="w"> </span>nakadiSource.uri.getPort)
val<span class="w"> </span>eventBatchSource:<span class="w"> </span>Source[EventBatch,<span class="w"> </span>NotUsed]<span class="w"> </span>=
<span class="w"> </span>//<span class="w"> </span>The<span class="w"> </span>stream<span class="w"> </span>start<span class="w"> </span>with<span class="w"> </span>a<span class="w"> </span>single<span class="w"> </span>request<span class="w"> </span>object<span class="w"> </span>...
<span class="w"> </span>Source.single(HttpRequest(HttpMethods.GET,<span class="w"> </span>uri,<span class="w"> </span>headers))
<span class="w"> </span>//<span class="w"> </span>...<span class="w"> </span>that<span class="w"> </span>goes<span class="w"> </span>through<span class="w"> </span>a<span class="w"> </span>connection<span class="w"> </span>(i.e.<span class="w"> </span>is<span class="w"> </span>sent<span class="w"> </span>to<span class="w"> </span>the<span class="w"> </span>server)
<span class="w"> </span>.via(nakadiConnectionFlow)
<span class="w"> </span>.flatMapConcat<span class="w"> </span>{
<span class="w"> </span>case<span class="w"> </span>response<span class="w"> </span>@<span class="w"> </span>HttpResponse(StatusCodes.OK,<span class="w"> </span>_,<span class="w"> </span>_,<span class="w"> </span>_)<span class="w"> </span>=>
<span class="w"> </span>response.entity.dataBytes
<span class="w"> </span>//<span class="w"> </span>Decompress<span class="w"> </span>deflate-compressed<span class="w"> </span>bytes.
<span class="w"> </span>.via(Deflate.decoderFlow)
<span class="w"> </span>//<span class="w"> </span>Coalesce<span class="w"> </span>chunks<span class="w"> </span>into<span class="w"> </span>a<span class="w"> </span>line.
<span class="w"> </span>.via(Framing.delimiter(ByteString("\n"),<span class="w"> </span>Int.MaxValue))
<span class="w"> </span>//<span class="w"> </span>Deserialize<span class="w"> </span>JSON.
<span class="w"> </span>.map(bs<span class="w"> </span>=><span class="w"> </span>Json.read[EventBatch](bs.utf8String))
<span class="w"> </span>//<span class="w"> </span>process<span class="w"> </span>erroneous<span class="w"> </span>responses
<span class="w"> </span>}
</code></pre></div>
<p>This <em>Source</em> represents an infinite stream of events (normally it should never finish), modelled as the <em>EventBatch</em> case
class. We then pass these event batches through several stages:</p>
<div class="highlight"><pre><span></span><code><span class="n">eventBatchSource</span>
<span class="w"> </span><span class="o">.</span><span class="n">via</span><span class="p">(</span><span class="n">metricsStart</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">via</span><span class="p">(</span><span class="n">dataWriteStage</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">via</span><span class="p">(</span><span class="n">signalStage</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">via</span><span class="p">(</span><span class="n">offsetWriteStage</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">via</span><span class="p">(</span><span class="n">metricsFinish</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">viaMat</span><span class="p">(</span><span class="n">KillSwitches</span><span class="o">.</span><span class="n">single</span><span class="p">)(</span><span class="n">Keep</span><span class="o">.</span><span class="n">right</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="n">toMat</span><span class="p">(</span><span class="n">Sink</span><span class="o">.</span><span class="n">ignore</span><span class="p">)(</span><span class="n">Keep</span><span class="o">.</span><span class="n">both</span><span class="p">)</span>
</code></pre></div>
<p>All of them are of type <em>Flow[EventBatch, EventBatch, NotUsed]</em>. Let us look at <em>dataWriteStage</em>, which is worth
noting:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>dataWriteStage:<span class="w"> </span>FlowType<span class="w"> </span>=<span class="w"> </span>Flow[EventBatch].map<span class="w"> </span>{<span class="w"> </span>batch<span class="w"> </span>=>
<span class="w"> </span>dynamoDBEventsWriter.write(batch)
<span class="w"> </span>batch
}.addAttributes(ActorAttributes.dispatcher("dynamo-db-dispatcher"))
<span class="w"> </span>.async
</code></pre></div>
<p>What is interesting here is that <em>dynamoDBEventsWriter</em> is only a thin wrapper around Amazon’s DynamoDB driver for Java,
which performs blocking I/O. We do not want to block our data processing pipeline (otherwise, no HTTP I/O or anything
else can happen while a write to DynamoDB is in progress). This is why this stage is made asynchronous (<em>.async</em>) and attached to a specific Akka
dispatcher, dedicated to blocking I/O operations with DynamoDB. The other stages are pretty much the same. You can find
more information about asynchronous stages in the documentation (
<a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-flows-and-basics.html#operator-fusion">here</a> and
<a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-rate.html#buffers-and-working-with-rate">here</a>), and in the
<a href="http://blog.akka.io/streams/2016/07/06/threading-and-concurrency-in-akka-streams-explained">Threading and Concurrency in Akka Streams Explained (Part
I)</a> blog post.</p>
<p>Basically, processing the events amounts to materializing this stream. Naturally, in a real production
application this is more complex due to the configurability of the pipeline itself – the code also includes monitoring,
error recovery, etc.</p>
<p>An interesting point here is that the TCP protocol itself is inherently back-pressured. Akka HTTP simply bridges
the low-level TCP back pressure mechanism (TCP windows and buffers) and the high-level Akka Streams
back pressure. So the whole stream, stretching over the network, is back-pressured: if, say, <em>signalStage</em> is very
slow and cannot keep up, memory will not overflow with the data incoming over HTTP.</p>
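<p>The demand-driven protocol underneath back pressure can be illustrated in a few lines, outside of Akka. The sketch below (in TypeScript, with an invented <em>CounterPublisher</em> class – it is not part of any Akka or Reactive Streams API) shows the core idea of Reactive Streams’ <em>request(n)</em>: the producer emits only as many elements as the consumer has asked for, so a slow consumer bounds the producer instead of letting memory grow.</p>

```typescript
// Pull-based demand in miniature: the producer may only emit while
// outstanding demand is positive, mirroring Reactive Streams' request(n).
class CounterPublisher {
  private demand = 0;   // elements the subscriber is still willing to accept
  private counter = 0;
  readonly emitted: number[] = [];

  // The subscriber signals how many more elements it can handle.
  request(n: number): void {
    this.demand += n;
    // Emit exactly as much as was requested -- never more.
    while (this.demand > 0) {
      this.demand -= 1;
      this.counter += 1;
      this.emitted.push(this.counter);
    }
  }
}

const pub = new CounterPublisher();
pub.request(3);           // a slow consumer asks for just 3 elements
console.log(pub.emitted); // [ 1, 2, 3 ] -- the producer now waits for more demand
pub.request(2);
console.log(pub.emitted); // [ 1, 2, 3, 4, 5 ]
```

<p>Akka HTTP extends this kind of signalling all the way down to the TCP level: when no demand arrives from downstream, the TCP receive window fills up and the remote server stops sending.</p>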
<h3>GraphDSL</h3>
<p>So far, we have considered only simple linear streams. However, Akka Streams also supports the so-called GraphDSL, needed to
build graphs of arbitrarily complex structure. I will not go into this topic deeply, but will just show an example of such a
graph:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0665b87aff55d9756881fc331ab9b82b14f83db7_compose_graph1.png?auto=compress,format"></p>
<p>This graph is created by the following code:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">GraphDSL.Implicits._</span>
<span class="n">RunnableGraph</span><span class="o">.</span><span class="n">fromGraph</span><span class="p">(</span><span class="n">GraphDSL</span><span class="o">.</span><span class="n">create</span><span class="p">()</span> <span class="p">{</span> <span class="n">implicit</span> <span class="n">builder</span> <span class="o">=></span>
<span class="n">val</span> <span class="n">A</span><span class="p">:</span> <span class="n">Outlet</span><span class="p">[</span><span class="n">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Source</span><span class="o">.</span><span class="n">single</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span><span class="o">.</span><span class="n">out</span>
<span class="n">val</span> <span class="n">B</span><span class="p">:</span> <span class="n">UniformFanOutShape</span><span class="p">[</span><span class="n">Int</span><span class="p">,</span> <span class="n">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Broadcast</span><span class="p">[</span><span class="n">Int</span><span class="p">](</span><span class="mi">2</span><span class="p">))</span>
<span class="n">val</span> <span class="n">C</span><span class="p">:</span> <span class="n">UniformFanInShape</span><span class="p">[</span><span class="n">Int</span><span class="p">,</span> <span class="n">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Merge</span><span class="p">[</span><span class="n">Int</span><span class="p">](</span><span class="mi">2</span><span class="p">))</span>
<span class="n">val</span> <span class="n">D</span><span class="p">:</span> <span class="n">FlowShape</span><span class="p">[</span><span class="n">Int</span><span class="p">,</span> <span class="n">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flow</span><span class="p">[</span><span class="n">Int</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">_</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">val</span> <span class="n">E</span><span class="p">:</span> <span class="n">UniformFanOutShape</span><span class="p">[</span><span class="n">Int</span><span class="p">,</span> <span class="n">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Balance</span><span class="p">[</span><span class="n">Int</span><span class="p">](</span><span class="mi">2</span><span class="p">))</span>
<span class="n">val</span> <span class="n">F</span><span class="p">:</span> <span class="n">UniformFanInShape</span><span class="p">[</span><span class="n">Int</span><span class="p">,</span> <span class="n">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Merge</span><span class="p">[</span><span class="n">Int</span><span class="p">](</span><span class="mi">2</span><span class="p">))</span>
<span class="n">val</span> <span class="n">G</span><span class="p">:</span> <span class="n">Inlet</span><span class="p">[</span><span class="n">Any</span><span class="p">]</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Sink</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="n">println</span><span class="p">))</span><span class="o">.</span><span class="ow">in</span>
<span class="n">C</span> <span class="o"><~</span> <span class="n">F</span>
<span class="n">A</span> <span class="o">~></span> <span class="n">B</span> <span class="o">~></span> <span class="n">C</span> <span class="o">~></span> <span class="n">F</span>
<span class="n">B</span> <span class="o">~></span> <span class="n">D</span> <span class="o">~></span> <span class="n">E</span> <span class="o">~></span> <span class="n">F</span>
<span class="n">E</span> <span class="o">~></span> <span class="n">G</span>
<span class="n">ClosedShape</span>
<span class="p">})</span>
</code></pre></div>
<p>For more details, you can consult the documentation
<a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-graphs.html">here</a> and
<a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-composition.html#composing-complex-systems">here</a> (where the
illustration is taken from).</p>
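<p>To build an intuition for the fan-out and fan-in shapes, here is a deliberately loose TypeScript sketch (all helper names are invented for illustration; it uses eager arrays instead of back-pressured streams and omits the feedback edge from <em>F</em> back to <em>C</em>): <em>Broadcast</em> copies each element to every downstream branch, while <em>Merge</em> brings branches back together.</p>

```typescript
// Invented helpers illustrating two GraphDSL shapes on plain arrays:
// broadcast copies every element to each of n branches,
// merge concatenates the branches back into one stream.
const broadcast = (xs: number[], n: number): number[][] =>
  Array.from({ length: n }, () => [...xs]);
const merge = (branches: number[][]): number[] =>
  ([] as number[]).concat(...branches);

// A ~> B(broadcast): one copy goes straight to the final merge,
// the other is mapped through D (_ + 1) first, as in the graph above.
const a = [0];
const [toMerge, toD] = broadcast(a, 2);
const d = toD.map(x => x + 1);
const f = merge([toMerge, d]);
console.log(f); // [ 0, 1 ]
```

<p>The real GraphDSL version of this wiring is streaming and back-pressured, of course – the arrays here only show how elements travel through the shapes.</p>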
<h3>Custom stages and integration with Akka actors</h3>
<p>Despite the abundance of out-of-the-box processing stages in Akka Streams, it is not uncommon to write
your own. The topic is quite broad, so it does not fit into this article. You can check out <a href="http://blog.akka.io/integrations/2016/09/16/custom-flows-parsing-xml-part-1">this post on the Akka
blog</a> for a better idea. It gives a pretty
good explanation of creating custom stages. Also make sure you take a look at the <a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/stream-customize.html">custom stream processing
page</a> in the documentation.</p>
<p>I also wanted to show how easily Akka Streams integrate with actors. Consider the situation when we want an actor to
produce values for a stream, i.e. to be a <em>Source</em>. The first way is to call <em>Source.actorRef</em>, which materializes to an
actor that sends downstream all messages sent to it. Another option is <em>Source.actorPublisher</em>, which receives the <em>Props</em>
of an actor implementing the <em>ActorPublisher[T]</em> trait, like this simple counter:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">LongCounter</span><span class="w"> </span><span class="k">extends</span><span class="w"> </span><span class="n">ActorPublisher</span><span class="p">[</span><span class="n">Long</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">private</span><span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">counter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="n">L</span>
<span class="n">override</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">receive</span><span class="p">:</span><span class="w"> </span><span class="n">Receive</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">case</span><span class="w"> </span><span class="n">ActorPublisherMessage</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">_</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">counter</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="n">onNext</span><span class="p">(</span><span class="n">counter</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">case</span><span class="w"> </span><span class="n">ActorPublisherMessage</span><span class="o">.</span><span class="n">Cancel</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="n">context</span><span class="o">.</span><span class="n">stop</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>It is symmetrical for <em>Sink</em>s: we need to create an actor, which implements the <em>ActorSubscriber</em> trait:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">Printer</span><span class="w"> </span><span class="k">extends</span><span class="w"> </span><span class="n">ActorSubscriber</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">override</span><span class="w"> </span><span class="n">protected</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">requestStrategy</span><span class="p">:</span><span class="w"> </span><span class="n">RequestStrategy</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">WatermarkRequestStrategy</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="w"> </span><span class="n">override</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">receive</span><span class="p">:</span><span class="w"> </span><span class="n">Receive</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">case</span><span class="w"> </span><span class="n">ActorSubscriberMessage</span><span class="o">.</span><span class="n">OnNext</span><span class="p">(</span><span class="n">element</span><span class="p">)</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="n">println</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
<span class="w"> </span><span class="n">case</span><span class="w"> </span><span class="n">ActorSubscriberMessage</span><span class="o">.</span><span class="n">OnError</span><span class="p">(</span><span class="n">throwable</span><span class="p">)</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="n">println</span><span class="p">(</span><span class="n">s</span><span class="s2">"Failed with $throwable"</span><span class="p">)</span>
<span class="w"> </span><span class="n">context</span><span class="o">.</span><span class="n">stop</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<span class="w"> </span><span class="n">case</span><span class="w"> </span><span class="n">ActorSubscriberMessage</span><span class="o">.</span><span class="n">OnComplete</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="n">println</span><span class="p">(</span><span class="s2">"Completed"</span><span class="p">)</span>
<span class="w"> </span><span class="n">context</span><span class="o">.</span><span class="n">stop</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>There are other possibilities, too – instead of creating a new actor, you can send messages to an existing one (with or
without acknowledgements).</p>
<h3>Summary and Links</h3>
<p>In this article I have tried to cover the very basics of Akka Streams. This (and the whole field of asynchronous data
processing pipelines) is a very big and interesting topic which you can delve into further on your own.</p>
<p>Perhaps the biggest and most comprehensive guide is <a href="http://doc.akka.io/docs/akka/2.5.3/scala/stream/index.html">the official
documentation</a>, which I referred to throughout this post. Do
not ignore the <a href="http://blog.akka.io/">Akka blog</a>, which is mostly about Streams. There are also plenty of conference
videos online such as <a href="https://www.youtube.com/watch?v=x62K4ObBtw4">Akka Streams & Reactive Streams in Action by Konrad
Malawski</a>.</p>
<p>There are many other streaming libraries, and I must mention <a href="http://reactivex.io/">Reactive Extensions</a> here. It is
implemented for many platforms including JVM, .NET, Android, JavaScript, etc.</p>
<p>I am interested in the real-world applications of the library — and generally in asynchronous data processing pipelines.
If you use any of this, drop me a line via Twitter at <a href="https://twitter.com/ivan0yu">@ivan0yu</a>.</p>Rule Over Your Angular2 State Machine2017-01-18T00:00:00+01:002017-01-18T00:00:00+01:00Vadym Kukhtintag:engineering.zalando.com,2017-01-18:/posts/2017/01/rule-over-your-angular2-state-machine.html<p>Handle your Angular 2 application state better and make your frontend developer life that much easier.</p><p>It is really important to control your application state. <strong>Extremely important.</strong> The application state binds all of
your functionality together, allowing you to do the awesome things we love about programming.</p>
<p>Today I want to write about the <a href="https://github.com/vadym-kukhtin/angular2-state-machine">Angular2-state-machine</a>, which
helps you handle your <strong>Angular 2</strong> application state and, hopefully, makes your life as a developer much easier.</p>
<p>The state-machine is an Angular 2 port of Jake Gordon’s javascript-state-machine, with small changes made according to
the <strong>Angular 2 TypeScript specification</strong>. It was created for small state-machines like <strong>page statuses</strong> in CMS:
Draft -> In Process -> In Review -> Reviewed -> Published.</p>
<p>You can use it as the main state-machine for manipulating your whole application state; however, in my opinion Redux
handles large state-machines better. This small library was created for simple, lightweight state handling, rather than as a bigger state
manipulation solution (like Redux).</p>
<h3>Documentation</h3>
<p>First of all, let’s install the library using NPM:</p>
<div class="highlight"><pre><span></span><code>npm i angular2-state-machine --save
</code></pre></div>
<p>As an example, let us attempt to create an app that will handle traffic lights.</p>
<p>Assume you already have a basic Angular2 app and you want to insert the state-machine:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">StateMachine</span><span class="p">,</span> <span class="n">StateEvent</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'angular2-state-machine/core'</span><span class="p">;</span>
<span class="n">let</span> <span class="n">fsm</span> <span class="o">=</span> <span class="n">new</span> <span class="n">StateMachine</span><span class="p">({</span>
<span class="n">initial</span><span class="p">:</span> <span class="s1">'green'</span><span class="p">,</span>
<span class="n">events</span><span class="p">:</span> <span class="p">[</span>
<span class="n">new</span> <span class="n">StateEvent</span><span class="p">({</span>
<span class="n">name</span><span class="p">:</span> <span class="s1">'toGreen'</span><span class="p">,</span> <span class="n">from</span><span class="p">:</span> <span class="p">[</span><span class="s1">'yellow'</span><span class="p">],</span> <span class="n">to</span><span class="p">:</span> <span class="s1">'green'</span>
<span class="p">}),</span>
<span class="n">new</span> <span class="n">StateEvent</span><span class="p">({</span>
<span class="n">name</span><span class="p">:</span> <span class="s1">'toRed'</span><span class="p">,</span> <span class="n">from</span><span class="p">:</span> <span class="p">[</span><span class="s1">'yellow'</span><span class="p">],</span> <span class="n">to</span><span class="p">:</span> <span class="s1">'red'</span>
<span class="p">}),</span>
<span class="n">new</span> <span class="n">StateEvent</span><span class="p">({</span>
<span class="n">name</span><span class="p">:</span> <span class="s1">'toYellow'</span><span class="p">,</span> <span class="n">from</span><span class="p">:</span> <span class="p">[</span><span class="s1">'red'</span><span class="p">,</span> <span class="s1">'green'</span><span class="p">],</span> <span class="n">to</span><span class="p">:</span> <span class="s1">'yellow'</span>
<span class="p">})</span>
<span class="p">]</span>
<span class="p">});</span>
</code></pre></div>
<p>You can see that we’ve used the terms <em>StateMachine</em> and <em>StateEvent</em> above. <em>StateMachine</em> is the service itself, which
handles states, whereas <em>StateEvent</em> is a typed object that stores information about a state and its transitions.</p>
<p>We’ve created a state-machine that consists of three <strong>traffic light states</strong>: <strong>Green</strong>, <strong>Yellow</strong>, and <strong>Red</strong>, and
we’ve established transition names from one to the other: <em>‘toGreen’</em>, <em>‘toYellow’</em>, <em>‘toRed’</em>.</p>
<p>We must only use <strong>unique transition names</strong> or the service will throw an Error: “You have to use unique names for all
events”.</p>
<p>The traffic lights in our app start with the color Green, and after some time we must change the light to Yellow:</p>
<div class="highlight"><pre><span></span><code>fsm.fireAction('toYellow'); // Now we’ve changed the state-machine current state from green to yellow
</code></pre></div>
<p>To be sure, we can run <em>‘getCurrent()’</em>:</p>
<div class="highlight"><pre><span></span><code>fsm.getCurrent() // Yellow
</code></pre></div>
<p>For some use cases, such as a huge number of states or some tricky logic, you need to check whether you’re able to
move from the current state to another state. Voilà! You can use <em>‘can’</em> and <em>‘cannot’</em>:</p>
<div class="highlight"><pre><span></span><code>fsm.can('toYellow') // Error 'You cannot switch to this state' because it is already yellow
</code></pre></div>
<p>With <em>cannot</em>:</p>
<div class="highlight"><pre><span></span><code>fsm.cannot('toYellow') // true
</code></pre></div>
<p>You can also access all of the events defined for the machine:</p>
<div class="highlight"><pre><span></span><code>fsm.getEvents() /* [StateEvent({
name: 'toGreen', from: ['yellow'], to: 'green'
}),
StateEvent({
name: 'toRed', from: ['yellow'], to: 'red'
}),
StateEvent({
name: 'toYellow', from: ['red', 'green'], to: 'yellow'
})
]*/
</code></pre></div>
<p>As well as get all available transition names:</p>
<div class="highlight"><pre><span></span><code>fsm.getTransitions() // 'toYellow'
</code></pre></div>
<p>Or go back to the previous state:</p>
<div class="highlight"><pre><span></span><code>fsm.goToPreviousState() // 'green'
</code></pre></div>
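<p>To see all of the calls above in one place, here is a compact, self-contained TypeScript sketch that mimics the documented API, so the traffic-light walkthrough can be run without Angular. Note that the <em>StateMachine</em> and <em>StateEvent</em> internals below are illustrative reimplementations written for this post, not the library’s actual code, and <em>getTransitions</em> is assumed to return the transition names available from the current state.</p>

```typescript
// Illustrative reimplementation of the documented API (not the library's code).
interface StateEvent {
  name: string;
  from: string[];
  to: string;
}

class StateMachine {
  private current: string;
  private previous: string | null = null;

  constructor(private config: { initial: string; events: StateEvent[] }) {
    const names = config.events.map(e => e.name);
    if (new Set(names).size !== names.length) {
      throw new Error("You have to use unique names for all events");
    }
    this.current = config.initial;
  }

  getCurrent(): string {
    return this.current;
  }

  // A transition is possible if its `from` list contains the current state.
  can(name: string): boolean {
    const ev = this.config.events.find(e => e.name === name);
    return ev !== undefined && ev.from.indexOf(this.current) !== -1;
  }

  cannot(name: string): boolean {
    return !this.can(name);
  }

  fireAction(name: string): void {
    if (this.cannot(name)) {
      throw new Error("You cannot switch to this state");
    }
    const ev = this.config.events.find(e => e.name === name)!;
    this.previous = this.current;
    this.current = ev.to;
  }

  // Transition names available from the current state.
  getTransitions(): string[] {
    return this.config.events
      .filter(e => e.from.indexOf(this.current) !== -1)
      .map(e => e.name);
  }

  goToPreviousState(): void {
    const prev = this.previous;
    if (prev !== null) {
      this.previous = this.current;
      this.current = prev;
    }
  }
}

// The traffic-light walkthrough from the article:
const fsm = new StateMachine({
  initial: "green",
  events: [
    { name: "toGreen",  from: ["yellow"],       to: "green"  },
    { name: "toRed",    from: ["yellow"],       to: "red"    },
    { name: "toYellow", from: ["red", "green"], to: "yellow" },
  ],
});

fsm.fireAction("toYellow");
console.log(fsm.getCurrent());       // "yellow"
console.log(fsm.cannot("toYellow")); // true -- we are already yellow
console.log(fsm.getTransitions());   // [ "toGreen", "toRed" ]
fsm.goToPreviousState();
console.log(fsm.getCurrent());       // "green"
```

<p>The real library adds Angular 2 service wiring on top, but the transition logic it needs is no bigger than this.</p>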
<p>And that’s ALL! With simplicity and ease, you can handle your state and help your app shine brightly.</p>
<p>I would really appreciate any comments or feedback on this information or the library itself, and hope it helps to
tackle your states. You can find me on Twitter at <a href="https://twitter.com/vadkuhtin">@vadkuhtin</a> for feedback, or head
directly to <a href="https://github.com/vadym-kukhtin/angular2-state-machine">GitHub here</a>. The NPM package can be <a href="https://www.npmjs.com/package/angular2-state-machine">found
here</a>.</p>In Search of the Perfect Fit – Insights from the UX Job Title Survey2017-01-12T00:00:00+01:002017-01-12T00:00:00+01:00Elena Pavlenkotag:engineering.zalando.com,2017-01-12:/posts/2017/01/in-search-of-the-perfect-fit--insights-from-the-ux-job-title-survey.html<p>The results are in! One in three UX professionals aren’t happy with their job title.</p><p>The results are in! -- One in three UX professionals aren’t happy with their job title. <em>“I just wish my field was
called something different”</em>, they told us. Many of them avoid mentioning their title outside of a work context:
<em>“Service Designer? - Is that like designing call centers?”</em>. But the confusion doesn’t stop there. Across the job
market, it is hard to get an overview of which skills are expected from whom. Why is this so and what can be done?</p>
<p>Here at Zalando, we wanted to find out: Back in August 2016, we launched our UX Job Title Survey to collect stories and
opinions from the community. Through forums in professional networks, social media and the <a href="http://www.uxswitch.com/">UXswitch
newsletter</a>, we reached about 185 UX professionals mostly from the EU (61%) and US (28%) --
thank you all for your participation!</p>
<p>Here’s what we learned.</p>
<p>Architect, Engineer, Evangelist… Do these sound similar to you? Put ‘UX’ in front of them and you get a completely
different job description. Throughout UX, the title variations are endless. In our survey, 178 people shared 69
different job titles (Fig. 1) that differ in both frequency and nature. The largest group by far is ‘UX Designer’ (24.7%).
However, most job titles occurred fewer than five times (42.7%).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3b8fa53eb008a10e786ec1e57ae9305f1db0a8a3_fig1uxjobsurvey.png?auto=compress,format"></p>
<p>A third of our respondents perceive their current title as unfitting (32.6%), but some of them can’t even pinpoint a
better alternative (17.2%). If there are so many job titles to choose from, why wouldn’t there be a relevant second
option? In order to investigate these insecurities, we asked our participants to:</p>
<ul>
<li>Attribute typical skills from the field to certain job titles: <em>“Which knowledge would you expect from the six
following roles?”</em>, Fig. 5</li>
<li>Rate the clarity of a number of job titles: <em>“Can you predict the skill requirements for each of these job
titles?”</em>, Fig. 6</li>
</ul>
<h3>Mapping skills to job titles</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6d2a1bf4ad8b36f98e7ba6a14f3f7a32e9a8a743_fig2uxjobsurvey.png?auto=compress,format"></p>
<p>Judging from the hills and valleys in our second figure, the <strong>UX Researchers</strong> and <strong>Visual Designers</strong> stand out from
the list (Fig. 2). It seems to be pretty clear what is expected from them -- with a visible peak where many people see
their responsibilities, in contrast to only a few votes for other skills. This could be explained by their names
literally containing their operational fields, namely “research” and “visual”, which are not easily misunderstood and
have a long tradition. Such distinct roles are advantageous when searching for a fitting job or writing a job ad, because
everyone would be on the same page.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/618520c376077f57da2683caed9aa1903a1f18ec_fig3uxjobsurvey.png?auto=compress,format"></p>
<p>However, if you tried to guess the job titles from the other distributions, you would have a hard time. It gets trickier
when several areas of expertise have to be combined. In our results, the <strong>UI Designer</strong>’s skillset appears to be very
close to the Visual Designer’s (Fig. 3), but with more skills like wireframing and information architecture, and fewer
votes for screen, logo, and icon design. The <strong>Interaction Designer</strong> overlaps to a very high degree with the Product
Designer (Fig. 4); however, it includes other skills such as motion design.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7d0b87575b032ce5da12278d690c67b51610cf04_fig4uxjobsurvey.png?auto=compress,format"></p>
<p>The <strong>Product Designer</strong> and <strong>UX Designer</strong> scarcely show distinctive features. Instead, they demonstrate a rather
broad distribution over all skills, although more participants agreed on their choice when it came to UX Designer (Fig.
5). This role seems to be a good candidate for the so-called <a href="http://uxunicorn.com">UX Unicorn</a> many hiring managers are
dreaming about -- the all-rounder who knows everything <em>“from research, business, strategy through visual design, and
front-end development.”</em> At this point, it should be considered that more than a quarter of the participants are
themselves UX Designers (Fig. 1). Does their wide choice of skills and high confidence level mean they actually have
magical powers? Probably not. It is more likely that their rating reflects the unrealistic expectations they encounter
in their daily lives.</p>
<p>15 out of the 49 UX Designers in our survey wish they had another job title. This isn’t due to the role having a bad
reputation: amongst the non-UX Designers, 11 wouldn’t mind switching to it. What means vagueness and stress to one person is
flexibility and room to grow for another.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0aef526e552435602b5f1730ade1b02f68c4e262_fig5uxjobsurvey.png?auto=compress,format"></p>
<p>When it comes to job ads, there might be some clearer tendencies about what is preferred. One-size-fits-all is mostly
disappointing -- but who doesn’t enjoy a tailored job description, right? Still, being a UX Designer paradoxically leads
to more security: on the one hand, forming the largest group amongst the job titles shows that there are many job ads
aimed at them. On the other hand, the stretchable term allows the same professionals to apply to a large variety of
other, more specialised descriptions just as well -- by simply referring to a subset of their skills. Maybe this gives
us a clue to what a solution might look like.</p>
<h3>A question of sustainability</h3>
<p>What can we conclude for the future of UX job titles? From the comment section of our survey, it is apparent that there
are some insecurities about their current state. <em>“Titles in this industry are really a joke”</em> -- Statements like this
weren’t rare at all. Where are these <em>“random”</em>, <em>“complicated”</em> and <em>“murky at best”</em> titles coming from?</p>
<p>It’s possible that some of the titles use their unconventionality first and foremost to <a href="https://thedesignteam.io/design-disruptor-b1c0c58d90b7#.3n7v5q9cg">attract
attention</a>. They are aimed at expressing the
company's creativity and standing out from the masses. Unfortunately, some businesses might also try to hop on the User
Experience and Service Design trend and use some of these terms to benefit from the general openness towards the field.
When UX becomes just another <a href="https://www.bigeyedeers.co.uk/is-ux-design-becoming-an-industry-buzzword/">buzzword in
marketing</a> rather than containing any
substance, job titles might suffer, creating further scepticism.</p>
<p>A participant also mentioned the aspect of inheriting the <em>“intrinsic level of confusion”</em> from other, older job titles,
such as the UX Designer from the Web Designer. When fields grow in parallel and fast, there is not enough time to tie
responsibilities to one title organically. Smaller businesses especially have to find compromises and ideally someone
who knows everything (aka UX Unicorn), while bigger organizations can afford to pursue specialists.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/86f5225c23083e1d824217a7b480afbe19e9b1ba_fig6uxjobsurvey.png?auto=compress,format"></p>
<p>But it isn’t just <a href="http://www.aaronweyenberg.com/uxgenerator/">the funny titles</a> that get criticized. Key components of
current UX job titles such as ‘UX’ (<em>“most of the time, you can skip the ‘UX’ part”</em>), ‘experience’ (<em>“doesn’t mean
anything and maybe never did”</em>), and ‘designing’ (<em>“you can only try to influence it”</em>) have all been challenged in our
comment section -- that’s how big the disagreement is. It won’t be possible to make everyone happy.</p>
<p>In our confidence rating, job titles adding “UX” to more traditional roles, such as User Experience Manager, UX
Interaction Designer, UX Project Manager and UX Strategist, were amongst the least clear, although the position of
Interaction Designer had ratings above average. When it comes to the traditional term ‘usability’, participant votes
show that it’s a <a href="https://uxdesign.cc/ux-trends-2017-46a63399e3d2#.xtxl3j9qx">relic of the past</a>, with Usability Expert
having a rating only slightly above average, and Usability Engineer coming in third from the bottom.</p>
<p>Another idea addressed in the UX community surveyed is the shift from User Experience to Customer Experience (CX),
Experience Design (XD), Product Design and Service Design. <em>“I think we'll regret the term 'user' in the future, but
it's what people understand at the moment.”</em> Looking at the clusters in job titles, those containing the word
‘architect’ might be making <a href="http://www.uxmatters.com/mt/archives/2012/06/ux-design-defined.php">their comeback</a>. When
describing their job to people from outside the field, many professionals used it as an analogy to describe their role.
Nevertheless, while Information Architect comes in at second on the confidence rating, the Experience Architect occupies
last place. Nevertheless, new titles are also regarded with suspicion.</p>
<h3>Summing up</h3>
<p>Our results lead to the conclusion that the term ‘UX’ today might be mostly traditional and a sign of membership. It is
less helpful in narrowing down job roles when distinctions have to be made in the job-seeking or job-advertising
context. Both old and new titles are equally criticized and hard to generalize across <a href="http://www.uxbeginner.com/how-to-navigate-the-ocean-of-ux-job-titles/">different contexts, such as
company size</a>. While there is an apparent need
for a sustainable and operational solution, some people in UX actively nourish the playful and creative naming.</p>
<p>From our participants’ job titles list, we noticed that there are many who have multiple job titles at once: e.g. UX
Design Strategist and Concept Developer, UX Design Technologist, Digital Innovation and UX Design or Experience
Strategist / Information Architect. This can be seen as a first step in narrowing down the responsibilities attached to
certain positions by emphasizing the intersections of multiple titles. Nevertheless, a job title is only useful when its
components are meaningful themselves -- judging from job titles such as UX Guru, this is less obvious than it seems.</p>
<p>We saw from our confidence rating that adding “UX” doesn’t make a job description necessarily clearer. It might be more
helpful for recruiters to focus on developing profiles of so-called <a href="http://chiefexecutive.net/ideo-ceo-tim-brown-t-shaped-stars-the-backbone-of-ideoae%E2%84%A2s-collaborative-culture/">T-shaped
people</a>,
with broad general knowledge, and one specialization that should be evident from the job title.</p>
<p>Curious about the people behind our data, their quotes and more results? Take a look at <a href="https://docs.google.com/presentation/d/1Q3SyKU1BDlwttsG1EqSdmd7cF51FQk7J_YoV5V8QYOs/">our detailed
findings</a> and share your own ideas
and opinions with us at <a href="mailto:survey@zalando.de">survey@zalando.de</a>.</p>What is Hardcore Data Science – In Practice?2017-01-11T00:00:00+01:002017-01-11T00:00:00+01:00Dr. Mikio Brauntag:engineering.zalando.com,2017-01-11:/posts/2017/01/what-is-hardcore-data-science--in-practice.html<p>Originally a research topic, data science has proven to add real business value for Zalando.</p><p><em>This article originally appeared on</em> <em>oreilly.com.</em></p>
<p>Data science has become widely accepted across a broad range of industries in the past few years. Originally more of a
research topic, data science has early roots in scientists’ efforts to understand human intelligence and create
artificial intelligence; it has since proven that it can add real business value.</p>
<p>This is true for <a href="http://www.zalando.com/">Zalando</a>, Europe’s leading online fashion platform, where data science is
heavily used to provide data-driven recommendations, among other things. Recommendations are provided as a backend
service in many places, including product pages, catalogue pages, newsletters, and for retargeting.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/923f13007bfcb801f4ba6bd6e5d3fbba0daea21c_image_1-24655720fbf19c663573aa6bbd0b2a58.jpg?auto=compress,format"></p>
<h3>Computing recommendations</h3>
<p>Naturally, there are many ways to compute data-driven recommendations. For so-called collaborative filtering, user
actions like product views, actions on a wishlist, and purchases, are collected over the whole user base and then
crunched to determine which items have similar user patterns. The beauty of this approach lies in the fact that the
computer does not have to understand the items at all; the downside is that one has to have a lot of traffic to
accumulate enough information about the items. Another approach only looks at the attributes of the items, for example,
recommending other items from the same brand, or with similar colors. And of course, there are many ways to extend or
combine these approaches.</p>
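<p>As a toy illustration of the collaborative-filtering idea, item similarity can be estimated from co-occurrence counts alone, without any understanding of the items themselves. The event data and function names below are hypothetical, and real systems work on far larger logs with smarter similarity measures:</p>

```python
from collections import defaultdict
from itertools import combinations

def item_cooccurrence(user_events):
    """Count how often two items appear in the same user's history.

    user_events: dict mapping user id -> set of item ids the user
    viewed, wishlisted or purchased (all treated equally here).
    """
    counts = defaultdict(int)
    for items in user_events.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

def recommend(item, user_events, top_n=3):
    """Items that most often co-occur with `item`, best first."""
    scores = defaultdict(int)
    for (a, b), c in item_cooccurrence(user_events).items():
        if a == item:
            scores[b] += c
        elif b == item:
            scores[a] += c
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical browsing histories: users who viewed the sneaker
# also tended to view the hoodie.
events = {
    "u1": {"sneaker", "hoodie", "cap"},
    "u2": {"sneaker", "hoodie"},
    "u3": {"sneaker", "jeans"},
}
```

<p>With this data, the strongest co-occurrence partner of the sneaker is the hoodie, which is exactly the kind of pattern a counting-based recommender surfaces.</p>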
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/52b80cf628f0cc838978251341c98154c9afff86_image_2-a5e8ed6443d5f82bbed5102324acfe7a.jpg?auto=compress,format"></p>
<p>Simpler methods consist of little more than counting to compute recommendations, but of course, there is practically no
limit to the complexity of such methods. For example, for personalized recommendations, we have been working with
<a href="https://en.wikipedia.org/wiki/Learning_to_rank">learning to rank</a> methods that learn individual rankings over item
sets. The above figure shows the cost function to optimize here, mostly to illustrate the level of complexity data
science sometimes brings with it. The function itself uses a pairwise weighted ranking metric, with regularization
terms. While being very mathematically precise, it is also very abstract. This approach can be used not only for
recommendations in a fashion setting, but for all kinds of ranking problems, provided one has reasonable features.</p>
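<p>The production cost function from the figure is not reproduced here, but the general shape of a pairwise ranking objective with a regularization term can be sketched in a few lines. This is a simplified, generic stand-in (logistic pairwise loss plus an L2 penalty), not the weighted metric used in our system:</p>

```python
import math

def pairwise_ranking_loss(w, pairs, lam=0.1):
    """Average logistic pairwise loss plus L2 regularization.

    w: linear model weights; pairs: list of (preferred_features,
    other_features) tuples. The loss is small when the preferred
    item of each pair scores higher than the other item.
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    loss = sum(math.log(1.0 + math.exp(-(score(xp) - score(xn))))
               for xp, xn in pairs)
    return loss / len(pairs) + lam * sum(wi * wi for wi in w)
```

<p>Minimizing such a loss over many preference pairs yields a ranking model; the regularization weight <em>lam</em> trades fit against model complexity.</p>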
<h3>Bringing mathematical approaches into industry</h3>
<p>So, what does it take to bring a quite formal and mathematical approach, like what we’ve described above, into
production? And what does the interface between data science and engineering look like? What kind of organizational and
team structures are best suited for this approach? These are all very relevant and reasonable questions, because they
decide whether the investment in a data scientist or a whole team of data scientists will ultimately pay off.</p>
<p>In the remainder of this article, I will discuss a few of these aspects, based on my personal experience of having
worked as a machine learning researcher as well as having led teams of data scientists and engineers at Zalando.</p>
<h3>Understanding data science versus production</h3>
<p>Let’s start by having a look at data science and back-end production systems, and see what it takes to integrate these
two systems.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/551f904b3d8e4cc56819d2e9fa2888e3066e6dd1_image_3-5800f7c7856545a3a7a6cf0727e2d044.jpg?auto=compress,format"></p>
<p>The typical data science workflow looks like this: the first step is always identifying the problem and then gathering
some data, which might come from a database or production logs. Depending on the data-readiness of your organization,
this might already prove very difficult because you might have to first figure out who can give you access to the data,
and then figure out who can give you the green light to actually get the data. Once the data is available, it’s
preprocessed to extract features, which are hopefully informative for the task to be solved. These features are fed to
the learning algorithm, and the resulting model is evaluated on test data to get an estimate of how well it will work on
future data.</p>
<p>This pipeline is usually done in a one-off fashion, often with the data scientist manually going through the individual
steps, using a programming language like Python, that comes with many libraries for data analysis and visualization.
Depending on the size of the data, one may also use systems like Spark or Hadoop, but often the data scientist will
start with a subset of the data first.</p>
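<p>The one-off workflow described above can be sketched end to end: gather labelled data, extract features, train, and evaluate on held-out data. A deliberately trivial learner (per-class feature means, i.e. nearest centroid) stands in for the real algorithm here, and the data is synthetic:</p>

```python
import random

def extract_features(record):
    """Preprocessing step: turn a raw record into numeric features.
    The 'raw data' here is already numeric, so this is a stub."""
    return record

def train(rows, labels):
    """Deliberately trivial learner: per-class feature means."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(model, row):
    """Assign the class whose centroid is closest to the row."""
    return min(model, key=lambda lbl: sum((a - b) ** 2
                                          for a, b in zip(model[lbl], row)))

def evaluate(model, rows, labels):
    hits = sum(predict(model, r) == l for r, l in zip(rows, labels))
    return hits / len(rows)

# Synthetic labelled data; in reality this would come from a
# database or production logs.
random.seed(0)
data = [([random.gauss(c, 0.3), random.gauss(c, 0.3)], c)
        for c in (0, 1) for _ in range(50)]
random.shuffle(data)
rows = [extract_features(r) for r, _ in data]
labels = [l for _, l in data]
split = int(0.8 * len(rows))
model = train(rows[:split], labels[:split])
test_accuracy = evaluate(model, rows[split:], labels[split:])
```

<p>The held-out accuracy gives the estimate of how well the model will work on future data; in practice this whole script is rerun many times with different features and methods.</p>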
<h3>Why start small?</h3>
<p>The main reason for starting small is that this is a process that is not done just once, but will in fact be iterated
many times. Data science projects are intrinsically exploratory and, to some extent, open-ended. The goal might be
clear, but what data is available, or whether the available data is fit for the task at hand, is often unclear from the
beginning. After all, choosing machine learning as an approach already means that one cannot simply write a program to
solve the problem. Instead, one resorts to a data-driven approach.</p>
<p>This means that this pipeline is iterated and improved many times, trying out different features, different forms of
preprocessing, different learning methods, or maybe even going back to the source and trying to add more data sources.</p>
<p>The whole process is inherently iterative, and often highly explorative. Once the performance looks good, one is ready
to try the method on real data. This brings us to production systems.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ca922e3c5a40f0f4cb8f6a0892519849608473b6_image_4-ac3556edbac05e990582ca3f2b92e973.jpg?auto=compress,format"></p>
<h3>Distinguishing a production system from data science</h3>
<p>Probably the main difference between production systems and data science systems is that production systems are
real-time systems that are continuously running. Data must be processed and models must be updated. The incoming events
are also usually used to compute key performance indicators like click-through rates. The models are often
retrained on the available data every few hours and then loaded into the production system, which serves the results via
a REST interface, for example.</p>
<p>These systems are often written in programming languages like Java for performance and stability reasons.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2ed778169b702ca83c2505ceb65424d748351109_image_5-0d8e25c02668e476dd491d457f605d89.jpg?auto=compress,format"></p>
<p>If we put these two systems side-by-side, we get a picture like the Figure above. On the top right, there is the data
science side, characterized by using languages like Python, or systems like Spark, but often with one-shot,
manually-triggered computations, and iterations to optimize the system. The outcome of that is a model, which is
essentially a bunch of numbers that describe the learned model. This model is then loaded by the production system. The
production system is a more classical enterprise system, written in a language like Java, which is continually running.</p>
<p>The picture is a simplification, of course. In reality, models have to be retrained, so some version of the
processing pipeline must also be put into place on the production side to update the model every now and then.</p>
<p>Note that the A/B testing, which happens in the live system, mirrors the evaluation in the data science side. These are
often not exactly comparable because it is hard to simulate the effect of a recommendation, for example, offline,
without actually showing it to customers, but there should be a link in performance increase.</p>
<p>Finally, it’s important to note that this whole system is not “done” once it is set up. Just as one first needs to
iterate and refine the data analysis pipeline on the data science side, the whole live system also needs to be iterated
as data distributions change, and new possibilities for data analysis open up. To me, this "outer iteration" is the
biggest challenge to get right—and also the most important one, because it will determine whether you can continually
improve the system and secure your initial investment in data science.</p>
<h3>Data scientists and developers: modes of collaboration</h3>
<p>So far, we have focused on how systems typically look in production. There are variations in how far you want to go to
make the production system really robust and efficient. Sometimes, it may suffice to directly deploy a model in Python,
but the separation between the exploratory part and production part is usually there.</p>
<p>One of the big challenges you will face is how to organize the collaboration between data scientists and developers.
“Data scientist” is still a somewhat new role, but the work they have to do differs enough from that of typical
developers that you should expect some misunderstandings and difficulties in communication.</p>
<p>The work of data scientists is usually highly exploratory. Data science projects often start with a vague goal and some
ideas of what kind of data is available and methods that could be used, but very often, you have to try out ideas and
get insights into your data. Data scientists write a lot of code, but much of this code is there to test out ideas and
is expected to not be part of the end solution.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2c424869b4d4a301650e5bbe8a0265596cb47708_image_6-09fc7b089f2c0d1572f5f65bd204a9f7.jpg?auto=compress,format"></p>
<p>Developers, on the other hand, naturally have a much higher focus on coding. It is their goal to write a system, to
build a program that has the required functionality. Developers sometimes also work in an exploratory fashion, building
prototypes, proof of concepts, or performing benchmarks, but the main goal of their work is to write code.</p>
<p>These differences are also very apparent in the way the code evolves over time. Developers usually try to stick to a
very clearly defined process that involves creating branches for independent work streams, then having those reviewed
and merged back into the main branch. People can work in parallel, but need to incorporate changes merged into the main
branch back into their own branches, and so on. It is a whole process around making sure that the main branch evolves in an
orderly fashion.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8844a860165d0ab5f9cfb28b0b35e467f00dd22d_image_7-37454ac57045ef55b82f07caf46db888.jpg?auto=compress,format"></p>
<p>While data scientists also write a lot of code, as I mentioned, it often serves to explore and try out ideas. So, you
might come up with a version 1, which didn’t quite do what you expected, then you have a version 2 that leads to
versions 2.1 and 2.2 before you stop working on this approach, and go to versions 3 and 3.1. At this point you realize
that if you take some ideas from 2.1 and 3.1 you can actually get a better solution, leading to versions 3.3 and 3.4,
which is the optimal solution.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/772d7f13e178929e86cecb250cf80201c1c303aa_image_8-13adf99fda3e90256c06866fe34a74b0.jpg?auto=compress,format"></p>
<p>The interesting thing is that you would actually want to keep all those dead ends because you might need them at some
later point. You might also put some of the things that worked well back into a growing toolbox, something like your own
private machine learning library, over time. While developers are interested in removing “dead code“ (also because they
know that you can always retrieve that later on, and they know how to do that quickly), data scientists often like to
keep code, just in case.</p>
<p>Both of these differences mean, in practice, that the collaboration between developers and data scientists is often
challenging. Standard software engineering practices don’t really work out for data scientists’ exploratory work mode
because the goals are different. Introducing code reviews and an orderly branch, review, and merge back workflow would
just not work for data scientists and slow them down. Likewise, applying this exploratory mode to production systems
also won’t work.</p>
<p>So, how can we structure the collaboration to be most productive for both sides? A first reaction might be to keep the
teams separate—for example, by completely separating the codebases and having data scientists work independently,
producing a specification document as outcome that then needs to be implemented by the developers. This approach works,
but it is also very slow and error prone because reimplementing may introduce errors, especially if the developers are
not familiar with data analysis algorithms, and performing the outer iterations to improve the overall system depends on
developers having enough capacity to implement the data scientists’ specifications.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/104e243dec0a9b578008e9be8fbba09300cc5375_image_9-353987af53451388407242d26044c417.jpg?auto=compress,format"></p>
<p>Luckily, many data scientists are actually interested in becoming better software engineers, and the other way round, so
we have started to experiment with modes of collaboration that are a bit more direct and help to speed up the process.</p>
<p>For example, data science and developer code bases could still be separate, but there is a part of the production system
that has a clearly identified interface into which the data scientists can hook their methods. The code that
communicates with the production system obviously needs to follow stricter software development practices, but would
still be the responsibility of the data scientists. That way, they can quickly iterate internally, but also with the
production system.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4e545bf4fe221c0c2b38d691660b080dc4a01e4e_image_10-0ab0ebbaa54c9acd93504974ee8c1ef0.jpg?auto=compress,format"></p>
<p>One concrete realization of that architecture pattern is to take a microservice approach and have the ability in the
production system to query a microservice owned by the data scientists for recommendations. That way, the whole pipeline
used in the data scientist’s offline analysis can be repurposed to also perform A/B tests or even go in production
without developers having to reimplement everything. This also puts more emphasis on the software engineering skills of
the data scientists, but we are increasingly seeing more people with that skill set. In fact, we have lately changed the
title of data scientists at Zalando to “research engineer (data science)” to reflect this fact.</p>
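<p>One hypothetical shape for such a data-scientist-owned recommendation microservice, using only the Python standard library. The route, port, and "model" format below are invented for illustration; the scoring logic is kept in a plain function so it can be iterated on and tested separately from the HTTP plumbing:</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the artifact produced offline ("a bunch of numbers"):
# here simply a precomputed map from user id to ranked item ids.
MODEL = {"u1": ["hoodie", "cap"], "u2": ["sneaker"]}

def recommendations_for(user_id, top_n=10):
    """Pure lookup/scoring logic, separate from the web layer."""
    return MODEL.get(user_id, [])[:top_n]

class RecsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical path shape: /recommendations/<user_id>
        user_id = self.path.rstrip("/").split("/")[-1]
        body = json.dumps({"user": user_id,
                           "items": recommendations_for(user_id)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocking):
# HTTPServer(("localhost", 8080), RecsHandler).serve_forever()
```

<p>Swapping in a retrained model then only touches the data science side of this boundary, while the production system keeps querying the same interface.</p>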
<p>With an approach like this, data scientists can move fast, iterate on offline data, iterate in a production setting, and
the whole team can migrate stable data analysis solutions into the production system over time.</p>
<h3>Constantly adapt and improve</h3>
<p>So, I’ve outlined the typical anatomy of an architecture to bring data science into production. The key concept to
understand is that such a system needs to constantly adapt and improve (as almost all data-driven projects working with
live data). Being able to iterate quickly, trying out new methods, and testing the results on live data in A/B-tests is
most important.</p>
<p>In my experience, this cannot be achieved by keeping data scientists and developers separate. At the same time, it’s
important to acknowledge that their working modes are different because they follow different goals—data scientists are
more exploratory and developers are more focused on building software and systems.</p>
<p>By allowing both sides to work in a fashion that best suits these goals and defining a clear interface between them, it
is possible to integrate the two sides so that new methods can be quickly tried out. This requires more software
engineering skills from data scientists, or at least support by engineers who are able to bridge between both worlds.</p>App Migration to Swift 32017-01-06T00:00:00+01:002017-01-06T00:00:00+01:00iOS Guild Helsinkitag:engineering.zalando.com,2017-01-06:/posts/2017/01/app-migration-to-swift-3.html<p>We’re happy to share our experiences with teams needing to migrate their own apps.</p><p>We are a team of 10 iOS developers located in Zalando’s <a href="https://tech.zalando.com/locations/#helsinki">Helsinki hub</a>,
working on a Zalando iOS application called fleek: a fashion e-commerce app that connects mobile-savvy consumers with
brands, retailers, and influencers. To ensure our app was ready for the latest release of Xcode 8/iOS 10, we started to
plan our code migration to the new version of <a href="https://developer.apple.com/swift/">Swift</a>. We wanted to share our
experience and tips for other teams needing to migrate their code.</p>
<h3>Plans and preparation</h3>
<p>Our plan was to make sure all of our dependencies were ready for Swift 3 and upgrade one of our key dependencies, the
internal Zalando Payment SDK, to Swift 3. After all dependencies were ready, we had to think of a way to migrate every
single file without having merge conflicts and without repeating the same work. This could prove difficult as we are a
big team.</p>
<p>The aim was to get this done as quickly as possible to avoid delays in feature delivery, so everyone on the team was
needed. Since our project has more than 450 files, the problem we faced was how to divide up the work. After a quick
brainstorming session, we asked our DevOps and command line guru to label all the source code files based on the last
commit a developer had made. He came up with this:</p>
<div class="highlight"><pre><span></span><code><span class="nx">find</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="o">-</span><span class="k">type</span><span class="w"> </span><span class="nx">f</span><span class="w"> </span><span class="o">-</span><span class="nx">name</span><span class="w"> </span><span class="s">"*swift"</span><span class="w"> </span><span class="o">-</span><span class="nx">exec</span><span class="w"> </span><span class="nx">git</span><span class="w"> </span><span class="nx">log</span><span class="w"> </span><span class="o">-</span><span class="nx">n</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">--</span><span class="nx">pretty</span><span class="p">=</span><span class="nx">format</span><span class="p">:</span><span class="s">"%an ,"</span><span class="w"> </span><span class="p">{}</span><span class="w"> </span><span class="err">\</span><span class="p">;</span><span class="w"> </span><span class="o">-</span><span class="nx">exec</span><span class="w"> </span><span class="nx">echo</span><span class="w"> </span><span class="p">{}</span><span class="w"> </span><span class="err">\</span><span class="p">;</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">tee</span><span class="w"> </span><span class="o">~/</span><span class="nx">Desktop</span><span class="o">/</span><span class="nx">result</span><span class="p">.</span><span class="nx">csv</span>
</code></pre></div>
<p>The results of this file served as a great starting point for us to take on the challenge.</p>
<h3>Getting to work</h3>
<p>We started by creating a swift-3 branch for the migration, with one person running the Swift 3 migration tool. Once this
was completed, we updated our podfile with all Swift 3 dependencies and pushed to <a href="https://git-scm.com/">Git</a>.</p>
<p>Using the document we previously created, we shared it and marked which files were under conversion or ready to go. When
someone had completed all of their allocated files, they would talk to the rest of the team and decide where to help
out. As more of the project compiled successfully, some of the already converted files had to be revisited; team members
would simply mark the files being worked on again in the file list.</p>
<p>When all files were converted and the app would compile again, we used the same document to mark who was fixing crashes
and in which files. Soon we were using more of our time for testing and less for fixing crashes or addressing missing
functionality.</p>
<p>Here are a couple of the issues we came across:</p>
<ul>
<li><em>AnyObject</em> to <em>Any</em>. Many APIs changed from AnyObject to Any, which need to be revised by hand along with checking
the documents.</li>
<li>Closures were being typecasted, for example: <em>{ (text: String) -> [String: AnyObject] in return [text:
randomObject] }</em> Was converted to: <em>{ (text: String) -> [String: AnyObject] in return [text: randomObject] }
as! [String: Any]</em> This generated app crashes.</li>
<li>Code inside closures was almost never converted</li>
<li><em>private</em> was changed to <em>fileprivate</em> incorrectly</li>
<li>Tons of Foundation types didn’t migrate at all, for example: <em>NSURL</em> and <em>NSMutableURLRequest</em></li>
<li>Enum cases were migrated only in one file</li>
<li>Working with String characters had to be reviewed. The API had changed a lot</li>
</ul>
<p>Before you start manual migration to Swift 3, we recommend the following steps:</p>
<ul>
<li>Do search-and-replace fixes. For example: UIControlState() to .normal, _ map: map to map: map</li>
<li>Activate the “continue build after errors” Xcode setting</li>
</ul>
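<p>Mechanical fixes like these are easy to script so the whole tree is treated consistently. A rough sketch follows; the replacement table is illustrative only and each pattern should be reviewed before running it on a real codebase:</p>

```python
import re
from pathlib import Path

# Illustrative subset of mechanical Swift 3 fixes; extend as needed,
# and review each pattern -- blind replacement can break valid code.
REPLACEMENTS = [
    (re.compile(r"UIControlState\(\)"), ".normal"),
    (re.compile(r"\bNSURL\b"), "URL"),
]

def apply_fixes(source):
    """Apply every replacement pattern to one file's source text."""
    for pattern, replacement in REPLACEMENTS:
        source = pattern.sub(replacement, source)
    return source

def migrate_tree(root):
    """Rewrite every .swift file under root in place."""
    for path in Path(root).rglob("*.swift"):
        path.write_text(apply_fixes(path.read_text()))
```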
<p>While migrating, we recommend that you try to avoid fixing bugs and refactoring, as you can come back to this task
later. You’ll also need to remember to migrate your CI system.</p>
<h3>Ready to go</h3>
<p>After almost 2 weeks of non-stop code conversion, our code finally started to compile and run. By the time we had
finished, we had experienced a huge learning curve with Swift 3 and the new Apple APIs.</p>
<p>We’re happy to share our experiences with teams needing to migrate their own apps. Hopefully the above information will
prove useful if your team is facing the same task.</p>Sapphire Deep Learning Upskilling2017-01-04T00:00:00+01:002017-01-04T00:00:00+01:00Dr. Ana Peleteiro Ramallotag:engineering.zalando.com,2017-01-04:/posts/2017/01/sapphire-deep-learning-upskilling.html<p>Read about how our Dublin Tech Hub balances delivery with data science upskilling.</p><p>At Zalando’s <a href="https://tech.zalando.com/locations/#dublin">Fashion Insights Centre in Dublin</a>, we work in autonomous data
science delivery teams. This means that each team has the responsibility to deliver technology from research work and
MVPs, through to production code and operational systems. This gives us a great opportunity to make decisions about how
we organise our work so that we can balance the investment of effort in developing new data science solutions and
necessary experimentation work, as well as production-ready code and maintenance of a live system.</p>
<p>For the last year, our team has been working on developing products that help derive insights from and make sense of
unstructured web content, specifically focusing on the fashion domain. We primarily work with HTML data and text which
necessitates the use of Natural Language Processing (NLP) and Machine Learning (ML). One of the challenges we face is
keeping up with the state of the art and balancing delivery with data science upskilling.</p>
<h3>Deep Learning For Natural Language Processing (NLP)</h3>
<p>As a data science delivery team with core expertise in NLP, an area that we had been tracking was the application of
deep learning in NLP. Deep learning is having a transformative impact in many areas where machine learning has been
applied. The earliest mature adoption has happened in areas where unstructured content can be more accurately
classified or labelled for tasks such as speech recognition and image classification. One of the reasons for the success
in these areas has been the ability of deep nets to learn an optimal feature space and reduce the time spent on the dark
art of feature engineering.</p>
<p>In previous years, NLP was somewhat behind other fields in terms of adopting deep learning for applications. Text does
not have a spatial feature space suitable for convolutional nets, nor is it entirely unstructured as it is already
encoded via commonly understood vocabulary, syntax and grammar rules and other conventions. However, this has changed
over the last few years, thanks to the use of RNNs, specifically LSTMs, as well as word embeddings. There are distinct
areas in which deep learning can be beneficial for NLP tasks, such as in named entity recognition, machine translation
and language modelling, just to name a few.</p>
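<p>To see why word embeddings help: each word becomes a dense vector, and geometric closeness stands in for semantic closeness, which downstream models such as an NER tagger can exploit. The tiny hand-made vectors below are purely illustrative; real embeddings have hundreds of dimensions and are learned from large corpora:</p>

```python
import math

# Toy 3-dimensional "embeddings" -- illustrative values only.
EMBEDDINGS = {
    "dress": [0.9, 0.1, 0.0],
    "skirt": [0.8, 0.2, 0.1],
    "gpu":   [0.0, 0.1, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm

def nearest(word):
    """Most similar other word under cosine similarity."""
    return max((w for w in EMBEDDINGS if w != word),
               key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))
```

<p>Here "dress" lands nearest to "skirt", not "gpu", even though the model has never been told what either word means.</p>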
<h3>Upskilling Considerations in Data Science Delivery</h3>
<p>In early August, one of our colleagues attended the <a href="http://acl2016.org/">Association for Computational Linguistics (ACL)
conference</a> in Berlin to better understand the benefits and readiness for adoption of deep learning
in our team. As data scientists who have all spent time in academia, it is great to work in a company that not only
allows you, but encourages and sponsors you to attend conferences to keep up to date with the latest research
approaches and technologies. When our colleague returned, he confirmed what we already knew: deep learning has become
the state of the art for many NLP tasks, including one we are particularly interested in, Named Entity Recognition
(NER).</p>
<p>At Zalando, we strive for excellence, and as researchers, we want to keep up with the state of the art. If our results
could be better by using deep learning, we should test it, and if successful, adopt it. However, there was one initial
obstacle: even if we had previous knowledge, none of us were experts in deep learning.</p>
<p>Contrary to academia, in a delivery team we have to take into account some considerations aside from the state of the
art, such as:</p>
<ul>
<li><strong>Tooling:</strong> Mature libraries with an active developer community, significant adoption, ideally backing from
companies or open source</li>
<li><strong>Performance:</strong> Scaled and tried in production</li>
<li><strong>Ease-of-development:</strong> Cost/time-effective to bring from problem-setting to evaluation to deployment</li>
<li><strong>Validation of application:</strong> Has this approach demonstrated sufficient improvements over longer-established
baseline systems? These benefits might include accuracy, ease of maintenance, or time to delivery</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4465118368f8c6d8a165a1a9591bbcba0823a52c_upskilling1.png?auto=compress,format"></p>
<h3>The Upskilling Plan</h3>
<p>One of the challenges facing autonomous teams is making decisions to determine the what, how and when of invested time
in upskilling. Upskilling is unanimously agreed to have a positive long-term effect for products and for an individual’s
growth. However, spending time that could alternatively be spent on data science and engineering can often be a
difficult pill to swallow, even guilt-inducing. We have a <a href="https://tech.zalando.com/working-at-z/tour-of-mastery/">Tour of
Mastery</a> upheld as a core principle at Zalando, and a Practice
Lead who, through regular interaction and evaluation, helps our team prioritize competencies to improve and sources
courses, training and other materials.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ff45d61930a79a9f4e22074b39113206c50ee481_upskillingimage2.png?auto=compress,format"></p>
<p>It looked like our roadmap for the quarter would contain some development of NLP models, improving the data quality of
the initial systems we had developed previously. We wanted to dedicate time to better plan for this and communicate
effectively with the product and delivery sides on what we felt was the best approach, as well as our capacity to do so.
None of us had extensive experience in deep learning to date so we decided to sharpen the axe.</p>
<p><strong>Get Time</strong>
We ensured that guaranteed study time would be dedicated accordingly during our weekly planning. We chose three hours on
Tuesday afternoons, approximately in the middle of our weekly cadence, for a period of six weeks. This was key, as
otherwise the closest task takes your attention, and less urgent tasks get long-fingered. As well as communicating with
each other, we also made sure that product stakeholders and leads were all aware of and understood the benefits of this
initiative, and how it contributes to our capability to deliver state of the art data science.</p>
<p>We used this group as a way of collecting common thoughts about the papers we were reading, comparing practical
exercises and engaging in knowledge sharing on mutually interesting topics. Even beyond the initial six weeks, we’ve
still been using this slot to catch up on papers or articles we have read during the week.</p>
<p><strong>Compile Resources</strong>
We compiled a large list of courses, tutorials and of course (lots of!) papers. Taking the time to compile these
resources means we have now built a repository of knowledge that will help anyone in the company wanting to upskill in
deep learning for NLP.</p>
<p><strong>Choose a Course</strong>
We aimed to choose a course that balanced theoretical and practical understanding, eventually choosing <a href="https://www.udacity.com/course/deep-learning--ud730">Deep Learning by
Google</a> on Udacity. We felt it balanced coverage and depth,
theoretical and hands-on. It also included lessons on word embeddings and sequential learning (RNNs, LSTM). The videos
are quite quick (if there is one complaint, it is that they are a bit too shallow), but they give you the basic
intuition required to start doing your own research.</p>
<p>The best thing about the course without any doubt was the practical exercises using
<a href="https://www.tensorflow.org/">TensorFlow</a>, an open source deep learning library by Google. It has been gaining
popularity since it provides better support for distributed systems than
<a href="http://deeplearning.net/software/theano/">Theano</a>. Moreover, the documentation is very good, and once you get used to
the structuring and how it works, it is easy to architect your own neural nets and design and execute your own
experiments.</p>
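<p>Even before opening TensorFlow, the core training loop it automates – define a model, measure its error, nudge the
parameters along the gradient – can be sketched in a few lines of plain Python. The toy dataset, learning rate and
epoch count below are our own illustrative choices, not material from the course:</p>

```python
# Toy illustration of the training-loop idea that frameworks like
# TensorFlow automate: fit y = 2x + 1 with a single linear unit,
# using manually derived mean-squared-error gradients.
data = [(x, 2 * x + 1) for x in range(-5, 6)]  # illustrative toy dataset

w, b = 0.0, 0.0   # parameters to learn
lr = 0.01         # learning rate

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y               # prediction error
        grad_w += 2 * err * x / len(data)   # d(MSE)/dw
        grad_b += 2 * err / len(data)       # d(MSE)/db
    w -= lr * grad_w   # gradient descent step
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # → 2.0 1.0 (the true slope and intercept)
```

<p>A framework does exactly this at scale: it derives the gradients automatically and runs the updates on batches of
data, but the intuition is the same.</p>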
<p><strong>Narrow and deeper</strong>
As we mentioned before, we are a team working intensively with NLP, so we focused quite a lot on NLP resources as well.
Therefore, in parallel, we also started following the <a href="https://www.youtube.com/playlist?list=PLCJlDcMjVoEdtem5GaohTC1o9HTTFtK7_">NLP
Stanford</a> classes, which are very complete and
useful, and quite enjoyable since they provide an in-depth mathematical and applications background for deep learning in
NLP. We totally recommend it! Also very interesting, you can read <a href="http://lxmls.it.pt/2014/socher-lxmls.pdf">this
tutorial</a> by Socher or check the <a href="http://cs224d.stanford.edu/syllabus.html">Stanford NLP
materials</a>.</p>
<p><strong>Read papers, papers, and more papers!</strong>
What is the current research consensus? What are our reference papers? What is the best approach for our problem? During
this entire process we read a long list of interesting papers on <a href="http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf">Deep Learning in
general</a>, and also focused on
<a href="http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf">NLP</a> (more
<a href="http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf">here</a>, also on <a href="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf">recurrent neural network based
language models</a>, or
<a href="http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf">distributed representations of words, phrases and their
compositionality</a>).
On the NLP side, we focused on word embeddings and RNNs (some interesting resources in <a href="https://github.com/kjw0612/awesome-rnn#lectures">this public knowledge
repo</a>). Reading classic and state of the art papers is a key part of
our jobs, since data scientists need to be up to date on the latest practices in our domain. We ultimately converged on
several papers that we would use as reference papers and developed a consensus amongst ourselves on tools and state of
the art approaches.</p>
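<p>The common thread in the embedding papers above is that words become dense vectors whose geometry encodes semantic
similarity, usually measured with cosine similarity. A minimal sketch, with hand-made three-dimensional vectors
invented purely for illustration (not the output of word2vec or any trained model):</p>

```python
import math

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, near 0.0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hand-made 3-d "embeddings" (dimensions loosely: royalty, person, fashion).
# Purely illustrative values, not learned from any corpus.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.15],
    "shoe":  [0.05, 0.1, 0.9],
}

print(cosine(vectors["king"], vectors["queen"]))  # high: related words
print(cosine(vectors["king"], vectors["shoe"]))   # low: unrelated words
```

<p>Real embeddings have hundreds of dimensions learned from text rather than three hand-picked ones, but the similarity
computation is exactly this.</p>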
<p><strong>Get hands on</strong>
This step is very important: get dirty and play with code. There is plenty of public domain code, models and data with
which you can get started. Try to solve different problems with different architectures and you’ll see that it helps
give you a better understanding and intuition of how to develop your own deep nets, and also about your approach to
tuning those networks.</p>
<h3>What We Learned</h3>
<p>We found our approach far preferable to ad-hoc reading or randomly playing with a library. Forming the
theoretical knowledge, the intuitive understanding and ability to apply what we’ve learned are all key to solving new
data science problems on the front line of delivery. It separates the effective data scientists from the rest.</p>
<p>In summary, our learnings from our upskilling planning would be to plan in advance for the quarter, carefully select the
resources you will use to upskill, commit regular time, and be enthusiastic about learning. This has been a success for
our team, and has left us ready to apply all of our theoretical and practical knowledge for the quarter to come. If
you’d like some more information about how this approach can benefit your own data science team, reach out to us via
Twitter at <a href="https://twitter.com/PeleteiroAna">@PeleteiroAna</a> or <a href="https://twitter.com/adambermingham">@adambermingham</a>.</p>Our Android App wins Editor’s Choice in the Google Play Store2016-12-29T00:00:00+01:002016-12-29T00:00:00+01:00Rushil Davetag:engineering.zalando.com,2016-12-29:/posts/2016/12/our-android-app-wins-editors-choice-in-the-google-play-store.html<p>We’re incredibly proud to share this tremendous feat on behalf of the fashion e-commerce industry.</p><p>Our mobile team has recently learned of some exciting news: Our <a href="https://play.google.com/store/apps/details?id=de.zalando.mobile">Zalando Fashion Store App for
Android</a> has been awarded the prestigious Editor’s
Choice badge in the Google Play Store – a feat that we’re incredibly proud to share on behalf of the fashion e-commerce
industry. The Zalando Fashion Store app is the only fashion e-commerce app at the moment in the Editor’s Choice listing,
which includes the Top 150 apps across the globe from various digital industries. The Editor’s Choice badge and listing
appear on the Google Play Store for all countries the app is available in.</p>
<p>The award is carefully curated by the Google Play Store team, selecting apps that follow Android best practices. Not
only that – the app that receives this badge must have best-in-class customer experience as well as very high Android
app quality ratings. The ratings are derived from various app metrics that indicate user satisfaction, engagement, as
well as technical quality.</p>
<p>The Zalando Fashion Store app has performed very well in all the above-mentioned metrics continuously since its
launch. For example, we have maintained a user rating of at least 4.3 out of 5.0 at any given time since the app
launched, which puts it above the industry benchmark and says a lot about user experience and app quality. These
aspects made our Android app eligible for the Editor’s Choice feedback process, where we started adapting
to Android best practices to receive the badge. How did we do it? Let’s look at some of the highlights:</p>
<h3>Have material design</h3>
<p>Google launched their material design language and related concepts in 2014 to make greater use of grid-based layouts,
animations and transitions, as well as depth effects. Material design made it possible for apps to not only enhance
their design, but also improve the user experience. We adapted material design aspects in various forms such as depth
effects, touch feedback, floating action buttons (FAB), and meaningful transitions. We experimented by A/B testing
certain aspects to see if they provided a better experience for our users. Navigation elements combined with meaningful
transitions and animations helped us guide our users to the most important areas in our app. We also used bigger and
better imagery to deliver a superior fashion experience in the app.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/619ec9419d2b62b308107590c10c1f54eb41cd9b_screenshotandroid1.png?auto=compress,format"></p>
<h3>Implement latest Android features</h3>
<p>To enhance the seamless and cross-platform shopping experience for our users, we implemented the latest Android features
such as Smart Lock, Android Wear, and enhanced the experience on tablets. Smart Lock allows our users to share login
credentials across their Android smartphone, tablet, and the web, while remaining highly secure. These features gave us
an added advantage in reducing friction between platforms and enhancing the customer experience across all of our
platforms.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/799f506df56e133b77b3b3ed092ad4bd0a19f81a_screenshotandroid2.png?auto=compress,format"></p>
<h3>Make it trendy</h3>
<p>We are becoming Europe’s biggest online fashion platform and it’s very important that we convey the latest trends in
attractive ways to our customers. We used the idea of striking imagery from material design guidelines and took it even
further to make our images highly relevant, informative, and delightful. We redesigned our home screen experience with
these images to deliver the latest fashion trends in different formats such as images, videos, lookbooks, and carousels.
Aside from the home screen, we made sure that our users would see fashion articles in a way that suggests the look, the
fit, and any relevant outfit that could go with the article itself.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b04890f0001626cca90e3b7f7900b7f1e7b0925a_screenshotandroid3.png?auto=compress,format"></p>
<h3>Get the UX right</h3>
<p>Getting the UX right was another important aspect, especially as we have diverse user groups to serve. We made sure that
we had the most relevant navigational elements for discovery, shopping, and convenience put on the home screen. It was
also crucial to manage deep navigational hierarchies, e.g. with article recommendations, when rethinking the app
navigation concept. Access to the most important actions and making them easy to find for indecisive users were key
elements we covered. Feedback messages for user actions helped us achieve an improved user experience and better
conversion. During the whole process we paid attention to various details of the app and to guiding users through a
smooth and joyful fashion experience.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e2f4f88c5183f72b2d613f3fdae5dffa10efde7f_screenshotandroid4.png?auto=compress,format"></p>
<p>There were, of course, many other small changes we made to make sure we delivered a superior app experience by following
Android best practices. The app reviews carried out by the Google Play Store team provided us with great feedback for
improvements. It was also important and incredibly helpful to maintain a healthy partner relationship with Google for
continuous follow-up and feedback.</p>
<p>I would like to congratulate and thank the Android app, design, and product teams who made it all possible and achieved
this great feat. I would also like to thank the Google Play Store team and our Partner Manager from Google, Maxim Mai,
for their continuous support and feedback during the whole process.</p>
<p>At Zalando, we always put ourselves in our customer’s shoes to improve our products, incorporating as well as innovating
our best practices to produce the ultimate customer experience. It’s encouraging to receive such a prestigious award.
This is just the beginning for our Android offering!</p>Top 5 Career Tips of 2016: UX and Beyond2016-12-28T00:00:00+01:002016-12-28T00:00:00+01:00Jay Kaufmanntag:engineering.zalando.com,2016-12-28:/posts/2016/12/top-5-career-tips-of-2016-ux-and-beyond.html<p>We’ve been drawing from our hiring experiences this past year to deliver useful career advice.</p><p>In 2016, Zalando UX started sharing tips and tricks for designers and researchers looking for work. We’ve been drawing
from our hiring experiences this past year to deliver practical, useful career advice -- and published this on
<a href="http://www.uxswitch.com/">UXswitch</a>, LinkedIn and the Zalando Tech Blog. Much of the advice -- though grounded in UX --
is broadly applicable beyond just researchers and designers.</p>
<p>Here are five hand-picked favorites plus five alternatives -- for a total of ten tidbits.</p>
<h3>1. Most popular advice: UCD your CV</h3>
<p>Apply user experience methods and principles when you create your résumé: Design for the user, structure information,
apply design principles and test. <a href="https://www.linkedin.com/pulse/ux-your-cv-jay-kaufmann">Read more...</a></p>
<p><strong>Runner-up for most popular:</strong> With almost as many clicks and likes, our request for work samples from team lead
candidates -- <a href="https://www.linkedin.com/pulse/visualize-leadership-ux-managers-portfolio-jay-kaufmann">Visualize leadership: The UX manager’s
portfolio</a> -- made a strong
showing.</p>
<h3>2. Most reassuring advice: Don’t take rejection personally</h3>
<p>At Zalando, we hire roughly 1% of UX applicants. And if the numbers aren’t much consolation, understand that it’s not
about you -- but rather about finding the right fit. <a href="https://www.linkedin.com/pulse/rejection-its-you-jay-kaufmann">Read
more...</a></p>
<p><em>Least</em> <strong>reassuring advice:</strong> A hiring manager or recruiter might spend just 3 minutes reviewing your application. We
implied this in the <a href="https://www.linkedin.com/pulse/ux-your-cv-jay-kaufmann">UX of your CV</a> article, and can confirm --
as a sort of hard-love year-end holiday present -- that we generally spend up to 7 minutes on the first screening.</p>
<h3>3. Simplest advice: Make your portfolio link visible</h3>
<p>Give that call to action some information scent. Make sure we don’t need to go on a treasure hunt for your portfolio
link. <a href="https://www.linkedin.com/pulse/dear-designers-make-your-portfolio-link-pop-jay-kaufmann">Read more...</a></p>
<p><strong>Runner-up for simplicity:</strong> “Make your CV scannable by using meaningful line breaks” from our <a href="https://tech.zalando.com/blog/student-cvs-for-ux-careers-tips--tricks/">career tips for HCI
students</a>.</p>
<h3>4. Most contrary advice: Revive the cover letter</h3>
<p>Buck the trend. Ignore recruiter advice. Include a cover letter -- and make a personal connection by sharing your
motivation for the opportunity at hand. <a href="https://www.linkedin.com/pulse/reanimating-heart-cover-letter-jay-kaufmann">Read
more…</a></p>
<p><strong>Runner-up for contrary advice:</strong> Researchers don’t generally submit a portfolio. That’s why you should. Stand out with
a <a href="http://www.uxswitch.com/portfolio-advice-for-a-ux-researcher/">UX research portfolio</a>.</p>
<h3>5. Most meaningful advice: Bring your values to your job search</h3>
<p>Identifying your personal starting point -- your own core values -- will guide you in engaging with potential employers.
<a href="https://www.linkedin.com/pulse/start-your-job-search-within-5-simple-steps-orient-direct-kaufmann">Read more…</a></p>
<p><strong>Runner-up for meaningful advice:</strong> Finding the right balance between “I” and “we” when you talk about your work is
quite personal and tied into our own egos and emotional constitution. We advise you to <a href="https://www.linkedin.com/pulse/interviews-own-your-work-i-language-jay-kaufmann">own your work with “I”
language</a> without forgetting to credit
teammates liberally, as well.</p>
<p>We look forward to continuing the conversation in 2017. We’re looking for a lot of great <a href="https://tech.zalando.com/jobs/69975-ux-interaction-designer-senior-or-principal/">Interaction
Designers</a> in the new year, primarily
for building best-in-class employee-facing software interfaces and brand-facing B2B solutions. But also for stellar
shapers of the B2C experience for millions of customers in our flagship e-commerce portal. Reach out to me via Twitter
at <a href="https://twitter.com/jaykaufmann">@jaykaufmann</a> or via <a href="https://www.linkedin.com/in/jaykaufmann">LinkedIn</a> for more
information.</p>Zalando and the Docker Global Mentor Week2016-12-27T00:00:00+01:002016-12-27T00:00:00+01:00Jan Stroppeltag:engineering.zalando.com,2016-12-27:/posts/2016/12/zalando-and-the-docker-global-mentor-week.html<p>Docker had its first Global Mentor Week this year and Zalando Dortmund joined the fun.</p><p>Docker declared the week of 14th to 20th of November 2016 to be the first Global Mentor Week, with the incredibly
ambitious goal of providing those interested with self-paced tutorials, wherever they come from. To set up trainings
worldwide, Docker used its strong community and engaged the organizers of all Docker Meetup Groups -- over 250 worldwide
-- to help arrange mentoring sessions in their part of the world.</p>
<p>As Zalando Dortmund organizes the <a href="http://www.meetup.com/de-DE/Docker-Dortmund/">Docker Meetup Group Dortmund</a>, we
coordinated with the other groups in the Ruhr Area, <a href="https://www.meetup.com/de-DE/Docker-Bochum/">Bochum</a> and
<a href="https://www.meetup.com/de-DE/docker-dus/">Düsseldorf</a>, to organize one event with mentors from all three groups,
which was scheduled for November 17th in Düsseldorf. With this approach, we prevented overlapping events and were able
to have <a href="https://www.meetup.com/de-DE/docker-dus/events/234915297/">one big session</a> for the initiative.</p>
<p>Docker provided us with their typical and very welcomed Swag Packs (T-Shirts, stickers etc.) and their training
material, as well as sharing valuable tips on planning the evening and preparing the participants.</p>
<p>The training material consisted of <a href="https://training.docker.com/category/self-paced-online">five self-paced courses</a>,
which give a good overview of the whole Docker environment. Every course ends with an online quiz and a certificate
upon successful completion. The courses are described below in detail:</p>
<ol>
<li>Beginner Linux Container: Set up your system, run your first ‘Hello World’ container, run your first web application
in a container, create your first Dockerfile, and build your first image</li>
<li>Developer Intermediate Linux/Windows: This course consists of networking between containers, managing/persisting data,
security scanning, organizing repositories in the Docker Hub, and automated builds and tests using Docker Hub and
Docker Cloud</li>
<li>Operations Beginner: Set up an application consisting of different services, a web UI and databases using Docker
Compose and scale it out in one instance. Set up a local registry for this app. Use Docker Swarm to set up a cluster
with five nodes containing this application</li>
<li>Operations Intermediate: A deeper dive into swarm clusters, exploring networks, rolling service updates, centralized
logging, metric collection, and stateful services</li>
<li>Beginner Windows Containers: Setting up your environment, running basic containers, and creating a Docker Compose
multi-container application using Windows containers</li>
</ol>
<p>The mentoring session was attended by approximately 35 participants, ranging from total newbies to experienced
Docker users, and everybody found a course suited to their skills. It started with a short introduction on the idea
behind this event and how to prepare the laptops, and after a break for dinner, everyone started hacking. As a special
treat, our friends from Bochum brought along a cluster of three Raspberry-Pi-3 computers to build a swarm using Docker
on ARM.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/955de04680b6dc63908d3c35983ce73ecee4d7f6_mentorweek.jpg?auto=compress,format"></p>
<p>The courses were almost self explanatory, so the main tasks of the mentors were to answer questions beyond the exercises
and give tips on different problems the participants had in the past using Docker. A few topics that came up:</p>
<ul>
<li>Can you access data from a Docker volume on the host? You can: create a container that mounts the
volume, then copy the needed data from that container to the host using ‘docker cp’</li>
<li>Can you restrict access to an exposed port by protocol or for incoming/outgoing connections? I wasn’t sure
about a built-in solution from Docker, but a colleague suggested using iptables for this</li>
<li>Is there an ideal way to deploy a Dockerized application in a cloud-based environment? Most probably not, but I
described briefly the way we deploy our applications at Zalando on AWS using our open source
<a href="https://stups.io/">STUPS</a> tools</li>
</ul>
<p>After three hours, we engaged in a Q&A session to hear everyone’s experiences with the courses, winding down the evening
with a beer or two.</p>
<p>Our mentoring session and the whole Docker Mentor Week was a great success. We received a lot of positive feedback and
not only did the participants learn so much, but I also benefitted greatly from preparing for the event. Just one
example: During my preparation I tried out the native Docker for Windows solution, together with the Docker client for
Linux installed within the Linux Subsystem for Windows (Bash on Windows). You still need Hyper-V to virtualize the
Docker engine environment and needed kernel features, but with the Linux-based CLI it worked astonishingly well.</p>
<p>Altogether there were more than 110 events on 5 continents with over 500 mentors, 7500+ RSVPs, and more than 1000
certificates distributed: Incredible numbers for a truly global event.</p>
<p>This was the first cooperation of the three Ruhr Area Docker Groups and hopefully not the last. We’re currently planning
a combined event to celebrate Docker’s 4th birthday next March. If you’re interested in attending, contact me on Twitter
at <a href="https://twitter.com/jans0510">@jans0510</a>.</p>The Finish Line – Hack Week #5 Awards and More!2016-12-23T00:00:00+01:002016-12-23T00:00:00+01:00Zalando Technologytag:engineering.zalando.com,2016-12-23:/posts/2016/12/the-finish-line--hack-week-5-awards-and-more.html<p>It's a wrap on this year's Hack Week – see which projects were praised and rewarded!</p><p>Our annual Hack Week has just wrapped up at Zalando, with over 100 projects being worked on across four tech hubs. We’ve
had a generous amount of hardware, software, and knowledge-sharing ideas being brainstormed and experimented with – but
which were worthy of our Hack Week accolades? Read on to find out.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2ff8fd2db2d04dc3a839c12213b84ece69a3f325_16_053_zalandomaxthrelfallphoto-_xth5342.jpg?auto=compress,format"></p>
<h3>Hack Week Quick Facts</h3>
<p>So how did Hack Week turn out overall? In a nutshell:</p>
<p><strong>Countries participating:</strong> Three – Germany (with two locations), Ireland, and Finland
<strong>Award Categories for 2016:</strong> 10, including the infamous Duke Nukem Forever award for ultimate failure
<strong>Hours spent hacking:</strong> Over 76 hours
<strong>Number of projects:</strong> More than 150 at the starting line
<strong>Amount of Post-It Notes used:</strong> *still counting*
<strong>Bottles of Club Mate consumed:</strong> ∞
<strong>Motivation level:</strong> x10,000</p>
<p>At the culmination of the week we held our Project Fair, where all Hack Week teams presented their work to the rest of
the Zalando Tech community, showing off their product ideas and MVPs. 45 projects made it to the Project Fair stage,
pitching their work to various jury members of our Hack Week awards team, as well as impressing their fellow employees.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a3eb4fc5f608dc46dcec31dd2d2c78a100e202c5_pfhackweek1.jpg?auto=compress,format"></p>
<p>Once the Project Fair finished up, jury members had the chance to collect their thoughts and feedback – which projects
impressed them most? Which teams provided the most impact? Drumroll please: Here are our Hack Week award winners!</p>
<h3>Awards, Awards, Awards</h3>
<p><strong>CUSTOMER JOY – Award for the project that solves an existing problem for our customers:</strong> <em>Friendz & Fashion!</em> A
Tupperware-style party for Zalando items and customers, connecting them with stylists via the Zalando platform.</p>
<p><strong>DO. – Award for the project that increases corporate social responsibility:</strong> <em>zBits!</em> A better way for Zalando to
track the way they do their bit by developing a digital currency for social good within the organization.</p>
<p><strong>INCLUSION NINJA AWARD – Award for the best inclusion-promoting project; a project that contributes to inclusion:</strong>
<em>Lunchies!</em> An app allowing Zalandos to register and meet people they don’t know throughout our growing tech
organization. The app matches you to other employees in your location with a proposed place, time, and food choices.</p>
<p><strong>EMPOWERMENT – Award for contributing positively to Zalando Tech's collective morale or well-being in the workplace:</strong>
<em>zMap!</em> An app allowing anyone in Zalando to find the person they need in the right location, at the right time, by
scanning QR codes to populate building maps.</p>
<p><strong>BSD 4.3 – Award for the best software coding project:</strong> <em>DeepFried Zalando!</em> Deep Learning mixing of images and text,
using data from Instagram, blogs, and magazines to generate a map of fashion terms and relations, on top of automatic
content production from textual data.</p>
<p><strong>QUICK WIN – Award for finding the low hanging fruit; awarded to an achievable project:</strong> <em>Zally!</em> Making APIs
compliant within Zalando Tech by automating the review process.</p>
<p><strong>MARS ROVER – Award for the best system/hardware prototyping project:</strong> <em>The Zalando Home Button!</em> A push notice button
to get your Zalando packages delivered at a time convenient to you, utilizing local parcel buffering and some handy
hardware.</p>
<p><strong>KASPAROV – Award for best contribution to Zalando's</strong> <strong>objectives:</strong> <em>Vitrine!</em> Untapping the potential of outfits at
Zalando via a playful iOS app, combining the power of machine learning and analytic databases to generate outfits in
real-time.</p>
<p><strong>NIKOLA TESLA – Awarded for the most innovative/disruptive project in each location:</strong></p>
<ul>
<li><em>(Helsinki) zLazer:</em> An Internet lazer pointer for all your pointing needs, controlled via Wireless, mounted on a
SHARK</li>
<li><em>(Berlin) Order-A-Tailor:</em> An addition to our Zalando platform that offers alteration services for suits purchased
on desktop or mobile, right to your door</li>
<li><em>(Dortmund) Gamify your Tour of Mastery:</em> Gamifying self-improvement and continuous development within <a href="https://tech.zalando.com/blog/radical-agility-study-notes/">Radical
Agility</a></li>
<li><em>(Dublin) Your Year by Zalando:</em> A snapshot of what customers loved over the past year on Zalando, collecting data
about favourite articles, styles, colours, and trends to share online with friends and family</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ae4ccc412146d311106a9e63fe9bfc83413df381_dublinnikolateslahackweek.jpg?auto=compress,format"></p>
<p><strong>DUKE NUKEM FOREVER – Awarded for the most glorious or spectacular fail after raising great expectations:</strong> <em>NO ONE!</em>
None of our projects failed spectacularly – something to be proud of.</p>
<p>The most anticipated award is no doubt the Golden Ticket to our Slingshot entrepreneurial program: Projects ideal for
customers, utilizing our tech capabilities, and consisting of a strong team are eligible. And the winners are… <strong>Friendz
& Fashion</strong>, <strong>Fashion Sprinter</strong>, <strong>Order-A-Tailor</strong>, and <strong>The Zalando Home Button</strong>. These projects will be given
extended time, more resources, and have help from our dedicated Innovation Lab team to be worked on throughout the next
quarter. We’ll be sure to follow their progress.</p>
<p>A massive congratulations to all of our teams and techies for an inspiring and innovative Hack Week this year. Keep your
eyes peeled to the blog in the New Year for some great video footage we’re currently putting together – we promise it’ll
be worth the wait.</p>Hack Week #5 – The ajudando Project2016-12-22T00:00:00+01:002016-12-22T00:00:00+01:00Sören Blomtag:engineering.zalando.com,2016-12-22:/posts/2016/12/hack-week-5--the-ajudando-project.html<p>Connecting one another with the right expertise via knowledge exchange for Hack Week.</p><p>Imagine the following: You and your team are facing a challenging technical situation and could reach out to hundreds of
potential experts to help. This wouldn’t consist of merely an email exchange: They would actually come visit your team
and spend time with you, making sure the problem was addressed and that you’re equipped to deal with similar situations
in the future.</p>
<p>At Zalando Tech, we happen to have this pool of expertise. For Hack Week, we’re looking at how to match it with
departmental needs and come up with "rules of play" to organize this type of exchange in a fair, undisruptive manner for
delivery.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5d0a17526f7c4c0558a6eb8e0b5f7a1ecff97a4b_16_053_zalandomaxthrelfallphoto-_xth5644.jpg?auto=compress,format"></p>
<h3>Aims of the project</h3>
<p>Throughout the week, we’ve been exploring the need and acceptance of a knowledge exchange model between employees, to
eventually come up with prototypical solutions. Hack Week has allowed us to speak to many different, relevant
stakeholders, to test out our ideas, as well as matching expertise with need between other Hack Week teams for their own
projects.</p>
<p>Ideally, as we have a diverse sample of Tech’s population, we won’t limit expertise to hard skills such as
programming languages. We want to evaluate this idea end-to-end and not just focus on the technical aspects of
building an exchange board web app.</p>
<h3>Operational model and prototype</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9858d5f11b2f82d362e40739568820446cd06248_ajudando-frontpage.png?auto=compress,format"></p>
<p>ajudando – Portuguese for “helping” – is an exchange platform where Zalando staff can access knowledge beyond their
teams, across all of our locations – you simply choose to “Join” (as an expert) or “Use” the exchange (to find an
expert). We’re looking to create a concentrated timeframe or workshop situation that goes beyond the HipChat room, but
isn’t meant to solve staffing issues for teams. This model of exchange would better support knowledge-sharing and
communication of our vast expertise across the organization.</p>
<p>We started with some guerrilla UX, running around Hack Week to collect the data required: Were fellow Zalandos willing to
offer their expertise to help other teams? What would motivate them? What were potential blockers that we could identify
from the outset?</p>
<p>Questions we were faced with throughout brainstorming include the amount of time people could dedicate to helping others
vs. time spent directly with their own delivery teams. When we surveyed fellow Zalandos asking what would stand in their
way of helping, their answers were quite revealing and validated the need for such an exchange: <em>“People wouldn't know
how to contact me and/or they don't know that I have the expertise they are looking for.”</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/23237cbe7cf55399fb81a15803fa3d6267563044_ajudando3.jpg?auto=compress,format"></p>
<p>Issues that could potentially be solved via the ajudando platform span all areas of the department: Frontend, Backend,
Training, and Product. We began with early prototypes to help respondents get the idea off the ground. We also
spoke with owners of existing tools such as zLive (our social intranet) and the Ideas Board (used to organize Hack Week
ideas and teams).</p>
<p>By the end of Hack Week, we hope not only to have helped other Hack Week teams working on projects by connecting them
with various experts, but to have a prototype ready for further testing and possibly integrate the results into existing
tools for the new year.</p>
<p>Have you had experience with or are involved in a similar knowledge exchange? We’d love to hear your feedback and ideas
– reach out via Twitter at <a href="https://twitter.com/soblom">@soblom</a>.</p>Introducing Kids to Tech for Hack Week2016-12-21T00:00:00+01:002016-12-21T00:00:00+01:00Joanna Buchmeyertag:engineering.zalando.com,2016-12-21:/posts/2016/12/introducing-kids-to-tech-for-hack-week.html<p>Exploring ways we can give back to the community via technology for future generations.</p><p>While Hack Week gives Zalandos a chance to experiment and try something new, it also allows us to explore ways we can
give back to the community via technology. Our Hack Week team is channeling that fervor via an initiative to introduce
kids to tech, which is a great way to use our vast resources of people and knowledge.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/215be7819c0fb92151d3e9f1b91a687b50136a1b_kidsintech1.jpg?auto=compress,format"></p>
<p>The scope of the project is to make opportunities in technology visible to kids from any background or gender, to help
shape the tech environment for future generations. Our project team has begun looking at a workshop concept that could
be implemented in schools and afterschool programs, focusing on the following:</p>
<ul>
<li>Role modeling: The Zalando volunteers will represent different roles, and potential possibilities for the students’
future.</li>
<li>Practical workshop: Sparking an interest in tech via an e-commerce game, playing out the whole e-commerce setup and
ecosystem (like Zalando)</li>
<li>Community partnerships: Becoming actively engaged with the community by creating symbiotic partnerships and making
an impact</li>
</ul>
<p>So, how did we get under way?</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8bd5454703c95dfa1b7a925dd2f265dd7fc23901_kidsintech2.jpg?auto=compress,format"></p>
<p>After initially brainstorming the impact, purpose, and outcome of our project, we were visited by a representative from
<a href="https://berlin.impacthub.net/">Impact Hub Berlin</a> who explained the <a href="http://www.theoryofchange.org/what-is-theory-of-change/">Theory of Change
Model</a> (TOC), a process that requires we define all necessary
and sufficient conditions required to bring about a given long-term outcome. Under the TOC Model, we mapped out what we
want kids to take away from the workshop:</p>
<ul>
<li>Understanding the underlying concepts of programming to solve a problem through challenges that require logical and
analytical thinking</li>
<li>Understanding the mechanics of an app they regularly use</li>
<li>Creating an overall more neutral perception of tech than what we feel kids perceive it to be today</li>
</ul>
<p>We broke our plan down further by looking at different age groups, and making assumptions about what these groups would
be seeing, feeling, and thinking. We were also fortunate enough to be able to have these assumptions validated by kids
of employees, who helped show us that we were on the right track. We also plan to validate assumptions with kids from
our target group.</p>
<p>By creating an e-commerce game as the practical element of our workshop, we wanted to introduce the important foundation
skills that are required in technology, making sure kids understood the intent behind their interest: Problem solving,
analysis, and creativity.</p>
<p>Our game will focus on the entire e-commerce setup of an online fashion platform like Zalando, from a customer ordering
a clothing item online, to picking that item in the warehouse, all the way to its eventual shipment. Technology would
then be introduced step by step to come up with the solutions the game needs.</p>
<p>For example – our logistics process has many automated actions, and our kids will need to understand the process of
getting an order in, picking it in the warehouse, preparing a package, and writing a label for shipping. How can they
simplify this process with the inclusion of technology?</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e5166cd19122a501c602bd9f39e9d55e134b1130_kidsintech3.jpg?auto=compress,format"></p>
<p>The game is to be set up as role play, with different Zalando experts on hand to help, and can be customized in terms of
difficulty and technologies used. This also allows for easier iterations once some initial feedback has come in.</p>
<p>Ideally, kids participating in the workshop would see the benefits of using technology to solve problems in creative
ways, something we aspire to do every day at Zalando. Awards in the game offer up an extra incentive, as well as giving
kids positive affirmation of their interest in tech.</p>
<p>We’re happy to have Hack Week as an opportunity to pursue initiatives like these, and hope to see it through. We’d love
to hear feedback as well – let us know your ideas via Twitter at <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>Hack Week #5 is Live!2016-12-19T00:00:00+01:002016-12-19T00:00:00+01:00Natali Vlatkotag:engineering.zalando.com,2016-12-19:/posts/2016/12/hack-week-5-is-live.html<p>Our annual Zalando Hack Week is here, coming to you bigger and better than ever!</p><p>It’s the event that many of our teams anticipate throughout the entire year – our annual Zalando Hack Week. 2016 sees us
branching out even further than <a href="https://tech.zalando.com/blog/hack-week-4---the-video/">past events</a>, hosting parallel
locations in <a href="https://tech.zalando.com/locations/#berlin">Berlin</a>,
<a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a>, <a href="https://tech.zalando.com/locations/#dublin">Dublin</a>, and
<a href="https://tech.zalando.com/locations/#helsinki">Helsinki</a>. Over 1,600 technologists across three countries are putting
their heads together to hack and innovate the fashion ecosystem, draped in this year’s playful theme of Gaming Allstars.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/808b40f8ef15f595b760c2657bc483a9f92ad464_dortmundhackweek.jpg?auto=compress,format"></p>
<p>What is Hack Week? Our annual, week-long celebration of open innovation and experimentation, where technologists are
free to work on inspiring, inventive new projects for the business. Whether they’re used for improving the way we work,
or if they’re aimed at being game-changing, business-enabling achievements, Hack Week represents our willingness to
create or disrupt our processes and products for the better.</p>
<p>Why do we hack? We’re constantly trying to shape and affect the future of Zalando and the Fashion business overall, so
it’s important to think outside the box of your daily work. We decided to dedicate a full week to innovation and allow
all delivery teams to just hack. You may be wondering then: What are some of the aspiring projects being worked on by
our Zalando Tech teams for this edition of Hack Week? Well, to highlight just some of them:</p>
<ul>
<li><strong>The Parcel Backpack:</strong> A project to develop a small and easy-to-use device in order to transform your Zalando
parcel into a convenient backpack</li>
<li><strong>zWisher-BOT:</strong> Improving the interactivity and capabilities of our Zalando Wishlist with a bot</li>
<li><strong>TRY-ON:</strong> Try on clothes online that you like as you would in real life via Augmented Reality</li>
<li><strong>Introducing kids to tech:</strong> Finding ways to encourage youth to interact with technology</li>
<li><strong>Zalando ZnackMachine:</strong> A vending machine that provides adapters and keyboards to employees</li>
</ul>
<p>We like to recognise the work of our technologists during Hack Week in several ways, one being a fun award ceremony at
the end of Hack Week presented in 10 categories, including customer joy, empowerment, inclusion, and corporate social
responsibility. There’s also the Kasparov Award for the best contribution to Zalando's overall objectives, as well as
the Mars Rover Award for best system/hardware prototyping project.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f5f19d14c408b01bdeca03f3bd2bf803866e906d_16_053_zalandomaxthrelfallphoto-_xth5605.jpg?auto=compress,format"></p>
<p>Another great way we’re recognising potential Hack Week successes is our Golden Ticket Award to the Slingshot Program:
Our internal entrepreneurial support program that helps individuals and teams with innovative business propositions
realise their idea and build an MVP for pitching. 20% of their work time can be dedicated to projects successfully part
of Slingshot, giving budding entrepreneurs the time and space (in our <a href="https://tech.zalando.com/blog/zalando-opens-new-playground-for-tech-innovation/">Innovation
Lab</a>) to get their concept off the
ground. By awarding the Golden Ticket, Zalando makes sure that one of the most promising ideas from Hack Week can be
developed further in the future.</p>
<p>Got all that? So where are we? Idea pitches and initial brainstorms for Hack Week have been happening throughout Monday
at all of our tech locations, with interested technologists getting together to participate, experiment, and innovate.
Hack Week is the time where we can really stretch the limits of what our department can achieve for ourselves, other
Zalando departments and our customers, all while supporting one another in a fun, entertaining way.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/32a7f1ba40b8b05dc147d7f5588081ff247d7bd6_helsinkihackweek.jpg?auto=compress,format"></p>
<p>The spirit of Hack Week centres around unlocking our creativity and pushing the boundaries of tech – we want to try out
as many new, exciting, and unexplored ideas as possible for the benefit of our business, our culture, and ultimately,
you! Throughout the week, we’ll be reporting on the progress of some interesting ideas, while we post all the fun stuff
over on <a href="https://twitter.com/ZalandoTech">Twitter</a>.</p>
<p>Stay tuned for project updates throughout the week! Happy hacking.</p>Zalando meets Technology Foundation Berlin at Techsperts2016-12-16T00:00:00+01:002016-12-16T00:00:00+01:00Zalando Technologytag:engineering.zalando.com,2016-12-16:/posts/2016/12/zalando-meets-technology-foundation-berlin-at-techsperts.html<p>Some healthy discussion about expectations of the future workplace and agile.</p><p>We’re at the end of the <a href="https://tech.zalando.com/blog/zalando-techspert-series-launch/">Zalando Techspert Series</a> for
2016, where we’ve decided to have an up close and personal event with Nicolas Zimmer, the CEO of the Technology
Foundation of Berlin, and Marc Lamik, Zalando’s Head of Innovation and Partnerships. <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/235804022/">“Culture: Beyond
Agile”</a> was the hot topic of the evening, with
healthy discussion taking place about expectations of the future workplace and the evolution of agile methodologies.</p>
<p>While it was an intimate affair, we know that not everyone interested was able to make it. We caught Nicolas and Marc
after the panel to ask them some key questions related to the night’s energetic debate.</p>
<p><em>Zalando: How relevant will agile be in future tech companies? Is agile already needing a revamp?</em></p>
<p><em>Nicolas Zimmer:</em> Agile is generally mainstream at the moment. So the question really is, what comes after agile? People
have been discussing DevOps as the next step in achieving better collaboration, meaning agile is most definitely
considered to be the baseline in terms of tech culture.</p>
<p>One of the major challenges we’ll have is with companies that are so distributed around the globe that they’ll need to
find mechanisms to keep everything together. How much face time will they get? How will they communicate and really work
together as a team on one product?</p>
<p><em>Marc Lamik:</em> I think agile is a difficult word in general. Some teams who use Scrum, for example, consider themselves
to be working under an agile mindset, however the rest of the company is still structured in an old-school, top-down
hierarchical model. Of course, agile is somewhat of a baseline, but in most companies it’s only a very small portion of
people actually operating in an agile manner. If you want to evolve your work culture, you need to think much deeper
than just agile – considering not only how your engineers work, but also how the entire company will work together.</p>
<p><em>Zalando: What trends have you noticed in culture that have worked, and which ones haven't?</em></p>
<p><em>Marc Lamik:</em> Flat hierarchies have shifted from being a trend to a generally accepted framework by a lot of companies,
which works quite well in organizations of varying sizes. I think there are also some questionable trends to evaluate,
such as shared desks. This was pretty hyped some years ago and a lot of the research about shared desks concluded that
there was very little improvement. Employees missed their personal space, regardless of the size.</p>
<p><em>Nicolas Zimmer:</em> I can chime in on the shared desks example here, as it’s something I’ve noticed a lot of German
companies adopting who are not in the technology sector. I suspect it doesn’t make that much sense.</p>
<p>Another trend to consider is the growing popularity of working from home. We can see a big number of people returning to
home offices which works to some degree, but what you still miss is the cultural and communicative flow that being
present in the office can give. I think we’ll definitely see fewer hierarchies and more self-organizing teams in the
future, but I don’t believe anything will come along that totally dissolves organizations as we know them –
some feel that the organizational model is something that needs to be overcome; however, it’s a cost-efficient strategy
that steers employees towards a common goal, which is still needed regardless of what you’re trying to achieve.</p>
<p><em>Zalando: How much do you think technology, like AR and VR, will dominate the future workplace?</em></p>
<p><em>Nicolas Zimmer:</em> There are a lot of promises and expectations with AR and VR. When it comes to hardware engineering,
the adoption of these makes a lot of sense, as well as when we talk about employees on the shop floor, so to speak.
Training via AR or VR to cope with certain challenges or tasks is a great use case and will certainly be a hot topic for
industries producing goods.</p>
<p><em>Marc Lamik:</em> I think there are definitely some areas where VR can make a difference if you think about cooperating
between different offices, for example. We at Zalando do a lot with Google Hangouts which works well, but it’s clearly
different from sitting in the same room and scratching on the same board during meetings and workshops.</p>
<p>If you could replicate this experience via VR, I believe meetings could become even more efficient. However, I’m also
rather confident that VR experiences won’t completely replace face-to-face meetings.</p>Talking to Elasticsearch2016-12-14T00:00:00+01:002016-12-14T00:00:00+01:00Alaa Elhadbatag:engineering.zalando.com,2016-12-14:/posts/2016/12/get-your-application-or-cluster-talking-to-elasticsearch.html<p>Good communication between applications and clusters is an essential requirement for scaling.</p><p>You may have <a href="https://tech.zalando.com/blog/a-closer-look-at-elasticsearch-express/">previously read about</a> our use of
Elasticsearch at Zalando Tech, and especially our utilization of <a href="https://tech.zalando.com/blog/a-closer-look-at-elasticsearch-express/">Elasticsearch
Express</a>: An appliance with a toolkit enabling
quick deployment and management of Elasticsearch clusters.
For us, clusters are not exposed to consumers directly, but rather behind applications. Good communication between
applications and clusters is an essential requirement for scaling either without causing problems.</p>
<p>Your application has to be aware of the cluster nodes. If new ones have joined, the application needs to
redistribute requests equally across all nodes to allow Elasticsearch to handle more traffic. If your application is not
informed about new nodes, there is no point in adding them, since you are still hitting the older ones. Also, the
application must not send requests to nodes that have already left the cluster; it is disappointing to send a request to
someone who is not around to receive it.</p>
<p>That's why communication between applications and clusters is key to scaling data serving dynamically.</p>
<p>The recommended way for applications to talk to Elasticsearch is using the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-transport.html">TCP transport
client</a>. If configured properly,
it will allow the application to constantly fetch the current member nodes of the cluster and distribute requests in a
round-robin fashion equally. It opens a TCP connection pool to all available nodes directly without crossing load
balancers.</p>
<p>Let’s dive into the process of getting the application acquainted with the cluster.</p>
<p>First, a TCP ELB is available only for the first acquaintance with the cluster. It's basically a DNS entry that forwards the
first connection request to a random node. In this instance your application only knows this node, and can only send
requests to it.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a1c77f2fb37caf671756389757e36787a4eb63c2_talkingtoelasticsearch1.png?auto=compress,format"></p>
<p>This is very limiting if you have ambitious scaling plans. To allow your application to discover the rest of the
cluster, you need to configure the TCP client to sniff the list of cluster nodes through the first node it established
a connection with, by adding <em>“client.transport.sniff: true”</em>.</p>
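In the Java transport client these are client-side settings (not server configuration). A minimal settings fragment might look like the following; note that the sampler interval line is an illustrative assumption, only the sniff flag is taken from the text above:

```yaml
# Client settings passed to the TCP transport client
# (not elasticsearch.yml on the server).
client.transport.sniff: true
# Assumed example: how often to re-sample cluster members.
client.transport.nodes_sampler_interval: 5s
```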
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dd69e841214e7bfbd6ad146bdd6334c2476cfe45_talkingtoelasticsearch2.png?auto=compress,format"></p>
<p>The application will receive a list of IPs with all the member nodes of the cluster, and can distribute calls equally on
each. The TCP ELB was only used for a random acquaintance with the cluster at the start of a new application instance.
There is no need to worry about warming up ELBs, as the application is hitting cluster nodes by their direct IPs without
extra network hops.</p>
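<p>The distribution logic described above can be sketched in plain Java. This is a minimal, self-contained illustration of round-robin selection over a sniffed node list; the class and method names are hypothetical and are <em>not</em> the Elasticsearch transport client API, which handles all of this internally.</p>

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: rotate requests over whatever node list the last
// sniff round returned, the way the transport client balances calls.
class NodePool {
    private final List<String> nodes = new CopyOnWriteArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    // Replace the member list whenever sniffing reports a change
    // (nodes joined or left the cluster).
    void refresh(List<String> sniffedNodes) {
        nodes.clear();
        nodes.addAll(sniffedNodes);
    }

    // Pick the next node in round-robin order; requests go to direct
    // node IPs, so no extra ELB network hop is involved.
    String nextNode() {
        int i = Math.floorMod(cursor.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}

public class Demo {
    public static void main(String[] args) {
        NodePool pool = new NodePool();
        pool.refresh(List.of("10.0.0.1:9300", "10.0.0.2:9300", "10.0.0.3:9300"));
        for (int n = 0; n < 6; n++) {
            System.out.println(pool.nextNode()); // cycles through all three nodes
        }
    }
}
```

<p>The point of the sketch is simply that once the client knows every member’s address, load spreads evenly and adding a node immediately adds capacity.</p>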
<p>Your application or cluster can now scale individually without causing issues for your consumers.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ab3c588b64ae9222080baf07f9ca8fc35076cb75_talkingtoelasticsearch3.png?auto=compress,format"></p>
<p>We are still learning and experimenting, with more improvements and guarantees to come for the stability and
availability of our infrastructure. We’ll keep you updated with changes, but in the meantime, you can send through any
questions via Twitter at <a href="https://twitter.com/alaa_elhadba">@alaa_elhadba</a>.</p>Zalando lands at EuroClojure 20162016-12-13T00:00:00+01:002016-12-13T00:00:00+01:00Jori Ahvonentag:engineering.zalando.com,2016-12-13:/posts/2016/12/zalando-lands-at-euroclojure-2016.html<p>Clojure's growing popularity at Zalando saw us attend the largest Clojure conference in Europe.</p><p><a href="http://euroclojure.org/">EuroClojure</a> is the largest Clojure conference in Europe. It is organised by
<a href="http://cognitect.com/">Cognitect</a> and is a single-track, two-day event held this year in Bratislava, Slovakia.
Zalando was excited to sponsor the event and send engineers to meet hundreds of Clojurians, attend great presentations,
and enjoy the beautiful city.</p>
<h3>The Talks</h3>
<p>Among the most anticipated topics at EuroClojure were the presentations about clojure.spec, a new feature in Clojure
1.9 which is still in the alpha stage. The talk that excited us most was <a href="https://twitter.com/sbelak">Simon Belak’s</a>
presentation “Living with Spec”. Simon showed how the promises of spec translate from tutorial examples into
large-scale projects. He elaborated how they originally viewed it as providing schemas, but quickly found out that
there were many other features like transformations, destructuring, and generative testing they got out of the spec for
free. Clojure.spec seems very versatile and the performance improvements in the later alphas have addressed the biggest
concerns that the Clojure community had.</p>
<p>We had great expectations for <a href="https://twitter.com/swannodette">David Nolen’s</a> keynote, having graced the Finnish
Clojure community with a visit to the <a href="http://clojutre.org/2016/">ClojuTRE conference</a> in September where he spoke on
the history and evolution of ClojureScript. His sequel talk at EuroClojure highlighted the next step: How, from a
technical point of view, ClojureScript is ready for prime time and it’s now time to get more engineers on board.</p>
<p>Some talks that were of special interest for Zalando were about crossing the chasm between Scala and Clojure. Zalando
has a strong Scala community in addition to our Clojure developers, so combining and comparing the two is of great
interest to us.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/de4027db640ee249deb0564f9eb9989864434754_ec2.jpg?auto=compress,format"></p>
<p>The first Clojure/Scala talk was “Clojure is a Foreign Country: Combining Datomic with Scala” from <a href="https://twitter.com/p_brc">Peter
Brachwitz</a>. Peter presented an interesting case where his company wanted to reap the benefits
of Datomic while developing with Scala. The topics that stood out were how to create typed queries and how to map the
data back to typed objects on reads. Peter made interesting comments on the costs and benefits of using types. This
brought about some questions: How could the same guarantees be provided with spec on the Clojure side, and would the
costs and benefits be the same?</p>
<p>The other Clojure/Scala talk of interest was “Machine Learning with Clojure and Apache Spark” by <a href="https://twitter.com/ericqweinstein">Eric
Weinstein</a>, which presented a good primer on machine learning with decision trees
using Spark and deep learning with DL4J. Eric explained the basics of machine learning and gave some good tips on what
tools to pick up when working with them in Clojure. We’re eager to try out Flambo and Sparkling for Spark with Clojure.</p>
<p><a href="https://twitter.com/danlebrero">Daniel Lebrero’s</a> “Automating resilience testing with Clojure and Docker” was a
favourite talk for Zalando’s engineers. It was premised on the ideas of antifragility in Michael Nygard's "Release It!".
Daniel showed how they had been able to automate the installation of an isolated version of what they have in
production, and then using generative testing to kill random nodes and verify that the invariants and guarantees they
promise still hold. The whole topic would have been of great interest to those who don’t or can’t have Chaos Monkey
running wild in their infrastructure.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/eb38032db18592c6179f6a9685d401d1d0a59e9f_ec1.jpg?auto=compress,format"></p>
<p>On the entertaining side, <a href="https://twitter.com/sriharisriraman">Srihari Sriraman</a> spoke about making computer-generated
music. Not only was it greatly directed, but his presentation was also very insightful into Indian classical music.
Srihari explained the basic concepts of the Carnatic system of music and then explored various possibilities to
recognize and generate it in a programmatic way. The samples of his own singing were amazing; it’s always great to
witness someone excel not only in one skill area but in several, and watch how different domains of knowledge become
merged together.</p>
<p>The final and the most inspiring keynote was given by <a href="https://twitter.com/gigasquid">Carin Meier</a>. Her appearance was
announced by David Nolen, who referred to her as a speaker he personally admires, which turned out to be true for us in
attendance as well.</p>
<p>Carin introduced a classification of programmers into four categories: Explorers, Alchemists, Wrestlers, and Detectives.
She took the Alchemist path and showed two examples of how biology can be combined with IT. In the Genetic Programming
case, given a vector of random values, a spec was bred that was able to conform the provided data without errors. In the
Self-Healing Code case, a way to automatically replace erroneous parts of code in run-time was presented. This and the
other talks we attended about clojure.spec unveiled the wonderful world of possibilities that the spec gives, which is
inspiring and exciting.</p>
<h3>Conclusion</h3>
<p>EuroClojure 2016 turned out to be an awesome conference. The city, the venue, the people, and the talks were all better
than we expected. Additionally, it was great to talk to engineers and share what we are currently building at Zalando.
We love spreading the word about Zalando and how we’re building microservices with the hottest technologies at web
scale.</p>
<p>We’d love to share more information with the broader community about our Clojure initiatives at Zalando – reach out to
us via Twitter at <a href="https://twitter.com/veikea">@veikea</a>.</p>Zalando Continues Being Part of the React Ecosystem at ReactNL 20162016-12-09T00:00:00+01:002016-12-09T00:00:00+01:00Tony Saadtag:engineering.zalando.com,2016-12-09:/posts/2016/12/zalando-continues-being-part-of-the-react-ecosystem-at-reactnl-2016.html<p>Developer Tony Saad shares his insights and learnings from the inaugural ReactNL Conference.</p><p>After attending <a href="https://tech.zalando.com/blog/our-reacteurope-recap/">React Europe 2016</a> in June, we had the pleasure
to be a sponsoring partner for the first <a href="http://reactnl.org/">ReactNL Conference</a> in Amsterdam this year. It was a
great experience to meet developers in the React community from all around the globe, as well as showing them a taste of
our Berlin offices by having “Club Mate” available for one and all!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b38cdcaf1541353b01720ef32432a08d79b8932b_1-_2unf9rqvrydxpthrztycg.jpeg?auto=compress,format"></p>
<p>Flying the Zalando Tech flag, Kolja Wilcke and Andrey Kuzmin had an interesting talk about one of their Hack Week
projects called <a href="http://zalando.github.io/elm-street-404/">“Elm Street 404”</a>, an interactive game coded in Elm. The
audience was incredibly passionate, coming from different continents to meet up and talk about React. It was such an
inspiration and an experience that broadened everybody’s knowledge.</p>
<p>I’d like to walk you through some insights about the three keynotes of the conference.</p>
<h3>Styling React.js Applications by Max Stoiber</h3>
<p>As the title suggests, Max Stoiber began his talk by discussing the different methods of styling a React Application,
whether it’s “CSS-in-JS” or “Inline Styles”, which is mainly about whether to use the <em>style={{property: value}}</em>
attribute vs. <em>className={styles.fancyClassName}</em>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8d3152db01efc11f77ad1fcd10f283e2db59d28d_1-3mlq2yyi0ttkpxykukfzfq.jpeg?auto=compress,format"></p>
<p>Stoiber continued comparing different “CSS-in-JS” tools like <a href="https://github.com/Khan/aphrodite">Aphrodite</a>,
<a href="https://github.com/cssinjs/jss">JSS</a>, and <a href="https://github.com/FormidableLabs/radium">Radium</a>, and it was quite
interesting how his comparison showed that there’s always a compromise to be made when using these tools. For example:
Theming, Global Scope, or Pseudo Classes may be lacking in the above-mentioned tools, so in that vein, he officially
announced the launch of his new tool <strong>“styled-components”</strong>, which I highly recommend checking out.</p>
<p>Dan Abramov <a href="https://twitter.com/dan_abramov/status/786586964531097604">tweeted</a> that it could be recommended by React
once it’s more mature. One great aspect of the tool is that it makes you write pure “vanilla” CSS, which is the way to
go in my opinion, especially if you enjoy writing in scss/css files. This will be the case for most people unless they
really need dynamic styling or theming.</p>
<h3>React Fiber by Andrew Clark</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bf7fca69c2babe8188ffcbc200ea8e8e35675193_1-dkszxlw4jzrj9b3uxl2ila.jpeg?auto=compress,format"></p>
<p>Andrew Clark had an inspirational talk about the future of React, and can be quoted as saying “React Fiber is the new
React, it’s like rewriting React”. As per the <a href="https://github.com/acdlite/react-fiber-architecture">README file</a>, it’s
still an ongoing re-implementation of React’s core algorithm.</p>
<p>Clark explained all the problems they are trying to fix in the process, and noted that their work is totally
experimental. The main purpose of such a project is being able to produce <strong>60 fps web apps</strong>, which is something we’re
still lacking on the web but has already been tackled in other environments.</p>
<p>The goal of React Fiber is to increase its suitability for areas like animation, layout, and gestures. Its headline
feature is incremental rendering: The ability to split rendering work into chunks and spread it out over multiple
frames.</p>
<p>Clark mentioned how embarrassing it is that a smartphone can handle such graphic-heavy games with ease, but lags while
rendering a web page with 1,000 items in a list.</p>
<p>There are two main concepts that React Fiber focuses on: <strong>scheduling and concurrency</strong>. React follows a
<strong>pull-based</strong> approach, in which the framework (React) controls both how and when to update your UI, unlike
<strong>push-based</strong> approaches (e.g. Elm), where the application itself schedules work as new data arrives. The
current React implementation, however, processes each update synchronously from start to finish, and that’s where React
Fiber comes in handy.</p>
<p>One example: imagine a computation-heavy batch of updates running at the same time as an animation. To keep the user
experience seamless, the animation should take priority, because animations matter more to the user than other, typical
updates.</p>
<p>React Fiber gives the framework the ability to:</p>
<ol>
<li>Pause work and come back to it later</li>
<li>Assign priority to different types of work</li>
<li>Reuse previously completed work</li>
<li>Abort work if it’s no longer needed</li>
</ol>
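<p>Conceptually, these four capabilities amount to cooperative, priority-based scheduling of resumable units of work.
The toy Python sketch below illustrates the idea only; it is not React’s actual implementation, and all names in it are
made up:</p>

```python
import heapq
import itertools

def scheduler(tasks, budget_per_frame=2):
    """Run prioritized, resumable tasks a few units of work per 'frame'.

    tasks: (priority, generator) pairs; a lower number means more urgent.
    Each generator yields once per unit of work, so a task can be paused
    between yields and resumed in a later frame, or simply dropped if its
    work is no longer needed.
    """
    seq = itertools.count()  # tie-breaker for equal priorities
    queue = [(priority, next(seq), gen) for priority, gen in tasks]
    heapq.heapify(queue)
    frames = []
    while queue:
        frame = []
        for _ in range(budget_per_frame):  # bounded amount of work per frame
            if not queue:
                break
            priority, order, gen = heapq.heappop(queue)
            try:
                frame.append(next(gen))  # perform one unit of work
                heapq.heappush(queue, (priority, order, gen))  # pause, resume later
            except StopIteration:
                pass  # task completed, never re-queued
        frames.append(frame)
    return frames

def work(label, units):
    """A resumable task: each yield represents one unit of rendering work."""
    for i in range(units):
        yield f"{label}-{i}"
```

<p>Running <code>scheduler([(0, work("animation", 2)), (1, work("list", 2))])</code> finishes all of the high-priority
animation work before touching the lower-priority list updates, which is the behaviour the incremental-rendering idea is
after.</p>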
<p>On top of all the advantages mentioned, Clark also promised some new features, such as integrated layout and
returning multiple elements from the render function; the latter is one of the features most requested by the React
community. Due to the current architecture of the core algorithm, it’s incredibly painful to provide such a feature.</p>
<p>Hopefully next year we’ll hear some more great news about React Fiber.</p>
<h3>How to Build a Compiler by James Kyle and announcing Yarn</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/375cf8320bf2f9eaef757978a12c38911e63fad8_1-kojevfvg3zkphujmc0mmka.jpeg?auto=compress,format"></p>
<p>James Kyle is one of my favorite speakers, especially as he’s one of the authors of <a href="https://yarnpkg.com/">Yarn</a> and
calls himself “The Real Beyonce Of JavaScript”. He gave an oversimplified yet interesting, reasonably accurate overview
of what most compilers look like, and showed us a simple compiler that he built himself.</p>
<p>Kyle used a Lisp-inspired syntax for his compiler and explained the difference between <strong>assemblers and
compilers</strong>. He gave the same talk at <a href="https://www.youtube.com/watch?v=Tar4WgAfMr4">EmberConf 2016</a> as well.</p>
<p>The exciting part of his talk came later: He re-introduced himself and announced that Yarn was finally released,
explaining how fast it is and giving a live demo of <code>yarn install</code> vs. <code>npm install</code>; as you may
have guessed, Yarn was much faster. Many people have already started using it.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dd15de6a90373b7e7ce4d8aa5d94c2984e294d21_1-gom3pd-fnx40pnjmu7ybqq.jpeg?auto=compress,format"></p>
<p>A big thanks from our side goes to the great team organizing ReactNL. The venue was well prepared and very impressive,
with a great crowd in attendance whom we met and socialized with.</p>
<p>We are looking forward to subsequent conferences and are very excited about what’s to come for next year as our use of
React blossoms at Zalando Tech. Feel free to reach out via Twitter at <a href="https://twitter.com/tonysa3d">@tonysa3d</a> if you
have any questions.</p>Hack Like a Girl with Zalando Tech2016-12-07T00:00:00+01:002016-12-07T00:00:00+01:00Princiya Marina Sequeiratag:engineering.zalando.com,2016-12-07:/posts/2016/12/hack-like-a-girl-with-zalando-tech.html<p>Get involved in hacking with Geek Girls Carrots Berlin and the Zalando Tech Shop API.</p><p>I recently had the opportunity to be a mentor at the <a href="http://hacklikeagirl.co/">Hack Like A Girl Hackathon</a>. It was
organized by <a href="http://www.hacklikeagirl.co/#team">Geek Girls Carrots Berlin</a>, with Zalando on board as a sponsor. The
hackathon had a health and fitness theme, and I jumped at the chance to volunteer for the event.</p>
<p>Before the hacking began, participants and mentors met up at the Native Instruments offices on Friday evening to
brainstorm their health and fitness related hacks for the weekend. We were also tasked with building teams. As a fashion
platform, Zalando’s link to health and fitness might not always be obvious, but together with my colleagues Andra and
Iuliia, I gave some of the participants insights into using the <a href="https://github.com/zalando/shop-api-documentation">Zalando Shop API</a>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e65530897ad83e6c8de7dbde97886adad0a07c78_hacklikeagirl1.jpg?auto=compress,format"></p>
<p>I was excited to get participants interested in using our API. Zalando’s fashion reputation was less enticing amongst
the participants, who were leaning towards more traditional health and fitness uses. Our pitch involved asking the
audience to come up with ideas that involved a little bit of shopping. After all, you need new clothes and certain
accessories for a healthy start at the gym!</p>
<p>The API briefing session was followed by a VR workshop using the Unity Framework, with a hands-on-hacking session
available before teams became engrossed in their hacks. After our pitch, one of the teams reached out for some help to
get started with React. The evening was rounded up with a Zumba workout session, staying true to the hackathon’s theme.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/44f3c1eac46f1a4710bdf4884438e5da0624d089_hacklikeagirl2.jpg?auto=compress,format"></p>
<p>Jury discussions and voting took place on the Sunday, in the latter part of the afternoon. Being a jury member was an
incredible experience, with seven teams presenting their hacks. The source code for all participating teams can be found
<a href="https://github.com/GGCarrotsBerlin">here</a>.</p>
<p>I was amazed at the energy level of all the participants. There was no drop in their energy, despite it being a Sunday.
Our voting was based on code, the idea, and the potential behind the hack. Each jury member voted based on different
evaluation criteria, which was interesting to observe:</p>
<ul>
<li>First criterion – potential and uniqueness of the hack</li>
<li>Second criterion – presentation and team spirit</li>
<li>Third criterion – <a href="https://github.com/GGCarrotsBerlin">implementation</a> (working prototype)</li>
</ul>
<p>It was surprising to me that looking into a team’s <a href="https://github.com/GGCarrotsBerlin">source code</a> was the last
criterion, and that team ratings varied when it was applied.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f0d0ed1e3424055718a8c6afeb413587fa944633_hacklikeagirl3.jpg?auto=compress,format"></p>
<p>Overall, Hack Like A Girl was a great event and it was tremendously valuable being a mentor for participating teams. I
was excited by the turnout and elated by the whole experience.</p>
<p>The hackathon organizers also produced a short video, <a href="https://vimeo.com/189506698">available here</a>. I would be happy to
answer questions about this fantastic experience – please reach out via Twitter at
<a href="https://twitter.com/princi_ya">@princi_ya</a>.</p>Recommendations Galore: How Zalando Tech Makes It Happen2016-12-02T00:00:00+01:002016-12-02T00:00:00+01:00Dr. Mikio Brauntag:engineering.zalando.com,2016-12-02:/posts/2016/12/recommendations-galore-how-zalando-tech-makes-it-happen.html<p>Diving deeper into the tools that allow us to power our recommendation engines.</p><p>If you’re a frequent shopper on Zalando, you would have noticed our recommendations for similar items and brands when
you’re browsing. It’s a feature we’re constantly iterating on to make the experience as personalised as possible.</p>
<p>Having recently been featured in the Financial Times about <a href="http://www.ft.com/cms/s/2/c7764ff0-00b2-11e6-99cb-83242733f755.html#axzz4JZ5zgUA5">algorithms and data to suggest your next
purchase</a>, we’ve decided to dive a
little deeper into what these powerful tools can do for customers and for our business.</p>
<p><strong>What metrics can Zalando offer that prove the efficacy of using recommendations in terms of items sold, basket values,
etc?</strong></p>
<p>The click-through rate is a core metric for recommendations, but you cannot forget about tracking gross revenue.
This tells us whether customers are making purchases on the same day as the click-through, showing
how much revenue is influenced by recommendations in general. We’re also using visitor conversion rates, revenue per
visitor, average order value, average number of recommendation impressions, and share of sold items attributed to
recommendations to measure their impact. Alongside this, recommendations are powered by different engines to ensure
we’re getting the whole picture behind user interaction and article preference. Though this makes it difficult to
attribute an increase in KPIs to one engine, it is crucial as it tracks the overall effect on the customer experience.</p>
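<p>For concreteness, all of the metrics listed above can be derived from a handful of raw event counts. A minimal
sketch (the function name and the sample numbers are illustrative, not Zalando’s actual reporting code):</p>

```python
def recommendation_kpis(impressions, clicks, visitors, orders, revenue):
    """Derive the recommendation KPIs mentioned above from raw event counts."""
    return {
        "click_through_rate": clicks / impressions,
        "visitor_conversion_rate": orders / visitors,
        "revenue_per_visitor": revenue / visitors,
        "average_order_value": revenue / orders,
        "impressions_per_visitor": impressions / visitors,
    }

# Illustrative numbers only:
kpis = recommendation_kpis(impressions=10_000, clicks=500,
                           visitors=2_000, orders=100, revenue=5_000.0)
```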
<p><strong>Do you use a content-based or collaborative filtering approach for recommendations - or both? Where do the strengths
and weaknesses of each approach lie?</strong></p>
<p>It’s important to establish what we mean when we talk about data derived from recommendation engines. User data is what
we get when we’re looking at clicks and views, whereas product data is concerned with the items themselves. In most
cases, for item-based recommendations, we use a combination of collaborative filtering and content-based filtering, as
that works best for collecting and assessing the data we need. In cases when we have to tackle a cold start problem, we
use plain content-based recommendations. Alongside this, we have implemented several content-based algorithms and we are
also working on some interesting new ones.</p>
<p>In our view, one of the benefits of collaborative filtering is that it is completely user-behaviour based, meaning we
can derive similarities between items that might not be observed solely through product data. However, to ensure that
collaborative filtering is effective, you need a lot of traffic to ensure that the data communicated is an accurate
representation of user behaviour.</p>
<p>For content-based filtering, we can use product data that we already have, meaning it is a great base to start from if
we don’t have a gamut of user-behavior figures. The effectiveness though can vary depending on your data model, or how
richly defined your data is.</p>
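<p>The contrast between the two approaches can be sketched in a few lines of Python. Item-item collaborative filtering
derives similarity purely from which users interacted with which items, while content-based filtering compares the
items’ own attributes; the data and item names below are made up for illustration:</p>

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Collaborative filtering: an item is represented by the users who clicked it.
interactions = {           # user -> set of clicked items
    "u1": {"sneaker", "sports_sock"},
    "u2": {"sneaker", "sports_sock", "gym_bag"},
    "u3": {"dress"},
}

def item_vector(item):
    return [1 if item in clicked else 0 for clicked in interactions.values()]

def collaborative_similarity(a, b):
    return cosine(item_vector(a), item_vector(b))

# Content-based filtering: an item is represented by its own attributes,
# e.g. [is_shoe, is_formal, is_sporty].
attributes = {
    "sneaker":     [1, 0, 1],
    "sports_sock": [0, 0, 1],
    "dress":       [0, 1, 0],
}

def content_similarity(a, b):
    return cosine(attributes[a], attributes[b])
```

<p>Here <code>collaborative_similarity("sneaker", "sports_sock")</code> is high simply because the same users clicked
both items, with no product attribute needed to link them. The flip side is that an item nobody has clicked yet (the
cold-start case) has an all-zero interaction vector, which is why content-based similarity serves as the fallback.</p>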
<p><strong>What are the biggest challenges involved in implementing a recommendation engine? What advice would you pass on to
other organisations thinking of going down this road?</strong></p>
<p>One of the biggest challenges involved in implementing an effective recommendation engine comes from developing a deep
understanding of customer preferences. Fashion purchasing is an emotional interaction, based on evolving personal
tastes, which makes it almost impossible to create a “master formula”. Still, this is the joy of recommendation engines:
creating something that helps people connect with things they love. It is key that companies understand that customers
don’t expect them to be mind readers, rather the expectation is to be helpful and improve the customer experience.</p>
<p>An additional challenge is that for many ecommerce sites, consumer preferences must be inferred from implicit feedback
like online behavior, rather than explicit feedback such as a simple “like-button” or similar. Actions such as time
spent browsing a particular product or placing a product in a shopping basket gives us an insight into what a customer
is interested in, but can also be misleading. What if they left the computer to get a drink? Or purchased a product for
a present? For an engine to be successful and provide a positive service, it does not need to be flawless, as long as
the customer is receiving relevant recommendations.</p>
<p>There is also the gap between offline prototyping and online implementation that can prove to be challenging, as it is
very difficult to evaluate our models offline. The key challenge here is that in an offline setting, you are working
with past data, and you don’t really know how a user would have reacted. Instead, one fixes a point in time and then
tries to see how well one can forecast the user’s behavior after that point based on the behavior before it. While
offline tests are an important step, the real evaluation has to happen online in live A/B-tests. Here, users are
randomly assigned to two or more different algorithmic variants and the metrics are tracked separately. Measurements are
sometimes noisy and fluctuate over time, so A/B-tests have to be run for a long time to get significant results for
metrics like click-through rate. This timeframe can vary from one week to several weeks before the differences between
variants become statistically significant.</p>
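<p>In the simplest case, the significance check described here is a two-proportion z-test on the click-through rates of
the two variants. A hedged sketch using only the standard library (this is the textbook test, not Zalando’s actual
analysis pipeline):</p>

```python
from math import erf, sqrt

def ab_test_pvalue(clicks_a, views_a, clicks_b, views_b):
    """Two-sided two-proportion z-test on the CTRs of variants A and B."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        return 1.0  # no variation at all: nothing to detect
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
```

<p>The same relative CTR lift that is clearly significant with thousands of views per variant can be pure noise with a
tenth of the traffic, which is exactly why these tests have to run for weeks.</p>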
<p>This is the biggest piece of advice we would offer to other companies adopting recommendation engines: never lose sight
of the end-user. While issues such as tracking online behaviour, data quality and effectively implementing prototypes
can cause problems, they can also distract you from focusing on how it operates in the real world.</p>
<p>In other words, it is about continuous improvement. A recommendation engine should never be complete. Instead, it is in a constant
state of improvement with the newest techniques being adopted in order to create the best results possible, something we
follow at Zalando.</p>Crafting Effective Microservices in Python2016-12-01T00:00:00+01:002016-12-01T00:00:00+01:00Rafael Cariciotag:engineering.zalando.com,2016-12-01:/posts/2016/12/crafting-effective-microservices-in-python.html<p>The API-first approach combined with Connexion are powerful tools to create effective microservices.</p><p><em>TL;DR:</em> The <a href="https://www.oreilly.com/ideas/an-api-first-approach-for-cloud-native-app-development">API-first approach</a>
combined with <a href="https://github.com/zalando/connexion/">Connexion</a> are powerful tools to create effective
<a href="http://martinfowler.com/articles/microservices.html">microservices</a>. The use of API-first brings the benefit of
creating APIs that fulfil the expectations of your clients. Besides that, using Connexion will help you develop APIs in
Python in a smooth manner.</p>
<p><a href="https://cloudplatform.googleblog.com/2016/09/Google-to-acquire-apigee.html">Google’s acquisition of Apigee</a> emphasises
how important APIs are in today’s architecture of applications. Using
<a href="http://martinfowler.com/articles/microservices.html">microservices</a>, with well crafted APIs, is crucial for maintaining
a successful business, as it simplifies the development of complex software solutions.</p>
<p>Growing companies experience a natural rise in their businesses’ complexity. This complexity comes from market
changes and catering to customer demands. Companies such as <a href="https://tech.zalando.de/">Zalando</a>, which are faced with
such challenges, have chosen to <a href="https://github.com/zalando/engineering-principles">adopt
microservices</a>. This aims to make it
easier to build and maintain their applications.</p>
<p>Microservices are a style of breaking up and organizing a complex software solution into smaller composable services.
Those smaller services can be independently maintained and deployed. Each service is developed around a <a href="http://searchsoa.techtarget.com/definition/business-capability">business
capability</a> that usually provides a REST API. Clearly
stating what capability each service provides can have a huge impact on the productivity of the teams working together
to build and maintain their microservices.</p>
<p>Microservices developers are expected to provide a description to client-side developers, who build and maintain the
web, mobile or other services, of how to use their REST APIs. Failing to describe an API clearly, or to keep its
documentation up to date, leads to broken system designs and frustrated team members. The <a href="http://www.programmableweb.com/news/emergence-api-first-development/2014/01/09">API-first
approach</a> is being established as a
solution for those problems.</p>
<h3>API-first approach</h3>
<p>The API-first approach has been <a href="https://www.oreilly.com/ideas/an-api-first-approach-for-cloud-native-app-development">recently
added</a> to the original <a href="https://12factor.net/">12 factors
of app development</a> methodology—a set of rules and guidelines for developing robust applications
that a group of experienced developers came up with while building the Heroku platform. It is an approach that focuses
on having API specifications as a first-class artifact of the development process and using them as a contract to be
shared among software developers and teams. Building the specification of a service upfront facilitates discussions
among possible consumers of the API, the creation of API mocks, and the generation of documentation, even before you
write the first line of code.</p>
<p>Google, IBM, Microsoft, and other companies have come together to create the <a href="https://openapis.org/">Open API initiative
(OAI)</a>, to support the definition and establishment of a vendor independent format for describing
REST APIs known as <a href="https://openapis.org/specification">Open API Specification</a>, formerly named <a href="http://swagger.io/">Swagger
2.0</a>. There are <a href="http://nordicapis.com/top-specification-formats-for-rest-apis/">other formats</a>
available for describing APIs that can be used with an API-first approach, the most well-known being <a href="https://apiblueprint.org/">API
Blueprint</a> and <a href="http://raml.org/">RAML</a>. However, the Open API format currently holds a
larger community of users and supporters. From a specification written using the Open API format, the initial code for
the implementation can be <a href="https://github.com/swagger-api/swagger-codegen#server-stubs">easily generated</a>. There is
support for many languages and frameworks: Ruby, Java, Node, C#, etc. For Python,
<a href="https://github.com/zalando/connexion/#connexion">Connexion</a> is the best choice in API First development, since it
relies neither on code generation nor needs boilerplate code.</p>
<p>Connexion is an open source framework built on top of <a href="http://flask.pocoo.org/">Flask</a> that facilitates the development
of microservices in Python following the API-first approach. It was created by <a href="https://tech.zalando.de/">Zalando</a> to
meet the <a href="https://tech.zalando.de/blog/meet-connexion-our-rest-framework-for-python/">in-house demand</a> for such
solutions. Connexion is in active development; I am one of the primary maintainers of the project, along with <a href="https://twitter.com/joaomcsantos">Joao
Santos</a> and <a href="https://twitter.com/try_except_">Henning Jacobs</a>.</p>
<h3>Building a simple service in Python</h3>
<p>The first step to building an effective microservice in Python is to describe the resources that are going to be
available in our API using the <a href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md">OpenAPI
Specification</a>. We will focus on describing
what routes, parameters, payloads, and response codes our API produces. We start with a simple example of an
endpoint that responds with the string “Hello API!”.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># `my_api.yaml` file contents</span>
<span class="nt">swagger</span><span class="p">:</span><span class="w"> </span><span class="s">'2.0'</span>
<span class="nt">info</span><span class="p">:</span>
<span class="w"> </span><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Hello API</span>
<span class="w"> </span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"0.1"</span>
<span class="nt">paths</span><span class="p">:</span>
<span class="w"> </span><span class="nt">/greeting</span><span class="p">:</span>
<span class="w"> </span><span class="nt">get</span><span class="p">:</span>
<span class="w"> </span><span class="nt">operationId</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api.say_hello</span>
<span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Returns a greeting.</span>
<span class="w"> </span><span class="nt">responses</span><span class="p">:</span>
<span class="w"> </span><span class="nt">200</span><span class="p">:</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Successful response.</span>
<span class="w"> </span><span class="nt">schema</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">object</span>
<span class="w"> </span><span class="nt">properties</span><span class="p">:</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">string</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Message greeting</span>
</code></pre></div>
<p>In the code snippet above, we specified that our API has an endpoint “/greeting” that accepts requests with method “GET”
and returns a status code 200, which represents success. Notice that the business logic is not defined in our
specification – this part is left to the implementation of our endpoint, which will be done in Python. The “operationId”
determines which Python function is executed when the API call is made to your endpoint.</p>
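<p>Conceptually, resolving an operationId such as <code>api.say_hello</code> is just a dotted import. The sketch below
shows the idea; it is a simplification, not Connexion’s actual resolver:</p>

```python
import importlib

def resolve_operation(operation_id):
    """Map an operationId such as 'api.say_hello' to a Python callable."""
    module_name, _, function_name = operation_id.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)
```

<p>For example, <code>resolve_operation("math.sqrt")</code> returns the standard library’s <code>sqrt</code> function;
Connexion performs the equivalent lookup for <code>api.say_hello</code> and wires the resulting function to the route.</p>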
<div class="highlight"><pre><span></span><code><span class="c1"># `api.py` file contents</span>
<span class="k">def</span> <span class="nf">say_hello</span><span class="p">():</span>
<span class="k">return</span> <span class="p">{</span><span class="s2">"message"</span><span class="p">:</span> <span class="s2">"Hello API!"</span><span class="p">}</span>
</code></pre></div>
<p>As you can see, Connexion API handlers are free of boilerplate code. Normal Python functions that return a simple data
structure are used as handlers for the API calls. Data structures returned by the handlers may be validated against the
API specification by Connexion (if configured to do so). This validation is disabled by default to allow flexibility during
development. Now we can use Connexion to glue our code to the API specification and get a server running. The easiest
way to run our API is using the Connexion CLI tool:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># run this command in the same directory where you saved the previous files.</span>
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>connexion<span class="w"> </span><span class="c1"># installs connexion, run only once.</span>
$<span class="w"> </span>connexion<span class="w"> </span>run<span class="w"> </span>my_api.yaml<span class="w"> </span>-v
</code></pre></div>
<p>Now we can go to the browser and access the address <a href="http://localhost:5000/greeting">http://localhost:5000/greeting</a>, where we should be able to see the
message:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="nt">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hello API!"</span><span class="p">}</span>
</code></pre></div>
<p>To make it more dynamic, we can change our specification to add an HTTP parameter for the user’s name. For that, our
specification should look like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># `my_api.yaml` file contents</span>
<span class="nt">swagger</span><span class="p">:</span><span class="w"> </span><span class="s">'2.0'</span>
<span class="nt">info</span><span class="p">:</span>
<span class="w"> </span><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Hello API</span>
<span class="w"> </span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"0.1"</span>
<span class="nt">paths</span><span class="p">:</span>
<span class="w"> </span><span class="nt">/greeting</span><span class="p">:</span>
<span class="w"> </span><span class="nt">get</span><span class="p">:</span>
<span class="w"> </span><span class="nt">operationId</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api.say_hello</span>
<span class="w"> </span><span class="nt">summary</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Returns a greeting.</span>
<span class="w"> </span><span class="nt">parameters</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">name</span>
<span class="w"> </span><span class="nt">in</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">query</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">string</span>
<span class="w"> </span><span class="nt">responses</span><span class="p">:</span>
<span class="w"> </span><span class="nt">200</span><span class="p">:</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Successful response.</span>
<span class="w"> </span><span class="nt">schema</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">object</span>
<span class="w"> </span><span class="nt">properties</span><span class="p">:</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">string</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Message greeting</span>
</code></pre></div>
<p>Now Connexion will pass an optional parameter to our function. To handle that we also have to change our Python
function.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># `api.py` file contents</span>
<span class="k">def</span> <span class="nf">say_hello</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="k">return</span> <span class="p">{</span><span class="s2">"message"</span><span class="p">:</span> <span class="s2">"Hello </span><span class="si">{}</span><span class="s2">, from API!"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span> <span class="ow">or</span> <span class="s2">""</span><span class="p">)}</span>
</code></pre></div>
<p>Our Python code now matches the API specification. We can now restart our server: press Ctrl+C to stop it, then run
the following command again to see these changes in play:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># run this command in the same directory where you saved the previous files.</span>
$<span class="w"> </span>connexion<span class="w"> </span>run<span class="w"> </span>my_api.yaml<span class="w"> </span>-v
</code></pre></div>
<p>Once again our server will be listening on <a href="http://localhost:5000/greeting">http://localhost:5000/greeting</a>. Additionally, it is now possible to pass an
optional parameter “name”. Connexion includes an API console interface, which is available at
<a href="http://localhost:5000/ui">http://localhost:5000/ui</a>. This console UI is enabled by default and makes it easy to call the endpoints of our API. It
also serves as documentation of our microservice.</p>
<p><img alt="API Console" src="https://images.prismic.io/zalando-jobsite/7160a92c2a115cde37eddc0af6b8953663a9c8f9_api_console.png?auto=compress,format"></p>
<p>Now we know the basics to create a Connexion application. For a more detailed example, please check the <a href="https://github.com/hjacobs/connexion-example">Connexion
example project</a> available on GitHub. The <a href="http://connexion.readthedocs.io/en/latest/">official Connexion
documentation</a> is a complete resource of information regarding its
functionalities. This documentation should be used as a reference when developing RESTful microservices using Connexion.</p>
<h3>Moving forward</h3>
<p>Companies are starting to realize the benefits of the API-first approach. <a href="https://www.etsy.com/">Etsy</a> has published a
<a href="https://codeascraft.com/2016/09/06/api-first-transformation-at-etsy-concurrency/">blog post</a> describing how the
API-first approach is helping them tackle challenges with providing a consistent API. At Zalando, API-first is at <a href="https://tech.zalando.de/blog/on-apis-and-the-zalando-api-guild/">the
core of our software development lifecycle</a>, with a
process for peer review feedback and <a href="http://zalando.github.io/restful-api-guidelines/">guidelines for creating RESTful
microservices</a>, available as open source.</p>
<p>We can find <a href="http://swagger.io/tools/">many open source tools</a> that support API-first approaches available for <a href="https://github.com/zalando/friboo">diverse
languages</a>. Connexion is the perfect framework for a clean implementation of the
API-first approach in Python and is in <a href="https://github.com/zalando/connexion/pulse/monthly">active development</a>.
Connexion does not rely on code generation. This leaves you free to evolve your API specification without breaking the
code you have already implemented.</p>
<p>If you have any issues with Connexion, please feel free to post your problem to our <a href="https://github.com/zalando/connexion/issues">issue
tracker</a>. You are also very welcome to send a <a href="https://github.com/zalando/connexion/pulls">pull
request</a>. We already have <a href="https://github.com/zalando/connexion/graphs/contributors">many
contributors</a> and you can join by improving the documentation
or the code. If you’d like to get in touch, you can find me on Twitter at
<a href="https://twitter.com/rafaelcaricio">@rafaelcaricio</a>.</p>Getting Down to Business with our Techsperts2016-11-29T00:00:00+01:002016-11-29T00:00:00+01:00Zalando Technologytag:engineering.zalando.com,2016-11-29:/posts/2016/11/getting-down-to-business-with-our-techsperts.html<p>Missed our last Techspert Panel? Check out the exclusive interview with this month's guests.</p><p>We’re almost at the end of this year’s <a href="https://tech.zalando.com/blog/zalando-techspert-series-launch/">Zalando Techspert
Series</a> run, but before we have our last hurrah of the
year, we’ve had some great input from Berlin companies about <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/235128072/">picking the right business
model</a>.</p>
<p>Our SVP Products, Philipp Erler, was joined by Daniela Cecílio, CEO of the fashion search tool Asap54, and Martin
Kütter, COO of Babbel, the world’s first language learning app. Together they answered questions from our panel
moderator, Hannah Löffler of Gründerszene, focused on helping startups tread the right path to getting their business
off the ground and into the minds of customers.</p>
<p>Sad to have missed the event? We sat down with our Techsperts who shared their experience in growing businesses and
getting them on the track to success.</p>
<p><em>Zalando: How much research is needed when picking the right business model for your startup? What is your top tip when
exploring the options?</em></p>
<p><em>Philipp Erler:</em> It depends a lot on the business. For some it can be simple, while for others it turns out to be a lot
more complicated. It’s a lot about thinking systemically about what you want to achieve in the long term, and having a
clear strategy. Deriving your business model from that is key.</p>
<p>The difficult part here is making sure you’re not destroying any leads in the beginning. Always be open to the option of
changing your business model.</p>
<p><em>Daniela Cecílio:</em> In my case, I knew the problem that I wanted to solve. However, monetizing it was an issue I had to
face later down the track. I used Google as a basis for my research model, as well as speaking to people who had worked
at Google previously. I also looked at the competitors that were similar to Google and my own business model as part of
my research.</p>
<p><em>Martin Kütter:</em> As Philipp said, it depends on the business. In Babbel’s case, we prefer to dedicate a relatively small
amount of time to research in order to produce the leanest idea possible and test it. This is then rolled out in a
very limited scenario, with the results then closely studied. I would recommend keeping an open mind and testing all the
options available to you.</p>
<p><em>Zalando: How important is it to study your competitors?</em></p>
<p><em>Martin Kütter:</em> Babbel has a very limited group of competitors, so we have just two or three to watch with very
different approaches to the market. We tend to make it work in our own way, but are still aware of the competition
without having to experience too much friction.</p>
<p><em>Philipp Erler:</em> My advice would be to be paranoid and ignorant at the same time. For the long term, you should
understand what your competitors are doing and what effect or harm they could have on your business. It’s crucial to do
what you do best, and do it right. You can monitor your competitors in the short term as well to replicate their
success.</p>
<p><em>Daniela Cecílio:</em> While we don’t have direct competitors as such, my approach is to put myself in the consumer’s shoes.
By doing so, you can look at what’s on offer in the market and figure out what’s missing in a competitor’s offering.
Businesses can often misstep and fail to recognize what can be leveraged – studying your competitors to see what they’re
doing wrong can help to improve your own business for the better.</p>
<p><em>Zalando: What happens if you fail? How can your business recover?</em></p>
<p><em>Daniela Cecílio:</em> You’re going to be failing all the time! What you need to do is be able to bounce back quickly and
have a Plan B, C, D, E, F… Basically, picking the option that will work and keeping at it.</p>
<p>It’s also essential to make sure you’re synced with your team – if you’re feeling demotivated after failure, your team
can pick you up and get you going again. Every entrepreneur will fail, or has failed, but failure is ultimately the best
way for us all to learn.</p>
<p><em>Martin Kütter:</em> We expect to fail in many of our endeavours, it comes with the process. You should be open to the
prospect and put a name or cause to the failure, so that someone is accountable for learning from it.</p>
<p><em>Philipp Erler:</em> Failure is the ultimate driver of success. If you fail, you should understand why you’ve failed, but
don’t spend too much of your energy dissecting the issue – it’ll just drag you down.</p>
<p>Dust yourself off and learn quickly. Be productive in learning from failure – then fail again tomorrow.</p>A Closer Look at Elasticsearch Express2016-11-24T00:00:00+01:002016-11-24T00:00:00+01:00Alaa Elhadbatag:engineering.zalando.com,2016-11-24:/posts/2016/11/a-closer-look-at-elasticsearch-express.html<p>An applicance allowing teams to serve and retrieve data at a large scale, used in production.</p><p>Elasticsearch is a technology that has been gaining popularity lately at Zalando Tech. We’ve learned that it’s a state
of the art tool, and in the hands of a data artist can be used to design data models to conquer information retrieval
challenges at a very large scale in a performant, distributed manner.</p>
<p>The success of Elasticsearch as a technology is due to the various use cases it fits. Whether it is full-text search,
structured search, or data analytics, Elasticsearch has no competitor in solving so many diverse data problems.</p>
<p>For the <a href="http://www.zalando.com/">Zalando Fashion Store</a>, Elasticsearch has become the foundation of serving data to
customer facing applications or other services. Due to the importance of the role it plays in our architecture, we
needed a well-founded and scalable setup for our Elasticsearch clusters.</p>
<p>Operating a distributed data store comes with a certain set of challenges. Stability, growth, and availability of our
data must be guaranteed for our stakeholders, customers, and consumers.</p>
<p>This is why we built Elasticsearch Express.</p>
<p>Elasticsearch Express is an appliance with a toolkit enabling quick deployment and management of Elasticsearch clusters.
It enforces the best practices of operating clusters over cloud-based infrastructures. It is designed specifically to
run on AWS over <a href="https://stups.io/">STUPS</a> (Zalando PaaS).</p>
<p>Elasticsearch Express features:</p>
<ul>
<li>Easy deployment of Elasticsearch across multiple availability zones</li>
<li>Cluster deployment in less than 10 minutes</li>
<li>Full data availability guarantee on each availability zone</li>
<li>Monitoring dashboard template of Elasticsearch metrics</li>
<li>Role separation of nodes</li>
<li>Stable master election</li>
<li>No manual configuration on AWS</li>
<li>Automatic data backups in S3 bucket</li>
<li>Automatic recovery from possible infrastructure failures</li>
</ul>
<p>We built Elasticsearch Express to let Zalando teams focus on solving data problems without worrying about operational
configuration or infrastructure setup. The setup we offer is designed to accommodate the possible failures that
might occur from being in a cloud environment and hosting a cluster that spans data on multiple availability zones. In
this article we will cover the deployment plan of Elasticsearch Express and scenarios of failure and automatic recovery
based on the guarantees we promised.</p>
<p>Let’s dive into our deployment plan.</p>
<h3>The Initial State</h3>
<p>We start with an empty environment. Our target is to deploy a cluster distributed across a region consisting of two
availability zones.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e13151fa20c729f56e0b30d70dada464a7790179_eexpressarticle1.png?auto=compress,format"></p>
<p><strong>1. Deploy master nodes</strong></p>
<p>The first step in forming a cluster is to deploy a master stack across both zones with an odd number of master nodes, to
guarantee a quorum on one of the zones. Master election will take place and one node will be set as a master node. The
other master-eligible nodes will be idling as long as the cluster is healthy, and no network disruption occurs.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0366fab2b568e29fa028033ce67618c75337249a_eexpressarticle2.png?auto=compress,format"></p>
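<p>The quorum requirement above is simple majority arithmetic: with an odd number of master-eligible nodes split across two zones, exactly one zone can hold a majority and elect a master. A minimal sketch in Python (the node counts are illustrative, not part of Elasticsearch Express itself):</p>

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of master-eligible nodes needed to elect a master."""
    return master_eligible_nodes // 2 + 1

# With 3 masters deployed as 2 + 1 across two zones, only the zone holding
# 2 nodes reaches the quorum of 2; with 5 masters (3 + 2), only the 3-node
# zone reaches the quorum of 3. The minority zone can never elect a master.
assert quorum(3) == 2
assert quorum(5) == 3
```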
<p><strong>2. Deploy data stack to the first zone</strong></p>
<p>Step two occurs after master deployment and cluster formation. The first stack of data nodes will be deployed to one
availability zone with the configuration identifying the zone ID, so that Elasticsearch is able to understand the
cluster topology and obtain shard allocation awareness.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1e75c1e78fbcb40f75f5151e40f4dc088bfd0917_eexpressarticle3.png?auto=compress,format"></p>
<p><strong>3. Deploy data stack to the second zone</strong></p>
<p>Similarly, a following stack of data nodes will be deployed to the second zone, with a configuration identifying the
second zone ID.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/797186c6207b308354bad498018ba708ef448ae2_eexpressarticle4.png?auto=compress,format"></p>
<p>At this point, the cluster is formed successfully and ready for usage. We mentioned earlier that Elasticsearch obtained
shard allocation awareness with this configuration. This means that whenever you create an index with any replication
factor greater than zero, Elasticsearch will distribute the shards across the two availability zones, guaranteeing that
each zone contains the full data set.</p>
<p>To enable shard allocation awareness, each node within a stack is tagged with the following configuration.</p>
<p>Master nodes are configured with: “cluster.routing.allocation.awareness.attributes: zone”.
This informs Elasticsearch that shard allocation awareness for data nodes is defined by the custom attribute “zone”.
Each data node is tagged with its zone ID, such as “node.zone: z1”. This way, the master node will know the location of
each data node on the cluster across the availability zones, and distribute the data accordingly.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7f86bbc54dafdf9296a6e07e25166740d7ecd882_eexpressarticle5.png?auto=compress,format"></p>
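<p>Pulled together, the settings described above might look like the following <code>elasticsearch.yml</code> fragments (a sketch: the attribute name <code>zone</code> and the IDs <code>z1</code>/<code>z2</code> come from this article; the optional forced-awareness line is a standard Elasticsearch setting that prevents all replicas from piling onto one zone while the other is down):</p>

```yaml
# Master nodes: make shard allocation aware of the custom "zone" attribute
cluster.routing.allocation.awareness.attributes: zone

# Optional: force awareness so replicas are never concentrated in one zone
cluster.routing.allocation.awareness.force.zone.values: z1,z2

# Each data node declares the availability zone it runs in
node.zone: z1   # nodes in the first zone
# node.zone: z2 # nodes in the second zone
```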
<p>When you’ve reached this point, it is guaranteed that your data set is complete at both sides of the cluster.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/793efc4b7d5b1c76ee9907687da16d60e765bd6e_eexpressarticle6.png?auto=compress,format"></p>
<p>The whole deployment process takes less than 10 minutes. You are now ready to ingest or fetch data from your cluster.</p>
<h3>Monitoring</h3>
<p>Operating a distributed data store in a live environment requires constant awareness of the cluster performance based on
data growth, ingestion, and consumption rates. Elasticsearch exposes monitoring metrics that will represent the health
of the cluster and its nodes. These metrics are presented by Elasticsearch Express on a Grafana monitoring dashboard,
showing the performance of individual nodes. It’s very handy to have an overview of the performance of each node in
comparison to others, as it makes it easier to spot and foresee issues before they occur.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d91cd0d0482615e521b23b30a0cd94d66c4f5789_eexpressarticle7.png?auto=compress,format"></p>
<p>Some metrics require watching current, instantaneous values, while others are more useful viewed over longer
durations to reveal performance trends over time.</p>
<p>Data ingestion directly impacts the number of segments per node and the merging rate. A healthy node keeps its
segment count within a reasonably constant range; a growing graph is a sign of insufficient merging or a sudden growth
in data volume.</p>
<p>Operations performed by nodes such as executing queries, merging segments, and shard allocation consume CPU and are
clearly visible on node load and CPU graphs. Keeping nodes at low load is recommended, leaving headroom for sudden
extra load or traffic spikes.</p>
<p>Monitoring JVM/OS memory is also very important. Take into consideration that Elasticsearch uses the JVM heap to provide
fast operations, while Lucene uses the underlying OS memory for caching its data structures. Memory and
garbage collection graphs will tell you about the health of the interaction between Elasticsearch and Lucene.</p>
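<p>As an illustration of how such a metric can be derived, the sketch below pulls per-node heap usage out of a payload shaped like the response of Elasticsearch's node stats API (<code>GET /_nodes/stats</code>); the node name and byte values are hypothetical sample data:</p>

```python
def heap_used_percent(nodes_stats: dict) -> dict:
    """Per-node JVM heap usage (%) from a _nodes/stats-shaped payload."""
    result = {}
    for node_id, node in nodes_stats["nodes"].items():
        mem = node["jvm"]["mem"]
        result[node_id] = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    return result

# Hypothetical sample: one node using 3 GiB of a 4 GiB heap.
sample = {"nodes": {"node-1": {"jvm": {"mem": {
    "heap_used_in_bytes": 3 * 2**30,
    "heap_max_in_bytes": 4 * 2**30,
}}}}}
assert heap_used_percent(sample) == {"node-1": 75.0}
```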
<p>There are more metrics to account for, but these are the most important for operating a stable long lasting cluster.</p>
<p>Now let’s cover some failure scenarios that might occur to see how Elasticsearch Express will react.</p>
<h3>Data link failure</h3>
<p>Connection loss between availability zones might not happen too often, but inevitably it will. With the wrong
configuration, you can end up with a split brain situation where you have two clusters formed out of the original one.
This situation is disastrous on many levels in terms of data consistency and availability. Data replication will no
longer occur between the newly formed clusters, and their data states will completely diverge. Your application's view
of the data will depend on which cluster the application instance is connected to.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dc34e7142e09bd5553d57e5920d3d21e916dc974_eexpressarticle8.png?auto=compress,format"></p>
<p>Luckily, Elasticsearch Express is designed to handle such vulnerabilities. A configuration parameter dictates that
master election can only take place on the side of the cluster holding a quorum of master-eligible nodes, so a split
brain between the two zones can never happen. In this case, a new master will be elected and the quorum side will form a
cluster. The other side will not form a cluster and will simply hang until the link between the nodes is back again.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a57c18319f9fb10185c6467344170d61b43c84af_eexpressarticle9.png?auto=compress,format"></p>
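<p>The configuration parameter in question is, in all likelihood, Elasticsearch's zen discovery quorum setting; a sketch for the three-master deployment described earlier:</p>

```yaml
# With 3 master-eligible nodes, require a majority of 2 (3 / 2 + 1) before
# a master may be elected -- the minority side of a partition can never
# form its own cluster, so a split brain is ruled out.
discovery.zen.minimum_master_nodes: 2
```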
<h3>Availability zone failure</h3>
<p>Complete availability zone outage is unlikely to happen. However, a long period of connection loss between different
zones can cause this.</p>
<p>Remember enabling shard allocation awareness? It’s now time to kick it into action. The data living on the cluster is
safe since it is fully contained on the surviving zone. Indeed you may have lost a lot of shards, but Elasticsearch is
aware of that and will soon start recreating them on the surviving zone.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d1ccdb7ce678930e4235aa3f3a20c4250c74c1a1_eexpressarticle10.png?auto=compress,format"></p>
<p><strong>Survive with one zone</strong></p>
<p>Survival with one zone is guaranteed if you are sure that running your application load on half of your cluster
instances won't overwhelm the cluster and start killing nodes with immense traffic. Make sure your cluster always has
the headroom to handle more than it's designed for. For stateful machines, you need to over-allocate resources – during
times of failure, scaling while maintaining data state consistency is a tough challenge.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/71f63a5218776ccbe81871b682d8de64855456f0_eexpressarticle11.png?auto=compress,format"></p>
<p><strong>Restore the original state on one zone</strong></p>
<p>Elasticsearch will soon replicate the surviving shards to restore the state from before the failure, but now on only
one zone. The cluster has survived with half its nodes; since Elasticsearch Express puts each availability zone in its
own scaling group, more nodes will soon be added by auto scaling based on the current traffic, and in no time the
cluster will be fully fit and functional as if nothing had happened.</p>
<p>The whole cycle of failure, survival, and recovery occurs within minutes. You might be alerted somehow, but it will all
be resolved before your intervention is actually needed. You might not even notice until the next day.</p>
<p>You’ll just need to make sure that your cluster has enough resources to handle higher traffic and perform survival
tricks before you call it a night.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1feb4bd782c8189c7325bc533b757aa4de95942b_eexpressarticle12.png?auto=compress,format"></p>
<h3>Zone recovery</h3>
<p>The zone outage is not forever: the missing zone will eventually see the light again, whether recovery happens
automatically or by your own manual doing. Nodes from the resurrected zone will start joining the cluster, and
Elasticsearch will recognize the new nodes and their place in the topology from the configured zone ID.</p>
<p>Once again, Elasticsearch will redistribute the data across the two zones to guarantee the full data set is contained by
each.</p>
<p>Keep in mind that you might end up with many more nodes than before the incident due to autoscaling in the surviving
zone. The Elasticsearch Express scaling strategy is to scale up automatically and down only manually – you don't want
nodes scaled down without your consent.</p>
<p>We are still researching ways to automatically scale down to a threshold defined by the cluster operator to save
costs, but until this is supported, you will have to manually decommission and shut down the unwanted nodes. We've done
our best so far to let you sleep while disaster was about to strike and end the life of your cluster.</p>
<p>Finally, Elasticsearch Express is back up and running just as smoothly as the first time you deployed it.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5184d23df1813c83dcc983f3b13ffc2f9874ee1e_eexpressarticle13.png?auto=compress,format"></p>
<p>Elasticsearch Express is currently being used in production by many teams at Zalando Tech. It is helping development
teams deliver their ambitious requirements of serving and retrieving data at a large scale for various use cases.</p>
<p>If you have any further questions about Elasticsearch Express and how we use it at Zalando, you can contact me via
Twitter at <a href="https://twitter.com/alaa_elhadba">@alaa_elhadba</a>. I’d love to hear from you.</p>Bread&Butter 2016: The Livestreaming Rollercoaster2016-11-23T00:00:00+01:002016-11-23T00:00:00+01:00Johannes Altmanntag:engineering.zalando.com,2016-11-23:/posts/2016/11/bb-the-livestreaming-rollercoaster.html<p>We took social media broadcasting to the next level at this year's B&&B showcase.</p><p>The fashion industry has been dominated by the elite for decades. Access to top products and brands was once
restricted, while trendsetters were the only people on the guest list for the hottest events. Zalando wants to open up
this world and grant greater access, reduce barriers, and empower customers. Zalando’s vision is to <strong>connect people and
fashion</strong>.</p>
<p>The <a href="https://www.breadandbutter.com/">Bread&Butter (B&&B)</a> showcase had been an established trade show for some time.
Following the acquisition of Bread&Butter by Zalando in 2015, the former trade show for industry insiders became a
“trend show” for all. For the first time ever, Bread&Butter was open to the public. The event’s motto in 2016 was <em>NOW</em>:
A reference to the digital, sharable, and instantly shoppable, collectively presented at the culmination of the
Autumn/Winter season. What better way to represent this in real-time than with the most ambitious livestreaming plan of
all? Our teams were up for the challenge.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e6d571acd0b1b4cd29b3955d669815ebce6ea530_12_bb-preview_credits-nils-krager.jpg?auto=compress,format"></p>
<p>Zalando has a reputation for being bold and forward-thinking, which is fitting when considering our livestreaming goal:
Mobilizing 18 casts for 15 Facebook pages and 1 YouTube account simultaneously. The broadcast plan wasn’t merely
restricted to internal channels: With the right amount of investment, we were able to scale a solution that matched the
flair and impact B&&B has on the industry as a whole. We wanted our technical and digital presence to live up to our
fashionable reputation and prowess.</p>
<p>Although livestreaming isn’t new, doing it with a multi-camera rig and broadcasting it live, simultaneously, to multiple
countries on Facebook and YouTube definitely is. During the build up to B&&B, we planned and executed various
livestreams to the <a href="https://www.facebook.com/breadandbutter/">B&&B Facebook page</a>, garnering up to a million views. The
stage was slowly being set for our ambitious, as-yet-untried livestream challenge.</p>
<h3>Building the potential for large-scale streaming</h3>
<p>Livestreaming an event like B&&B means that many technical implications need to be considered. On top of hiring fibre
optic cables for larger area functionality, our team also had to make sure that our technical gear was compatible with
the signal shared between brands for their booths and presentations – for instance, the large screens at the <em>Zalando
Machine of Now</em>. In between each fashion show, we also livestreamed various interviews at B&&B, requiring a separate
shooting crew and technical equipment. Once we had the signal, we were able to push content up to our Zalando-hosted
streaming server and prep, duplicate, then spread it to our targeted Facebook pages.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3aa27e8599398262c3f2ef0df765615d45be97ef_img_1151.jpg?auto=compress,format"></p>
<p>A challenge we encountered with our streaming server was its interaction with the new Facebook API, just recently
released for the WOWZA Streaming Server. It was not able to differentiate between our different Zalando pages that
represented the 15 countries we operate in. While the team organized further calls with WOWZA and Facebook, we were
getting set to prepare each of the 86 livestreams planned, step by step.</p>
<h3>As it happened – three days of streaming madness</h3>
<p>B&&B took place from Friday, September 2nd to Sunday, September 4th. This gave us plenty of opportunities to livestream
several events across the entire showcase. It also presented several circumstances where our setup was truly put to the
test.</p>
<p>No interim solution was found to differentiate between Zalando pages on the Friday, meaning we also didn’t get the
chance to test the system we’d set up. Thus, we streamed to the Zalando Germany and official B&&B pages only, kicking
off the first day of the event. While troubleshooting the initial interaction issue with the Facebook API, we came up
with a workaround: requesting access to each Facebook page step by step would create the forwardings to Facebook needed
for each country. After successfully testing this capability with a little teaser video, we were granted access once
again to all country pages.</p>
<p>On the Sunday, we encountered another hurdle when Facebook blacklisted our server and accounts due to music copyright
infringement. After again escalating the issue, we decided to iterate quickly and test the approach with a totally new
account. Success! Having prepared all translated texts for our markets and implementing the schedule over a two hour
timeframe, it was a fantastic moment as we went live – the first time ever to all of our 16 different pages at the same
time – a real rocket launch moment. We virtually clinked our glasses as Zalando’s premier fashion event beamed live on
Facebook across all countries.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/20c033b56655a252bb37cdc52c0c2071f7793dc5_18_bb-preview_credits-nils-krager.jpg?auto=compress,format"></p>
<p>We learnt throughout the process that there are some things you cannot control. Even our key person at Facebook was
unaware of the blacklisting activity triggered by streaming copyright-protected music. Copyright enforcement at
Facebook is currently geared towards the rights of “master rights holders” rather than the “streamlined copyright
resolutions” offered by YouTube. We now know what the Facebook procedure is and can learn from this experience.</p>
<h3>The future of livestreaming at Zalando</h3>
<p>The results of our livestreaming extravaganza speak for themselves: Over three days, we mobilized 18 casts resulting in
86 individual stream forwardings, on 15 Facebook pages and 1 YouTube channel simultaneously. Facebook themselves have
positively confirmed that this feat has never been achieved before by anyone. Ever. In the world.</p>
<p>Having achieved a total of 5.4 million impressions over the B&&B weekend, we’re inspired and ready to tackle the next
ambitious undertaking for livestreaming at Zalando in further projects. Social media broadcasting is the next step in
cementing Zalando’s place as Europe’s leading online fashion platform, and we’re gearing up to show the world what we’re
capable of.</p>
<p>Stay tuned!</p>The Art of Mob Programming2016-11-22T00:00:00+01:002016-11-22T00:00:00+01:00Massimiliano Fattorussotag:engineering.zalando.com,2016-11-22:/posts/2016/11/the-art-of-mob-programming.html<p>One of our Dortmund teams has tried out the relatively new concept of Mob Programming.</p><p>As a Producer at Zalando, our roles are instrumental in supporting
development teams in achieving excellence when delivering software. We help them by creating an environment where every team
member can give their best.</p>
<p>A few months ago, I read <a href="https://www.agilealliance.org/wp-content/uploads/2015/12/ExperienceReport.2014.Zuill_.pdf">an
article</a> written by <a href="https://www.linkedin.com/in/woodyzuill">Woody
Zuill</a> about <a href="http://mobprogramming.org/">Mob Programming</a>, a fairly new concept
based on how we approach teamwork in software development.</p>
<p>The concept requires that the whole team works on the same thing, at the same time, in the same space, and, wait for it,
at the same computer! All code that enters the codebase does so through this single computer. This process is similar
to pair programming, where two people sit at the same terminal and collaborate on the same code at the same time, but
with Mob Programming the collaboration is extended to everyone on the team. Other computers are available to individuals
for non-coding activities, such as researching.</p>
<p>We noticed that working together in general was rather effective, so we simply decided to do more of it. We aim to
reduce the distance between decision making and the creation of solutions.</p>
<p>My team was struggling with getting their work done, and they were running many tasks in parallel – in sub-teams or
pairs. This resulted in a divergence of ideas to the point that it was necessary to have fixed weekly technical alignment
meetings. This approach was clearly not agile enough: We were too slow when it came to decision making and reacting to
changes.</p>
<p>It is clear that Mob Programming is not a one-size-fits-all process. Small tasks and routine jobs can be worked
on efficiently without setting up a MOB session. I would advise you to talk to your team and give it a try for more
creative activities, like the resolution of complex problems.</p>
<p>The results of our MOB sessions conducted so far are promising:</p>
<ul>
<li>A positive team mood</li>
<li>Reduced lead time and a naturally decreased amount of WIP</li>
<li>A shared understanding of problems, with everyone staying involved</li>
<li>Fewer unnecessary task switches and interruptions</li>
</ul>
<p>Some may argue that MOB sessions aren’t efficient, which is technically correct if you’re measuring each team member’s
utilization or the lines of code written by your developers. I personally refer to the <a href="http://agilemanifesto.org/principles.html">Agile
Manifesto</a> principle that “working software is the primary measure of
progress”, and by that measure MOB sessions have made a real impact on how we work together.</p>
<p>This approach relies heavily on good communication, alignment, collaboration, and continuous code review, which are all
essential ingredients for a well-functioning, self-organizing team. Our team will continue to experiment with and use
this process for future projects to ensure we’re tackling problems creatively and achieving excellence.</p>
innovation that comes out of the software development process is like a never ending tank of fuel.</p>
<p><a href="https://tech.zalando.com/jobs/75772-producer/">Producers</a> at Zalando are the organizational engines of their teams.
We’re responsible for managing product releases, external project dependencies and deliverables, as well as refining
requirements, features, and user stories. Producers play a key role in facilitating end-to-end delivery tasks and
integration plans, and work closely with every member of the team.</p>
<p>But what else do I do as a Producer? I motivate people, help them identify pain points, strengths, weaknesses, help them
figure out what they actually want and what works for them. I am there to empower my teams to build and deliver awesome
software products. I currently work with the engineering team behind <a href="http://www.collabary.com/">Collabary</a>, the new
platform connecting brands to content creators. I also work with one of the teams in our Advertising Engineering
department.</p>
<p>Before I joined Zalando, I worked as a Product Manager in gaming, which involved being responsible for the product
lifecycle, as well as bringing agile values to my teams. This experience made me realise how complex the needs of a team
are, as well as how important the work of a Producer is. I began my career in tech as a Software Engineer, which has
definitely come in handy.</p>
<p>One of the things I love about my job is Zalando’s culture of team autonomy, which genuinely makes teams more productive. Do you know what happens when
teams are happy, autonomous, work on achieving mastery, and have a clear purpose? They innovate, they push the
boundaries, and make valuable contributions to their company.</p>
<p>Each team is unique and it should be treated as such. Autonomy for every team does not mean chaos. You, as a Producer,
empower them to own what they’re working on, to be great tech citizens in a microservices-led, InnerSource ecosystem. At
the end of the day, the Agile Manifesto tells us that individuals and interactions come before processes and tools.
Whether it is Scrum, Kanban, XP, Radical Focus, or something specifically tailored without a fancy name, whatever works
for one team will not necessarily work for another – we have the power to innovate ourselves. I enjoy challenging myself
and my teams when it comes to being agile. It is a culture of continuous improvement, where my shiny agile project
management toolbox keeps getting polished.</p>
<p>But I’m not here to talk about Agile. I want to tell you why I chose to become a Producer. To quote my mentor, Katia
Vara, a Producer “gets shit done”. I love getting things done. I am helping my teams keep a healthy team spirit, be
focused, and continuously improve themselves, the product and their processes. I am there to help them truly achieve
autonomy.</p>
<p>One of my favourite tools for this, as well as one of the most powerful, is the retrospective. When done right, the team
is engaged and feels safe enough to share their thoughts and voice their concerns. It is an excellent opportunity to get
their pulse and to come up with concrete action points. My challenge here is to ask the right questions at the right
time as their guide and to emphasize their achievements. I am also the one to come up with new canvases, new ways of
conducting the sessions and adapting existing techniques to match their needs.</p>
<p>I want my teams to be happy to wake up in the morning and come to work, to understand the “why” and the “what” behind
what they’re building. I am there to shield them when needed and to help them own the “how”.</p>
<p>Sounds nice, right? It actually is. But of course, the road can be bumpy, full of trials, incidents, and sometimes
emotional. Achieving results takes time, patience, ambition, and commitment, where it’s important to hold your ground
and keep your integrity. At the end of the day, I iterate. I do, measure and evaluate, learn and apply. I am not afraid
to fail and own it, as long as this will help me improve and better help my teams and stakeholders.</p>
<p>When it comes to delivery, I believe in adapting, learning, and adjusting over following a methodology strictly by the
book. From round table sessions, to interactive story mapping, to incident ninjas – when you’re a Producer, you keep
coming up with useful things to help your teams. Producers are instrumental from start to finish.</p>The State of Frontend at Zalando 20162016-11-16T00:00:00+01:002016-11-16T00:00:00+01:00Henrik Andersentag:engineering.zalando.com,2016-11-16:/posts/2016/11/the-state-of-frontend-at-zalando-2016.html<p>What is the flavour of JavaScript and frontend at Zalando Tech? Find out here!</p><p>After seeing the <a href="http://stateofjs.com">State of JavaScript</a> survey make headway online, we thought it would be
interesting to do the same for Zalando to get a better understanding about what technologies and frameworks we’re using
here.</p>
<p>For the JavaScript survey, participating developers answered questions on topics ranging from frontend frameworks and
state management, to build tools and testing libraries. Our questions were not 100% identical, but do address similar
topics.</p>
<p>The JavaScript survey also generated some great comments, such as <em>“Every time I write something in JavaScript I'm
surprised that it works.”</em> Perhaps we’d also hear some interesting commentary from our Zalando developers, too!</p>
<p>Keep reading below to see an overview of our results.</p>
<h3>Flavour of JavaScript</h3>
<p>Most of us working in frontend at Zalando (95.2%) use <a href="http://es6-features.org/#Constants">ES6</a> and would use it again,
which isn’t that much of a surprise. It follows <a href="http://www.ecma-international.org/ecma-262/6.0/">the community</a> and
shows us that ES6 is a step in the right direction for JavaScript.</p>
<p>Similar to the community, we also have a high interest in learning TypeScript and
<a href="https://tech.zalando.com/blog/using-elm-to-create-a-fun-game-in-just-five-days/">Elm</a>, which some of our developers
have dabbled with already. If you’ve encountered the Zalando 404 page, you would have noticed the <a href="https://zalando.github.io/elm-street-404/">Elm Street 404
game</a>, put together in merely five days by one of our teams.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/427d2dd4fcb049de43592514a791616bc8168817_elmsurveyresults.png?auto=compress,format"></p>
<h3>Frameworks and Libraries</h3>
<p>When it comes to using frameworks and libraries, <a href="https://facebook.github.io/react/">React</a> is definitely our library of
choice. 77.3% of survey respondents stated that they’ve used React and would use it again, with a further 18.2%
interested in learning. All of our frontend teams looking after the Fashion Store work exclusively with React for their
projects.</p>
<p>Our survey also shows that Angular 1 is officially dead at Zalando. Over 50% of respondents stated that they had used it
before and wouldn’t use it again. With new frameworks and libraries like React and Angular 2 on the rise, this reaction
was somewhat expected.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dbbc46f633f36d404373f8169fd4f5d51820ff5e_angular1results.png?auto=compress,format"></p>
<h3>Future Gazing for Frontend</h3>
<p>The purpose of this survey was to collect insight into what kind of technologies and frameworks our teams are using,
ensuring transparency and sharing knowledge. We have the autonomy to choose, so it can be hard to track what teams are
using and experimenting with.</p>
<p>What we can now do with these results is match people who have experience with a particular technology or framework to
those who have questions. We can also start planning for activities in terms of trainings and further development for
those interested in learning.</p>
<p>It’s fascinating to see what our engineers are working with in a fast developing area like frontend. We need to stay on
top of the game and make smart decisions when it comes to choosing which frameworks and technologies we explore – we can
only do that by sharing knowledge.</p>
<p>If you’re interested in more information from these results, come and find me on Twitter at
<a href="https://twitter.com/frontendherodk">@frontendherodk</a> and let’s chat.</p>How InnerSource bolstered integration for Local Order Fulfillment2016-11-15T00:00:00+01:002016-11-15T00:00:00+01:00Dr. Jan-Hendrik Bartelstag:engineering.zalando.com,2016-11-15:/posts/2016/11/how-innersource-bolstered-integration-for-local-order-fulfillment.html<p>We're piloting InnerSource here at Zalando and have already had some early success.</p><p>Our tech organization is always looking for ways to improve its systems and processes – we’re constantly striving to be
better. To increase team autonomy and improve delivery time, we’re currently piloting InnerSource, a development
approach that applies open source principles to the way companies develop software internally.</p>
<p>Our initial pilot of the approach focused on teams working within product clusters whose work presented opportunities
for cross-collaboration. This gave the engineering teams working in Zalando’s Core Platform and logistics the chance to
step into the limelight with their innovative projects, and our implementation of the <em>gax-system</em> was the perfect place
for InnerSource to make its mark.</p>
<h3>What is InnerSource and why are we adopting it?</h3>
<p>InnerSource operates similarly to open source, in that projects are first released by an author or group, then grow and
evolve based on external contributions. The difference here is that InnerSource applies this contribution model to
delivering software within Zalando Tech directly, rather than externally like open source does.</p>
<p>InnerSource enables a clear path toward <a href="https://less.works/less/structure/feature_teams.html">feature teams</a>, while
establishing and maintaining ownership of codebases by teams. We have teams developing their software on GitHub
Enterprise, a common environment internally, allowing other engineering teams to contribute directly to their efforts.</p>
<p>Why does this process appeal to us? From a delivery effectiveness perspective, it aims to eliminate upstream
dependencies, ensuring teams aren’t waiting on each other to complete critical work. It also promotes further knowledge
sharing and better code quality, which is important for companies such as ours with a growing tech department.</p>
<h3>The gax-system and InnerSource – A successful experiment</h3>
<p>The <em>gax-system</em> is the next step in Zalando’s <a href="https://tech.zalando.com/blog/integrated-commerce-merchant-centre-rebuild/">Integrated
Commerce</a> initiative, where we aim to give
brick-and-mortar stores the ability to join the Zalando platform. By using an existing, external online-based order
management tool such as the <em>gax-system</em>, relevant customer orders from the Zalando shop are passed on to local partner
stores who fulfill the order. When an order is fulfilled by the store, we receive their event updates, such as packed
or shipped, and integrate them into our status model. If a store chooses otherwise, we continue the fulfillment process
on our warehouse side. To this end, the fulfillment flow is adjusted accordingly, where event-based communication via
<a href="https://github.com/zalando/nakadi">Nakadi</a> provides and listens to relevant events.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b0e2d84460b103dc6e6839e1fead4b2af0c3ea12_161005_local-order-fulfillment_process_zalando-hires.jpg?auto=compress,format"></p>
<p>The pilot project has evolved over several quarters. While some progress was being made initially, it wasn’t until
InnerSource was adopted that the project could finally be realised. The majority of our resources were tied up in work
on the Core Platform, which serves as a foundation for ramping up new businesses or products, as well as providing a set
of plug-and-play services for users of the Zalando platform. For the <em>gax-system</em> pilot, we were faced with an array of
interdependencies from the beginning. Our Open Source Evangelist <a href="https://twitter.com/LauritaApplez">Lauri Apple</a> and
Delivery Lead <a href="https://twitter.com/kode4food">Thom Bradford</a> were the first to suggest InnerSource as a possible avenue
for delivery effectiveness.</p>
<p>The <em>gax-system</em> needed to be viewed as more than just another means of connecting suppliers to the Zalando platform. In
Q3, with little headway being made, we took on InnerSource as an experiment in delivery. We were relying a lot on other
teams to get the project off the ground – however, we were also skeptical at first about InnerSource being the solution
we needed. Would the team still feel ownership over the project? Would there be issues internally with our team making
changes to the codebases of other teams?</p>
<p>In 10 weeks, the team was able to get the <em>gax-system</em> off the ground and launched as a successful, scalable solution
for two stores – four weeks earlier than originally planned. Fourteen further stores have now been added, and the
ability to ramp it up with additional retailers is possible.</p>
<p>The biggest strength of the InnerSource concept here was that it created real ownership for an end-to-end project,
motivating our team of six to get their work over the finish line. By having all the rights and access to the code they
needed, and having little to no dependencies on other teams for the build, their minimal communication overhead allowed
a lot to be completed in a fast, lean format.</p>
<p>In terms of delivery, this also meant that the only boundaries existed within the team itself: No external dependencies
could hamper the build effort. For a team working within our Core Platform, who are connected to virtually everything
and everyone within Zalando Tech, scaling this isolated project would have hardly been possible without the
opportunities that InnerSource presents.</p>
<p>It’s also important to remember that InnerSource needs the right ingredients within a team to be of benefit – team
dynamics are incredibly important. Being organised and paying attention to code quality are essential to the process.</p>
<h3>Next steps in adoption and collaboration</h3>
<p>We’re excited to push ahead with further InnerSource experiments on upcoming projects. Expanding the circle of trust
that is essential for InnerSource is the next logical step, where teams working in the same department, such as
logistics or payments, would be able to collaborate and deliver faster. Eventually, this trust might span the whole of
our technology department, creating an environment of fluidity when we’re building and iterating.</p>
<p>Teams who work in isolation tend to satisfy their own definition of quality. When you open up your development process
to a cross-team collaboration model, you are raising the quality bar of your codebase across your organization. Other
projects that are maintained by a single team can be reinvigorated by cross-team contributions, potentially giving
Zalando products extra longevity.</p>
<p>As we continue to pilot the InnerSource approach, we hope to break down silos of codebases, encourage internal
collaboration, and identify further opportunities to contribute to our already growing <a href="https://github.com/zalando">open source
catalogue</a>. By embracing InnerSource, we hope to reap the benefits of open innovation and
increased software reuse, making our engineers more mobile throughout Zalando Tech.</p>The RecSys’16 Review2016-11-10T00:00:00+01:002016-11-10T00:00:00+01:00Dr. John Hannontag:engineering.zalando.com,2016-11-10:/posts/2016/11/the-recsys-16-review.html<p>All the learnings we brought back from the ACM Conference on Recommender Systems.</p><p>This year was <a href="https://recsys.acm.org/recsys16">RecSys’</a> 10th anniversary, and we were lucky enough to be in attendance.
The evolution of the conference is great to see – it has grown from a mid-size event beginning in 2009 to an
ever-sold-out conference in the past few years, with one of the best mixes of industry and academia you can find. It
also has an open and welcoming community, where the standard of work being presented is incredibly high. As
representatives of Zalando, we wanted to share our experiences for those interested.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9faefcc1c51d2dc7960bcbf93888eca00a6c9cca_img_0601.jpg?auto=compress,format"></p>
<h3>Conference highlights</h3>
<p>The <a href="http://conf.turi.com/lsrs16/agenda/">large scale recommender systems workshop</a> gave us an insight into how Facebook
makes <a href="http://conf.turi.com/lsrs16/wp-content/uploads/Komal_Kapoor_Ranking-and-Recommendation-for-Billions-of-Users.pptx">recommendations for billions of
users</a>.
Komal Kapoor explained how they perform feature engineering (using random forest and neural network models) to reduce
the number of features from 100K to 20K, and how they use logistic regression to learn fresh recommendations for their
News Feed.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f49082e1093c1855cbf3eeb696bdc8ab750e1b35_img_0571.jpg?auto=compress,format"></p>
<p>It was very interesting to see how Netflix has moved away from regional models, and adopted <a href="http://www.slideshare.net/justinbasilico/recommending-for-the-world-66144921">global
models</a> (also explained in a <a href="http://techblog.netflix.com/2016/02/recommending-for-world.html">previous
blog post</a>). They managed to overcome problems such as
difference in local taste, uneven catalog availability and metadata, and their global models now perform better than
their regional ones – having only one model has a lot of other positive implications for them.</p>
<p>As part of the conference's 10th anniversary we had the <a href="https://recsys.acm.org/recsys16/session-7/">“Past, Present and
Future”</a> track, where Xavier Amatriain (Quora) and Justin Basilico (Netflix)
presented some of their views about <a href="http://www.slideshare.net/xamat/past-present-and-future-of-recommender-systems-and-industry-perspective">the future of recommender
systems</a>. They
mentioned that full page personalization will become even more important, and that personalizing how we recommend will
be a big topic in the future. They also highlighted some of the problems we are currently facing, such as the lack of
high-quality negative implicit feedback, and the need for more research on long-term vs. short-term optimization in
recommender systems.</p>
<p>Also in the same track, Michael Ekstrand was able to articulate one of the main problems of recommender systems in a
really interesting way, while proposing a solution. He called for the research to <a href="https://md.ekstrandom.net/research/pubs/behaviorism/BiNE-RecSys2016.pdf">listen to the
user</a> (by doing more HCI studies, user surveys,
or focus groups) and explained the idea that if we know a user’s goals and their behaviors, a recommender system can
(and should) help them reach those goals.</p>
<p>Another big topic of the conference was humans and machines working together. In this area, Shuo Chang presented his
work on personalized <a href="http://www.slideshare.net/ShuoChang?utm_campaign=profiletracking&utm_medium=sssite&utm_source=ssslideview">natural language explanations for
recommendations</a>.
This work combines crowdsourcing and machine learning for explanations that yield more user trust and satisfaction. We
were impressed with the examples presented in this talk, and can’t wait to see more of this approach being deployed on
recommender systems.</p>
<h3>Summary</h3>
<p>To summarize, long-term vs. short-term optimization and recency were recurrent topics during the
conference. Deep learning had its own track for the first time, with YouTube presenting their <a href="http://dl.acm.org/citation.cfm?id=2959190&CFID=673800910&CFTOKEN=19663295">deep learning recommender
system</a>. But there was also room for
contrast, and simple models like logistic regression were also mentioned, highlighting their explainability. It was
fascinating to see that human-machine curation is an emerging topic in the field.</p>
<p>We cannot cover everything in a single blog post, and we know there is a lot of very interesting research in the field,
so we are really looking forward to what’s next for recommender systems and the RecSys Conference series. See you next
year in <a href="https://recsys.acm.org/recsys17/">Como</a>!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c1a8f37512cf6d4087cb2de605e856eb98d3931e_img_0600.jpg?auto=compress,format"></p>
<p>If you want to learn more about how we do recommenders, we (humans) recommend you read our blog posts on <a href="https://tech.zalando.com/blog/feature-extraction-science-or-engineering/">feature
extraction</a>, <a href="https://tech.zalando.com/blog/personalised-newsletter-emails-aws/">personalized e-mail
campaigns</a>, and a recent post on <a href="https://www.oreilly.com/ideas/what-is-hardcore-data-science-in-practice">data
science</a> with a recommender system’s focus that
<a href="https://twitter.com/mikiobraun">Dr. Mikio Braun</a> put together for O’Reilly.</p>
<p>If you’d like to contact us to hear more about <a href="https://recsys.acm.org/recsys16">RecSys16</a>, reach out via Twitter at
<a href="https://twitter.com/totopampin">@totopampin</a> and <a href="https://twitter.com/johnhenryhannon">@johnhenryhannon</a>.</p>Angular2: Final Release Unit Test Migration Guide2016-11-09T00:00:00+01:002016-11-09T00:00:00+01:00Vadym Kukhtintag:engineering.zalando.com,2016-11-09:/posts/2016/11/angular2-rc4-to-rc5-unit-test-migration-guide.html<p>About to use Angular2? Need to refactor your codebase? Take a read of this handy guide.</p><p>I am not a fan of the Angular-way in Angular1, as I think it looks rather strange. Nevertheless, Angular2 gives a much
better impression in terms of code structure, code purity, and scalability. While they didn’t reinvent the wheel, its
creators have built something interesting and spectacular that stacks up well against modern frameworks (React and
Redux, Aurelia, etc.).</p>
<p>The <a href="http://juristr.com/blog/2016/09/ng2-released/">final release</a> of Angular2 came out and “surprised” developers with
tons of changes (if you were still using RC Version < 5) that may have helped with regards to development, but forced
you to rewrite a bunch of code. I was amazed when I discovered the “Testing” section in Angular2 - Quick Start, so I
decided to further explore here.</p>
<p>Angular2 has pros and cons already described in several articles and books, so you won’t find contrasting arguments in
this post.</p>
<p>However, I hope this article will help developers who suffer when trying to refactor their codebase!</p>
<p>Let’s define some basic App structure:</p>
<div class="highlight"><pre><span></span><code>app
    app.component.ts
    app.module.ts
    main.ts
components
    table.component.ts
services
    post.service.ts
models
    post.model.ts
test
    post.service.mock.ts
    table.component.spec.ts
    post.model.spec.ts
    post.service.spec.ts
</code></pre></div>
<p>From here on I will use TypeScript examples, as I find TypeScript more elegant in this case.</p>
<p>This application will render a table:</p>
<p><strong>app.component</strong> – the first, initial component that will be rendered at app initialization:</p>
<div class="highlight"><pre><span></span><code><span class="o">//</span> <span class="n">Angular</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">Component</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core'</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Services</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">PostService</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/services/post.service'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">Post</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/models/post.model'</span><span class="p">;</span>
<span class="nd">@Component</span><span class="p">({</span>
<span class="n">selector</span><span class="p">:</span> <span class="s1">'app'</span><span class="p">,</span>
<span class="n">template</span><span class="p">:</span> <span class="err">`</span>
<span class="err">`</span>
<span class="p">})</span>
<span class="n">export</span> <span class="k">class</span> <span class="nc">AppComponent</span> <span class="p">{</span>
<span class="n">public</span> <span class="n">isDataLoaded</span><span class="p">:</span> <span class="n">boolean</span> <span class="o">=</span> <span class="n">false</span><span class="p">;</span>
<span class="n">public</span> <span class="n">post</span><span class="p">:</span> <span class="n">Post</span><span class="p">;</span>
<span class="n">constructor</span><span class="p">(</span><span class="n">public</span> <span class="n">postService</span><span class="p">:</span> <span class="n">PostService</span><span class="p">)</span> <span class="p">{}</span>
<span class="n">ngOnInit</span><span class="p">():</span> <span class="n">void</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">postService</span><span class="o">.</span><span class="n">getPost</span><span class="p">()</span><span class="o">.</span><span class="n">subscribe</span><span class="p">((</span><span class="n">post</span><span class="p">:</span> <span class="nb">any</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">post</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Post</span><span class="p">(</span><span class="n">post</span><span class="p">);</span>
<span class="n">this</span><span class="o">.</span><span class="n">isDataLoaded</span> <span class="o">=</span> <span class="n">true</span><span class="p">;</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p><strong>app.module</strong> – Will store all app dependencies. In our case, provide PostService and TableComponent:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span> <span class="n">NgModule</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">BrowserModule</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/platform-browser'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">HttpModule</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/http'</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Components</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">AppComponent</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/app.component'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">TableComponent</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/components/table/table.component'</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Services</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">PostService</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/services/post.service'</span><span class="p">;</span>
<span class="nd">@NgModule</span><span class="p">({</span>
<span class="n">declarations</span><span class="p">:</span> <span class="p">[</span>
<span class="n">AppComponent</span>
<span class="n">TableComponent</span>
<span class="p">],</span>
<span class="n">imports</span><span class="p">:</span> <span class="p">[</span>
<span class="n">BrowserModule</span><span class="p">,</span>
<span class="n">HttpModule</span>
<span class="p">],</span>
<span class="n">providers</span><span class="p">:</span> <span class="p">[</span>
<span class="n">PostService</span>
<span class="p">],</span>
<span class="n">bootstrap</span><span class="p">:</span> <span class="p">[</span><span class="n">AppComponent</span><span class="p">]</span>
<span class="p">})</span>
<span class="n">export</span> <span class="k">class</span> <span class="nc">AppModule</span> <span class="p">{}</span>
</code></pre></div>
<p><strong>main</strong> – The app start point where you will bootstrap the app:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span> <span class="n">platformBrowserDynamic</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/platform-browser-dynamic'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">AppModule</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/app.module'</span><span class="p">;</span>
<span class="n">platformBrowserDynamic</span><span class="p">()</span><span class="o">.</span><span class="n">bootstrapModule</span><span class="p">(</span><span class="n">AppModule</span><span class="p">);</span>
</code></pre></div>
<p><strong>table.component</strong> – TableComponent, which must be rendered:</p>
<div class="highlight"><pre><span></span><code><span class="o">//</span> <span class="n">Angular</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">Component</span><span class="p">,</span> <span class="n">Input</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core'</span><span class="p">;</span>
<span class="nd">@Component</span><span class="p">({</span>
<span class="n">selector</span><span class="p">:</span> <span class="s1">'table-component'</span><span class="p">,</span>
<span class="n">template</span><span class="p">:</span> <span class="err">`</span>
<span class="n">Post</span> <span class="n">Title</span>
</code></pre></div>
<p>Post Author</p>
<p>{{ post.title}}</p>
<p>{{ post.author}}</p>
<p>` }) export class TableComponent { @Input() public post: any; }</p>
<p><strong>post.service</strong> – An injectable service that performs API calls:</p>
<div class="highlight"><pre><span></span><code> <span class="kn">import</span> <span class="p">{</span><span class="n">Injectable</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">Observable</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'rxjs/Rx'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">Post</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./app/models/post.model'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span> <span class="n">Http</span> <span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/http'</span><span class="p">;</span>
<span class="nd">@Injectable</span><span class="p">()</span>
<span class="n">export</span> <span class="k">class</span> <span class="nc">PostService</span> <span class="p">{</span>
<span class="n">constructor</span><span class="p">(</span><span class="n">http</span><span class="p">:</span> <span class="n">Http</span><span class="p">)</span> <span class="p">{}</span>
<span class="n">public</span> <span class="n">getPost</span><span class="p">():</span> <span class="nb">any</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">Abstract</span> <span class="n">API</span><span class="p">,</span> <span class="n">Google</span><span class="p">,</span> <span class="n">Facebook</span> <span class="n">etc</span>
<span class="k">return</span> <span class="n">this</span><span class="o">.</span><span class="n">http</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">AbstractAPI</span><span class="o">.</span><span class="n">url</span><span class="p">)</span>
<span class="o">.</span><span class="n">map</span><span class="p">((</span><span class="n">res</span><span class="p">:</span> <span class="nb">any</span><span class="p">)</span> <span class="o">=></span> <span class="n">res</span><span class="o">.</span><span class="n">json</span><span class="p">())</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p><strong>post.model</strong> – Post Class with JSON structure:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">export</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">Post</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">public</span><span class="w"> </span><span class="n">title</span><span class="p">:</span><span class="w"> </span><span class="n">number</span><span class="p">;</span>
<span class="w"> </span><span class="n">public</span><span class="w"> </span><span class="n">author</span><span class="p">:</span><span class="w"> </span><span class="n">string</span><span class="p">;</span>
<span class="w"> </span><span class="n">constructor</span><span class="p">(</span><span class="n">post</span><span class="p">:</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">post</span><span class="o">.</span><span class="n">title</span><span class="p">;</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">author</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">post</span><span class="o">.</span><span class="n">author</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>Our App is now ready and working, but how will we test it?</p>
<p>I am a fan of <a href="https://en.wikipedia.org/wiki/Test-driven_development">TDD</a>, so tests are really important for me and
should be for you too. I will use <strong>Karma and Jasmine</strong> for testing and all examples will be based on these.</p>
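<p>The structure above lists a <strong>post.service.mock.ts</strong> that never appears in this post. A hedged sketch of the idea (the canned payload and the hand-rolled Observable-like object are my own illustration, not the real mock): replace PostService with a dependency that has the same <strong>getPost()</strong> surface but emits canned data synchronously, so component specs need no Http and no backend:</p>

```typescript
// Sketch of a PostService mock: same getPost() surface as the real
// service, but it returns a minimal Observable-like object whose
// subscribe() synchronously emits a canned payload.
interface Subscribable<T> {
  subscribe(next: (value: T) => void): void;
}

class MockPostService {
  public getPost(): Subscribable<any> {
    return {
      subscribe: (next: (value: any) => void) => {
        // Mirrors what res.json() would have produced in the real service
        next({ title: 'Testing Angular2', author: 'Vadym' });
      },
    };
  }
}

// Usage: hand the mock to the code under test instead of PostService
const mock = new MockPostService();
let received: any;
mock.getPost().subscribe((post) => { received = post; });
console.log(received.title);
```

<p>Because the emission is synchronous, assertions can run right after subscribing, with no async test helpers needed.</p>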
<p>The test setup in main also changes – <strong>{it, describe}</strong> have been removed from @angular/core/testing. They are now
<strong>deprecated</strong> there and are provided by the test framework itself (Jasmine, run by Karma, in my case).</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">setBaseTestProviders</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core/testing'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span>
<span class="n">TEST_BROWSER_DYNAMIC_PLATFORM_PROVIDERS</span><span class="p">,</span>
<span class="n">TEST_BROWSER_DYNAMIC_APPLICATION_PROVIDERS</span>
<span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/platform-browser-dynamic/testing'</span><span class="p">;</span>
<span class="n">setBaseTestProviders</span><span class="p">(</span>
<span class="n">TEST_BROWSER_DYNAMIC_PLATFORM_PROVIDERS</span><span class="p">,</span>
<span class="n">TEST_BROWSER_DYNAMIC_APPLICATION_PROVIDERS</span>
<span class="p">);</span>
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">TestBed</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core/testing'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">BrowserDynamicTestingModule</span><span class="p">,</span> <span class="n">platformBrowserDynamicTesting</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/platform-browser-dynamic/testing'</span><span class="p">;</span>
<span class="n">TestBed</span><span class="o">.</span><span class="n">initTestEnvironment</span><span class="p">(</span>
<span class="n">BrowserDynamicTestingModule</span><span class="p">,</span>
<span class="n">platformBrowserDynamicTesting</span><span class="p">()</span>
<span class="p">);</span>
</code></pre></div>
<p>Now, in all cases you need to create an @NgModule. See below for an example with FormsModule.</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">disableDeprecatedForms</span><span class="p">,</span> <span class="n">provideForms</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/forms'</span><span class="p">;</span>
<span class="n">bootstrap</span><span class="p">(</span><span class="n">App</span><span class="p">,</span> <span class="p">[</span>
<span class="n">disableDeprecatedForms</span><span class="p">(),</span>
<span class="n">provideForms</span><span class="p">()</span>
<span class="p">]);</span>
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">DeprecatedFormsModule</span><span class="p">,</span> <span class="n">FormsModule</span><span class="p">,</span> <span class="n">ReactiveFormsModule</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/common'</span><span class="p">;</span>
<span class="nd">@NgModule</span><span class="p">({</span>
<span class="n">declarations</span><span class="p">:</span> <span class="p">[</span><span class="n">MyComponent</span><span class="p">],</span>
<span class="n">imports</span><span class="p">:</span> <span class="p">[</span><span class="n">BrowserModule</span><span class="p">,</span> <span class="n">DeprecatedFormsModule</span><span class="p">],</span>
<span class="n">bootstrap</span><span class="p">:</span> <span class="p">[</span><span class="n">MyComponent</span><span class="p">],</span>
<span class="p">})</span>
<span class="n">export</span> <span class="k">class</span> <span class="nc">MyAppModule</span><span class="p">{}</span>
</code></pre></div>
<p>There are a lot of other changes to take note of. You can read them in the changelog
<a href="https://github.com/angular/angular/blob/master/CHANGELOG.md">here</a>.</p>
<p>Let’s start with something easy, like <strong>post.model.spec</strong>. We can begin by testing all the properties of the model:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">Post</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/models/post.model'</span><span class="p">;</span>
<span class="n">let</span> <span class="n">testPost</span> <span class="o">=</span> <span class="p">{</span><span class="n">title</span><span class="p">:</span> <span class="s1">'TestPost'</span><span class="p">,</span> <span class="n">author</span><span class="p">:</span> <span class="s1">'Admin'</span><span class="p">}</span>
<span class="n">describe</span><span class="p">(</span><span class="s1">'Post'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">it</span><span class="p">(</span><span class="s1">'checks Post properties'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">var</span> <span class="n">post</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Post</span><span class="p">(</span><span class="n">testPost</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">post</span> <span class="n">instanceof</span> <span class="n">Post</span><span class="p">)</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="n">true</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">post</span><span class="o">.</span><span class="n">title</span><span class="p">)</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s2">"TestPost"</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">post</span><span class="o">.</span><span class="n">author</span><span class="p">)</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s2">"Admin"</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">});</span>
</code></pre></div>
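<p>To see what this spec is actually asserting outside the Karma runner, here is a minimal plain-TypeScript sketch (the <code>Post</code> class is reproduced from the constructor shown earlier; the checks mirror the <code>expect</code> calls):</p>

```typescript
// Plain-TypeScript sketch of what the spec above verifies.
// `Post` is reproduced from the constructor shown earlier in the post.
class Post {
  title: string;
  author: string;
  constructor(post: any) {
    this.title = post.title;
    this.author = post.author;
  }
}

const testPost = {title: 'TestPost', author: 'Admin'};
const post = new Post(testPost);

console.log(post instanceof Post); // true
console.log(post.title);           // TestPost
console.log(post.author);          // Admin
```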
<p>If you’d like to continue with Services, then it gets a bit more complicated, but the core concept is the same.</p>
<p><strong>post.service.spec</strong> – this tests the service that makes API calls:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span>
<span class="n">inject</span><span class="p">,</span>
<span class="n">fakeAsync</span><span class="p">,</span>
<span class="n">TestBed</span><span class="p">,</span>
<span class="n">tick</span>
<span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core/testing'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">MockBackend</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/http/testing'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span>
<span class="n">Http</span><span class="p">,</span>
<span class="n">ConnectionBackend</span><span class="p">,</span>
<span class="n">BaseRequestOptions</span><span class="p">,</span>
<span class="n">Response</span><span class="p">,</span>
<span class="n">ResponseOptions</span><span class="p">,</span>
<span class="n">HttpModule</span>
<span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/http'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">PostService</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/services/post.service'</span><span class="p">;</span>
<span class="n">describe</span><span class="p">(</span><span class="s1">'PostService'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">beforeEach</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="o">//</span> <span class="n">Inject</span> <span class="nb">all</span> <span class="n">needed</span> <span class="n">services</span>
<span class="n">TestBed</span><span class="o">.</span><span class="n">configureTestingModule</span><span class="p">({</span>
<span class="n">providers</span><span class="p">:</span> <span class="p">[</span>
<span class="n">PostService</span><span class="p">,</span>
<span class="n">BaseRequestOptions</span><span class="p">,</span>
<span class="n">MockBackend</span><span class="p">,</span>
<span class="p">{</span> <span class="n">provide</span><span class="p">:</span> <span class="n">Http</span><span class="p">,</span> <span class="n">useFactory</span><span class="p">:</span> <span class="p">(</span><span class="n">backend</span><span class="p">:</span> <span class="n">ConnectionBackend</span><span class="p">,</span>
<span class="n">defaultOptions</span><span class="p">:</span> <span class="n">BaseRequestOptions</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">new</span> <span class="n">Http</span><span class="p">(</span><span class="n">backend</span><span class="p">,</span> <span class="n">defaultOptions</span><span class="p">);</span>
<span class="p">},</span> <span class="n">deps</span><span class="p">:</span> <span class="p">[</span><span class="n">MockBackend</span><span class="p">,</span> <span class="n">BaseRequestOptions</span><span class="p">]}</span>
<span class="p">],</span>
<span class="n">imports</span><span class="p">:</span> <span class="p">[</span>
<span class="n">HttpModule</span>
<span class="p">]</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="n">describe</span><span class="p">(</span><span class="s1">'getPost methods'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">it</span><span class="p">(</span><span class="s1">'is existing and returning post'</span><span class="p">,</span>
<span class="o">//</span> <span class="n">Instantiate</span> <span class="nb">all</span> <span class="n">needed</span> <span class="n">services</span>
<span class="n">inject</span><span class="p">([</span><span class="n">PostService</span><span class="p">,</span> <span class="n">MockBackend</span><span class="p">],</span> <span class="n">fakeAsync</span><span class="p">((</span><span class="n">ps</span><span class="p">:</span> <span class="n">PostService</span><span class="p">,</span> <span class="n">be</span><span class="p">:</span> <span class="n">MockBackend</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">var</span> <span class="n">res</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Emulate</span> <span class="n">server</span> <span class="n">connection</span>
<span class="n">be</span><span class="o">.</span><span class="n">connections</span><span class="o">.</span><span class="n">subscribe</span><span class="p">(</span><span class="n">c</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">expect</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="p">)</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="n">AbstractAPI</span><span class="o">.</span><span class="n">url</span><span class="p">);</span>
<span class="n">let</span> <span class="n">response</span> <span class="o">=</span> <span class="n">new</span> <span class="n">ResponseOptions</span><span class="p">({</span><span class="n">body</span><span class="p">:</span> <span class="s1">'{"title": "TestPost", "author": "Admin"}'</span><span class="p">});</span>
<span class="n">c</span><span class="o">.</span><span class="n">mockRespond</span><span class="p">(</span><span class="n">new</span> <span class="n">Response</span><span class="p">(</span><span class="n">response</span><span class="p">));</span>
<span class="p">});</span>
<span class="n">ps</span><span class="o">.</span><span class="n">getPost</span><span class="p">()</span><span class="o">.</span><span class="n">subscribe</span><span class="p">((</span><span class="n">_post</span><span class="p">:</span> <span class="nb">any</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">_post</span><span class="p">;</span>
<span class="p">});</span>
<span class="o">//</span> <span class="n">tick</span><span class="p">()</span> <span class="n">function</span> <span class="ow">is</span> <span class="n">waiting</span> <span class="n">until</span> <span class="n">the</span> <span class="n">call</span> <span class="n">will</span> <span class="n">be</span> <span class="n">done</span>
<span class="n">tick</span><span class="p">();</span>
<span class="n">expect</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">title</span><span class="p">)</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s1">'TestPost'</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">author</span><span class="p">)</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s1">'Admin'</span><span class="p">);</span>
<span class="p">}))</span>
<span class="p">);</span>
<span class="p">});</span>
<span class="p">});</span>
</code></pre></div>
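<p>Stripped of the Angular and RxJS specifics, the pattern in this spec – replace the real transport with a backend that replays a canned response – can be sketched in a few lines of plain TypeScript (all names here are illustrative, not Angular APIs):</p>

```typescript
// Illustrative sketch (no Angular, no RxJS): the MockBackend idea boils down to
// swapping the real transport for one that replays a canned response synchronously.
type Handler = (body: string) => void;

class FakeBackend {
  private canned = '';
  respondWith(body: string) { this.canned = body; }
  // The "service" calls this instead of performing real I/O.
  request(url: string, onResponse: Handler) { onResponse(this.canned); }
}

class PostService {
  constructor(private backend: FakeBackend) {}
  getPost(onPost: (post: any) => void) {
    this.backend.request('/api/post', body => onPost(JSON.parse(body)));
  }
}

const backend = new FakeBackend();
backend.respondWith('{"title": "TestPost", "author": "Admin"}');

let res: any;
new PostService(backend).getPost(p => { res = p; });
console.log(res.title, res.author); // TestPost Admin
```

<p>Because the fake backend responds synchronously, no <code>tick()</code> is needed here; in the Angular version <code>fakeAsync</code> and <code>tick()</code> play that role for asynchronous responses.</p>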
<p>Now, onto the hardest part: Components.</p>
<p>Before giving any detailed explanations, I want to create MockPostService, which will “mock” PostService.</p>
<p><strong>post.service.mock</strong> – here we will rewrite real calls on a mocked one to return test data:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">PostService</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/services/post.service'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">Observable</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'rxjs'</span><span class="p">;</span>
<span class="n">export</span> <span class="k">class</span> <span class="nc">MockPostService</span> <span class="n">extends</span> <span class="n">PostService</span> <span class="p">{</span>
<span class="n">constructor</span><span class="p">()</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">Inherits</span> <span class="kn">from</span> <span class="nn">real</span> <span class="n">service</span>
<span class="nb">super</span><span class="p">();</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">Rewrite</span> <span class="n">real</span> <span class="n">method</span> <span class="n">on</span> <span class="n">mocked</span> <span class="n">one</span> <span class="n">to</span> <span class="k">return</span> <span class="n">test</span> <span class="n">data</span>
<span class="n">getPost</span><span class="p">()</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">Http</span> <span class="n">is</span> <span class="n">using</span> <span class="n">Observable</span><span class="p">,</span> <span class="n">so</span> <span class="n">we</span> <span class="n">need</span> <span class="n">to</span> <span class="n">define</span> <span class="n">mocked</span> <span class="n">Observable</span>
<span class="k">return</span> <span class="n">Observable</span><span class="o">.</span><span class="n">of</span><span class="p">({</span><span class="n">title</span><span class="p">:</span> <span class="s1">'TestPost'</span><span class="p">,</span> <span class="n">author</span><span class="p">:</span> <span class="s1">'Admin'</span><span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
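<p>The reason this substitution works is ordinary dependency injection: the component only depends on the <code>PostService</code> contract, so any subclass can stand in for it. A hypothetical plain-TypeScript sketch (simplified and synchronous, not the Angular API):</p>

```typescript
// Hypothetical, simplified sketch of why the mock works: the component depends
// only on the PostService contract, so the test can hand it MockPostService.
interface PostData { title: string; author: string; }

class PostService {
  getPost(): PostData {
    // The real implementation would call the API over HTTP.
    throw new Error('network not available in tests');
  }
}

class MockPostService extends PostService {
  getPost(): PostData { return {title: 'TestPost', author: 'Admin'}; }
}

class TableComponent {
  post: PostData;
  constructor(service: PostService) { this.post = service.getPost(); }
}

// This substitution is what {provide: PostService, useClass: MockPostService}
// performs inside the TestBed configuration shown below.
const cmp = new TableComponent(new MockPostService());
console.log(cmp.post.title); // TestPost
```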
<p>Then we write the test for the component.</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span>
<span class="n">inject</span><span class="p">,</span>
<span class="n">addProviders</span>
<span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core/testing'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">TableComponent</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/components/table/table.component'</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Standard</span> <span class="n">Builder</span> <span class="k">for</span> <span class="n">components</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">TestComponentBuilder</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core/testing'</span><span class="p">;</span>
<span class="nd">@Component</span><span class="p">({</span>
<span class="n">selector</span> <span class="p">:</span> <span class="s1">'test-cmp'</span><span class="p">,</span>
<span class="n">template</span> <span class="p">:</span> <span class="s1">''</span>
<span class="p">})</span>
<span class="k">class</span> <span class="nc">TestCmpWrapper</span> <span class="p">{</span>
<span class="n">public</span> <span class="n">postMock</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Post</span><span class="p">({</span><span class="s1">'title'</span><span class="p">:</span> <span class="s1">'TestPost'</span><span class="p">,</span> <span class="s1">'author'</span><span class="p">:</span> <span class="s1">'Admin'</span><span class="p">});</span>
<span class="p">}</span>
<span class="n">describe</span><span class="p">(</span><span class="s2">"TableComponent"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">it</span><span class="p">(</span><span class="s1">'render table'</span><span class="p">,</span> <span class="n">inject</span><span class="p">([</span><span class="n">TestComponentBuilder</span><span class="p">],</span> <span class="p">(</span><span class="n">tcb</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="n">tcb</span><span class="o">.</span><span class="n">overrideProviders</span><span class="p">(</span><span class="n">TableComponent</span><span class="p">)</span>
<span class="o">.</span><span class="n">createAsync</span><span class="p">(</span><span class="n">TableComponent</span><span class="p">)</span>
<span class="o">//</span> <span class="n">In</span> <span class="n">fixture</span> <span class="n">we</span> <span class="n">store</span> <span class="nb">all</span> <span class="n">component</span> <span class="n">metadata</span><span class="p">,</span> <span class="n">like</span> <span class="n">componentInstance</span> <span class="ow">and</span> <span class="n">nativeElement</span><span class="p">,</span> <span class="n">to</span> <span class="n">access</span> <span class="n">the</span> <span class="n">Component</span> <span class="n">template</span>
<span class="o">.</span><span class="n">then</span><span class="p">((</span><span class="n">fixture</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">let</span> <span class="n">componentInstance</span> <span class="o">=</span> <span class="n">fixture</span><span class="o">.</span><span class="n">componentInstance</span><span class="p">;</span>
<span class="n">let</span> <span class="n">nativeElement</span> <span class="o">=</span> <span class="n">jQuery</span><span class="p">(</span><span class="n">fixture</span><span class="o">.</span><span class="n">nativeElement</span><span class="p">);</span>
<span class="n">componentInstance</span><span class="o">.</span><span class="n">post</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Post</span><span class="p">({</span><span class="n">title</span><span class="p">:</span> <span class="s1">'TestPost'</span><span class="p">,</span> <span class="n">author</span><span class="p">:</span> <span class="s1">'Admin'</span><span class="p">});</span>
<span class="n">fixture</span><span class="o">.</span><span class="n">detectChanges</span><span class="p">();</span>
<span class="n">let</span> <span class="n">firstTable</span> <span class="o">=</span> <span class="n">nativeElement</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'table'</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">firstTable</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'tr td:nth-child(1)'</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">())</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s1">'TestPost'</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">firstTable</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'tr td:nth-child(2)'</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">())</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s1">'Admin'</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}));</span>
<span class="p">});</span>
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="p">{</span><span class="n">Component</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core'</span><span class="p">;</span>
<span class="o">//</span> <span class="n">TestComponentBuilder</span> <span class="n">was</span> <span class="n">replaced</span> <span class="n">by</span> <span class="n">TestBed</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">TestBed</span><span class="p">,</span> <span class="k">async</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'@angular/core/testing'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">Post</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/models/post.model'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">TableComponent</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/components/table/table.component'</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Services</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">PostService</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./../app/services/post.service'</span><span class="p">;</span>
<span class="kn">import</span> <span class="p">{</span><span class="n">MockPostService</span><span class="p">}</span> <span class="kn">from</span> <span class="s1">'./post.service.mock'</span>
<span class="o">//</span> <span class="n">Create</span> <span class="n">TestCmpWrapper</span> <span class="ow">and</span> <span class="n">grab</span> <span class="nb">all</span> <span class="n">test</span> <span class="n">data</span>
<span class="nd">@Component</span><span class="p">({</span>
<span class="n">selector</span> <span class="p">:</span> <span class="s1">'test-cmp'</span><span class="p">,</span>
<span class="n">template</span> <span class="p">:</span> <span class="s1">''</span>
<span class="p">})</span>
<span class="k">class</span> <span class="nc">TestCmpWrapper</span> <span class="p">{</span>
<span class="n">public</span> <span class="n">postMock</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Post</span><span class="p">({</span><span class="s1">'title'</span><span class="p">:</span> <span class="s1">'TestPost'</span><span class="p">,</span> <span class="s1">'author'</span><span class="p">:</span> <span class="s1">'Admin'</span><span class="p">});</span>
<span class="p">}</span>
<span class="n">describe</span><span class="p">(</span><span class="s2">"TableComponent"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="o">//</span> <span class="n">New</span> <span class="err">–</span> <span class="n">you</span> <span class="n">need</span> <span class="n">to</span> <span class="n">declare</span> <span class="nb">all</span> <span class="n">dependencies</span> <span class="ow">in</span> <span class="n">an</span> <span class="n">NgModule</span>
<span class="n">beforeEach</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">TestBed</span><span class="o">.</span><span class="n">configureTestingModule</span><span class="p">({</span>
<span class="n">declarations</span><span class="p">:</span> <span class="p">[</span>
<span class="n">TestCmpWrapper</span><span class="p">,</span>
<span class="n">TableComponent</span>
<span class="p">],</span>
<span class="n">providers</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span><span class="n">provide</span><span class="p">:</span> <span class="n">PostService</span><span class="p">,</span> <span class="n">useClass</span><span class="p">:</span> <span class="n">MockPostService</span><span class="p">}</span>
<span class="p">]</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="n">describe</span><span class="p">(</span><span class="s1">'check rendering'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">it</span><span class="p">(</span><span class="s1">'if component is rendered'</span><span class="p">,</span> <span class="k">async</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">TestBed</span><span class="o">.</span><span class="n">compileComponents</span><span class="p">()</span><span class="o">.</span><span class="n">then</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">let</span> <span class="n">fixture</span> <span class="o">=</span> <span class="n">TestBed</span><span class="o">.</span><span class="n">createComponent</span><span class="p">(</span><span class="n">TestCmpWrapper</span><span class="p">);</span>
<span class="n">let</span> <span class="n">componentInstance</span> <span class="o">=</span> <span class="n">fixture</span><span class="o">.</span><span class="n">componentInstance</span><span class="p">;</span>
<span class="n">let</span> <span class="n">nativeElement</span> <span class="o">=</span> <span class="n">jQuery</span><span class="p">(</span><span class="n">fixture</span><span class="o">.</span><span class="n">nativeElement</span><span class="p">);</span>
<span class="n">componentInstance</span><span class="o">.</span><span class="n">post</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Post</span><span class="p">({</span><span class="n">title</span><span class="p">:</span> <span class="s1">'TestPost'</span><span class="p">,</span> <span class="n">author</span><span class="p">:</span> <span class="s1">'Admin'</span><span class="p">});</span>
<span class="n">fixture</span><span class="o">.</span><span class="n">detectChanges</span><span class="p">();</span>
<span class="n">let</span> <span class="n">firstTable</span> <span class="o">=</span> <span class="n">nativeElement</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'table'</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">firstTable</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'tr td:nth-child(1)'</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">())</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s1">'TestPost'</span><span class="p">);</span>
<span class="n">expect</span><span class="p">(</span><span class="n">firstTable</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">'tr td:nth-child(2)'</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">())</span><span class="o">.</span><span class="n">toBe</span><span class="p">(</span><span class="s1">'Admin'</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}));</span>
<span class="p">});</span>
<span class="p">});</span>
</code></pre></div>
<p>Read all the comments in the code carefully, as they are incredibly important. I’d also like to hear your own comments
on whether or not this was helpful for you – they are very appreciated.</p>Bootcamp for Fashpreneurs – Reimagining Online Fashion2016-11-08T00:00:00+01:002016-11-08T00:00:00+01:00Elzbieta Sikoratag:engineering.zalando.com,2016-11-08:/posts/2016/11/bootcamp-for-fashpreneurs-reimagining-online-fashion.html<p>Zalando will run its first bootcamp dedicated to Fashpreneurs in Berlin from 25-27 November.</p><p>As Europe's largest online fashion platform with a vision to connect all players in the fashion business, we want to
open up even more to the startup community by doing what we know – encouraging the next generation of fashpreneurs to
scale tech in fashion.</p>
<p>There is an ocean of opportunity out there, from the <a href="https://www.all-about-z.com/new-normal/">latest developments</a> in
AR/VR technology to the <a href="https://www.all-about-z.com/the-big-bang/">new wave of content creators</a> shaking up the brand
game. We see the need to inspire, guide, and support entrepreneurs in reimagining fashion with their own business idea,
in an effort to go beyond traditional online and app store models.</p>
<p>With this in mind, we’re excited to announce that Zalando will run its first bootcamp dedicated to Fashpreneurs in
Berlin from 25-27 November.</p>
<h3>Who, What, Why?</h3>
<p>If you are a student, graduate, entrepreneur, or a team of forward-thinkers considering (or in the process of) building
revolutionary mobile fashion journeys, you’ll definitely want to sign up to this bootcamp experience.</p>
<p>Delivering a product that consumers will love requires some homework. At <em>Zalando’s Bootcamp for Fashpreneurs</em> we’ll be
sharing industry insights on new trends, including technology developments in your preferred sphere of influence. We’ll
also dive into how you can develop and validate your customer value proposition that goes beyond simple catalogue
models.</p>
<p>Our Product, UX, and Business mentors will be on hand to explain why the devil lies in the details of your product
offering, as well as help coach you in assessing your value proposition design skills – crucial when
developing those winning use cases. This event presents a unique chance to meet Berlin’s most imaginative people,
prototype your ideas in <a href="https://tech.zalando.com/blog/zalando-opens-new-playground-for-tech-innovation/">Zalando’s Innovation
Lab</a>, and to share entrepreneurial
knowledge with those in the know.</p>
<p>How will it all happen? Over the course of a weekend, you’ll build your team, mingle with industry visionaries, and
master your product and pitch. You’ll receive direct expert feedback along the way, giving your product the vision and
scope it might need. You’ll be fed, hydrated, and engaged while you receive heightened input from experts in the field
– an opportunity not to be missed.</p>
<h3>How do I sign up?</h3>
<p>We’ve got room for 60 keen bootcampers to join us. Head over to <a href="https://docs.google.com/forms/d/e/1FAIpQLScSXPyOovQQ4-tqOFOx8aHlf03xYjR5SEHgRJDLEHZjZbcXcA/viewform#responses">this
form</a> and
briefly tell us about yourself or your team – your interest is important to us, even if you have yet to fully formulate
your idea. We’re also happy to take any questions directly in the form linked above.</p>
<p>Sign-up closes on Sunday, 20th November – so make sure you don’t miss out.</p>Delivering a Cross-Site Project2016-11-07T00:00:00+01:002016-11-07T00:00:00+01:00Dr. John Hannontag:engineering.zalando.com,2016-11-07:/posts/2016/11/delivering-a-cross-site-project.html<p>Align your goals, produce software at speed, and be agile enough to incorporate change.</p><p>You may have heard recently from one of the many media outlets about the launch of
<a href="http://www.collabary.com/">Collabary</a>, the content creator platform from Zalando. This showcases the perfect merger of
enriched data with a high quality frontend application.</p>
<p>In this blog post I’d like to give my lessons learnt from being part of the team involved in this project, based here in
Dublin. It’s always good to start with a short history lesson to get everyone up to speed.</p>
<p>At the start of the quarter my team, Labrador, heard about a prospective project that needed some crawling technology to
help it acquire data. If we chose to be part of it, we would be inheriting a project SLA from another team with a launch
date already set in place. This scenario was daunting, and being a small team of two we knew our only possibility to
achieve this goal would be to heavily align our OKRs with the purpose of doing whatever was needed to get this project
ready and out the door.</p>
<p>Some of the other lessons that helped keep us on target and focused were:</p>
<h3>Get to Know the Stakeholders</h3>
<p>Why are we building the project? Who needs it? What needs to be done? All of these questions needed to be asked face to
face. In our case we hadn't heard of the project and didn't know fully what was needed.</p>
<p>We traveled to Berlin, had a whiteboarding session, put faces to names, and started building relationships with the
stakeholders who we would be working closely with over the next few months.</p>
<h3>Communication, Communication, Communication</h3>
<p>It goes without saying that distance is a huge barrier in communication. We knew we would need a mechanism to filter
down requirements, discuss bugs, ask general questions, and be open about current progress.</p>
<p>To allow communication to flow we used a few technologies: Hangouts were used for our daily stand-ups, we created a
dedicated room on HipChat to raise issues, ask questions, and chat off topic (it’s not all about work!), and
occasionally when we needed to take things offline and write down a set of Specs/decisions, Google Docs served us well.</p>
<h3>Process</h3>
<p>Process, process, process. It doesn't matter what your process is, it only matters that you have one and it's trackable
and measurable. For instance, our colleagues in Berlin did Scrum, had Sprints, used JIRA, and were aided by a Producer,
whereas on our side we used a flavor of Kanban that we evolved to suit our needs, using a lightweight Trello board and
having meetings only when needed. In practice this worked out really well.</p>
<p>At our daily cross-team standups, we could respond and act immediately on a feature request and pull it into our current
workday if we had capacity, or give an ETA when we could start. Having daily standups gave us the medium to discuss
blockers, progress, or any other business that cropped up.</p>
<h3>Pressure</h3>
<p>You don’t want too much pressure, but just enough to give you focus. Having a delivery date as a target allows for
prioritisation, strategic planning, and an immense urgency to get s*** done. Pressure, once managed, can be channeled
into a focus factor with huge possibilities.</p>
<p>In this project there were many sources from which to draw this pressure. The product was being pitched and a launch
date was already in place. One of our founders was backing the idea behind the project, and getting it to market before our competitors, thus gaining market share, was paramount. Pressure can often be seen as a negative if too
large, but having the right amount along with a solid process means that teams can do amazing things in a short amount
of time.</p>
<h3>Celebrate</h3>
<p>Celebrate the wins big and small. Launching a product is the perfect excuse to celebrate, but closing out tickets and
clearing a backlog are also opportunities to celebrate small wins. We started to go for coffee and cakes in the
afternoons, which increased our team spirit, cleared our heads to refocus, and increased my waist size – but that’s an aside!</p>
<p>As a team we went for drinks, praised each other's efforts, and we did our weekly wins in front of our office. All of
the small wins topped up the tank and powered us on for that bit longer. Finally, our Collabary colleagues from Berlin
arrived in October for a planning session and we definitely made the time to celebrate our wins then too!</p>
<h3>Fin</h3>
<p>Take what you want from the above lessons. These aren’t anything new or ground breaking and you may even find them in a
textbook or two. Having said that, these few simple things allowed us to align our goals, produce software at speed, and
be agile enough to incorporate change.</p>
<p>This isn’t the end of the collaboration, and as we move into the next quarter we look forward to continuing the partnership and the possibilities that lie in front of us.</p>
<p>If you’d like to get in touch about our experience with a cross-site project, you can contact me on Twitter at
<a href="https://twitter.com/johnhenryhannon">@johnhenryhannon</a>.</p>Doing Data Science the Cloud and Distributed Way2016-11-04T00:00:00+01:002016-11-04T00:00:00+01:00Humberto Coronatag:engineering.zalando.com,2016-11-04:/posts/2016/11/doing-data-science-the-cloud-and-distributed-way.html<p>See how we're iterating the way we do early stage exploratory data science with large datasets.</p><p>Our team have been building data science products together for almost one year now. We believe that data scientists and
data engineers should work closely together, yet we understand the differences in the environments and tasks that each
of us perform.</p>
<p>In this vein, we have iterated the way we do data science, specially early stage exploratory data science with large
datasets. Moreover, we have also established a flexible framework for reviewing and sharing data science knowledge
within and across the teams. This framework allows us to carry out data science tasks in a production-ready environment;
to have a better standard of work via peer reviews; and to use distributed computing frameworks such as Spark or Hadoop,
where we can build machine learning models in the cloud with large datasets.</p>
<p>We’ve explored an array of different frameworks in our team and we’d like to share the pros and cons of each of them
below.</p>
<h3>Jupyter Notebooks</h3>
<p>A few of us in the team really like Python, because it is great for early prototyping. Libraries such as scikit-learn
allow us to iterate really fast around ideas. We also like using Jupyter Notebooks where you can integrate text, graphs,
code, and data in a single human-readable file. However, we have found a few drawbacks with this method.</p>
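<p>As a hedged illustration of that iteration speed (the dataset and estimator below are placeholder choices of our own, not from any Zalando system), a typical notebook cell might look like:</p>

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder dataset; in a real notebook you would load your own data here.
X, y = load_iris(return_X_y=True)

# Swap in a different estimator and re-run the cell to iterate on an idea.
model = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(round(scores.mean(), 3))
```

<p>Changing a model or a hyperparameter is a one-line edit followed by a re-run, which is exactly what makes notebooks attractive for early prototyping.</p>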
<p>To start with, Jupyter Notebooks do not render in GitHub Enterprise, which we need for reviewing and versioning our work
(we solved this by doing rubber-duck reviews on markdown exports of the code). An even bigger problem is that this approach doesn’t help when using the tools we need to perform data science on really large datasets. Using EMR clusters, reading S3 files, or having immutable experiments is not straightforward with this approach.</p>
<h3>EC2 / EMR steps</h3>
<p>As software engineers, we have also tried a more standard approach used in the software industry. Using Scala and Spark,
we have built GitHub repos to tackle a wide range of data science problems. Since we work in a production-ready
environment, we have used code reviews and standard testing techniques to guarantee the quality of our code.
Additionally, we have deployed Jenkins for Continuous Integration. In order to obtain results or create models, we
usually deploy our code as fat JARs that then run on Docker, on top of EC2 or EMR, depending on our needs.</p>
<p>The problem with this approach is that it is not data science friendly. Data science requires a lot of experimentation and
tuning. The process of creating a JAR, deploying it, and running it is incredibly time consuming and tedious. An
alternative would be using config files with parameters that we can change on each run of our programs to get different
results. Nevertheless, it is not as flexible nor dynamic as data scientists would expect.</p>
<h3>Zeppelin Notebooks</h3>
<p><a href="https://zeppelin.apache.org/">Zeppelin</a> is an interactive web-based notebook (similar to Jupyter) that is being
developed by the Apache community. Its main features are the large number of different programming languages it supports
and the flexibility it has to incorporate new languages in its interpreter. Its built-in visualization tools are also
one of its highlights.</p>
<p>Zeppelin simplifies the running of large-scale experiments across large datasets using the cloud (for example, AWS). It
automatically provides a SparkContext and a SQLContext so data scientists don’t have to initialize them manually. It is
also possible to upload JARs and libraries from your local filesystem or Maven repository.</p>
<h3>Conclusion</h3>
<p>We are still iterating with small improvements over our current setup for data science, especially making it closer to
engineering processes. As new tools come out, we will be testing them to see how they fit our workflow, and as our
products change, the type of data science work we do will shift.</p>
<p>The integration of Zeppelin Notebooks within the software development process has proven to be difficult. High quality
code is a standard in Zalando and we like to peer-review our notebooks. Unfortunately, just like Jupyter, Zeppelin
Notebooks are not rendered by GitHub. We have sometimes opted for exporting the notebooks as .html or .json, but the
review process here becomes tiresome and tricky.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ad71dc6290b586f7a67d2413a1d8b23a305ebb2e_zeppelin-datascience-blogpost-2.png?auto=compress,format"></p>
<p>Our current approach for exploratory and early stage data science can be seen in the image above. What we’ve
experimented with is only a view of our current state and the processes we have put in place to get there.</p>
<p>We know many teams have gone through the same process and we would like to hear about your experiences. You can contact
us via Twitter at <a href="https://twitter.com/totopampin">@totopampin</a> and <a href="https://twitter.com/S_Gonzalez_S">@S_Gonzalez_S</a>
to share your own processes and feedback.</p>The Sprint Exposed – How we Use it at Zalando2016-11-03T00:00:00+01:002016-11-03T00:00:00+01:00Adrian Dampctag:engineering.zalando.com,2016-11-03:/posts/2016/11/the-sprint-exposed--how-we-use-it-at-zalando.html<p>How we go about answering the most critical project questions and doubts in just one week.</p><p>The <em>Design Sprint</em> (or just The Sprint) is the process introduced by Google Ventures and Jake Knapp, author of “ <a href="http://www.thesprintbook.com/">The
Sprint</a>”.</p>
<p>The main method idea is to answer the most critical project questions and doubts in just one week.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2f6a52dff0c39f0adfc0249bc0e363d1fc04832d_ebde1b47795447d66f88d80d28dcd7b9.jpg?auto=compress,format"></p>
<p>There’s no endless brainstorming sessions, no coding – just short, highly energetic activities that push your product
forward.</p>
<h3>Why did we decide to use it?</h3>
<p>Our team creates merchandise solutions for Zalando employees and our business partners. We deliver tools that often need
to support complex business logic and related procedures.</p>
<p>Before Sprint, we had a lot of questions about our users, their jobs, and goals. Our knowledge was unstructured; it was
hard to clearly explain a big picture of our future plans.</p>
<p>We had many different ideas about how to improve the life of our users. Unfortunately, there wasn’t any strong proof of
which ideas were deemed the most valuable.</p>
<p>We were looking for a way to answer open questions, structure our findings, and establish a long-term strategy for our
products. The sprint concept sounded like something that fitted our problems perfectly.</p>
<p>We decided to find out if it really works.</p>
<h3>Results</h3>
<p><strong>Sprint is flexible</strong></p>
<p>Since we had not seen any showcases focused on enterprise, we were not sure if the sprint could be adapted to solve our
problems.</p>
<p>This turned out not to be an issue. We have used the sprint twice (to find a way to deliver the best product information, and to
improve the overall ordering process), and in both cases we felt that the sprint activities were helping us to solve our
problems in a creative and more motivated way.</p>
<p><strong>The Sprint is not a silver bullet for all your problems…</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7e803f64415983c7018b48fbe92a6929513d4692_img_1239.jpg?auto=compress,format"></p>
<p>In fact, sprint activities are not really revolutionary. The sprint is more like a framework for existing methods that helps you use them more efficiently in a limited time.</p>
<p>Something else to note is that the sprint is only as good as your team. If your team isn’t fully committed and doesn’t respect the rules, the results will be miserable.</p>
<p>Sprints can illuminate the path for your product’s progression, but once you are already on that path and need to deliver, the sprint’s role is limited.</p>
<p><strong>… but it works!</strong></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bc3211e10bea9a43a72fc0a12bab81f6fd4399c5_img_0591.jpg?auto=compress,format"></p>
<p>Although the sprint has its limitations, it was very helpful in our cases.</p>
<p>In one week, we achieved progress that often requires many weeks or even months. We answered a lot of product questions
and analyzed our user problems. We were also able to generate, discuss, and test a few possible solutions.</p>
<p>And the most important thing: the sprint was valuable in establishing long-term product strategy and it gave us
confidence knowing that we were on the right track.</p>
<h3>Tips and tricks</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/aa3f743a643fafd6fdac2b20b48eabe138b19935_img_0589.jpg?auto=compress,format"></p>
<p><strong>Build a diverse team</strong></p>
<p>A sprint team usually consists of seven people. It is natural to try to recruit the most important, clever, or talented
people in your organisation. But the most crucial thing here is diversity. I am sure that our sprint would have been less effective if we had not included developers or customer specialists on our team.</p>
<p>Try to consider people that can understand your problem from different angles, even if they are usually not involved in
making product decisions.</p>
<p><strong>Don’t forget about the prototyper</strong></p>
<p>If you’re making the wrong decisions during the first three days, you can’t get that time back. While painful, you can
live with it. But if you fail further over the course of the following two days, you’ll receive incorrect answers that
might push your project down the wrong path for months.</p>
<p>Ensure that at least one person has experience in creating interactive prototypes. You only have one day to prepare a
realistic product simulation. It is really challenging and it will be even harder if you need to learn how to do this
first.</p>
<p>An experienced prototyper will not only be more efficient, but also help you decide which elements are crucial and what
can be skipped.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/fafcda03566364a59ed3bf2ec5f7ef6e165acc9e_2.-prototyper_in_action.jpeg?auto=compress,format"></p>
<p><strong>Remember the researcher</strong></p>
<p>Friday sessions should be led by an experienced researcher. If you are unable to recruit anyone for the whole sprint,
try to invite them for at least the last day.</p>
<p>In my experience, people with no testing background tend to hint at the “proper” solution for the user (usually unconsciously, for example through subtle gestures).</p>
<p>Being a test observer can be tricky as well; sometimes it is hard to distinguish test observations from a user’s
opinion, which can be misleading.</p>
<p>A good researcher will help you prevent these problems.</p>
<p><strong>Cancel other meetings</strong></p>
<p>Daily sprint work is divided into two three-hour blocks. It might look like you’ll have enough time to do some extra tasks
on the side. This is not the case.</p>
<p>A <em>Design Sprint</em> is like a sprint in running: short but intense. You can run (or work) at your maximum level only for a limited period of time. If you waste your energy on other activities, you will be less effective during sprint sessions.</p>
<p>We all know how hard it is to clear your work calendar for a whole week, but I implore you to try to do so – it is absolutely worth the trouble.</p>
<p><strong>Keep to the schedule</strong></p>
<p>Every sprint activity has strict time frames. Quite often, you may have the impression that there is not enough time to
finish meaningful discussions or perform tasks properly.</p>
<p>Despite this, do not change the sprint schedule. Your time is limited on purpose: Work is short but intense, and
everything that is not considered essential must be skipped.</p>
<p>If you try to “hack” the schedule you will lose intensity, focus, and energy.</p>
<h3>Conclusion</h3>
<p>I hope the explanation I’ve given has inspired you to explore using a <em>Design Sprint</em> to solve your own user problems in
your team. With the success we’ve had, we’ll surely be using this technique more thoroughly in the future.</p>
<p>If you have any questions about our process or want to get in touch, you can reach me on Twitter at
<a href="https://twitter.com/adriandampc">@adriandampc</a>.</p>How Failing Fast Drives us Forward at Zalando Tech2016-10-28T00:00:00+02:002016-10-28T00:00:00+02:00Martin Schwitallatag:engineering.zalando.com,2016-10-28:/posts/2016/10/how-failing-fast-drives-us-forward-at-zalando-tech.html<p>Failing fast, getting the right feedback, and remembering to test everything you're working on.</p><p>At Zalando, we try to live a mentality of failing fast. The context here is not a failure of the person or the group
involved, but rather a chance to learn from the mistakes that were made. If you fail fast, you are also able to right
the wrongs much quicker. Failing fast can be a golden opportunity, and our team (Team Sokoban) tried to apply this
mentality at the beginning of our stock rebuild.</p>
<p>The stock system is the connection between the physical warehouses and the CFAs, and knows where an item is located, how
much we still have in stock, and how many units we can still sell to the customer. It’s easy to see that it is an
important component in Zalando’s <a href="https://tech.zalando.de/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">Platform
Strategy</a>, and
our system has to be ready for the challenges that lie ahead.</p>
<p>To get the rebuild on track, we started with a two-week workshop where we exclusively focused on this topic. We defined
requirements for the stock, designed an architecture, and according to Zalando’s API first principles, designed APIs for
these new services. This is important, because it’s one thing to own and know your own domain, but something else to know
everything about the surrounding teams that rely on our solution. With this API, we were now able to liaise with other
teams and involved parties about the future, allowing them to give us feedback on their behaviours and requirements. It
was during this step where our failing occurred.</p>
<p>Some of the options we presented just didn’t work out, due either to our data model, or the fact that other teams had
plans which wouldn’t work with our solution. Our team felt that this wasn’t a problem, as we were able to spot bigger
issues right at the beginning of the project and address them without much effort. We were redrawing architecture pictures
and redefining APIs, but in the end we crafted a more stable and improved solution thanks to the fast feedback we
received in such a short timeframe.</p>
<p>But this doesn’t end with APIs. While defining and implementing APIs, for example, we were already implementing the required monitoring and thinking about how scalable our solution would be. Load tests were completed to identify
problems in the setup while we spoke with different teams about monitoring and common problems. This coincided with us
making plans for migrating the old data.</p>
<p>In conclusion, failing fast is about getting feedback from the very start of your project, and testing your API, setup,
or implementation. Don’t try to overthink everything and come up with a perfect solution at first. The lesson here is
that changes will be made, and when you approach the work unafraid of failing, you’ll be able to recover and fix the
issues raised without sacrificing months of hard work.</p>
<p>If you have any questions or anecdotes to share about your own failing fast experience, get in touch via
<a href="mailto:martin.schwitalla@zalando.de">email</a>.</p>Data Science and AI in the Spotlight with our VP, Alex Rahin2016-10-26T00:00:00+02:002016-10-26T00:00:00+02:00Natali Vlatkotag:engineering.zalando.com,2016-10-26:/posts/2016/10/data-science-and-ai-in-the-spotlight-with-our-vp-alex-rahin.html<p>Introducing our new VP of Data and Machine Learning Platforms to the world, Alex Rahin.</p><p>As our work and investment in Data Science and AI continue to grow, we’ve added to our recent good news on the hiring
front here at Zalando Tech. Now that he’s fully onboarded, we’d like to introduce Alex Rahin, our new VP of Data and
Machine Learning Platforms.</p>
<p>Alex joins us with extensive product experience from Amazon, Microsoft, Intel, and several technology startups. He is
responsible for Zalando Tech’s Core Data Platforms and Applications, such as Data Infrastructure, Machine Learning,
Business Intelligence, and Web Analytics. We wanted to share more about his role and what his future endeavours will
entail for Zalando in the world of Data Science.</p>
<p><em>Zalando Tech: How have your first months been in Berlin? Are you looking forward to engaging with the tech scene here?</em></p>
<p><em>Alex Rahin:</em> It has been incredibly fascinating to meet and connect with my colleagues in and outside of Zalando within
the Data Science and Data Engineering fields. This also includes contacts in academia and research. I have also really
enjoyed getting up to speed with the challenges and opportunities related to Data at Zalando. It’s going to be a great
2017!</p>
<p><em>Zalando Tech: Your role oversees the data we own and collect – can you tell us a bit more about the vision you have for
our future collection and use of this data?</em></p>
<p><em>Alex Rahin:</em> There are many dimensions and elements to data, such as the type, the attributes, and the structure, or
lack thereof. For instance, you can derive data from a variety of sources, whether it be emails, databases, pictures,
audio, or mixed media. Data is exploding in volume, in complexity, and in connections with other data.</p>
<p>We aim to understand the different sources of data that are relevant to our business, and provide a common platform that
simplifies, automates, and scales our workflows and data-driven decision making. In other words, the storage, access, and
applications of this data both internally and externally. To support such a big scope, <a href="https://tech.zalando.com/jobs/data/">we’re currently
hiring</a> to help us better deal with the volume and complexity of data that we’re
handling, and the use of data to solve challenging problems. It has been a pleasure to connect with the candidates who
have shared their ideas concerning how we can excel in this field.</p>
<p><em>Zalando Tech: What does Zalando do with this data? What are some of the applications of this data and how do we use
it?</em></p>
<p><em>Alex Rahin:</em> We’re using our data capabilities to transform Zalando. We want to fashion a mindset driven by data and
ultimately transform our culture. This includes developing tools and processes that support this push. Our vision is to
bring data intelligence and machine learning to every single corner of Zalando. Imagine a day in the near future where a
Zalando customer, partner, or employee cannot touch any part of Zalando without interacting with data science or machine
learning… That is our ultimate goal.</p>
<p>The harder question is how do we achieve this? There are many things that you can do with data. You can use it for
backwards-looking applications, helping you understand correlation, causation, and the impact of decisions on our
customers, our partners, and our business overall. You can also use data in predictive use cases, such as making
predictions about what the next trend is likely to be for the upcoming fashion season. We’re building and embracing
data-driven products and making data-driven decisions that will help realise all the potential uses and applications of
data for Zalando.</p>
<p><em>Zalando Tech: How are you connected to the new Zalando Research arm that the company has just established?</em></p>
<p><em>Alex Rahin:</em> I am incredibly close with our <a href="https://tech.zalando.com/blog/research-roles-at-zalando-research/">researchers here at
Zalando</a>, and an important role that my team plays is
understanding the areas of research that are promising to explore. We’re then tasked with helping to experiment,
prototype, validate, and eventually create real products.</p>
<p>Our customers can look forward to some very cool features and further developments in personalisation that will amp up
their user experience in the Fashion Store and our mobile apps in the near future. It’s a fantastic time for AI at
Zalando!</p>Deep Learning for Understanding Consumer Histories2016-10-25T00:00:00+02:002016-10-25T00:00:00+02:00Tobias Langtag:engineering.zalando.com,2016-10-25:/posts/2016/10/deep-learning-for-understanding-consumer-histories.html<p>See how we're delivering better consumer experiences with recurrent neural networks.</p><p><em>At</em> <em>Zalando AdTech Lab Hamburg, we develop techniques to predict the future behavior of consumers based on their
interaction history on Zalando with the goal of delivering better consumer experiences.</em> <strong>Recurrent neural networks
(RNNs)</strong><em>, one of the major classes of deep learning methods, offer us two benefits: (1) By processing raw consumer
histories, RNNs provide accurate prediction models without requiring tedious feature engineering efforts. (2)
Furthermore, we can interpret their reasoning when providing predictions for individual consumers.</em></p>
<p>At Zalando AdTech Lab Hamburg, we build ad-tech products to provide relevant and helpful advertisement experiences for
consumers. Our products use machine learning models at their core to predict the future interests and behaviors of
consumers to better serve their needs.</p>
<p>Model building is an ongoing process for us: We are constantly working on increasing the accuracy of our models to
improve consumer experiences, often by incorporating new data sources. Likewise, understanding the reasoning underlying
the predictions of our models is a growing concern for us, both for monitoring purposes and for convincing our
collaborators of the accuracy of our products.</p>
<p>In this blog post, we will show that RNNs are helpful for both flexible model building and understanding model
predictions. We thank our colleagues at <a href="https://tech.zalando.com/blog/zalando-launches-research-lab/">Zalando Research</a>
for inspiring discussions and valuable feedback.</p>
<h3>Predicting order propensities</h3>
<p>There are a variety of goals that advertising campaigns are targeted at, like creating awareness or retargeting
consumers. For example, the latter could involve estimating the propensity of consumers to order within several days. In
this blog post we take this as our running example and focus on predicting order probability.</p>
<p><em>A side remark: Our models are based on histories of anonymous cookies. (That is, we</em> <strong>do not use</strong> <em>customer data.)
For ease of readability, we speak of consumers instead, but cookies are what we really refer to.</em></p>
<p>The <strong>consumer history</strong> captures important clues that can be utilized for prediction: Has the consumer ordered
recently? Did they visit Zalando just yesterday? Have they added something to their wishlist?</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2ac1f20f2c8beeb95a9d28988aa1966a077dd3ba_consumer_history.png?auto=compress,format"></p>
<p>Customer histories are <strong>sequences of events</strong>. For each event, we know its type (product view, cart addition, order,
etc.), its timestamp, and further information such as the viewed product or the fashion category the consumer is
currently engaging with.</p>
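<p>As a sketch of what such a sequence looks like in code (the field names below are our own illustrative choices, not the actual event schema):</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    kind: str                         # e.g. "product_view", "cart_addition", "order"
    timestamp: float                  # Unix timestamp of the event
    product_id: Optional[str] = None  # optional extra information, e.g. the viewed product

# A toy consumer history: a chronologically ordered sequence of events.
history = [
    Event("product_view", 1_470_000_000.0, product_id="sku-123"),
    Event("cart_addition", 1_470_000_600.0, product_id="sku-123"),
    Event("order", 1_470_003_600.0),
]
```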
<p>The central question when building predictive models is: <strong>How do we get this sequential information into a machine
learning model?</strong></p>
<h3>Approach 1: Traditional machine learning with handcrafted features</h3>
<p>A traditional way to apply machine learning models on sequential data is <strong>feature engineering</strong>. Although feature
engineering receives <a href="https://tech.zalando.com/blog/feature-extraction-science-or-engineering/">negligible attention</a> in
research papers, it can be the most critical and laborious task in a practitioner's daily work.</p>
<p><em>"At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the
most important factor is the features used."</em> ( <a href="https://homes.cs.washington.edu/%7Epedrod/papers/cacm12.pdf">Pedro
Domingos</a>)</p>
<p>We can conceive many sensible features for our prediction task at hand. For example: How many products did the consumer
view yesterday? Did the consumer order within the last week? What's the current item count in the consumer's cart? And
many, many more.</p>
<p>For a given set of features and a given consumer history, we calculate the corresponding feature vector. This is simply
a large vector of numbers. The feature vector is provided as input to a <strong>vector-based machine learning model</strong>. There
is a <a href="http://scikit-learn.org/stable/supervised_learning.html">large zoo</a> of machine learning models -- logistic
regression, random forests, support-vector machines, you name it -- and almost all of them are vector-based. However,
often it is not model choice, but the feature engineering process which has the greatest influence on prediction
accuracy.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/472d9eafcf0bbd92ac94aee8d016ba107285b805_ml_featureengine.png?auto=compress,format"></p>
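<p>A minimal sketch of this step (the features below are illustrative choices of our own, not the production feature set) could look like:</p>

```python
# A toy history of (event_type, days_ago) pairs.
history = [("product_view", 0), ("product_view", 1), ("cart_addition", 0), ("order", 6)]

def feature_vector(history):
    """Turn an event history into a fixed-length vector of handcrafted features."""
    views_yesterday = sum(1 for kind, days in history if kind == "product_view" and days == 1)
    ordered_last_week = int(any(kind == "order" and days <= 7 for kind, days in history))
    cart_item_count = sum(1 for kind, _ in history if kind == "cart_addition")
    return [views_yesterday, ordered_last_week, cart_item_count]

print(feature_vector(history))  # [1, 1, 1]
```

<p>Each position in the vector answers one handcrafted question about the history; the model itself never sees the raw sequence.</p>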
<p>Feature engineering is an art in itself and requires domain knowledge, as well as data science intuition. Additional
data preprocessing steps often have decisive effects on model performance. For instance, preprocessing may be required
if we have real-valued, ordinal, and categorical features at the same time; if features have different value ranges; or
if we want to ensure model robustness with model regularization. A <a href="http://olivier.chapelle.cc/pub/ngdstone.pdf">standard
approach</a> in real-time bidding is using logistic regression on a large set
of binary features. Binary features are created by binning and binarizing the original input features, a transformation
that can be done in many different ways. For example, a feature for order counts could be converted into buckets for
zero orders, one order, two-five orders, and six plus orders.</p>
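<p>That bucketing can be sketched as follows (a hypothetical helper; the bucket boundaries are the ones named above):</p>
<pre><code class="language-python">def binarize_order_count(n_orders):
    """Bin a raw order count into four binary indicator features:
    zero orders, one order, two-five orders, six-plus orders."""
    return [
        int(n_orders == 0),
        int(n_orders == 1),
        int(2 <= n_orders <= 5),
        int(n_orders >= 6),
    ]
</code></pre>
<p>Exactly one indicator fires per consumer, which is what makes the representation convenient for a linear model like logistic regression.</p>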
<p>While the exact choice of feature representation has decisive effects on model performance, it can typically be
determined only in experiments on historical data and in A/B tests. Choosing appropriate feature sets is time-consuming,
tedious work.</p>
<p>Tedious work which RNNs have the potential to circumvent.</p>
<h3>Approach 2: Recurrent neural networks learn features</h3>
<p>Most machine learning models are vector-based and require feature engineering to deal with sequences. In contrast,
<strong>recurrent neural networks</strong> (RNNs) work directly on sequences as inputs.</p>
<p>RNNs make up one of the major classes of <strong>deep learning</strong> methods (the other one being convolutional neural networks
(CNNs), mostly used for vision; for example, Zalando uses CNNs on <a href="https://kddfashion2016.mybluemix.net/kddfashion_finalSubmissions/Fashion%20DNA%20Merging%20Content%20and%20Sales%20Data%20for%20Recommendation%20and%20Article%20Mapping.pdf">product images to improve product
recommendations</a>
and for <a href="https://devblogs.nvidia.com/parallelforall/optimizing-warehouse-operations-machine-learning-gpus/">logistics</a>).
It is fair to say that research on deep learning has received no shortage of attention in recent years. Still, even
with healthy hype-averse skepticism, its potential is great also in the ad-tech and e-commerce realm as we try to show
in this blog (and as <a href="https://arxiv.org/pdf/1511.06939.pdf">other</a> recent <a href="https://arxiv.org/pdf/1511.06247.pdf">work</a>
shows).</p>
<p>We feed consumer histories directly into RNNs. RNNs are made up of a sequence of computational cells. Each cell takes
the input at a given time-step, in our case the event type and its timestamp (and potentially additional information
like the brand of a viewed product). Cells maintain latent vectors of real-valued numbers. These numbers describe the
consumer state until the respective time-step, as it is relevant for the prediction problem. The dimensionality of the
latent vector is a design decision. In our example scenario, we use low-dimensional vectors to describe the consumer's
propensity to order (typically, 10-15 dimensions). We use <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">long short-term memory
cells</a> (LSTMs) ( <a href="http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf">invented at TU Munich 20 years
ago</a>), which (together with descendants like GRU) underlie
the recent success of RNNs.</p>
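<p>To make the recurrence concrete, here is a minimal NumPy sketch of one LSTM cell unrolled over a toy sequence. The dimensions, random weights, and one-hot inputs are purely illustrative; a production model would use a deep learning framework with trained parameters:</p>
<pre><code class="language-python">import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time-step: input, forget, and output gates plus a candidate
    update, computed from the current input x and previous hidden state h."""
    d = h.shape[0]
    z = W @ x + U @ h + b          # stacked pre-activations for all gates
    i = sigmoid(z[0*d:1*d])        # input gate
    f = sigmoid(z[1*d:2*d])        # forget gate
    o = sigmoid(z[2*d:3*d])        # output gate
    g = np.tanh(z[3*d:4*d])        # candidate cell update
    c = f * c + i * g              # new cell state
    h = o * np.tanh(c)             # new latent (hidden) state
    return h, c

rng = np.random.default_rng(0)
d_in, d_hid = 4, 12                # e.g. one-hot event type -> ~10-15 latent dims
W = rng.normal(0, 0.1, (4 * d_hid, d_in))
U = rng.normal(0, 0.1, (4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in np.eye(d_in):             # a toy sequence of one-hot events
    h, c = lstm_step(x, h, c, W, U, b)
# h now holds the latent state after the full sequence
</code></pre>
<p>The final latent state <code>h</code> is what gets fed into the prediction layer described below.</p>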
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/782f883d321fc652053b2e8d2eb46005c61f68df_rnn_vectors.png?auto=compress,format"></p>
<p>The latent cell state at the last time-step (shown in orange) is used for the final prediction. This prediction can use
a simple linear logistic regression layer or a sequence of non-linear layers as in a neural net.</p>
<p>RNN cells contain a large number of parameters. These parameters are used in cascades of matrix multiplications to
calculate latent cell states from inputs. During <strong>training</strong>, the parameters are adapted to find signals for prediction
in the consumer history. This results in appropriate ways to calculate latent states: the latent states are prediction
features that are <strong>learned</strong> from raw inputs. <strong>No further feature engineering is required.</strong></p>
<h3>But do RNNs work in practice?</h3>
<p>To see whether RNNs live up to their promise, we performed experiments on historical consumer data from <em>Spring 2016</em> in
two European countries, involving millions of Zalando visits. Data was split along time into training and test sets.</p>
<p>As a baseline, we use a logistic regression model built on a large number of fine-tuned handcrafted features, denoted
<em>Handcrafted features + logreg</em>. These features were determined in ongoing labor-intensive feature engineering efforts.
The model has been used in our production system previously.</p>
<p>We use a variety of metrics for benchmarking. Here, we present the results for the area under curve (AUC) metric on the
test sets:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4cd91d11e8f1c3a4df4f500f37d774424519d150_results.png?auto=compress,format"></p>
<p>Random predictions have an AUC of 0.5. For <em>Country 2</em> (right diagram), we also present results with non-linear
predictors: Both <em>Hand-crafted features + neural net</em> and <em>Complex RNN</em> use an additional hidden layer with rectified
linear units (ReLUs) for prediction.</p>
<p>The results show that RNNs achieve about the same or better performance than the models relying on fine-tuned
handcrafted features.</p>
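<p>For reference, AUC can be computed directly from its rank-statistic definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch (in practice one would use a library such as scikit-learn):</p>
<pre><code class="language-python">def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of positive/negative
    pairs where the positive example gets the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
</code></pre>
<p>A model that scores all examples identically wins exactly half of all pairs, which is why random predictions sit at 0.5.</p>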
<h3>Comparing both approaches</h3>
<p>Model tuning, training, and prediction is easier and faster with handcrafted features and vector-based machine learning
methods like logistic regression. And it’s no wonder, as most of the job was done beforehand by the human domain expert.
This expert handcrafted informative features and thus can take most of the credit for prediction accuracy.</p>
<p>In contrast, RNNs learn everything from scratch. Therefore, they require more training time and more involved model
tuning. At the same time, it is machine learning that drives full model accuracy for RNNs.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/61ffdae9b6cc0052bc28c3d3797f93ea0cbdef6c_overview.png?auto=compress,format"></p>
<p>Nowadays, a growing concern is <strong>model interpretability</strong>. Contrary to popular belief, it can be <a href="http://www.sciencedirect.com/science/article/pii/S1053811913010914">more
difficult</a> than often assumed to interpret logistic
regression models, due to correlated features and feature binarization.</p>
<p>Likewise, the parameters of RNN models are hard to make sense of. In contrast, we can interpret the prediction process
of an RNN for a given consumer history, as we'll see in the following.</p>
<h3>Visualizing RNN predictions to understand consumer behavior and improve service offers</h3>
<p>Understanding the consumer journey is a holy grail in e-commerce. Analyzing the reasoning of prediction models provides
insights into customer behavior and ways to improve the service experience of consumers. In addition, visualizing the
prediction process is helpful to assure oneself that a model does what it is intended to do. RNNs can be a great tool
for these purposes.</p>
<p>The following graph shows the latent cell states of a trained RNN when fed with the history of a specific consumer
(<em>PDP</em> = product detail page view, <em>SALE</em> = order, <em>CART</em> = cart view):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3dad644f1e2897f9c6ef3b7ecac3fc407f6d21fb_vis_events_cells.png?auto=compress,format"></p>
<p>While the cell values cannot be interpreted by themselves, we can see how they change for the given inputs over time. In
particular, the model is sensitive to specific event patterns (emphasized in orange boxes): The RNN has learned to
detect consumer behavior indicative of future orders.</p>
<p>The corresponding predicted probabilities show how wishlist views and cart additions increase order probability, while orders decrease it (in the context of the consumer's previous events):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/33838dab3df95945f894e66924f9b81db9365b74_vis_events_prob.png?auto=compress,format"></p>
<p>In e-commerce, we often want to make sense of the <strong>session journey</strong> of a consumer, the sequence of their <strong>visits</strong>.
In our context, we would like to understand which sequences of sessions lead to orders. A session is simply a collection
of events preceded and followed by a period of inactivity of at least a specified duration. Instead of individual events, we can feed sessions
into an RNN. More precisely, session inputs specify which event types happened within the session and how long the
session was in minutes and in number of events. (This is a mild form of feature engineering.) We use the same RNN
architecture, but training with session inputs results in a different RNN model, namely one that understands sessions
instead of events.</p>
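<p>Deriving such session inputs from a raw event stream might look like this (a toy sketch; the inactivity threshold and summary fields are illustrative assumptions):</p>
<pre><code class="language-python">def sessionize(events, gap_minutes=30):
    """Split (timestamp_in_minutes, event_type) pairs into sessions separated
    by at least gap_minutes of inactivity, and summarize each session."""
    sessions, current = [], [events[0]]
    for prev, ev in zip(events, events[1:]):
        if ev[0] - prev[0] >= gap_minutes:
            sessions.append(current)
            current = []
        current.append(ev)
    sessions.append(current)
    return [
        {
            "event_types": sorted({ev for _, ev in s}),  # which types occurred
            "duration_min": s[-1][0] - s[0][0],          # session length in minutes
            "n_events": len(s),                          # session length in events
        }
        for s in sessions
    ]
</code></pre>
<p>Each session summary, rather than each raw event, then becomes one time-step of the RNN input sequence.</p>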
<p>Concerning empirical performance in our experiments, AUC values were similar for simple RNNs trained with event and
session-journey inputs (the same until the third position after the decimal point; we show the event-journey RNN results
in the AUC graph above). For the complex RNNs, we only experimented with the session-journey representation (for which
we show the result in the AUC graph).</p>
<p>The following diagram depicts the resulting cell states for an exemplary consumer session journey:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a970f66e4ed1e622972d8611714b70e7900edc30_vis_sessions_cells.png?auto=compress,format"></p>
<p>The 10th to last session of the consumer was intense with 194 events (the kind of session we love to see at Zalando).
Intuitively, this shows strong interest in ordering. Indeed, this is reflected in the predicted order probability which
jumps from 29% to 51%:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c2e6f602b16b5b51da377bbed57664af21f6f2e0_vis_sessions_prob.png?auto=compress,format"></p>
<p>These visualizations allow us to assess in a quantitative way how consumer actions affect model predictions. They deepen
our understanding of how consumers interact with Zalando and put vague assumptions about this process on a firm
empirical basis.</p>
<p>In principle, the progress of predicted order probabilities could also be visualized with vector-based machine learning
methods like logistic regression, but in a cumbersome and inefficient way: We would need to re-calculate all
hand-engineered features at every single time-step. This would be a highly redundant process: The features at time-step
<em>t</em> would represent the complete history until <em>t</em> and not only what happened between <em>t-1</em> and <em>t</em>. In contrast, all
calculations and probabilities come for free in the prediction process of an RNN, as RNNs model sequences in a natural,
direct way. More importantly, with vector-based machine learning methods, we would still need to interpret the
hand-engineered features at every time-step to make sense of the model. If you have hundreds or thousands of features,
and features are correlated and have been preprocessed, this is usually a complex and confusing task.</p>
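<p>The redundancy argument can be illustrated with two toy features (history length and order count; the names are hypothetical): the vector-based route recomputes them over the full prefix at every step, whereas a recurrent update folds in only the newest event:</p>
<pre><code class="language-python">def prefix_features(events, t):
    """Vector-based route: recompute features over the whole history up to
    time-step t -- O(t) work repeated at every step."""
    hist = events[:t]
    return (len(hist), sum(1 for ev in hist if ev == "SALE"))

def step(state, event):
    """Recurrent route: fold only the newest event into a running state --
    O(1) work per step, analogous to an RNN cell update."""
    n_events, n_orders = state
    return (n_events + 1, n_orders + (event == "SALE"))
</code></pre>
<p>Both routes yield identical feature values at every time-step; the recurrent one simply never revisits the past.</p>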
<h3>Wrap up and next steps</h3>
<p>When you have a detailed understanding of your domain or when you're building your first predictive models, feature
engineering with logistic regression, random forests, and the like is a great way to start.</p>
<p>The future of modeling sequential data, however, seems to belong to deep learning methods: They do most of the feature
engineering for you while achieving superior accuracy. RNNs process sequences in a natural way and are therefore
particularly promising for modeling consumer behavior in e-commerce. They are a versatile tool if prediction tasks are
diverse and when feature engineering becomes more cumbersome.</p>
<p>At the same time, RNNs are great at explaining the reasoning behind predictions for individual consumers, again due to
their natural modeling of sequences.</p>
<p>In practice, rolling out and maintaining deep learning models is more challenging than using traditional machine
learning models. Nevertheless, the gains may be well worth the effort.</p>
<p>At Zalando AdTech Lab Hamburg, we have set up a product prototype based on RNN models; we’re now working towards a
production system based on RNNs. This is appealing for us as we want to employ machine learning models for an increasing
number of use cases while incorporating more features. At the same time, we want to ensure that the predictions of our
models are transparent and comprehensible even for non-experts.</p>
<p>Would you like to find out more about what <a href="https://tech.zalando.com/blog/zalando-adtech-lab-hamburg/"><strong>Zalando AdTech Lab Hamburg</strong> is up to? Read about our advertising
engineering efforts and current open positions</a>
<a href="https://tech.zalando.com/locations/#hamburg">here</a>.</p>Zalando Tech x Strange Loop 20162016-10-19T00:00:00+02:002016-10-19T00:00:00+02:00Silvia Patricia Moura Pinatag:engineering.zalando.com,2016-10-19:/posts/2016/10/zalando-tech-x-strange-loop-2016.html<p>Take a look at our rundown of Strange Loop 2016 all the way from the USA.</p><p><a href="http://www.thestrangeloop.com/">Strange Loop</a> has taken place every year since 2009 in St. Louis, Missouri (USA) and is
highly regarded among developers, covering a wide range of topics: programming languages, distributed systems, web
development, functional programming, and the socio-political implications of technology.</p>
<p>I had the chance to attend this year's edition of Strange Loop and would like to share the highlights. The first day was
dedicated to <a href="http://www.bretfisher.com/strangeloop2016/">workshops</a> as well as two conferences in parallel: elm-conf
and Papers We Love.</p>
<p>A workshop highlight for me was "Deploying and scaling applications with Docker" by Bret Fisher. Starting with running a
sample app on a single node with Compose, we proceeded to scale it to a cluster of Docker nodes using Swarm.</p>
<p>The two following conference days were packed full of high quality talks, with a total of five concurrent tracks that
were a little hard to manage. Since the videos were available the day after, I watched some of the talks on Strange
Loop's YouTube channel, which you can access
<a href="https://www.youtube.com/channel/UC_QIfHvN9auy2CoOdSfMWDw">here</a>.</p>
<h3>Presentation highlights</h3>
<p><a href="https://www.youtube.com/watch?v=HfD9IMZ9rKY">"Systems programming as a swiss army knife"</a> by Julia Evans was one of my
favourite talks. It focused on strategies for debugging any kind of system using Linux tools such as strace, tcpdump +
wireshark, and perf. She showed how knowledge about kernels and systems programming can help you become a better
programmer, and conveyed her enthusiasm in a way that made the talk particularly appealing.</p>
<p><a href="https://www.youtube.com/watch?v=fNe1i7nVbXI">"Humanities x Technology"</a> by Ashley Nelson-Hornstein was the ending
keynote for the first day of the conference. The key takeaway is that technology for technology’s sake does not
matter – rather, technology is for people, and as such it should sit at the intersection of technology and the liberal arts.
The examples used to convey this message were very evocative.</p>
<p>One of these examples was Tay, a chatbot that learned from Twitter users – within hours of coming online, it became
extremely anti-human, showing how the Internet had turned an AI chatbot into a hate machine. Tay was based on
<a href="https://www.inverse.com/article/13387-microsoft-chinese-chatbot">Xiaoice, a Chinese chatbot</a>, which was more successful
since it focused on the empathy of the users, not precise chat communications. Their technology was informed by their
humanity, not pure science, like Tay. Another key takeaway is that we, as developers, are not the user, and that we
possess blind spots when trying to understand people that we need to be mindful of. You can read Ashley’s notes on the
presentation <a href="http://trevmex.com/post/150510229108/humanities-x-technology-strangeloop-notes">here</a>.</p>
<p><a href="https://www.youtube.com/watch?v=s3GfXTnzG_Y">"Building a Distributed Task Scheduler With Akka, Kafka, and Cassandra"</a>
by David van Geest showed how his team built a task scheduler using Akka, Kafka, and Cassandra, leveraging the strengths
of these technologies. Some of the challenges they faced include dynamically adjusting for increased task load with zero
downtime, ensuring task ordering across many servers, and making sure that the tasks still run if a datacenter goes
down.</p>
<p><a href="https://www.youtube.com/watch?v=7Q-UwjgZ0q4">"Unlimited Register Machines, Gödelization and Universality"</a> by Tom Hall
presented a formalization of unlimited register machines written in Clojure, used here to perform simple arithmetic
operations (like sum and product) while encoding instructions as numbers themselves and executing a list of
instructions that comprise a program. For those unaware, Gödelization means encoding the expressions of a formal system as numbers.</p>
<p>The whole presentation was carried by Tom's enthusiasm over his achievement, even if the result was to simply perform a
simple addition. His passion for the topic is highly contagious and the talk is worth watching just because of this. You
can also read his notes <a href="http://trevmex.com/post/150548687638/unlimited-register-machines-g%C3%B6delization-and">here</a>.</p>
<p><a href="https://www.youtube.com/watch?v=C7LQATGvcnI">"Kittens - datatype-generic functional programming with Scala"</a> by Kailuo
Wang presented Kittens, a library built on top of shapeless and cats, which is meant as a proof of concept
around combining generic and functional programming. Several examples that used this library were shown to illustrate
its features and use cases.</p>
<p><strong>"Reproducibility"</strong> by Gary Bernhardt focused on the importance of reproducibility, a feature that guarantees that
given the same inputs, a tool yields the same output. This is important because it aids in building clear mental models
of the tool's behaviour, and his central thesis is that these mental models lead to tools we love whose designs are
highly non-obvious. One example of this is <a href="https://git-scm.com/">git</a>, which will produce the same hash for a file
committed in two different machines. While the video of his presentation isn’t available, you can still access his
<a href="https://strangeloop2016.slack.com/files/devin/F2CS0QQTB/reproducibility.txt">notes</a>.</p>
<p>The following talks received high praise and are on my "to watch" list:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=qCUI5ryyMSE">"Diocletian, Constantine, Bedouin Sayings, and Network Defense"</a> by Adam Wick</li>
<li><a href="https://www.youtube.com/watch?v=V5p3FBwGHnY">"Fold, paper, scissors - an exploration of origami's fold and cut problem"</a> by Amy Wibowo</li>
<li><a href="https://www.youtube.com/watch?v=02h74L1PmaU">"Languages for 3D Industrial Knitting"</a> by Lea Albaugh</li>
</ul>
<p>For me, the best part of Strange Loop is the inspiration that it brings me and the insightful conversations that I had
with some of the other people attending, so I will remember this edition of the conference for a long time.</p>
<p>If you’d like to get in touch about my experiences at Strange Loop, find me on Twitter at
<a href="https://twitter.com/smourapina">@smourapina</a>.</p>Research Roles at Zalando Research: What You Need To Know2016-10-18T00:00:00+02:002016-10-18T00:00:00+02:00Dr. Reiner Krafttag:engineering.zalando.com,2016-10-18:/posts/2016/10/research-roles-at-zalando-research.html<p>We have created three job profiles that outline how we organize research within Zalando.</p><p>In the past few months we’ve made a lot of progress reorganizing research and development in Zalando. We recently
launched <a href="https://tech.zalando.com/blog/zalando-launches-research-lab/">Zalando Research</a> as a place where we can focus
and conduct cutting edge research, as well as contribute actively to the research community in the areas of <strong>machine
learning (ML), AI, natural language processing, and deep learning</strong>. We’ve also started organizing our teams to make
sure we have a good ratio between research and engineering for every delivery unit.</p>
<p>We have created three different job roles and profiles that outline how we organize research within Zalando. All three
roles have one thing in common: Each person is a researcher. What makes them different is their focus and interest. We
have:</p>
<ul>
<li><a href="https://tech.zalando.com/jobs/241070-research-engineer-search-personalization-senior/">Research Engineers</a></li>
<li><a href="https://tech.zalando.com/jobs/241069-research-engineer-data-science/">Data Scientists</a></li>
<li><a href="https://tech.zalando.com/jobs/241969-research-scientist-machine-learning-and-ai/">Research Scientists</a></li>
</ul>
<p>As we have iterated and refined these roles for the purposes of our newly launched lab, I wanted to share briefly how we
distinguish them.</p>
<h3>Research Scientists</h3>
<p>Firstly, research scientists are very senior and experienced researchers, have a PhD in Computer Science or related
discipline, a minimum of 3-5 years of relevant research experience (either in academia or industry) in the area of ML,
and a strong publication track record.</p>
<p>Research scientists are not engineers, and they work at Zalando Research to produce world class research, to help shape
our products, publish and present papers at top conferences, file patents, and otherwise help to promote Zalando
Research as one of the premier research labs in the world. A research scientist usually focuses on continuous
learning and experimentation as part of their research activity. We also offer a place for research scientist postdocs
who have recently completed an excellent PhD, in order to provide an environment that helps them strengthen and grow
their research experience.</p>
<h3>Data Scientists</h3>
<p>A data scientist is also a researcher, but one who during their education focused primarily on, and has a very strong
background in, data science. The area of data science typically comprises topics such as statistics, applied
mathematics, operational research, data mining or modelling, and related disciplines.</p>
<p>Data scientists work in the context of a delivery team, and can do basic engineering tasks to help them complete their
experiments, but are not aspiring to become engineering experts. Similar to research scientists, data scientists are
encouraged to publish papers, file patents, and actively contribute to the research community, as long as it is in line
with the delivery goals of their team.</p>
<h3>Research Engineers</h3>
<p>Last but not least, the research engineer combines strong research skills with comprehensive engineering experience.
Similar to full-stack engineers who combine frontend and backend expertise, a research engineer combines research and
engineering: They are able to work relatively independently on researching complex ideas and pushing them into
production, by writing production-quality code.</p>
<p>Usually, research engineers work in different “modes”. They start out with an idea, explore and evaluate it (“research
mode”) for some weeks, and then once they have proven the merit of the idea, they go into “engineering mode” to
implement their idea into a production system. This process is then repeated for new ideas and projects.</p>
<p>Often research engineers get stuck maintaining a system that they previously developed, which prevents them from
working on more original research. Therefore we recommend that research engineers strive for an ongoing balance between
research and engineering. Just like research and data scientists, research engineers are encouraged to publish papers,
file patents, and actively contribute to the research community, as long as it is in line with the delivery goals of
their team.</p>
<h3>Career progression</h3>
<p>Why do we feel that it is important to know what research role you are currently fulfilling? We want all of our research
colleagues to be able to carve out a career within the industry.</p>
<p>As a research scientist, you may feel the urge to get closer to product and delivery, therefore migrating more into a
role of a data scientist or research engineer. In the latter, ramping up on engineering skills is important. If you wish
to remain a research scientist, continue broadening and deepening your research expertise to grow your career.</p>
<p>As a data scientist, you may want to improve your engineering skills to become a research engineer. Or perhaps you would
like to focus more on core research, not within the context of a delivery team or product, to eventually become a
research scientist. In this case, you would need to work on your publication track record, for example.</p>
<p>As a research engineer, broadening either your research or engineering skills would be appealing, or you might be aiming
to work in a concentrated research environment. To realize this, you would need to work on your publication track record
or on fulfilling all the requirements needed to become a research scientist, listed above.</p>
<p>I hope this article helps to clarify the differences between the research related job roles we offer at Zalando. If
you’re interested in these job opportunities, more information can be found <a href="https://tech.zalando.de/jobs/data/">here</a>.</p>Copywriting for Emotion2016-10-14T00:00:00+02:002016-10-14T00:00:00+02:00Adrien Renahytag:engineering.zalando.com,2016-10-14:/posts/2016/10/copywriting-for-emotion.html<p>With users in 15 countries, we cater to a variety of shopping behaviours and needs.</p><p>With users in 15 European countries, from the southwest of Spain to the northeast of Norway, we have to cater to a
variety of shopping behaviours and needs. At Onsite Management, we strive to localise our international websites and
apps to build an experience that our users can trust and relate to, culturally.</p>
<p>Localisation spans an array of projects and processes, such as understanding local users by performing qualitative and
quantitative research, or optimizing elements such as navigation or internal search, which help us achieve our business
goals.</p>
<p>We have been giving special attention to the way we speak to our users through our interfaces. The aim was to swap our
functional approach (“Your bag is empty”) for a more “human” and engaging style (“There’s nothing in your bag right now
but it doesn’t have to stay that way”).</p>
<h3>Understanding the status quo</h3>
<p>At Zalando, many people are involved in the process of writing copy. We knew we had to be inclusive from the start if we
wanted the project to succeed.</p>
<p>First, we interviewed various people involved in the process of crafting copy at Zalando, from Copywriters to Product
Managers to Onsite Managers. As a result, we were able to identify three main areas for improvement: lack of context
when translating, the need for more standardised communication, and clearer responsibilities.</p>
<h3>Finding solutions</h3>
<p>We invited a number of our fellow employees to collaborate with us in a room stocked with post-its, sharpies, and
coffee. To address the three improvement areas, we agreed that we had to focus on two paths: Refining internal processes
and increasing the quality of the interface copy. It was great to see people who had worked together for a long time
finally meet in person!</p>
<h3>Implementing solutions</h3>
<p>Among the many ideas which came from the workshop, we introduced new templates, regular knowledge sharing meetings, and
new guidelines that were made accessible to everyone in the company. We organised them as a pyramid:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/97353c896e68f5911149f75044cd4ba94d7b9be2_piramid_new.png?auto=compress,format"></p>
<ul>
<li>The “voice” is the base: the personality of Zalando – once defined, it rarely changes.</li>
<li>The “tone” depends on the situation the users are in, for example making an apology or finding the right size.</li>
<li>"Web elements" are concerned with how we write amazing buttons or headlines; they are the same across markets and
situations.</li>
<li>“Country specifics” contain details relevant for one language, for example always use “voucher” instead of “coupon”
when writing in English.</li>
</ul>
<h3>Helping users</h3>
<p>We had laid the basis for improvements but still had to make a real impact for users. We chose to start working on the
copy of all the error messages because they appear in the most critical moments, and we wanted to maximise our impact:
We have 784 different messages in 11 languages!</p>
<p>We categorised them according to the section where they appear, such as the cart or login page, and prioritised them
according to the number of times they were shown. We were finally ready to craft new copy.</p>
<p>The team planned what we called “weekly translation waves”. For each wave we asked ourselves:</p>
<ul>
<li>How is the user feeling in this situation?</li>
<li>What should be the purpose of this message?</li>
<li>What are the important elements the user needs to know at this moment?</li>
<li>How can we be more informative whilst keeping it short?</li>
</ul>
<p>After each wave, we came out with a few different versions of each error message. We performed a series of user tests to
make sure the new versions had the right effect on users. At the same time, we gathered learnings to improve the
versions before testing them again. Ultimately, the whole research project turned into a set of easy copywriting
guidelines for this specific situation.</p>
<p>As a result, we now sound a little nicer and friendlier to our users who run into trouble. And that was only the
beginning! Since then, we’ve followed a similar process to improve other parts of the website, such as the Help section
or the process to return articles. We’ve also worked on ways to communicate the guidelines more broadly within the
company (for example, we created posters).</p>
<p>If you’d like to get in touch about any of our localisation work, you can find us on Twitter
<a href="https://twitter.com/CarlosViciana">@CarlosViciana</a> and <a href="https://twitter.com/AdrienR">@AdrienR</a>.</p>
<p>Want to optimise the Zalando experience for a specific market? Do you love to run qualitative and quantitative tests to
find out what works best for target customers? <a href="https://jobs.zalando.de/en/?category=Technology+%26+UX%2FUI+Design&location=Berlin&search=onsite">We’re always looking for smart people to join
us!</a></p>Techsperts at Scale: Tips to Grow Your Business2016-10-12T00:00:00+02:002016-10-12T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-10-12:/posts/2016/10/techsperts-at-scale-tips-to-grow-your-business.html<p>We sat down with this month's Techsperts to chat about the business of businesses.</p><p>A great addition to our <a href="https://tech.zalando.com/blog/zalando-techspert-series-launch/">Zalando Techspert Series</a> has
just wrapped up, with this month’s theme focusing on <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/234059037/">scaling your
business</a>.</p>
<p>We invited an extra panelist to mix it up, with Head of Data at GetYourGuide, Mathieu Bastian, joining the fun along
with the Managing Director of Movinga, Finn Age Hänsel. We were also joined by the Director of International Markets
from Delivery Hero, Tuomas Hurmerinta, and rounded up the panel with our VP of Brand Solutions, Christoph Lange.</p>
<p>Sad to have missed the event? We sat down with our Techsperts who shared some advice for those interested in the
business of businesses.</p>
<p><em>Zalando: What are the key attributes you look for when you’re hiring at scale?</em></p>
<p><em>Finn Age Hänsel:</em> When you’re hiring below scale, you’re looking for talent and very driven people. While this doesn’t
stop when you’re hiring beyond that point, hiring at scale means that you’re also looking to hire people with
experience. What we’ve learned with Movinga is that you need staff who’ve done what you’ve needed before, otherwise it’s
constant trial and error.</p>
<p><em>Mathieu Bastian:</em> I agree – at some point you need specific expertise. When hiring very specialised people, you still
want them to have a track record of agility and be able to adapt quickly. Even though their expertise is needed, the
project they may end up working on could be completely different to what they expected, or are used to. Thus, your hires
need to be able to learn fast.</p>
<p><em>Tuomas Hurmerinta:</em> I think at some point in a company’s development, you just need to have the strong functional
leaders in place. People who know what they are doing and can fully own their areas of responsibility. At scale, it
becomes more important that these functional leaders can also lead and develop their people. This means that good people
management becomes a quality that you’re after if you’re looking to grow.</p>
<p>Good communication skills are certainly something to single out, as is vision. But you also still need to motivate
people by being ambitious and setting targets for them to achieve.</p>
<p><em>Christoph Lange:</em> One of the major differences when hiring at a later stage is that you need more structure, whether
that be in people or the organisation, as opposed to the very beginning of your business when you’re looking for
employees who are really driven to achieve something, where the willingness to succeed often wins out over the need to
do things the right way.</p>
<p><em>Zalando: Give us an example of a challenge you encountered during your growth phase and what you learned from the
experience.</em></p>
<p><em>Christoph Lange:</em> During this month’s Welcome Day, we had 170 new starters joining the company. These days, our welcome
events are incredibly organised and accommodating for such a large group. However, if I look back to 2011 during our
growth phase, we were concentrating on internationalisation and hiring a lot of people – adding 15% of total staff
to the company each month. This became complete chaos, leading to employees not receiving their hardware straight away,
for example, or even needing to build their own desk!</p>
<p>We now have a fully fledged Onboarding Program that lasts for four weeks, where Tech employees are properly managed and
prepared by the company to begin their work. This says a lot about how we’ve addressed the need to be more structured in
this approach.</p>
<p><em>Tuomas Hurmerinta:</em> Fast growth surely brings challenges. You have to invent new ways of doing things, as the old ways
don’t work anymore at a larger scale. And if you are in a great industry, where a growth phase can last for years and
years, you have to do this many times over.</p>
<p>The challenge is always to manage that growth, as it can definitely fluctuate – this often means you’ll have to trial
new things to ensure you can keep the company moving. This might come from new ideas or from better execution of
existing ideas. But it’s great to see that in markets where you have operated for 10-15 years, new ideas still work to
boost growth.</p>
<p><em>Mathieu Bastian:</em> As my focus is on data, I’ll comment from that angle. Growth and scale don’t happen without a clear
focus, which usually means good data. We spend a lot of time working on making our data rich, clean, and fast-to-access.
One constant challenge is to very precisely define what we're trying to achieve into measurable metrics, in every area
of the business. Sometimes we spend more time working on getting the data right rather than working on the problem
itself.</p>
<p>The reason behind this is that we fundamentally believe in experimentation, and there is no successful experiment
without a measurable outcome. In other words, it's only worth attacking a problem once you have a way to experiment at
scale. This is a very important mindset to have during the growth phase.</p>
<p><em>Finn Age Hänsel:</em> One thing I’ve learned is that there is a huge difference between growing and scaling. Growing, very
often, is not necessarily the problem – if you have a good product market fit that the customer loves, you can always
grow. However, scaling means to grow from a good platform with good processes, with a sustainable foundation that allows
growth without overheating the company. I think just growing isn’t necessarily a good thing – it’s scaling that shows
you have your processes in place.</p>
<hr>
<p>Keep an eye on our <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/">Zalando Tech Meetup page</a> to be notified of the
next Techspert Panel, or follow our updates on Twitter <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>Key Talks and Takeaways from the AnDevCon Conference2016-10-11T00:00:00+02:002016-10-11T00:00:00+02:00Sergii Zhuktag:engineering.zalando.com,2016-10-11:/posts/2016/10/key-talks-and-takeaways-from-the-andevcon-conference.html<p>A great resource of information and insight into Android technology and development.</p><p>This August I had a great time in Boston attending the AnDevCon Conference. As an Android Developer at Zalando, I found
this conference a great resource of information, knowledge, and insight. I’d like to share my impressions and highlight
some of the notable talks I attended.</p>
<h3>Conference organization</h3>
<p>The venue for the event – the majestic Sheraton Boston Hotel – is a high class hotel and conference center with both
surprisingly stable Wi-Fi and a convenient navigation plan. Visitors were able to chat and have drinks with rockstar
speakers such as <a href="https://twitter.com/commonsguy">Mark Murphy</a> and <a href="https://www.amazon.com/G.-Blake-Meike/e/B002SOFCA4/ref=dp_byline_cont_book_1">Blake
Meike</a>. A dedicated Google Learning Zone
was completely focused on Firebase and related products, where we could meet with Firebase developers and also Google
Developer Experts from all over the world.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1f413fc4f833a8a2cf6dcb041c1019784596ebe8_adc_show_floor.jpg-t1472680389567width710height362nameadc_show_floor.jpg?auto=compress,format"></p>
<h3>Services, Processes and Binder by Blake Meike</h3>
<p>This talk was presented by Blake Meike, who is the author of <a href="https://www.amazon.com/Android-Concurrency-About-Deep-Dive/dp/0134177436">Android
Concurrency</a> and other books. Most Android
Developers are definitely familiar with Services and IntentServices, but most of us could use some structure for that
knowledge after reading tons of forums and docs. During Blake’s presentation we had several coding exercises with quite
challenging tasks – a perfect way to keep your listeners awake during a deep dive into a complex topic.</p>
<h3>Multi-Window and Your App by Mark Murphy</h3>
<p>This talk was presented by one of the <a href="http://stackoverflow.com/users/115145/commonsware">major contributors</a> to the
<a href="http://stackoverflow.com/documentation/android/topics">StackOverflow Android community</a>, Mark Murphy. <a href="https://developer.android.com/guide/topics/ui/multi-window.html">The Android
Nougat Multi-Window mode</a> is one of the hottest topics
since it was announced at Google IO16 and it also affects app behavior on Chrome OS devices. Here are the main
takeaways:</p>
<ul>
<li>The root <em>Activity</em> of the task determines the window behavior</li>
<li>An <em>Activity</em> can stay in the paused state if the app is in multi-window mode and the user has switched focus to
another multi-window app</li>
<li>By default, an <em>Activity</em> will be destroyed and recreated if the user enters multi-window mode, resizes your
window, and exits multi-window mode with your <em>Activity</em> focused</li>
<li>Manufacturers can choose to enable freeform mode, in which the user can freely resize each <em>Activity</em></li>
</ul>
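<p>A related, manifest-level knob (the attribute name is from the official Android N documentation; the package and activity names here are illustrative) lets an app opt out of resizing entirely:</p>

```xml
<!-- AndroidManifest.xml (illustrative package/activity names) -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <application>
        <!-- resizeableActivity="false" opts this Activity out of
             multi-window mode on Android N (API 24+) -->
        <activity
            android:name=".MainActivity"
            android:resizeableActivity="false" />
    </application>
</manifest>
```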
<h3>ART/Dalvik Reverse Engineering by Jonathan Levin</h3>
<p>Jonathan Levin built his talk from a number of key points out of his book <a href="http://newandroidbook.com/">Android Internals: A Confectioner’s
Cookbook</a>, and practical shell hacking on Android OS. Jonathan went deep into Dalvik and ART
internals, comparing the difference between them, and showing console hacks. We were shown several tools for reverse
engineering, some of them developed by Jonathan himself. The true benefit for attendees were the insights gained about
current Android Runtime (ART) and OAT file format. For example, did you know that ART has two garbage collectors with
eight garbage collection algorithms?</p>
<h3>Java 8 for Android by Blake Meike</h3>
<p>Java 8 has already become a standard for backend developers, but it wasn’t there for Android until 2016 when Google
announced its support. Blake Meike introduced key language features of Java 8 including Lambdas, Streams, Method
references and new types. He also spoke about the current status of Retrolambda and the possibility of it being replaced
by Java 8 libraries. Something important to note is that the Jack compiler must be enabled to use key Java 8 features on an
API lower than 23. This can lead to additional risks during compilation and break bytecode-manipulation tools used in
your project.</p>
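<p>As a quick taste of what these features unlock, here is a minimal, self-contained sketch of my own (not code from the talk) combining a lambda, a method reference, and the Streams API:</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Demo {
    // Keep only names starting with "Z" (lambda), upper-case them
    // (method reference), and collect the result into a List.
    static List<String> zBrands(List<String> brands) {
        return brands.stream()
                .filter(b -> b.startsWith("Z"))
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [ZALANDO, ZARA]
        System.out.println(zBrands(Arrays.asList("Zalando", "Acme", "Zara")));
    }
}
```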
<h3>Firebase and Android: A Real-Time Match Synced in Heaven by Adrián Catalan</h3>
<p>This practical presentation by GDE Adrián Catalan focused on an example of how to set up a Firebase configuration and
Android app to see each other, build data structures, and synchronize selected fields. Adrián demonstrated how easy it
is to add social buttons from Facebook, Google+, and email to your app login using Firebase. Another demo was intended
to show <a href="https://github.com/firebase/FirebaseUI-Android">FirebaseUI</a> binding with RecyclerView. I’m still surprised by
the functionality set provided by Firebase for Android devs – this toolset can significantly speed up the application
development process.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cd6759506c0334841c4872792f64fe5c45b3842a_adc_class.jpg-t1472680389567width710height169nameadc_class.jpg?auto=compress,format"></p>
<h3>Easy Secure Internet Access in Android by Mark Murphy</h3>
<p>Mark Murphy was back with a presentation focused on the security issues of network access in Android. He demonstrated several
cases in which an SSL certificate authority was compromised and explained why we shouldn’t take the “trust all certificates” approach. Mark
presented the new Android Nougat Network Security Configuration feature using a dedicated XML file. This was followed by
a showcase of his <a href="https://github.com/commonsguy/cwac-netsecurity">backport</a> of this tool using the same XML
configuration going back to API Level 17 (Android 4.2).</p>
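<p>The configuration file Mark described lives under <code>res/xml/</code> and is referenced from the manifest’s <code>&lt;application&gt;</code> element via <code>android:networkSecurityConfig</code>. A minimal sketch (the domain and certificate resource names are illustrative; see the official Android docs for the full schema):</p>

```xml
<!-- res/xml/network_security_config.xml (domain is illustrative) -->
<network-security-config>
    <domain-config>
        <domain includeSubdomains="true">example.com</domain>
        <trust-anchors>
            <!-- Trust only a CA certificate bundled with the app
                 for connections to this domain -->
            <certificates src="@raw/my_ca" />
        </trust-anchors>
    </domain-config>
</network-security-config>
```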
<h3>High-Performance Android Java Application Development by Paul Hohensee</h3>
<p>This presentation described some ART architecture solutions with details about improvements shipped with Android Nougat.
Paul also explained the key difference between AOT and JIT compilation. Then he went through the most common Java
performance issues in code that we as Android developers write every day. The talk ended with graphs comparing the
execution time of different code snippets that produce the same output.</p>
<h3>Your Business Relies on IP, so Protect It by Vlad Shvartsman</h3>
<p>This talk wasn’t related to everyday developer work, but was focused on intellectual property. It was a surprisingly
interesting presentation with a full house of listeners and tons of practical takeaways. Let me mention just some of
them:</p>
<ul>
<li>Google’s trademark complaint form can remove a competitor’s app in 4 hours</li>
<li>A patent expires after 14-20 years, but trademarks never expire as long as they’re used in commerce</li>
<li>An application’s name and icon can be trademarked</li>
<li>Use the United States Patent and Trademark Office <a href="https://www.uspto.gov/">website</a> to search for registered patents
and TMs</li>
</ul>
<h3>Conclusion</h3>
<p>This post doesn’t present the full list of great talks, as the conference featured more than 50 presentations on four
tracks. However, I collected as much knowledge as I could and also met great people from all over the world. Thank you
AnDevCon and I'm looking forward to attending the event next year!</p>
<p>Do you have questions about the AnDevCon experience or Android development at Zalando? Get in touch via Twitter at
<a href="https://twitter.com/sergiizhuk">@sergiizhuk</a>.</p>5G: The Future of Wireless Networks2016-10-06T00:00:00+02:002016-10-06T00:00:00+02:00JenTechnologytag:engineering.zalando.com,2016-10-06:/posts/2016/10/5g-the-future-of-wireless-networks.html<p>Want to know what's in store for the world with the approach of 5G connectivity?</p><p>5G technology is soon approaching. On the suppliers’ end, for Zalando and other companies, this will bring about faster
responses and speedier service delivery. On the consumers’ end, a wider range of products and services will be made
available. Thanks to the improved speeds and lower latency of 5G, processes such as payment verification and order
identification can be expedited, which means same-day delivery of in-stock items becomes the norm in the 5G age.</p>
<p>5G is another step up in mobile connectivity, and it is currently in the works globally. Many countries around the world
are preparing for this new shift in mobile data networks. Let’s take a closer look at this technology and how it may
change the future of wireless communication.</p>
<p>Although plans for an official release for consumer use are <a href="http://www.pcworld.com/article/3037452/mobile/this-is-5gs-moment-four-years-early.html">still at least four years
away</a>, demonstrations of 5G
capabilities effectively started hype earlier this year at the Mobile World Congress (MWC). Exhibitions showed the
potential of the next generation of mobile.</p>
<h3>How is 5G different?</h3>
<p>5G’s mission is much more than just to improve basic communication services such as phone calls, SMS, or data connection
in general. It has numerous other potential applications and according to surveys, <a href="http://www.pcworld.com/article/3113669/mobile/as-5g-heads-for-iot-4g-is-far-from-done.html">the best use for 5G will be in the
Internet of Things</a>. This
will be important for major global processes like automation and transportation.</p>
<p>As for network speeds, the figure 5G aims for is 100 times faster than the current 4G and LTE networks. Nokia’s demo at
the MWC reached 20Gbps, the same speed being reached by South Korea’s SK Telecom. Ericsson’s set up yielded 26Gbps,
while T-Mobile got 70Gbps, the latter of which was shown on a live feed from a Huawei base station in Germany.</p>
<p>Results varied, as they depended on the frequency spectrum used; some booths used a wider spectrum than others. In
comparison, however, today’s fastest LTE speeds just reach 1Gbps. Imagine if you <a href="http://www.o2.co.uk/iphone/register">register your smartphone for automatic
updates</a>, and a software upgrade was just released. As soon as you notice, the file
would’ve been downloaded already. That’s a very plausible scenario with 5G mobile.</p>
<p>Low latency is also another target feature of 5G. The goal is to reduce latency down to 1 millisecond. To show just how
fast it is, T-Mobile conducted a separate demonstration in their booth at the MWC.</p>
<p>Two metal balls, one 5G connected and the other 4G, were placed on top of a suspended platform. A robotic arm then
passes under the platform, and when it’s detected, the holes beneath the balls open. The 5G ball drops in time and gets
caught by the arm, while the 4G ball falls too late, missing the arm. Remember here that the balls and platform were
wirelessly connected to the robotic arm.</p>
<h3>Necessary preparations</h3>
<p>Of course, 5G cannot be deployed for end users without the proper technological infrastructure. All over the US, there
are <a href="http://www.rcrwireless.com/20160830/opinion/reality-check-action-needed-now-make-5g-reality-tag10">around 300,000 cell
towers</a>. 5G will likely need even more infrastructure, as its frequency waves travel shorter distances, at least for
now, than the frequency bands of current networks.</p>
<p>A higher number of receiver cells will also be required for the same reason. As a result, more permits will be needed,
not only from governments but also from millions of property owners. These are just a few of the challenges ahead in
making the 5G concept a reality.</p>
<p>The same problems are also being faced in Europe. As stated by Brendan O’Reilly, CTO of O2, their network alone has a
<a href="http://www.computerweekly.com/news/450302721/Three-years-left-to-prep-UKs-mobile-network-infrastructure-for-5G">consumer usage that doubles each
year</a>.
To effectively support potential 5G usage, the digital infrastructure must be radically improved.</p>
<h3>Indoor and outdoor large scale tests are ongoing and upcoming</h3>
<p>AT&T has <a href="http://www.androidheadlines.com/2016/08/att-applies-temporary-fcc-clearance-conduct-5g-demo.html">applied for a temporary
license</a> from the U.S.
Federal Communications Commission (FCC) to proceed with indoor testing at the Texas Wireless Summit. The carrier plans
to use the 28GHz spectrum for 5G trials at the Edgar A. Smith Bldg. in Austin, Texas.</p>
<p>On the other side of the world, Japan and South Korea have begun their tests and the latter’s KT has announced its plans
to debut 5G at the 2018 Winter Olympics in Pyeongchang, South Korea. China’s ZTE on the other hand has <a href="https://www.sdxcentral.com/articles/news/zte-conducts-high-frequency-5g-field-tests/2016/08/">finished the
first phase of its high-frequency field
tests</a>.</p>
<p>Seeing this many nations with their eyes and hands on 5G, all hopes of improving network connectivity look very
promising in the coming years.</p>Jimmy to Microservices – The Journey One Year Later2016-10-05T00:00:00+02:002016-10-05T00:00:00+02:00Dan Persatag:engineering.zalando.com,2016-10-05:/posts/2016/10/jimmy-to-microservices-the-journey-one-year-later.html<p>Read about our microservices adventure more than a year after making the jump.</p><p>We started our migration from <a href="https://tech.zalando.de/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">“Jimmy”, our monolithic shop application, to
microservices</a> more than a
year ago. We had lots of fun with this project from the beginning, with our team, Spearheads, discussing potential
ideas about what to do and how to do it, and then quickly “cowboy coding” some prototypes.</p>
<p>Now that the project has become more mature, our challenges have changed from prototyping to optimizing for performance;
from “cowboy coding” to providing stable components for other teams at Zalando to rely on.</p>
<p>To show our progress, I’ll be using this opportunity to explore the current status of the project.</p>
<h3>Evolution and progression</h3>
<p>Since we started, our project has been given a brand new name, <a href="https://www.mosaic9.org/">Project Mosaic</a>, and I’m proud
to say that it’s been a successful venture. We’ve added more components to the project as well, such as Shaker, Quilt,
and <a href="https://github.com/zalando-incubator/instaskip">Instaskip</a>.</p>
<p>Our team has also evolved, from Spearheads — a task force whose job was to come up with the new architecture — to
Pathfinder, a team with a clear purpose: <em>Team Pathfinder enables team autonomy by providing a platform to deliver web
content</em>.</p>
<p>Our team trains other Zalando development teams to create their new
<a href="https://www.oreilly.com/ideas/better-streaming-layouts-for-frontend-microservices-with-tailor">fragments</a>, and teaches
them how to create routes and templates in the new Mosaic architecture. We are currently supporting these teams in their
efforts to migrate parts of Jimmy to Mosaic Fragments.</p>
<p>We’re also putting more and more languages into production: We recently added <a href="https://clojure.org/">Clojure</a> to our
stack, with Instaskip. We also started to build a new UI for Mosaic; since this web app has no customer impact, it seemed
natural for us to try <a href="http://elixir-lang.org/">Elixir</a> together with Elm.</p>
<h3>Migration updates</h3>
<p>One of the goals of the project is to migrate from the shop monolith to microservices, to get rid of Jimmy. Here is the
progress:</p>
<p>The new Wish List and PDP fragments are going live as we speak. The Header and Footer fragments are also live. Other
feature teams in the Fashion Store are on their way to putting their fragments live in the next quarter.</p>
<h3>Project updates</h3>
<p>Each of the projects under the Mosaic umbrella have evolved nicely:</p>
<p><a href="https://github.com/zalando/skipper">Skipper</a> has even more filters, like HTTP compression, and predicates
for cookies, query parameters, source IP, and time ranges. There’s also better documentation. We have support for
debugging endpoints and experimental SPDY and WebSocket upgrades. We’ve improved the static file server filter and TLS
configuration handling. For eskip we have <a href="https://github.com/zalando/skipper/issues/97">pretty printing</a> and route patching.</p>
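<p>For a flavor of the eskip format, a route combining predicates with a filter might read as follows (a sketch only; the backend URL is illustrative, and the available predicate and filter names should be checked against the Skipper documentation):</p>

```
hello: Path("/hello") && Method("GET")
    -> compress()
    -> "https://backend.example.org";
```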
<p><a href="https://github.com/zalando/tailor">Tailor</a> now supports base templates. Seeing how multiple templates are sharing quite
a few commonalities, the need to be able to define a base template arose. The implemented solution introduces the
concept of slots that you define within these templates. Derived templates will use slots as placeholders for their
elements.</p>
<p>Shaker is a library and living showcase for UI components. It is used inside Zalando to provide a consistent user
experience while developing Fragments in distributed and autonomous teams. During the last few months we’ve added more
components to the library.</p>
<p>New features in <a href="https://github.com/zalando/innkeeper">Innkeeper</a> include better path management, wildcard routes, more
granular authorization, and tighter integration with Skipper by using the eskip format.</p>
<p><a href="https://github.com/zalando-incubator/instaskip">Instaskip</a> is a new project, a command line tool for Innkeeper. The
goal behind developing it is to give teams a tool to easily migrate their routes to Innkeeper, while using the familiar
eskip format <a href="https://github.com/zalando/skipper">Skipper</a> provides.</p>
<h3>Next challenges</h3>
<p>We’ve started to develop the UI for Mosaic, to make sure Zalando teams are able to manage their routes and templates on
their own.</p>
<p>We also plan to evolve Innkeeper’s API to accommodate more use cases, like regex routes.
During the next quarter we’ll start expanding our Mosaic architecture to other departments as well. We’ll keep you
updated on the progress!</p>
<h3>Conclusion</h3>
<p>Finally, I’d like to thank everybody who contributed to making Mosaic a success: my team, the Pathfinders, who
worked hard and put lots of passion into making Mosaic a reality.</p>
<p>We’d also like to extend this thank you to other teams in the Fashion Store who trusted us and invested their efforts into creating their own
fragments, who gave us valuable feedback and dealt with our bugs.</p>How a Summer University for Women Makes a Difference2016-10-04T00:00:00+02:002016-10-04T00:00:00+02:00Patricia Lipptag:engineering.zalando.com,2016-10-04:/posts/2016/10/how-a-summer-university-for-women-makes-a-difference.html<p>The German Summer University for women in Bremen tackles the issue of Women in Tech.</p><p>In Germany, like in most other Western European Countries, the proportion of women studying computer sciences is about
<a href="https://en.wikipedia.org/wiki/Women_in_computing#Gender_gap">10-20%</a> – very low.</p>
<p>Some of the main reasons are the bad reputation of programmers, still often tagged as “nerds”, the missing female role
models in the tech scene, and the lack of a space to learn without pressure in a supportive environment amongst others
in similar positions.</p>
<p>Much like <a href="https://tech.zalando.de/blog/girls-day-zalando-tech/">Girls Day</a>, the German Summer University for women in
Bremen, <em>Informatica Feminale</em>, wants to change this.</p>
<p>Informatica Feminale was launched in 1997 as the first single-sex courses of its kind to be incorporated into a German
research university. Since then, its concept has successfully been transferred to national and international sister
projects.</p>
<p>I had the great opportunity to present a Scala workshop for absolute beginners this year. I took care of seven students
in total, who were incredibly eager when working in pairs through Scala worksheets and letting their <a href="http://scalatron.github.io/">Scalatron
Bots</a> run through the arena. They also pushed me to supervise a follow-up exercise in Scala for their regular
university studies.</p>
<p>I also attended a workshop about mobile development and various guided tours through the technical attractions of Bremen
and the university. On top of this, other attendees were given a wealth of added presentations and talks to check out:</p>
<ul>
<li>Introductions to Artificial Neural Networks</li>
<li>Programming of/playing with Raspberry Pis</li>
<li>Twittering Arduinos</li>
<li>Introduction to Git Workflows</li>
<li>Soft-skill related workshops about Presentation, Communication, Negotiation Techniques, and Lean Digital
Entrepreneurship</li>
</ul>
<p>Overall, I was very impressed not only with the quality of the program, but also with its open and supportive
atmosphere, which made the biggest difference for most of the attendees.</p>
<p>Mirjam, one of my fellow students in the mobile development workshop, shared with me that for a long time she was very
passive in her 'normal' courses, because the men in her course seemed to know all the answers, and as a woman she felt
inadequate in that space. The courses at the summer university gave her room to try out things on her own with the help
of other women in the same boat.</p>
<p>After her positive experience, she plans to apply for a trainee program in the industry to get more hands-on with
programming. Other attendees of the programming courses had similar experiences and are now looking ahead more
proactively to plan their careers in computer science.</p>
<p>The curiosity, interest, and endurance of Mirjam, my students, and my fellow attendees was the best reward for my
mission in Bremen and I would be happy to see graduates or teachers from this initiative taking up positions as
trainees, working students, or developers at Zalando Tech.</p>
<p>If you’d like to contact me about Informatica Feminale, please email me at <a href="mailto:patricia.lipp@zalando.de">patricia.lipp@zalando.de</a>.</p>Our Engineers get Hands-On at Flow Festival2016-09-30T00:00:00+02:002016-09-30T00:00:00+02:00Katariina Nybergtag:engineering.zalando.com,2016-09-30:/posts/2016/09/our-engineers-get-hands-on-at-flow-festival.html<p>Read about how our Zelsinkis created audio-reactive visuals for the club setup of Flow Festival.</p><p>Flow Festival is a boutique music festival in Helsinki held every year around the second weekend of August. The festival
is known for its innovative visual design, versatile line up, and great food. It is set in an old industry area east of
the Helsinki city center.</p>
<p>The Helsinki Technology Hub had the pleasure of welcoming Zalando’s Nordic Marketing Team to our offices to work on the
Flow Festival, raising our brand awareness and engaging our Finnish target group of young adults.</p>
<p>A small group of our Helsinki engineers, or Zelsinkis as we call ourselves, endeavoured to create audio-reactive visuals
for the club setup of the festival. The idea came from a hack I created at the Music Tech Fest held in Berlin at the end
of May 2016, and was extended with input from other engineers who had a background in the Finnish
<a href="https://en.wikipedia.org/wiki/Demoscene">demoscene</a>. The purpose of these hacks was first and foremost to have fun, to
indulge in a bit of creative coding, and to present our skills in an artistic context that shows off Zalando’s
technology hub in Helsinki.</p>
<p>Getting to this point required some extra work from marketing as well as our engineers. What helped both sides work
together was the shared commitment and ambition we all had to create a really cool “club” that fit the aesthetic of the
Flow Festival and represented Zalando’s products and tech department in the best light.</p>
<p>Once Flow started, both Marketing and the Helsinki Tech Hub saw the fruits of their labour. We were impressed by each
other’s work and skills, coming together as a group with feelings of respect and joy. This motivated us to try out
another new collaboration between our departments, perhaps next year’s edition of Flow, or even sooner!</p>
<p>The cross-departmental success we experienced was not the only outcome of this event. The programmers involved were able
to see their own achievements manifest in front of an audience of hundreds. Discussing the event afterwards, their
satisfaction was clear: “There is nothing better than running your productions on a big screen for a crowd”, says
software engineer <a href="http://wakaba.c3.cx/">Dag Ågren</a>.</p>
<p>Engineer <a href="http://twitter.com/xdannys9">Dan Suman</a> was also excited about Zalando’s role in the festival: “Flow Festival
is a synergy of youth, arts, music. The creativity can be perceived everywhere, yet people still took pictures of our
visualisations. So, they must have looked cool."</p>
<p>Programming is knowledge-intensive work and requires a lot of creativity. Playing around with code and creating more
abstract, artistic outcomes fed the creativity of our engineers, making them better programmers and increasing their
motivation.</p>
<p>See the rest of the visualisations we created below. Drop me a line if you want to get in contact about our Flow
Festival contributions via Twitter at <a href="https://twitter.com/katsi111">@katsi111</a>.</p>
<h3>Visuals displayed at Flow Festival</h3>
<p><a href="https://github.com/katsi/processing-demos">SoundTree</a>
Growing fractal trees with music and letting the trees decay with autumnal colours. Realised with Processing 3.0 using
Java, the Minim library, recursion, randomisation, and FFT for sound analysis.</p>
<p><a href="https://bitbucket.org/WAHa_06x36/cubevisualiser">CubeVisualiser</a>
A randomly generated cube structure is lit up from the inside according to the sound level. Inspired by the creator’s
own earlier demoscene work, and the original designs for seating arrangements in the hall. Also invokes the image of
shipping boxes. Realised with plain C using OpenGL.</p>
<p><a href="https://bitbucket.org/WAHa_06x36/returndiagramvisualiser">Visualiser</a>
Drawing sound waves as shapes by plotting the audio waveform against the same waveform, but delayed. A generalisation of
the concept of Poincaré plots, a mathematical tool from chaos theory, which is used to reveal hidden shapes and
structures in chaotic systems. Realised with plain C using OpenGL.</p>
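<p>The delayed-plot idea can be sketched in a few lines of plain Java (my own illustration of the concept, not the festival code, which was written in C with OpenGL): each sample of the audio waveform is paired with the same waveform delayed by a fixed lag, and the resulting pairs are drawn as 2D points.</p>

```java
public class PoincarePairs {
    // Build the (x[i], x[i + delay]) point pairs that a delayed-plot
    // visualiser would render; the plotting itself is omitted.
    static double[][] delayedPairs(double[] samples, int delay) {
        int n = samples.length - delay;
        double[][] pairs = new double[n][2];
        for (int i = 0; i < n; i++) {
            pairs[i][0] = samples[i];          // x: the waveform
            pairs[i][1] = samples[i + delay];  // y: the same waveform, delayed
        }
        return pairs;
    }

    public static void main(String[] args) {
        double[] wave = {0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5};
        double[][] pts = delayedPairs(wave, 2);
        // prints 6 points; first = (0.0, 1.0)
        System.out.println(pts.length + " points; first = (" + pts[0][0] + ", " + pts[0][1] + ")");
    }
}
```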
<p>The Visualiser is available on the App Store now, called
<a href="https://itunes.apple.com/app/archives/id1145966102">WaveFlower</a>.</p>
<p><a href="https://github.com/xdannys9/TypoGeo">TypoGeo</a>
Realised with Processing 3.0 using Java. It uses two processing libraries: Minim for SoundIn and Geomerative. Using
plain font text and the Geomerative library, the text is segmented according to the sound level. The same concept
applies to text opacity that reacts to the sound level by creating an electric neon effect.</p>
<p><a href="https://github.com/xdannys9/InsideOutside">InsideOutside</a>
Realised with Processing 3.0 using Java. It uses Minim library for SoundIn. Based on the sound level, it varies the
number of grid cells. The higher the level of sound, the more lines are drawn and the more ellipses are present covering
the text area.</p>Zalando Launches Research Lab2016-09-28T00:00:00+02:002016-09-28T00:00:00+02:00Dr. Reiner Krafttag:engineering.zalando.com,2016-09-28:/posts/2016/09/zalando-launches-research-lab.html<p>Zalando Tech is embarking on an exciting new chapter in its evolution as a platform.</p><p>Zalando Tech is embarking on an exciting new chapter in its evolution. We’ve recently launched Zalando Research – an
endeavour to place Zalando at the forefront of cutting-edge research, to complement our already strong foothold on
technology.</p>
<p>Why are we doing this? We already boast great researchers at Zalando Tech, so we want to give them an outlet where they
can be better organised and aligned. We also want to have a greater impact in the tech research community, especially in
the fields of data science, machine learning, and artificial intelligence (AI). We want to carve out an academic
standpoint for Zalando Tech.</p>
<p>The launch of Zalando Research began with a group of internal research scientists working in the fields of statistical
modelling, data science, and machine learning. Our focus will be on quality, not quantity, thus our initial team is
small. Other areas we will focus on include augmented reality and how it applies within fashion, which particularly
aligns with our overall purpose of wanting to deliver fashion for the good of all.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/164ce4c12f5b29bc926b7cb2aec1ef41d71e1e88_2016-07-25-research-team-8056.jpg?auto=compress,format"></p>
<p>Our goal, with Zalando Research, is to have an official mandate for our talented research scientists to work from. We’ll
be providing clear role separation between research scientists in the lab and research engineers within our <a href="https://tech.zalando.com/blog/why-do-we-have-autonomous-teams/">Delivery
Teams</a>. Creating an environment with a common charter
will allow the right people to focus on papers, patents, and actively contribute to teams in order to have product
impact.</p>
<p>We’ve already begun spreading the word about our push to bring academia to the forefront at Zalando Tech, with the
inaugural launch of <a href="https://all-about-z.com/">All About Z – The September Issue</a>. Our Research Lead, Dr.
Roland Vollgraf, and I had a conversation about <a href="https://all-about-z.com/new-normal/">AI and its future</a>, where we dive deeper
into the possibilities this new research arm will lead to.</p>
<p>We’re excited about what the future holds in the realm of research and how Zalando can positively contribute to the tech
research community. In the meantime if you’d like to find out more, drop me an <a href="mailto:reiner.kraft@zalando.de">email</a>.</p>User Story Mapping from a Backend Perspective2016-09-23T00:00:00+02:002016-09-23T00:00:00+02:00Bohdan Feniaktag:engineering.zalando.com,2016-09-23:/posts/2016/09/user-story-mapping-from-a-backend-perspective.html<p>We think it's important that our developers get a clear picture of our users, too.</p><p>Your typical morning might start at the office. The scent of freshly brewed coffee (or tea) starts floating around. This
is a typical morning for a lot of teams, but not for the ones slowly gathering in a creativity room.</p>
<p>User Story Mapping is a full-on, complicated day, followed by weeks or months of work by UX teams to make
users happier. At Zalando, the backend team is also involved in this process. This is their chance to realise a full,
common understanding of a product, get to know their users, and to get a feel of what’s going to come.</p>
<p>Recently, my team took on the challenge of enabling our Category Management team of more than 550 at Zalando to
restructure and simplify their processes; to clearly define their responsibilities and easily change them (if
necessary), all while providing a consistent solution and aligning dependent systems.</p>
<p>There are a lot of details to consider – let’s take a look.</p>
<h3>From a product perspective</h3>
<p><em>Get a good impression of your user</em>
There is a problem you need to solve. And there are people that are struggling with this issue during their daily work.
Draft the personas of your possible users. It’s even better if you get a chance to talk to them. Who are they? What are
they trying to achieve? Why are they having this problem? Are their problems connected?</p>
<p><em>Question the map</em>
You will miss details. Maybe not you exactly, but someone on your team will. Some things are simply taken as
assumptions, but that doesn’t mean you’ve understood the process. If even the smallest detail is unclear, question it,
and don’t be afraid to: this is the best time to build a shared understanding.</p>
<p><em>Question everything</em>
Since you don’t do your users’ work every day, you need to understand them exceptionally well. Question every term that
may have more than one meaning to you. Question every word they use that you’re not familiar with. Question the
way they work. This will not only deepen your own understanding, but can also help users realise that something they’re
doing is obsolete and can be skipped or delegated.</p>
<h3>From a system perspective</h3>
<p><em>Think in terms of the big picture</em>
A User Story Map is sliced into different milestones (or releases). Though an MVP may be a relatively small part of the
whole product, at the time you see the map, you should be able to understand the overall picture. This brings us to our
next point.</p>
<p><em>Think in terms of a system</em>
You should also consider the ecosystem that the product will be integrated into. In a world of microservices,
integration itself has become much easier, but there are still a lot of challenges to solve. You will need to obtain the
data the user needs from somewhere. Sometimes you will need to pass these changes on to other services. How will they
interact? You might say it’s too early to consider these points, but as soon as you do, you will notice gaps in your
understanding that have to be filled.</p>
<p>Within such a system, there are a lot of dependencies. Need convincing?</p>
<p><em>Treat dependent systems just like users</em>
There are other stakeholders to consider. They will not use your product directly. Instead, they are going
to need the data you have. Has your user created an order? Great, they will take it and process it further.</p>
<p>As it turns out, you need to speak to them while doing User Story Mapping just like you speak with your users. What are
their preferred methods of transfer? What are their performance needs? What exactly do you need to transfer?</p>
<p>It is the User Story Mapping where you define the foundation you’re going to work from. Understanding their needs and
your possibilities is essential to having a smooth future process. Create user stories together.</p>
<h3>From a team perspective</h3>
<p><em>Think about costs</em>
There’s always more you can build than what you have resources for.</p>
<p>There’s also the case of technology you want to try out but that doesn’t really fit into the bigger story.
Additionally, you’ll need to spend a reasonable amount of time learning it before you can use it effectively. These are
just a few examples of things that can disrupt a project if you don’t weigh the value against the effort of building them.</p>
<p>And of course, no matter how hard you try, at no stage will 100% of your possible users be satisfied. Therefore:</p>
<p><em>Focus</em>
Focus on the main user group. Find the sweet spot where minimal functionality delivers maximum value. This user group
will benefit the most from what you’ve done and change the way they work.</p>
<p><em>Share the knowledge</em>
To stay on top of recent developments, share knowledge across your team. With all your different experiences, you
may each have a fragmented understanding of the business domain. To be able to work together effectively, make sure you’re all
on the same page.</p>
<p><em>Would you use it?</em>
The ultimate question. What is the way you want to present it to the world? The speed of reaction, the processing speed,
the data load. If you were presented with this tool, would you be happy? Would it be easy for you to adapt to the
changes? Are these changes natural?</p>
<h3>Conclusion</h3>
<p>The perspective of User Story Mapping from the backend is something that might not be readily addressed, but it’s
important to highlight the questions we’re asking to gain a better understanding overall. Knowing your users and
understanding their needs enables you to build better software, in all measures.</p>
<p>Some teams might not be considering all of the above points, or might want to share their own ideas, so we’re happy to
hear feedback. You can contact me on Twitter <a href="https://twitter.com/sovereign_36">@sovereign_36</a> or via
<a href="mailto:bohdan.feniak@zalando.de">email</a> to share your thoughts.</p>Our ReactEurope Recap2016-09-21T00:00:00+02:002016-09-21T00:00:00+02:00Henrik Andersentag:engineering.zalando.com,2016-09-21:/posts/2016/09/our-reacteurope-recap.html<p>Zalando's attendance at this year’s ReactEurope Conference in Paris was a no-brainer.</p><p>React has become widely popular within Zalando, making our attendance at this year’s <a href="https://www.react-europe.org/">ReactEurope
Conference</a> in Paris a no-brainer. With around 800 like-minded developers and techies,
this was set to be a great place to share some hands-on experiences with React.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7bb160cdc8ea8a5ca5927cc5a4dddc7de0d35081_reacteurope2.jpg?auto=compress,format"></p>
<p>A lot of this year’s talks were centered around React Native and GraphQL. Let’s take a quick look into them below.</p>
<h3>React Native</h3>
<p>React Native has gained a lot of traction lately, which was also apparent by the number of talks about React Native at
the conference.</p>
<p>One interesting talk was given by Brent Vatne, who shared his insights into <a href="https://youtu.be/cI9bDvDEsYE">building an Android
App</a> with React Native:</p>
<ul>
<li>Available for all major mobile platforms (iOS, Android, Windows)</li>
<li>Microsoft announced support for React Native at F8 <a href="https://blogs.windows.com/buildingapps/2016/04/13/react-native-on-the-universal-windows-platform/">earlier this
year</a></li>
<li>Improved developer experience through hot module reloading</li>
<li>Push updates directly to users (without going via the App Store review process)</li>
</ul>
<h3>GraphQL</h3>
<p>GraphQL is a query language <a href="https://youtu.be/ViXL0YQnioU">created by Facebook</a>. It’s an alternative to the REST
approach to fetch data from a server. GraphQL was open sourced around a year ago and has been used internally at
Facebook for more than 4 years now.</p>
<p>With a GraphQL query, a client can define which data it needs from a server.</p>
<p>Query sent to the Server:</p>
<div class="highlight"><pre><span></span><code>{
  article(id: 1234567) {
    id,
    brand,
    price,
    articleImage {
      uri
    }
  }
}
</code></pre></div>
<p>Response from the Server:</p>
<div class="highlight"><pre><span></span><code>{
  "article": {
    "id": 1234567,
    "brand": "nike",
    "price": "1,10",
    "articleImage": {
      "uri": "http://server/article.png"
    }
  }
}
</code></pre></div>
<p>The server responds purely with the fields that were defined in the query.</p>
<p>On the server side, a GraphQL server is needed which can interpret the query schema. A reference implementation from
Facebook can be found <a href="https://github.com/graphql/graphql-js">here</a>. I haven’t seen many uses of GraphQL, but it’s going
to be interesting to see what people will do with it.</p>
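<p>The field selection described above can be sketched in a few lines of plain JavaScript. This is a simplified illustration, not the real GraphQL execution engine; the <code>articles</code> record and the <code>selectFields</code> helper are invented for the example:</p>

```javascript
// Simplified illustration of GraphQL-style field selection:
// the client names the fields it wants, the server returns only those.
const articles = {
  1234567: {
    id: 1234567,
    brand: 'nike',
    price: '1,10',
    articleImage: { uri: 'http://server/article.png' },
    internalCost: '0,40', // only sent if a client explicitly asks for it
  },
};

function selectFields(record, fields) {
  const result = {};
  for (const field of fields) {
    if (field in record) result[field] = record[field];
  }
  return result;
}

// The query { article(id: 1234567) { id, brand, price } } maps to:
const response = { article: selectFields(articles[1234567], ['id', 'brand', 'price']) };
console.log(JSON.stringify(response));
// → {"article":{"id":1234567,"brand":"nike","price":"1,10"}}
```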
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/31b93d54c8e9126bf83fd746e316152e36174851_reacteurope3.jpg?auto=compress,format"></p>
<h3>Talks to highlight</h3>
<p><strong>Lin Clark – A Cartoon Guide to Performance in React</strong>
Lin gave a really solid presentation on performance in React. Understanding performance can be quite a challenge in any
framework or library, but with her visualisations, I think she took an interesting topic and made it understandable, even
for those in attendance who aren’t engineers. Watch it <a href="https://www.youtube.com/watch?v=nRF0OVQL9Nw">here</a>.</p>
<p><strong>Christopher Chedeau – Being Successful at Open Source</strong>
This talk was probably the best non-technical presentation I’ve had the pleasure of attending. Christopher gave really
useful insight into making an open source project successful. Check out his presentation
<a href="https://www.youtube.com/watch?v=-t8eOoRsJ7M">here</a>.</p>
<p>A lot of the talks at React Europe were held by people working at Facebook. While this is understandable, it would have
been good to see more input from people in the greater tech community. I look forward to future React innovation!</p>Juggling Expectations and Reality in UX Job Ads2016-09-15T00:00:00+02:002016-09-15T00:00:00+02:00Elena Pavlenkotag:engineering.zalando.com,2016-09-15:/posts/2016/09/juggling-expectations-and-reality-in-ux-job-ads.html<p>Help us reduce frustration and inspire the UX community to create well fitting job ads.</p><p><em>TL;DR:</em> <em>UX job titles and the implied skills can be confusing at times. Share your stories and opinions in our</em> <strong>4
minute survey</strong> <em>to help us reduce the frustration and inspire the community to create well fitting job ads.</em></p>
<p>Job titles are kind of silly, especially in the tech world. Is a ninja superior to a rockstar? Or are they at the same
seniority level? Playing around with the <a href="http://www.aaronweyenberg.com/uxgenerator/">ingenious UX title generator</a> by
Aaron Weyenberg is great fun. But it all relates to one actual real-life problem: Even within the specific community of
UI/UX, there’s no clear-cut understanding about which skills relate to which positions.</p>
<h3>“I work with Robots, but fight for the Humans”</h3>
<p>What exactly should be the responsibility of a UX Manager? Can we expect Interaction Designers to code? Which jobs are
available for someone with a strong background in research? With your help, we want to develop a common language and use
job titles that are actually accurate and relatable.</p>
<p>And it really is about time for a change. As nicely summed up by Kathryn Reeves, it’s hard to keep track of the
tremendous <a href="https://www.optimalworkshop.com/blog/the-ux-language-debate-why-its-a-good-thing/">variation of UX
definitions</a>. Are UX and UI the same?
What about Product Design? There seems to be a never-ending debate. No wonder that <a href="https://media.nngroup.com/media/reports/free/User_Experience_Careers.pdf">in their
survey</a>, Susan Farrell and Jakob Nielsen
collected a total of 210 professions all related to the field.</p>
<h3>“Service Designer? Is that like designing call centre?”</h3>
<p>But can these roles all be that different? Not necessarily. When Emelyn Baker <a href="https://uxdesign.cc/job-titles-in-the-design-community-50d51771617f#.yawkf634n">screened 110 job
titles</a> on Dribbble’s Job Board, she
found that 97 of them featured the term “design”, whereas only a few pointed to a clear specialization. Most employers
apparently phrase their job ads very flexibly, with a lot left up to interpretation.</p>
<p>Let’s reduce frustration and make life easier for both the hiring manager and the potential employee. We believe that
the UX hiring process, both within our own company and in the broader product and design community, deserves a little
more UX design of its own.</p>
<h3>“Getting the messy wool ball ready to start knitting”</h3>
<p><strong>Curious about the quotes?</strong> Find out for yourself by taking our survey and adding your stories to the mix. We’re
looking forward to more insightful and entertaining input.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/48eea6059dba091cfde83f815b7750c17c57c792_buttonsurv.png?auto=compress,format"></p>
<p>All the best,
The Zalando UX Team</p>
<p><em>P.S. You’re also invited to scan our</em> <strong>current job descriptions</strong> <em>and to drop in a comment about what you like or
don't like. We're always up for more feedback:</em></p>
<p><a href="https://docs.google.com/document/d/1-iDQl38fmx9ED20OOonMSMe2j8HBxcHNMBjfBfRMAHs/edit?usp=sharing">UX Lead Fashion
Store</a></p>
<p><a href="https://docs.google.com/document/d/1YGkoRNz-d7MGalfcArmOzDzYrhT5MxxqXLoe7AaicAM/edit?usp=sharing">Research Coordinator</a></p>
<p><a href="https://docs.google.com/document/d/1KgHhCcrgRXKFbMCOiAlTEmuj3fBYhPDu3E4rVMPa0G8/edit?usp=sharing">User Experience
Researcher</a></p>
<p><a href="https://docs.google.com/document/d/1RvRFPD8WkPQ1KDwT7tkGkEQSPviWeXZq4oNpzKch3tk/edit?usp=sharing">Visual UI Designer</a></p>
<p><a href="https://docs.google.com/document/d/15NE1uZwcToXCne3sE1a3ZcWDuFOlNN9R2WTsv9pfi3M/edit?usp=sharing">UX Interaction
Designer</a></p>Pass props and keeping the DOM neat in a React Isomorphic App2016-09-14T00:00:00+02:002016-09-14T00:00:00+02:00Roland Castillotag:engineering.zalando.com,2016-09-14:/posts/2016/09/keeping-the-dom-neat-in-a-react-isomorphic-app.html<p>Want a nicer DOM when debugging HTML in the inspector? Read more right here.</p><p>Since we adopted React here at Zalando, I have always been on the lookout for ways to make the most of it. While working
with server-side rendering, I noticed that the way we were passing our props down the pipeline wasn’t as clean as I wanted
it to be. So I thought: "There must be another way".</p>
<p>The usual and most common way of passing props from the server side rendering to the client in a React Isomorphic app is
through <em>data attributes</em>, which is a very straightforward and easy strategy. This technique isn’t without its
downsides, as by doing so we end up with a bloated DOM. Something like the following:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c86f6bf97d9dde6d05ca325187e11bc536339e1c_screen-shot-2016-08-10-at-11.33.42.png?auto=compress,format"></p>
<p>Ugly, right? You can barely see the actual HTML – you only see data, and even though they’re also important to inspect,
there are better tools for that job (i.e. <a href="https://github.com/gaearon/redux-devtools">Redux DevTools</a>). When you debug
your HTML code with the inspector, you want to see HTML. By bloating your DOM, you only make it harder to browse the
generated HTML.</p>
<p>So, why not have a leaner DOM that keeps this lengthy data out of sight? React doesn't really care how you pass your
props, as long as they are accessible in the document. Let's wrap our props as CDATA instead of data attributes.</p>
<p>First of all, on your server side rendering file, instead of using data attributes:</p>
<div class="highlight"><pre><span></span><code>export default function render(state) {
  const props = {model: state};
  const dataProps = escape(JSON.stringify(props));
  // Wrapper markup reconstructed for illustration; &lt;App /&gt; stands in for your root component.
  return (
    `&lt;div id="app" data-props="${dataProps}"&gt;` +
    ReactDOM.renderToString(&lt;App {...props} /&gt;) +
    `&lt;/div&gt;`
  );
}
</code></pre></div>
<p>Use CDATA:</p>
<div class="highlight"><pre><span></span><code>export default function render(state) {
  const props = {model: state};
  const dataProps = JSON.stringify(props);
  // Wrapper markup reconstructed for illustration; the id must match the client-side
  // lookup, and &lt;App /&gt; stands in for your root component.
  return (
    `&lt;script type="text/plain" id="app-props"&gt;` +
    `&lt;![CDATA[${dataProps}]]&gt;` +
    `&lt;/script&gt;` +
    `&lt;div id="app"&gt;` +
    ReactDOM.renderToString(&lt;App {...props} /&gt;) +
    `&lt;/div&gt;`
  );
}
</code></pre></div>
<p>Then process them on your client side file as follows:</p>
<div class="highlight"><pre><span></span><code>let props = document.getElementById('app-props').textContent;
props = props.replace('&lt;![CDATA[', '').replace(']]&gt;', '');
const data = JSON.parse(props);
// &lt;App /&gt; stands in for your root component.
ReactDOM.render(&lt;App {...data} /&gt;, yourDomContainerNode);
</code></pre></div>
<p>You could use either <em>innerText</em> or <em>textContent</em>. Just keep in mind the browser support. <em>innerText</em> has good Internet
Explorer support but it is only supported from <strong>Firefox 45</strong> onwards. On the other hand, <em>textContent</em> has very good
Firefox support but is only supported from <strong>IE9</strong> onwards.</p>
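<p>If you need to support both browser families, a small fallback helper does the job. A minimal sketch (the <code>readText</code> name is ours, and the plain objects below merely stand in for DOM nodes):</p>

```javascript
// Prefer the standard textContent (IE9+), fall back to innerText (older IE).
function readText(node) {
  return node.textContent !== undefined ? node.textContent : node.innerText;
}

// Node-like stand-ins for the two browser families:
console.log(readText({ textContent: 'from textContent' })); // → from textContent
console.log(readText({ innerText: 'from innerText' }));     // → from innerText
```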
<p><strong>The result?</strong> A nicer DOM in the inspector:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6eaf13202459bae50f3bb370c8cae93dddab559c_screen-shot-2016-08-10-at-12.08.41.png?auto=compress,format"></p>
<p>Here are some of the advantages of using this strategy:</p>
<ul>
<li><strong>Decrease document size:</strong> It is not necessary to escape the data since you passed your props to the client as
CDATA. Depending on the size of your data, you can save a substantial number of bytes by not escaping special characters.</li>
<li><strong>Clean DOM:</strong> The output in the browser inspector doesn't contain lengthy data.</li>
<li><strong>Easier DOM inspection:</strong> The inspection of your DOM becomes more efficient since you don't have to scroll through
all those lengthy data attributes.</li>
<li>And <strong>works with Flux!</strong></li>
</ul>
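<p>The size savings are easy to measure. A quick sketch using Node’s legacy global <code>escape</code> (the same function used in the data-attribute variant above); the sample props are made up:</p>

```javascript
// Compare raw JSON against its escape()d form, as it would be embedded
// in a data attribute. Quotes, spaces and non-ASCII all become %XX sequences.
const props = { model: { brand: 'nike & friends', note: 'größe 42', price: '1,10' } };
const raw = JSON.stringify(props);
const escaped = escape(raw);

console.log(raw.length, escaped.length);
console.log(escaped.length > raw.length); // → true: each escaped character costs extra bytes
```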
<p>Do not hesitate to drop me a line at <a href="mailto:roland.castillo@zalando.de">roland.castillo@zalando.de</a> in case you have comments, suggestions, feedback, etc.</p>All About Startups At The Latest Techspert Panel2016-09-13T00:00:00+02:002016-09-13T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-09-13:/posts/2016/09/all-about-startups-at-the-latest-techspert-panel.html<p>A recap with this month's Techspert Panelists on the hottest topic in Berlin: Startups.</p><p>Another edition of our ongoing <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/232867086/">Zalando Techspert
Series</a> has come and gone, an event which collects
experts from respected companies to talk all things tech.</p>
<p>During this month’s debate, we <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/233709951/">shone a light on
startups</a> and how they could earn their big break –
a hot topic in one of the hottest startup hubs in Europe. The panel featured Rocket Internet’s Johannes Bruder, CEO of
zLabels Jan Wilmking, and Co-founder of CoffeeCircle Robert Rudnick. A great turnout yielded some interesting and
insightful questions from the public.</p>
<p>Sad to have missed the event? We sat down with our Techsperts to dive deeper into their knowledge of the startup scene
and get their tips for budding techpreneurs.</p>
<p><em>Zalando: How important is it when establishing a startup to engage with the local tech scene, or should startups be
thinking global straight away?</em></p>
<p><em>Jan Wilmking:</em> We’re in the process of growing our tech organisation and we’ve realised that purely relying on local
talent isn’t enough, so we’re tapping into the global talent pool. We’re employing people from all over: India, the
United States, etc. It’s very critical to have these kinds of connections.</p>
<p><em>Robert Rudnick:</em> We’re a coffee e-commerce company, more #roastingtech, and we’ve grown a little slower than the likes
of Zalando and Rocket Internet. For us, it’s not so important to scale quickly, especially with tech talent. We haven’t
experienced problems hiring great talent locally over the last twelve months.</p>
<p><em>Johannes Bruder:</em> I would say it’s very important to have a mix – you shouldn’t restrict yourself to being global or
local with tech talent. The goal should always be to attract the best in tech, and there’s enormous potential both
globally and locally to achieve this.</p>
<p><em>Zalando: What kind of influence does financial backing from incubators or venture builders have on strategy and the
overall creativity of a startup?</em></p>
<p><em>Johannes Bruder:</em> Independent of whether there is easy access to capital or no access at all, in the early days
startups should try to stay lean and to focus on the one single thing they want to be really good at.</p>
<p><em>Robert Rudnick:</em> I totally agree – you should have a great core team that shares the same values and a guiding star for
the company. But scaling too quickly is then as dangerous as being under-funded, so having an experienced investor or
incubator definitely helps.</p>
<p><em>Jan Wilmking:</em> It might seem easier to start out with a big budget and build, build, build, but I also completely agree
with the other panelists – this presents difficulty in a larger context for having the discipline of iteration and the
discipline in scaling a tech team for the sake of scaling. It’s vital to keep that logic of iterating and building small
things first, before putting all of your resources into one basket.</p>
<p><em>Zalando: Share with us your one essential piece of advice for any would-be startup founders out there.</em></p>
<p><em>Jan Wilmking:</em> Be persistent!</p>
<p><em>Robert Rudnick:</em> Focus. If you’re starting a business from scratch, it’s easy to have a lot of small wins, but that may
not add up to long-term success, so focus is key, on top of listening to your customers.</p>
<p><em>Johannes Bruder:</em> For me it’s a combination of these: Focus, laser focus, on one thing and do it really well, and never
stop – be persistent.</p>
<p>--</p>
<p>Keep an eye on our <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/">Zalando Tech Meetup page</a> to be notified of the
next Techspert Panel, or follow our updates on Twitter <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>How Can Your Company Radically Curb Insider Threat?2016-09-08T00:00:00+02:002016-09-08T00:00:00+02:00Christian Matthiestag:engineering.zalando.com,2016-09-08:/posts/2016/09/how-can-your-company-radically-curb-insider-threat.html<p>Security needs to stop being an afterthought of the production line when building software.</p><p>Companies have long neglected security in the development lifecycle, instead of accepting that it is part of everything
they do. The push to bring security to the forefront of product development, as well as to the minds of all employees,
is costing organisations more than they anticipated. Bigger companies have recently made headlines with their lucrative
bug bounty programs that only focus on external security threats.</p>
<p>Take Google for example: They <a href="http://www.theregister.co.uk/2016/05/13/google_crushes_five_vulns_with_patch_run_and_20k_in_bug_bounties/">recently
paid</a> £17,875
in bounties to bug-reporting researchers. The number of organisations launching successful external programmes is vast:
Pornhub <a href="https://hackerone.com/pornhub">launched</a> a bug bounty programme for security types not long ago, with Microsoft
<a href="https://technet.microsoft.com/en-us/library/dn425036.aspx">adding</a> to their own programme schedule via their Nano
Server technical preview.</p>
<p><a href="http://www.information-age.com/technology/security/123461176/researchers-find-undetected-insider-threats-100-companies">A recent
study</a>
has found undetected insider threats present in 100 percent of businesses. These threats are not necessarily malicious:
Developers can inadvertently create problems due to their lack of security knowledge. However, external threats are
still considered the main priority, and while they are obviously important, are companies putting too much focus on
them?</p>
<h3>Internal Bug Bounties secure the right mindset</h3>
<p>‘Hackers for hire’ can make a comfortable income off bounty programmes. They are constantly trawling open invitations
from companies such as Facebook, Dropbox, GitHub, Google, etc., to find vulnerabilities and be paid for the trouble.
There is even a <a href="https://bugcrowd.com/list-of-bug-bounty-programs">master list</a> for the aspiring bug bounty full-timer.
The question is whether the existence of these external bounty programmes alone provides enough security for organisations.</p>
<p>Having an internal bug bounty programme, targeted at reducing the number of vulnerabilities that are easy to exploit, is
a great mechanism for reducing insider threat. The programme taps into the inherent nature of every engineer to want to
create and hack: Teams know their infrastructure best, its strengths, as well as its weaknesses. This also gives
everyone an opportunity to have a very direct impact on security within the organisation. At Zalando, our internal bug
bounty programme underlines the need to hack, learn from mistakes, and in the process, develop the most secure products.</p>
<h3>Becoming radical about security</h3>
<p>At Zalando, developers work in autonomous teams, bound by organisational trust, and use the technologies
they think will best fit the job, as well as the company as a whole. This all happens amongst team members without the
restraints of a traditional, hierarchical management structure.</p>
<p>To support our teams' autonomy, Zalando Tech has put several initiatives in place to foster
a broader security mindset throughout the company.</p>
<p>Finding and fixing bugs is not enough. You also need trained experts; great communicators and ambassadors who can convey
the right mindset. Zalando’s grassroots initiative, known as Security Champions, was developed to empower its employees
with enough security knowledge to ensure they are able to make the right decisions without needing to consult the
Security Team. They are the Security Team’s eyes and ears, well versed in the company’s security fundamentals, while the
bulk of the work can still be fueled by innovation. One voluntary nominee per team watches over day-to-day security
decisions, backed by their training on threat modeling, data privacy law, and security concepts such as Defense in Depth
and Security by Default. Not only does this increase security awareness amongst developers, by developers, it also
underlines the fact that ultimately, security is everyone’s responsibility.</p>
<p>It’s important to underline that this initiative is sustained by volunteer employees, who want to add to their expertise
and strive for excellence. These programs have several layers of benefits: Organisations become less error prone when
building and releasing new technology, on top of security and architecture principles being adopted broadly throughout
the company.</p>
<h3>Stop adding security as an afterthought</h3>
<p>It’s already been said: Security is part of everything you do. Do you want to curb your biggest security threat? Then
you need to get radical about security. Nico Sell, co-founder of end-to-end encrypted messaging app Wickr and
notoriously private person, stressed to
<a href="http://www.cnbc.com/2016/01/20/shades-reduce-my-digital-footprint-wickr-founder.html">CNBC</a> that: “The more data that
you have the more you have to protect”.</p>
<p>Company-wide efforts are crucial if businesses are to combat the insider security threats most companies harbour.
While it involves a dimension of checks and balances, implementing security during every step of the development
lifecycle must never cause friction or slow down innovation.</p>
<p>It’s in everyone’s interest to think security – it has to be visible and explicit, and there should be an investment
from all levels of management in programs that are loud and colourful. When you invest in training and foster a culture
of learning and improvement, the security mindset becomes a matter of course for a well-functioning organisation.</p>
<p>Security needs to stop being an afterthought of the production line. Take the steps your company needs to make security
second nature.</p>What knowledge should you have to be a frontend developer?2016-08-24T00:00:00+02:002016-08-24T00:00:00+02:00Dmitriy Kubyshkintag:engineering.zalando.com,2016-08-24:/posts/2016/08/what-knowledge-should-you-have-to-be-a-good-frontend-developer.html<p>Read on to find out which skills we think a well-rounded frontend developer should possess.</p><p>One of the hardest things about being a frontend developer comes from the fact that everybody has very different
expectations of what it means to be one. This, along with a technology stack that changes at the speed of light, makes
interviewing for a frontend position tricky.</p>
<p>To help you out, and to give an idea as to Zalando’s views on the role, we have prepared some points on what we consider
essential knowledge for a frontend developer.</p>
<h3>Be able to solve problems</h3>
<p>This sounds quite abstract, so let’s talk about what it means. Unlike many other developer positions, a frontend
developer wields the power of multiple languages and technologies. Being able to look at a task, split it up into
necessary steps, and choose the right technologies for the job is an essential skill—some things need just a line of
CSS, but implementing them in JavaScript would require a 100k library, and vice versa.</p>
<p>Sometimes you won’t know the perfect tool for the situation at hand, and this is also fine. However, you are still
expected to provide a solution. This solution may not be the nicest in terms of code quality or speed, but it needs to
work, at least for the cases at hand.</p>
<h3>JavaScript</h3>
<p><a href="https://facebook.github.io/react/">React</a> and <a href="https://angularjs.org/">Angular</a>, along with other frontend frameworks,
are quite impressive pieces of engineering, and it is important to know them well if this is the primary framework you
are using for the job. That said, knowledge of JavaScript, its core libraries, and browser APIs is something that will
support you no matter which framework you use. Additionally, JavaScript is a general-purpose programming language, so
doing some simple array or tree processing shouldn’t cause you any problems and shouldn't immediately trigger the reflex
to include that nifty 100k library that you would barely use.</p>
<p>One more thing to pay attention to is the asynchronous but single-threaded nature of JavaScript. You should be
comfortable with scheduling pieces of work into the future in various ways, and know how to execute code after those
asynchronous calls complete.</p>
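<p>To make that concrete, here is a small, illustrative TypeScript sketch (the function names are our own invention, not from any particular codebase) that schedules work into the future with a promisified timer and resumes after it with async/await:</p>

```typescript
// A promisified timer: schedules work into the future on the single thread.
const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// async/await lets sequential-looking code resume after asynchronous calls.
async function runSteps(): Promise<string[]> {
  const log: string[] = [];
  log.push("sync work");        // runs immediately on the main thread
  await delay(5);               // yields to the event loop, resumes later
  log.push("after first await");
  await delay(5);               // the thread is free to do other work meanwhile
  log.push("after second await");
  return log;
}
```

<p>The same sequencing can be expressed with <code>.then()</code> chains or plain callbacks; being comfortable moving between these styles is part of what the paragraph above describes.</p>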
<h3>CSS</h3>
<p>In the world of component-based frameworks and living style guides, you can get away with minimal CSS knowledge most of
the time. Yet, when something goes wrong with your layout, or when you need to implement something unique, you need to
know how browser layout works. In short, we want you to understand CSS, but not necessarily keep all the
<a href="https://www.w3.org/TR/css-flexbox-1/">flexbox</a> properties, or browser-specific hacks in your head, although it is
certainly a plus if you do.</p>
<h3>Know thy standards and keep up to date</h3>
<p>Nothing gives you as much appreciation of the complexities involved in making the frontend work as reading the standards
does: <a href="https://www.w3.org/TR/html51/">HTML</a>, <a href="https://www.w3.org/TR/CSS/">CSS</a>, and
<a href="http://www.ecma-international.org/ecma-262/6.0/">ECMAScript</a>. It’s easy to get overwhelmed with more than 2,000 pages of
quite dense and very specific reading, so start small. Websites like <a href="https://www.smashingmagazine.com/">Smashing
Magazine</a>, <a href="https://ponyfoo.com/">PonyFoo</a>,
<a href="http://www.html5rocks.com/en/">HTML5Rocks</a> (and many others) will keep you up to date on the latest developments and
might trigger an urge to understand things deeper.</p>
<h3>Network</h3>
<p>Every awesome frontend application needs to get to the user’s browser somehow. It’s also likely that you will have to
make some asynchronous <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch">fetch</a> requests to
backend services. All of this makes use of the network, which means you should be aware of what is involved in making an
<a href="https://www.ietf.org/rfc/rfc2616.txt">HTTP</a> request, at least on the level of
<a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">Wikipedia</a> with a sprinkle of <a href="https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en">cache-related
topics</a>,
plus what <a href="https://developers.google.com/web/tools/chrome-devtools/profile/network-performance/resource-loading?hl=en">request
latency</a> is
composed of.</p>
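<p>As a small, hand-rolled illustration of the cache-related topics above (not a complete implementation of the HTTP caching rules), the TypeScript sketch below parses a <code>Cache-Control</code> header value into its directives:</p>

```typescript
// Parses a Cache-Control header value into a directive map,
// e.g. "public, max-age=3600" -> { public: true, "max-age": 3600 }.
function parseCacheControl(header: string): Record<string, string | number | boolean> {
  const directives: Record<string, string | number | boolean> = {};
  for (const part of header.split(",")) {
    const [rawName, rawValue] = part.trim().split("=");
    const name = rawName.toLowerCase();
    if (!name) continue;               // skip empty segments (e.g. trailing comma)
    if (rawValue === undefined) {
      directives[name] = true;         // value-less directive, e.g. no-store
    } else {
      const n = Number(rawValue);
      directives[name] = Number.isNaN(n) ? rawValue : n;
    }
  }
  return directives;
}
```

<p>A client could use the resulting map to decide whether a cached response is still fresh, e.g. by comparing its age against <code>max-age</code>. Quoted directive values and other corner cases from the RFC are deliberately ignored here.</p>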
<p>While it is not a requirement for a frontend developer, understanding connections at the TCP level is a nice thing to
have, as is awareness of HTTP/2. Knowledge of
<a href="https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf">REST</a> services, at least from a consumer
perspective, is certainly helpful as well.</p>
<h3>Debugging</h3>
<p>This goes back to the initial section on problem solving, but also touches on every other frontend topic. Browsers are
extremely complicated pieces of software with years of legacy, but still evolving at great speed, and this software is
our domain as frontend developers. So whenever something goes wrong there, it’s our job to figure out why.</p>
<p>Sometimes it means <a href="https://developers.google.com/web/tools/chrome-devtools/debug/?hl=en">looking for JavaScript
errors</a>, other times you may need to <a href="https://developers.google.com/web/tools/chrome-devtools/profile/?hl=en">profile your
code</a> to check for slow code paths, and
occasionally you may need to use <a href="http://ec.haxx.se/">curl</a> to fire off some requests from your terminal.</p>
<h3>Security</h3>
<p>The web, being an open platform, is constantly under threat from various attacks targeting both users and companies.
Every piece of code and every call that the browser makes is a potential source of security vulnerabilities: the initial
load of the page, an AJAX request, an external script or iframe, and even a simple DOM manipulation can be vulnerable to
<a href="https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)">XSS</a>,
<a href="https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)">CSRF</a> and various other vectors of attacks. Being
aware and vigilant is an essential requirement for the safety of the data of your customers and your company.</p>
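<p>One of the simplest defences against XSS is to escape untrusted strings before they reach the DOM. The sketch below is illustrative only; in practice, prefer safe APIs such as <code>textContent</code> and the escaping built into your framework:</p>

```typescript
// Escapes the characters that become dangerous when untrusted input is
// interpolated into HTML, a minimal mitigation against reflected XSS.
function escapeHtml(untrusted: string): string {
  const replacements: Record<string, string> = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return untrusted.replace(/[&<>"']/g, (ch) => replacements[ch]);
}
```

<p>With this in place, a payload such as <code>&lt;img src=x onerror=...&gt;</code> renders as visible text instead of executing in the page.</p>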
<h3>Testing</h3>
<p>We believe that it’s each developer’s responsibility to ensure the quality of the code they produce, and frontend
development has some very specific strategies for testing and asserting non-functional properties of the code.</p>
<p>We believe that <a href="http://martinfowler.com/bliki/UnitTest.html">unit testing</a> is a must for any project, and we highly
encourage <a href="http://www.jamesshore.com/Agile-Book/test_driven_development.html">TDD</a>, especially for a dynamic language
like JavaScript. Generally, you would also want to supplement this with some end-to-end or functional tests to make sure
users can do what we expect they should be able to do.</p>
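<p>To give a flavour of what this looks like, here is a framework-free unit test in TypeScript (a real project would use a proper test runner; the function under test is invented for the example):</p>

```typescript
// Unit under test: formats a price in cents as a German-style string.
function formatPrice(cents: number, currency: string = "€"): string {
  const whole = Math.floor(cents / 100);
  const rest = Math.abs(cents % 100).toString().padStart(2, "0");
  return `${whole},${rest} ${currency}`;
}

// A minimal table-driven unit test: each case pairs an input with the
// expected output, and a mismatch throws with a descriptive message.
function testFormatPrice(): void {
  const cases: Array<[number, string]> = [
    [0, "0,00 €"],
    [1999, "19,99 €"],
    [50, "0,50 €"],
  ];
  for (const [input, expected] of cases) {
    const actual = formatPrice(input);
    if (actual !== expected) {
      throw new Error(`formatPrice(${input}): expected ${expected}, got ${actual}`);
    }
  }
}
```

<p>In TDD you would write <code>testFormatPrice</code> first, watch it fail, and then implement <code>formatPrice</code> until it passes.</p>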
<h3>Tooling / Automation</h3>
<p>All of the topics discussed here also require very specialized tooling. You should definitely be comfortable with the
concept of resource bundling and the tools available for this purpose. As mentioned above, testing by hand is not our
thing, so a setup with automated tests is a must for every project.</p>
<p>Ideally, tooling and automation should be complemented by <a href="http://www.martinfowler.com/articles/continuousIntegration.html">CI /
CD</a> to make sure no code slips past without running
those tests. Depending on the project, you might also want to use a transpiler like <a href="http://postcss.org/">PostCSS</a> or
<a href="http://babeljs.io">Babel</a> to help you write more modern code.</p>
<h3>Conclusion</h3>
<p>What we’ve collected above is obviously an enormous amount of knowledge and will rarely exist in a single mind.
However, you should have an overview of all these topics, and keen insight into a select few, to stand out as a frontend
developer.</p>
<p>If you’ve got further questions or queries, feel free to contact me on Twitter at
<a href="https://twitter.com/d_kubyshkin">@d_kubyshkin</a> or via <a href="mailto:dmitriy.kubyshkin@zalando.de">email</a>.</p>Emerging Tech Hubs Around The World2016-08-16T00:00:00+02:002016-08-16T00:00:00+02:00Deirdre O'Brientag:engineering.zalando.com,2016-08-16:/posts/2016/08/emerging-tech-hubs-around-the-world.html<p>Check out our assessment of the emerging tech hubs that are trending worldwide right now.</p><p>Where in the world should you head to if you want to soak up the energy, innovation, and entrepreneurial success of the
tech industry? If you’re looking to get amongst it, consider this your personal travel guide to the cities that are
making waves in tech right now.</p>
<h3>Moscow</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ee7f60039b2a349942d607753f24bae81db653ce_moscow-1556561_1920.jpg?auto=compress,format"></p>
<p>If low living costs are still a priority, then Moscow should definitely be at the top of your tech hub list. While rents
are cheap, Moscow’s commitment to funding its thriving tech scene proves that the city is investing cash where it counts:
<a href="http://sk.ru/news/">Skolkovo Technopark</a> is a 23,000 sq. meter Silicon Valley-style startup campus that offers a
special innovative ecosystem to grow and develop. Startups are privy to grants, tax breaks, and direct access to
investors.</p>
<p>The latest incubator making its mark in Moscow is <a href="http://ditelegraph.com/en">DI Telegraph</a>, described as the premier
workspace and network center for the tech and new economy community. Its sister company, <a href="http://dreamindustries.co/">Dream
Industries</a>, is responsible for the likes of music streaming service
<a href="https://zvooq.ru">Zvooq</a> and social ebook subscription service <a href="https://bookmate.com/">Bookmate</a>.</p>
<p><em>AVERAGE RENT FOR 1 BED-APARTMENT:</em> 726.69 €<br>
<em>AVERAGE COST OF A PINT:</em> 0.92 €<br>
<em>AVERAGE DEVELOPER SALARY:</em> Undetermined</p>
<h3>Helsinki</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/63603a97e9306e7a19facb0daf9ce70b30cb2c09_helsinki-168773_1920.jpg?auto=compress,format"></p>
<p>Home to over 500 tech startups, Helsinki has been dubbed Europe’s ‘Silicon Sauna’ thanks to its recent technological and
entrepreneurial rejuvenation following the Nokia years. Helsinki’s current tech roots are connected to its early 1990s
<a href="https://en.wikipedia.org/wiki/Demoscene">demoscene</a>: Computer art programs that showcase digital art and music.
Successful Finnish game companies like <a href="http://www.rovio.com/">Rovio</a> and <a href="http://supercell.com/en/">Supercell</a> trace
their own origins to the same era.</p>
<p>Having been such a force in the emergence of mobile, Helsinki is still the place to be when it comes to sourcing tech
talent. The <a href="https://digitalcityindex.eu/">European Digital City Index 2015</a> places Helsinki fourth in their rank of 35
European cities supporting digital entrepreneurs.</p>
<p><em>AVERAGE RENT FOR 1 BED-APARTMENT:</em> 897.74 €<br>
<em>AVERAGE COST OF A PINT:</em> 6.00 €<br>
<em>AVERAGE DEVELOPER SALARY:</em> 40,000 €</p>
<h3>Warsaw</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/791cedab0b83c77059294fea8f531adab11b6632_warsaw-1423864_1920.jpg?auto=compress,format"></p>
<p>Google’s <a href="https://www.campus.co/warsaw/en">Campus Warsaw</a> and London’s Polish edition to their <a href="https://warsaw.techhub.com/">TechHub
locations</a> should be enough to convince you about Warsaw’s potential in the tech space.
Joint in their mission to help technology startup founders understand the value in collaborative communities, the two
locations offer a supportive environment and links to an international network of collaborators.</p>
<p>Warsaw’s startup DNA is rooted more in grassroots tradition than these fancy campuses would lead you to
believe. Everyone involved in the Warsaw startup scene cites <a href="http://reaktorwarsaw.com/">Reaktor</a>, the house that became
the overnight epicentre of the city’s startup community, as the real establishing moment of Warsaw’s booming tech
movement.</p>
<p><em>AVERAGE RENT FOR 1 BED-APARTMENT:</em> 496.00 €<br>
<em>AVERAGE COST OF A PINT:</em> 1.87 €<br>
<em>AVERAGE DEVELOPER SALARY:</em> Undetermined</p>
<h3>Dublin</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/348747784925b90fc7ed42cbe4d7a2440c1dbdda_dublin-1049427_1920.jpg?auto=compress,format"></p>
<p>Forget London, Dublin is the next hotspot for all things tech when travelling out of mainland Europe. The numbers speak
for themselves: More than 1200 startups, 250 global tech companies, and <a href="http://www.ivca.ie/">€300M+ raised in 2015</a>.
Dublin even has its own appointed commissioner for startups in <a href="https://twitter.com/niamhbushnell">Niamh Bushnell</a>,
proving that the Liffeysiders are just as passionate as the big players when it comes to startup innovation.</p>
<p>It bodes well for the tech scene today that the epicentre of innovation is growing out of its traditional North
American home. With the emergence of different cities contributing to startup innovation, the future of the industry is
definitely looking bright.</p>
<p><em>AVERAGE RENT FOR 1 BED-APARTMENT:</em> 1,246.07 €<br>
<em>AVERAGE COST OF A PINT:</em> 5.00 €<br>
<em>AVERAGE DEVELOPER SALARY:</em> 55,654 €</p>
<h3>Berlin</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/76ddfccb25f88b343fbda600324b212055db656d_brandenburg-gate-417890_1920.jpg?auto=compress,format"></p>
<p>The rise of Berlin as a new and exciting tech hub has been on the radar for a few years now, its creative ecosystem
lending an exciting and diverse backdrop to an ever-thriving tech sector. VC funding has yet to reach its peak in
Berlin, but backing is steadily streaming in: €2.1 billion flowed into home-grown initiatives in 2015 according to
<a href="http://www.ey.com/Publication/vwLUAssets/ey-venture-capital-and-start-ups-in-germany-2015/$FILE/ey-venture-capital-and-start-ups-in-germany-2015.pdf">Ernst &
Young</a>.
It’s also recently been ranked the best tech hub to live and work in by international consulting firm <a href="http://www.expertmarket.com/focus/research/top-tech-hubs">Expert
Market</a>.</p>
<p>Berlin relishes the chance to disrupt and shake-up traditional business models, with banking just one of its most recent
targets. <a href="https://number26.eu/">Number26</a>, the bank built for smartphones, is one of many successful startups taking a
fresh look at long-established industries. (And of course, it’s also the home of Zalando Tech!)</p>
<p><em>AVERAGE RENT FOR 1 BED-APARTMENT:</em> 664.09 €<br>
<em>AVERAGE COST OF A PINT:</em> 3.00 €<br>
<em>AVERAGE DEVELOPER SALARY:</em> 51,225 €</p>End-to-End Latency Challenges for Microservices2016-08-15T00:00:00+02:002016-08-15T00:00:00+02:00Dmitry Kolesnikovtag:engineering.zalando.com,2016-08-15:/posts/2016/08/end-to-end-latency-challenges-for-microservices.html<p>Read about Typhoon, our open source project to assess distributed software architecture.</p><p>There is pressure to define a global platform architecture and purify concepts of core business within it. Microservices
is an appropriate design style to achieve this goal – it lets us evolve systems in parallel, make things look uniform,
and implement stable and consistent interfaces across the system. Unfortunately, this architecture style brings
additional complexity and new problems. Network latency is crucial for online businesses with a direct impact on sales.</p>
<p>Latency is an important part of Quality of Service that determines the degree of consumer satisfaction. End-to-end
latency is one of the user-oriented characteristics used for quality assessment of distributed software architecture.
The ultimate goal is the ability to quantitatively evaluate and trade-off the architecture to ensure competitive
end-to-end latency of software solutions. Therefore, we have created <a href="https://github.com/zalando/typhoon">Typhoon</a> - an
open source project for assessing distributed software architectures.</p>
<h3>Why Typhoon was developed</h3>
<p>Typhoon helps us to solve a series of short-term and long-term decision problems. For example, short-term decisions
include the determination of optimal software and infrastructure configuration; long-term decisions concern the
development and extension of data and service architectures, choice of technologies, or runtime environments. Typhoon
helps us control the actual end-to-end latency and specify emergency actions when systems are overloaded or technical
faults occur.</p>
<p>Typhoon is a distributed system stress and load testing tool. It simulates traffic from a test cluster towards the
system under test. Its purpose is to validate system performance and scalability while spawning a huge number of
concurrent sessions. The tool provides an out-of-the-box, cross-platform solution for investigating the protocols and
latencies of microservices.</p>
<p>We evaluated a few existing solutions and tried to match them with our needs (our major requirements are listed
below), but never found the right fit. Therefore, we created our own tool: Typhoon.</p>
<p><strong>Latency.</strong> We are looking at sources of latency through the lenses of infrastructure, protocol, and application. A
deep-dive is required to understand and approximate latencies in each domain. We need to know network delay, round trip
time, a protocol’s handshake latency, time-to-first-byte and time-to-meaningful-response. Typhoon evaluates protocol
overhead by approximating packet metrics and estimates application performance/scalability.</p>
<p><strong>Realtime.</strong> We are looking for a cost-efficient, distributed solution suitable for spawning a huge number of concurrent
sessions. It is important to mitigate any bottlenecks within the tool and ensure the responsiveness of the runtime
environment. Another aspect of this is real-time streaming and analysis of measurements.</p>
<p><strong>Visualization.</strong> The time-series data visualization crisis is well depicted by <a href="https://bost.ocks.org/mike/cubism/intro/#0">Mike
Bostock</a>. Using his proposed visualization technique, cubism.js, improves the readability and analysis of the
latencies reported by the tool. We are looking for a highly adoptable and customizable visualization, preferably based
on D3.</p>
<p><strong>Usability.</strong> Being easy to deploy and configure are mandatory requirements for us. We are looking for a zero-config
solution, scalable up to dozens of individual nodes hosted in a cloud environment. The tool should offer a sophisticated
approach to define workload scenarios. We believe a pure functional language is the best approach to express artificial
behaviour.</p>
<p>Two strong candidates were evaluated: Tsung and Locust. They are widely known in the community as load testing
frameworks. However, they did not meet our needs: detailed latency analysis and customizable visualization are key
features for us. This was the decision point to develop Typhoon, with its focus on latency, visualization, and usability.</p>
<h3>Latency challenge</h3>
<p>The latency challenge has existed since the beginning of distributed computing. Transparent end-to-end communication
involves various technologies and communication principles.</p>
<p><strong>Infrastructure.</strong> Software architectural decisions should account for the complexity of the underlying network
infrastructure. The Internet is not a single network, it is a series of heterogeneous systems composed of backbone
networks, infrastructures managed by service providers, and various edge/access networks.</p>
<p>The infrastructure appears as a system that makes peers wait. This waiting time consists of network and transmission
delays: The network delay is the time from when message delivery is requested until the message begins to arrive at the
remote end; the transmission delay is the time from when the message begins to arrive until delivery is completed.</p>
<p>Short interactive scenarios such as client-service interaction concern network delay. Typhoon uses <em>round-trip-time</em> to
estimate network delay.</p>
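<p>One common way to turn individual round-trip-time samples into a stable estimate of network delay is exponential smoothing, as TCP does for its SRTT (RFC 6298). The TypeScript sketch below illustrates the idea and is not taken from Typhoon:</p>

```typescript
// Smoothed round-trip-time estimator in the style of TCP's SRTT (RFC 6298):
// each new sample is blended into the running estimate with weight alpha.
class RttEstimator {
  private srtt: number | null = null;
  private alpha: number;

  constructor(alpha: number = 0.125) {
    this.alpha = alpha;
  }

  // Blend a new RTT sample (in ms) into the estimate and return it.
  observe(sampleMs: number): number {
    this.srtt = this.srtt === null
      ? sampleMs  // the first sample seeds the estimate
      : (1 - this.alpha) * this.srtt + this.alpha * sampleMs;
    return this.srtt;
  }
}
```

<p>With the default alpha of 1/8, a single outlier moves the estimate only slightly, which is exactly what you want when sampling a noisy network.</p>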
<p>Long interactive scenarios involve data transfer, streaming, etc. This interaction concerns network transmission. We are
using <em>packet metrics</em> to approximate latency experienced by applications due to infrastructure.</p>
<p><strong>Protocols.</strong> The rise in popularity of microservices architecture introduces new challenges to deal with, such as
network latency and an overhead of communication protocols. Protocol internals have a significant impact on end-to-end
latency in a heterogeneous network environment when communication is constrained by network delay, packet loss, and the
capacity of network equipment.</p>
<p>Connection-oriented protocols require a handshake procedure before data transmission. Typhoon measures the latency
required to establish TCP and TLS connections and approximates a <em>packet rate</em> that shows the efficiency of the
techniques used by the protocol to provide value-added services (e.g. reliable communication, data integrity, overflow
handling, etc.).</p>
<p>The time to display a response is the most valuable metric from a consumer perspective; it is influenced by
infrastructure, protocol, and the application environment. Typhoon provides an analysis of application-level protocol
behavior. One of these metrics is <em>time-to-first-byte</em>. This is a concrete, consumer-oriented, easily measurable
indicator, defined independently of the underlying solutions or technologies. It represents initial confirmation that
the remote host is responding and that the client application can proceed to rendering. Secondly,
<em>time-to-meaningful-response</em> defines the network delay required to deliver the application payload.</p>
<p><strong>Microservice.</strong> The latency analysis of applications requires techniques to investigate component behavior using
information gathered as the service sustains the load. A series of technology decision problems arises concerning both
short-term and long-term arrangements. The short-term decisions include aspects of software configuration and capacity
provisioning. The long-term decisions focus on technological and architectural requirements, the system’s ability to handle a
certain amount of work, and its potential to be enlarged to accommodate growing traffic requirements, etc.</p>
<h3>Erlang inside</h3>
<p>Massive scalability (the ability to spawn a huge number of concurrent user sessions) and real-time behaviour (accuracy
of measurements and real-time data processing) are the two major requirements that led us to select Erlang as our runtime
environment. The language has a proven track record in similar applications.</p>
<p>Incremental scalability and decentralization are key principles used by us to define the architecture. Typhoon is a
peer-to-peer system, using consistent hashing to assemble and orchestrate the cluster. Erlang distribution and
third-party libraries provide a highly available, eventually consistent actor-management layer. It helps the
system deal with possible network failures and provides high availability for synthetic load generation and telemetry
collection. The design employs an optimistic data-replication technique.</p>
<p>Consistent hashing forms a ring topology from cluster nodes. Each node claims ownership of virtual shards. Each shard is
responsible for coordinating workload scenarios based on its identity, spawning the load session across cluster nodes,
aggregating telemetry, etc.</p>
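<p>To illustrate the idea, here is a generic consistent-hash ring with virtual shards, sketched in TypeScript (not Typhoon's actual Erlang implementation): each node claims several points on a hash ring, and a key is owned by the first shard clockwise from the key's own hash.</p>

```typescript
// FNV-1a 32-bit hash: deterministic and good enough for a sketch.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class Ring {
  // Sorted points on the ring: [shardHash, owningNode].
  private points: Array<[number, string]> = [];

  constructor(nodes: string[], shardsPerNode: number = 64) {
    for (const node of nodes) {
      for (let i = 0; i < shardsPerNode; i++) {
        // each virtual shard is a separate point claimed by the node
        this.points.push([fnv1a(`${node}#${i}`), node]);
      }
    }
    this.points.sort((a, b) => a[0] - b[0]);
  }

  // A key is owned by the first shard at or after its hash on the ring.
  lookup(key: string): string {
    const h = fnv1a(key);
    for (const [point, node] of this.points) {
      if (point >= h) return node;
    }
    return this.points[0][1]; // wrap around the ring
  }
}
```

<p>Because each node owns many scattered points, adding or removing a node remaps only the keys falling in its shards, leaving the rest of the cluster's assignments untouched.</p>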
<h3>Workload definition language</h3>
<p>The workload definition language is one of the challenges we’re looking to solve. An expressive language is required to
cover the variety of traffic generation use-cases. We believe a pure functional language is the best approach to express
artificial behaviour. It gives us rich techniques to hide the complexity of Typhoon from developers using monads as
abstractions. <a href="https://wiki.haskell.org/Monads_as_computation">Think about monads as computation</a>. The workload scenario
is defined as a chain of network operations wrapped by IO-monad.</p>
<p>We have decided to use Erlang-flavored syntax for scenario definition in the first releases. The decision was mainly
driven by time-to-market. Typhoon is built on an Erlang/OTP runtime. Parsing, compilation, and debugging of workload
scenarios are provided by the runtime -- scenarios are valid Erlang code. However, the development of workload scenarios
does not necessarily require an Erlang development environment installed on your computer. Typhoon provides a REST API
to lint and compile the scenario code. The scenario development requires a basic understanding of functional programming
concepts and knowledge of Erlang syntax. <a href="http://learnyousomeerlang.com/starting-out-for-real">Erlang language
tutorials</a>, <a href="http://learnyousomeerlang.com/modules#what-are-modules">Erlang module
tutorials</a>, and <a href="http://erlang.org/doc/reference_manual/expressions.html">Erlang
expressions</a> can give you enhanced training on the subject.</p>
<p>A simple workload scenario in pure-functional notation can be seen below. It uses the Zalando Shop API to demonstrate
Typhoon’s capabilities.</p>
<div class="highlight"><pre><span></span><code><span class="o">%%</span>
<span class="o">-</span><span class="n">module</span><span class="p">(</span><span class="n">skeleton</span><span class="p">)</span><span class="o">.</span>
<span class="o">-</span><span class="n">compile</span><span class="p">({</span><span class="n">parse_transform</span><span class="p">,</span><span class="w"> </span><span class="n">monad</span><span class="p">})</span><span class="o">.</span>
<span class="n">title</span><span class="p">()</span><span class="w"> </span><span class="o">-></span>
<span class="w"> </span><span class="s2">"Skeleton Workload Scenario"</span><span class="o">.</span>
<span class="n">run</span><span class="p">(</span><span class="n">_Config</span><span class="p">)</span><span class="w"> </span><span class="o">-></span>
<span class="w"> </span><span class="n">do</span><span class="p">([</span><span class="s1">'Mio'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="n">sequence</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">requests</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">IO</span><span class="o">-</span><span class="n">monadic</span><span class="w"> </span><span class="n">computation</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">request</span><span class="p">(),</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="n">HTTP</span><span class="w"> </span><span class="n">request</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">discard</span><span class="w"> </span><span class="n">results</span>
<span class="w"> </span><span class="n">A</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">article</span><span class="p">(),</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="n">HTTP</span><span class="w"> </span><span class="n">request</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">assign</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">variable</span><span class="w"> </span><span class="n">A</span>
<span class="w"> </span><span class="k">return</span><span class="p">(</span><span class="n">A</span><span class="p">)</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="n">just</span><span class="w"> </span><span class="n">takes</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">A</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">puts</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">IO</span><span class="w"> </span><span class="n">context</span><span class="o">.</span>
<span class="w"> </span><span class="p">])</span><span class="o">.</span>
<span class="n">request</span><span class="p">()</span><span class="w"> </span><span class="o">-></span>
<span class="w"> </span><span class="n">scenario</span><span class="p">:</span><span class="n">request</span><span class="p">(</span>
<span class="w"> </span><span class="n">scenario</span><span class="p">:</span><span class="n">header</span><span class="p">(</span><span class="s2">"Accept-Language"</span><span class="p">,</span><span class="w"> </span><span class="s2">"de-DE"</span><span class="p">,</span>
<span class="w"> </span><span class="n">scenario</span><span class="p">:</span><span class="n">url</span><span class="p">(</span><span class="s2">"https://api.zalando.com/"</span><span class="p">,</span>
<span class="w"> </span><span class="n">scenario</span><span class="p">:</span><span class="n">new</span><span class="p">(</span><span class="s2">"urn:http:zalando:api"</span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">)</span><span class="o">.</span>
</code></pre></div>
<h3>To be continued</h3>
<p>The <a href="https://github.com/zalando/typhoon/releases">latest Typhoon release</a> provides a solid foundation for latency
analysis. You can measure network delay, round-trip time, protocol handshake times, time-to-first-byte and
time-to-meaningful-response. It also provides scalable traffic generation in a cloud environment.</p>
<p>Typhoon requires improvements in data analysis and visualization. We need to develop new metrics (e.g. capacity
estimation, active user estimation) and enhance reports to address aspects of executive level reporting and
dashboarding. The workload definition language is another pain point: we continue to use Erlang-flavored syntax as
our core language, but support for other widely adopted functional languages (e.g. Scala, JavaScript) is still required.</p>
<p>If you have any questions about Typhoon, please contact me over <a href="mailto:dmitry.kolesnikov@zalando.fi">email</a> or <a href="https://github.com/zalando/typhoon/issues">raise
an issue</a> on the project.</p>A closer look at the ClassNames npm package2016-08-11T00:00:00+02:002016-08-11T00:00:00+02:00Andra Joy Lallytag:engineering.zalando.com,2016-08-11:/posts/2016/08/a-closer-look-at-the-classnames-npm-package.html<p>Looking at a useful package that all teams using React should be familiar with.</p><p>The newest tool sweeping through the <a href="https://facebook.github.io/react/">React</a> teams here at Zalando Tech is the
<a href="https://github.com/JedWatson/classnames">classNames npm package</a>. The concept is simple and deletes countless lines of
unneeded code, on top of increasing readability.</p>
<p>The first time I heard about this useful package was at an All Hands meeting, where tech teams can share what they
are doing with their product. It allows engineers to share new knowledge in a shorter period of time. The ideas or tools
shared do not have to be profound, but they do have to improve the development process of our teams.</p>
<p>One of React’s features is to write a conditional that will return one classname over another or no class at all. Below
is an example:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">render</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">isActive</span><span class="p">,</span>
<span class="w"> </span><span class="n">isHidden</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span>
<span class="w">        </span><span class="k">const</span><span class="w"> </span><span class="n">activeClass</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isActive</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'isActive'</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w">        </span><span class="k">const</span><span class="w"> </span><span class="n">hiddenClass</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isHidden</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'isHidden'</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w">        </span><span class="k">const</span><span class="w"> </span><span class="n">fadeInClass</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="sb">`fade-in-${index}`</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="p">(</span><span class="n">etc</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">conditionals</span><span class="w"> </span><span class="err">…</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">TEXT</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">render</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">isActive</span><span class="p">,</span>
<span class="w">        </span><span class="n">isHidden</span><span class="p">,</span>
<span class="w">        </span><span class="n">index</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">props</span>
<span class="w">        </span><span class="k">const</span><span class="w"> </span><span class="n">containerClasses</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">classNames</span><span class="p">(</span><span class="s1">'other-class-name'</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">isHidden</span><span class="p">,</span><span class="w"> </span><span class="n">isActive</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="sb">`fade-in-${index}`</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s1">''</span><span class="w"> </span><span class="cm">/* etc. class conditionals … */</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span>
<span class="n">TEXT</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>I’m also using the new <a href="https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Template_literals">string
interpolation</a> (template literals) feature of modern
JavaScript in the code above. If you are not using this new feature, you should start today!</p>
<p>Instead of having endless conditionals on the top of your React render method, you have it nicely cleaned up to one line
that is easy to read and reason through.</p>
<p>While this package is not the most profound invention in the React community, it is incredibly helpful and deletes
unneeded lines of code. The increased readability between team members is also a big plus. If you are not using it at
the moment, I encourage you to look at the documentation and add it to your current project.</p>
<p>Want to know how I added it to mine? Contact me at <a href="mailto:andra.joy.lally@zalando.de">andra.joy.lally@zalando.de</a> with your questions.</p>Can interviewing people make you a better conversationalist?2016-08-10T00:00:00+02:002016-08-10T00:00:00+02:00Christian Kaspertag:engineering.zalando.com,2016-08-10:/posts/2016/08/can-interviewing-people-can-make-you-a-better-conversationalist.html<p>If your job involves interacting with people to get to know them, you need to read this.</p><p>As a User Researcher at Zalando Tech, one of my goals is to create a deeper understanding of human needs, behaviour, and
decision making within the company. To create this empathetic mindset, it's crucial to master conversations with our
respondents.</p>
<p>This post isn’t merely about interviewing users. It focuses rather on <strong>five principles</strong> that I have learned in my
years at Zalando as a User Researcher that can increase the inspiration and empathy in our daily conversations; with
friends and family, at work, or with the people you’ve just met.</p>
<p>We implement user research as a way to provide our customers with valuable content and useful features. For example,
during one of our projects we spoke to frequent sneaker shoppers to help inspire future sneaker campaigns.</p>
<p>The challenge of these conversations was to identify reasons why people buy sneakers. It wasn’t enough to just scratch
the surface and find out about the obvious pros of a pair of sneakers. We had to dig deeper to uncover and understand
the different, partly unconscious triggers that led to each sneaker purchase.</p>
<p>By using the principles outlined in this post, asking the right questions, and observing interactions between users and
our services, we were able to identify these triggers. I’ll explore the means we used to reach this conclusion below.</p>
<p>Let's start.</p>
<h3>Lesson 1: Use questions as tools to build a connection</h3>
<p>Asking the right questions, at the right time, can reveal inspiring and engaging stories. We also use questions as a
tool to dig into a more subtle level of the human mind. You want to ask questions that reveal emotions and get into a
person’s thought patterns.</p>
<p>Most of us have experienced conversations like the following:</p>
<p><em>Person 1:</em> What do you like to do in your free time?
<em>Person 2:</em> I like meeting with friends, going to concerts, and I’m actually into running. How about you?
<em>Person 1:</em> Almost the same for me, except I don’t really like running – I prefer to swim.
<em>Person 2:</em> So, you’re also sporty...
<em>Person 1:</em> Yes, sort of.</p>
<p>Sounds incredibly boring, right? To bring the conversation to a deeper level, try asking questions that help create a
meaningful connection. By continuing the conversation with a question like “What do you like about running?”, a better
understanding can be established.</p>
<p>This effect is the reason why user researchers are constantly asking “Why” over and over again.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8fb786b6cc647cdcba89064e958707efef227c0a_img_0045.jpg?auto=compress,format"></p>
<h3>Lesson 2: Ask thought-provoking questions</h3>
<p>It’s best to ask thought-provoking questions instead of obvious ones. These questions could ultimately lead to stories
and turn conversations into engaging discussions:</p>
<ul>
<li>Try using "How old do you feel?" as an alternative or additional question to "How old are you?"</li>
<li>Ask people "What do you want to become?" instead of "What do you do for a living?"</li>
<li>Probe deeper with "What was the highlight of your trip to Mexico?" rather than "How was your trip to Mexico?"</li>
</ul>
<p>Questions like these make conversations more appealing and worth the investment for you and your conversation partner.</p>
<h3>Lesson 3: Everyone is an expert</h3>
<p>When conducting interviews to gain insights, you should put aside your own biases and experience the stories of the
people you are speaking to.</p>
<p>It could be that your interviewee has a controversial opinion or a point of view which you don’t agree with, however
it’s still important to switch off judgemental thinking and keep your ears (and mind) open to create an understanding of
their story.</p>
<p>Don’t begin a conversation expecting to know everything; everyone has his or her own experience and should be classified
as an expert in that field. Make sure to treat them that way.</p>
<h3>Lesson 4: Be mindful</h3>
<p>You should give your conversation partner your full attention when speaking with them. Avoid multitasking, don’t be
distracted by new Snapchat filters, don’t check your emails… just be with this person. Concentrate on the conversation
without any distractions.</p>
<p>For us, this means that during user interviews the interviewer fully concentrates on the conversation, while it is their
partner’s job to observe and take notes.</p>
<p>Often, we drift away with our thoughts when we actually need to be listening to the person in front of us. This could be
for a variety of reasons: The conversation has triggered other thoughts or we are formulating how to respond to the
person we’re speaking to. Being mindful means focusing on the task at hand and not being distracted by your thoughts.</p>
<h3>Lesson 5: Be a good listener</h3>
<p>This lesson is one of the most obvious, but also the most important thing when it comes to conversations. Not only is it
important that we listen to keep the conversation going, but more so to create an understanding in order to build a
rapport with the person we’re speaking to.</p>
<p>Listen and notice what piques your curiosity, what you want to explore more and ask questions about it.</p>
<p>A good listener is not only listening; they are also connecting the dots to create a full picture and asking the right
questions (see lessons 1 and 2) at the right time, without interrupting the person.</p>
<p>Listening also allows your conversation partner to offer more insights because you’re showing interest in what they’re
saying. It’s beneficial for all involved.</p>
<h3>Conclusion</h3>
<p>The lessons above help us foster empathy for our customers and to create products that are useful and serve customer
needs. Without these insights, we run the risk that our products and services wouldn’t serve a need and wouldn’t be
useful to them.</p>
<p>If your job involves interacting with people in order to get to know them (such as interviewing customers, applicants,
or talking to clients), these principles can help to reveal meaningful information that in turn creates a whole new
picture of the person you are talking to.</p>
<p>Let your conversations reflect how you want to be perceived.</p>Welcome to the family, Zalando AdTech Lab Hamburg!2016-08-09T00:00:00+02:002016-08-09T00:00:00+02:00Tobias Schlottketag:engineering.zalando.com,2016-08-09:/posts/2016/08/zalando-adtech-lab-hamburg.html<p>We're introducing you to another member of the Zalando family: Zalando AdTech Lab Hamburg.</p><p>Zalando Tech is growing, with our tech hubs spread throughout Europe: Besides our headquarters in Berlin, we’re in
<a href="https://tech.zalando.de/locations/#dortmund">Dortmund</a>,
<a href="https://tech.zalando.de/blog/working-at-zalando-dublin/">Dublin</a>, and
<a href="https://tech.zalando.de/blog/hello-helsinki/">Helsinki</a>, and have smaller Tech teams located in our fulfillment centers
in Erfurt and Mönchengladbach. But did you know that we’re also in Hamburg? We’d like to introduce you to another member
of the Zalando family: Zalando AdTech Lab Hamburg.</p>
<p>How does our latest hub fit into Zalando’s <a href="https://tech.zalando.de/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">platform
strategy</a>? Real
time advertising is a developing trend that allows brands to reach customers with a totally new level of precision. In
this field, Zalando’s interests aren’t purely advert driven: We’re also moving ahead as a service provider for partner
brands with our newly launched business unit, <a href="https://mediasolutions.zalando.com/">Zalando Media Solutions</a>. Our
Hamburg-based office comes, thanks to the longstanding experience of the previously acquired AdTech specialist Metrigo,
with a broad portfolio of products, tools, and experts in this area.</p>
<p>The Hamburg hub has 20 employees involved in product management, engineering, and data science to support the Zalando
advertising platform. This includes programmatic buying, attribution, and reporting.</p>
<p>The hub also has a low-level, high-throughput stack that is very data heavy. They’re programming in Java, Scala, Ruby,
and Python along with data analysis in Apache Spark and other technologies. A framework for ad hoc data-aggregation has
been developed based on Jupyter.</p>
<p>Apart from that, we’ve made machine learning one of our key pillars. Our team is currently evaluating deep learning as
an alternative to classical approaches. We’re investigating customer journeys and using machine learning, which allows
us to find the best matching consumer audiences for our brand partners. We’re also actively using recurrent neural
networks and deep learning via Google’s <a href="https://www.tensorflow.org/">TensorFlow</a> to understand user behaviour, in
connection with the resulting customer clusters on the Zalando website. Real Time Advertising plays a key role in our
operations as well.</p>
<p>Zalando’s investment in ad-technology was a strategic move that has seen the company under Zalando’s wing for almost a
year now, with the office currently growing. To support this growth, we’re <a href="https://jobs.zalando.de/en/?location=Hamburg&search=">actively
recruiting</a> frontend and backend developers, data scientists, and
managers from across the industry, as well as from academia.</p>
<p>Watch this space for some amazing developments.</p>Talking to Techsperts: The Price of Employee Freedom2016-08-05T00:00:00+02:002016-08-05T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-08-05:/posts/2016/08/talking-to-techsperts.html<p>We sat down with this month's Techsperts to expand on ideas about autonomy in the workplace.</p><p>You might be familiar with our ongoing <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/232867086/">Zalando Techspert
Series</a>, which collects experts from respected
companies to talk all things tech, from culture to customers.</p>
<p>This month’s theme was about <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/events/232867086/">giving employees the right amount of
freedom</a>. We had Gen Sadakane, Creative Director
and Co-Founder at EyeEm on board together with Mark Ralea, Managing Director at Glossybox. They were joined by Zalando
Tech’s Stacia Carr, Head of People (Engineering) to chat, debate, and answer questions from the public.</p>
<p>Did you miss the exciting discussion? We sat down with our Techsperts to expand on ideas about autonomy in the workplace
and what benefits (or hindrances) this offers to employees of companies large and small.</p>
<p><em>Zalando: In the context of giving employees freedom, how do you scale work culture to ensure it works for everybody?</em></p>
<p><em>Stacia Carr:</em> Scaling freedom in an organization actually requires more guidance than anything. Creating a really
strong feedback culture is part of that, to ensure employees understand whether their efforts are aligned with the
company’s purpose and its business needs. This helps to provide the right kinds of checks and balances along the way –
it’s not just raw feedback, but it's making sure that the right people have the opportunity to check in at the right
time.</p>
<p><em>Gen Sadakane:</em> EyeEm is obviously heavily involved in the photography community and marketplace, so for us it’s
incredibly important that people take a keen interest in our core activity. For us, this is the connection to our
company culture. Having this basic, yet intrinsic interest serves as the foundation for future growth.</p>
<p><em>Mark Ralea:</em> I think the crucial ingredient is to have a common goal, especially as each individual has a goal or
purpose for themselves. This, coupled with what both Stacia and Gen mentioned, are all relevant points.</p>
<p><em>Zalando: What link do you make, if any, to freedom and creativity in the workplace?</em></p>
<p><em>Mark Ralea:</em> At Glossybox, we have a product related to passion and creativity, meaning that an absence of the link to
freedom and creativity would result in unhappy customers. Without an explicit link between freedom and creativity, we
just wouldn’t be competitive in the market.</p>
<p><em>Gen Sadakane:</em> I believe it depends on employee levels and development. This could range from differences in age to
experience. I have an agency background, meaning we were really drilled to work within strict constraints. I don’t think
this is necessarily the answer, however I think there are people that need to be “schooled” in this way. People can get
lost within an environment that offers too much freedom – some need guidance, as not everyone considers themselves to be
creative.</p>
<p><em>Stacia Carr:</em> Freedom and creativity are intrinsically intertwined, however unlimited freedom can hamper creativity.
Freedom without constraint can result in people not producing anything, but freedom within a particular context where
there are hard limitations by virtue of an organization that has to generate revenue, and has shareholders to answer to,
is super powerful. It gives people the space to think out-of-the-box and for themselves.</p>
<p><em>Zalando: Do employees need to prove they can be given freedom? Is this kind of autonomy earned?</em></p>
<p><em>Mark Ralea:</em> I think from the absolute beginning you need to give a lot of trust to your employees so that they’re
equipped with the right amount of freedom to do their best work. Without that initial gift of trust, you’ll never be in
a position to create the environment that you want.</p>
<p><em>Stacia Carr:</em> I think I have a different point of view here. Particularly at scale, in a large organization where a
high bar has been set, people need to feel that trust is earned, which is why feedback is so important. It’s not about
having a difficult environment – it’s about giving people the opportunity to demonstrate what they’re capable of and to
see whether their output is right for that environment. This develops trust.</p>
<p>When you have projects that have a lot of risk associated with them, you have to be really honest about how much blanket
trust you can give. This is particularly the case when you’re dealing with junior employees who haven’t lived through
the failures of others. Yes, we’re open to taking risks, but there has to be an assessment of the potential for damage
and how much trust you can give based on that.</p>
<p><em>Gen Sadakane:</em> It’s important to have clear goals and a company vision – if you’re equipped with these things, some
people can figure out how to achieve what’s been indicated. Some employees will do it in a 40 hour week, some will spend
60 hours with passion. In the end, it’s all about results.</p>
<p>--</p>
<p>Keep an eye on our <a href="https://www.meetup.com/Zalando-Tech-Events-Berlin/">Zalando Tech Meetup page</a> to be notified of the
next Techspert Panel, or follow our updates on Twitter <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>Zalando Dortmund's RuhrJS Journal2016-08-03T00:00:00+02:002016-08-03T00:00:00+02:00Jan Stroppeltag:engineering.zalando.com,2016-08-03:/posts/2016/08/zalando-dortmunds-ruhrjs-journal.html<p>A summary of the most interesting talks we attended at RuhrJS, plus our learnings and takeaways.</p><p>The Ruhr Area is the biggest metropolitan area in Germany, with nearly 5.4 million inhabitants and of course, home to
Zalando Tech Dortmund. As the former industrial heart of Germany, the region has undergone a remarkable structural shift
towards services and information technology in recent years.</p>
<p>To reflect the growing impact of the Ruhr Area in the IT sector and to strengthen the frontend community,
<a href="http://ruhrjs.de/">RuhrJS</a> was founded: an international JavaScript conference that took place from 2-3 July.</p>
<p>We wanted to summarize the most interesting talks we attended and give you an idea of our learnings and takeaways.</p>
<h3><a href="http://magixmobx.surge.sh/#/?_k=y3numa">Magic MobX - Become a reactive wizard in 30 minutes</a> (Michel Weststrate)</h3>
<p>When you use <a href="https://facebook.github.io/react/">ReactJS</a> as a rendering library together with
<a href="http://redux.js.org/">Redux</a> for state management, do you think you have found the perfect technology stack for the
next few years? You are a JavaScript developer, you should know better!</p>
<p>With <a href="https://mobxjs.github.io/mobx/">MobX</a>, developed by <a href="https://twitter.com/mweststrate">Michel Weststrate</a>, an
alternative application state management library has emerged that’s worth looking into.</p>
<p>MobX makes your application state fully reactive, so there is no need for a dispatcher or reducers to take care of your
state, nor some complicated change detection algorithms.</p>
<p>With MobX the application state can be made observable, using the corresponding decorator:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="nv">@observable</span><span class="w"> </span><span class="n">todos</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">[]</span><span class="p">;</span>
<span class="w"> </span><span class="nv">@computed</span><span class="w"> </span><span class="k">get</span><span class="w"> </span><span class="n">unfinishedTodoCount</span><span class="p">()</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">todos</span><span class="p">.</span><span class="k">filter</span><span class="p">(</span><span class="n">todo</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="err">!</span><span class="n">todo</span><span class="p">.</span><span class="n">finished</span><span class="p">).</span><span class="n">length</span><span class="p">;</span>
<span class="w"> </span><span class="err">}</span>
<span class="p">...</span>
<span class="err">}</span>
</code></pre></div>
<p>The <em>@computed</em> decorator ensures that <em>unfinishedTodoCount</em> updates whenever <em>todos</em> is modified.</p>
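<p>The dependency-tracking idea can be sketched in plain JavaScript without MobX at all. The following is an illustration of the concept only, not how MobX is actually implemented:</p>

```javascript
// Plain-JS sketch of the idea behind @observable/@computed: a derived value
// is recomputed whenever the observable data it reads is replaced.
function observable(initial) {
  let value = initial;
  const listeners = [];
  return {
    get: () => value,
    set(next) { value = next; listeners.forEach((fn) => fn()); },
    subscribe: (fn) => listeners.push(fn),
  };
}

const todos = observable([{ finished: true }, { finished: false }]);

let unfinishedTodoCount = 0;
const recompute = () => {
  unfinishedTodoCount = todos.get().filter((todo) => !todo.finished).length;
};
todos.subscribe(recompute); // like @computed: re-derive on every change
recompute();
// unfinishedTodoCount === 1

todos.set([...todos.get(), { finished: false }]);
// unfinishedTodoCount === 2
```

MobX does this transparently by recording which observables a computed value reads, so no manual subscription is needed.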
<p>Everything that should be triggered by state changes, like updating the UI, is handled by so-called <strong>Reactions</strong>,
annotated with the <em>@observer</em> decorator. You can turn your React components into reactive components with the
following:</p>
<div class="highlight"><pre><span></span><code><span class="err">@</span><span class="n">observer</span>
<span class="n">Class</span><span class="w"> </span><span class="n">TodoListView</span><span class="w"> </span><span class="k">extends</span><span class="w"> </span><span class="n">Component</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">render</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">{</span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">todoList</span><span class="o">.</span><span class="n">todos</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">todo</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="p">)}</span>
<span class="w"> </span><span class="n">Tasks</span><span class="w"> </span><span class="n">left</span><span class="p">:</span>
<span class="p">{</span><span class="n">this</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">todoList</span><span class="o">.</span><span class="n">unfinishedTodoCount</span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Application state will be changed using <strong>Actions</strong> (similar to Redux or Flux in general), but MobX itself isn’t picky
about how this should be handled.</p>
<p>If you’re more interested in the pros and cons of using MobX or Redux, have a look into <a href="https://discuss.reactjs.org/t/redux-or-mobservable-what-to-choose/2453">this linked
discussion</a>.</p>
<h3><a href="http://slides.com/francescostrazzullo/sacrificial-architecture-in-modern-web-development-ruhrjs-2016#/">Sacrificial Architecture in Modern Web Development</a> (Francesco Strazzullo)</h3>
<p><a href="https://twitter.com/TheStrazz86">Francesco</a> opened up our minds about what developers very rarely consider when
planning a new web application; notably that your architecture, frameworks or requirements will change during the
application's lifetime.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a55fe80e95a82e89e05b5e66097e1bb810b50499_sacrificial.jpg?auto=compress,format"></p>
<p><a href="http://martinfowler.com/bliki/SacrificialArchitecture.html">Sacrificial Architecture</a>, defined by <a href="https://twitter.com/martinfowler">Martin
Fowler</a>, means accepting now that in the near future you’ll have to throw away what
you are currently building, partly or as a whole.</p>
<p>From the beginning, you should design your architecture to be flexible enough to easily replace deprecated components.
This can be achieved with proper modularity and a strong separation of logic, for example.</p>
<p>If you are using frameworks like Angular, design your way out of the framework by keeping framework-specific code
out of your business logic (one example is using the standard <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API">Fetch
API</a> instead of the $http module).</p>
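As a sketch of that idea, business logic can depend on a thin HTTP module with the transport injected, rather than on a framework service directly. The <em>makeApiClient</em> name and the fake transport below are illustrative assumptions, not code from the talk:

```javascript
// Framework-agnostic HTTP module: the transport (fetch) is injected,
// so business logic never touches framework-specific APIs like $http.
function makeApiClient(fetchImpl, baseUrl) {
  return {
    getJson: (path) =>
      fetchImpl(`${baseUrl}${path}`).then((res) => {
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        return res.json();
      }),
  };
}

// Because the transport is injected, swapping frameworks (or testing) is trivial:
const fakeFetch = (url) =>
  Promise.resolve({ ok: true, json: () => Promise.resolve({ url, todos: [] }) });

const api = makeApiClient(fakeFetch, "https://example.org");
api.getJson("/todos").then((data) => console.log(data.todos.length)); // logs 0
```

Throwing away the framework later then means replacing one thin module instead of touching every feature.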
<p>Additionally, pay attention when separating business logic from your UI components. Use events to dispatch the changes
and take care of them outside of your UI components instead of using business logic inside of them.</p>
<p>I think every developer knows that the code they write has a limited durability, but having never written code with this
in mind, the talk was something of an eye opener.</p>
<h3><a href="http://pascalprecht.github.io/slides/angular-2-change-detection-explained/#/">Angular 2 Change Detection Explained</a> (Pascal Precht)</h3>
<p>We were particularly curious about this talk since our Team started to develop an Angular 2 application some months ago
(which is in production now!) and we’ve already gained some experience working with the framework.</p>
<p>To understand the Angular 2 change detection it is important to understand when, how and why change detection is
triggered. <a href="https://twitter.com/pascalprecht">Pascal Precht</a> pointed out that change detection has to happen whenever
the state of your application is mutated. The cause of those mutations is always some asynchronous event, for example a
user input or HTTP requests.</p>
<p>So how does Angular 2 know about such things? The answer is <a href="https://github.com/angular/zone.js/">Zones</a>. Pascal managed
to explain the concept of Zones (a language feature in Dart which was ported to JavaScript) and the <a href="https://www.youtube.com/watch?v=8aGhZQkoFbQ">event
loop</a> on the fly during his talk which was very enlightening. Furthermore,
he described some other core concepts like the <a href="https://vsavkin.com/the-core-concepts-of-angular-2-c3d6cbe04d04#.q6z6df8sr">component
tree</a> and pointed out the usefulness of the
concept of <a href="http://victorsavkin.com/post/133936129316/angular-immutability-and-encapsulation">immutability</a> to improve
performance during change detection. It was also interesting to see how some of the shown concepts converge with topics
presented in another very interesting talk by <a href="https://twitter.com/mxstbr">Max Stoiber</a> about Scaling React
applications.</p>
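<p>The immutability point can be sketched without any framework: if inputs are immutable, an unchanged reference is proof that a whole subtree can be skipped. The names below are illustrative, not Angular internals:</p>

```javascript
// Why immutability speeds up change detection: with immutable data,
// a strict reference check is enough to decide whether to re-render.
let renderCount = 0;

function checkAndRender(prevInput, nextInput) {
  if (prevInput === nextInput) return prevInput; // reference unchanged: skip
  renderCount += 1; // pretend we re-rendered the subtree
  return nextInput;
}

const list = Object.freeze({ items: ["a", "b"] });
let current = checkAndRender(undefined, list); // first render
current = checkAndRender(current, list);       // same reference: skipped
current = checkAndRender(current, { items: list.items.concat("c") }); // new object: re-render

// renderCount is 2: the unchanged pass cost only a reference comparison.
```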
<h3><a href="http://fritzvd.com/talks/nes-tools/#1">On NES Development</a> (Fritz van Deventer)</h3>
<p>Initially, a talk about NES development, only slightly related to JavaScript, did not match our criteria for “must see”
presentations. However, <a href="https://twitter.com/fritzvd">Fritz van Deventer</a> made a point of “doing new things” and
widening your horizon, and the slides alone, partially shown on the NES, were worth it.</p>
<p>Furthermore, Fritz talked about how constraints are a beautiful thing, fostering innovation and creativity (think
Twitter, think programming), why removing abstractions is a good idea (you actually understand things), and why “not
invented here” is a dumb response most of the time.</p>
<p>You can check out the mentioned <a href="https://www.npmjs.com/search?q=nesly">npm packages</a> if you are interested in more
details.</p>
<p>From our point of view, the RuhrJS was a worthwhile conference for us Dortmund folk. Every single talk had its right to
exist, and the variety of topics was huge and well composed. A big thanks from our side goes to the organizer of the
RuhrJS, <a href="https://9elements.com/">9Elements</a> and <a href="https://twitter.com/Maggysche">Madeleine Neumann</a> in particular. We are
looking forward to subsequent conferences and are very excited about what’s to come for next year.</p>An Obsession with Design Patterns: Redux2016-08-02T00:00:00+02:002016-08-02T00:00:00+02:00Andra Joy Lallytag:engineering.zalando.com,2016-08-02:/posts/2016/08/design-patterns-redux.html<p>Our deep dive into design patterns continues with Redux, the State Tree, and the Connect Method.</p><p>My most recent obsession in the coding world has been design patterns. I love patterns!</p>
<p>In my free time I’ve started studying design patterns and algorithms, since I first honed my craft attending a web
developer bootcamp as opposed to getting a proper Computer Science degree. Feeling that my knowledge was lacking, I dove
in head first. I would like to take the time to share some of that exploration with you.</p>
<p>I wanted to give a brief explanation of what <a href="https://github.com/reactjs/redux">Redux</a> is, but more importantly, discuss
the design patterns it uses to make our lives as developers easier. In its simplest sense Redux is a way to organize
one's data on the frontend. It has strict guidelines of how data can move or flow through a project, which is known as
<a href="http://redux.js.org/docs/basics/DataFlow.html">unidirectional data flow</a>. Below is an image of how the data would be
allowed to flow in a Redux application.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ff54323021b8177c0c6cf0a03bfbd1edcc08c276_redux-flow.png?auto=compress,format"></p>
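<p>A minimal, dependency-free sketch of this flow (illustrative names, not the real Redux implementation): the view dispatches an action, a reducer computes the next state, and subscribers re-render from the store:</p>

```javascript
// Unidirectional data flow in miniature:
// action -> dispatch -> reducer -> new state -> notified subscribers.
function createStore(reducer, initialState) {
  let state = initialState;
  const subscribers = [];
  return {
    getState: () => state,
    dispatch: (action) => {
      state = reducer(state, action);
      subscribers.forEach((fn) => fn());
    },
    subscribe: (fn) => subscribers.push(fn),
  };
}

const reducer = (state, action) =>
  action.type === "ADD_TODO" ? { todos: state.todos.concat(action.text) } : state;

const store = createStore(reducer, { todos: [] });
const rendered = [];
store.subscribe(() => rendered.push(store.getState().todos.length));

store.dispatch({ type: "ADD_TODO", text: "write post" });
store.dispatch({ type: "ADD_TODO", text: "review PR" });
// rendered is [1, 2]: data only ever flowed action -> store -> view.
```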
<p>While I will not be giving an in depth explanation of this concept, I’ll instead be focusing on two key components in
Redux: <a href="http://redux.js.org/docs/introduction/ThreePrinciples.html">The State Tree</a> and the <a href="https://egghead.io/lessons/javascript-redux-generating-containers-with-connect-from-react-redux-visibletodolist">Connect
Method</a>.
I will also look into which design patterns they use.</p>
<h3>The State Tree</h3>
<p>The State Tree uses a pattern known as the <a href="https://en.wikipedia.org/wiki/Singleton_pattern">Singleton Pattern</a>. Its
definition is:</p>
<p>“In software engineering, the <strong>Singleton Pattern</strong> is a design pattern that restricts the instantiation of a class to
one object.”</p>
<p>While this is one of the easier patterns to understand in computer science, I believe it can still be hard to visualize
in a real life application. What this means when we compare it to Redux is that there can only be ONE state tree.</p>
<p>The reason for only using one tree is ensuring there is only one place to look for the different states or changes
within your application. This is easier for the human mind to comprehend, thus speeding up production. The Singleton
Pattern is a beautiful pattern put into place by the Redux team and one of the major differences between Redux and Flux
(another popular <a href="http://redux.js.org/docs/basics/DataFlow.html">unidirectional data flow</a> library).</p>
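<p>As a sketch, the Singleton guarantee can be expressed in a few lines of JavaScript – a module-level factory that only ever creates one store. This illustrates the pattern; it is not Redux source code:</p>

```javascript
// Singleton sketch: getStore() always returns the same single instance,
// mirroring how a Redux app has exactly ONE state tree.
let instance = null;

function getStore() {
  if (instance === null) {
    instance = { state: { todos: [] } }; // created exactly once
  }
  return instance;
}

const a = getStore();
const b = getStore();
// a === b: every caller sees the same state tree.
```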
<h3>The Connect Method</h3>
<p>The Connect Method in Redux uses the <a href="https://en.wikipedia.org/wiki/Observer_pattern">Observer Pattern</a>. I’ve provided
the definition below:</p>
<p>“The <strong>Observer Pattern</strong> is a software design pattern in which an object, called the subject, maintains a list of its
dependents, called observers, and notifies them automatically of any state changes.”</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/894dc02537d075480c53b7946c158c3ff69a286d_observer-pattern.png?auto=compress,format"></p>
<p>Wow, that is a mouthful. Let's use Redux to make sense of it.</p>
<p>In Redux, it is possible for components to listen (or connect) to any part of the state tree. When the state changes,
the components will update. This is an excellent example of the Observer Pattern. The observers are the components, and
the subject is the state tree.</p>
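<p>Here is a small sketch of that relationship – a subject (the state tree) keeping a list of observers (the components) and notifying them on every change. The <em>connect</em> name echoes Redux, but the code is an illustration, not the react-redux implementation:</p>

```javascript
// Observer Pattern sketch: the state tree is the subject,
// components are observers that are notified of every state change.
function createSubject(state) {
  const observers = [];
  return {
    getState: () => state,
    setState: (next) => { state = next; observers.forEach((fn) => fn(state)); },
    connect: (fn) => { observers.push(fn); fn(state); }, // render once on connect
  };
}

const stateTree = createSubject({ unfinished: 0 });
const seen = [];
stateTree.connect((s) => seen.push(s.unfinished)); // a "component" observes

stateTree.setState({ unfinished: 3 });
// seen is [0, 3]: once on connect, once on change.
```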
<h3>Learnings</h3>
<p>So, what have we learned? We use design patterns everyday without knowing it. We also jumped into a deeper understanding
of the Singleton and Observer Patterns and how these patterns work together.</p>
<p>These are just two of an endless list of design patterns. If you are interested in learning more about patterns, please
check out my previous post on the <a href="https://tech.zalando.de/blog/the-factory-pattern-in-react/">Factory Pattern With
React</a>. You can also contact me at
<a href="mailto:andra.joy.lally@zalando.de">andra.joy.lally@zalando.de</a> with any questions or comments.</p>JAX Finance Learnings from London2016-08-01T00:00:00+02:002016-08-01T00:00:00+02:00Marc Schumachertag:engineering.zalando.com,2016-08-01:/posts/2016/08/jax-finance-learnings-from-london.html<p>Read about Zalando Dortmund's travels to London to hear about payment topics at JAX Finance.</p><p>There are only a few developer conferences focusing on payment topics, and <a href="https://finance.jaxlondon.com/">JAX Finance</a>
is one of them.</p>
<p>The conference was held from 27-29 April at the Park Plaza Victoria in the center of London. As a member of the Zaster
team (Zaster is German slang for “money”), responsible for developing payment processing components at Zalando Payments,
my mission was to get in contact with other developers, companies, and speakers dealing with payment related topics. As
finance itself is quite a rare field in software development, the conference was a mix of both finance and DevOps topics.</p>
<h3>Presentations and learnings</h3>
<p><a href="https://twitter.com/EricHoresnyi">Eric Horesnyi</a> from <a href="https://streamdata.io">streamdata.io</a> was the first keynote
speaker of the conference, stating that what had previously happened with Netflix & Co. is now happening in Fintech.</p>
<p>According to Horesnyi, the most important aspects to consider for being successful are:</p>
<ul>
<li>Be pragmatic and fast</li>
<li>Collect user feedback and iterate</li>
<li>Have lean approaches</li>
<li>Go for cloud</li>
<li>Use DevOps</li>
<li>Be transparent and share</li>
</ul>
<p>He continued talking about why the creation of an API is so important. It is used to monetize, accelerate roadmaps,
disrupt through transparency, and interconnect in real time.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8bc39a5b06a30f8ac9deba1754f84341cf3cb6a3_img_0089.jpg?auto=compress,format"></p>
<p>Following the keynote, a panel discussion was held between Stefan Weiß from <a href="https://www.fidor.de/">Fidor</a>, Pierce
Crosby from <a href="http://stocktwits.com/">StockTwits</a>, Joost van de Wijgerd from <a href="http://getbux.com/">Bux</a>, and Simon Redfern
from <a href="http://tesobe.com/">Tesobe</a>. They touched on the importance of culture and noted that communication was a key
aspect in the sector. Getting closer to customers and connecting to them was thought of as vital.</p>
<p>The main statements shared by the panel were that startups get cherry-picked and reintegrated by bigger companies. They
focus on solving problems for the customer, mainly those that are frontend related. The technical platform for this is
simple APIs.</p>
<p>Development-wise, lean teams are important. We were urged to use these teams to create APIs, especially for startups,
making it easy to get in touch with banks. Create a transparent environment and open up the infrastructure to be
accessible for implementers and customers, as required by
<a href="http://ec.europa.eu/finance/payments/framework/index_en.htm">PSD2</a>.</p>
<p>Following the panel, I attended an interesting talk about a German banking company called <a href="https://www.fidor.de/">Fidor</a>
held by <a href="https://twitter.com/weiss2go">Stefan Weiß</a>. He talked about what is possible when you open up your banking
interfaces using APIs which account owners and developers can use. Compared to traditional banks, which might have a web
interface for the customer, this bank allows users to do all they could do in the traditional web interface setup by
<a href="http://docs.fidor.de/">using an API</a>.</p>
<p>Seeing that it is possible to transfer money with a small curl call on the command line, you could also use the API to
create web interfaces or backend services – automating tasks, writing GUIs, etc. Developers can even
integrate their software into the banking web interface as a third party extension. Neat!
For me, as a developer from the payment department, seeing a sophisticated API for bank transfers using SEPA was quite
interesting.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2fcc1d7d576ba0437954ef877dedbfc28b020447_img_0090.jpg?auto=compress,format"></p>
<p>Going a step further, a project called the <a href="https://openbankproject.com/">Open Bank Project</a> aims to give you a platform
to set up your own bank. As a lot of banks still use <strong>really</strong> old software in the backend, this might be a starting
point to update their software.</p>
<p><a href="https://twitter.com/simsysims">Simon Redfern</a>, the founder of the project, talked about all the modern technologies
they use in the project: Scala, RESTful APIs, OAuth, flexible connectors, making use of freely customizable metadata,
Kafka, plus a huge amount of SDKs for Python, Scala, Java, and others. The project already has notable supporters like
RBS, ING, Rabobank, and Ulster Bank showing that traditional banks have noticed the need to modernize.</p>
<p>An important factor to consider when talking about payment processing is security. Due in part to the requirements of
the credit card industry (PCI) and legal regulations, this has been, and still is, an important topic for the payment
team I am working in. With this in mind, I attended a presentation from <a href="https://twitter.com/eoinwoodz">Eoin Woods</a>
about <strong>Secure System Fundamentals</strong>. He stressed the fact that security is hard to achieve, as it does not
always receive the focus it deserves.</p>
<p>Eoin quoted Bruce Schneier: “Security is not a product, but a process”. It can be so hard to have a high level of
security in your products: You need to make security part of your daily work, not just a project that has a start and an
end. <a href="http://www.infosecurity-magazine.com/opinions/how-can-your-company-radically/">Security is an ongoing process</a>.</p>
<p>To move to a highly secured system, it is important that the whole development is risk driven. The design of software
needs to have security built in, as it is hard to attach security processes to an existing piece of software. Besides
this, people and process are just as critical: What is the point of having a highly secured system while people write
passwords on Post-It’s?</p>
<p>JAX Finance also featured presentations about DevOps, with the first talk I attended about <strong>Java SE 8 best practices</strong>.
As our team adopted Java 8 specifically a few months ago, I was able to find out more about issues, learnings, and best
practices. <a href="https://twitter.com/jodastephen">Stephen Colebourne</a> gave the talk, on top of having a
<a href="http://blog.joda.org/">blog</a> that is quite well known in the Java community.</p>
<p>The bottom line about Java 8 is that it improves readability if used properly. With Java 8 already being more than two
years old, a lot of developers still do not feel comfortable using the additional functionality it offers. Perhaps Stephen’s
conclusion will encourage others to try it out.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c722d4cf320c3bb8a22703922727284ad460b686_img_0115.jpg?auto=compress,format"></p>
<h3>Continuous Delivery</h3>
<p>Continuous delivery and integration was another hot topic of the conference, as it is one of the important approaches
required to achieve a short time to market. Stefanos Zachariadis is Team Lead at LMAX Exchange in London, which
co-invented continuous delivery. In his talk about <strong>continuous delivery in a regulated environment</strong>, he stated that
many people think deploying to a regulated environment is risky due to the high cost of failure, and should therefore be
done as rarely as possible.</p>
<p>The opposite is actually the case: You will always make mistakes and have bugs in your code. Therefore, it makes more
sense to deploy frequently to be able to identify bugs fast. As Stefanos said: “Eat your own dogfood and eat it early
and often”.</p>
<p>By implementing continuous delivery, you reduce the effort of deployments, the pain and impact of failed deployments, as
well as the time it takes to deploy in the first place. In a regulated environment you obviously won’t be able to access
production directly, so it makes sense to have a staging environment which is quite close to the setup of production:
The topology should match production.</p>
<p>The idea is to set up a fully automated deployment pipeline for staging that is almost identical to the one for production, with
some slight differences in configuration. An often forgotten point is to have (sanitized) production data in the staging
environment, in order to have a proper testing environment. Having only a very limited dataset of test entries in your
staging environment will not help when testing. This is done not only for performance, but also to have a multitude of
different records on the system which help find bugs.</p>
<p>For example, when introducing a new validation or calculation, having this data on your staging machine enables you to
see problems early, rather than when it is too late. Move sanitized live data continuously to staging to stay up to date:
moving data rarely, or only after a long period of time, does not help in a continuously changing environment. When
doing so, the sanitation scripts might have to be updated on release where necessary.</p>
<p>Continuous integration is always a challenge. Quite interestingly <a href="https://twitter.com/eduardsi">Eduards Sizovs</a>, in his
entertaining talk about <strong>eight things that make continuous delivery go nuts</strong>, pointed out that the real reasons might
come from a different source. Trying to implement continuous integration often reveals problems you always had in the
way you were developing and deploying, such as having a lot of manual steps to complete during deployment.</p>
<p>Purely technical problems are only one part of the story. We must also consider the human component in the process of
introducing CI: Get people and different departments on board, get them aligned, compromise, and if everything else
fails, try what Eduards called “beer driven diplomacy”. Involve people who will use CI instead of just confronting them
with a fully evolved solution that they cannot contribute to.</p>
<p>You can find all the keynotes plus some interviews from the conference <a href="https://vimeo.com/jaxtv/videos">here</a>.</p>
<h3>Conclusion</h3>
<p>There were a lot of valuable talks which helped me get an insight into the rest of the FinTech community. It broadens
your view once you know that there are others dealing with similar problems and possible solutions.</p>
<p>Besides finance topics, the DevOps part of the conference was really helpful! Getting more insights into the problems
other teams have, as well as their learnings, guides us in not making the same mistakes. The focus on continuous
delivery topics underlines its importance and shows that Zalando is on the right path with the way we work.</p>
<p>I really enjoyed JAX Finance in London and establishing better connections to the Fintech community. If you have any
questions about my learnings, you can get in touch via Twitter <a href="https://twitter.com/jackeroo_marc">@jackeroo_marc</a>.</p>Best Practices for Android Developer Productivity2016-07-28T00:00:00+02:002016-07-28T00:00:00+02:00Sergii Zhuktag:engineering.zalando.com,2016-07-28:/posts/2016/07/best-practices-for-android-developer-productivity.html<p>Level up your Android development skills with these tried and true tips from Sergii Zhuk.</p><p>The efficiency of your software engineering work depends not only on your deep knowledge and expertise, but also on the
toolset, proper environment configuration, and team collaboration activities.</p>
<p>I recently gave a talk at <a href="http://droidcon.de/en/sessions/effective-android-development">Droidcon Berlin</a> about the best
practices for Android developer productivity that we use in our Zalando Tech team. Below you can find the key points
from my talk which will make your developer life more pleasant and your app more stable.</p>
<h3>What does your AndroidManifest really look like?</h3>
<p>A lot of us already know that the <em>AndroidManifest.xml</em> you see in the text editor is not always the same as the one
that will be included in the application build. It happens mainly because the libraries you are including in your
project may contain extra <em>&lt;uses-permission/&gt;</em> elements in their manifests, which will be merged with permissions
requested in your manifest file (see <a href="https://commonsware.com/blog/2015/06/25/hey-where-did-these-permissions-come-from.html">The Commons
Blog</a> for more details).</p>
<p>To check your manifest before the APK build, we can use a new feature presented in Android Studio 2.2: <a href="http://android-developers.blogspot.de/2016/05/android-studio-22-preview-new-ui.html">Merged Manifest
Viewer</a>. This tool will show how
your AndroidManifest merges with your project dependencies based on build types, flavors, and variants. You can reach
this tool by navigating to your <em>AndroidManifest.xml</em> and clicking on the new Merged Manifest bottom tab.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f3283ff761186baf95d8638226d319d73128096a_pasted-image-0.png?auto=compress,format"></p>
<h3>Support annotations are your friends</h3>
<p>Another extremely useful tool is a support annotations library. You can include it in your project by adding
“<em>com.android.support:support-annotations:23.4.0</em>” to your <em>build.gradle</em> file. Use these metadata annotations to
decorate your code to help catch bugs and define code rules. The most common use-cases for them are marking nullable and
non-nullable, identifying integer as resource, and specifying from which thread they should be called.</p>
<p>Since these annotations are metadata annotations, your project will compile even if you violate rules defined by them.
However, it will be highlighted by Android Studio and Lint, and will be visible to other team members in your Continuous
Integration tool output.</p>
<h3>Fast and Painless code review</h3>
<p>Assume that you’d like to do a code review. It makes sense to check how the developed feature works, so you’ll need to
compile the project. The common workflow for this case is the following:</p>
<ul>
<li>Stash changes on your current branch</li>
<li>Checkout branch to review</li>
<li>Reload gradle config in your IDE</li>
<li>Read the code in IDE</li>
<li>Compile and launch, then test the app</li>
<li>Return to your work by repeating actions (1) - (5) for your branch</li>
</ul>
<p>“What’s the problem with it?” – you could say. Yes, everything is fine. But not when you have a project
with 1000+ classes and different build configurations – you can easily spend more than three minutes on a powerful
MacBook waiting for the new code to compile.</p>
<p>Our solution is to use a dedicated IDE instance and repository folder for the code review. In this case, your work won’t
be stopped for a while and you can come back to your main IDE and branch at any moment. Just a small disclaimer: We
recommend you use a machine with at least 16GB of RAM, as the time you’ll save is definitely worth it.</p>
<h3>Apply changes fast</h3>
<p>Even if you have a small Android project, you always have to spend some time waiting for your latest changes to build
and deploy to the test device or emulator. If you have hundreds of classes and XML layouts, each rebuild and
re-deploy can cost you a lot of time – even on a powerful machine. Moreover, you will need to navigate manually to the
application screen where you changed something, which also requires some effort.</p>
<p>At the end of 2015, the Android Community received two tools to allow code changes to be applied faster. The first of
these was <a href="https://zeroturnaround.com/software/jrebel-for-android/">JRebel</a>, which comes from the Java backend world
where it has been the industry standard for a long time. Another tool was announced by the Google team together with
Android Studio 2.0 – <a href="https://developer.android.com/studio/run/index.html#instant-run">Instant Run</a>. Both of these tools
have the same aim, but JRebel contains more features and has an annual license you have to pay for.</p>
<p>I haven’t found any independent comparison of these tools, so I’ve made the table below analyzing their documentation
and available blog posts. As for now, the most common use-cases for both of them are changing resources and method logic
without structural changes such as interfaces or manifests.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c0d0528550332bba7aa5b23f468d49b2dd0566da_table-android-jrebel.png?auto=compress,format"></p>
<h6><strong>Sources:</strong> <a href="https://developer.android.com/studio/run/index.html">https://developer.android.com/studio/run/index.html#instant-run</a> <em>Reto Meier:</em> <a href="https://goo.gl/mEP7N5">"Instant Run: How Does it Work?!"</a> <em>Oleg Selajev:</em> <a href="https://goo.gl/NvFHpN">"Looking at JRebel for Android and Instant Run ..."</a></h6>
<p>Both tools are still in the active development phase and improvements are coming almost every week. From our experience,
a lot of use-cases are still not covered but you can already benefit from using these tools if you know what to expect
from them.</p>
<h3>Measure execution time</h3>
<p>Another extremely useful feature during application debug and performance analysis is logging method input/output and
execution time. For these needs, we use a simple and elegant method annotation tool -
<a href="https://github.com/JakeWharton/hugo">Hugo</a> by Jake Wharton. It works like a charm if you just want to read the log and
don’t want to use deep and complicated tools like Systrace.</p>
<p>All you need is to annotate the target method as shown:</p>
<div class="highlight"><pre><span></span><code><span class="nv">@DebugLog</span>
<span class="k">public</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">getName</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="k">first</span><span class="p">,</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="k">last</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="cm">/* ... */</span><span class="err">}</span>
</code></pre></div>
<p>And find the respective information about the method call printed in logs:</p>
<div class="highlight"><pre><span></span><code><span class="n">V</span><span class="o">/</span><span class="n">Example</span><span class="o">:</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">getName</span><span class="p">(</span><span class="kr">first</span><span class="o">=</span><span class="s">"Jake"</span><span class="p">,</span><span class="w"> </span><span class="kr">last</span><span class="o">=</span><span class="s">"Wharton"</span><span class="p">)</span>
<span class="n">V</span><span class="o">/</span><span class="n">Example</span><span class="o">:</span><span class="w"> </span><span class="o"><--</span><span class="w"> </span><span class="n">getName</span><span class="w"> </span><span class="p">[</span><span class="mi">16</span><span class="n">ms</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Jake Wharton"</span>
</code></pre></div>
<h3>How to read logcat output from your device</h3>
<p>For everyday needs, most of us read logs using Android Monitor inside Android Studio. It works fine for the simple
cases, but we note several trade-offs with this approach:</p>
<ul>
<li>Logs are hard to read, you should use external tools or configs to do formatting</li>
<li>Android Studio logging tools attach to the process ID of the application you’ve deployed. If you redeploy the
app or kill the process, your previous logs will be lost</li>
</ul>
<p>As a solution to this problem, we can use another tool by Jake Wharton –
<a href="https://github.com/JakeWharton/pidcat">pidcat</a>. The main benefits of it are:</p>
<ul>
<li>Good color schema and formatting</li>
<li>Connect to the debugged application by package name, not process ID. All logs will be kept after re-deployment of
the app</li>
</ul>
<h3>Network output logging and analyzing</h3>
<p>The most common and obvious way to read your app network interaction logs is to use the log output from your HTTP client
library. However, this approach has several trade-offs:</p>
<ul>
<li>If you keep full network logging enabled during development, you will notice that the performance of the app
decreases – printing the log takes time</li>
<li>If your app has some external libraries using networks for their needs (for example, Google Analytics) you may need
to do extra configuration changes for each of these libraries to force all data to be logged</li>
<li>During QA it can be impossible to have access to the console output and specific configs, thus you won’t be able to
monitor traffic this way on the production app build</li>
</ul>
<p>There is another approach: Use Http Monitoring and Proxy tools like <a href="https://www.charlesproxy.com/">Charles Proxy</a>.
These type of tools can provide the following functionality, wrapping your app as a black box:</p>
<ul>
<li>HTTP/HTTPS traffic monitoring and recording</li>
<li>Rewrite values and try edge cases of server response</li>
<li>Set breakpoints on the network calls</li>
<li>Install SSL certificate to the device to read the encrypted traffic</li>
</ul>
<h3>Keep testing on various OS versions</h3>
<p>One thing I’m always doing, and keep pushing my colleagues to do, is to test each feature during the developer test on
both Lollipop or higher (API 21+) and pre-Lollipop devices or emulators. This might sound like a Captain Obvious tip –
but I catch these bugs regularly during testing.</p>
<p>The most common bugs you can discover this way are touch feedback and system colors issues. We often saw app crashes on
older APIs due to some compatibility issues.</p>
<h3>Automate screen interaction</h3>
<p>Often we need to check some scenarios and do repetitive UI clicks/inputs on various devices. It can be quite annoying if
you have three to four test devices and you need to go through a regression plan with 30 scenarios.</p>
<p>The first step of automation is typing adb commands, or even writing whole scripts, so you don’t have to interact with
the device by hand every time. For example, <em>adb shell input keyevent 4</em> will submit a BACK button press to the
connected test device. This way you can pass system key presses, keyboard input, and screen touches.</p>
<p>But what do you do if you have three devices for the same test scenario? We use
<a href="https://github.com/romannurik/env/blob/master/bin/ninja-adb">adb-ninja</a> script by Roman Nurik to submit commands to
several devices simultaneously. It’s a tiny shell script which sends the typed <em>adb</em> command to all connected devices
and saves a lot of time.</p>
<h3>Check your build.gradle configuration</h3>
<p>Even experienced developers sometimes follow outdated configuration practices. Let’s check your <em>build.gradle</em> file and
see if you’re one of them:</p>
<ul>
<li>Get rid of mavenCentral and use jcenter as the dependencies repository. jcenter has faster response times and now
acts as a superset of mavenCentral content.</li>
<li>Check your <em>Android Plugin for Gradle</em> version, since the latest version can improve build performance and
includes sweet tools such as Instant Run.</li>
<li>Don’t specify version ranges for dependencies. Always use a constant version value like “23.4.0” to be protected
from unexpected dependency API changes, and to avoid a network call checking for the latest version of each dependency
on every build.</li>
<li>Use build flavors to set up the build for <em>minSdkVersion 21</em> or higher during development. It will help you
build faster using all the improvements that the Android Tools team provides for us.</li>
</ul>
<p>And that’s it! In this post we discussed 10 tips to increase the efficiency of your everyday developer work and build
high quality apps, following <a href="http://droidcon.de/en/sessions/effective-android-development">my talk at Droidcon Berlin
2016</a>. The full version of the slides was published at
<a href="https://speakerdeck.com/sergiiz/effective-android-development/">Speakerdeck</a>.</p>
<p>Feel free to contact me on Twitter <a href="https://twitter.com/sergiizhuk">@sergiizhuk</a> with your questions and comments.</p>Zalando makes a Connexion: Our interview with Tony Tam2016-07-20T00:00:00+02:002016-07-20T00:00:00+02:00Natali Vlatkotag:engineering.zalando.com,2016-07-20:/posts/2016/07/connexion-interview-with-tony-tam.html<p>We chat to the creator of Swagger, Tony Tam, to hear his thoughts on Zalando's Connexion.</p><p>Our Zalando developers are building services for a swathe of great projects lately. <a href="http://swagger.io/">Swagger</a>, the
API framework, is just one of the projects we’re using and building libraries for. As a company that champions an <a href="https://tech.zalando.de/blog/on-apis-and-the-zalando-api-guild/">“API
First”</a> approach, the Swagger Specification complements
our own push to apply a RESTful style to APIs.</p>
<p>One of our related contributions comes in the form of <a href="https://github.com/zalando/connexion">Connexion</a>, a framework for
Python on top of <a href="http://flask.pocoo.org/">Flask</a> that automagically handles HTTP requests based on the <a href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md">OpenAPI 2.0
Specification</a> of your API, described in <a href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#format">YAML
format</a>. It allows you to write a
Swagger specification, then maps the endpoints to your Python functions – a unique approach, as many tools instead
generate the specification directly from your Python code.</p>
<p>Connexion has generated a great level of interest from developers at <a href="https://www.optimizely.com/">Optimizely</a>, the A/B
testing and personalisation platform, amongst other companies.</p>
<p>Our developers have also built <a href="https://github.com/zalando/play-swagger">play-swagger</a>, which provides Swagger support
for the Play Framework, and <a href="https://github.com/zalando/friboo">Friboo</a>, a utility library to write microservices in
Clojure with support for Swagger and OAuth. You can then add an <a href="https://plugins.jetbrains.com/plugin/8347">IntelliJ
plugin</a> to the list, where you can easily edit Swagger specification files
inside IntelliJ IDEA.</p>
<p>With all this Swagger action going on, we wanted to know how our input was being received and learn more about the
<a href="https://openapis.org/">OpenAPI Initiative</a> in general. We got in touch with <a href="https://twitter.com/fehguy/status/738034008332730368">one of the
fans</a> of our IntelliJ IDEA plugin and creator of Swagger <a href="https://twitter.com/fehguy">Tony
Tam</a>, head of all things Swagger at
<a href="https://smartbear.com/company/team/featured-people/tony-tam/">SmartBear</a>, to find out more.</p>
<p><em>Zalando Tech:</em> We've heard that you've been spreading the word about Connexion. Thanks! What are your thoughts on the
framework? What is the feedback like?</p>
<p><em>Tony Tam:</em> There have been a few efforts to do a true design-first implementation of REST APIs and Zalando has been
right on the leading edge of that movement. Connexion fills the need in the Python landscape, and is an excellent
sibling to <a href="https://github.com/swagger-api/swagger-inflector">swagger-inflector</a> for Java and
<a href="https://github.com/swagger-api/swagger-node">swagger-node</a> for node.js. It has been a delight for the development
community to see a large retailer putting efforts into an open source framework.</p>
<p>Letting the OpenAPI definition drive the implementation is still a fairly new idea, but it’s catching on quickly as
developers figure out that it’s truly a great way to develop.</p>
<p><em>Zalando Tech:</em> There's been mention of additional formats in the works on top of JSON and YAML for the spec. How is
this progressing?</p>
<p><em>Tony Tam:</em> It’s still in active and lively discussion – it’s still unclear if we’re going to have it in this version of
the spec or not. We are looking at how to give the specification more future-proofing by getting in front of other
formats, however, it may be in a subsequent release.</p>
<p><em>Zalando Tech:</em> There are currently more than 15 language integrations for Swagger. Have you got a wishlist for other
community-driven language integrations?</p>
<p><em>Tony Tam:</em> I’m all for a smaller number of excellent integrations. I think there is room to improve some of the
existing language support to a design-first, like Connexion. But there is always a new hipster language on the front
that needs to have support.</p>
<p><em>Zalando Tech:</em> What does the future road map look like for OAI? What non-technical activities have been identified to
facilitate the further evolution of the spec and its adoption within the industry?</p>
<p><em>Tony Tam:</em> We’ve been very transparent about the roadmap by tracking all the activity in
<a href="https://github.com/OAI/OpenAPI-Specification">GitHub</a>. Now with the OAI and the increased support, we have made changes
which will allow many more APIs to be described by the OpenAPI Specification, which will be great for everyone.</p>The Factory Pattern in React2016-07-19T00:00:00+02:002016-07-19T00:00:00+02:00Andra Joy Lallytag:engineering.zalando.com,2016-07-19:/posts/2016/07/the-factory-pattern-in-react.html<p>One of our developers shares her learnings and takeaways as she dives deeper into React.</p><p>I recently switched teams here at Zalando Tech, and I went from working by myself with no code reviews, to now working
with a senior engineer with over 10 years’ experience. I wanted to share with you one of the core concepts I have learned
thus far, “The Factory Pattern”.</p>
<p>One of the reasons React is so successful is due to its concept of components. It is easy to organize your project and
have someone else understand how it works. While components are great, both dumb and smart, there are more
organizational patterns that can be added.</p>
<p>A simplified version of what my colleague and I had to build was as follows:</p>
<ul>
<li>A layout of Sliders, with each Slider on the page a little different from the next.</li>
<li>In each Slider, there are Slides that differ from one to the other.</li>
</ul>
<p>While we organized the Sliders and the Slides using the Factory Pattern, for the moment I will only be describing the
Slides to keep this post as simple as possible.</p>
<p>An example of a Slider with Brand Slides:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f355e73881cbc9a16d494def7bb3a98c94523c16_screen-shot-2016-07-06-at-09.20.32.png?auto=compress,format"></p>
<p>An example of a Slider with Article Slides:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/caab764210eb66c790449626ae9a63ac15e24033_screen-shot-2016-07-06-at-09.20.21.png?auto=compress,format"></p>
<p>The most important concept to note here is that the Slider stays the same: It is the Slide type that is changing. This
is the perfect scenario to use the Factory Pattern. The Slider does not need to care about what kind of Slides it
contains – it is the job of the Factory to decide. Below is a simplified version of the Factory we built:</p>
<div class="highlight"><pre><code>// Note: the JSX return values were lost in publishing; the slide
// component names below are restored for illustration.
export default class SlideFactory {
  static build(data) {
    switch (data.source) {
      case 'brand':
        return &lt;BrandSlide {...data} /&gt;;
      case 'article':
        return &lt;ArticleSlide {...data} /&gt;;
      default:
        return undefined;
    }
  }
}
</code></pre></div>
<p>Inside the Slider, you would render the Slides with the following method:</p>
<div class="highlight"><pre><code>const items = [
  {
    source: 'brand',
    title: 'title'
  }
];
const renderedItems = items.map((item) =&gt; SlideFactory.build(item));
</code></pre></div>
<p>The Slide “<em>item.source</em>” returns a string:</p>
<div class="highlight"><pre><span></span><code>item.source ⇒ ‘brand’
</code></pre></div>
<p>Or</p>
<div class="highlight"><pre><span></span><code>item.source ⇒ ‘article’
</code></pre></div>
<p>This string value informs the Factory which Slide it needs to render. This particular Factory creates two different
types of Slides: Brand or Article. But we can add as many as we want. It is that simple!</p>
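<p>If a growing switch statement bothers you, the same Factory idea can be sketched as a lookup map. The sketch below is plain JavaScript without JSX so it stays self-contained; the slide constructors are hypothetical stand-ins for real React components:</p>

```javascript
// Registry-based variant of the SlideFactory: mapping source strings to
// constructors means adding a slide type is one new map entry, not a new
// switch case. The constructors are stand-ins for real React components.
const slideTypes = {
  brand: (data) => ({ kind: 'BrandSlide', title: data.title }),
  article: (data) => ({ kind: 'ArticleSlide', title: data.title }),
};

class SlideFactory {
  static build(data) {
    const create = slideTypes[data.source];
    // Unknown sources fall through to undefined, as in the switch version
    return create ? create(data) : undefined;
  }
}

console.log(SlideFactory.build({ source: 'brand', title: 'title' }));
```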
<p>Now let’s take a look at the <a href="https://en.wikipedia.org/wiki/Factory_method_pattern">definition</a> of the Factory Pattern:</p>
<p><em>“The Factory Method Pattern is a creational pattern that uses factory methods to deal with the problem of creating
objects without having to specify the exact class of the object that will be created.”</em></p>
<p>Wow, that is intense. Let’s break that down into something understandable, shall we?</p>
<ul>
<li>The Factory creates Slides with the build method (is a creational pattern that uses factory methods).</li>
<li>The Factory can create many different kinds of Slides that share the same interface (without having to specify the
exact class of the object that will be created).</li>
</ul>
<p>This pattern has made it possible to take more logic out of our Slider components, making them even dumber and
organizing them in a more readable fashion.</p>
<p>As a junior developer I can be easily impressed, but this pattern is extremely useful together with React components,
and I recommend that everyone give it a try in their next project.</p>
<p>Let me know how you get on – drop me a line at <a href="mailto:andra.joy.lally@zalando.de">andra.joy.lally@zalando.de</a>.</p>Dynamic App Content: An Introduction to Truly Native Apps2016-07-15T00:00:00+02:002016-07-15T00:00:00+02:00Dr. Fadi Mark Chabarektag:engineering.zalando.com,2016-07-15:/posts/2016/07/an-introduction-to-truly-native-apps.html<p>Read about the challenges we experienced when designing the home screen of the Zalando App.</p><p>When we designed the home screen of the <a href="https://www.zalando.com/zalando-apps/">Zalando App</a>, we faced three challenges.
Firstly, the home screen consisted of several components developed and delivered by individual teams without prior
native development experience. Secondly, we believe that the first impression for customers - our home screen - must
have a premium look and feel, and therefore decided to build it in a truly native fashion. The last challenge was that
the release cycle of our app is measured in weeks, while we would like to deliver content on a daily basis.</p>
<p>In this article we will describe our solution for these three challenges - Truly Native Apps (TNA). We start by
introducing a UI language that is used to declare elements like scrollable lists, images and text, and the composition
of such elements. Its JSON format is input for the Flexible Layout Kit. The Kit is an SDK available for Android, iOS,
and Windows phones that renders the described elements as native views on the respective platforms. The JSON will be
served by the Kit’s backend, the App Layout Service, which we are about to bring to production. This
service will fetch content from several internal providers, for example, from our CMS or advertisements. The main use
case of the service will be to aggregate individual content and serve it to the Flexible Layout Kits in the apps.</p>
<p>We will start this blog with a simplified version of Zalando’s home screen as an example. Based on this example we
explain the JSON format and the views the Flexible Layout Kit will render. We proceed by explaining how the App Layout
Service fetches and aggregates content from the content providers. Finally, we compare our solution with related
technologies and conclude with a summary of the approach and a short outlook.</p>
<h3>The Flexible Layout Kit</h3>
<p>Swipeable lists, videos, carousels, product catalogs, and more - all of these elements are displayed with TNA on our
home screen. For the sake of simplicity, we use a version of the screen which consists only of a teaser and an image
(see left part of Figure 1 – an example home screen showing a teaser and an image).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/57ba1aebaeaddf3c6baa5d6640e6b8da283a58d4_tna-elements.png?auto=compress,format"></p>
<p>Conceptually, we represent the home screen by a vertical list (see right part of Figure 1). Each item of this list is a
slot for content. The slots’ content is described by a set of predefined elements that are part of the TNA language. In
TNA, there are basic elements like text or images, and complex elements that are composed of these basic elements. For
example, the first slot of the home screen is filled by a teaser. The teaser is composed of a background image and two
areas of text, one for the title and one for the teaser’s subtitle. Thus it is a complex element. In the second slot we
use a basic element, the image element.</p>
<p>The Flexible Layout Kit’s input is a JSON that declares the slots and elements of the home screen (see Listing 1). The
JSON describes the elements by type and a set of attributes. The first element is the vertical list as the container of
the screen. The vertical list is composed of a teaser element and an image as subelements. The element’s attributes are
used to declare the details of the elements. Examples are image urls and sizes, or the font and color of a text.</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="s">"element-type"</span><span class="p">:</span><span class="w"> </span><span class="s">"vertical-list"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"subelements"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"element-type"</span><span class="p">:</span><span class="w"> </span><span class="s">"teaser"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"subelements"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"element-type"</span><span class="p">:</span><span class="w"> </span><span class="s">"image"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"attributes"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">…</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"element-type"</span><span class="p">:</span><span class="w"> </span><span class="s">"text"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"attributes"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">…</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"element-type"</span><span class="p">:</span><span class="w"> </span><span class="s">"text"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"attributes"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">…</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"element-type"</span><span class="p">:</span><span class="w"> </span><span class="s">"image"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"attributes"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">…</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>When the Zalando app is opened, we download the home screen’s content. The Flexible Layout Kit traverses the described
vertical list and renders the individual slots and elements using native implementations on Android, iOS, or Windows
Phone. The result is our simplified home screen - truly native and continuously updatable from the server.</p>
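<p>Conceptually, the Kit’s traversal can be sketched in a few lines. The renderer functions below are invented stand-ins that build strings where a real Kit would create native views:</p>

```javascript
// Sketch of the Flexible Layout Kit's traversal: walk the TNA element tree
// and dispatch each node to a type-specific renderer. These renderers build
// strings purely for illustration; real ones would instantiate native views.
const renderers = {
  'vertical-list': (el, render) =>
    '[' + (el.subelements || []).map(render).join(', ') + ']',
  teaser: (el, render) =>
    'teaser(' + (el.subelements || []).map(render).join(', ') + ')',
  image: () => 'image',
  text: () => 'text',
};

function render(element) {
  const fn = renderers[element['element-type']];
  if (!fn) return 'unknown'; // unrecognized elements degrade gracefully
  return fn(element, render);
}

// The simplified home screen from Listing 1: a teaser plus an image
const screen = {
  'element-type': 'vertical-list',
  subelements: [
    {
      'element-type': 'teaser',
      subelements: [
        { 'element-type': 'image' },
        { 'element-type': 'text' },
        { 'element-type': 'text' },
      ],
    },
    { 'element-type': 'image' },
  ],
};
console.log(render(screen)); // "[teaser(image, text, text), image]"
```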
<h3>The App Layout Service</h3>
<p>The content of the home screen’s slots is produced by content providers (see Figure 2). Examples of providers are the
CMS, Brand Shops, and Advertisements, which serve editorial content, brand pages, and advertisements, respectively. The
providers describe their content using the TNA JSON notation for an element. The basic idea is that the elements are
placed into slots of a vertical list. The vertical list is served to the apps where the Flexible Layout Kit renders the
list to produce the home screen.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f98cef65ff329254f3e1db42f054f391be209904_workflow-tna.png?auto=compress,format"></p>
<p>The process of fetching and placing the content is implemented by the App Layout Service. The service is built based on
the concept of Zalando’s open source project <a href="https://tech.zalando.de/blog/frontend-microservices-tailor/">Tailor</a>, i.e.
we describe the canvas of our home screen statically and fill the screen with dynamic content which is fetched from
microservices. In our setup, we use a template that contains a vertical list with its subelements as static part (see
Figure 3 – a TNA template which references teasers and an ad image). References within this template point to content
via a URL and define the dynamic part. For every reference, the App Layout Service fetches a TNA element from the
content providers and places it into the list. This way, the slots of our vertical list are filled. The list is finally
served to the devices’ Flexible Layout Kits which render the home screen.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e5be61ac2246396eb2b13f1d9cccb04eaf7c9958_references-1.png?auto=compress,format"></p>
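<p>A minimal sketch of this resolution step, with an in-memory map standing in for the content providers’ HTTP endpoints (all URLs and element payloads here are hypothetical):</p>

```javascript
// Sketch of the App Layout Service: a template's slots hold references
// (URLs) to content providers; each reference is resolved to a TNA element
// and spliced into the vertical list. `providers` stands in for real
// HTTP calls, and the URLs/payloads are invented for illustration.
const providers = {
  'https://cms.example/teaser/1': { 'element-type': 'teaser', subelements: [] },
  'https://ads.example/banner/7': { 'element-type': 'image', attributes: {} },
};

function resolveTemplate(template, fetchElement) {
  return {
    'element-type': 'vertical-list',
    // Every reference becomes a slot in the served vertical list
    subelements: template.slots.map((ref) => fetchElement(ref)),
  };
}

const template = {
  slots: ['https://cms.example/teaser/1', 'https://ads.example/banner/7'],
};
const screen = resolveTemplate(template, (url) => providers[url]);
console.log(screen.subelements.length); // 2
```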
<h3>Related Technologies</h3>
<p>On each platform, the Flexible Layout Kit implements rendering using native platform components. Compared to <a href="https://facebook.github.io/react-native/">React
Native</a>, the Flexible Layout Kit has a lightweight rendering engine which
doesn’t require the JavaScript virtual machine. It gives app developers the freedom to choose how to implement views,
neither restricting them to a specific native framework nor prohibiting the use of any of them. This allows tuning UIs
according to platform guidelines, optimizing performance, and easily extending a rich set of UI elements.
Additionally, content providers do not have to learn about native technologies.</p>
<p>On the downside, we have to maintain the Flexible Layout Kits on all of our three platforms, and keep TNA as a protocol
between our app team and content providers with all the alignment overhead.</p>
<p>Facebook’s <a href="http://componentkit.org/">Component Kit</a> takes a very sophisticated, declarative component approach to
building UIs. Unfortunately, the Kit is neither available for Android nor for Windows Phone. The iOS Component Kit is
written in C++ and cannot be bridged to Swift. On top of that, Component Kit does not specify a JSON notation which is a
cornerstone of the continuous update approach in our solution.</p>
<h3>Summary and Outlook</h3>
<p>The objective of TNA is to deliver real-time, personalized content to our mobile applications in a platform-agnostic and
declarative language, while being rendered natively for a rich and premium user experience. The introduction of a
language to express these user interfaces is an important step towards potentially unifying how content providers across
Zalando express their intent to deliver content through our mobile channels. Truly Native Apps is bringing these user
experiences to life across all three mobile platforms.</p>
<p>Our next steps are to take the App Layout Service live, which will help us include even more content on our home screen.
We’re also looking to evaluate React Native as part of our solution. We will be sure to keep you updated on the outcome.
Stay tuned!</p>
<p>For feedback and questions, please contact me at <a href="mailto:fadi.mark.chabarek@zalando.de">fadi.mark.chabarek@zalando.de</a>.</p>Scaling Our Tech Organization and Architecture2016-07-13T00:00:00+02:002016-07-13T00:00:00+02:00Dan Persatag:engineering.zalando.com,2016-07-13:/posts/2016/07/scaling-our-tech-organization-and-architecture.html<p>Zalando Tech likes to set itself big challenges – hear more about them with Dan Persa.</p><p>Zalando Tech likes to set itself big challenges, like building the greatest tech team on Earth. With our tech department
growing rapidly, the challenges of scaling our organization also need to be addressed, to make sure we’re building the
best products and establishing a culture of innovation.</p>
<p>The term ‘scalability’ has different meanings when we talk about an organization or software architecture, but it’s
still critical. The recent <a href="http://commerce.codetalks.de/">code.talks 2016 conference</a> had a commerce theme, and we
thought this would be a brilliant opportunity to explain how we’re scaling and organizing our growing tech company,
while at the same time, switching to a new architecture.</p>
<p>In the talk below, I shared the recipe we have applied to scale our tech team to more than 1,000 people, while
redesigning the architecture of our Online Fashion Shop -- <a href="https://www.mosaic9.org/">Project Mosaic</a> -- to make more
than 18 million customers happier. I also shared Zalando Tech’s learnings and takeaways in order to become both more
successful and more customer centric. Tune in below or access my presentation slides
<a href="http://www.slideshare.net/ZalandoTech/how-we-made-our-tech-organization-and-architecture-converge-towards-scalability">here</a>.</p>Building services with the Akamai API Open API using Go2016-07-12T00:00:00+02:002016-07-12T00:00:00+02:00Nick Jüttnertag:engineering.zalando.com,2016-07-12:/posts/2016/07/building-services-with-the-akamai-api-open-api-using-go.html<p>Read about our new Go library that Akamai GitHub uses as their default Go implementation.</p><p>Here at Zalando Tech, we’re driving the <a href="https://tech.zalando.de/blog/auto-scaling-your-api-tips-from-zalando-slides/">“API First
approach”</a> as our teams are making a lot
of use of external APIs.</p>
<p>Since we also depend on different external technologies for content delivery, DNS hosting, and asset storage, we want to
integrate our services with them. One good example is Akamai. We’re using a variety of their offerings in order to give
customers the best delivery and user experience.</p>
<p>For these services, our goal is to build them in <a href="https://golang.org">Go</a>, which we’ve recently been using. Our
experiences with Go have been positive thus far, like with
<a href="https://tech.zalando.de/blog/building-our-own-open-source-http-routing-solution/">Skipper</a>, which is our own HTTP
routing solution.</p>
<p>We’ve decided to build a Go library, as none existed that would allow us to build services on top of Akamai. We also open
sourced the library so that others could benefit. We are big advocates of <a href="https://github.com/zalando">open source</a>
software and believe it's important to give back to the community as much as possible.</p>
<p>More information about the Akamai API can be found on the <a href="https://developer.akamai.com/">Akamai Developer Portal</a>.</p>
<p>Want to learn how to use it? Read more below:</p>
<ol>
<li>
<p>Clone the repo from GitHub
<em>git@github.com:akamai-open/AkamaiOPEN-edgegrid-golang.git</em></p>
</li>
<li>
<p>Copy the <em>examples/.edgerc</em> to your user home <em>(~/.edgerc)</em> and add the required information (host, client_token,
client_secret, and access_token). You can find or create these credentials in the API Management section of the Akamai
service console.</p>
</li>
<li>
<p>Run one of the examples using:
<em>go run edgegrid_with_configfile.go</em></p>
</li>
</ol>
<p>This example shows us all Akamai edge location endpoints:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5871c24952c99f5c0e6fd1f0ea720d9cf06ba109_akamaishot.png?auto=compress,format"></p>
<p>Feel free to use and fork the project on <a href="https://github.com/akamai-open/AkamaiOPEN-edgegrid-golang">GitHub</a>. We’re keen
to hear feedback via Twitter <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a> or
<a href="https://twitter.com/njuettner">@njuettner</a>.</p>Top 5 Techpreneurs Revolutionising Tech Culture2016-07-08T00:00:00+02:002016-07-08T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-07-08:/posts/2016/07/top-5-techpreneurs-revolutionising-tech-culture.html<p>We take a look at who's responsible for reshaping the definition of modern tech culture.</p><p>There has been a steady wave of Techpreneurs making headlines in the media, but let's consider more than just their
entrepreneurial prowess. They’re tech culture revolutionaries, transforming the way developers, designers, and tech
companies work, create, and communicate.</p>
<p>How exactly have they made an impact on the tech world? This following list of influential figures has revamped the
traditional work environment and, in our opinion, are reshaping the very definition of modern tech culture.</p>
<h3>Tony Hsieh</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c2647db444018266d5f565b2a57125205bc41dca_tony-hsieh-net-worth.jpg?auto=compress,format"></p>
<p><em>Company:</em> <a href="http://www.zappos.com/">Zappos</a>
<em>Culture style:</em> Self-management under Holacracy
<em>Famous for:</em> Developing advertising cooperative <a href="https://en.wikipedia.org/wiki/LinkExchange">LinkExchange</a>
<em>What can we learn from Tony Hsieh?:</em> Tony Hsieh’s radical management experiment at Zappos resulted in a loss of 18% of
his employees, yet turned into the tech industry’s most famous reinvention and comeback. By adopting self-organisation,
self-management, and self-direction, Zappos heralded a reimagining of tech culture as we know it today. But is complete
freedom the answer? We’ve learned that autonomy unleashes creativity, but total freedom can result in misdirection.</p>
<h3>Stewart Butterfield</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ee126a42c96d669bb6308f8e6ecd358c2f6f15ea_stewart-butterfield.jpg?auto=compress,format"></p>
<p><em>Company:</em> <a href="https://slack.com/">Slack</a>
<em>Culture style:</em> Efficiency in communication
<em>Famous for:</em> Co-founding photo sharing website <a href="https://www.flickr.com/">Flickr</a>
<em>What can we learn from Stewart Butterfield?:</em> Slack has completely rejuvenated the way company employees work together
and talk to each other with Slack’s organisation of channels, files, and notifications. The main takeaway from Slack’s
success for the rest of us is clearly that communication is key, and more specifically, the coordination of all
communication channels in one place can make us more efficient overall.</p>
<h3>Joel Spolsky</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9fcd43abbe89edc45802f17e04522af8bdab0279_spolsky.jpg?auto=compress,format"></p>
<p><em>Company:</em> <a href="https://trello.com/">Trello</a>
<em>Culture style:</em> Office spaces for all
<em>Famous for:</em> Co-founding <a href="http://stackoverflow.com/">Stack Overflow</a>
<em>What can we learn from Joel Spolsky?:</em> We <a href="https://tech.zalando.de/blog/joel-spolsky-at-zalando-tech/">recently
hosted</a> Joel at our Zalando Tech offices, and he had a lot to
say about the importance of having your own space to get into that all-important state of flow. Should we heed his
‘offices for all’ call? Trello HQ features sound-proof spaces so that developers can work undisturbed, allowing them to
find their own rhythm. According to Joel, private offices put the people who do the actual work in control, and we’ve
seen exactly what control, combined with autonomy, can provide for our teams.</p>
<h3>Henrik Kniberg</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1b8bf2164744bba3a608f77bfbda999b9f27d035_henrik_kniberg.jpg?auto=compress,format"></p>
<p><em>Company:</em> <a href="https://www.spotify.com">Spotify</a>
<em>Culture style:</em> An entirely new understanding of Agile
<em>Famous for:</em> The influential <a href="https://dl.dropboxusercontent.com/u/1018963/Articles/SpotifyScaling.pdf">white paper</a>
about Tribes, Squads, Chapters, and Guilds
<em>What can we learn from Henrik Kniberg?:</em> Henrik almost single-handedly made agile hip again with his reformulated agile
implementation featuring Tribes, Squads, Chapters, and Guilds. This refreshing perspective showed the world how truly
adaptive agile could be for any organisation, propelling Spotify, and in turn Henrik, into modern agile folklore.
Henrik’s white paper has been studied by tech departments across all industries and organisations.</p>
<h3>Joel Gascoigne and Leo Widrich</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6c72bf47306e3351bd1e590368ca6d9f8fb2b0ce_joel-and-leo.jpg?auto=compress,format"></p>
<p><em>Company:</em> <a href="https://buffer.com/">Buffer</a>
<em>Culture style:</em> Being remote isn’t a dealbreaker
<em>Famous for:</em> Making social media management easy
<em>What can we learn from Joel Gascoigne and Leo Widrich?:</em> By purposely rethinking a lot of the traditional constraints
that come with a 9-5 job, Joel Gascoigne and Leo Widrich have pioneered the remote working ethos at
<a href="https://buffer.com/">Buffer</a>, their popular social media tool. With their entire team located across the globe,
their approach to customer care has made them superstars of customer centricity, on top of their incredibly transparent
company operations. They’ve been able to show the world the benefits of remote teams through their successful product,
making the leap for other organisations a lot easier to consider.</p>
<p>Have we missed someone? Have your own opinions on this list? Let us know via Twitter
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>Proper Use of CellForRowAtIndexPath and WillDisplayCell2016-07-07T00:00:00+02:002016-07-07T00:00:00+02:00Yunus Güzeltag:engineering.zalando.com,2016-07-07:/posts/2016/07/proper-use-of-cellforrowatindexpath-and-willdisplaycell.html<p>Yunus Güzel gives us his take on Alexander Orlov's iOS scrolling performance arguments.</p><p>In iOS development, <em>UITableView</em> works with two methods related to the lifecycle of a <em>UITableViewCell</em>. The first is
“<em>willDisplayCell:forRowAtIndexPath:</em>” and the other is “<em>cellForRowAtIndexPath:</em>”. I have often seen these methods
misused or mistaken for one another.</p>
<p>There is an article by <a href="https://medium.com/ios-os-x-development/perfect-smooth-scrolling-in-uitableviews-fd609d5275a5#.34957bl5r">Alexander
Orlov</a>, which
includes a lot of advanced programming practices for improving scrolling performance of <em>UITableView</em>. He also discusses
an optional method of <em>UITableViewDelegate</em> called <em>willDisplayCell:forRowAtIndexPath:</em> which supplies the cell that
will be displayed at the given index path. In the article he says:</p>
<p><em>“</em><strong>Don’t</strong> <em>perform data binding at this point, because there’s no cell on screen yet. For this you can use</em>
<a href="https://developer.apple.com/library/ios/documentation/UIKit/Reference/UITableViewDelegate_Protocol/index.html#//apple_ref/occ/intfm/UITableViewDelegate/tableView:willDisplayCell:forRowAtIndexPath:"><em>tableView:willDisplayCell:forRowAtIndexPath:</em> <em>method which can be implemented in
the</em></a>
<a href="https://developer.apple.com/library/ios/documentation/UIKit/Reference/UITableView_Class/#//apple_ref/occ/instp/UITableView/delegate"><em>delegate</em>
<em>of</em></a>
<a href="https://developer.apple.com/library/ios/documentation/UIKit/Reference/UITableView_Class/"><em>UITableView. The method called exactly before showing cell
in</em></a> <a href="https://developer.apple.com/library/ios/documentation/UIKit/Reference/UITableView_Class/"><em>UITableView’s</em>
</a> <em>bounds.”</em></p>
<p>However, he gives no underlying reason for this statement in the article. His perception of <em>willDisplayCell</em>
is that the method should <em>actually</em> be used for data binding, because it is called just before the cell is
shown on the screen, thus increasing scrolling performance. This assumption may be true; however, without any
concrete facts it is difficult to agree with him.</p>
<h3>Proper use and proof points</h3>
<p>iOS works with layout cycles. A layout cycle collects information about the views that have changed since the
previous layout cycle. Views are laid out at the end of the cycle, not during it. As <em>cellForRowAtIndexPath</em> and
<em>willDisplayCell</em> are called within the same layout cycle, it doesn’t make sense to expect different performance
results. Whether heavy data binding is performed in <em>cellForRowAtIndexPath</em> or in <em>willDisplayCell</em>, it will be
executed serially on the main queue and in the same layout cycle. The cells will always be laid out after these two
methods.</p>
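<p>The same argument can be sketched outside of UIKit. The following is a toy model (in Scala, with invented names, not UIKit code): a single-threaded executor stands in for the main queue, and layout is simply a task queued after the cell callbacks. Whichever callback carries the heavy binding work, both run to completion before the layout task.</p>

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

object SerialQueueSketch {
  // Toy model of a layout cycle: a single-threaded executor stands in for
  // the main queue, and the layout pass is a task queued after the callbacks.
  def simulate(rows: Int): Seq[String] = {
    val log = ListBuffer.empty[String]
    val mainQueue = Executors.newSingleThreadExecutor()
    for (row <- 0 until rows) {
      mainQueue.execute(() => log += s"cellForRowAtIndexPath: $row") // bind here...
      mainQueue.execute(() => log += s"willDisplayCell: $row")       // ...or here
    }
    // The layout pass only runs once every queued callback has finished, so
    // moving work between the two callbacks changes nothing overall.
    mainQueue.execute(() => (0 until rows).foreach(r => log += s"layoutSubviews: $r"))
    mainQueue.shutdown()
    mainQueue.awaitTermination(5, TimeUnit.SECONDS)
    log.toList
  }

  def main(args: Array[String]): Unit =
    simulate(2).foreach(println)
}
```

<p>Running it reproduces the shape of the log below: the two callbacks alternate, serially, and all layout entries come last.</p>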
<p>To prove this, I added logging to <em>cellForRowAtIndexPath:</em> and <em>willDisplayCell:</em> of <em>tableView</em>, and to <em>layoutSubviews</em> of
the cells. The result is:</p>
<div class="highlight"><pre><code>cellForRowAtIndexPath: 0
willDisplayCell: 0
cellForRowAtIndexPath: 1
willDisplayCell: 1
cellForRowAtIndexPath: 2
willDisplayCell: 2
cellForRowAtIndexPath: 3
willDisplayCell: 3
cellForRowAtIndexPath: 4
willDisplayCell: 4
layoutSubviews: 0
layoutSubviews: 1
layoutSubviews: 2
layoutSubviews: 3
layoutSubviews: 4</code></pre></div>
<p>I will also point out that without laying out subviews there is no rendering, and without rendering there is no cell on
the screen yet. This means that <em>willDisplayCell</em> is also called while the cell is not yet on the screen. I couldn’t find
proof for Orlov’s argument.</p>
<p>Orlov also states that <em>tableView:cellForRowAtIndexPath</em> should be quick and return the dequeued cell as fast as
possible. In general, you should always be quick to prevent poor scrolling, and not only in <em>cellForRowAtIndexPath</em>. You
should also return the height of the cell in <em>heightForRowAtIndexPath:</em> as quickly as possible, which is called after
<em>cellForRowAtIndexPath</em>. If you have heavy cell height calculation, you could also have poor scrolling performance
because of it. And of course, if you have heavy work in <em>willDisplayCell:forRowAtIndexPath</em> you will have poor scrolling
issues. Where he is right, however, is in stating that this applies to all the functions called
before the cell is displayed on the screen.</p>
<p>In <a href="https://developer.apple.com/library/ios/documentation/UIKit/Reference/UITableViewDelegate_Protocol/#//apple_ref/occ/intfm/UITableViewDelegate/tableView:willDisplayCell:forRowAtIndexPath">Apple’s
documentation</a>
for <em>willDisplayCell</em>, there is no explanation regarding performance, and no warning about heavy data binding:</p>
<p><em>“A table view sends this message to its delegate just before it uses cell to draw a row, thereby permitting the
delegate to customize the cell object before it is displayed. This method gives the delegate a chance to override
state-based properties set earlier by the table view, such as selection and background color. After the delegate
returns, the table view sets only the alpha and frame properties, and then only when animating rows as they slide in or
out.”</em></p>
<p>It only says that this method is a place to override state-based properties, since the table view may have changed them
after receiving the cell from the <em>cellForRowAtIndexPath</em> method.</p>
<h3>Conclusion</h3>
<p>Orlov’s article is an important guide for advanced programming. However, it lacks proof for its claim about the <em>tableView</em>
delegate method <em>willDisplayCell:forRowAtIndexPath:</em>. This has bothered me for a while, as so
many people quoted the paragraph about <em>willDisplayCell</em> from the article. I couldn’t stop myself from writing this piece to
question his argument. However, I am keen to hear his further thoughts on the topic.</p>
<p>If you have any questions, feel free to contact me on Twitter <a href="https://twitter.com/yunuserenguzel">@yunuserenguzel</a>.</p>
<p><strong>Note:</strong> <em>With iOS 10, the call order of willDisplayCell and cellForRowAtIndexPath will change. I will follow up with
another article about the changes in iOS 10 in the near future.</em></p>Zalando’s Tech Academy gets cosy with GitHub2016-07-06T00:00:00+02:002016-07-06T00:00:00+02:00Annett Laubetag:engineering.zalando.com,2016-07-06:/posts/2016/07/zalandos-tech-academy-gets-cosy-with-github.html<p>Want to hear about how our recent workshops with GitHub went at Zalando Tech?</p><p>In September 2015, the Zalando Tech Academy was set up to help our Techies gain further insights into the programming
languages, frameworks, and other tools or methods used within Zalando. Our role as the Learning Team within Zalando
Technology is to support and promote our experts, to bring their learning ideas to life, and to help our colleagues
develop new skills and strengthen existing proficiencies.</p>
<p>The Zalando Tech Academy, in collaboration with our internal Git and Deploy Team, set up an external workshop together
with GitHub to support Zalando’s switch from Stash to GitHub Enterprise. The two-day workshop gave more than 40
developers the opportunity to ask questions, exchange knowledge and experiences, and develop their GitHub skills
further.</p>
<p>Three external trainers made the journey to Berlin where Zalando engineers embraced the opportunity to ask the
“behind-the-scenes” questions that would deepen their knowledge. The workshop was pretty hands-on, with participants
coding along and completing exercises step-by-step.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8b48097b4eb92e11aef749247e7313b568cda8d4_pngbase64679999866ae3d5da.png?auto=compress,format"></p>
<p>The first day was all about the basics – collaboration on a project, understanding the workflow, cloning a repo, and
working on it locally. During the second day of the workshop, we delved into topics such as merge conflicts, cherry
picking, and a lot more. The atmosphere was incredibly positive, with developers asking great questions and generating
discussion between the trainers and themselves.</p>
<p>After running the workshop, we could see that our Techies were already well advanced in the basics, but that deeper
insights were needed in the areas affecting their daily work, for example merge conflicts and the git rebase
command.</p>
<p>To accommodate this need, we’re looking at setting up further workshops internally, utilising our Zalando experts who
know exactly what areas our developers need help with. These workshops would also leverage the <a href="https://github.com/zalando">open
source</a> policies we’re currently implementing throughout the company.</p>Healthy habits every software engineer should adopt2016-07-01T00:00:00+02:002016-07-01T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-07-01:/posts/2016/07/healthy-habits-every-software-engineer-should-adopt.html<p>We’ve outlined some of the healthiest habits that every developer should start practicing.</p><p>Most software engineers are aware of the bad habits they need to steer clear of, but what about the practices that
make them good at what they do? The key to becoming a more experienced developer often lies in working with people
better than yourself, so how do great software engineers stand out?</p>
<p>We’ve made it our mission to outline the healthiest habits every developer should start practicing, to level up your
engineering game and benefit your teammates. Sometimes, we all need to go back to basics when it comes to programming
and code.</p>
<h3>Plan before you code</h3>
<p>When you have a problem to solve, it makes sense to plan how you’re going to address it, rather than bashing out the first
compilable code that comes into your head. Figuring out the procedure your solution needs will allow you to
troubleshoot more effectively later on, as well as making the process a lot simpler.</p>
<p>Your plan could be in the form of a set of steps that need to be followed, or a set of questions that need to be
answered. Either way, planning your approach will result in better quality code.</p>
<h3>Ask, and ye shall receive</h3>
<p>At a company as big as Zalando, you can imagine the wealth of experience we’ve collected over the years, as well as the
knowledge that our 1,000-strong tech team possesses. Developer culture can often chastise people for asking what might be
deemed a ‘stupid’ question, but the embarrassment of posing the question is nothing compared to realising later that
you’ve been doing something wrong the whole time.</p>
<p>Resources like <a href="http://stackoverflow.com/">Stack Overflow</a> and simple Googling are always good places to start, but your
own colleagues are a valuable asset in terms of troubleshooting and general knowledge. Don’t be afraid to ask – you’ll
often find you’re not the only one needing an answer.</p>
<h3>Smarter debugging</h3>
<p>Debugging isn’t everyone’s cup of tea; it’s a necessary, inevitable evil. It’s next to impossible to write
bug-free code, but you can alleviate the pain of the process by being smart about how you go about it. This is where the right
debugging tools come into play.</p>
<p>Whether you prefer your IDE’s in-built debugger, open source tools, or techniques like <a href="https://en.wikipedia.org/wiki/Rubber_duck_debugging">rubber duck
debugging</a>, the aim is to streamline error detection, breakpoint
setting, expression tracking, and performance checking. Best practices can also be derived from bug bounties, which
Zalando conducts via an internal program to better combat insider threats. Teams know their infrastructure best, its
strengths, as well as its weaknesses.</p>
<h3>Don’t overcook features</h3>
<p>As programmers hone their craft, it's tempting to develop solutions that are more complex and cater to a wider audience.
However, when doing so in the middle of a project, it’s easy to lose track of the main objective and whether the feature
you’re working on actually helps or hinders its purpose.</p>
<p>Sometimes, less is more when it comes to features. Why add a function that will never get used? Just because you can,
doesn’t mean you should. For mobile developers at Zalando, feature delivery is targeted for every 4-6 weeks. There are
much better uses of your time, and time is precious.</p>
<h3>Make version control your friend</h3>
<p>Even the best programmers make mistakes – it’s how they learn and excel. But when a major misstep occurs, you’ll be
grateful to version control. Using software like Git lets you keep as many revisions as you want, branch out for code
experimentation, plus track down previous changes made in the code. You can then refer back to those changes at any
time.</p>
<p>Git was created in large part to be used by development teams, thus good practices here are fundamental to modern-day
programming and essential when working on a professional team. It will save you time and often save your arse.</p>
<p>Are we missing anything important that should be added to this list? Let us know by tweeting us
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>Highlights of the CASI conference2016-06-30T00:00:00+02:002016-06-30T00:00:00+02:00Humberto Coronatag:engineering.zalando.com,2016-06-30:/posts/2016/06/highlights-of-the-casi-conference.html<p>Our Dublin Data Scientists have been making themselves heard in the wider statistics community.</p><p>A few weeks ago, representatives from the Zalando Tech <a href="https://tech.zalando.de/locations/#dublin">Dublin office</a>
attended <a href="http://www.casi.ie/CASI_2016/index.html">CASI</a>, the 36th Conference on Applied Statistics in Ireland. The
conference is the <a href="http://www.istat.ie/">Irish Statistical Association's</a> forum for discussion of statistical and
related issues for Irish and International statisticians.</p>
<p>The conference was hosted by the University of Limerick, and gathered almost 100 researchers interested in statistics.
There were people working in the industry (like us), on top of professors and young researchers with a very wide range
of interests, backgrounds, and nationalities.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0dec234c11ea3ba534f2b4f1824f59f82ac7647c_casi-conf-1.jpeg?auto=compress,format"></p>
<p>We were really surprised by the quality and diversity of the talks and the speakers. The statistics community has a very
good relationship with the health sector, and many talks were related to health problems. The standout presentation came
from keynote speaker <a href="https://bethanycbray.wordpress.com/">Professor Bethany Bray</a>, who gave an overview of her current
research interests on latent model analysis for different types of health disorders and their relation to risk
behaviours.</p>
<p>I also enjoyed <a href="http://www.casi.ie/CASI_2016/www/documents/presentations/White_A.pdf">Arthur White's</a> presentation on
identifying patterns of student class attendance. Arthur explained how there are clear clusters of students in this
space, and the work has very important implications, especially since there seems to be a relationship between
attendance and grades.</p>
<p>Finally, <a href="http://www.casi.ie/CASI_2016/www/documents/abstracts/Sweeney_J.pdf">James Sweeney</a> showed us some promising
work on spatial modeling for house prices in Dublin (spoiler alert: they are rising!), a very popular topic with the wider
audience.</p>
<p>Zalando’s presence at the event was also supported by a talk from Sergio Gonzalez Sanz, a Data Scientist at <a href="https://tech.zalando.de/locations/#dublin">Zalando
Dublin</a>, who presented his work on <a href="http://www.casi.ie/CASI_2016/www/documents/abstracts/GonzalezSanz_S.pdf">conformal
predictors</a>. His research tries to answer the
question: What is the quality of a single prediction made by a classification model? Through conformal predictors,
Sergio explained a new way to hedge predictions with confidence and credibility values, using p-values for the
classification problem in a novel way.</p>
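<p>To give a flavour of the idea, here is my own toy sketch of the mechanics (in Scala; the dataset, scoring function and names are invented for illustration and are not Sergio's implementation). For each candidate label, a nonconformity score measures how strange the example would look with that label, and the p-value is the fraction of calibration examples that look at least as strange; the largest p-value gives credibility, and one minus the second largest gives confidence.</p>

```scala
object ConformalSketch {
  // Nonconformity score for a 1-D example: distance to the nearest calibration
  // point of the candidate label, divided by the distance to the nearest point
  // of any other label (larger = stranger).
  def score(x: Double, label: Int, calib: Seq[(Double, Int)]): Double = {
    val same  = calib.collect { case (v, l) if l == label => math.abs(v - x) }.min
    val other = calib.collect { case (v, l) if l != label => math.abs(v - x) }.min
    same / other
  }

  // p-value: fraction of examples at least as nonconforming as (x, label),
  // scoring each calibration point against the remaining calibration points.
  def pValue(x: Double, label: Int, calib: Seq[(Double, Int)]): Double = {
    val a  = score(x, label, calib)
    val as = calib.map { case (v, l) => score(v, l, calib.filterNot(_ == (v, l))) }
    (as.count(_ >= a) + 1).toDouble / (as.size + 1)
  }

  def main(args: Array[String]): Unit = {
    val calib  = Seq((0.1, 0), (0.2, 0), (0.3, 0), (1.1, 1), (1.2, 1), (1.3, 1))
    val ps     = Seq(0, 1).map(l => l -> pValue(0.15, l, calib)).toMap
    val sorted = ps.values.toSeq.sorted
    // Credibility: largest p-value; confidence: 1 - second-largest p-value.
    println(f"credibility: ${sorted.last}%.2f, confidence: ${1 - sorted(sorted.size - 2)}%.2f")
  }
}
```

<p>On this toy data, a test point near the label-0 cluster gets a high p-value for label 0 and a low one for label 1, which is exactly the hedging behaviour described above.</p>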
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f1b024d4374441f98722bf11477785c2f3e2a8b5_casi-conf-2.jpeg?auto=compress,format"></p>
<p>Overall, it was a very positive experience. Being the ‘New Kids on the Block’, and this being our first academic
conference in Ireland, it was interesting to see that we are already contributing to the Irish scientific community.
Sergio's talk was very well received, and there is a strong interest in the community on applying their research to the
kind of industry-scale problems we are solving. On a very positive note, it was great to see the gender balance in
participants at the conference, in comparison to other computer science conferences I have attended. The statistics
community in Ireland is an inclusive community with high quality researchers, and we are happy to be part of it.</p>
<p>We are already looking forward to next year's CASI conference. In the meantime, you can read the proceedings from this
year <a href="http://www.casi.ie/CASI_2016/proceedings.html">here</a>.</p>Which shoe fits you? Comparing Akka Streams, Actors, and Plain Futures2016-06-29T00:00:00+02:002016-06-29T00:00:00+02:00Joachim Hofertag:engineering.zalando.com,2016-06-29:/posts/2016/06/comparing-akka-streams-actors-and-plain-futures.html<p>We explore which architecture to implement for a component that is critical to our platform.</p><p>At Zalando, our team is currently in the process of creating an essential core component of our
<a href="https://tech.zalando.de/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">platform</a>.
Luckily (and thanks to consistently following a <a href="https://tech.zalando.de/blog/from-monolith-to-microservices-video/">microservices
strategy</a>), this component does something relatively
simple: Triggering a CPU-intensive computation up front, and then publishing the results to other selected services.</p>
<p>We already have a proof-of-concept up and running, and the team decided to use <a href="http://scala-lang.org/">Scala</a> and the
<a href="https://www.playframework.com/">Play framework</a> for this. However, as this component doesn't have any kind of
web-facing frontend, nor exposes any kind of REST API, we are now re-evaluating our choice of technology.</p>
<h3>Requirements</h3>
<p>As the team is not yet super-experienced with Scala, and the component is an absolutely critical core part of the whole
platform, we are looking for a technology and architecture that fulfills the following criteria:</p>
<ul>
<li>Be as <strong>simple</strong> as possible</li>
<li>Be easily <strong>maintainable</strong></li>
<li>Be as <strong>robust</strong> as possible during runtime</li>
<li>Be <strong>fast</strong>, so that we don't cause any kind of back-pressure</li>
</ul>
<p>In the end, we as developers want to be able to sleep well, knowing that the chance of any kind of failure for this
component is minimal, and if it happens, we're able to fix it quickly.</p>
<h3>Test subjects</h3>
<p>During our discussion, various approaches for tackling these requirements quickly emerged.</p>
<p>Of course, there will always be someone who wants to go for the simplest solution possible. And it is a valid point
of view, as we have to get up and running fast, and would still be able to adapt later. However, its validity strongly
depends on the simplest solution actually being simple, and not only the
<a href="https://www.infoq.com/presentations/Simple-Made-Easy">easiest</a> thing to do. In this case, the proposal was to keep each
run through the component in a single blocking thread, and to just route threads as needed via Java's <em>ExecutorService</em>.</p>
<p>Looking at the current Play application, the second option that naturally emerged was to use Scala's
<a href="http://docs.scala-lang.org/overviews/core/futures.html">futures</a> without much further sophistication.</p>
<p>And finally, with Scala, there's always the elephant in the room: <a href="http://akka.io/">Akka</a>.</p>
<p>Until recently, Akka inevitably meant using the actor model, known from <a href="https://www.erlang.org/">Erlang</a>. In my
experience, people are often sceptical about using actors extensively. There are several reasons for that scepticism.
It's quite a paradigm shift for many people; others don't trust all the low-level concepts being abstracted away, while
some miss type safety.</p>
<p>However, the recent hype around <a href="http://www.reactivemanifesto.org/">Reactive</a> has led to more alternatives popping up,
even within Akka: <a href="http://www.reactive-streams.org/">Reactive Streams</a>. The two major implementations of Reactive
Streams for Scala out there seem to be <a href="http://doc.akka.io/docs/akka/2.4.7/scala/stream/index.html">Akka Streams</a> on the
one hand, and <a href="http://reactivex.io/rxscala/">RxScala</a> on the other.</p>
<p>There are still a lot of other approaches available to do the same things. However, the above alternatives were the ones
brought up by our team, so these will be the focus of our examination.</p>
<h3>Case study</h3>
<p>The next step was to play around with each of the approaches above and determine which of them were a good fit for the
team and our requirements. In order to do this in a structured way, on top of exposing any kinds of obvious performance
or robustness problems, we chose to create a simple sandbox model of our problem and run benchmarks on it. This means
that the following case study shouldn't be taken as a "benchmark first" case study.</p>
<h3>Introducing the model</h3>
<p>Let's get coding, finally!</p>
<p>Here's our simple model of the jobs our component will receive (this post contains only the relevant code snippets;
for a more holistic view, see the <a href="https://github.com/zalando/scala-concurrency-playground">respective GitHub project</a>):</p>
<div class="highlight"><pre><span></span><code><span class="n">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">Job</span><span class="p">(</span><span class="n">id</span><span class="p">:</span><span class="w"> </span><span class="n">Int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">payload</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb nb-Type">Array</span><span class="o">.</span><span class="n">fill</span><span class="p">(</span><span class="mi">16000</span><span class="p">)(</span><span class="n">Random</span><span class="o">.</span><span class="n">nextInt</span><span class="p">())</span>
<span class="p">}</span>
<span class="n">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">JobResult</span><span class="p">(</span><span class="n">job</span><span class="p">:</span><span class="w"> </span><span class="n">Job</span><span class="p">,</span><span class="w"> </span><span class="n">result</span><span class="p">:</span><span class="w"> </span><span class="n">Int</span><span class="p">)</span>
<span class="n">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">PublishResult</span><span class="p">(</span><span class="n">result</span><span class="p">:</span><span class="w"> </span><span class="n">JobResult</span><span class="p">)</span>
</code></pre></div>
<p>This is how the computational part of our model looks:</p>
<div class="highlight"><pre><span></span><code><span class="nb">object</span> <span class="n">Computer</span> <span class="p">{</span>
<span class="kn">import</span> <span class="nn">ComputationFollowedByAsyncPublishing._</span>
<span class="k">def</span> <span class="nf">compute</span><span class="p">(</span><span class="n">job</span><span class="p">:</span> <span class="n">Job</span><span class="p">):</span> <span class="n">JobResult</span> <span class="o">=</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">jmh</span> <span class="n">ensures</span> <span class="n">that</span> <span class="n">this</span> <span class="n">really</span> <span class="n">consumes</span> <span class="n">CPU</span>
<span class="n">Blackhole</span> <span class="n">consumeCPU</span> <span class="n">numTokensToConsume</span>
<span class="n">JobResult</span><span class="p">(</span><span class="n">job</span><span class="p">,</span> <span class="n">job</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>It uses the awesome <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> benchmarking library (nicely integrated into
<a href="http://www.scala-sbt.org/">sbt</a> via <a href="https://github.com/ktoso/sbt-jmh">sbt-jmh</a>) and its black hole to do all the work
for us.</p>
<p>And here's the part where we "publish" to other services, which is naturally asynchronous:</p>
<div class="highlight"><pre><span></span><code><span class="nb">object</span> <span class="n">Publisher</span> <span class="p">{</span>
<span class="kn">import</span> <span class="nn">ComputationFollowedByAsyncPublishing._</span>
<span class="o">//</span> <span class="n">we</span> <span class="n">use</span> <span class="n">the</span> <span class="n">scheduler</span> <span class="ow">and</span> <span class="n">the</span> <span class="n">dispatcher</span> <span class="n">of</span> <span class="n">the</span> <span class="n">actor</span> <span class="n">system</span> <span class="n">here</span> <span class="n">because</span> <span class="n">it</span><span class="s1">'s so very convenient</span>
<span class="k">def</span> <span class="nf">publish</span><span class="p">(</span><span class="n">result</span><span class="p">:</span> <span class="n">JobResult</span><span class="p">,</span> <span class="n">system</span><span class="p">:</span> <span class="n">ActorSystem</span><span class="p">):</span> <span class="n">Future</span><span class="p">[</span><span class="n">PublishResult</span><span class="p">]</span> <span class="o">=</span>
<span class="n">after</span><span class="p">(</span><span class="n">publishDuration</span><span class="p">,</span> <span class="n">system</span><span class="o">.</span><span class="n">scheduler</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Future</span><span class="p">(</span><span class="n">PublishResult</span><span class="p">(</span><span class="n">result</span><span class="p">))(</span><span class="n">system</span><span class="o">.</span><span class="n">dispatcher</span><span class="p">)</span>
<span class="p">}</span> <span class="p">(</span><span class="n">system</span><span class="o">.</span><span class="n">dispatcher</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>Notice that we're using the convenient scheduling provided by Akka actor systems here, as we'll have an actor system
running for our other experiments anyway.</p>
<h3>Old-school blocking</h3>
<p>Here's what the good old blocking approach looks like:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">benchmark</span><span class="p">(</span><span class="nl">coreFactor</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="k">exec</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Executors</span><span class="w"> </span><span class="n">newFixedThreadPool</span><span class="w"> </span><span class="n">numWorkers</span><span class="p">(</span><span class="n">coreFactor</span><span class="p">)</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">futures</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">numTasks</span><span class="w"> </span><span class="k">map</span><span class="w"> </span><span class="n">Job</span><span class="w"> </span><span class="k">map</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="n">job</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="k">exec</span><span class="p">.</span><span class="n">submit</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Callable</span><span class="o">[</span><span class="n">PublishResult</span><span class="o">]</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">explicitly</span><span class="w"> </span><span class="n">turn</span><span class="w"> </span><span class="n">async</span><span class="w"> </span><span class="n">publishing</span><span class="w"> </span><span class="k">operation</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">blocking</span><span class="w"> </span><span class="k">operation</span>
<span class="w"> </span><span class="n">override</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="k">call</span><span class="p">()</span><span class="err">:</span><span class="w"> </span><span class="n">PublishResult</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">Await</span><span class="p">.</span><span class="k">result</span><span class="p">(</span><span class="n">Publisher</span><span class="w"> </span><span class="n">publish</span><span class="w"> </span><span class="p">(</span><span class="n">Computer</span><span class="w"> </span><span class="k">compute</span><span class="w"> </span><span class="n">job</span><span class="p">,</span><span class="w"> </span><span class="k">system</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">hour</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">printResult</span><span class="p">(</span><span class="n">futures</span><span class="w"> </span><span class="k">map</span><span class="w"> </span><span class="p">(</span><span class="n">_</span><span class="p">.</span><span class="k">get</span><span class="p">))</span>
<span class="w"> </span><span class="err">}</span><span class="w"> </span><span class="n">finally</span><span class="w"> </span><span class="k">exec</span><span class="p">.</span><span class="k">shutdown</span><span class="p">()</span>
<span class="err">}</span>
</code></pre></div>
<h3>Plain futures</h3>
<p>Using futures instead of blocking everywhere doesn't really look that complicated to me:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">benchmark</span><span class="p">(</span><span class="n">coreFactor</span><span class="p">:</span> <span class="n">Int</span><span class="p">):</span> <span class="n">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="kn">import</span> <span class="nn">system.dispatcher</span>
<span class="o">//</span> <span class="n">execution</span> <span class="n">context</span> <span class="n">only</span> <span class="k">for</span> <span class="n">the</span> <span class="p">(</span><span class="n">cpu</span><span class="o">-</span><span class="n">bound</span><span class="p">)</span> <span class="n">computation</span>
<span class="n">val</span> <span class="n">ec</span> <span class="o">=</span> <span class="n">ExecutionContext</span> <span class="n">fromExecutorService</span> <span class="n">Executors</span><span class="o">.</span><span class="n">newFixedThreadPool</span><span class="p">(</span><span class="n">numWorkers</span><span class="p">(</span><span class="n">coreFactor</span><span class="p">))</span>
<span class="k">try</span> <span class="p">{</span>
<span class="o">//</span> <span class="err">`</span><span class="n">traverse</span><span class="err">`</span> <span class="n">will</span> <span class="n">distribute</span> <span class="n">the</span> <span class="n">tasks</span> <span class="n">to</span> <span class="n">the</span> <span class="n">thread</span> <span class="n">pool</span><span class="p">,</span> <span class="n">the</span> <span class="n">rest</span> <span class="n">happens</span> <span class="n">fully</span> <span class="k">async</span>
<span class="n">printResult</span><span class="p">(</span><span class="n">Await</span><span class="o">.</span><span class="n">result</span><span class="p">(</span><span class="n">Future</span><span class="o">.</span><span class="n">traverse</span><span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="n">numTasks</span> <span class="nb">map</span> <span class="n">Job</span><span class="p">)</span> <span class="p">{</span> <span class="n">job</span> <span class="o">=></span>
<span class="n">Future</span><span class="p">(</span><span class="n">Computer</span> <span class="n">compute</span> <span class="n">job</span><span class="p">)(</span><span class="n">ec</span><span class="p">)</span> <span class="n">flatMap</span> <span class="p">(</span><span class="n">Publisher</span><span class="o">.</span><span class="n">publish</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">system</span><span class="p">))</span>
<span class="p">},</span> <span class="mi">1</span> <span class="n">hour</span><span class="p">))</span>
<span class="p">}</span> <span class="k">finally</span> <span class="n">ec</span><span class="o">.</span><span class="n">shutdown</span><span class="p">()</span>
<span class="p">}</span>
</code></pre></div>
<h3>Actors</h3>
<p>Actors, however, do get a bit more involved. First of all, here's the client distributing the jobs:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">benchmark</span><span class="p">(</span><span class="n">coreFactor</span><span class="p">:</span> <span class="n">Int</span><span class="p">):</span> <span class="n">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="kn">import</span> <span class="nn">system.dispatcher</span>
<span class="n">implicit</span> <span class="n">val</span> <span class="n">timeout</span> <span class="o">=</span> <span class="n">Timeout</span><span class="p">(</span><span class="mi">1</span> <span class="n">hour</span><span class="p">)</span>
<span class="o">//</span> <span class="n">Route</span> <span class="n">computations</span> <span class="n">through</span> <span class="n">a</span> <span class="n">balanced</span> <span class="n">pool</span> <span class="n">of</span> <span class="p">(</span><span class="n">cpu</span> <span class="n">bound</span><span class="p">)</span> <span class="n">computation</span> <span class="n">workers</span><span class="o">.</span>
<span class="n">val</span> <span class="n">router</span> <span class="o">=</span> <span class="n">system</span> <span class="n">actorOf</span> <span class="n">BalancingPool</span><span class="p">(</span><span class="n">numWorkers</span><span class="p">(</span><span class="n">coreFactor</span><span class="p">))</span><span class="o">.</span><span class="n">props</span><span class="p">(</span><span class="n">Props</span><span class="p">[</span><span class="n">ComputeActor</span><span class="p">])</span>
<span class="k">try</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">Collect</span> <span class="n">the</span> <span class="n">results</span><span class="p">,</span> <span class="nb">sum</span> <span class="n">them</span> <span class="n">up</span> <span class="ow">and</span> <span class="nb">print</span> <span class="n">the</span> <span class="nb">sum</span><span class="o">.</span>
<span class="n">printResult</span><span class="p">(</span><span class="n">Await</span><span class="o">.</span><span class="n">result</span><span class="p">(</span><span class="n">Future</span><span class="o">.</span><span class="n">traverse</span><span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="n">numTasks</span> <span class="nb">map</span> <span class="n">Job</span><span class="p">)</span> <span class="p">{</span> <span class="n">job</span> <span class="o">=></span>
<span class="p">(</span><span class="n">router</span> <span class="err">?</span> <span class="n">job</span><span class="p">)</span><span class="o">.</span><span class="n">mapTo</span><span class="p">[</span><span class="n">PublishResult</span><span class="p">]</span>
<span class="p">},</span> <span class="mi">1</span> <span class="n">hour</span><span class="p">))</span>
<span class="p">}</span> <span class="k">finally</span> <span class="n">router</span> <span class="err">!</span> <span class="n">PoisonPill</span>
<span class="p">}</span>
</code></pre></div>
<p>The <em>ComputeActor</em> is just an actor wrapper around the computation; it delegates work to the actor responsible for
publishing:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">ComputeActor</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">Actor</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">publisher</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">context</span><span class="w"> </span><span class="n">actorOf</span><span class="w"> </span><span class="n">Props</span><span class="o">[</span><span class="n">PublishActor</span><span class="o">]</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">receive</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nl">job</span><span class="p">:</span><span class="w"> </span><span class="n">Job</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">tell</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">publisher</span><span class="w"> </span><span class="n">about</span><span class="w"> </span><span class="n">who</span><span class="w"> </span><span class="n">sent</span><span class="w"> </span><span class="n">us</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">job</span><span class="p">,</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">job</span><span class="w"> </span><span class="n">results</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sender</span><span class="p">()</span>
<span class="w"> </span><span class="n">publisher</span><span class="w"> </span><span class="err">!</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">Computer</span><span class="w"> </span><span class="k">compute</span><span class="w"> </span><span class="n">job</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>Finally, the actor wrapper around publishing the results:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">PublishActor</span> <span class="n">extends</span> <span class="n">Actor</span> <span class="p">{</span>
<span class="kn">import</span> <span class="nn">context.dispatcher</span>
<span class="k">def</span> <span class="nf">receive</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">case</span> <span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="n">ActorRef</span><span class="p">,</span> <span class="n">r</span><span class="p">:</span> <span class="n">JobResult</span><span class="p">)</span> <span class="o">=></span>
<span class="o">//</span> <span class="n">just</span> <span class="n">pipe</span> <span class="n">the</span> <span class="n">result</span> <span class="n">back</span> <span class="n">to</span> <span class="n">the</span> <span class="n">original</span> <span class="n">sender</span>
<span class="n">Publisher</span><span class="o">.</span><span class="n">publish</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">context</span><span class="o">.</span><span class="n">system</span><span class="p">)</span> <span class="n">pipeTo</span> <span class="n">s</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h3>Streams, using RxScala</h3>
<p>Streaming our jobs through RxScala looks really beautiful and concise in my eyes. I'm not sure it's correctly doing
what it should, though, as it blows up the heap when running. I'm afraid there's a memory leak in there somewhere, and
we shouldn’t have to deal with an opaque issue like that.</p>
<div class="highlight"><pre><span></span><code><span class="nv">def</span><span class="w"> </span><span class="nv">benchmark</span>:<span class="w"> </span><span class="nv">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>{
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="nv">looks</span><span class="w"> </span><span class="nv">nice</span>,<span class="w"> </span><span class="nv">not</span><span class="w"> </span><span class="nv">sure</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">correct</span>,<span class="w"> </span><span class="nv">blows</span><span class="w"> </span><span class="nv">up</span><span class="w"> </span><span class="nv">the</span><span class="w"> </span><span class="nv">heap</span>
<span class="w"> </span><span class="nv">Observable</span>
<span class="w"> </span>.<span class="nv">from</span><span class="ss">(</span><span class="mi">1</span><span class="w"> </span><span class="nv">to</span><span class="w"> </span><span class="nv">numTasks</span><span class="w"> </span><span class="nv">map</span><span class="w"> </span><span class="nv">Job</span><span class="ss">)</span>
<span class="w"> </span>.<span class="nv">subscribeOn</span><span class="ss">(</span><span class="nv">ComputationScheduler</span><span class="ss">())</span>
<span class="w"> </span>.<span class="nv">map</span><span class="ss">(</span><span class="nv">Computer</span><span class="w"> </span><span class="nv">compute</span><span class="ss">)</span>
<span class="w"> </span>.<span class="nv">subscribeOn</span><span class="ss">(</span><span class="nv">ExecutionContextScheduler</span><span class="ss">(</span><span class="nv">system</span><span class="w"> </span><span class="nv">dispatcher</span><span class="ss">))</span>
<span class="w"> </span>.<span class="nv">flatMap</span><span class="ss">(</span><span class="mi">1024</span>,<span class="w"> </span><span class="nv">r</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="nv">Observable</span>.<span class="nv">from</span><span class="ss">(</span><span class="nv">Publisher</span><span class="w"> </span><span class="nv">publish</span><span class="w"> </span><span class="ss">(</span><span class="nv">r</span>,<span class="w"> </span><span class="nv">system</span><span class="ss">))(</span><span class="nv">system</span><span class="w"> </span><span class="nv">dispatcher</span><span class="ss">))</span>
<span class="w"> </span>.<span class="nv">foldLeft</span><span class="ss">(</span><span class="mi">0</span><span class="ss">)</span><span class="w"> </span>{<span class="w"> </span><span class="nv">case</span><span class="w"> </span><span class="ss">(</span><span class="nv">s</span>,<span class="w"> </span><span class="nv">r</span><span class="ss">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="nv">s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nv">computeResult</span><span class="ss">(</span><span class="nv">r</span><span class="ss">)</span><span class="w"> </span>}
<span class="w"> </span>.<span class="nv">foreach</span><span class="ss">(</span><span class="nv">println</span><span class="ss">)</span>
}
</code></pre></div>
<h3>Streams, using Akka Streams</h3>
<p>Akka Streams, in contrast, look a little more complex. This is due to a conscious decision by the Akka
team to create a more abstract DSL in order to encourage the reuse of partial flow graphs. On second look, the
above RxScala code might be a bit deceptive in its conciseness due to the simple nature of our model. If the pipelining
becomes more complex, Akka's composable graph DSL might be a better fit for keeping things readable and under control.</p>
<p>First, we create a helper flow responsible for balanced routing of a workload. This is basically copied from the
<a href="http://doc.akka.io/docs/akka/2.4.7/scala/stream/stream-cookbook.html#Balancing_jobs_to_a_fixed_pool_of_workers">respective Akka Streams cookbook
documentation</a>:</p>
<div class="highlight"><pre><span></span><code><span class="n">private</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">balancer</span><span class="o">[</span><span class="n">In, Out</span><span class="o">]</span><span class="p">(</span><span class="nl">worker</span><span class="p">:</span><span class="w"> </span><span class="n">Flow</span><span class="o">[</span><span class="n">In, Out, Any</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="nl">workerCount</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Flow</span><span class="o">[</span><span class="n">In, Out, NotUsed</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">Flow</span><span class="w"> </span><span class="n">fromGraph</span><span class="w"> </span><span class="n">GraphDSL</span><span class="p">.</span><span class="k">create</span><span class="p">()</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="n">implicit</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">balancer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">Balance</span><span class="o">[</span><span class="n">In</span><span class="o">]</span><span class="p">(</span><span class="n">workerCount</span><span class="p">,</span><span class="w"> </span><span class="n">waitForAllDownstreams</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">false</span><span class="p">)</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="k">merge</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="k">Merge</span><span class="o">[</span><span class="n">Out</span><span class="o">]</span><span class="p">(</span><span class="n">workerCount</span><span class="p">)</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">workerCount</span><span class="w"> </span><span class="n">foreach</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">balancer</span><span class="w"> </span><span class="o">~></span><span class="w"> </span><span class="n">worker</span><span class="p">.</span><span class="n">async</span><span class="w"> </span><span class="o">~></span><span class="w"> </span><span class="k">merge</span><span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">FlowShape</span><span class="p">(</span><span class="n">balancer</span><span class="p">.</span><span class="ow">in</span><span class="p">,</span><span class="w"> </span><span class="k">merge</span><span class="p">.</span><span class="k">out</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>With this helper at hand, we can create the graph and run it. The central piece of code here is <em>source ~> balanced
~> publish ~> sink.in</em>.</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="n">benchmark</span><span class="p">(</span><span class="nl">coreFactor</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">)(</span><span class="n">implicit</span><span class="w"> </span><span class="k">system</span><span class="err">:</span><span class="w"> </span><span class="n">ActorSystem</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">sink</span><span class="w"> </span><span class="n">that</span><span class="w"> </span><span class="n">computes</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nf">sum</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">sink</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Sink</span><span class="p">.</span><span class="n">fold</span><span class="o">[</span><span class="n">Int, PublishResult</span><span class="o">]</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="p">(</span><span class="nf">sum</span><span class="p">,</span><span class="w"> </span><span class="n">job</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="nf">sum</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">computeResult</span><span class="p">(</span><span class="n">job</span><span class="p">)</span><span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">wiring</span><span class="w"> </span><span class="n">up</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">streams</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">g</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RunnableGraph</span><span class="w"> </span><span class="n">fromGraph</span><span class="w"> </span><span class="n">GraphDSL</span><span class="p">.</span><span class="k">create</span><span class="p">(</span><span class="n">sink</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="n">implicit</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">sink</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">preparations</span><span class="p">...</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">Source</span><span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">numTasks</span><span class="w"> </span><span class="k">map</span><span class="w"> </span><span class="n">Job</span><span class="p">)</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="k">compute</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Flow</span><span class="p">.</span><span class="n">fromFunction</span><span class="p">(</span><span class="n">Computer</span><span class="w"> </span><span class="k">compute</span><span class="p">).</span><span class="n">withAttributes</span><span class="p">(</span><span class="n">ActorAttributes</span><span class="w"> </span><span class="n">dispatcher</span><span class="w"> </span><span class="ss">"compute-dispatcher"</span><span class="p">)</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">balanced</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">balancer</span><span class="p">(</span><span class="k">compute</span><span class="p">,</span><span class="w"> </span><span class="n">numWorkers</span><span class="p">(</span><span class="n">coreFactor</span><span class="p">)).</span><span class="n">async</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">publish</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">Flow</span><span class="o">[</span><span class="n">JobResult</span><span class="o">]</span><span class="p">.</span><span class="n">mapAsyncUnordered</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="n">Publisher</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span><span class="n">_</span><span class="w"> </span><span class="p">,</span><span class="w"> </span><span class="k">system</span><span class="p">)</span><span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">finally</span><span class="p">,</span><span class="w"> </span><span class="n">here</span><span class="err">'</span><span class="n">s</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">graph</span>
<span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="o">~></span><span class="w"> </span><span class="n">balanced</span><span class="w"> </span><span class="o">~></span><span class="w"> </span><span class="n">publish</span><span class="w"> </span><span class="o">~></span><span class="w"> </span><span class="n">sink</span><span class="p">.</span><span class="ow">in</span>
<span class="w"> </span><span class="n">ClosedShape</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Running</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="n">will</span><span class="w"> </span><span class="n">materialize</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">future</span><span class="w"> </span><span class="nc">int</span><span class="p">.</span><span class="w"> </span><span class="n">We</span><span class="w"> </span><span class="n">wait</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="k">print</span><span class="w"> </span><span class="n">it</span><span class="p">.</span>
<span class="w"> </span><span class="n">println</span><span class="p">(</span><span class="n">Await</span><span class="p">.</span><span class="k">result</span><span class="p">(</span><span class="n">g</span><span class="p">.</span><span class="n">run</span><span class="p">()(</span><span class="n">ActorMaterializer</span><span class="p">()),</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">hour</span><span class="p">))</span>
<span class="err">}</span>
</code></pre></div>
<h3>Benchmarking results</h3>
<p>When running the benchmarks, the main result for us was that, with the numbers from our problem, the choice of approach
doesn't really matter when it comes to pure runtime performance. We observe only very slight differences, if
any.</p>
<p>Here are the results from one specific lengthy run, as an example:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/29a6a9659793da991275c77a2126bc2b0740aa36_benchmarking-results-jhofer.png?auto=compress,format"></p>
<p>The Akka Streams implementation seems to outperform all others very slightly, while the blocking implementation usually
fares slightly worse, presumably due to thread-switching overhead.</p>
<p>The main outlier is RxScala, which unfortunately throws an OutOfMemoryError very quickly, as previously mentioned. We
happily accept any hints about what we might be doing wrong here; the code looks pretty straightforward, though.</p>
<p>Having said that, there are some significant differences in runtime behavior.</p>
<p>One very obvious difference is the number of threads used, as you might expect. Plain futures and Akka Streams are both
very economical when it comes to threads. Actors seem to use a few more; this could be a question of tuning the
configuration, I suspect. By far the most threads are required by the blocking approach, who would have
thought?</p>
<p>Also, when varying the numbers, we notice that futures, streams, and actors are all far more consistent in their runtime
behavior, whereas the blocking approach has to be re-tuned each time to fit the specific circumstances. This means that the
blocking approach is not very robust in a live environment where the numbers are in constant flux.</p>
<h3>Summary</h3>
<p>These results leave us with three good alternatives: Using Scala futures, using Akka actors, or going for Akka streams.</p>
<p>Using futures has the advantage of simplicity. However, I'm afraid this simplicity can be a bit deceiving as the
system grows more complex. Sure, futures compose, but it can quickly become difficult to reason about the flow of
futures. There's also the pitfall of not noticing failures with futures: one misplaced foreach can have you searching
forever for a failure you observed but can't pinpoint.</p>
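<p>As a small illustration of that pitfall (a sketch, not part of the benchmark code; <em>riskyComputation</em> is a hypothetical stand-in): <em>foreach</em> runs its callback only on success, so a failed future vanishes silently unless you also handle the failure case, e.g. via <em>onComplete</em>:</p>
<div class="highlight"><pre><code>import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

def riskyComputation(): Int = throw new RuntimeException("boom") // hypothetical example

val result: Future[Int] = Future(riskyComputation())

// Pitfall: foreach invokes its callback only on success,
// so the exception above disappears without a trace.
result.foreach(r => println(s"got $r"))

// Safer: handle both outcomes explicitly.
result.onComplete {
  case Success(r)  => println(s"got $r")
  case Failure(ex) => println(s"computation failed: $ex")
}
</code></pre></div>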
<p>Using Akka streams, on the other hand, imposes some overhead in getting up and running. It requires learning the DSL,
and also understanding what's going on under the hood to some extent. We might reap some benefits from this investment
as soon as the component starts growing in complexity, as outlined above. Concerning error handling, streams should be
quite well-behaved, as you can basically define a supervision strategy for each node in your flow graph.</p>
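<p>For instance (a sketch under the assumptions of the benchmark code above, reusing its <em>JobResult</em>, <em>Publisher</em>, and <em>system</em>), a per-stage supervision strategy can be attached via stream attributes so that a failing element is dropped instead of tearing down the whole stream:</p>
<div class="highlight"><pre><code>import akka.stream.ActorAttributes.supervisionStrategy
import akka.stream.Supervision
import akka.stream.scaladsl.Flow

// Drop the offending element and keep the stream alive when publishing fails.
val resilientPublish = Flow[JobResult]
  .mapAsyncUnordered(1024)(Publisher.publish(_, system))
  .withAttributes(supervisionStrategy(Supervision.resumingDecider))
</code></pre></div>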
<p>If Akka streams abstract away too many details for you, using actors directly might be a good option. This
approach gives you a lot of direct control over, and insight into, what's going on. The actor model in general is also
tried and proven. Error handling via the Akka supervisor model is straightforward. The main problem with actors is that
they are based entirely on messages and side effects. You have to be careful to deal with the emerging complexity by
getting your actor hierarchies right and testing them well.</p>
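<p>As a sketch of that supervisor model (not part of the benchmark; it assumes the <em>ComputeActor</em> shown earlier), a parent actor can declare how worker failures are handled declaratively:</p>
<div class="highlight"><pre><code>import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart}
import scala.concurrent.duration._

class ComputeSupervisor extends Actor {
  // Restart a crashed worker up to three times per minute; escalate anything else.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: ArithmeticException => Restart
      case _                      => Escalate
    }

  val worker = context actorOf Props[ComputeActor]

  def receive = { case msg => worker forward msg }
}
</code></pre></div>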
<p>So there you have it. The full playground of code can be found <a href="https://github.com/zalando/scala-concurrency-playground">on GitHub</a>, so go and play with it yourself! We’d be happy to hear from you with comments or improvements.</p>Revolutionising fashion at our Dublin HQ2016-06-28T00:00:00+02:002016-06-28T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-06-28:/posts/2016/06/revolutionising-fashion-at-our-dublin-hq.html<p>Zalando Tech’s Dublin Hub is getting some attention with the help of Tech/Life Ireland.</p><p>Zalando Tech’s <a href="https://tech.zalando.de/locations/#dublin">Dublin Hub</a> is getting some attention with the help of
<a href="https://www.techlifeireland.com/">Tech/Life Ireland</a>, an initiative of the Government of Ireland under the ICT Skills
Action Plan.</p>
<p>In a bid to attract more top-tier tech talent to Dublin, we’ve collaborated on a new project that showcases the
strengths of Dublin as an emerging tech hub, on top of Zalando’s efforts in the field of Data Science. At our Fashion
Insights Centre in the heart of Dublin’s tech district, we’ve set ourselves up to understand fashion through technology
by working with one of the richest datasets in eCommerce. Our focus on data is all in support of the Zalando fashion
platform.</p>
<p>Fashion is always in flux, so it’s crucial that we continue to push the boundaries of what we can achieve in the realm
of Data Science. <a href="https://tech.zalando.de/blog/dublin-data-science-tour/">Dr Ana Peleteiro</a> is one of the many talented
Data Scientists at Zalando that keep us in the game, and made a great candidate for our collaboration with Tech/Life
Ireland.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1d78ecca3a9b522fb9057ab441645028c3ff6611_img_4638.jpg?auto=compress,format"></p>
<p>Tech/Life Ireland is funded by the Department of Jobs, Enterprise, and Innovation in partnership with Enterprise
Ireland, IDA Ireland, and the Irish Technology Industry.</p>Feature Extraction: Science or Engineering?2016-06-23T00:00:00+02:002016-06-23T00:00:00+02:00Antonino Frenotag:engineering.zalando.com,2016-06-23:/posts/2016/06/feature-extraction-science-or-engineering.html<p>It's time for feature engineering to stop being neglected in the machine learning literature.</p><p>Every time our customers visit the Zalando Fashion Store, we want to serve them personalised product recommendations,
depending on their preferences. Some people like leather jackets, others don't. Others love ankle boots, while some
prefer sneakers. Some follow the latest trends, and others prefer the classic style.</p>
<p>In a nutshell, the task of a personalised recommender system involves building user profiles from their behavior and
predicting which product recommendations will be most relevant to such profiles. Intuitively, the user profile specifies
properties such as how much interest the customer has in sportswear, or whether flat heels are preferred over high heel
shoes.</p>
<p>Machine learning offers plenty of different algorithms for building personalised recommendation engines, ranging from
collaborative filtering to content-based ranking. Virtually any model, especially content-based ones, relies on a
representation of the user profile (as well as the candidate recommendations) as a collection of attributes (or
“features”) to be served as input to the recommender system. An overwhelming majority of the available machine learning
literature assumes that suitable features exist somewhere in a data store and are just waiting to be fed into the
designed algorithm.</p>
<p>Yet, coming up with a suitable collection of features usually takes way more time and effort than learning a reasonably
accurate model from the data. This task is usually referred to as "feature engineering". In this post, I'll sketch out
the solution we came up with for organising the feature extraction jobs that are run behind our recommender systems.
I'll also challenge some assumptions that people often make when thinking of feature engineering as preliminary to
predictive modeling.</p>
<h3>Zalando’s approach to feature engineering</h3>
<p>The goal with our solution was to give proper emphasis to a task that is arguably the most crucial part of the whole
machine learning pipeline. At the same time, I want to highlight the importance of correctly understanding the impact
of what is probably the most belittled (or at least neglected) data modeling task ever.</p>
<p>Let's consider first how our feature extraction jobs are combined into a pipeline, where the final step results in a
machine-learned ranking model. Take a look at the following diagram:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/df0d47562bbfe1d272f44d6942a9602213df998b_fart2ml----blog.png?auto=compress,format"></p>
<p>Everything starts from user action logs, which record every action performed by customers in the shop. The first job for
us is to aggregate all actions on a per-user basis. The difficulty here is that the volume of server logs is massive and
distributed over several machines. Therefore, we cannot afford to crunch all of this data every time we need to retrieve
actions for a given customer. Pre-aggregation of customer actions allows subsequent jobs to selectively retrieve the
relevant information in a much more convenient way.</p>
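<p>In miniature, this pre-aggregation step is just a group-by over the raw log; a hypothetical sketch (the real jobs of course run distributed over many machines, and the field names here are invented):</p>

```scala
// Hypothetical shape of a raw action log entry.
case class LogEntry(userId: String, action: String, articleId: String)

val log = List(
  LogEntry("u1", "view", "a1"),
  LogEntry("u2", "view", "a1"),
  LogEntry("u1", "buy",  "a1"),
  LogEntry("u1", "view", "a2")
)

// Aggregate once per user, so that downstream jobs can fetch a single key
// instead of scanning the whole log every time they need one customer's history.
val byUser: Map[String, List[LogEntry]] = log.groupBy(_.userId)

assert(byUser("u1").size == 3)
assert(byUser("u2").map(_.articleId) == List("a1"))
```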
<p>Once the user histories have been aggregated, they are used for two purposes. On the one hand, we can extract a number
of dynamic article features (i.e. quantities that change over time), such as number of visits received by the product
detail page over a given time window, number of purchases, and so on.</p>
<p>Moreover, we extract a wide range of user attributes, such as time elapsed since the user was last active on the shop,
or how the user choices are distributed over different product brands. Optionally, user feature extraction can also
exploit already computed article features. For example, if we want to measure how user purchases are affected by product
popularity, then for each purchase, we need to check how popular the corresponding product was at the time it was
purchased (e.g. in terms of click-through rate).</p>
<p>Finally, user and article features are combined together in order to estimate a scoring function that can rank
user-article pairs in terms of goodness of fit. Ideally, an accurate ranking model will give higher scores to pairs such
that the article is more relevant to the user profile, as opposed to other available candidates from the product
catalog. The learned scoring function is deployed to our live services in order to rank the recommendations we serve, in
different contexts, to Zalando customers.</p>
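<p>As a toy illustration of this final step, a linear scoring function over the combined user and article features might look as follows. All feature names and weights are hypothetical; in the real system the weights are learned from data:</p>

```scala
// Features as named attributes; user and article vectors share one weight space.
type Features = Map[String, Double]

// Score a (user, article) pair: combine the two vectors, then take a
// weighted sum. Unknown features simply get weight zero.
def score(user: Features, article: Features, weights: Features): Double =
  (user ++ article.map { case (k, v) => s"article_$k" -> v })
    .map { case (k, v) => weights.getOrElse(k, 0.0) * v }
    .sum

val user    = Map("sportswear_affinity" -> 0.8)
val weights = Map("sportswear_affinity" -> 1.0, "article_popularity" -> 0.5)

val sneaker = Map("popularity" -> 0.9)
val boot    = Map("popularity" -> 0.2)

// Rank the candidate articles for this user by descending score.
val ranked = List("sneaker" -> sneaker, "boot" -> boot)
  .sortBy { case (_, article) => -score(user, article, weights) }
assert(ranked.head._1 == "sneaker")
```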
<h3>Challenges and opportunities</h3>
<p>As it's clear from the workflow sketched above, feature extraction poses significant challenges in terms of
architectural design and computational burden. Coming up with an optimal design is hard because of the rich
interdependencies linking the different jobs, where even a small problem in a single component can have a tremendous
downstream impact on the whole system.</p>
<p>Moreover, the computational burden arises from having to deal with far more data than we'll actually use in
the final predictive modeling tasks. Whatever information we decide to extract for use in a machine learning model, we
have to extract it from a virtually unbounded amount of data. Here, useful signal is buried under massive
collections of seemingly chaotic events, which can be very hard to bring to meaningful representations. Yet, the quality
of the resulting predictions will depend much more significantly on the quality of the used features than on any other
modeling aspect. As Pedro Domingos puts it in <a href="https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf">this paper</a>,
“[a]t the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the
most important factor is the features used.”</p>
<p>This brings us to the core problem I want to emphasise. Because of its sheer technical complexity, it takes a serious
engineering effort to design a robust and efficient feature extraction pipeline. Additionally, because of its downstream
impact on the entire predictive modeling workflow, it is utterly impossible to build long-lasting infrastructure for
feature extraction without deep knowledge of the modeling requirements that will arise afterwards from the machine
learning components.</p>
<p>Therefore, outstanding engineering skills and strong machine learning experience are crucial, non-separable requirements
of the feature engineering process. Here, my claim is that any attempt to separate concerns for software architecture
design on the one hand and machine learning modeling on the other hand is bound to affect the chance of success. Based
on this consideration, it's increasingly baffling how little attention this task has attracted so far in the
scientific community.</p>
<p>Feature engineering has been typically neglected in the machine learning literature, as it has always been regarded as
“engineering, not science”. However, feature engineering cries for scientific analysis. In my opinion, the only reason
it's yet to be regarded as a science is that its laws have never been investigated, let alone understood, by scientists.</p>Interview preparation tips for Java developers2016-06-22T00:00:00+02:002016-06-22T00:00:00+02:00Sean Patrick Floydtag:engineering.zalando.com,2016-06-22:/posts/2016/06/interview-preparation-tips-for-java-developers.html<p>Do you program in Java? Are you preparing for an interview? Then you should read this.</p><p>Technical interviews are part and parcel of the job hunting journey for developers. They’re usually the only chance
developers have to convince companies they’re the right fit for the role. Skills such as problem solving and critical
thinking are all high on the wish list for prospective employers, but what about the nitty gritty of your preferred
programming language?</p>
<p>Java programming roles need to cover a lot of ground when it comes to knowledge and processes. We’ve put together a list
of essential points that developers should be familiar with when applying for a Java development position.</p>
<ul>
<li>Know <a href="http://etutorials.org/cert/java+certification/Chapter+11.+Collections+and+Maps/11.7+Implementing+the+equals+hashCode+and+compareTo+Methods/">the contracts of the equals / hashCode /
compareTo</a>
methods, including different ways to implement them, plus the reasoning why to go with them. You should have at
least a superficial understanding of how these methods are used in common data structures. It’s also best to know
contracts and implementation details of the major collection types: Understand <a href="http://stackoverflow.com/a/21974362/342852">when a Set is more appropriate than
a List</a>, or in what situations a LinkedList is preferable to an
ArrayList. Be aware of the <a href="http://infotechgems.blogspot.de/2011/11/java-collections-performance-time.html">time and space complexities of common usage
patterns</a>.</li>
<li>Embrace <a href="http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html">immutability and the reasoning behind using
it</a>. Work at understanding the difference
between <a href="http://stackoverflow.com/a/7713332/342852">unmodifiable views and immutable copies</a>. How can you make a
class immutable if it contains potentially mutable data like collections, dates, and arrays?</li>
<li>Think about <a href="https://en.wikipedia.org/wiki/Correctness_(computer_science)">correctness</a>. Embrace techniques like
Design by Contract, or Defensive Programming, on top of <a href="https://github.com/google/guava/wiki/UsingAndAvoidingNullExplained">understanding that null is
evil</a> and what the alternatives are.</li>
<li><a href="https://docs.oracle.com/javase/tutorial/essential/concurrency/">Understand concurrency</a>, both on a low level
(threads, synchronized blocks) and on a high level (<em>ExecutorServices</em>, <em>ConcurrentMaps</em>, <em>BlockingQueues</em>,
<em>ReadWriteLocks</em>). Think about when <a href="http://www.ibm.com/developerworks/library/j-ft18/">lazy evaluation is preferable to eager
copies</a>.</li>
<li>Think about testing and know how to write testable code. You’ll be expected to know how to write readable tests,
using the appropriate level of abstraction ( <a href="http://hamcrest.org/JavaHamcrest/">Hamcrest</a>,
<a href="https://github.com/jayway/JsonPath/blob/master/json-path-assert/src/main/java/com/jayway/jsonassert/JsonAssert.java">JsonAssert</a>,
etc.), and which corner cases to cover in your tests (@Parameterized runner, property testing). Understand what to
mock and how, as well as knowing about the testing pyramid and common approaches in testing (TDD, BDD etc.).</li>
<li>Find the right level of abstraction. Object-oriented design is about finding the sweet spot somewhere between
<a href="https://blog.codinghorror.com/new-programming-jargon/">“stringly typed” and “Baklava code”</a>. You’ll want to use
functional idioms, but don’t overuse them: <a href="https://www.infoq.com/articles/How-Functional-is-Java-8">Java is not a functional
language</a>. Know your design patterns, but don’t
over-engineer. Embrace the <a href="http://martinfowler.com/bliki/InversionOfControl.html">Inversion of Control</a>, whether
you use a framework like Spring or manual techniques.</li>
<li>Microservices are another essential topic. There are many frameworks out there that will help you
build, deploy, monitor and interact with microservices. Master at least one REST framework (Spring Boot, Play,
DropWizard etc.). Make sure you are aware of concepts like scaling and resilience, event-driven architectures, and
asynchronous logging, as well as having a conceptual understanding of REST. Beyond that, know at least the
basics about containers (e.g. Docker), clouds (e.g. AWS), build systems, and continuous delivery.</li>
<li>Let's talk about data. You’ll need to know both traditional SQL databases (we love Postgres) and different types of
NoSQL datastores (Redis, Cassandra, DynamoDB etc.). Work to understand Message Queues (Kafka, Kinesis) and Big Data
engines (Spark, Storm, Flink).</li>
</ul>
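<p>The post targets Java, but the distinction between unmodifiable views and immutable copies from the second bullet can be sketched in a few lines of Scala (chosen here for brevity; Java's <code>Collections.unmodifiableList</code> behaves like the view below, reflecting changes to the backing list):</p>

```scala
import scala.collection.mutable.ArrayBuffer

val underlying = ArrayBuffer(1, 2, 3)
val copy = underlying.toList   // immutable snapshot: later mutation cannot reach it
val view = underlying.view     // read-only *view*: still backed by the mutable buffer

underlying += 4

assert(copy == List(1, 2, 3))            // the copy is unaffected
assert(view.toList == List(1, 2, 3, 4))  // the view observes the mutation
```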
<p>We also recommend that Java developers read the following books: <a href="https://www.amazon.de/Effective-Java-2nd-Programming-Language/dp/0321356683">Effective
Java</a>, <a href="https://www.amazon.de/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882">Clean
Code</a>, and <a href="https://www.amazon.de/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601">Java Concurrency in
Practice</a>. It’s also good practice to hone
your craft on <a href="https://www.hackerrank.com/">HackerRank</a> and/or on <a href="http://stackoverflow.com/">StackOverflow</a>.</p>
<p>While you won’t reasonably be able to master all of the above, you should try to understand most of them on a
superficial level and deep-dive into several of these points. We’re just one of many companies <a href="https://tech.zalando.de/jobs/tech/">who are looking
for</a> Java developers right now. Happy job hunting!</p>Integrated Commerce and our Merchant Center rebuild2016-06-21T00:00:00+02:002016-06-21T00:00:00+02:00Philipp Metzlertag:engineering.zalando.com,2016-06-21:/posts/2016/06/integrated-commerce-merchant-centre-rebuild.html<p>Get an insight from Brand Solutions into our work behind the Zalando Merchant Center rebuild.</p><p>Zalando’s vision is to connect people and fashion. To do that, we want to get access to potentially all the fashion
stock that’s out there in the world. This means connecting to stock from every possible source.</p>
<p>Therefore, stock integration for Zalando is divided into two areas: e-commerce stock, where we connect e-commerce
warehouses from other brands and e-tailers, and offline stock, where we connect to brick and mortar stores, digitalise
their stock, and make it visible and available online.</p>
<h3>The goal of the rebuild</h3>
<p>What we’re trying to do is to make every product available for any customer. When Zalando has access to stock sitting in
local stores or decentralised warehouses, this enables a broader assortment offering and faster delivery for customers,
as well as the potential to offer pick-up of items directly in store, a feature we’re working on via the rebuild of our
Merchant Center.</p>
<p>The rebuild will offer a new way to connect our stock partners to the Zalando Platform. It will allow partners to
connect to the Merchant Center API directly, or use the Merchant Center frontend in order to upload articles and receive
orders manually. The new elements we’re implementing will save time for our partner brands, on top of ensuring better
usability of the platform. How does this happen from a technical standpoint?</p>
<h3>Under the hood</h3>
<p>The major effort of stock integration is connecting the system landscape of brands and retailers to Zalando’s system. On
the warehouse side, this is currently achieved by connecting partners via an aggregator (e.g. Anatwine, Tradebyte) that
connects the partner’s systems (ERP or Shop System) to their own. For brick-and-mortar stores, the rebuild of the
Merchant Center involves Zalando’s Fashion Connector team in Helsinki, who have created customised interfaces to easily
integrate existing solutions into our platform.</p>
<p>Our current Partner Program has its technological foundation situated firmly in Java and the SOAP protocol. From an
operational point of view, only backend interfaces exist: There is no frontend that lets external partners manage the
process themselves, although internal frontends exist for managing partners within Zalando.</p>
<p>What we’re planning for our Merchant Center is to replace this with general interfaces that are based on Zalando’s
current transition from a shop <a href="https://tech.zalando.de/blog/data-integration-in-a-world-of-microservices/">monolith to
microservices</a>. We’ll also be utilising AWS
and RESTful architectural styles that incorporate both backend and frontend technologies. AWS simplifies the deployment
of new versions of our services, on top of ensuring scalability and easy maintainability.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/519af30f9da9be2bca4aaa5d7759dc2a1a3546b8_integrated-commerce_en.jpg?auto=compress,format"></p>
<p>In the backend, Scala provides a type-safe and performant way of writing services, allowing us to embrace asynchronous
computing and build scalable, resilient systems. It is the place where functional programming meets the object-oriented
world, making it an excellent tool for solving business-critical challenges.</p>
<p>Together with Scala, we use Akka, the toolkit and runtime for building highly concurrent, distributed, and resilient
message-driven applications on the JVM. This gives us an implementation of the actor model – a useful abstraction for
building distributed systems. To incorporate RESTful architectural styles, we use Akka-HTTP for building REST APIs, as
well as being a performant HTTP layer.</p>
<p>RDS, backed by PostgreSQL, gives us a scalable and managed SQL solution in the cloud, which we use as durable storage
for critical data.</p>
<p>For the frontend, our integration of AngularJS gives us the ability to modularize our code, allowing autonomous teams to
maintain and deploy their own modules. We’re also able to use test-driven development practices (TDD), thanks to its
built-in dependency injection mechanism.</p>
<p>AngularJS has a big community where we can find ready-to-use plugins and feedback if needed. It’s currently used in
conjunction with Typescript and ES6 to enhance the productivity and maintainability of our code.</p>
<h3>Benefits for brands and partners</h3>
<p>The Merchant Center rebuild will allow us to fully utilise the components of the Zalando core platform, on top of an
integration capacity to sell items on different consumer-facing applications such as the Zalando Shop, Lounge, and
ZipCart. From a product management perspective, the whole process is currently working in an iterative fashion, where
constant feedback loops (based on prototypes and MVPs) and waste elimination is key.</p>
<p>Partner brands will have frontend capacities in the rebuilt Merchant Center, unlike now, meaning that they’ll be able to
operate the technological components required to onboard articles and define prices, stock, and their location. They’ll
also be able to complete general order fulfilment, customer returns, and everything related to financial processes. This
gives brands independence within the platform, rather than relying completely on Zalando to be integrated.</p>
<p>Supporting our partners on a technological level is important, with the end goal for both Zalando and brands being
better options for customers. Accessing every possible piece of stock for every potential customer is strengthened by
building partnerships, and improving the technological landscape to make it happen.</p>The Product Specialist role in a Distributed Team Setup2016-06-15T00:00:00+02:002016-06-15T00:00:00+02:00Carsten Ernsttag:engineering.zalando.com,2016-06-15:/posts/2016/06/the-product-specialist-role-in-a-distributed-team-setup.html<p>Hear all about team autonomy and Product Management at Zalando Tech.</p><p>My name is Carsten and I joined Zalando 3.5 years ago as a Product Manager for mobile apps. I’ve spent most of my time
with Zalando’s Fashion Store app for iOS, but for about a year now, I’ve been working on a new app product called
<a href="http://www.fleekapp.de/">fleek</a>. fleek is a fashion e-commerce app that connects mobile-savvy consumers with brands,
retailers, and influencers.</p>
<p>As part of recent organizational changes focused on increasing team autonomy, my job at Zalando has evolved:
I’ve now become a Product Specialist. In this post I’ll explain what that means for me as a Product Specialist in
my day-to-day work.</p>
<h3>Product Management before 2015</h3>
<p>It has been quite a journey for Zalando in product management since I joined at the end of 2012. Back then, Zalando was
in the middle of its hyper-growth phase. For example, in 2012, we rolled out the Fashion Store in Sweden, Denmark,
Finland, Norway, Belgium, Spain, and Poland – in every case, huge localization efforts were required. Traffic started
shifting significantly towards mobile and some colleagues thought we hadn’t adapted to apps quickly enough.</p>
<p>With tons of topics and engineering teams shipping under full-speed, we spent a relatively limited time thinking about
whether we were building the right things for our customers. As a Product Manager, I was very deeply into JIRA,
specifying features in a very detailed way and hence, was making sure to keep developers coding fast. I thought
everything was under control, and it was in a way. Working like that enables what some people call “Zalando-speed”, and
this was perhaps the right setup for the phase Zalando was in.</p>
<p>Looking back from where we are now, there were two major problems: Firstly, most of our developers felt unchallenged
because they didn’t really understand the purpose of their work. Secondly, Product Managers were too busy keeping the
momentum going to make sure we were building the stuff that mattered.</p>
<h3>The fleek Product Team</h3>
<p>These days Zalando, and particularly the fleek Product Management team, doesn’t spend any time on ticket specifications.
This function lies completely with our Delivery Teams; in particular, with the role of Producer, which is complementary
to the role of Product Specialist. The fleek app has four Delivery Teams situated at our <a href="https://tech.zalando.de/locations/#helsinki">Helsinki Tech
Hub</a>, while the Product side is Berlin-based.</p>
<p>Dividing work amongst different locations works smoothly if everyone understands the purpose of fleek and where the
product has to go. That’s part of our job as the Product Team. If this is clear to everyone, developers can focus on the
“how” of product implementation. That’s where autonomy comes in. Developers have the freedom to
choose whatever technology they find appropriate, following guidance from our <a href="https://engineering.zalando.com/tags/tech-radar.html">Tech Radar</a>.
Product Management, on the other hand, has the freedom to dig deeper
into the “what”. During the early phase of fleek, we were able to spend most of our time on user research topics like
ideation and prototyping to quickly find and validate ideas. This was followed by plenty of agile user testing for a
better user experience. We were doing everything we could to find the solutions that fix real customer problems. Now
that we have released the first version of fleek, we can iterate in order to get it right.</p>
<p>While autonomy allows teams to choose the process that works for them, this doesn’t mean that every Delivery Team
strictly follows Scrum. However, whatever set of principles they adopt, we are selectively part of the implementation
process. This method of working is successful because we do the one thing that matters: Ensure transparency. The best
way to achieve this is by having all Product Specialists align on a common prioritization and bringing this to regular
planning meetings with their Delivery counterparts. In this situation it’s essential to get a commitment from the team
on what to build in the upcoming development cycle. This implies, of course, effort estimation and velocity measurement.</p>
<p>Generally speaking, sharing the labour amongst teams like we do has a lot of advantages. However, maintaining several
distributed teams at the same time can be challenging as well. All in all, Zalando manages to master it pretty well. We
run video conferences on a daily basis for teams working in tandem at different locations, including demos and
retrospective sessions. Everyone in our team runs Hangouts with specific Producers and developers to clarify open
questions. We also take it in turns to travel to Helsinki – I’m usually over there for two days per month.</p>
<p>We still have so much to learn when it comes to team cooperation and our products, but it’s great fun to get both of
them right. If you have any questions about the Product Specialist role, or if you’re interested in <a href="https://tech.zalando.de/jobs/">joining the
team</a>, please contact me on Twitter at <a href="https://twitter.com/ca_ernst">@ca_ernst</a>.</p>Goodbye Angular (1), hello React2016-06-14T00:00:00+02:002016-06-14T00:00:00+02:00Jan Stroppeltag:engineering.zalando.com,2016-06-14:/posts/2016/06/goodbye-angular-1-hello-react.html<p>It's out with the old, in with the new for Team Phrasemongers at Zalando Dortmund.</p><p>At Zalando, autonomy means that every team can make their own technology choices, following our <a href="https://engineering.zalando.com/tags/tech-radar.html">Tech Radar</a>.
This goes hand in hand with our microservices architecture, in which every service is independent of the others.</p>
<h3>Out with the old, in with the new</h3>
<p>In 2015, our team, Phrasemongers, got new responsibilities. New topics were assigned to us, such as search engine
advertising and later on customer incentives, which we had to ramp up. Therefore, we had to create several small
administration frontends, used by only a handful of people – a perfect playground to test some of these new technologies.</p>
<p>We decided to completely renew our frontend technology stack, as we were using a stack based on Angular 1, Grunt and
RequireJS, the Zalando standard stack that was by now a bit outdated. Our first decision was to use React for future
projects, and as our team was the first to switch from Angular to React in Dortmund, we had to start from scratch,
reading through documentations and blogs to learn how to code with React and to get a general overview of libraries to
use. In contrast to Angular, React is no framework, so nearly every use case in React comes without an integrated
toolset (routing, HTTP, etc).</p>
<p>For the first application created with React, we used libraries mainly provided by Facebook – Jest for testing, Flux as
application architecture, and Reactify (based on JSTransform and react-tools) to transpile ECMAScript 6 (ES6) to
ECMAScript 5 (ES5). As the task runner we chose Gulp, as it reduces boilerplate compared to Grunt and is more performant
by working on node streams. To bundle dependencies we chose Browserify, as it is very simple to handle and the barrier
to entry is not as big as Webpack's. Last, but not least, we decided to use plain old jQuery Ajax for handling REST calls.</p>
<p>My first impressions after finishing this application were conflicting: I really appreciated coding with React and JSX
(XML-like syntax extension to ECMAScript), but the boilerplate produced with Flux was immense. Other weak points of the
technology stack, in my opinion, were Jest and Reactify.</p>
<p>Jest was easy to handle for testing as it mocks every dependency, but the price for it was abysmal performance,
even if you limit the number of mocks dramatically. Reactify was a weak point as it only transpiled a limited
amount of features from ES6 to ES5, missing for example the let-keyword and the ES6 modules. It was also marked as
deprecated by Facebook just when we delivered the application – something a frontend developer stumbles over from time
to time.</p>
<h3>Better decisions and better results</h3>
<p>The next application we started last summer received a slight improvement on the technology stack. After researching
alternative Flux implementations, we decided to choose Alt, rated as one of the best alternatives to Facebook's Flux,
with radically reduced boilerplate. We also replaced Reactify with Babel, which has become the de facto standard for transpiling
ES6 to ES5. For the HTTP request library we decided to test SuperAgent along with the ES7 stage 3 feature of async
functions, which together made the disliked promise chain easy to handle.</p>
<p>But the best decision we made was to use Webpack instead of Browserify. Browserify was a cool and easy to handle module
bundler, but once you get familiar with Webpack, you don’t want to go back. With Webpack you can treat nearly everything as
a module and bundle it - like CSS- or LESS-files and images - with the concept of Webpack loaders, along with many other
incredibly useful features.</p>
<p>As you might suspect, we were far more satisfied with this stack with its various improvements. But for our next
application, which we started working on at the beginning of the year, some unexpected changes occurred. In the second
half of 2015, an alternative to Flux emerged in Redux. According to the documentation it’s a predictable state container
for JavaScript apps. It allows you to track the state over time and therefore restore any state. Its library is only 2 kB in
size but also highly extensible. We chose Redux instead of Alt for the next application and also replaced our testing
strategy, now using Mocha for lightweight unit tests against a JavaScript dom (jsdom).</p>
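<p>Redux's core idea is language-agnostic: a reducer is a pure function <code>(state, action) =&gt; state</code>, and the current state is just a fold of the action log. A hypothetical sketch, written in Scala rather than JavaScript to match the other code examples on this blog:</p>

```scala
sealed trait Action
final case class Add(n: Int) extends Action
case object Reset extends Action

// A reducer is a pure function (state, action) => state: same inputs, same output.
def reducer(state: Int, action: Action): Int = action match {
  case Add(n) => state + n
  case Reset  => 0
}

// Because state is only ever derived by folding actions, replaying the
// action log restores any historical state ("time-travel debugging").
val actions = List(Add(2), Add(3), Reset, Add(5))
assert(actions.foldLeft(0)(reducer) == 5)
```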
<p>We used the Redux-API-Middleware to make HTTP requests, based on the Fetch API as a substitute to the good old
XMLHttpRequest, with the advantage being that it is inherently promise based. And, as you don’t really need a task
runner anymore when using Webpack, we dropped Gulp, starting tasks only via Webpack or npm scripts.</p>
<p>While it was a very small application, we managed to put together a technology stack that we are 100% satisfied with.
For our current frontend project, the fourth programmed with React, we haven’t replaced any library and have improved its
whole composition.</p>
<h3>And the learning continues!</h3>
<p>Team autonomy gives our team the opportunity to experiment with new technologies, gather experiences from backing the
wrong horse, collect our learnings, and make a better attempt the next time. With this approach we have a deeper
understanding of the libraries we use and we have a stack that perfectly suits our needs.</p>Falling in Love with Tech in Helsinki2016-06-13T00:00:00+02:002016-06-13T00:00:00+02:00Vivi Brooketag:engineering.zalando.com,2016-06-13:/posts/2016/06/falling-in-love-with-tech-in-helsinki.html<p>How a non-dev techie took a radical approach to understand her working environment.</p><p>In 2015, I fell in love.</p>
<p>This was when my love affair with tech began. I had always been fascinated by tech and innovation, but never to the
extent that I would consider working in the field. In 2014, to my own surprise, I started working at a startup software
development house and was introduced to agile methods and the software development world. Later in 2015, I stumbled
upon Zalando’s focus on autonomous teams and soon after began working in
their newly opened <a href="https://tech.zalando.de/blog/hello-helsinki/">Helsinki Tech Hub</a>. I was sold: My crush on technology
was solidified.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/703e5e415101a2ff8531ea2355fffd786aefbf09_rails-girls-2016-helsinki-175.jpg?auto=compress,format"></p>
<p>When you work in a tech company, you should make sure you take an active interest in understanding the work your
peers are doing. I asked myself: How can I understand tech better? What is it that Zalando developers are doing? How can I
immerse myself in their mindset?</p>
<p>These questions were swimming through my mind on a daily basis.</p>
<p>I’ve learned and experienced a lot about my tech surroundings while helping build and ramp up the Helsinki office:
Working daily with teams, Retrospectives, <a href="https://tech.zalando.de/blog/one-last-thing-before-we-call-it-a-year-hack-week-4/">Hack
Weeks</a>, OKRs, meetups, etc. There was
one thing, slightly out of my comfort zone, which I had yet to do: Learn how to code.</p>
<p>In April we hosted the <a href="http://railsgirls.com/helsinki">Helsinki Rails Girls</a> workshop, and I jumped on the opportunity
to organise it. The Rails Girls workshop concept is a two day free event for women to dive into the world of building
web applications. <a href="http://railsgirls.com/">Rails Girls</a> is a global, non-profit volunteer community that was founded in
Finland.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9bd8bb33c7b5345503710beed693d0ad054f6426_rails-girls-2016-helsinki-3.jpeg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/54612dfc9bad1763f4f2e02a65e3f498c4342e63_rails-girls-2016-helsinki-165.jpg?auto=compress,format"></p>
<p>The evening started with participants getting to know their peers and coaches in small teams and then continued on to
installing and testing all required software needed for the workshop: Atom/Sublime Text, Ruby, SQLite and Sinatra.</p>
<p>After all installations were finished, there was a short introduction to web design and prototyping to help us all get
into the right mindset of building a web application. We were then asked to recreate Facebook, and write all the
relevant components we could think of on separate post-it notes. We added them to a whiteboard wall to see if we could
come up with all the components needed. Needless to say, it was a challenge to complete this task in the given 5 minute
time limit. The exercise demonstrated not only how technologies and web applications are developed, but also emulated
how complex creating software and web applications can be, which was a revelation for most of the women participating in
the workshop. Most had little to no previous experience in the tech world, let alone with coding.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bebe76e7d08150de9d4c5ea7302a6568d6884a12_rails-girls-2016-helsinki-84.jpg?auto=compress,format"></p>
<p>The next morning the coaches introduced themselves, giving the participants some guidance and encouragement to delve
into unknown territory. We were then off to create our very own websites from scratch.</p>
<p>I think the revelation of understanding what coding is and what software development requires, even if I only scratched
the very surface via the workshop, really opened up my perspective about working in the tech world.</p>
<p>The experience of learning how to code was fun, eye-opening, and surprisingly addictive – I strongly encourage all
non-dev techies to take the leap too. Personally, I am still in love with tech and there is still much to learn – I can
hardly wait to fail fast, while continuing to develop my new skills.</p>Pushing the boundaries: Human interaction with technology2016-06-10T00:00:00+02:002016-06-10T00:00:00+02:00Andra Joy Lallytag:engineering.zalando.com,2016-06-10:/posts/2016/06/human-interaction-with-technology.html<p>At JSConf 2016 in Budapest, we gave some kickass insights into tech and human interaction.</p><p>A couple of weeks ago I attended <a href="http://jsconfbp.com/">JSConf 2016</a> in Budapest. This was a two day conference about
JavaScript where speakers from all over the world came to share their passion, ideas, and recent projects with the
broader tech community.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/69996ceb6fcb8a67e41080e7b8be22459e5f7b86_img-20160523-wa0001.jpg?auto=compress,format"></p>
<p>Zalando had a booth set up at the conference, on top of being represented by <a href="https://twitter.com/princi_ya">Princiya Marina
Sequeira</a>, who spoke on <a href="http://jsconfbp.com/speakers/princiya-sequeira.html">‘Natural User Interfaces using
JavaScript’</a>. The presentation began with Princiya discussing how
humans interact with technology today. The three major actions we currently apply are type, click, and touch. This seems
pretty archaic compared to what software is able to do for us today.</p>
<p>Princiya next described what interacting with technology in the future might look like. “Engineers will be able to
create a system for human interaction where the interfaces are more natural and intuitive.” The example she emphasised
was building technology that could respond to human gestures or motion.</p>
<p>Princiya created two apps to demonstrate to the audience what is currently possible. Her first example was her computer
responding to a human wave. She stood back a couple of feet from her computer and waved her hand. The computer responded
to this wave and moved to the next slide in her presentation. One of the audience members was so impressed she drew a
picture comparing the example to magic.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/aec548f9c19fed06000b62cd43a850697170d1b1_princiya-jsconf-magic.jpeg?auto=compress,format"></p>
<p>The second project she created was called “Draw”. The app would respond to her finger as she drew lines in the air. The
computer would then draw the motions as a visual sketch on her computer. While incredibly impressive, Princiya did admit
that the technology is not completely there just yet. For one thing, it depends heavily on light, but it still presents
us with endless possibilities.</p>
<p>This was an incredible presentation: We’re challenging the boundaries of how we interact with technology. It is
inspiring to see someone truly think outside the box. Princiya is ahead of her time and an awesome engineer. I’m truly
lucky to share a conference booth with someone so inspiring when it comes to tech.</p>
<p>On top of that, it’s incredible that I’m able to see someone at work who constantly challenges the status quo of our
current tech reality. Princiya did Zalando Tech proud.</p>Why should your kid code?2016-06-09T00:00:00+02:002016-06-09T00:00:00+02:00Eric Bowmantag:engineering.zalando.com,2016-06-09:/posts/2016/06/why-should-your-kid-code.html<p>To take part in the societies of tomorrow, kids should learn to code today.</p><p>Learning how to use technology and learning how it works are two very different, yet critical, educational issues.</p>
<p>It’s safe to say that today’s kids are using the latest smartphones, tablets, and laptops with little to no
instruction. Teaching our children how to use the latest tech toy is no longer necessary – the ubiquity and
effect of these devices on our daily lives means they figure it out on their own.</p>
<p>The education systems of a <a href="http://www.euractiv.com/section/social-europe-jobs/news/coding-classes-trending-across-eu-schools/">handful of
countries</a> have
included coding as core curriculum in primary schools, which underlines the fact that the how behind technology remains
a key driver of innovation and global literacy.</p>
<p>Every computer used to come with a built-in programming language, however today’s devices don’t possess the same
features: iPads don’t even have a reasonable keyboard. The trend in technology is pushing people away from the code,
when we need to be pulling them closer instead.</p>
<p>As today’s children will play a major part in how technology is built in the future, why not supply them with the skills
they’ll need to influence, innovate, and engage with tech?</p>
<h3>Fluency in technology is an invaluable skill</h3>
<p>Being able to merely use your computer isn’t going to cut it in 10 years’ time. Our kids are born in an age where the
Internet has always existed, and where software supports almost every industry you can think of. The workforce of the
future will be expected to be technologically fluent, which means more than operating devices. This has often been
referred to as learning “21st century skills”.</p>
<h3>It’s all about layers: Creativity, critical thinking, problem solving</h3>
<p>What better way is there to bring out our children’s creativity in a technology-heavy world? With coding under their
belt, they’ll be able to build tangible systems that incorporate critical thinking, creativity, and problem solving.
These aren’t just computer science or software engineering lessons, but life lessons: Looking at big-picture-problems
and breaking them down into manageable tasks, making logical connections, and analysing and interpreting data.</p>
<p>Being effective in the presence of uncertainty is also a meaningful lesson, with trends among millennials showing us that
the workers of the future demand a clear purpose in order to engage deeply with the problems modern enterprises face. However,
when we talk about layers, we’re also referring to the layers of abstraction that mask the complexity beneath it all:
Solving technical problems at scale and reducing complexity might sound like grown-up issues, but delving into these
topics early on shows that it’s more than just programming, it’s developing.</p>
<h3>Acquiring coding skills early opens up options in midlife</h3>
<p>Your child’s chosen profession almost certainly won’t be the one they stick with for life, nor should it have to be.
Learning how to code young will open up more options to kids later in their careers, and eases the struggle of coding
bootcamps in midlife. While programming is a never-ending journey of acquiring knowledge, laying the groundwork will be
crucial going forward.</p>
<h3>Connected devices will shape the future</h3>
<p>From fridges that know when you need to buy milk, to apps that give you an easy-to-read dashboard of your health and
fitness data, connected devices are paving the way to a continually connected world. Kids should be outfitted with the
know-how to navigate the emerging Internet of Things landscape, even if we’re still figuring out which devices would
actually be improved by these connections.</p>
<p>Coding skills will soon be assumed knowledge throughout various industries, but they are just one part of
a larger range of skills that a well-rounded human ought to have. It’s no surprise that initiatives in the EU, US, and
Australia to make coding part of the curriculum have been mostly successful in their implementation. Diverse
early learning is the key to unlocking future potential and creativity.</p>
<p>The world needs more well-rounded people who can code. It also needs to up the ante when it comes to educating its youth
to keep up with technological innovation and demand.</p>
<p>To take part in the societies of tomorrow, kids should learn to code today.</p>Zappr – Enhancing your GitHub workflow2016-06-08T00:00:00+02:002016-06-08T00:00:00+02:00Nikolaus Piccolottotag:engineering.zalando.com,2016-06-08:/posts/2016/06/zappr--enhancing-your-github-workflow.html<p>An open source tool to guarantee effective code reviews on GitHub, using only GitHub.</p><p>There are more than 1,000 people working at Zalando Tech and a substantial amount of them are focused on writing code
every day. Our developers work autonomously, with full end-to-end responsibility. For compliance reasons — and as good engineering
practice — we want to make sure that every change to production systems is reviewed by at least two Zalando Tech
employees.</p>
<p>In other words, you are not allowed to write specifications or code and commit to the master branch without anybody else
being involved. This way, a successful code review becomes the seal of approval for deployments into production. It
gives us the confidence that is necessary to rely on a completely automated delivery pipeline, without the need for any
further manual intervention.</p>
<p>How do you guarantee effective code reviews on GitHub, using only GitHub? We looked at many different solutions that try
to improve workflows and provide tooling for code reviews like <a href="https://pullapprove.com/">pullapprove.com</a>,
<a href="https://www.review.ninja/">review.ninja</a> or <a href="https://github.com/reenhanced/gitreflow">git-reflow</a>. However, none of
them were really satisfying:</p>
<ul>
<li>Many require us to use a separate website that fragments the regular workflow of a developer</li>
<li>Others are proprietary and closed source, which is a deal-breaker because we may want to implement custom features
and logic</li>
<li>Certain tools are locally installed, making it hard to verify if someone is actually using it</li>
<li>Some may not play well with GitHub Enterprise</li>
</ul>
<p>This is why we decided to come up with a solution ourselves: <a href="https://zappr.opensource.zalan.do">Zappr</a>. It’s <a href="https://github.com/zalando/zappr">open
source</a> and you can use it for your own projects on GitHub.com out of the box (
<a href="http://zappr.readthedocs.io/en/latest/setup">here’s how</a>). Now, what does it actually do?</p>
<h3>Pull Request Approvals</h3>
<p>If you have used GitHub before, you’re probably familiar with its many different integrations. For example, <a href="https://travis-ci.com">Travis
CI</a> will test your code and <a href="https://coveralls.io">Coveralls</a> can calculate test coverage. GitHub
also has two really great features: <a href="https://github.com/blog/2051-protected-branches-and-required-status-checks">Protected branches and required status
checks</a> for pull requests. This means that
developers cannot commit directly to a protected branch, and integrations can send status updates that may prevent a
pull request from getting merged. Zappr leverages this feature to require developers to give a comment of approval
before merging a pull request.</p>
<h3>Automatic Branch Creation</h3>
<p>If you’re collaborating with other developers on a project, chances are that you’re using <a href="https://guides.github.com/introduction/flow">feature
branches</a> or an even more complicated branching model. Especially in
professional enterprises, but often in open source projects as well, developers create separate feature branches for
each issue or ticket they are working on. Zappr can automatically create a branch for each new issue, saving you from
doing this manually.</p>
<h3>Commit Message Patterns</h3>
<p>Let’s face it: Writing code together with many different people can get messy sometimes. Successful teams typically
follow a handful of conventions, like coding styles or rules for documentation. Commit messages are really important
too, as they are the true record of a project’s history. Properly formatted, you can even use commit messages to
generate changelogs automatically.</p>
<p>At Zalando, we also need to link the code we write to the original tickets in our issue tracker by adding ticket numbers
to our messages. With Zappr, you can add a status check to your pull request that ensures every commit message matches a
given pattern. This plays really well with our automatic branch creation feature or other tools, like
<a href="http://commitizen.github.io/cz-cli/">Commitizen</a>.</p>
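<p>Conceptually, such a check is a regular expression applied to every commit message in the pull request. The sketch below uses an example pattern of our own, not Zappr’s actual configuration syntax:</p>

```javascript
// Example pattern of our own (not Zappr's config): every commit message
// must start with a ticket reference like "CORE-123".
const TICKET_PATTERN = /^[A-Z]+-\d+\b/;

// Return whether all messages pass, plus the offenders for the status report.
function checkCommits(messages) {
  const bad = messages.filter(msg => !TICKET_PATTERN.test(msg));
  return { ok: bad.length === 0, bad };
}

const result = checkCommits([
  'CORE-123 add streaming parser',
  'fix typo', // no ticket number: this one would fail the status check
]);
// result.ok === false; result.bad contains 'fix typo'
```

<p>A failing result would then be reported back to GitHub as a failed status check, blocking the merge on a protected branch.</p>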
<h3>Easy Setup</h3>
<p>Zappr provides what we consider sane defaults, but you can configure basically everything with a YAML file in your
repository, similar to other GitHub integrations like <a href="https://travis-ci.org">Travis</a>. Zappr works with both
<a href="https://zappr.opensource.zalan.do">GitHub.com</a> and your own GitHub Enterprise installation (you’ll need to set that up
yourself, obviously). It offers a minimal UI to enable features with the flick of a button, while the rest is done via
interactions on GitHub.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5b3ed00a42ec81e5e67b6c3e358a30140520e7d0_zappr-image.png?auto=compress,format"></p>
<p>We’ve been using Zappr at Zalando for a couple of months now and like it very much so far. However there is still work
to do: Some new features that we’re thinking of including are verifying that all commits in a pull request were signed,
and automated reminders about open pull requests.</p>
<p>We hope you’ll like Zappr as much as we do! Please try it out for yourself and let us know if you have any questions.
You can reach us via <a href="https://gitter.im/zalando/zappr">Gitter</a> or <a href="https://github.com/zalando/zappr/issues">GitHub
Issues</a>.</p>Five Tech Jobs That Didn’t Exist Five Years Ago2016-06-07T00:00:00+02:002016-06-07T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-06-07:/posts/2016/06/five-tech-jobs-that-didnt-exist-five-years-ago.html<p>What tech jobs are seen as the norm today that barely existed five years ago?</p><p>Technology is evolving at almost terminal velocity, with an ever-changing climate of trends currently hot around the
globe. When looking at the contemporary state of <a href="https://jobs.zalando.de/en/">job postings</a> and vacancies, one can only
imagine the plethora of positions that weren’t around all that long ago.</p>
<p>What jobs are seen as the norm today that barely existed five years ago? We take a look at the top five positions that
have yet to reach their teenage years in the modern technological age.</p>
<h3>Big Data Engineer</h3>
<p>The term Big Data was first coined in the 1990s by <a href="http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/">John
Mashey</a>, referring to
a large set of data that is almost impossible to manage using traditional business intelligence tools. A <a href="http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation">2011 McKinsey
Global Institute
report</a>
revealed that nearly all sectors in the US economy had at least 200 terabytes of stored data per company, and so the need
for specialised engineers to solve Big Data problems became apparent.</p>
<p>Big Data Engineers develop, maintain, test, and evaluate big data solutions, on top of building large-scale data
processing systems. They’re proficient in distributed processing technologies such as Hadoop and MapReduce, and
frequently work with NoSQL databases like MongoDB and Cassandra. Open source technologies are also popular amongst Big Data Engineers, including
<a href="https://tech.zalando.de/blog/apache-showdown-flink-vs.-spark/">Apache Flink and Spark</a>, used for distributed stream and
batch data processing.</p>
<h3>UX Designer</h3>
<p>The importance of user experience (UX) has become such a priority that we now have dedicated designers committed to
getting it right. Another term <a href="http://www.uxmatters.com/mt/archives/2005/11/welcome-to-uxmatters.php">originating in the
‘90s</a>, UX is trending in a very big way, with
designers concerned about experiences created and shaped through technology, and how to bring them from sketch to
prototype.</p>
<blockquote>
<p>It's simple really <a href="https://t.co/924kvU0ALC">pic.twitter.com/924kvU0ALC</a></p>
<p>— Scott ☠ (@scott_riley) <a href="https://twitter.com/scott_riley/status/735453534876012544">May 25, 2016</a></p>
</blockquote>
<p>The field of UX, and thus the role of UX Designers, is still incredibly new and not completely set in stone in terms of
concrete definitions and tasks. However, proficiency in <a href="http://www.adobe.com/products/photoshop.html">Adobe Photoshop</a>,
along with CSS and HTML knowledge, is usually a prerequisite in the industry. Specialisations can vary from
design-focused roles to more technical positions, with both aspects adhering to a multidisciplinary approach in
designing digital products that keep the user at the center of the process.</p>
<h3>DevOps Manager</h3>
<p>DevOps: What even is it? Other than calling it the collaboration between software development and operations, it’s a
movement that’s still evolving and focuses very clearly on communication, integration, and the art of iterating more
often to deliver software faster. These two business units have traditionally worked separately, but once agile became a
household methodology for businesses, DevOps came along to ensure that deployment was part of the development process.</p>
<p>DevOps Managers were initially important shoes to fill for large public cloud service providers, ensuring frequent
deployments without breaking too many things. On top of improving deploy frequency, a DevOps approach can also shorten
lead time and provide a faster mean time to recovery. To achieve this, automation tools are essential:
<a href="https://www.chef.io/chef/">Chef</a> and <a href="https://puppet.com/">Puppet</a> are great for configuration management,
<a href="https://git-scm.com/">Git</a> is a popular choice for version control, and test systems such as
<a href="https://jenkins.io/index.html">Jenkins</a>, <a href="http://gradle.org/">Gradle</a>, and <a href="http://maven.apache.org/index.html">Maven</a>
round up the automation of common developer tasks such as creating executables and establishing documentation.</p>
<h3>Data Scientist</h3>
<p>Much like Big Data Engineers, Data Scientists are a new breed of indispensable employees for companies wanting to
extract knowledge and insights from data to remain competitive in a number of industries. Labelled as <a href="https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/">The Sexiest Job
of the 21st Century</a>, Data Scientists are
constantly up to their ears in data, acting as the world’s technical fortune tellers via analytics, visualisations,
pattern recognition, and machine learning, to name just a few of their competencies.</p>
<p>Data Scientists need to know the ropes when it comes to statistical programming languages and are often R or Python
fluent. A database querying language like SQL is also part of their arsenal. For Data Scientists working at data-driven
companies, machine learning methods such as clustering and decision tree learning will be crucial. Tools such as
<a href="https://www.tensorflow.org/">TensorFlow</a> and <a href="http://scikit-learn.org/stable/">scikit-learn</a> contain a variety of
machine learning algorithms.</p>
<h3>Golang Developer</h3>
<p>Google has cemented itself as a technological giant in recent years, so it's no wonder their in-house programming
language has become a frontrunner for preferred statically typed languages. <a href="https://golang.org/">Go</a>, or Golang as it’s
often referred to, is completely open source and was only released in November 2009, after successfully being
implemented in some of Google’s production systems.</p>
<p>While Golang Developers are also expected to be proficient in other languages, programming primarily in Go was
completely nonexistent five years ago. Considered a new language across the board, its basic syntax puts it in the C
family, with Pascal and Modula noted as having a significant influence in terms of declarations and packages.</p>
their own technology stacks and establish their own release cycles.</p>
<p>Unfortunately, frontend development hasn’t fully capitalized yet on the benefits that microservices offer. The common
practice for building websites remains “the monolith”: a single frontend codebase that consumes multiple APIs.</p>
<p>What if we could have microservices on the frontend? This would allow frontend developers to work together with their
backend counterparts on the same feature and deploy parts of the website—“fragments” such as Header, Product, and
Footer—independently. Bringing microservices to the frontend requires a layout service that composes a website out of
fragments. It should also preserve the common requirements of most websites:</p>
<ul>
<li>Compose pre-rendered markup on the backend: This is important for SEO and speeds up the initial render.</li>
<li>Ensure a fast Time to First Byte: Request fragments in parallel and stream them as soon as possible, without
blocking the rest of the page.</li>
<li>Enforce <a href="https://timkadlec.com/2014/11/performance-budget-metrics/">performance budget</a>: This is quite challenging,
because there is no single point where you can control performance.</li>
<li>Fault Tolerance: Render the meaningful output, even if a page fragment has failed or timed out.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/db20bec7a14fa70cbc8f8cb11d4dadfa1088b8ab_router-layout-image.jpg?auto=compress,format"></p>
<p>The most straightforward approach would be to call each fragment, concatenate the results, and respond to the browser.
However, this wouldn’t work in practice: The entire page would be blocked until all the fragments responded. In response
to this dilemma, our team applied streams—allowing us to deliver a part of the page to the browser without having to
wait for all fragments. You can read more about the benefits of using streams from <a href="https://jakearchibald.com/2016/streams-ftw/">Jake Archibald’s
article</a>.</p>
<p>Our prior exposure to streams, particularly in Node.js, came from using <a href="http://gulpjs.com/">GulpJS</a> and creating custom
plugins for it. Gulp is a fast, streaming build tool because it lets you transform files in a stream without
writing any intermediate results to the file system.</p>
<p>We implemented two prototypes: Scala with Akka HTTP, and Node.js with streams from the core library. The latter,
initially called “Streaming Layout Service,” performed faster, so we chose to keep it—renaming it Tailor and
<a href="https://github.com/zalando/tailor">open-sourcing it on GitHub</a>.</p>
<h3>How the layout service works</h3>
<p>In order to achieve a fast Time to First Byte, the service has to fetch multiple fragments asynchronously, assemble
their response streams, and produce the final output stream.</p>
<p>The request first goes to a router, which matches the public URL to a template path and calls the layout service. The
layout service then:</p>
<ul>
<li>Fetches the template, based on the path</li>
<li>Parses the template for fragment placeholders</li>
<li>Asynchronously calls all fragments from the template</li>
<li>Assembles multiple fragment streams into a single output stream</li>
<li>Sets response headers based on the primary fragment and streams the output</li>
</ul>
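<p>The essence of the assembly step can be sketched in a few lines. This is a simplification with plain strings instead of HTTP response streams, and the names are our own, not Tailor’s internals: fragments are requested in parallel, but flushed into the output in template order, so static chunks and earlier fragments reach the browser without waiting for the slowest fragment.</p>

```javascript
// Simplified assembly sketch (plain strings instead of real streams;
// names are illustrative, not Tailor's internals).
async function assemble(template, fetchFragment, write) {
  // Kick off every fragment request immediately, in parallel...
  const pending = new Map(
    template
      .filter(part => part.fragment)
      .map(part => [part.fragment, fetchFragment(part.fragment)])
  );
  for (const part of template) {
    if (part.fragment) {
      write(await pending.get(part.fragment)); // ...but flush in template order
    } else {
      write(part.html); // static chunks are flushed right away
    }
  }
}
```

<p>For a template of the form header, product fragment, footer, the static header is flushed immediately while the product fragment is still in flight, which is exactly what keeps the Time to First Byte low.</p>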
<p>Tailor is partially inspired by <a href="https://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919/">Facebook’s
BigPipe</a>,
but BigPipe’s SEO limitations make it an impractical choice for an e-commerce platform. First, even after reading multiple
posts about the cleverness of modern search engines and their ability to execute JavaScript, we can’t be sure whether
rendering on the frontend affects PageRank. Second, it’s not possible to retrospectively change the response code
of the page from JavaScript, to prevent an error page from being indexed, for example, because the headers are flushed before the
content. We developed Tailor to circumvent this limitation by having it identify and mark a primary fragment that
becomes responsible for the status code of the page.</p>
<p>An example template looks like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ec8b74d6f17df1c6df1ce36460c1408eba279bcf_example_template-tailor.jpg?auto=compress,format"></p>
<p>Tailor takes advantage of Node.js’s event-driven, non-blocking I/O capabilities. It is built on top of streams from
Node’s core library, and also implements some custom streams:</p>
<ol>
<li>A parser stream that internally uses <a href="https://github.com/isaacs/sax-js">sax parser</a> to group together static chunks
of HTML and parse the fragment placeholders;</li>
<li>A stringifier stream that assembles static chunks of HTML and the response streams of the fragments;</li>
<li>An async stream that is used to postpone the fragments that are marked “async”.</li>
</ol>
<p>Not all fragments are blocking in nature. The key to ensuring a faster Time to First Byte is to prioritize the fragments
above the fold and flush them as soon as possible. It’s a good practice to mark the fragments below the fold as async.
This means that we output only a placeholder and defer the fragment output to the end of the page, where it comes with
inline JavaScript that moves the content into the placeholder. We achieve this behavior using the Async Stream.</p>
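<p>The inline JavaScript that moves content into the placeholder amounts to something like the following sketch. The function name and element ids are hypothetical, and the document object is passed in as a parameter only to keep the example self-contained:</p>

```javascript
// Browser-side step for an async fragment: its real markup is rendered at
// the end of the page, then moved into the placeholder that was flushed
// early. `doc` is the page's `document` (passed in for testability).
function hydrateAsyncFragment(doc, placeholderId, deferredId) {
  const placeholder = doc.getElementById(placeholderId);
  const deferred = doc.getElementById(deferredId);
  placeholder.innerHTML = deferred.innerHTML; // move the content into place
  deferred.remove();                          // drop the temporary node
}
```

<p>Because this runs from an inline script at the bottom of the page, the placeholder above the fold never blocks on the slow fragment, yet the final DOM ends up identical to a blocking render.</p>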
<p>Besides the streaming on the backend, Tailor is also responsible for initializing fragments on the frontend. Each
fragment exposes its static bundles (CSS and JavaScript) through an HTTP Link header. The CSS from the fragment becomes a
<code>&lt;link&gt;</code> tag that is output before the fragment to prevent a flash of un-styled content. For an async fragment we use
<a href="https://github.com/filamentgroup/loadCSS">loadCSS</a> so as not to block page rendering. All JavaScript bundles use
<a href="http://requirejs.org/docs/whyamd.html#amd">AMD</a> and expose a special “init” function that is called with the fragment’s
DOM element.</p>
<h3>How to use Tailor</h3>
<p>Tailor is a library that provides a middleware which you can integrate into any Node.js server.</p>
<p>It is available from npm under the name <a href="https://www.npmjs.com/package/node-tailor">node-tailor</a>. The smallest setup
should include the “templates” directory with HTML templates and an “index.js” file with the following content:</p>
<div class="highlight"><pre><span></span><code><span class="k">const</span><span class="w"> </span><span class="n">http</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'http'</span><span class="p">);</span>
<span class="k">const</span><span class="w"> </span><span class="n">Tailor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">require</span><span class="p">(</span><span class="s1">'node-tailor'</span><span class="p">);</span>
<span class="k">const</span><span class="w"> </span><span class="n">tailor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">Tailor</span><span class="p">({</span><span class="o">/*</span><span class="w"> </span><span class="n">Options</span><span class="w"> </span><span class="o">*/</span><span class="p">});</span>
<span class="k">const</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">createServer</span><span class="p">(</span><span class="n">tailor</span><span class="o">.</span><span class="n">requestHandler</span><span class="p">);</span>
<span class="n">server</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">process</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">PORT</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="mi">8080</span><span class="p">);</span>
</code></pre></div>
<p>After the server has started, you can open the url with the name of the template file in the browser, e.g.
<a href="http://localhost:8080/example-template">“http://localhost:8080/example-template”</a>. You can also run <a href="https://github.com/zalando/tailor/tree/master/example">the existing
example</a> that includes the minimum amount of code for the sample
fragment.</p>
<p>Tailor is very flexible because it comes with a set of options that let you adjust it to your custom requirements.
For example, the “fetchTemplate” option allows you to integrate custom logic to fetch the template from an S3 bucket
instead of the file system. Another option, “handleTag”, makes it possible to implement serialization of any HTML tag
from the “handledTags” list. This may be helpful when you want to set an HTML “lang” attribute based on the
Accept-Language header. The full <a href="https://github.com/zalando/tailor#options">list of options</a> is available in the
README.</p>
<h3>Future plans</h3>
<p>Tailor is still fresh, and we are actively working on it. We are currently experimenting with the way fragments are
initialized on the frontend. Instead of enclosing fragment content in a <code>&lt;div&gt;</code>, we surround it with script tags that find themselves in the DOM and identify the start and end of the content. This
saves us from id collisions and allows us to place fragments anywhere on the page, even in the <code>&lt;head&gt;</code>.</p>The nuts and bolts of the Docker-Selenium project2016-06-01T00:00:00+02:002016-06-01T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2016-06-01:/posts/2016/06/docker-selenium-open-source.html<p>Our Open Source Evangelist sits down to chat with the creator of Docker-Selenium.</p><p>Many teams at Zalando have been using <a href="https://github.com/elgalu/docker-selenium">Docker-Selenium</a>: A project that aims
to provide Selenium inside a Docker container with Chrome and Firefox. Leo Gallucci, Docker-Selenium’s creator and an
engineer on our Test Infrastructure team, sat down with Zalando Open Source Evangelist Lauri Apple to take a closer look
at his two-year-old project.</p>
<p><em>Lauri Apple:</em> Why did you create Docker-Selenium?</p>
<p><em>Leo Gallucci:</em> I started the project while I was working at <a href="https://en.wikipedia.org/wiki/AppNexus">AppNexus</a>, where I
was in charge of building and maintaining the test automation suite of an <a href="https://angularjs.org/">AngularJS</a> project. I
was also doing DevOps tasks related to the test infrastructure.</p>
<p>The objective was to run the tests headless. Different solutions existed for that (e.g.
<a href="https://github.com/ariya/phantomjs">PhantomJS</a>), but we needed real browsers like Chrome or Firefox to run our tests
on. One reason was to get better test confidence, while the other was that
<a href="https://github.com/angular/protractor">Protractor</a> doesn't <a href="https://angular.github.io/protractor/#/browser-support">play
nice</a> with <a href="https://github.com/ariya/phantomjs">PhantomJS</a>.</p>
<p>With <a href="https://github.com/SeleniumHQ/selenium">Selenium</a>, you can always run your tests locally—but as soon as your tests
run, the browser pops up in your main display, which can be annoying. You could configure your window manager to move
it automatically to another workspace, and other similar solutions exist—but why bother going to all that trouble if
you can just <em>docker run selenium</em>?</p>
<p>You can also configure a <a href="http://elementalselenium.com/tips/38-headless">headless Xvfb selenium</a>, as it is a common use
case in Jenkins CI. But again: Why bother going to the trouble, now that <a href="https://github.com/docker/docker">Docker</a>
exists?</p>
<p><em>Lauri Apple:</em> How did you build Docker-Selenium, and how does it run tests?</p>
<p><em>Leo Gallucci:</em> It has tests that run seamlessly, both
<a href="https://github.com/elgalu/docker-selenium/tree/master/test">locally</a> and in
<a href="https://travis-ci.org/elgalu/docker-selenium/builds/123103275">Travis</a>, plus deploy automation using the <a href="https://docs.travis-ci.com/user/docker/">TravisCI Docker
infrastructure</a>.</p>
<p>The reason for pushing new releases (Docker images) from Travis instead of using Docker <a href="https://docs.docker.com/docker-hub/builds">automated
builds</a> is that a <a href="https://en.wikipedia.org/wiki/Continuous_integration">CI</a>
tool allows running arbitrary scripts, such as tests, before pushing, so a broken image never gets published. See <a href="https://github.com/SeleniumHQ/docker-selenium/issues/208">issue
208</a> for an example.</p>
<p><em>Lauri Apple:</em> Have you actively promoted the project, or have people just learned of it in passing?</p>
<p><em>Leo Gallucci:</em> There was no promotion in the beginning. I suppose the success of it is due to the fact that it is an
obvious use case for any developer that needs <a href="https://github.com/SeleniumHQ/selenium">Selenium</a> and knows how handy
<a href="https://github.com/docker/docker">Docker</a> technology can be when it comes to working with disposable infrastructure.</p>
<p>Automation testers probably search for two words, <em>selenium</em> and <em>docker</em>, then the top results are the official project
followed by this one. Sometimes it’s the other way around depending on Google’s mood.</p>
<p><em>Lauri Apple:</em> How did you get the teams at Zalando using it? What was your pitch?</p>
<p><em>Leo Gallucci:</em> When I started at Zalando in March 2015, we weren’t working with <a href="https://saucelabs.com/selenium/selenium-grid">Sauce Labs’ Selenium
Grid</a> yet. We only had a centralised <a href="https://github.com/SeleniumHQ/selenium/wiki/Grid2">Selenium
Grid</a> in our data center, which acted as a kind of
<a href="https://en.wikipedia.org/wiki/Single_point_of_failure">SPOF</a> (Single Point of Failure) for our <a href="https://tech.zalando.com/blog/radical-agility-with-autonomous-teams-and-microservices-in-the-cloud/">Zalando
teams</a>. You
couldn't see the tests running or have recorded video results, which are two features of docker-selenium. Sure, Sauce
Labs offers these same features and supports hundreds of browser combinations. But while it’s often the preferred option
for frontend testing infrastructure these days, Docker-Selenium has three distinct advantages:</p>
<ul>
<li><strong>Cost:</strong> Free</li>
<li><strong>Speed:</strong> Runs around 2x faster than a paid, cloud-based Selenium solution</li>
<li><strong>Security:</strong> No need to tunnel your local app to a third party cloud solution.</li>
</ul>
<p>Docker-Selenium is used by a multitude of companies, including Nvidia and Algolia. The popularity of the project has
also grown, where we saw a spike in use after one of our contributors, <a href="https://github.com/rubytester">@rubytester</a>,
conducted a <a href="https://twitter.com/rubytester/status/644965076072574976">presentation</a> back in September 2015. Most users
come from Google search results, as shown in the <a href="https://github.com/elgalu/docker-selenium/graphs/traffic">traffic
stats</a>.</p>
<p>When <a href="https://github.com/mtscout6">Matthew Smith</a> (aka <a href="https://twitter.com/mtscout6">@mtscout6</a>) jumped into the
project four months after its creation, great things happened. He made a few interesting
<a href="https://github.com/elgalu/docker-selenium/commits?author=mtscout6">improvements</a>, but moreover, we started
conversations about moving Docker-Selenium to the official <a href="https://github.com/SeleniumHQ">SeleniumHQ</a> organisation.
Matt pushed this and contacted the Mozilla team. Later on, I decided to continue maintaining my own hosted project with
differentiated <a href="https://github.com/elgalu/docker-selenium#official">features</a>, which is why two projects for the same
purpose exist today.</p>
<p><em>Lauri Apple:</em> What is the future of this project?</p>
<p><em>Leo Gallucci:</em> People tend to build long-running Selenium grids by using the stock Docker Selenium images or my own
separately maintained version. However, when you see the Sauce Labs or
<a href="https://www.browserstack.com/automate">BrowserStack</a> approach, you realise that the way to go is isolation, i.e. one
machine, VM or Docker container for each Selenium session. This gives us an idea for the <a href="https://github.com/elgalu/docker-selenium/issues/65#issuecomment-212462604">next
steps</a> for the project. We might want to add
a <a href="https://github.com/elgalu/docker-selenium/issues/80">Jenkins plugin</a>. We also want to <a href="https://github.com/elgalu/docker-selenium/issues/81">automate new version
detection</a> — e.g., a new version of Chrome/Firefox/Selenium.</p>
<p>As already mentioned, <a href="https://github.com/elgalu/docker-selenium">Docker-Selenium</a> is an open source project, so we’re
always looking for contributors. The more users, the merrier!</p>
<p>To hear more about Docker-Selenium, Leo and <a href="https://www.linkedin.com/in/diemol">Diego Molina</a> will present the project
at the next <a href="http://www.meetup.com/Berlin-Selenium-Meetup/events/231184482/">Berlin Selenium Meetup</a>, hosted by Zalando
Tech on July 20. Register at the <a href="http://www.meetup.com/Berlin-Selenium-Meetup/events/231184482/">Selenium Meetup page</a>
today.</p>Scalable Fraud Detection for Zalando's Fashion Platform2016-05-31T00:00:00+02:002016-05-31T00:00:00+02:00Dr Patrick Baiertag:engineering.zalando.com,2016-05-31:/posts/2016/05/scalable-fraud-detection-fashion-platform.html<p>Team Payana have been busy: Read about their migration efforts for the Zalando platform.</p><h6><em>Longread: 15 minutes/3,282 words</em></h6>
<p>Zalando’s vision of growing from an online fashion retailer to a fashion platform not only opens up internal Zalando
services to external partners, but also dramatically increases the amount of data that flows through the company's
backend.</p>
<p>At Zalando, it’s up to individual teams to ensure that the services they own are ready for this challenge.
This poses great responsibility, but also allows each team to evaluate which technology stack they should use to tackle the challenge.</p>
<p>As part of the Data Engineering department, our team provides a service that estimates the fraud risk of incoming
orders. In a nutshell, we estimate the risk of a customer's order using machine learning models trained on
historical order data.</p>
<p>Moving to the platform world also challenged our team in different ways. On the one hand, we need to be able to make
predictions on an unforeseen amount of orders in real-time. On the other hand, we need to be able to update existing
machine learning models in short intervals.</p>
<p>Coming from a data science background, we first implemented a solution to these challenges as a Python system that
uses scikit-learn for the machine learning part. However, we soon discovered that this solution does not scale as
required by the new platform vision. Hence, we decided to migrate the existing system to a new solution based
on Spark and Scala.</p>
<p>This post is about the journey of this migration. We will briefly sketch out our old solution, outline the pain points,
and show how they were relieved by Spark and Scala. We will also share the lessons we have learned on our journey and
discuss some ongoing problems.</p>
<h3>The two use cases and the old framework</h3>
<p>Our old implementation solves a twofold challenge: On the one hand there are a number of models that need to be trained
on historical data; on the other hand these models are used in production to deliver real-time predictions. These two
use cases are closely related but each comes with its specifics; see the figure below for an illustration.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1befeeb60af4c1ced452c642ea91f64504089370_dataflows.png?auto=compress,format"></p>
<p>In the live system (the “prediction data flow”), order data comes in the form of a JSON request. The request’s data
fields are read into a data access object, which is used to compute all features and yield a data point. This initial
data point may have missing values since the data may be corrupt or incomplete. Thus, an imputer is used to fill in
meaningful default values for them. The complete data point is now used as input to the final model, which delivers a
fraud probability, <code>P_f</code>. This value is then used by the consumers of the service to decide how to proceed with the
order. Along this path, pre-trained models are required for each step: Some of the features, a few pre-processors, the
imputer, the final prediction model.</p>
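<p>As a rough illustration of this prediction data flow, here is a minimal, hypothetical Python sketch; all names (<code>extract_features</code>, <code>Imputer</code>, <code>FraudModel</code>) and the toy scoring rule are made up for illustration and are not the actual production code:</p>

```python
# Simplified sketch of the prediction data flow described above.
# All names (extract_features, Imputer, FraudModel) are illustrative,
# not the actual production code.

def extract_features(request: dict) -> dict:
    """Turn the raw JSON request into a data point; fields may be missing."""
    return {
        "order_amount": request.get("order_amount"),        # may be None
        "num_prev_orders": request.get("num_prev_orders"),  # may be None
    }

class Imputer:
    """Fills missing values with meaningful defaults learned from history."""
    def __init__(self, defaults: dict):
        self.defaults = defaults

    def transform(self, point: dict) -> dict:
        return {k: (v if v is not None else self.defaults[k])
                for k, v in point.items()}

class FraudModel:
    """Stand-in for the pre-trained model that outputs P_f."""
    def predict_proba(self, point: dict) -> float:
        # toy scoring rule, just to make the flow runnable
        score = 0.001 * point["order_amount"] - 0.05 * point["num_prev_orders"]
        return min(1.0, max(0.0, 0.5 + score))

imputer = Imputer(defaults={"order_amount": 50.0, "num_prev_orders": 2})
model = FraudModel()

request = {"order_amount": 300.0}            # num_prev_orders missing
point = imputer.transform(extract_features(request))
p_f = model.predict_proba(point)             # fraud probability used downstream
```

<p>The real system plugs pre-trained scikit-learn models into each of these steps; the sketch only shows how a request with missing fields flows through feature extraction, imputation and scoring.</p>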
<p>Refer to the figure below for an example:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b7ebe7480285e784a0377fbae6e3297204e9497a_a.png?auto=compress,format"></p>
<p>The learning procedure for these models is similar (the “learning data flow”); it involves acquisition of the data,
filtering the relevant records (e.g. particular periods, countries), putting them in the shape of requests, extracting
the valid data points, and finally, the training.</p>
<p>Our previous solution addressed these challenges in terms of Python’s CherryPy for serving requests in the prediction
data flow, and with scikit-learn for the machine-learning part, which is running on the company’s in-house cluster for
crunching the numbers. We use JSON configuration files to specify the features of a model, its submodels and
preprocessors, the data filtering and the pre-processing steps. We’ve built an infrastructure to read these
specifications, fetch the relevant data from the repositories, and schedule the learning jobs on the cluster,
parallelising where possible (e.g. independent submodels). For deployment, the trained models are saved in a binary
format and copied to a write-once-read-many (WORM) drive to ensure their integrity. To do the real-time predictions, a
correspondingly configured RESTful service is deployed in a data center. Over time, this evolved to hundreds and
hundreds of model files and nearly one hundred deploy units.</p>
<h3>Pain points</h3>
<p>While our Python-based solution had been doing a decent job for the last few years, the situation was gradually changing
with the increasing amount of data and the overall complexity growing into the system. With the new platform
vision in the back of our minds, several pain points became obvious and cast doubt on whether the system was future-proof:</p>
<ul>
<li>While Python is certainly a very good choice for rapidly prototyping products, it also has some major drawbacks.
    Python’s interpreted execution and dynamic type system make production systems harder to maintain;
    we often found ourselves facing production issues that could have been prevented in the first place.
    Additionally, the Python interpreter does not support true
    <a href="https://wiki.python.org/moin/GlobalInterpreterLock">multithreading</a>, which regularly prevented our fraud prediction
    system from scaling up to higher loads.</li>
<li>For the learning of our models, we pushed the size of the input data to the memory limits of the in-house cluster.
Since the platform vision requires us to scale to almost any size of data, we need a more flexible solution here.</li>
<li>The complexity of our system has been growing with every new model and feature added to the system. At some point,
the configuration of the models became unmanageable. At the same time, our codebase was already too complex to start
a reliable refactoring effort to allow us to fix the configuration hell. One reason for this is that the plain JSON
config files don’t allow you to easily reuse or nest common pieces.</li>
<li>We ran into the limits of the in-house infrastructure. Our jobs compete with other critical ones for computational
    resources on the static cluster to learn models, and for access to a busy analytics database to acquire training
    data. Both resources turned into bottlenecks, slowing down our ability to quickly learn new models.</li>
</ul>
<h3>Introducing the new system</h3>
<p>Faced with the aforementioned challenges, we sat down and discussed alternatives to provide a platform-ready
fraud-assessment system. This was around the same time when the word about the Big Data processing framework <a href="https://spark.apache.org/">Apache
Spark</a> reached us and the whole Data Science community. In a nutshell, Spark promises to
distribute your data processing task seamlessly to a set of worker machines, which then work on their own fraction of
data in parallel. This implies that you can seamlessly scale up your processing power and memory. Moreover, Spark comes
with APIs for the most important machine learning algorithms, while it can connect to almost any major data source. It
is also written in Scala, a JVM-based language, which provides the type-safety we were missing before. It was quite
obvious: Spark promised we could get rid of most of our pain points.</p>
<h3>The learn framework</h3>
<p>When beginning the implementation of our new system with Spark in a more scalable fashion, we not only wanted to
reproduce our old system, but also improve on some aspects. Hence, the redesign of the learn framework centers around
three basic principles: Learning and data acquisition should be fundamentally separated; a model is a tree of models;
and configuration is code.</p>
<h3>Data and learning</h3>
<p>Querying the data sources for each learning session slows down the overall process of producing the models; and the
more often we need to do this, as the fashion platform context demands, the more it holds us back.
Thus, the first benefit of separating data collection from the process of learning is that it removes the bottleneck
associated with the major data source we use – a busy analytics database.</p>
<p>Technically, the data is loaded from the sources (databases, logs, etc.), put into a common format, and stored on Amazon’s
S3, where it is available for learning on AWS. The other effect of this decoupling is that including a new source is a
matter of enhancing the data collection, which stays orthogonal to the learning part and allows us to include or substitute
arbitrary sources without touching the learning part unnecessarily.</p>
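<p>The separation can be sketched in a few lines of Python (all names are illustrative; the real pipeline writes a common format to S3 rather than a local file):</p>

```python
# Sketch of the decoupling: collectors normalise source-specific records
# into one common format and persist them; the learning side reads only
# from the store and never talks to the original sources.
# Names are illustrative, not the actual pipeline.
import json, os, tempfile

def from_orders_db(row: tuple) -> dict:
    order_id, amount = row
    return {"order_id": order_id, "amount": amount, "source": "orders_db"}

def from_clickstream(event: dict) -> dict:
    return {"order_id": event["oid"], "amount": event.get("amt", 0.0),
            "source": "clickstream"}

def collect(store_path: str) -> None:
    """Stand-in for the job that writes the common format to storage."""
    records = [from_orders_db((1, 99.5)),
               from_clickstream({"oid": 2, "amt": 10.0})]
    with open(store_path, "w") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")

def load_for_learning(store_path: str) -> list:
    """The learning side only knows the common format, not the sources."""
    with open(store_path) as fh:
        return [json.loads(line) for line in fh]

store = os.path.join(tempfile.mkdtemp(), "training_data.jsonl")
collect(store)
dataset = load_for_learning(store)
```

<p>Adding a new source then means adding one more normaliser on the collection side, while the learning code stays untouched.</p>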
<h3>Trees of models</h3>
<p>A central design decision is how features and models relate and interact. As shown in the figure below, we view a
<em>model</em> as being a <em>common feature</em>, and at the same time, a <em>model</em> can have many <em>features</em>. A model comprises a
<em>predictor</em> (e.g. linear regression), a bunch of <em>preprocessors</em>, and the aforementioned collection of <em>features</em>. This
gives rise to a hierarchical structure where every model can itself be considered an input to a higher-order model. In this
way, along with trivial features, which are a simple function of the fields of a record, we can just as easily specify
features that aggregate many records and thus need to be “trained”.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/11d3b42914d48c9fcae20d0c397b57e2d6be5966_tree1-1-1.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5877883f1e47a0de9c87e13df00c81512b854b7e_tree2-1.png?auto=compress,format"></p>
<p>This provides for a very powerful, conceptually simple tool to compose models. Both prediction and learning are aware of
this structure and can take care of doing the right thing in the right order with respect to the implicit dependency
graph. For instance, the model in the figure above would only be asked to predict the probability once its features are
complete, i.e. the submodels have delivered the prediction; for learning, the model would only be trained once the
submodels had been trained and could be used to provide the needed features.</p>
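<p>A minimal Python sketch of this idea, with made-up class names, might look as follows; the point is only that models nest as features and that training walks the tree bottom-up:</p>

```python
# Minimal sketch of "a model is a tree of models": every model can serve
# as a feature of a parent model, and training proceeds bottom-up so that
# submodels are trained before any model that consumes their predictions.
# Class names and the toy predictor are illustrative, not the framework.

class Feature:
    def compute(self, record: dict) -> float:
        raise NotImplementedError

class FieldFeature(Feature):
    """A trivial feature: a simple function of one record field."""
    def __init__(self, field: str):
        self.field = field
    def compute(self, record: dict) -> float:
        return float(record[self.field])

class Model(Feature):
    """A model is itself a feature, so models nest into a tree."""
    def __init__(self, name: str, features: list):
        self.name, self.features, self.trained = name, features, False

    def train(self, records: list) -> None:
        # train children first: the implicit dependency graph
        for f in self.features:
            if isinstance(f, Model):
                f.train(records)
        self.trained = True

    def compute(self, record: dict) -> float:
        assert self.trained, f"{self.name} used before training"
        inputs = [f.compute(record) for f in self.features]
        return sum(inputs) / len(inputs)   # toy predictor

sub = Model("submodel", [FieldFeature("amount")])
top = Model("top", [sub, FieldFeature("basket_size")])
top.train([{"amount": 10.0, "basket_size": 2.0}])
prediction = top.compute({"amount": 10.0, "basket_size": 2.0})
```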
<h3>Configuration is a class constructor</h3>
<p>Configuration is as important for the result of any computer program as code. In fact, configuration can have bugs,
needs to be documented, reviewed, tested. It should be given as much care as code. What if configuration <em>was</em> code?</p>
<p>In the design phase of the new framework, looking at the configuration files for our models, we figured out that, in
terms of OOP, they resemble the parameters of nested constructors. The obvious approach would be to parse the JSONs and
initiate the construction of the models. However, one can just as well bypass this error-prone step and specify the
parameters of the constructors in the code itself!</p>
<p>Having the configuration compiled and type-checked with Scala’s sophisticated type system brings it to
another level of protection against trivial as well as hard-to-find syntactic and semantic bugs. It allows the model design to
fail early, while failure is still cheap, and forbids gross misuse of the mechanisms – all without writing a single test, just by means of
proper typing.</p>
<p>Furthermore, models and features defined as classes can be reused across different configuration “files” in a consistent
manner, which brings about conciseness, too.</p>
<p>In the same line of thought, having implemented a minimal domain-specific language, we can specify which jobs to
throw at the cluster, as in <em>Learn(model, …) andThen Predict(model, …)</em>:</p>
<div class="highlight"><pre><span></span><code><span class="n">package</span><span class="w"> </span><span class="n">de</span><span class="p">.</span><span class="n">zalando</span><span class="p">.</span><span class="n">payana</span><span class="p">.</span><span class="n">lf</span><span class="p">.</span><span class="n">model</span><span class="p">.</span><span class="n">appDe</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">DynamicFeat1</span><span class="p">()</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">BasicScorer</span><span class="p">(</span>
<span class="w"> </span><span class="ss">"SecretScore1"</span><span class="p">,</span><span class="w"> </span><span class="n">SecretScorer1</span><span class="p">(),</span><span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="ss">"1970-01"</span><span class="p">))</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">DynamicFeat2</span><span class="p">()</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">BasicScorer</span><span class="p">(</span>
<span class="w"> </span><span class="ss">"SecretScore2"</span><span class="p">,</span><span class="w"> </span><span class="n">SecretScorer2</span><span class="p">(),</span><span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="ss">"1970-02"</span><span class="p">,</span><span class="w"> </span><span class="ss">"1970-03"</span><span class="p">))</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">DynamicFeat3</span><span class="p">()</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">BasicScorer</span><span class="p">(</span>
<span class="w"> </span><span class="ss">"SecretScore3"</span><span class="p">,</span><span class="w"> </span><span class="n">SecretScorer3</span><span class="p">(),</span><span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="ss">"1970-04"</span><span class="p">))</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">DynamicFeat4</span><span class="p">()</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">BasicScorer</span><span class="p">(</span>
<span class="w"> </span><span class="ss">"SecretScore4"</span><span class="p">,</span><span class="w"> </span><span class="n">SecretScorer4</span><span class="p">(),</span><span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="ss">"1970-05"</span><span class="p">,</span><span class="w"> </span><span class="ss">"1970-06"</span><span class="p">))</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">Forest_1970_09</span><span class="p">()</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">Model</span><span class="o">[</span><span class="n">OrderDao</span><span class="o">]</span><span class="p">(</span>
<span class="w"> </span><span class="n">randomForest</span><span class="p">(</span><span class="ss">"model_1970_09"</span><span class="p">),</span>
<span class="w"> </span><span class="n">BasicFeatures</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="n">CustomerHistoryFeatures</span><span class="w"> </span><span class="o">++</span>
<span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="n">DynamicFeat1</span><span class="p">(),</span><span class="w"> </span><span class="n">DynamicFeat2</span><span class="p">(),</span><span class="w"> </span><span class="n">DynamicFeat3</span><span class="p">(),</span><span class="w"> </span><span class="n">DynamicFeat4</span><span class="p">()),</span>
<span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="ss">"1970-09"</span><span class="p">,</span><span class="w"> </span><span class="ss">"1970-10"</span><span class="p">,</span><span class="w"> </span><span class="ss">"1970-11"</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">class</span><span class="w"> </span><span class="n">Job_2016_02_02</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">Job</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">override</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="k">execute</span><span class="p">(</span><span class="n">implicit</span><span class="w"> </span><span class="nl">ctx</span><span class="p">:</span><span class="w"> </span><span class="n">Context</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">allModels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Seq</span><span class="p">(</span><span class="n">Forest_1970_09</span><span class="p">(),</span><span class="w"> </span><span class="n">Ridge_1970_09</span><span class="p">())</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">allModels</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">Learn</span><span class="p">(</span><span class="n">model</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span><span class="w"> </span><span class="n">andThen</span><span class="w"> </span><span class="n">Predict</span><span class="p">(</span><span class="n">model</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>In general, having job specification and configuration in the same codebase, tied to the strictly typed framework, makes
experiments better documented, reproducible, and secure.</p>
<h3>Solution for prediction</h3>
<p>One important aspect for our setup is that we do not want to have any technology break between preprocessing, learning,
and prediction in order to reuse definitions and avoid additional efforts for keeping codebases in sync. Previously,
this meant that we used a CherryPy server to provide a RESTful interface for prediction. With our models learned in
Scala and Spark, we’ve implemented a Scala-based web server solution in the new system. Although we did not have much
prior expertise in JVM-based web frameworks, we found it rather easy to set up a web server in Scala using the <a href="https://www.playframework.com/">Play
framework</a>. Only a few lines of code were needed to set up a truly multithreaded
solution that significantly outperforms the old one, as we will see in the following.</p>
<h3>Comparison</h3>
<p>To see if the effort of migration from our previous solution to Scala and Spark really pays off, we compared both
systems throughout a series of different experiments.</p>
<h3>Learning time</h3>
<p>For this comparison we learn a classification model with each system. The Python model is learned on our in-house
cluster, processing all data for each submodel on a single machine with 10 cores, scheduling in parallel (on separate
machines) the independent submodels. In contrast, the Spark model is learned in a data-distributed setting with one
driver and five worker nodes. When comparing the overall learning times, we observe a clear reduction with the Spark
setup, i.e. the overall learning time drops by a factor of two.</p>
<h3>Prediction time</h3>
<p>As already mentioned, one crucial aspect of our system is the ability to perform timely and scalable fraud predictions
on new orders. For this purpose, we compare the performance of our old CherryPy-based prediction engine with our new
Play-based prediction framework. We set up a load test that sends 5,000 prediction requests to the respective prediction
engine using different concurrency levels for the requests. The following figure shows the response time for both
prediction engines.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7ee3498b5edc182863883ef6e973072ece5831bf_benchmark.jpg?auto=compress,format"></p>
<p>We can see that the Python solution has a much higher response time when the number of concurrent requests is higher
than one, while the response time of the Scala engine stays almost constant (even when facing higher concurrency
levels). For instance, with 20 concurrent requests it needs on average only <strong>~70 ms compared to ~1000 ms</strong> for the
Python module.</p>
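<p>A load test of this shape can be sketched in Python as follows; the stub function, parameters and timing are purely illustrative, while the actual benchmark fired 5,000 HTTP requests at the real services:</p>

```python
# Illustrative sketch of the load test: fire N requests at a prediction
# endpoint with a given concurrency level and report the mean response
# time. A local stub stands in for the real HTTP call.
import time
from concurrent.futures import ThreadPoolExecutor

def predict_stub(request_id: int) -> float:
    time.sleep(0.001)          # stand-in for network + model latency
    return 0.5                 # dummy fraud probability

def load_test(n_requests: int, concurrency: int) -> float:
    """Returns the mean response time in seconds."""
    def timed_call(i: int) -> float:
        start = time.perf_counter()
        predict_stub(i)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(n_requests)))
    return sum(latencies) / len(latencies)

mean_latency = load_test(n_requests=100, concurrency=20)
```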
<h3>Accuracy</h3>
<p>Great time figures for prediction and learning wouldn’t be worth anything if our new models didn’t predict as well as
the old ones did in the first place. There are standard implementations in both Spark and scikit-learn for many linear
and nonlinear models that we consider. Feeding them the same data yields comparable results for the linear
models. Somewhat surprisingly, some Spark models lose to scikit-learn by a margin we can’t tolerate. However, we were
able to find comparable substitutes for each model we use. An insightful benchmark for classification algorithm
implementations is provided by <a href="https://github.com/szilard/benchm-ml">this brilliant post</a>. (The interested reader
should follow the link from there to the <a href="http://datascience.la/benchmarking-random-forest-implementations/#comment-53599">discussion with one of Spark’s
designers</a>). Eventually, different
implementation paradigms unavoidably bring about differences in performance.</p>
<h3>Optimising with sparse features</h3>
<p>When modelling yields features that can potentially take thousands of different values, sometimes only a tiny fraction
of them is present in the data. This is the ideal case for a sparse representation: instead of storing a
50k-dimensional vector, one would rather encode only the non-zero positions in the vector (or use a similar lossless
compression technique). While Spark has a sparse vector implementation, we encountered some odd behaviour when using it.
It turned out that under the hood the sparse optimizer blows sparse vectors up to dense vectors.</p>
<div class="highlight"><pre><span></span><code>val states = optimizer.iterations(
new CachedDiffFunction(costFun),
initialCoefficientsWithIntercept.toBreeze.toDenseVector
)
</code></pre></div>
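<p>The sparse representation itself is a language-agnostic idea: keep only the non-zero (index, value) pairs instead of the full vector. A minimal sketch (here in Rust for brevity, with a hypothetical helper name, unrelated to Spark’s <em>SparseVector</em> API):</p>

```rust
// Sketch: convert a dense vector into (index, value) pairs for its
// non-zero entries only. With 50k dimensions and a handful of set
// values, this stores a few pairs instead of 50,000 numbers.
fn to_sparse(dense: &[f64]) -> Vec<(usize, f64)> {
    dense.iter()
        .enumerate()
        .filter(|&(_, &v)| v != 0.0)
        .map(|(i, &v)| (i, v))
        .collect()
}

fn main() {
    let mut dense = vec![0.0; 50_000];
    dense[7] = 1.0;
    dense[42_001] = 3.5;
    // Two pairs instead of 50,000 stored values.
    assert_eq!(to_sparse(&dense), vec![(7, 1.0), (42_001, 3.5)]);
}
```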
<p>In the presence of high-dimensional sparse features, the resulting Hessian matrix ends up being very sparse. Since we
encountered several oddities when using sparse features, we suspect that the optimisation is numerically unstable
because of this. Rather than trying to fix the internals of the optimisation, we used our flexibly
designed framework to solve the problem: we implemented a sparse-to-dense condenser in the form of a preprocessor for our
models.</p>
<p>The condenser looks at the whole dataset and, for sparse features, keeps track of the values that are set. It can then
throw away all unset dimensions and shrink the final vectors by a great deal. This not only gave better
predictions (more than <strong>25% improvement</strong>) but also more than <strong>halved the runtime</strong>.</p>
<p>One of the great aspects of Scala and Spark is that they allow you to write rather complex logic in a very concise way.
Hence, we list the complete implementation below:</p>
<div class="highlight"><pre><span></span><code>case class Condenser() extends LearnablePreprocessor[Map[Int, Int]] {
  override def modelGenerator(data: RDD[MllibLabeledPoint])(implicit ctx: Context): Map[Int, Int] = {
    val seqOp = (set: Set[Int], lp: MllibLabeledPoint) => lp.features match {
      case s: SparseVector => set ++ s.indices.toSet
      case d: DenseVector => throw new RuntimeException("Condenser does not work with DenseVectors")
    }
    val combOp = (set1: Set[Int], set2: Set[Int]) => set1 ++ set2
    val nonZeroIndicesGlobal = data.treeAggregate(Set[Int]())(seqOp, combOp)
    nonZeroIndicesGlobal.toSeq.sorted.zipWithIndex.toMap
  }
  override def preprocess(v: Vector): Vector = v match {
    case s: SparseVector => {
      val condensedIndexMap = getModel
      val overlap = s.indices.filter(i => condensedIndexMap.contains(i))
      val newKeyMap = overlap.map(i => (condensedIndexMap(i), s(i)))
      Vectors.sparse(condensedIndexMap.size, newKeyMap)
    }
    case d: DenseVector => throw new RuntimeException("Condenser does not work with DenseVectors")
  }
}
</code></pre></div>
<h3>Lessons learned</h3>
<p>With the migration to the new solution, we got rid of most pain points and also gained a lot in terms of performance, as
shown above. However, we also want to share some lessons that we learned during our migration to the new framework.</p>
<p>First of all, the transition was not easy in terms of technology. Coming from a data science background, most of our
team is proficient in R or Python, but not in JVM-based languages. Hence, the learning curve for Scala was rather steep
in the beginning. The use of Spark also introduced some additional hurdles. Since Spark ships the user
code to a set of worker nodes, we had a hard time debugging and spent long sessions making code serialisable so that
Spark could ship it to the workers.</p>
<p>Another part that bugged us was the maturity level of the MLlib library, which we relied on for learning our models.
The produced models are not always as accurate as the ones you get from a more mature library such as
scikit-learn. You really have to take this into account when learning your models and, ideally, compare the ROC curves
against those from a more mature library. We are confident that this gap will close over time as Spark is
adopted more widely and more people contribute to the project.</p>
<h3>Conclusion</h3>
<p>We are designing solutions for fraud detection and prevention, and face the need to make them scale adequately with the
future Zalando fashion platform’s capacity – both for real-time prediction and for training models.</p>
<p>We’ve taken an interesting journey in redesigning our previous solution into a more scalable and resilient form, and in this
post we’ve shared the lessons that we learned along the way. We started with why our Python-based solution, which uses
CherryPy for serving requests, scikit-learn for machine learning models, and an in-house static cluster for
number-crunching, does not scale to the new challenges. We arrived at a new design in Scala, using the Play framework to serve
requests, Spark for machine learning algorithms, and AWS for running it all. Our evaluation shows that the new design scales
considerably better in terms of time and versatility. In addition, the strict types, compiled code, and
configuration make it more secure and more fun to hack.</p>
<h3>Credits</h3>
<p>Team Payana are Patrick Baier, Stanimir Dragiev, Henning-Ulrich Esser, Andrei Kaigorodov, Tammo Krüger, Philipp Marx, and
Oleksandr Volynets. The transition is a team effort with contributions from everyone. The driving force for the redesign
is attributed, to the largest extent, to Tammo. You can also check out our presentation at Spark Summit Europe about
<a href="https://www.youtube.com/watch?v=lMkII1orMsk">our journey from Scikit learn to Spark</a>.</p>Zalando Tech are the new unicorns at Microservices Day2016-05-26T00:00:00+02:002016-05-26T00:00:00+02:00Valentine Gogichashvilitag:engineering.zalando.com,2016-05-26:/posts/2016/05/zalando-tech-microservices-day-2016.html<p>Check out how Zalando Tech made their mark at Microservices Day, London.</p><p>On May 10th, Zalando was among the four technology unicorns that presented at <a href="http://microservicesday.com/">Microservices
Day</a> London, along with Red Hat, Uber, and AutoScout24. Microservices Day is a one-day,
single-track, non-profit event that focuses on the business benefits of utilising microservices.</p>
<p>Since we’ve begun moving our Fashion Store from a <a href="https://tech.zalando.de/blog/from-monolith-to-microservices-video/">monolith to
microservices</a>, we wanted to share what Zalando has
learned from the past year of digging deeper into data integration in a microservices-based environment.</p>
<p>We gave a complete overview of <a href="https://tech.zalando.de/blog/data-integration-in-a-world-of-microservices/">Saiki</a>,
Zalando’s scalable, cloud-based data integration infrastructure, as well as its stream processing possibilities.</p>
<p>Want to know more? You can access our slide deck
<a href="http://www.slideshare.net/FabianWollert/data-integration-in-a-microservices-world">here</a>, or watch the entire
presentation below:</p>Can you hack it? Yes you can!2016-05-25T00:00:00+02:002016-05-25T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-05-25:/posts/2016/05/zalando-tech-codesprint.html<p>Learn more about why Zalando's developers contributed to our upcoming online hackathon.</p><p>Zalando Tech is hosting a 24-hour online hackathon on June 4, known as the <a href="https://www.hackerrank.com/zalando-codesprint">Zalando
CodeSprint</a>, in partnership with
<a href="https://www.hackerrank.com/">HackerRank</a>. We’ve put together eight system-graded challenges to solve, with participants
welcome from anywhere in the world.</p>
<p>In the lead up to this exciting challenge, we wanted to catch up with a couple of the developers who helped craft each
problem statement, and find out the motivations behind the event.</p>
<p><em>Zalando Tech:</em> Why did you decide to contribute to the creation of challenges for the CodeSprint?</p>
<p><em>Darik:</em> I’ve been participating in hackathons like this since university, where the online environment allows entrants
to be from all over the world. These competitions are designed for developers of all levels, with the opportunity to
look at the code of other entrants and learn from them, on top of having fun.</p>
<p>The time-limit factor helps train you for stressful real-time situations in your job, when you have to find
solutions to problems fast. All of these aspects helped me in the past, which is why I wanted to participate.
Competitions like this are also a good way to prepare for technical interviews, on top of giving you a rank
that you can measure against other developers.</p>
<p><em>Torsten:</em> Challenges like this don’t often require the writing of much code, but instead make you think about the
different kinds of test cases that could be used to solve a problem. The whole point of the competition is the
importance of problem solving and critical thinking, as opposed to only writing code. Participants come from different
backgrounds and therefore bring this diversity to the problems they’re facing, giving different answers and solutions
compared to others.</p>
<p><em>Zalando Tech:</em> What steps are involved when creating CodeSprint problems?</p>
<p><em>Darik:</em> There are several approaches you could take. One is: maybe you have a problem that you have already solved,
so you hide the solution behind the problem statement and make it as tricky as possible. The second is: while you’re coding, a real
problem arises, and it gives you inspiration for creating puzzles for the competition by making them a little more
difficult.</p>
<p><em>Torsten:</em> For me, not only are you thinking of problems you’ve solved, but perhaps you want to make sure a certain
technique is used in a solution, thus you create the task this way. It’s more complicated to come up with “easy”
problems; it would be nice to say that there’s a problem around something as simple as sorting, for example.</p>
<p>We also think about the problems our team needs to solve. Complexity-wise they are quite challenging, so I think about
getting rid of certain constraints to create a ‘toy’ problem that could still benefit the competition but also my team
in tackling its own work.</p>
<p><em>Zalando Tech:</em> Do you have any tips or recommendations for participants?</p>
<p><em>Darik:</em> Don’t start coding from the first problem! There are different types of problems with assorted difficulty
levels, and they’re not in an order from easiest to hardest – I suggest that entrants read all the problem statements
before they begin coding. Have a look at the constraints as well – these can make the problems very difficult to solve.</p>
<p>I also suggest that if participants can’t figure out the exact solution, there might be an answer that uses brute force
to pass some of the tests. Some points can still be gained, so it’s better than nothing!</p>
<p><em>Torsten:</em> I actually disagree, I would never read all of the problem statements at once! I would skip the problems you
couldn’t work on and move on to the next workable issue. However, I do agree that looking at the constraints first is
important, because it means that certain solutions won’t be possible to implement. On the other hand, very restrictive
constraints might allow for simple or exhaustive solution strategies.</p>
<p>Get your teeth into some of the challenges that keep us on our toes every day. Tackle them using your programming
skills, creativity, and endurance. <a href="https://www.hackerrank.com/zalando-codesprint">Sign up to the CodeSprint today!</a></p>The Scala Travel Diary2016-05-24T00:00:00+02:002016-05-24T00:00:00+02:00Janine Schneidertag:engineering.zalando.com,2016-05-24:/posts/2016/05/the-scala-travel-diary.html<p>What's in Zalando’s travel itinerary for the Scala world? Find out about our travel plans here.</p><p>It’s no secret that Zalando Tech has had its hands full lately with its participation in several Scala conferences and
meetups. As a company that practises <a href="https://tech.zalando.de/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a>, our use of Scala has skyrocketed
and it’s now one of our most adopted programming languages amongst developers.</p>
<p>So, where have we been in the Scala world? Where are we headed to next? Find out more below.</p>
<h3>Scalar, Poland</h3>
<p>This year’s <a href="http://scalar-conf.com/">Scalar</a> conference took place in Warsaw on April 16th with 480 participants. The
team over at <a href="https://softwaremill.com">SoftwareMill</a> brought together a lot of well-known speakers from Europe’s Scala
community, featuring our very own <a href="https://tech.zalando.com/blog/ra-profile-eric-torreborre/">Eric Torreborre</a>, Slava
Schmidt, and Marco Borst.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/58b9b97898d3cb9fc312f9bd573453bad00c46be_thumb_img_0725_1024.jpg?auto=compress,format"></p>
<p>Eric presented on: <a href="http://www.slideshare.net/etorreborre/the-eff-monad-one-monad-to-rule-them-all">“The Eff Monad, One Monad to Rule Them
All”</a>, while Slava and Marco teamed up
to speak about “Contract First, Session Types Later!”. With such a specialised conference, we were sure to find a lot of
interested audience members for both Zalando talks.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/85cc91a807932c8374ad796985b5903924d87669_thumb_img_0728_1024.jpg?auto=compress,format"></p>
<p>Poland’s Scala community was out in force at Scalar, which shows not only that there is a big community close by, but
also that it is incredibly experienced and willing to share what it has learned.</p>
<h3>Scala User Group Meetup, Berlin</h3>
<p>Our Zalando Tech Innovation Lab was the place to be on April 20th for the most recent edition of the <a href="http://www.meetup.com/Scala-Berlin-Brandenburg/events/230152730/">Scala User Group
Meetup</a>. Eric once again presented his <a href="http://www.slideshare.net/etorreborre/the-eff-monad-one-monad-to-rule-them-all">Eff Monad
talk</a>, with approximately 80 Scala
enthusiasts in attendance.</p>
<p>Being a much more intimate affair, meetups are a great place to ask questions, with many flowing after Eric’s
presentation. We’re incredibly keen to host the meetup again and will publish more information on our <a href="https://twitter.com/ZalandoTech">Twitter
account</a> and <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/">meetup page</a> when
details are finalised.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6f715595c34ea8acd280c9bf75e1bb4a8b146062_image1-2.jpg?auto=compress,format"></p>
<h3>Community Unconference @ Scala Days, Berlin</h3>
<p>In June, Zalando will be hosting the <a href="http://www.meetup.com/Scala-Community-Meetup/events/229686189/">Community
Unconference</a>, which is being coordinated by several
European Scala User Groups at Scala Days. Taking place on the 15th of June, the event will have all hands on deck at Zalando Tech HQ. The
Unconference will be based on the Open Space Technology format, which features four principles:</p>
<ul>
<li>Whoever comes are the best people</li>
<li>Whatever happens is the only thing that could have</li>
<li>Whenever it starts is the right time</li>
<li>When it’s over, it’s over</li>
</ul>
<p>This kind of framework aims to give everyone attending the opportunity to present whatever they think might be
interesting to the Scala community. If they don’t have a talk prepared in advance, it doesn’t matter: The Unconference
format welcomes on-the-spot ideas.</p>
<p>There’s still time and space to register your attendance. Check our <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/">meetup
page</a> for more information.</p>
<h3>Scala Days 2016, Berlin</h3>
<p>One of the biggest Scala conferences on the calendar and our highlight of the year, the <a href="http://event.scaladays.org/scaladays-berlin-2016#00-Introduction">Scala Days
2016</a> event will run from the 15th to the 17th of
June in the heart of Berlin.</p>
<p>With over 1,000 participants expected, this is THE place for Scala developers and contributors to meet, exchange ideas,
and share experiences and know-how. Being so close to home, Zalando Tech will have a big presence at the
conference, both in terms of attendees and our very own booth, set up for the duration of the event. We’re looking forward to
expanding our Scala horizons and learning about creating new applications with Scala and its related technologies.</p>
<p>Will we see you at some of these future Scala events? Let us know! Tweet us at
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>Zalando Techspert Series: How to foster an innovative culture2016-05-24T00:00:00+02:002016-05-24T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-05-24:/posts/2016/05/zalando-techspert-series-launch.html<p>A Zalando-led initiative to encourage discussions about soft tech with leading organisations.</p><p>The Zalando Techspert Series: A Zalando-led initiative to encourage and initiate discussions about ‘softer’ tech topics
such as, but not limited to, agility, tech culture, and Berlin as Europe’s most magnetic tech hub.</p>
<p>Why have we set this up? We want to spearhead the discussions that matter to tech companies and businesses of all sizes:
How to recruit top talent, how to inspire innovation, and how to be leaders in a competitive digital market.</p>
<p>Every month, we’ll be holding a panel that features high-level executives from some of the world’s most exciting
startups and organisations. Our inaugural event will see our VP of Engineering Eric Bowman joined by Sean Treadway
(<a href="https://soundcloud.com/">SoundCloud</a>), Valentin Stalf (<a href="https://number26.eu/">Number26</a>), and Ulrich Schmitz (<a href="http://www.axelspringerplugandplay.com">Axel
Springer</a>).</p>
<p>Eric, Sean, Valentin, and Ulrich will be addressing the following question: How do we foster an innovative culture? At
Zalando, our introduction of <a href="https://tech.zalando.de/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a> has highlighted the importance of
good tech culture, so we think there’s more to it than free food and a kicker table. It’s about time we see if other
influential companies feel the same.</p>
<p>Interested? Excited? We want you to be involved. Every series will take place at Zalando’s very own Innovation Lab, with
refreshments provided for the debate. To attend, you can register at our <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/events/231363151/">Meetup
Page</a>, where you can also find more information
about upcoming topics.</p>
<p><a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/events/231363151/">Join us</a> for the conversations that matter around
tech culture and Berlin’s thriving tech scene.</p>Our polyglot approach: Getting started with Rust2016-05-20T00:00:00+02:002016-05-20T00:00:00+02:00Dan Persatag:engineering.zalando.com,2016-05-20:/posts/2016/05/getting-started-with-rust.html<p>Dan Persa shares his first dive into the multi-paradigm programming language Rust.</p><p>I recently started using <a href="https://www.rust-lang.org">Rust</a> – the programming language. My team had thought about the
idea of using a polyglot approach when building services – we think that we should always use the right tool for the
job. We also believe that we should build our services so that others can use them. Thus, while prototyping, we’ve built
projects in many programming languages as part of migrating our <a href="https://tech.zalando.com/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">shop monolith to
microservices</a>.</p>
<p>I already hear you asking: What projects have you worked on? We have <a href="https://github.com/zalando/skipper/">Skipper</a>,
built with Go. My colleague Arpad wrote a <a href="https://tech.zalando.com/blog/building-our-own-open-source-http-routing-solution/">blog
post</a> about it. I also did some <a href="https://tech.zalando.com/blog/video-reactive-restful-apis-with-akka-http-and-slick/">tech
talks</a> about
<a href="https://github.com/zalando/innkeeper">Innkeeper</a>, a reactive RESTful API we wrote in
<a href="http://www.scala-lang.org/">Scala</a>, using the Akka HTTP and Slick frameworks. Some of my colleagues <a href="https://tech.zalando.com/blog/using-elm-to-create-a-fun-game-in-just-five-days/">played around with
Elm</a> and built a game in just five days
during <a href="https://tech.zalando.com/blog/hack-week-4-begins/">Zalando’s Hack Week</a>. Elm is a functional language similar to
<a href="https://www.haskell.org/">Haskell</a>, built on top of JavaScript. And we have a layout service in Node,
<a href="https://github.com/zalando/tailor">Tailor</a>.</p>
<p>It was just a matter of time before I started experimenting with Rust. I started slowly, building a mock service for our
OAuth and including it in our CI for Innkeeper as a Docker image. In this post I’ll talk about my experience of getting
started with Rust, with a second post to follow that explains how to include it in a Docker image.</p>
<h3>Why Rust?</h3>
<p>After listening to <a href="https://www.youtube.com/watch?v=agzf6ftEsLU">some talks</a> on Rust, it immediately caught my interest.
The things I instantly liked about it are:</p>
<ul>
<li>The fast compilation (I’m working with Scala right now, where compilation can sometimes take too much time)</li>
<li>Being memory safe and race-condition safe by default, without the need for a garbage collector – with a little extra
effort from the developer, of course</li>
<li>Pattern matching: Once you get used to it, it’s hard to go back to languages without it</li>
<li>No need for a virtual machine – now that we have Docker, having the same code running on different machines isn’t as
important as it was a while ago</li>
</ul>
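<p>To make the pattern-matching point concrete, here is a minimal, hypothetical sketch (not code from any of the services mentioned in this post) of a Rust <em>match</em> combining literal, guard, and range patterns:</p>

```rust
// Hypothetical example: classify an integer with a single match expression.
fn describe(n: i32) -> &'static str {
    match n {
        0 => "zero",              // literal pattern
        x if x < 0 => "negative", // guard
        1..=9 => "single digit",  // range pattern
        _ => "large",             // catch-all
    }
}

fn main() {
    assert_eq!(describe(0), "zero");
    assert_eq!(describe(-4), "negative");
    assert_eq!(describe(5), "single digit");
    assert_eq!(describe(42), "large");
}
```

The compiler checks that the arms are exhaustive, which is part of what makes it hard to go back to languages without it.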
<h3>First Steps</h3>
<p>The first service I built with Rust was a JSON API, called <a href="https://github.com/danpersa/rusty-oauth">rusty-oauth</a>. To
start a new project in Rust, you have to install <a href="https://crates.io/install">Cargo</a>, the build tool for Rust.
Cargo helps you to:</p>
<ul>
<li>Initialise new projects</li>
<li>Build, release, run, and test your projects</li>
<li>Declare external dependencies (called crates) for your project – a Rust crate is like a Java JAR or a Ruby gem</li>
</ul>
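<p>Dependency declarations live in the project’s <em>Cargo.toml</em>; a minimal sketch (the crate versions shown here are purely illustrative):</p>

```toml
[package]
name = "rusty-oauth"
version = "0.1.0"

[dependencies]
# versions are illustrative, not the ones used at the time
nickel = "0.9"
rustc-serialize = "0.3"
```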
<!-- -->
<div class="highlight"><pre><span></span><code>cargo new --bin rusty-oauth
</code></pre></div>
<p>The above command will create a new ‘hello world’ app for you. Use <em>cargo run</em> inside the directory to compile and run
the app.</p>
<h3>The Rusty OAuth Service</h3>
<p>I now want to go through the code of this project and explain some of the most important concepts of Rust while doing
so. I won’t cover all of Rust’s features, but I’ll cover enough to make those of you considering Rust a little curious.</p>
<p>Let’s dive into the existing code. First of all the main file:</p>
<div class="highlight"><pre><span></span><code>extern crate rustc_serialize;
#[macro_use] extern crate log;
extern crate env_logger;
#[macro_use] extern crate nickel;
mod token_info;
use nickel::{Nickel, MediaType, HttpRouter, QueryString};
use nickel::status::StatusCode::BadRequest;
use rustc_serialize::json;
use token_info::TokenInfo;
</code></pre></div>
<p>In order to be able to use crates from outside of your project, you need to use the <em>extern crate</em> construction. In our
case, we use the <em>rustc_serialize</em> crate, the <em>log</em> crate, the <em>env_logger</em> crate, and the <em>nickel</em> crate.</p>
<p>We then use the <em>mod</em> keyword to define a new module. A module is a collection of items: functions, structs, traits,
impl blocks, and other modules.</p>
<p>We use the <em>use</em> keyword to import functions, structs, and traits from other modules which we’d like to use in our
current file. In this case, we import from the <em>nickel</em> crate:</p>
<div class="highlight"><pre><span></span><code><span class="nt">fn</span><span class="w"> </span><span class="nt">main</span><span class="o">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">env_logger</span><span class="p">:</span><span class="o">:</span><span class="nf">init</span><span class="p">()</span><span class="o">.</span><span class="nf">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="err">let</span><span class="w"> </span><span class="err">mut</span><span class="w"> </span><span class="err">server</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="n">Nickel</span><span class="p">:</span><span class="o">:</span><span class="nf">new</span><span class="p">();</span>
<span class="w"> </span><span class="err">info!("Welcome</span><span class="w"> </span><span class="err">to</span><span class="w"> </span><span class="err">rusty-oauth")</span><span class="p">;</span>
<span class="w"> </span><span class="err">server.get("/oauth2/tokeninfo",</span><span class="w"> </span><span class="err">middleware!</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="err">|req,</span><span class="w"> </span><span class="err">mut</span><span class="w"> </span><span class="err">res|</span>
<span class="w"> </span><span class="err">res.set(</span><span class="n">MediaType</span><span class="p">:</span><span class="o">:</span><span class="n">Json</span><span class="p">);</span>
<span class="w"> </span><span class="err">let</span><span class="w"> </span><span class="err">token</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">match</span><span class="w"> </span><span class="err">req.query().get("access_token")</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="err">Some(token)</span><span class="w"> </span><span class="err">=></span><span class="w"> </span><span class="err">token.to_string(),</span>
<span class="w"> </span><span class="err">None</span><span class="w"> </span><span class="err">=></span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="err">res.set(BadRequest)</span><span class="p">;</span>
<span class="w"> </span><span class="err">return</span><span class="w"> </span><span class="err">res.send(invalid_request(ACCESS_TOKEN_INVALID))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="err">}</span><span class="o">;</span>
<span class="w"> </span><span class="nt">debug</span><span class="o">!(</span><span class="s2">"Request token: {:?}"</span><span class="o">,</span><span class="w"> </span><span class="nt">token</span><span class="o">);</span>
<span class="w"> </span><span class="nt">let</span><span class="w"> </span><span class="nt">token_info</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">match</span><span class="w"> </span><span class="nt">TokenInfo</span><span class="p">::</span><span class="nd">from_query_param</span><span class="o">(&</span><span class="nt">token</span><span class="o">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="err">Ok(token_info)</span><span class="w"> </span><span class="err">=></span><span class="w"> </span><span class="err">token_info,</span>
<span class="w"> </span><span class="err">Err(err)</span><span class="w"> </span><span class="err">=></span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="err">res.set(BadRequest)</span><span class="p">;</span>
<span class="w"> </span><span class="err">return</span><span class="w"> </span><span class="err">res.send(invalid_request(err))</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="err">}</span><span class="o">;</span>
<span class="w"> </span><span class="nt">debug</span><span class="o">!(</span><span class="s2">"Token info: {:?}"</span><span class="o">,</span><span class="w"> </span><span class="nt">token_info</span><span class="o">);</span>
<span class="w"> </span><span class="nt">json</span><span class="p">::</span><span class="nd">encode</span><span class="o">(&</span><span class="nt">token_info</span><span class="o">)</span><span class="p">.</span><span class="nc">unwrap</span><span class="o">()</span>
<span class="w"> </span><span class="err">}</span><span class="o">);</span>
<span class="w"> </span><span class="nt">server</span><span class="p">.</span><span class="nc">listen</span><span class="o">(</span><span class="s2">"0.0.0.0:6767"</span><span class="o">);</span>
<span class="err">}</span>
</code></pre></div>
<p>It’s time to look at the main function. We start by initialising the logger:</p>
<div class="highlight"><pre><span></span><code>    env_logger::init().unwrap();
</code></pre></div>
<p>In Rust, functions that can fail conventionally return a <em>Result</em>. A <em>Result</em> is a simple enum with two possible values: <em>Ok</em> or <em>Err</em>.</p>
<div class="highlight"><pre><span></span><code>enum Result&lt;R, E&gt; {
    Ok(R),
    Err(E)
}
</code></pre></div>
<p>There are two ways of extracting the value from a <em>Result</em>. The first (and unsafe) way is using the unwrap function, as
you can see above with the <em>env_logger</em>. If there’s an error, the unwrap function “panics”, unwinding the stack for the
current thread (while calling destructors for each of the resources owned by the stack). As our program has only one
thread, it will exit with an error message.</p>
<p>The safe way to extract the value is by using pattern matching (and we have an example at line 19). What we are doing
here is treating both cases. In case of a success, we return the <em>token_info</em>. In the case of failure, we return a
<em>BadRequest</em> back to the user.</p>
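<p>Both extraction styles can be tried in a small, self-contained sketch; the <em>parse_port</em> helper here is hypothetical, not part of the service code:</p>

```rust
// A fallible function returning a Result, to contrast the two extraction
// styles discussed above. `parse_port` is an invented helper.
fn parse_port(s: &str) -> Result<u16, String> {
    s.parse::<u16>().map_err(|e| format!("invalid port '{}': {}", s, e))
}

fn main() {
    // Safe extraction via pattern matching: both cases are handled.
    match parse_port("6767") {
        Ok(port) => println!("listening on {}", port),
        Err(err) => eprintln!("{}", err),
    }
    // Unsafe extraction: unwrap() panics if the Result is an Err.
    let port = parse_port("6767").unwrap();
    assert_eq!(port, 6767);
}
```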
<p>As we can see from the definition of the <em>Result</em> enum, Rust also supports generics:</p>
<div class="highlight"><pre><span></span><code>fn invalid_request&lt;S: Into&lt;String&gt;&gt;(err: S) -&gt; String {
    format!("{{\"error\":\"invalid_request\",\"error_description\":\"{}\"}}", err.into())
}
</code></pre></div>
<p>The snippet above also shows how to define a function in Rust. By default, functions are private to their module; using the <em>pub</em>
keyword makes a function public.</p>
<p>By omitting the semicolon (;) at the end of the last line in a function, you tell the compiler that the line is an expression.
In our case, as we omitted the ; and the type of the expression matches the return type of the function, we can
also skip the <em>return</em> keyword:</p>
<div class="highlight"><pre><span></span><code>pub type Scope = String;
pub type Realm = String;
pub type Uid = String;
#[derive(Debug)]
pub struct TokenInfo {
    scopes: Vec&lt;Scope&gt;,
    realm: Realm,
    uid: Option&lt;Uid&gt;
}
</code></pre></div>
<p>Here, we can define some public type aliases. As I mentioned earlier, everything is private as long as you don’t use the
<em>pub</em> keyword, and I find this to be quite a good language design decision.</p>
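<p>The privacy rule can be illustrated with a small standalone sketch; the <em>auth</em> module and its functions are invented for this example, not taken from the project:</p>

```rust
// Sketch of module privacy: items are private unless marked `pub`.
mod auth {
    pub type Realm = String;

    // Private helper: visible only inside the `auth` module.
    fn normalize(realm: &str) -> Realm {
        realm.trim_start_matches('/').to_string()
    }

    // Public entry point, reachable from outside the module.
    pub fn realm_name(realm: &str) -> Realm {
        normalize(realm)
    }
}

fn main() {
    // Calling `auth::normalize` here would not compile; only `pub` items are reachable.
    println!("{}", auth::realm_name("/employees"));
    assert_eq!(auth::realm_name("/employees"), "employees");
}
```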
<p>We are also defining a struct, the <em>TokenInfo</em>. As the uid is optional, we use the <em>Option</em> enum to express this:</p>
<div class="highlight"><pre><span></span><code><span class="kd">impl</span><span class="w"> </span><span class="nx">TokenInfo</span><span class="w"> </span><span class="p">{</span>
    fn new(scopes: Vec&lt;&amp;str&gt;, uid: Option&lt;Uid&gt;, realm: &amp;str) -&gt; TokenInfo {
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">scopes</span><span class="p">.</span><span class="nx">iter</span><span class="p">().</span><span class="nx">map</span><span class="p">(</span><span class="o">|</span><span class="nx">s</span><span class="o">|</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">to_string</span><span class="p">()).</span><span class="nx">collect</span><span class="p">();</span>
<span class="w"> </span><span class="nx">TokenInfo</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">scopes</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">realm</span><span class="p">:</span><span class="w"> </span><span class="nx">realm</span><span class="p">.</span><span class="nx">to_string</span><span class="p">(),</span><span class="w"> </span><span class="nx">uid</span><span class="p">:</span><span class="w"> </span><span class="nx">uid</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
    pub fn from_query_param(param: &amp;str) -&gt; Result&lt;TokenInfo, String&gt; {
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">parts</span><span class="p">:</span><span class="w"> </span><span class="nx">Vec</span><span class="p"><</span><span class="o">&</span><span class="nx">str</span><span class="p">></span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">param</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s">"-"</span><span class="p">).</span><span class="nx">collect</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">"token"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">Err</span><span class="p">(</span><span class="nx">format</span><span class="p">!(</span><span class="s">"{} {}"</span><span class="p">,</span><span class="w"> </span><span class="nx">TOKEN_START_ERR</span><span class="p">,</span><span class="w"> </span><span class="nx">TOKEN_FORMAT</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">token_info</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">warn</span><span class="p">!(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="nx">TOKEN_MISSING_UID</span><span class="p">);</span>
<span class="w"> </span><span class="nx">TokenInfo</span><span class="o">::</span><span class="nx">new</span><span class="p">(</span><span class="nx">vec</span><span class="p">![],</span><span class="w"> </span><span class="nx">None</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">)</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">warn</span><span class="p">!(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="nx">TOKEN_MISSING_REALM</span><span class="p">);</span>
<span class="w"> </span><span class="nx">TokenInfo</span><span class="o">::</span><span class="nx">new</span><span class="p">(</span><span class="nx">vec</span><span class="p">![],</span><span class="w"> </span><span class="nx">create_uid</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span><span class="w"> </span><span class="s">""</span><span class="p">)</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">=></span><span class="p">{</span>
<span class="w"> </span><span class="nx">warn</span><span class="p">!(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="nx">TOKEN_MISSING_SCOPES</span><span class="p">);</span>
<span class="w"> </span><span class="nx">TokenInfo</span><span class="o">::</span><span class="nx">new</span><span class="p">(</span><span class="nx">vec</span><span class="p">![],</span><span class="w"> </span><span class="nx">create_uid</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">clone</span><span class="p">().</span><span class="nx">split_off</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span>
<span class="w"> </span><span class="nx">TokenInfo</span><span class="o">::</span><span class="nx">new</span><span class="p">(</span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">create_uid</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="nx">Ok</span><span class="p">(</span><span class="nx">token_info</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Using the <em>impl</em> keyword, we implement two functions for the <em>TokenInfo</em> struct. After defining them, we can
call them as <em>TokenInfo::new(...)</em> and <em>TokenInfo::from_query_param(...)</em>; these behave like static
functions in Java. To define methods, we provide <em>&amp;self</em> as the first parameter (see the next
snippet); we can then call them on an instance instead of on the struct itself:
<em>my_token_info.encode(...)</em>.</p>
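<p>The distinction can be sketched with a minimal standalone struct; the <em>Token</em> type here is illustrative, not part of the service code:</p>

```rust
struct Token {
    value: String,
}

impl Token {
    // Associated function: called on the type, like a Java static method.
    fn new(value: &str) -> Token {
        Token { value: value.to_string() }
    }

    // Method: takes `&self`, called on an instance.
    fn len(&self) -> usize {
        self.value.len()
    }
}

fn main() {
    let t = Token::new("token-/employees-read"); // type-level call
    assert_eq!(t.len(), 21);                     // instance-level call
}
```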
<p>In the <em>new</em> function, we can see how to use the <em>map</em> method to transform one collection type into another:</p>
<div class="highlight"><pre><span></span><code><span class="kd">impl</span><span class="w"> </span><span class="nx">Encodable</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">TokenInfo</span><span class="w"> </span><span class="p">{</span>
    fn encode&lt;S: Encoder&gt;(&amp;self, encoder: &amp;mut S) -&gt; Result&lt;(), S::Error&gt; {
<span class="w"> </span><span class="nx">encoder</span><span class="p">.</span><span class="nx">emit_struct</span><span class="p">(</span><span class="s">"TokenInfo"</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="o">|</span><span class="nx">encoder</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">try</span><span class="p">!(</span><span class="nx">encoder</span><span class="p">.</span><span class="nx">emit_struct_field</span><span class="p">(</span><span class="w"> </span><span class="s">"scope"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">|</span><span class="nx">encoder</span><span class="o">|</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">scopes</span><span class="p">.</span><span class="nx">encode</span><span class="p">(</span><span class="nx">encoder</span><span class="p">)));</span>
<span class="w"> </span><span class="nx">try</span><span class="p">!(</span><span class="nx">encoder</span><span class="p">.</span><span class="nx">emit_struct_field</span><span class="p">(</span><span class="w"> </span><span class="s">"realm"</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="o">|</span><span class="nx">encoder</span><span class="o">|</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">realm</span><span class="p">.</span><span class="nx">encode</span><span class="p">(</span><span class="nx">encoder</span><span class="p">)));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">uid</span><span class="p">.</span><span class="nx">is_some</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">try</span><span class="p">!(</span><span class="nx">encoder</span><span class="p">.</span><span class="nx">emit_struct_field</span><span class="p">(</span><span class="w"> </span><span class="s">"uid"</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="o">|</span><span class="nx">encoder</span><span class="o">|</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">uid</span><span class="p">.</span><span class="nx">encode</span><span class="p">(</span><span class="nx">encoder</span><span class="p">)));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Ok</span><span class="p">(())</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Here we are implementing the <em>Encodable</em> trait for our structure. The goal is to be able to transform our structure into
JSON. We can also see how to define a method, using the <em>&self</em> as a first parameter:</p>
<div class="highlight"><pre><span></span><code><span class="err">#</span><span class="o">[</span><span class="n">cfg(test)</span><span class="o">]</span>
<span class="k">mod</span><span class="w"> </span><span class="n">tests</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">use</span><span class="w"> </span><span class="nl">super</span><span class="p">:</span><span class="err">:</span><span class="n">TokenInfo</span><span class="p">;</span>
<span class="w"> </span><span class="k">use</span><span class="w"> </span><span class="nl">super</span><span class="p">:</span><span class="err">:{</span><span class="n">TOKEN_START_ERR</span><span class="p">,</span><span class="w"> </span><span class="n">TOKEN_FORMAT</span><span class="err">}</span><span class="p">;</span>
<span class="w"> </span><span class="k">use</span><span class="w"> </span><span class="nl">rustc_serialize</span><span class="p">:</span><span class="err">:</span><span class="n">json</span><span class="p">;</span>
<span class="w"> </span><span class="err">#</span><span class="o">[</span><span class="n">test</span><span class="o">]</span>
<span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">token_info_new_test</span><span class="p">()</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">token_info</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="nl">TokenInfo</span><span class="p">:</span><span class="err">:</span><span class="k">new</span><span class="p">(</span><span class="n">vec</span><span class="err">!</span><span class="o">[</span><span class="n">"read", "write"</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="k">None</span><span class="p">,</span><span class="w"> </span><span class="ss">"/employees"</span><span class="p">);</span>
<span class="w"> </span><span class="n">assert_eq</span><span class="err">!</span><span class="p">(</span><span class="ss">"{\"</span><span class="k">scope</span><span class="err">\</span><span class="ss">":[\"</span><span class="k">read</span><span class="err">\</span><span class="ss">",\"</span><span class="k">write</span><span class="err">\</span><span class="ss">"],\"</span><span class="n">realm</span><span class="err">\</span><span class="ss">":\"</span><span class="o">/</span><span class="n">employees</span><span class="err">\</span><span class="ss">"}"</span><span class="p">,</span><span class="w"> </span><span class="nl">json</span><span class="p">:</span><span class="err">:</span><span class="n">encode</span><span class="p">(</span><span class="n">token_info</span><span class="p">).</span><span class="n">unwrap</span><span class="p">());</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">#</span><span class="o">[</span><span class="n">test</span><span class="o">]</span>
<span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">token_info_from_token_param_fail_test</span><span class="p">()</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">token_info_err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nl">TokenInfo</span><span class="p">:</span><span class="err">:</span><span class="n">from_query_param</span><span class="p">(</span><span class="ss">"bla-/employees-read-write"</span><span class="p">).</span><span class="n">err</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">assert_eq</span><span class="err">!</span><span class="p">(</span><span class="nf">format</span><span class="err">!</span><span class="p">(</span><span class="ss">"{} {}"</span><span class="p">,</span><span class="w"> </span><span class="n">TOKEN_START_ERR</span><span class="p">,</span><span class="w"> </span><span class="n">TOKEN_FORMAT</span><span class="p">),</span><span class="w"> </span><span class="n">token_info_err</span><span class="p">);</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>The idiomatic way of writing unit tests in Rust is to define a submodule in the same file as the production code.
The compiler ensures that tests aren’t included in release builds. As <em>tests</em> is a submodule, we need to import the
<em>TokenInfo</em> type using <em>use super::TokenInfo;</em>.</p>
<p>Rust has a powerful macro system. All of the calls that end with ! are macro invocations. For our purposes, we use
the <em>assert_eq!</em> macro, which panics in case the two arguments aren’t equal.</p>
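<p>As a small standalone illustration, we can also define a macro of our own; <em>count_scopes!</em> is invented for this example:</p>

```rust
// A minimal macro, to show that the trailing `!` marks a macro invocation.
// `count_scopes!` is illustrative only, not from the service code.
macro_rules! count_scopes {
    ($($scope:expr),*) => {
        [$($scope),*].len()
    };
}

fn main() {
    let n = count_scopes!("read", "write");
    assert_eq!(n, 2);
    // Standard macros work the same way: format! and assert_eq! expand at
    // compile time rather than being ordinary function calls.
    let msg = format!("{} scopes", n);
    assert_eq!(msg, "2 scopes");
}
```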
<p>I hope you found my mini-dive into Rust fascinating enough to give it a try yourself. In the second part of this post,
I’ll go further into detail about how to put a Rust JSON API into a 5MB Docker Image.</p>
<p>You can contact me on Twitter <a href="https://twitter.com/danpersa">@danpersa</a> if you have any further questions. Thanks for
reading!</p>We couldn't get enough: Stack Overflow, Round 22016-05-19T00:00:00+02:002016-05-19T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-05-19:/posts/2016/05/stack-overflow-round-2.html<p>We chat once again with the Stack Overflow team about developers, recruiting, and more.</p><p>After our incredibly successful <a href="https://tech.zalando.com/blog/joel-spolsky-at-zalando-tech/">Stack Overflow event</a>, we
were impressed with how open and approachable Joel Spolsky was to our team of developers and recruiters. Not only did he
take part in a great Q&A session with our VP of Engineering Eric Bowman, but he also hung around afterwards to spend
some quality time with Zalando’s tech recruiters, sharing his knowledge as a founder and CEO.</p>
<p>We couldn’t resist one more chance to chat with the enigmatic creator of Trello and CEO of Stack Overflow, so we bashed
out some quick questions for a good ol’ fashioned interview with his crew. Both Joel and the team over at Stack Overflow
have put together the following information for us.</p>
<p><em>Zalando Tech:</em> Thanks for taking the time to speak with us. What learnings and impressions have you taken back with you
to New York after visiting Zalando and engaging with the tech scene in Berlin?</p>
<p><em>Stack Overflow:</em> German companies have the same need for talent as companies in Silicon Valley. However,
recruiting in Germany is still quite traditional and conservative compared to other markets, except for Berlin. Zalando
is a perfect example of that. You understand that the competition for talent is fierce, and you don’t shy away from
experimenting with new ways to attract and retain talent by offering a developer-centred working environment.</p>
<p><em>Zalando Tech:</em> After our Q&A session with Zalando developers, you stayed back to chat with our tech recruiters. What
did they pick your brain about and what’s the one important takeaway you would highlight for recruiters in tech?</p>
<p><em>Stack Overflow:</em> If you ask programmers what’s important to them, money is quite low on the list. The things that are
most important to them are the ability to learn something new, the quality of the people they work with, and getting
excited about the products and projects they work on. When hiring developers, it is really important that a company
communicates these top things. That being said, the salary is something that both sides need to agree on; the earlier in
the process, the better. We’ve run some extensive testing and found that showing a salary range substantially increases
the click-through rate of job listings.</p>
<p><em>Zalando Tech:</em> You mentioned during your Zalando Q&A that Stack Overflow are accomplishing three things for developers:
Helping them learn, allowing them to share their knowledge, and leveling up their career. Can you expand on how SO are
achieving this?</p>
<p><em>Stack Overflow:</em> We're always in search of ways to help make developers' lives better. We're still very much focused on
fixing a broken tech recruiting industry and want to help developers get jobs they love without the spam and nonsense.
We've been doing a lot of work in this area to improve Jobs on Stack Overflow – better matching, salary data, etc.</p>
<p>We're also exploring new content types on Stack Overflow that will complement Q&A and help developers learn,
collaborate, and solve problems. You're going to see us launch some new features on Stack Overflow in the coming months.</p>
<p><em>Zalando Tech:</em> You would have heard about how Zalando has a focus on team autonomy for its tech teams. What kind
of agile principles are you using at Trello?</p>
<p><em>Stack Overflow:</em> We operate much closer to the
<a href="http://t.sidekickopen06.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XZs5w6K7-W1p7Y868qm3GWW1qMCsg56dSf2f6-M_-n02?t=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FKanban_(development)&si=5993548097257472&pi=81575d8a-af73-4dcc-912a-5639faf19c11">Kanban</a>
methodology than agile. We do use tools from agile occasionally where they're useful, like stand-ups and retrospectives.
We don't organise work into sprints, epics, user stories, etc. I think that's what most clients think of "agile" as.</p>
<p>We definitely value the principles of the Agile Manifesto, and our current processes operate pretty well against each of
them.</p>A resilient, Zookeeper-less Solr architecture on AWS2016-05-18T00:00:00+02:002016-05-18T00:00:00+02:00Vjekoslav Osmanntag:engineering.zalando.com,2016-05-18:/posts/2016/05/zookeeper-less-solr-architecture-aws.html<p>The Recommendations team is back, providing a design for deploying Solr on AWS.</p><p>Since last year, Zalando has <a href="https://tech.zalando.com/blog/radical-agility-on-aws-video/">sped up</a> its adoption of
cloud-provided, on-demand infrastructure; we decided to embrace AWS. This meant teams would be given more freedom by
being allowed to administer their own AWS accounts. In addition to this, we were presented with a useful set of tools
called <a href="https://stups.io/">STUPS</a>.</p>
<p>Explaining STUPS in detail would require a whole blog post in itself! To summarise – it is a combination of client- and
server-side software that essentially allows you to provision complicated sets of hardware and deploy code to them, all
with a single shell command.</p>
<p>With the freedom of being able to provision our own hardware in a cloud environment, we knew that we needed to start
gracefully handling hardware failures. There is no safety net, no team of first responders that will get your system
back up if a drive or server fails.</p>
<p>The primary problem we face is that EC2 instances are <a href="https://www.reddit.com/r/sysadmin/comments/3q9zra/so_who_here_has_had_an_aws_instance_randomly/">not guaranteed to stay up
forever</a>. AWS will
sometimes announce instance terminations in advance, but terminations and reboots can also <a href="https://forums.aws.amazon.com/thread.jspa?threadID=81393">happen
unannounced</a>. Furthermore, disk volumes, both internal storage
and EBS, may fail suddenly and get <a href="https://forums.aws.amazon.com/thread.jspa?threadID=75764&tstart=0">remounted on your instance in read-only
mode</a>.</p>
<p>Here are some challenges we set out to solve while designing a Solr deployment for the Recommendations team.</p>
<h3>Challenges</h3>
<p>There are two main challenges we face when deploying Solr on AWS. They are:</p>
<ol>
<li>Bootstrapping a cluster and,</li>
<li>Dealing with hardware failures</li>
</ol>
<p>Let’s look at these in a bit more detail.</p>
<h3>Bootstrapping a cluster</h3>
<p>Here at the Recommendations team, we run classical master-slave Solr configurations. Furthermore, we separate the write
path entirely from the read path. In essence, we have a service dedicated solely to gathering the data, preparing it
and, finally, indexing it in Solr. Other services will in turn read and use this data while serving recommendations to
our customers.</p>
<p>To bootstrap a complete Solr deployment and make it available for reads, we need to:</p>
<ol>
<li>Create a single master instance</li>
<li>Fill it with data</li>
<li>Create a farm of slave instances and,</li>
<li>Replicate the master’s data to the slaves continuously</li>
</ol>
<p>In addition, the size of the data needs to be considered, and a decision on the necessity of sharding must be made. In
the use case this system was supporting, all of the data could fit on one machine, so no sharding was necessary. Even in
the age of Big Data, it turns out that many use cases’ data sets can be condensed into sizes that would be more
appropriately named “medium data”.</p>
<h3>Dealing with hardware failures</h3>
<p>In Zalando’s data center deployments, we were relying on hardware failures being handled by the data center’s incident
team. The failover was being handled by the Recommendations team in cooperation with a centralised system reliability
team.</p>
<p>On AWS however, we are on our own. Therefore, it was necessary to build a Solr architecture that would fail gracefully
and continue working without human intervention for as long as possible. Because our team does not have engineers on
call, we wanted to be able to react hours or even days later, without impacting customer satisfaction.</p>
<h3>Implementation: The old architecture</h3>
<p>Let’s take a look at the state of the data center. The indexing was managed by an application that first indexed the
master and then forced the slaves to replicate.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/aa60ef61da5cadde75f223ec47179491ec0687dd_solr-aws-image-1.png?auto=compress,format"></p>
<p>This way, our batch updates could be executed in a one-shot procedure. The read path was separate, as shown in the
following diagram.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/cdc2d086a17c65e1133bc200fc1f8eab1101d1e0_solr-aws-image-2.png?auto=compress,format"></p>
<p>It would be possible to mirror the same setup on AWS:</p>
<ul>
<li>Every Solr instance would be running in its own Docker container on its own EC2 instance</li>
<li>The Writer app would need to keep the IP address of the master, and the Reader apps would need to keep a set of all
the slaves’ addresses</li>
</ul>
<p>Here we stumble upon the first problem: EC2 instances get a new random IP address when they are started. The addresses
kept by the apps would therefore need to be updated whenever an instance fails or another instance is added. Doing this
manually was not an option.</p>
<h3>Handling ephemeral IP addresses</h3>
<p>The problem of having essentially ephemeral addresses can be solved by the
<a href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">SolrCloud</a> feature, first introduced in Solr 4. However,
SolrCloud requires an external service to keep track of the currently available machines and their IPs:
<a href="https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble">Zookeeper</a>.</p>
<p>In an effort to cut down the complexity of our Solr deployment, we decided to try to implement a solution without
Zookeeper. As you will see, the proposed architecture makes heavy use of AWS’s <a href="http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-getting-started.html">Elastic Load
Balancers</a>.</p>
<h3>The new architecture</h3>
<p>Let’s take a look at the proposed new AWS architecture.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/85b29fa185d9c730ca6f0dde8100e5c710744a5d_solr-aws-image-3.png?auto=compress,format"></p>
<p>The architecture makes use of three distinct load balancers:</p>
<ol>
<li>Indexing ELB</li>
<li>Replication ELB and,</li>
<li>Query ELB</li>
</ol>
<h3>Indexing ELB</h3>
<p>The indexing ELB is the only address required by the Writer app. It always points to the single Solr master instance
running behind it. The Writer app uses this address to index new data into Solr.</p>
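<p>As a sketch of what this looks like from the Writer app’s side, a SolrJ 4.x client can be pointed at the indexing ELB’s stable DNS name instead of an instance IP. The hostname, core name, and field names below are made up for illustration, and the snippet assumes a running Solr master behind the ELB:</p>
<div class="highlight"><pre><span></span><code>// SolrJ 4.x sketch: the client only ever sees the indexing ELB's DNS name,
// so master instance replacements do not require any reconfiguration.
HttpSolrServer master = new HttpSolrServer("http://indexing-elb.example.internal:8983/solr/core1");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "sku-123");          // hypothetical unique key field
doc.addField("title", "Example product");

master.add(doc);
master.commit();
</code></pre></div>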
<p>The Solr master runs in a one-instance Auto-Scaling Group (ASG), which reacts to a simple <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html">EC2 health
check</a>. If the Solr
master shuts down, the EC2 check will fail and the Writer app will know not to try and index it. When the master comes
back up, i.e. the EC2 health checks stop failing, it will be empty and the Writer app will know it can now index it with
fresh data.</p>
<h3>Replication ELB</h3>
<p>The replication ELB is the second ELB connected to Solr master’s one-instance ASG. It uses an <a href="http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-healthchecks.html">ELB health
check</a>. The health check
points to a custom URL endpoint served by Solr. When the check calls this endpoint, it executes Java code that examines
the contents of the Solr cores: It checks if the cores contain enough data to be considered ready for replication.</p>
<p>The slaves are configured so that their master URL points to the replication ELB. They will continuously poll the master
through this ELB to check for changes in the data and to replicate once changes are detected.</p>
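<p>For illustration, a slave’s replication handler in <code>solrconfig.xml</code> can point its <code>masterUrl</code> at the replication ELB roughly as follows. The ELB hostname, port, and core path are placeholders, not values from our deployment:</p>
<div class="highlight"><pre><span></span><code>&lt;requestHandler name="/replication" class="solr.ReplicationHandler"&gt;
  &lt;lst name="slave"&gt;
    &lt;!-- Poll the master through the replication ELB, never a fixed instance IP --&gt;
    &lt;str name="masterUrl"&gt;http://replication-elb.example.internal:8983/solr/core1/replication&lt;/str&gt;
    &lt;str name="pollInterval"&gt;00:00:60&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;
</code></pre></div>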
<h3>Query ELB</h3>
<p>The query ELB checks the exact same condition as the replication ELB: it checks whether the slaves’ cores are full, i.e.
successfully replicated from the master. If so, the slave can join the query ELB’s pool.</p>
<h3>Implementing the ELB health check</h3>
<p>To implement the check used by the replication and query ELBs, we need to extend Solr with some custom code.</p>
<p>It’s necessary to implement a new controller that will expose the /replication.info endpoint that will be used by the
replication and query ELBs. Here’s a sample showing how this can be achieved:</p>
<div class="highlight"><pre><span></span><code><span class="nv">@Controller</span>
<span class="k">public</span><span class="w"> </span><span class="n">final</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">ReplicationController</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">private</span><span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="n">final</span><span class="w"> </span><span class="nc">int</span><span class="w"> </span><span class="n">MIN_DOC_COUNT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1000</span><span class="p">;</span>
<span class="w"> </span><span class="nv">@RequestMapping</span><span class="p">(</span><span class="ss">"/replication.info"</span><span class="p">)</span>
<span class="w"> </span><span class="nv">@ResponseBody</span>
<span class="w"> </span><span class="k">public</span><span class="w"> </span><span class="n">ResponseEntity</span><span class="w"> </span><span class="n">replicationInfo</span><span class="p">(</span><span class="n">final</span><span class="w"> </span><span class="n">ServletRequest</span><span class="w"> </span><span class="n">request</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">final</span><span class="w"> </span><span class="n">CoreContainer</span><span class="w"> </span><span class="n">coreContainer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">CoreContainer</span><span class="p">)</span><span class="w"> </span><span class="n">request</span><span class="p">.</span><span class="n">getAttribute</span><span class="p">(</span><span class="ss">"org.apache.solr.CoreContainer"</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">final</span><span class="w"> </span><span class="n">SolrCore</span><span class="w"> </span><span class="n">core</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">coreContainer</span><span class="p">.</span><span class="n">getCores</span><span class="p">())</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">final</span><span class="w"> </span><span class="nc">int</span><span class="w"> </span><span class="n">docCount</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getDocCount</span><span class="p">(</span><span class="n">core</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">docCount</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">MIN_DOC_COUNT</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><></span><span class="p">(</span><span class="ss">"Not ready for replication/queries."</span><span class="p">,</span><span class="w"> </span><span class="n">HttpStatus</span><span class="p">.</span><span class="n">PRECONDITION_FAILED</span><span class="p">);</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ResponseEntity</span><span class="o"><></span><span class="p">(</span><span class="ss">"Ready for replication/queries."</span><span class="p">,</span><span class="w"> </span><span class="n">HttpStatus</span><span class="p">.</span><span class="n">OK</span><span class="p">);</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>The simplest way to implement the <em>getDocCount</em> method could be something like this:</p>
<div class="highlight"><pre><span></span><code><span class="nv">private</span><span class="w"> </span><span class="nv">int</span><span class="w"> </span><span class="nv">getDocCount</span><span class="ss">(</span><span class="nv">final</span><span class="w"> </span><span class="nv">SolrCore</span><span class="w"> </span><span class="nv">core</span><span class="ss">)</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">RefCounted</span><span class="w"> </span><span class="nv">newestSearcher</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">core</span>.<span class="nv">getNewestSearcher</span><span class="ss">(</span><span class="nv">false</span><span class="ss">)</span><span class="c1">;</span>
<span class="w"> </span><span class="nv">int</span><span class="w"> </span><span class="nv">docCount</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">newestSearcher</span>.<span class="nv">get</span><span class="ss">()</span>.<span class="nv">getIndexReader</span><span class="ss">()</span>.<span class="nv">numDocs</span><span class="ss">()</span><span class="c1">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nv">docCount</span><span class="c1">;</span>
}
</code></pre></div>
<p>In this example code we see that the /replication.info endpoint will return 200 OK if all the cores in Solr have at
least a thousand documents. However, if a single core is found to not satisfy this condition, the HTTP status 412 is
returned, thus instructing the ELB that this instance is not healthy, i.e. does not contain all necessary data. Of
course, the logic can be extended to include more complex rules if the use case requires them.</p>
<h3>Scenario: Failure of the master instance</h3>
<p>Failure of Solr master results in two negative outcomes:</p>
<ul>
<li>Writer app is unable to update the master with fresh data and,</li>
<li>Slaves cannot replicate anymore because the Replication ELB health check failure leads to the master being taken out
of the Replication ELB’s pool.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3efc97169433d6ae18e17b30b0b2405f46ac6602_solr-aws-image-4.png?auto=compress,format"></p>
<p>Both outcomes are not critical because as soon as the master is taken out of the Replication ELB, slaves get to keep
their old data and can happily continue to serve requests.</p>
<p>But a bigger risk comes when the master is automatically replaced by the ASG. The new master instance is started whilst
empty and remains empty until the Writer app indexes it again. If no precautions were taken, the slaves would replicate
the empty cores from our new master instance. This danger is avoided by keeping the master out of the replication ELB
for as long as it does not have all cores ready, as shown in the diagram below.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bb14c5781cb708e3a1eadeb7de39b736d8629ad5_solr-aws-image-5.png?auto=compress,format"></p>
<p>Only after the master is fully indexed can the slaves continue to replicate fresh data.</p>
<h3>Scenario: Failure of slave instances</h3>
<p>A failure of a slave instance is a simpler scenario. As soon as the instance becomes unavailable, its health checks fail
and it is taken out of the query ELB’s pool. The slave instances’ ASG knows to immediately replace it and a new, empty
slave instance is started.</p>
<p>The procedure is very similar to the master instance example we explained above. The new slave instance remains outside
of the query ELB’s pool for as long as it is still replicating the data, i.e. for as long as it still returns 412 to the
ELB’s health check.</p>
<p>Once the ELB’s <em>/replication.info</em> health check determines that the new instance has replicated all necessary data, the
instance is added to the ELB pool and begins serving read queries.</p>
<p>It is also worth mentioning that it is very important to set an appropriate grace period on the slaves’ ASG. The
replication requires some time, and it is necessary to make the grace period long enough for the replication of data to
complete before the health checks are started. If a slave does not replicate within this time, it will start returning
412s and will be deemed unhealthy, terminated, and replaced with another instance, thus opening up the possibility of
never being able to fully replicate before termination.</p>
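<p>Assuming the slaves run in an ASG named, say, <code>solr-slaves</code>, the grace period can be adjusted with the AWS CLI. The group name and the 30-minute value below are illustrative, not our actual settings; the right value depends on how long a full replication takes in practice:</p>
<div class="highlight"><pre><span></span><code># Give freshly started slaves 30 minutes to replicate
# before ELB health check results are acted upon
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name solr-slaves \
    --health-check-grace-period 1800
</code></pre></div>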
<p>Thanks for reading. I hope you found this Solr architecture interesting. I’m open to questions, suggestions and general
comments. You can find me on <a href="https://twitter.com/vosmann_">Twitter</a>.</p>Zalando's Tech Radar: All you need to know2016-05-17T00:00:00+02:002016-05-17T00:00:00+02:00Dr. Thomas Frauensteintag:engineering.zalando.com,2016-05-17:/posts/2016/05/zalando-tech-radar.html<p>All about our Tech Radar and its assessment of technologies used in software development.</p><p>Half a year ago, we agreed on a set of principles and started working on our Tech Radar; now, we’re proud of having
published its second release for Zalando Technology. We’ve also open sourced the <a href="http://zalando.github.io/tech-radar/"><strong>Zalando Tech Radar visualisation</strong>
(including code) via our</a> <a href="https://github.com/zalando/tech-radar">public GitHub
repository</a>.</p>
<p>Some of the more prominent changes compared with the first Radar release are:</p>
<ul>
<li>Play for Scala and Sbt moved from Trial to Adopt</li>
<li>Kafka also moved from Trial to Adopt</li>
<li>Flink changed from Assess to Trial</li>
<li>AngularJS 2.x rises to Assess, whereas AngularJS 1.x descends from Adopt to Hold</li>
</ul>
<h3>Why did we create the Tech Radar?</h3>
<p>The Zalando Tech Radar was created to inspire and support teams to pick the best technologies for new projects. It
provides a platform for Zalando to share knowledge and experience in technologies, to reflect on decisions, and
continuously evolve our landscape.</p>
<p>Based on the ideas of <a href="https://www.thoughtworks.com/radar">ThoughtWorks</a>, Zalando’s Tech Radar sets out the changes in
technologies that are interesting in software development; changes that we think our engineering teams should pay
attention to and consider using in their projects.</p>
<h3>What is the Tech Radar?</h3>
<p>The Zalando Tech Radar is a list of technologies, frameworks, tools, and methods complemented by an assessment result
called a <strong>Ring</strong> assignment. We use four rings with slightly adapted semantics to fit our purpose:</p>
<ul>
<li><strong>Adopt:</strong> Technologies we have high confidence in to serve our purpose, also at large scale. Technologies with a
usage culture in our Zalando production environment, low risk, and recommended for wide use.</li>
<li><strong>Trial:</strong> Technologies that we have seen work with success in project work to solve a real problem; here we have
the first serious usage experience that confirms benefits and can uncover limitations. Trial technologies are
slightly more risky; some engineers in our organisation have gone down this path and will share knowledge and
experience.</li>
<li><strong>Assess:</strong> Technologies that are promising and have clear potential as well as value for us; technologies worth
investing some research and prototyping effort in to see if they have an impact. Assess technologies carry higher risks;
they are often brand new and, while promising, largely unproven in our organisation. You will find some engineers
that have knowledge of the technology and promote it. You may even find teams that have already started a serious
prototyping project.</li>
<li><strong>Hold:</strong> Technologies not favoured for new projects. Technologies that we think are not (yet) worth
(further) investing in. Hold technologies should not be used for new projects, but can usually be continued for
existing projects.</li>
</ul>
<p>The Tech Radar is created by the Zalando Technologist Guild – an open group of approximately 25 Zalando senior
technologists. The Guild self-organises to maintain Tech Radar documentation, including the quarterly Zalando Tech Radar
Release and its <a href="http://zalando.github.io/tech-radar/">visualisation</a> published via
<a href="https://github.com/zalando/tech-radar">GitHub</a>. The ring assignments represent the average votes of the Guild’s
members; our voting is based on preliminary discussions within the Guild to achieve (more or less) high consensus.</p>
<h3>How does it work?</h3>
<p>The <strong>Zalando Technologist Guild</strong> maintains the Tech Radar and Compendium, as well as providing a forum for
conversations, knowledge sharing, and supporting the engineering teams with peer review feedback on their technology
selections.</p>
<p>Guild members meet monthly to discuss the updates of the Tech Radar assessments, and upcoming and ongoing issues around
technologies and architectural decisions.</p>
<p>Together with the Tech Radar we also maintain the <strong>Tech Radar Compendium</strong> — a collection of summaries of listed
technologies including a short description of what they are, why they’re used, and risks of use. This is supplemented
with information about what teams internally at Zalando have used these technologies to solve a certain problem. It
continuously grows with any Tech Radar changes and will hopefully evolve into a valuable resource for us to share
information and get teams connected.</p>
<p>The <strong>Zalando Tech Radar Principles</strong> work hand in hand with Zalando engineering team contributions. Our <a href="https://github.com/zalando/zalando-rules-of-play">Rules of
Play</a> include obligations for our Engineering teams: Teams must use
the Tech Radar as one input source for their technology decisions. Teams are encouraged to challenge the Tech Radar and
provide feedback to the Technologist Guild.</p>
<p>Depending on the current assessment status of the technology candidate, teams should align decisions with their Delivery
Leads, inform or ask the Technologist Guild for peer review feedback on the purpose and risks, and share their knowledge
and experience with other teams and the Guild.</p>
<p>Teams contribute to moving technologies between radar rings by giving feedback and by informing the Guild about analysis
or prototyping activity, sharing their experience and results with the Guild and other tech parties.</p>
<p>If you want more information about the Technologists Guild or the Zalando Tech Radar, get in touch with me on Twitter
<a href="https://twitter.com/ThFrauenstein">@ThFrauenstein</a>.</p>When do you involve users in a user-centered design process?2016-05-13T00:00:00+02:002016-05-13T00:00:00+02:00Clementine Jinhee Declercqtag:engineering.zalando.com,2016-05-13:/posts/2016/05/involve-users-user-centered-design-process.html<p>Read about Clementine's preferred framework for considering users in digital product design.</p><p>One of the favourite parts of my job is that I get to discuss and share insights with relevant professionals of how to
best apply user-centered design practice in organisations. When we talk about user-centered design, we often say it is
about placing users at the centre of a product. What does that really mean? How do you understand it and apply it in your
daily product design process?</p>
<p>The model below illustrates how I see user-centered design. Those who have been in the field of UX for a while would be
familiar with similar models.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/364f519c19eea369cd99114e602c3c558e71846b_image-1.png?auto=compress,format"></p>
<p>The highlight of this model is that it includes two parallel processes: User research and product design. Product design
is broken down into 3 major phases: <strong>Strategy, concept, and design.</strong> User research exists as a parallel process to
product design.</p>
<p>Users are considered here in 3 key phases (each can be repeated, if necessary):</p>
<ul>
<li><strong>Strategy</strong> for defining user needs</li>
<li><strong>Concept</strong> for prioritising features with users</li>
<li><strong>Design</strong> for validating user usability</li>
</ul>
<h3>Strategy</h3>
<p>The goal of user research here is to understand <strong>what</strong> problems your product is trying to solve (which user
requirements is it trying to fulfil?), <strong>for whom</strong> (who are you targeting?), <strong>when</strong> (in which user contexts is it to
be used?) and <strong>why</strong> (why does it have to be your product?). Involving users at this stage is important as you will be
able to assess if your product has an audience. If your product idea doesn’t have an audience, it is very unlikely to
succeed.</p>
<p>An example of a product which involved users in an early, strategic stage of development is
<a href="http://www.zalando.de">Zalando</a> itself. We tested our <a href="http://theheureka.com/the-king-of-shoes-the-story-of-zalando-founder-robert-gentz">first online shop
model</a> by only selling flip flops.
This was to assess whether there was a market need for shopping for fashion items online. The founders set up a small server to
test in real time whether there was incoming traffic to the online shop – and this was over 7 years ago. The result? It has
now become one of the biggest European online fashion platforms.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f51e398b8cb5c64f0028575f6af96954f6244bd1_image-2.png?auto=compress,format"></p>
<h3>Concept</h3>
<p>This phase is about translating your high level product idea into a concrete digital interface concept, which will
include a set of features. When doing so, it is important to prioritise and validate these features with users. This
way, you are able to identify which of them should be further emphasised in your design and which of them can safely be
removed. This is because you want to <strong>design an experience</strong> for your users through your digital product. Not <em>just</em> a
user interface, nor a feature.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9a5de49031939a6a15b2b058c4b2d88f5eb98357_image-4-again.png?auto=compress,format"></p>
<p>I like to take the e-reading app <a href="http://www.edenspiekermann.com/projects/blloon/">Blloon</a> as an example of a digital
product which focuses thoroughly on “reading” by prioritising a set of features contributing to this experience. When
using the application, it almost feels like the user interface is erased in favour of an experience around reading
“anytime, anywhere”, exploring readlists, discovering genres, and the latest bestsellers.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f1f5c62b190c2b2fc46e6e26c2139b86f9a808ba_image-5.jpeg?auto=compress,format"></p>
<h3>Design</h3>
<p>Once features are translated into concrete interface concepts, you will start polishing the design, including visual
elements and the consideration of certain interaction patterns. You will be able to get specific with your design. At
this stage, it is important to involve users for assessing the usability of your product, because you want to minimise
any sort of usability risks, but you also want to make sure that it <strong>positively influences</strong> your business goals before
the launch (and, naturally, afterwards).</p>
<p>As an example, the Obama Campaign Team in 2012 managed to significantly increase donation conversions through their
website with the help of <a href="http://kylerush.net/blog/optimization-at-the-obama-campaign-ab-testing/">A/B tests</a>. How did
they do it? By making the donation form simpler. More specifically, by turning the long donation form into “4 smaller
steps”, they increased the conversion rate by more than 5%.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/53630f1b996595bfd6283827137d069d1b9dda84_image-6.jpg?auto=compress,format"></p>
<p>According to the team, “Turns out you can get more users to the top of the mountain if you show them a gradual incline
instead of a steep slope.”</p>
<p>In conclusion, the above process is only one framework for ways of considering users in digital product design.
Depending on the scope of the project, the framework you would want to refer to could vary. Additionally, involving
users doesn’t necessarily imply a long, heavy process with multiple testing rounds. If the right methodologies are
applied, you will be able to gather valuable user insights for a successful digital product.</p>
<p>What’s your definition of a UCD and how do you consider users in your product development process? What have been the
pros and cons of each process? Let me know via <a href="https://twitter.com/clemhee">Twitter</a>.</p>How to deliver millions of personalised newsletter emails on AWS2016-05-11T00:00:00+02:002016-05-11T00:00:00+02:00Vjekoslav Osmanntag:engineering.zalando.com,2016-05-11:/posts/2016/05/personalised-newsletter-emails-aws.html<p>We look at the challenges around personalised recommendations in Zalando's newsletters.</p><p>At Zalando, we strive to help our customers find the most relevant fashion they can imagine. Zalando is known for its
great fashion assortment and its huge selection of products. In order to ensure that our customers are not overwhelmed
by this vast selection of products, the Recommendations Team builds systems that ensure that customers find products in
an easy and convenient way.</p>
<p>Zalando offers its customers the possibility to subscribe to email newsletters, where subscribers receive a weekly email
detailing the latest trends, Zalando news, sale announcements, and a selection of personalised product recommendations.
It is extremely important in this context to provide recommendations that are truly personalised to each customer.</p>
<p>In this post I will describe the challenges that we faced while implementing personalised recommendations in the
newsletter and elaborate on the technical decisions we made. I will also touch on some particularities we ran into while
migrating the service to Amazon Web Services.</p>
<h3>Emails are a bit particular</h3>
<p>A newsletter email typically contains four to ten recommended products. The email itself is a rich HTML document that
gets rendered at the moment the user opens the Zalando newsletter in their email client. The recommendations are
displayed as product images with brand and price captions. These all link to the respective product detail pages in
Zalando’s Fashion Store.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5e51b53d84c20de1a6a92a314e989b2df945e000_newsletter-aws-1.png?auto=compress,format"></p>
<p>The recommendations in the email require only an image URL and a destination page URL to be displayed, as shown in the
diagram below. When the email is sent out, it only contains placeholders. In other words, the recommendations’ URLs do
not link directly to a specific product. The products that will be shown are selected when the email is opened for the
first time. Once it is opened, multiple requests are effectively made simultaneously to render the recommendations’
images.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0c86ebbec439ebd4cf18845001f3d2442fde2172_newsletter-aws-2.png?auto=compress,format"></p>
<h3>Never recommend the same thing twice</h3>
<p>At this point we are faced with a challenge. The recommendations need to be a meaningful selection of products, but
requests for each one are made independently. Even with a stable set of rules for choosing the recommendations, the
selection made in two separate requests can differ due to products selling out or new products coming into stock.</p>
<p>Inconsistent recommendation selection may not be an issue in itself, but it can lead to product duplication, as shown
below. The diagram shows that recommendations at two different positions can end up being duplicates if they come
from two different selections.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/eddfb5b011cb199c20b59b359a80eae304227b97_newsletter-aws-3.png?auto=compress,format"></p>
<p>Duplicating product recommendations in the same email is unacceptable, so we need to eliminate this possibility
completely.</p>
<p>The process of selecting recommendations can be computationally intensive, so in order to save on computing and to
avoid duplicates, we opted to build a solution that selects the products only once, when the first request is
received. The selection is then cached, and the remaining requests use the cached recommendations.</p>
<h3>Load balancers, load balancers</h3>
<p>Part of the challenge is the fact that our recommendations service needs to scale horizontally and therefore sits behind
a load balancer, as depicted in the image below.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4ffc4f5a6e40afbd0154e55d1d1c55be7eb31ee6_newsletter-aws-4.png?auto=compress,format"></p>
<p>If we don’t intervene and the requests are left to be distributed by the LB, we will induce many cache misses and
redundant computation of recommendations.</p>
<p>In the next sections I will outline two solutions to the challenge and we will see how load balancing comes into the
spotlight. One described solution will be based on load balancing hardware that we have control of in a data center, and
the other one based on the less configurable AWS Elastic Load Balancer.</p>
<h3>Load balancing in the data center</h3>
<p>The requests from an email come with only a few milliseconds between them and first hit a load balancer. They all
request a product at a certain position of a recommendations selection. Other than the position, all requests coming
from the same email share the same parameters.</p>
<p>Each request gets the full recommendations list, either by computing it or by retrieving it from the cache. Then it
extracts the product at the relevant position and returns it to the client. The order in which the requests are received
by the servers is unknown and some requests might even come very late.</p>
<h3>Configuring your own bare metal</h3>
<p>To take advantage of a trivial in-memory cache, some order needs to be introduced into how the requests are load-balanced
across our machines. We need to force all the requests coming from the same client to the same machine.</p>
<p>We configured our hardware load balancer to load-balance on OSI layer 7, i.e. to balance based on a header value
extracted from the HTTP message.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d52210fe397b9ee9e4529c72b907edaf1d10eccc_newsletter-aws-5.png?auto=compress,format"></p>
<p>Since we were able to use headers to direct the LB to send requests to a certain machine, a simple cache was all we
needed. This solution works well, but dictates that the load balancer must be able to forward all requests onto the same
machine. As we will see, this is not always the case.</p>
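<p>The idea behind header-based layer-7 stickiness can be sketched as deterministic hash routing. The header name and backend pool below are hypothetical, and real load balancer behaviour is vendor-specific; this only illustrates why every request from the same email lands on the same machine:</p>

```python
import hashlib

BACKENDS = ["machine-1", "machine-2", "machine-3"]  # hypothetical server pool

def route(headers):
    """Pick a backend deterministically from a header value,
    the way a sticky layer-7 load balancer would."""
    key = headers["X-Newsletter-Id"]  # hypothetical sticky header
    digest = hashlib.sha1(key.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

# Every request from the same opened email carries the same header value,
# so all of them land on the same machine and can hit its in-memory cache:
targets = {route({"X-Newsletter-Id": "user42-campaign7"}) for _ in range(10)}
print(len(targets))  # 1
```

<p>Because the routing function is pure, the cache only ever needs to live on one machine per email.</p>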
<h3>Load balancing on AWS</h3>
<p>Since AWS is our cloud provider of choice, we looked into the load balancing functionalities offered there.</p>
<p>Not all load balancers provide customisable layer-7 load balancing, which is something I would discover while
researching the features offered by Amazon Web Services’ Elastic Load Balancer. The ELB does offer some
application-level load balancing features, but none that would fit our use case where requests are fired off in
simultaneous bursts.</p>
<h3>AWS ELB’s skimpy stickiness options</h3>
<p>As Amazon’s official documentation puts it, “... a load balancer routes each request independently to the registered
instance with the smallest load.” So by default, stickiness is disabled. The stickiness options that are offered are
tied to HTTP cookies and therefore biased toward browser-like clients and a sequential request pattern, just like when a
person surfs a web site.</p>
<p>Here’s an overview of ELB’s offerings:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9934803c14aef5a69925ad907bb9b92e55c1822b_newsletter-aws-table-1.png?auto=compress,format"></p>
<h3>Load Balancer Generated Cookie Stickiness</h3>
<p>To bind multiple requests to the same machine, Load Balancer Generated Cookie Stickiness can be enabled. It is also
sometimes called duration-based stickiness. This makes the ELB generate a cookie itself and send it together with the
response to the first request made by a client. On subsequent requests, that cookie is re-sent by the client and the ELB
knows this request is bound to a specific machine, according to the cookie’s content.</p>
<p>This is great for use cases such as a human user browsing the pages of a web site, but does not work when multiple
requests are made at the same time.</p>
<h3>Application Generated Cookie Stickiness</h3>
<p>Instead of using the AWSELB cookie for stickiness for as long as it is still valid, we can use the app cookie option to
handle stickiness via the cookie our application generates. If the application’s cookie is present, then an AWSELB
cookie will still be generated and added to the response by the ELB.</p>
<p>However, if the app cookie is removed or expires, the AWSELB cookie is no longer added to responses by the ELB and the
same email’s remaining requests are again spread out across all machines.</p>
<h3>Abandoning stickiness</h3>
<p>As we can see, even though the ELB offers application-layer load balancing, it does not support headers as a means of
achieving it, only cookies. Requests cannot be made sticky to a certain machine a priori, so we are forced to abandon
the in-memory cache and need to consider a separate coordination entity to:</p>
<ol>
<li>Decide which request will trigger the recommendations list computation and,</li>
<li>Cache the recommendations until all positions in the recommendations list have been read and displayed in the email.</li>
</ol>
<p>At first, implementing such a coordination service from scratch was discussed. Eventually we decided to go with a system
that already implements the features we require. This way, we ensured that we would be testing a far narrower scope:
only the specifics of our domain.</p>
<h3>Enter Redis</h3>
<p>After a colleague of mine, Hrvoje Torbašinović, pointed out Redis’ interesting HSETNX command, a command that would
become essential in the target solution, we decided to explore the functionality offered by Redis more deeply. A few
designs were made, reviewed, and iteratively improved. Ultimately, we got to an implementation that easily serves
800-1000 requests per second during our email campaigns without reaching the system’s upper limit.</p>
<h3>Design</h3>
<p>Instead of forcing every request onto the same machine, we let them go freely to any machine, at the ELB’s discretion,
as depicted in previous diagrams.</p>
<p>The different machines receiving requests for products at different positions in the email will contact Redis and
execute commands to decide which one will compute the recommendations list.</p>
<p>Three Redis operations are necessary:</p>
<ol>
<li>Set the lock to win the right to compute the recommendations,</li>
<li>Write the recommended products into the cache and,</li>
<li>Read computed recommendations from the cache.</li>
</ol>
<p>These operations can be seen in the flow diagram below.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/954371fd3e7a0c0906422285a61820f367a03821_newsletter-aws-6.png?auto=compress,format"></p>
<p>Basically, every request will attempt to get a lock in Redis, but only one should succeed. The lock winner will compute
product recommendations, while the others will proceed to do a blocking read on the Redis cache. After the lock-winning
request is done computing recos, it will write the list into the cache. Finally, the requests blocking on the read will
get the recommended products.</p>
<p>An additional complexity is that a huge number of emails will be opened concurrently. Every e-mail’s lock and cached
recommendations will be identified by a unique, personalised parameter combination. I name these Redis keys
<em>lock_key(params)</em> and <em>recos_key(params)</em>.</p>
<p>The system is described in more detail in the following diagram. Entry expirations are mentioned for the first time, as
are the return values of the lock-setting commands. These are explained in the implementation section.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4e9ee81dc693a66e5929dd73e70e057b520d7a82_newsletter-aws-7.png?auto=compress,format"></p>
<h3>Implementation</h3>
<p>The keys are generated with a rule similar to:</p>
<div class="highlight"><pre><span></span><code>lock_key(params) = "lock-" + concat(params)
recos_key(params) = "list-" + concat(params)
</code></pre></div>
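<p>To illustrate, the key construction could be sketched in Python as follows. The parameter names here are hypothetical; the production implementation is the Java client linked further down:</p>

```python
def concat(params):
    """Join the personalised request parameters into a stable string."""
    return "-".join(str(params[k]) for k in sorted(params))

def lock_key(params):
    return "lock-" + concat(params)

def recos_key(params):
    return "list-" + concat(params)

# Hypothetical parameter combination identifying one personalised email:
params = {"customer": "42", "campaign": "spring-sale"}
print(lock_key(params))   # lock-spring-sale-42
print(recos_key(params))  # list-spring-sale-42
```

<p>Sorting the parameters keeps the key stable regardless of the order in which they arrive.</p>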
<p>A request thread will attempt to get a lock by setting an arbitrary value into <em>lock_key(params)</em> with the
following commands:</p>
<div class="highlight"><pre><span></span><code>HSETNX lock_key(params) "lock" "got it"
EXPIRE lock_key(params) lock_ttl
</code></pre></div>
<p>The <a href="http://redis.io/commands/hsetnx">HSETNX</a> command writes a field “lock” with the value “got it” and returns 1 only
if the field does not already exist. Only the first write gets a 1 and any subsequent ones get a 0. This covers the
decision on who will calculate the list.</p>
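<p>The first-writer-wins behaviour of HSETNX can be simulated without a Redis server. Below is a dict-backed Python sketch; the key and values are illustrative, and the EXPIRE step is omitted:</p>

```python
store = {}  # in-memory stand-in for Redis hashes

def hsetnx(key, field, value):
    """Mimic Redis HSETNX: return 1 only if the field did not exist."""
    h = store.setdefault(key, {})
    if field in h:
        return 0
    h[field] = value
    return 1

# A burst of requests from one opened email all race for the same lock;
# only the first caller gets a 1 and goes on to compute the recommendations:
results = [hsetnx("lock-user42", "lock", "got it") for _ in range(5)]
print(results)  # [1, 0, 0, 0, 0]
```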
<p>After the lock-winning thread is done selecting the recommendations, it writes them with these commands:</p>
<div class="highlight"><pre><span></span><code>DEL recos_key(params)
LPUSH recos_key(params) recos_json
EXPIRE recos_key(params) recos_ttl
</code></pre></div>
<p>The non-lock-winning threads carry on and try to execute a blocking read on the list, but it is not yet there, so they
wait. Since there are multiple requests trying to read, we cannot use the typical Redis pop commands, as that would mean
the first thread would remove the list, and the remaining threads would fail with a timeout. Instead, we keep the item
list in a one-element circular list using <a href="http://redis.io/commands/brpoplpush">BRPOPLPUSH</a>. Every thread reads the list
with the following command:</p>
<div class="highlight"><pre><span></span><code>recos_json = BRPOPLPUSH recos_key(params) recos_key(params)
EXPIRE recos_key(params) recos_ttl
</code></pre></div>
<p>Both the lock and list entries have their expiry times explicitly set, with <em>lock_ttl < recos_ttl</em>, as explained in
the following section.</p>
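<p>Putting the three operations together, the whole flow can be sketched with an in-memory stand-in for Redis. This is only an illustration: the <em>FakeRedis</em> class, key names and product list are hypothetical, TTL handling is omitted, and the real system talks to an actual Redis server via the Java client below.</p>

```python
import threading

class FakeRedis:
    """In-memory stand-in for the three Redis operations used above."""
    def __init__(self):
        self.hashes = {}
        self.lists = {}
        self.cond = threading.Condition()

    def hsetnx(self, key, field, value):
        with self.cond:
            h = self.hashes.setdefault(key, {})
            if field in h:
                return 0
            h[field] = value
            return 1

    def lpush(self, key, value):
        with self.cond:
            self.lists.setdefault(key, []).insert(0, value)
            self.cond.notify_all()

    def brpoplpush(self, src, dst):
        # Blocking read: pop the tail of src and push it back onto dst, so
        # the one-element list circulates instead of being consumed.
        with self.cond:
            while not self.lists.get(src):
                self.cond.wait()
            value = self.lists[src].pop()
            self.lists.setdefault(dst, []).insert(0, value)
            self.cond.notify_all()
            return value

redis = FakeRedis()
compute_count, results = [], {}

def handle_request(position):
    key = "user42-campaign7"  # concat(params) for this email
    if redis.hsetnx("lock-" + key, "lock", "got it"):
        recos = ["boots", "scarf", "jacket", "hat"]  # "expensive" computation
        compute_count.append(1)
        redis.lpush("list-" + key, recos)
    recos = redis.brpoplpush("list-" + key, "list-" + key)
    results[position] = recos[position]

# Simulate the near-simultaneous burst of requests from one opened email:
threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(compute_count), results[0], results[3])  # 1 boots hat
```

<p>Whichever thread loses the lock simply blocks until the winner has written the list; the recommendations are computed exactly once per email.</p>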
<h3>Java implementation</h3>
<p>You can check out a sample Java implementation <a href="http://github.com/vosmann/redis-mutex">here</a>.</p>
<p>The table below lists key features the implementation enforces and describes what happens when they are not enforced.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/93b8d5591893c727a36a5b1def5e0ceffe854d2c_newsletter-aws-table-2.png?auto=compress,format"></p>
<p>Additionally, an assumption is made that the network capacity is not fully saturated and the EC2 instances involved are
not under heavy load. If these are not satisfied, Redis operations can time out and result in one or more products not
being shown.</p>
<p>I hope this article has given you a bit of insight into how we handle the delivery of millions of personalised emails on
AWS. If you’ve got any questions about the process, feel free to contact me on <a href="https://twitter.com/vosmann_">Twitter</a>.</p>Dortmund Docker Meetup – A cooperation between Zalando and Docker2016-05-10T00:00:00+02:002016-05-10T00:00:00+02:00Jan Stroppeltag:engineering.zalando.com,2016-05-10:/posts/2016/05/docker-meetup-group-dortmund.html<p>Join us in Dortmund for our new meetup group that's all things Docker.</p><p>Zalando Tech has gone through some enormous changes in the last year, with the transition of its online Fashion Store
from a monolithic application, hosted in datacenters, to microservices, hosted on AWS. To deal with this change, the
<a href="https://stups.io/">Zalando STUPS platform</a> was created, allowing our autonomous teams to easily deploy their services
via Docker containers.</p>
<p>We now have great expertise concerning the technologies involved with this shift. Zalando has become a pioneer for the
use of these technologies, especially regarding Docker. Since there is a big interest in the region in learning from our
experiences, we wanted to share our knowledge by creating a specialised Docker meetup group. We contacted
Docker Inc., and they were immediately excited about the idea: the <a href="http://www.meetup.com/de-DE/Docker-Dortmund/">Docker Meetup Group
Dortmund</a> has officially been founded.</p>
<p>We plan to have a meetup once per month in the group’s initial stages to see if we can organise enough interesting
talks. These talks will be from internal speakers at Zalando, as well as from external participants from the Ruhr area
and beyond who want to share their experiences using Docker.</p>
<p>We’ll be broadcasting all meetup information via Zalando Tech’s official <a href="https://twitter.com/ZalandoTech">Twitter</a>
account, where you can also keep yourself updated on all other meetup details. We look forward to chatting all things
Docker!</p>How to avoid tapping the “Back” button in an interface design2016-05-09T00:00:00+02:002016-05-09T00:00:00+02:00Clementine Jinhee Declercqtag:engineering.zalando.com,2016-05-09:/posts/2016/05/avoid-back-button-interface-design.html<p>"Back" button interaction is essential for navigation, but it can also become counter-intuitive.</p><p>As part of my daily UX job, I get to research a lot of mobile apps. Lately, I have been looking into different ways to
move back one screen without having to tap on the “back” button. Simply put, see this interaction situation:</p>
<p><img alt="mobile navigation example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2F05c98c12-09c6-4476-8ac3-b3a3b8ab6c9f_1+fancy+back+back+240x427.gif"></p>
<p>On iOS and Android, “back” interaction is generally placed in the top left area in the title bar (except for some cases
on Android, which has a physical “back” button on the device).</p>
<p>Although this interaction is essential from a navigation perspective, it can also become counter-intuitive very quickly.
This holds true for products where the main purpose is focused on browsing and exploring a wide range of content, for
example, newspapers and e-commerce. Here are some reasons why this interaction can be counter-intuitive:</p>
<ul>
<li><strong>Inconvenience</strong> of having to spot with your finger that specific location in the interface where the button is
(this could get worse on tablets)</li>
<li><strong>Less immersive</strong> browsing experience caused by the repetitive “back” interaction</li>
<li><strong>Costly real estate</strong> in a small mobile interaction environment</li>
</ul>
<p>Although we cannot completely avoid this interaction, I wanted to highlight some design alternatives that can prevent
this from being systematically present.</p>
<h3>Provide hidden yet powerful gestures</h3>
<p>There are already smart gestures supported by a system’s OS for quicker navigation to the previous screen. For example,
on iOS, users can swipe back to the previous screen and jump to the main page with a double tap on the menu tab bar.</p>
<p><strong>Instagram</strong></p>
<p>Just like in Instagram’s example, you can take advantage of existing system interaction tricks: Swipe-back to the
previous screen. This is a simple solution to handle “back” interaction without going too crazy with the interface
architecture.</p>
<p><img alt="Instagram navigation example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2F6be0f258-a8ac-4890-aba5-d748e67e44fc_2_instagram+240x427.gif"></p>
<p><strong>Guardian</strong></p>
<p>Swiping down or swiping up to move back one screen is also becoming pretty common. You’re also seeing a lot of this in
apps like Pinterest, whose purpose is to explore visual content.</p>
<p><img alt="Guardian navigation example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2F853d1214-b90b-412a-96e8-e004650bec98_3_guardian_240x427.gif"></p>
<h3>Rethink the whole UI architecture</h3>
<p><strong>Snapchat</strong></p>
<p>Snapchat doesn’t have any navigation bar. Everything is accessible using gestures: swipe from top to bottom to access
your user profile, from left to right to access the camera, from right to left to access different media content,
and so on.</p>
<p><img alt="Snapchat swipe example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2F655cf637-5ced-4f27-bf78-7dd3d422c295_4_snapchat+240x427.gif"></p>
<p><strong>Nike</strong></p>
<p>Just like Snapchat, Nike Tech Book rethought the whole structure of the application in such a way that a permanent
navigation bar on the bottom was removed completely. Everything else is accessible via gestures from top down to left
right and vice versa.</p>
<p><img alt="Nike gestures example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2Fd3d85e90-2fbf-4763-8985-3e2721437463_5_nike+techbook+240x427.gif"></p>
<h3>Provide additional access</h3>
<p><strong>Zara</strong></p>
<p>Zara provides additional access for navigation between clothing categories by just dragging down the catalog. You don’t
have to go back systematically to the complete list of categories in order to change the category you’re navigating.</p>
<p><img alt="Zara example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2Fc3e39ff5-27f4-40f1-8979-7ec147fd06a6_6_zara+240x427.gif"></p>
<p><strong>Zalando</strong></p>
<p>In a similar way, you can enable navigation from one product to another by swiping left and right between product
detail pages. This way, you don’t have to go back to the catalog list, which features all products.</p>
<p><img alt="Zalando example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2Fc23ccff1-9963-4f7e-b436-3b9356418586_7_zalando+240x427.gif"></p>
<h3>Play with transition or position</h3>
<p><strong>Flipboard</strong></p>
<p>This example provides the “back” button on the bottom where the navigation bar is. All functional buttons are also on
the bottom, allowing users to focus on consuming the content starting from the top.</p>
<p><img alt="Flipboard example" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2Fc163d329-05f3-4d7c-9309-6bcf79ec148b_8_flipboard+240x427.gif"></p>
<p><strong>Zara</strong></p>
<p>With a nice transition effect, you can remove the feeling of going to a deeper level in an interaction with the use of a
different icon. For example, “close” instead of “back”.</p>
<p><img alt="Zara transition" src="https://prismic-io.s3.amazonaws.com/zalando-jobsite%2Fa939f792-8113-450b-8b9d-861166fdeedc_9_zara+transition+240x427.gif"></p>
<p>In conclusion, each alternative has its pros and cons. It can work individually or in combination with other methods for
a more convenient and frictionless navigation experience. It is important to choose the method that makes the most sense
in your design and in the experience you wish to deliver via your app.</p>
<p>What do you think? How do you handle “back” interaction in your UI? Let me know via Twitter
<a href="https://twitter.com/clemhee">@clemhee</a>.</p>Zalando explores the Hadoop Summit 20162016-05-06T00:00:00+02:002016-05-06T00:00:00+02:00Anthony Brewtag:engineering.zalando.com,2016-05-06:/posts/2016/05/zalando-hadoop-summit-2016.html<p>Get the lowdown from this year's Hadoop Summit from Zalando's Dublin crew.</p><p>With this year being the 10th birthday of Apache Hadoop, Dublin saw 1,400 members of the tech community gather for the
4th Hadoop Summit Europe. These days, the word Hadoop has a somewhat negative connotation in the minds of some people,
but this Summit proved that it is an all-encompassing word to describe a diverse ecosystem of technologies. With our
Fashion Insights Centre located in Dublin, where our engineers and data scientists use several Hadoop technologies, it
made perfect sense for Zalando to be in attendance at the two-day event.</p>
<p>The week started with a meetup organised by the <a href="http://www.meetup.com/hadoop-user-group-ireland/">Hadoop User Group</a> in
the vibrant Silicon Docks where Zalando’s Dublin office is also located. The topic of the day was graph databases and
graph processing. A talk on OrientDB by <a href="https://twitter.com/fabriziofortino">Fabrizio Fortino</a> showed the value of the
NoSQL document database. The following night a second meetup, again organised by the Hadoop Users Group, was hosted in
the heart of Dublin's city centre. This event saw six speakers from around the world talk on a variety of topics. One
interesting use case was presented by <a href="https://twitter.com/vdestoeck">Vincent de Stoecklin</a> and showed how <a href="https://www.dataiku.com/dss/">Dataiku
DSS</a> was used to create a predictive application for one of its clients, which enables
drivers to find parking spaces faster.</p>
<p>After an entertaining display of some modern Irish dancing for the attendees, the conference itself began with keynotes
from some of the leaders in the Hadoop community. The themes set out included enterprise readiness, the value being
created, and the growing and thriving state of the Hadoop ecosystem. One of the key takeaways was how Hadoop enables you
to work with your data at rest in a variety of ways.</p>
<p>For us one of the more interesting showcases of the Summit was <a href="https://nifi.apache.org/">Apache NiFi</a>, a project that
was originally open-sourced by the US National Security Agency. The tool allows users to create bidirectional and
complex dataflows from a multitude of sources and outputs, which gives us some new ideas for our current projects.
Throughout the conference HortonWorks were on hand to demonstrate their distribution of NiFi, called DataFlow.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/799719abd08496c51c35468da79d507a244fe833_img_1375.jpg?auto=compress,format"></p>
<p>The <a href="https://flink.apache.org/">Apache Flink</a> project also took centre stage for much of the conference. With this
platform in use by a number of teams within Zalando we were obviously very interested in the topics based around this. A
very interesting, funny, and somewhat controversial talk by <a href="https://twitter.com/SlimBaltagi">Slim Baltagi</a> praised
Flink as being the 4th Generation of data processing, with Apache Spark being left in its dust as an older and outdated
tool.</p>
<p>And of course <a href="http://spark.apache.org/">Apache Spark</a> was also a huge topic throughout the event, with several talks
focused on the subject. One of the more interesting demos was <a href="https://www.youtube.com/watch?v=vkVereFH1Rk">a video</a> of
an advanced execution visualiser for Spark jobs. This UI tool, created by the Hungarian Academy of Sciences in
collaboration with Ericsson, could prove useful for investigating bottlenecks and better understanding the
physical execution of your Spark jobs.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b6431f571a2430f9355b5ed7d31e34a570528e95_img_20160413_170805.jpg?auto=compress,format"></p>
<p>For the data science enthusiasts in attendance there was plenty of action. A witty demo of
<a href="https://www.tensorflow.org/">TensorFlow</a> by Google’s <a href="https://twitter.com/ramramanathan1">Ram Ramanathan</a> called “Can I
hug that?” classified images as huggable or not. This, along with other demos, displayed the power of deep learning that
can be applied to everything from text to images. <a href="https://www.linkedin.com/in/bill-porto-4128417a">Bill Porto</a> gave an
upbeat presentation on the current shortcomings of some Machine Learning approaches and how to improve accuracy using
real world examples, giving all of us some food for thought.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2f43214121b652a4aed15a3129cb5c0dc768af55_img_20160413_162813.jpg?auto=compress,format"></p>
<p>During the final day's keynote, <a href="https://twitter.com/mccandelish">David McCandless</a> spoke of how data can be abstract,
but how visualising it aids communication and understanding. A striking example David gave was the comparison of the
billions estimated to fund the Iraq war and the <a href="http://www.informationisbeautiful.net/visualizations/the-billion-dollar-o-gram-2009/">final
cost</a>.</p>
<p>We learnt a lot from the Summit and made some great connections, far more than we can condense into a single blog
post. Perhaps by visualising the notes we took at a macro level, you will see that the Hadoop Summit is more than just
Hadoop: it's an ecosystem.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6d71eb620ba07d8433de163244581efcf67f4ba1_word-cloud-simon-mcgloin.png?auto=compress,format"></p>Migrating from Spray to Akka HTTP2016-05-03T00:00:00+02:002016-05-03T00:00:00+02:00Nikita Melkozerovtag:engineering.zalando.com,2016-05-03:/posts/2016/05/migrating-spray-akka-http.html<p>We cover the trickiest changes when it comes to migrating from Spray to Akka HTTP.</p><p>Spray is a well-known HTTP library in the Scala ecosystem. It was released in 2011, and since then it’s been widely used
by the Scala community. It was recently announced that Spray would be replaced by Akka HTTP, cementing Akka HTTP as
Spray’s successor. Akka HTTP is maintained by Lightbend, and users are encouraged to migrate to it soon.</p>
<p>However, migration from one major version of a library to another is not an easy task. Very often it requires you to
spend some time reading the source code in order to figure out how to use certain features, as well as how to migrate
existing logic.</p>
<p>This post will demonstrate what changes should be applied in order to migrate your app from Spray to Akka HTTP. The
following steps don’t have a particular order, as it depends on which areas need to be rewritten.</p>
<h3>Packages</h3>
<p>In order to have the latest Akka HTTP packages, all previous Spray dependencies now need to be replaced by the
following:</p>
<div class="highlight"><pre><span></span><code>  "com.typesafe.akka" %% "akka-http-core" % "2.4.3"
  "com.typesafe.akka" %% "akka-http-experimental" % "2.4.3"
  "com.typesafe.akka" %% "akka-http-testkit" % "2.4.3" % "test"
</code></pre></div>
<h3>HttpService</h3>
<p>Spray’s <em>HttpService</em> has been removed. Use the <em>Http</em> class and pass your routes to the <em>bindAndHandle</em> method. For
example:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code>val service = system.actorOf(Props(new HttpServiceActor(routes)))
IO(Http)(system) ! Http.Bind(service, "0.0.0.0", port = 8080)
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code>Http().bindAndHandle(routes, "0.0.0.0", port = 8080)
</code></pre></div>
<h3>Marshalling</h3>
<p><em>Marshaller.of</em> can be replaced with <em>Marshaller.withFixedContentType</em>. See below:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code><span class="n">Marshaller</span><span class="p">.</span><span class="k">of</span><span class="o">[</span><span class="n">JsonApiObject</span><span class="o">]</span><span class="p">(</span><span class="err">`</span><span class="n">application</span><span class="o">/</span><span class="n">vnd</span><span class="p">.</span><span class="n">api</span><span class="o">+</span><span class="n">json</span><span class="err">`</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="p">(</span><span class="k">value</span><span class="p">,</span><span class="w"> </span><span class="n">contentType</span><span class="p">,</span><span class="w"> </span><span class="n">ctx</span><span class="p">)</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="n">ctx</span><span class="p">.</span><span class="n">marshalTo</span><span class="p">(</span><span class="n">HttpEntity</span><span class="p">(</span><span class="n">contentType</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="p">.</span><span class="n">toJson</span><span class="p">.</span><span class="n">toString</span><span class="p">))</span>
<span class="err">}</span>
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code>Marshaller.withFixedContentType(`application/vnd.api+json`) { obj =>
  HttpEntity(`application/vnd.api+json`, obj.toJson.compactPrint)
}
</code></pre></div>
<p>Akka HTTP marshallers support content negotiation, so you don't have to specify the content type when creating one
“super” marshaller from other marshallers:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code>ToResponseMarshaller.oneOf(
  `application/vnd.api+json`,
  `application/json`
)(
  jsonApiMarshaller,
  jsonMarshaller
)
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code>Marshaller.oneOf(
jsonApiMarshaller,
jsonMarshaller
)
</code></pre></div>
<h3>Unmarshalling</h3>
<p>The example below shows one way that an <em>Unmarshaller</em> might be rewritten:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code><span class="n">Unmarshaller</span><span class="o">[</span><span class="n">Entity</span><span class="o">]</span><span class="p">(</span><span class="err">`</span><span class="n">application</span><span class="o">/</span><span class="n">vnd</span><span class="p">.</span><span class="n">api</span><span class="o">+</span><span class="n">json</span><span class="err">`</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="n">HttpEntity</span><span class="p">.</span><span class="n">NonEmpty</span><span class="p">(</span><span class="n">contentType</span><span class="p">,</span><span class="w"> </span><span class="k">data</span><span class="p">)</span><span class="w"> </span><span class="o">=></span>
<span class="k">data</span><span class="p">.</span><span class="n">asString</span><span class="p">.</span><span class="n">parseJson</span><span class="p">.</span><span class="n">convertTo</span><span class="o">[</span><span class="n">Entity</span><span class="o">]</span>
<span class="w"> </span><span class="err">}</span>
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code><span class="n">Unmarshaller</span><span class="p">.</span><span class="n">stringUnmarshaller</span><span class="p">.</span><span class="n">forContentTypes</span><span class="p">(</span><span class="err">`</span><span class="n">application</span><span class="o">/</span><span class="n">vnd</span><span class="p">.</span><span class="n">api</span><span class="o">+</span><span class="n">json</span><span class="err">`</span><span class="p">).</span><span class="k">map</span><span class="p">(</span><span class="n">_</span><span class="p">.</span><span class="n">parseJson</span><span class="p">.</span><span class="n">convertTo</span><span class="o">[</span><span class="n">Entity</span><span class="o">]</span><span class="p">)</span>
</code></pre></div>
<h3>MediaTypes</h3>
<p><em>MediaType.custom</em> can be replaced with the specific factory methods on the <em>MediaType</em> object.</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code>MediaType.custom("application/vnd.acme+json")
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code>MediaType.applicationWithFixedCharset("application/vnd.acme+json", HttpCharsets.`UTF-8`)
</code></pre></div>
<h3>Rejection Handling</h3>
<p><em>RejectionHandler</em> now uses a builder pattern – see the example below:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code>def rootRejectionHandler = RejectionHandler {
  case Nil =>
    requestUri { uri =>
      logger.error("Route: {} does not exist.", uri)
      complete((NotFound, mapErrorToRootObject(notFoundError)))
    }
  case AuthenticationFailedRejection(cause, challengeHeaders) :: _ =>
    logger.error(s"Request is rejected with cause: $cause")
    complete((Unauthorized, mapErrorToRootObject(unauthenticatedError)))
}
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code>RejectionHandler
  .newBuilder()
  .handle {
    case AuthenticationFailedRejection(cause, challengeHeaders) =>
      logger.error(s"Request is rejected with cause: $cause")
      complete((Unauthorized, mapErrorToRootObject(unauthenticatedError)))
  }
  .handleNotFound { ctx =>
    logger.error("Route: {} does not exist.", ctx.request.uri.toString())
    ctx.complete((NotFound, mapErrorToRootObject(notFoundError)))
  }
  .result() withFallback RejectionHandler.default
</code></pre></div>
<h3>Client</h3>
<p>The Spray-client pipeline was removed. Use Http’s <em>singleRequest</em> method instead:</p>
<p>Before:</p>
<div class="highlight"><pre><span></span><code>val pipeline: HttpRequest => Future[HttpResponse] =
  addHeader(Authorization(OAuth2BearerToken(accessToken))) ~> sendReceive
val patch: HttpRequest = Patch(uri, payload)
pipeline(patch).map { response =>
  …
}
</code></pre></div>
<p>After:</p>
<div class="highlight"><pre><span></span><code>val request = HttpRequest(
  method = PATCH,
  uri = Uri(uri),
  headers = List(Authorization(OAuth2BearerToken(accessToken))),
  entity = HttpEntity(MediaTypes.`application/json`, payload)
)
http.singleRequest(request).map { response =>
  …
}
</code></pre></div>
<h3>Headers</h3>
<p>All HTTP headers have been moved to the <em>akka.http.scaladsl.model.headers._</em> package.</p>
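<p>In practice this usually just means repointing imports; the header names themselves are unchanged. A minimal sketch (the token value here is a placeholder):</p>
<div class="highlight"><pre><span></span><code>// Before (Spray): header models lived in spray.http.HttpHeaders
import spray.http.HttpHeaders.Authorization

// After (Akka HTTP): all header models live in one package
import akka.http.scaladsl.model.headers._

val header = Authorization(OAuth2BearerToken("some-token"))
</code></pre></div>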
<h3>Form fields and file upload</h3>
<p>Because the HTTP entity is streamed, it’s important to obtain a strict entity before accessing multiple form fields
or using the file upload directives. One solution is to apply the following directive before working with form fields:</p>
<div class="highlight"><pre><span></span><code>val<span class="w"> </span>toStrict:<span class="w"> </span>Directive0<span class="w"> </span>=<span class="w"> </span>extractRequest<span class="w"> </span>flatMap<span class="w"> </span>{<span class="w"> </span>request<span class="w"> </span>=>
<span class="w"> </span>onComplete(request.entity.toStrict(5.seconds))<span class="w"> </span>flatMap<span class="w"> </span>{
<span class="w"> </span>case<span class="w"> </span>Success(strict)<span class="w"> </span>=>
<span class="w"> </span>mapRequest(<span class="w"> </span>req<span class="w"> </span>=><span class="w"> </span>req.copy(entity<span class="w"> </span>=<span class="w"> </span>strict))
<span class="w"> </span>case<span class="w"> </span>_<span class="w"> </span>=><span class="w"> </span>reject
<span class="w"> </span>}
<span class="w"> </span>}
</code></pre></div>
<p>And one can use it like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">toStrict</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">formFields</span><span class="p">(</span><span class="ss">"name"</span><span class="p">.</span><span class="k">as</span><span class="o">[</span><span class="n">String</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>While this list isn’t an exhaustive collection of all the changes you need to make, it covers the trickiest ones.
One major drawback of Akka HTTP is that it’s not as mature as Spray, and its performance is not yet optimised. Users
may also notice a lack of documentation for some cases.</p>
<p>Having said that, it would be a good idea to keep the above issues in mind during this process. Happy migration!</p>Teaching React: A different approach2016-04-29T00:00:00+02:002016-04-29T00:00:00+02:00Andra Joy Lallytag:engineering.zalando.com,2016-04-29:/posts/2016/04/andra-teaching-react.html<p>Finding your latest beginners workshop too fast-paced? Andra did something about hers.</p><p>Before I became a developer, I graduated college as a math and science teacher. Teaching is something I fundamentally
believe in and think anyone can do, no matter their skill level.</p>
<p>When I first arrived at Zalando, I joined a team that had recently decided to use React. However, few had concrete
knowledge about the framework. My team decided to join three other interested teams and attend a React workshop every
two weeks.</p>
<p>After attending this workshop, I thought to myself: If I were a beginner-level React developer, this class might be a
little too advanced and fast-paced for me. To this end, I offered my services to teach the other four members of my team
an hour every week if they found a room and a suitable time. I was really surprised with how fast they jumped on the
opportunity and made it a regular weekly activity for our team.</p>
<p>I'm a firm believer in the fact that coding is fundamentally hard. People that have been coding for a long time can
often forget this. When teaching, they present complex syntax and hard coding concepts, and in my opinion, too many
topics for beginners to understand right from the get go.</p>
<p>I believe this is the wrong approach, thus my teaching style is a little bit different. I like to start by drawing
pictures, as I think you get a better understanding of code if you can draw what is happening before you add syntax. In
the case of React, I would draw smart and dumb components while discussing how they work together, and which of these
owned state and props.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b3fa5148fca5125dd3927b8ba24bae753860bacb_component-diagram.png?auto=compress,format"></p>
<p>Next, I would take an example of a page that was using React, such as Facebook. We would try to label the page with the
new vocabulary we had just learned. With this approach, a student I teach may not even see any syntax until the third
teaching day.</p>
<p>I value high-level understanding over syntax. As a student, syntax was the one thing I could always Google the answer
for. However, high level visualisations to learn from are often hard to find or practically nonexistent, which is why
it’s so important to incorporate them into teaching.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e94b0592e60fd3935ef5f701efbb846c523a6991_facebook-example.png?auto=compress,format"></p>
<p>I also believe in standing by your work, which means I often give homework if you’re attending my class. Students would
bring their work to the next lesson and be required to justify why their method is the best. I don’t have the answers,
which I make very clear, as I'm a junior developer myself. I want them to challenge me and teach me a new way to do
something. I want them to convince their classmates and me that their work is the best way to do something. With this
process, they’re required to articulate what they’ve learned, allowing me to catch anything they might have missed. This
also helps us as a team to create standards we can abide by.</p>
<p>I encourage everyone to teach one another. At a company as big as ours, we shouldn’t be taking courses online while
sitting next to each other. We can and should be teaching one another, and using the valuable resources we have: Each
other.</p>
<p>I was afraid to teach a subject I had just learned myself, but in the process I’ve become a better React developer. This
experience has also helped my team produce code faster, collaborate on a shared team standard, and has us talking about
interesting concepts such as architecture.</p>
<p>I thoroughly recommend teaching within the confines of your team. We have benefitted from the learning process and, as a
result, work better and faster together.</p>Why is Girls’ Day so important to Zalando Tech?2016-04-29T00:00:00+02:002016-04-29T00:00:00+02:00Lisa Moerschtag:engineering.zalando.com,2016-04-29:/posts/2016/04/girls-day-zalando-tech.html<p>Educating girls about Zalando and Technology is high on our company's To-Do list.</p><p>Yesterday it was <a href="http://www.girls-day.de/Ueber_den_Girls_Day/Das_ist_der_Girls_Day/Ein_Zukunftstag_fuer_Maedchen/english">Girls’
Day</a> in
Germany: A Federal Government initiative that encourages technology companies, universities, and research centres to
organise an open day for girls to positively influence their vocational choices. It’s a way to get girls in touch with
careers in IT, trades, science, and technology, as well as meet female role models in leadership positions.</p>
<p>This year, Zalando Tech took part for the first time and organised a jam-packed day of sessions. These included Q&A
rounds with several Zalando Tech employees, and participation in focus group testing for a better insight into our User
Lab. A group of eager and incredibly excited girls joined us for the day at Zalando Tech in Berlin.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e042bf29a4af1c68c4b01c2010facc6df29a777f_girls-day-bmo.jpg?auto=compress,format"></p>
<p>Campaigns such as these are incredibly important for several reasons: Not only are they a basis for changing common
attitudes towards careers, but they serve to better educate girls about misconceptions in the technology industry. Most
of the girls attending had expected technology jobs to be very anti-social, but their experience at Zalando really
opened their eyes: Working on projects together, and the company culture as a whole, challenged this idea.</p>
<p>Similar campaigns to Girls’ Day take place across 20 different European countries, with the likes of France, Italy,
Netherlands, and Norway taking part. There are also Girls’ Day initiatives in Asia, plus the launch of the first Girls’
Day happening in Egypt this year.</p>
<p>You can find more information about Girls’ Day, in German and English, via the campaign’s <a href="http://www.girls-day.de/Ueber_den_Girls_Day/Das_ist_der_Girls_Day/Ein_Zukunftstag_fuer_Maedchen/english">official
website</a>.</p>
<p>If you’re interested in finding out more about what goes on behind the scenes at Zalando Tech, perhaps for your school
or university class, please reach out to us via <a href="https://twitter.com/ZalandoTech">Twitter</a>.</p>Four lessons learned when working with Microservices2016-04-27T00:00:00+02:002016-04-27T00:00:00+02:00Malte Pickhantag:engineering.zalando.com,2016-04-27:/posts/2016/04/four-lessons-with-microservices.html<p>Read about what our Dortmund team has learned on the road to implementing Microservices.</p><p>For a couple of weeks now, our team at Zalando has been implementing Microservices for a new feature, which we’re
looking forward to sharing with you. Our team is relatively young and has been working together for roughly six months.
This is our first big project as a team and there are a lot of learnings to share.</p>
<p>On top of successes, we’ve also encountered failures. With this blog post I’d like to address the lessons we’ve taken on
when working with Microservices.</p>
<h3>Mistakes are mandatory</h3>
<p>Looking back, I’d say some mistakes could have been avoided in our project. On the other hand, these mistakes were also
important to ensure we learned from the experience.</p>
<p>You have to pull yourself together and get on with the job.</p>
<p>Our first challenge was that Zalando “switched” from its monolith architecture to creating all applications as
Microservices. You might have thought that this would be an easy task, as we initially did: You just split your
applications into smaller ones and continue on.</p>
<p>Well, not quite.</p>
<h3>Defining Microservices</h3>
<p>As soon as we started designing our new components and digging deeper into the Microservices approach, we realised how
hard it was to define the boundaries of our services.</p>
<p>We started with having two “Microservices”: One for the core business logic and one for calling third party
applications.</p>
<p>After a few days of re-thinking and designing, we knew we had to sit down together and discuss our approach. We took a
detailed look at our ER-Diagram and started splitting up the services by the context that they’re serving, making sure
they would only serve one purpose.</p>
<p>Breaking it down this way, we ended up with eight Microservices. Each of these would have their own API, which then
needed to be designed.</p>
<p><strong>Lesson #1: Define your Microservices properly and know your boundaries.</strong></p>
<p>Not only were we a completely new team and also new to the whole “Microservices thing”, we also made a decision to
switch to new technologies. We replaced Maven with Gradle in order to create cleaner build-files and switched from
PostgreSQL to Apache Cassandra in order to avoid downtimes introduced by the database.</p>
<h3>Structural apocalypse</h3>
<p>Since our team was implementing our first instance of Microservices, we started structuring our projects as we’d been
used to. For example, we created multiple modules, one for the database layer, one for the domain objects, one for the
API, and so on.</p>
<p>After some time, we recognised that this method introduces a lot of overhead, such as maintaining dependencies for each
module and uploading jar files to Nexus in order to share libs between modules. Most of the time, each module contains
only a few classes.</p>
<p><strong>Lesson #2: Don’t create modules in your Microservices.</strong></p>
<p>We soon removed all modules and used packages to separate the classes. With this we’re truly on our way to implementing
our new feature.</p>
<h3>From Waterfall to early integration</h3>
<p>We divided our team into sub-teams consisting of pairs, with each starting to define the APIs required to implement a
Microservice. Thanks to Zalando’s internal API Guild, the API design went really well.
<a href="https://en.wikipedia.org/wiki/Guild">Guilds</a> are modern-day groups with a concentrated purpose of sharing knowledge and
spreading new practices. The API Guild is a community of API engineers at Zalando that contribute to overarching API
quality and share knowledge around API design and implementation.</p>
<p>An issue that arose from this was that we tried to implement a lot of use cases in one Microservice at the same time. As
soon as we started integrating the services, we figured out that something wasn’t working as expected. From here, we
either had to change an API, or worse still, change the complete implementation of a use case.</p>
<p>After some weeks, we changed the process so that we implemented one use case through all Microservices and integrated
them early.</p>
<p><strong>Lesson #3: Integrate early, integrate often.</strong></p>
<p>This change in the development process made it possible for us to detect design failures at an early stage, resulting in
throwing away less code.</p>
<h3>I love it when a plan comes together</h3>
<p>In order to serve a product’s need for a “go-live-date”, we created a relatively detailed plan of tasks we had to
complete in order to finish the project.</p>
<p>Unfortunately, it turned out that this plan was designed to fail. Due to the organisational changes made to the project
and the tweaks we made to find our place in the new environment, a delay had occurred in our project.</p>
<p><strong>Lesson #4: Focus on weekly goals!</strong></p>
<p>We soon discovered that including weekly goals to focus on, like you would in a Scrum situation when setting up your
Sprint targets, was a better way to prioritise and focus our efforts. Weekly goals lead to an improved workflow and also
tie into the lesson we learned about integrating early and often. We wanted to make sure we could stick to the
“go-live-date”, so structuring our work via weekly deliverables was a manageable way of achieving this.</p>
<p>I hope you enjoyed reading this article about our learnings and that it saves you time when taking on the task of
developing new Microservices. Avoid our mistakes! If you have any questions, you can contact me on Twitter
<a href="https://twitter.com/Coderebelll">@Coderebelll</a>.</p>Progress recap: Elm Street 4042016-04-22T00:00:00+02:002016-04-22T00:00:00+02:00Andrey Kuzmintag:engineering.zalando.com,2016-04-22:/posts/2016/04/progress-recap-elm-street-404.html<p>The guys behind Elm Street 404 give us an update on the game's progress.</p><p><a href="https://tech.zalando.com/blog/using-elm-to-create-a-fun-game-in-just-five-days/">Remember the game</a> that was created
with the Elm language during Zalando’s Hack Week #4? We’ve been working on it via open source and it’s had a lot of
improvements since then. Our goal remains the same — to put the game on the 404 page of the Zalando website, and at its
current state, we’re as close as ever to reaching it. In this post we will go through the game’s progress in detail.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/beead74c936055591e12629289f401ccedfe4ec3_01-screenshot.png?auto=compress,format"></p>
<h3>Refactoring</h3>
<p>The game was built at a really fast pace so we had to make some quick decisions, some of which didn’t work out for
the best. This resulted in some dirty hacks and cut corners here and there. That’s why we immediately focused on
refactoring in order to prepare the code for the new features.</p>
<p>The main refactoring challenge was to bring multiple lists of different map objects (warehouses, houses, trees and
fountains) together into one list in order to use them in the path finding and map generation algorithms.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/17b2e1bf01d0d8b6b04b4f9e88c41186412b29aa_02-refactoring.png?auto=compress,format"></p>
<p>Thanks to the Elm compiler that guided us through this process, we were able to refactor it quickly and with confidence.
An interesting discovery here was that the new features usually required some abstraction to bring the data pieces
together, and this resulted in more lines of code being removed than added. This means that with Elm, you’re sometimes
adding features by removing code! In fact, if we count all the new features that have been added since last year, lines
of code have only increased slightly from 2623 to 2689.</p>
<h3>Completeness</h3>
<p>The game was fully playable at the Hack Week event, however it lacked some features that would make it look polished.
The most noticeable of these features was the “Start Game” and “Loading in Progress” states. While the first feature was
trivial to implement, the latter required a deep understanding of Elm Tasks and writing a native module in JavaScript to
<a href="https://github.com/zalando/elm-street-404/pull/13">load the image</a>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5f5248dbcdbd3e1769aa725dcc32c7086e4e7d8b_03-responsive.png?auto=compress,format"></p>
<p>We also noticed that in order to put the game on the website it had to be responsive, so we calculated the size of the
map based on the Window.dimensions signal. Because the old map was hard-coded for desktop size only, we had to write a
recursive algorithm that randomly positioned objects on the map and kept the list of empty boxes.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/37ffccaf58d03efd53e7550fe29e443c83ab288d_04-algorithm.png?auto=compress,format"></p>
<h3>Performance</h3>
<p>After making the game responsive, we spent quite a lot of time playing it on mobile! Unfortunately, we noticed that it
didn’t perform well enough and the game’s animations weren’t smooth.</p>
<p>We initially used <a href="http://elm-lang.org/blog/blazing-fast-html">elm-html</a>, which implements a Virtual DOM
concept similar to that of ReactJS, and we rendered with requestAnimationFrame by using Effects.Tick. Because the game objects
were rendered as many div tags, the diff algorithm of the Virtual DOM had slow execution times and produced many changes
to the real DOM. Our immediate idea was to use
<a href="http://package.elm-lang.org/packages/elm-lang/core/3.0.0/Graphics-Collage">Graphics.Collage</a> to render the game to a
canvas, but unfortunately it didn’t support texture offsets, meaning we couldn’t use our animation strips.</p>
<p>Initially this seemed like a crazy idea, but in the end, we rewrote the rendering in WebGL, which made the game perform
really fast at 60 frames per second. We also got an external contributor that helped us with the <a href="https://github.com/zalando/elm-street-404/pull/20">transparency
issue</a> in the shader code.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d160a8435a3830671d5434c29f66d915eed33afb_05-performance.png?auto=compress,format"></p>
<h3>Sharing the knowledge</h3>
<p>We are very excited about the game and Elm in general, and we want more people to get involved. We’ve applied to a
couple of conferences in order to share our knowledge and spread the word. In the meantime, Andrey presented the game at
the <a href="http://www.meetup.com/Elm-Berlin/events/229841384/">first Elm Berlin meetup</a> and at the internal Web Guild event at
Zalando. You can have a look at the slides
<a href="http://unsoundscapes.com/slides/2016-04-06-creating-a-fun-game-with-elm/">here</a>.</p>
<h3>Future Plans</h3>
<p>The game is in progress as an <a href="https://github.com/zalando/elm-street-404">open source project</a>, and our current focus is
on the following tasks:</p>
<ul>
<li>We’re still looking for someone who can help us with sound effects for the game</li>
<li>The images aren’t yet complete. For example, we had planned to put boxes on the bike of the moving delivery person</li>
<li>We need to improve the gameplay, as it’s currently not possible to win and the game’s difficulty doesn’t increase</li>
</ul>
<p>If you’d like to contribute to the project, please get in touch! You can reach Andrey via Twitter
<a href="https://twitter.com/unsoundscapes">@unsoundscapes</a>, and Kolja via Twitter <a href="https://twitter.com/01k">@01k</a>.</p>Introducing the Zalando Web Guild2016-04-21T00:00:00+02:002016-04-21T00:00:00+02:00Henrik Andersentag:engineering.zalando.com,2016-04-21:/posts/2016/04/zalando-web-guild.html<p>A quick dive into the Web Guild's latest activities and announcing their first external meetup.</p><p>The Zalando Web Guild is a group of like-minded frontend engineers who meet bi-weekly to discuss new technologies, share
knowledge, and collaborate on diverse projects. With so many independent teams and engineers from different backgrounds,
we felt that there was too little communication between them, with the result that teams often solved similar problems
instead of sharing code and knowledge.</p>
<p>This is especially important in the area of frontend development, where new technologies seem to spring up weekly.
Sadly, not all of these technologies will be relevant, which is why we discuss and evaluate new technologies within the
Guild.</p>
<p>We first launched the Web Guild in late January 2016 with two presentations on Promises in Node.js and the new Frontend
Architecture in our Fashion Store. Ever since, we’ve kept these presentations up every other week on topics such as Elm,
Redux, Linting, and Unicode.</p>
<p>We currently have 35 active members who attend meetings, communicate daily, and contribute to open source projects.
While we’re mostly focused on frontend, we’ve also touched on JavaScript in the backend when it's relevant to group
members, like the talk about Promises in Node.js shows.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7b02e937cbbc960f24b2fbd6d0374f93c33ec1f6_thumb_img_0703_1024.jpg?auto=compress,format"></p>
<p>You don’t necessarily need to present a topic to propose it. We try to collect topics that the group is interested in,
and then people can jump in if they feel they have something to contribute. We feel that this process fosters discussion
about relevant topics; we’re currently looking into Service Workers, Functional Programming, Angular 2.0, Testing and
Tooling, and a lot more.</p>
<p>In our opinion, the Web Guild has been incredibly successful at fulfilling its aim in fostering collaboration and
sharing knowledge between teams. Following this success, we’re planning our first ever external meetup of the Guild in
May. What we hope to get out of an external meetup is the opportunity to hear from speakers outside of Zalando who can
meaningfully contribute to what we do as front-end engineers. It’s also a way to communicate with others outside of the
organisation who might benefit from our knowledge.</p>
<p>We’ll be featuring presentations by Princiya Marina Sequeira, who’ll be speaking about JavaScript’s misunderstood status
and how developers can have fun with it, and Dmitriy Kubyshkin, who will focus on frontend in a microservice world,
paying particular attention to possible strategies of a distributed frontend and the pros and cons of these strategies.</p>
<p>Join us on 3rd May, 2016 at Zalando Tech HQ in Berlin. You can find all the information you need about the event
<a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/events/230488839/">here</a>.</p>An interview with Dublin's Startup Commissioner Niamh Bushnell2016-04-19T00:00:00+02:002016-04-19T00:00:00+02:00Deirdre O'Brientag:engineering.zalando.com,2016-04-19:/posts/2016/04/niamh-bushnell-interview.html<p>We chat with Dublin's Startup Commissioner before our Tech Culture Panel event.</p><p>If you’re in Dublin this week, then you’re in for a treat when it comes to Zalando events. Tomorrow, we’re hosting an
evening of interesting discussion, with the chance to catch up and share ideas with some of the companies innovating in
this space.</p>
<p>We'll be posing the question: “What is Tech Culture?”. We want to know how tech organisations foster a 'culture' to
support technical innovation and team engagement. Zalando Dublin will be represented by Graham O’Sullivan, Delivery Lead
at Zalando Dublin, who’ll be joined by other panel representatives from companies making their mark in Ireland.</p>
<p>In the lead up to this exciting panel, we had the opportunity to sit down with Niamh Bushnell, Dublin’s Startup
Commissioner and moderator for the event.</p>
<hr>
<p><em>Zalando Tech:</em> What is special about the ecosystem in Dublin that attracts best-in-class tech talent?</p>
<p><em>Niamh Bushnell:</em> There are a couple of aspects to Dublin in particular that help us stand out when it comes to tech.
Firstly, Travel Tech is a huge industry in Dublin. One in three trips organised globally touches Irish-built tech, so
it's a sector that's big and very important for us.</p>
<p>Secondly, most of the leading multinational tech companies in the world have a home in Dublin, including Zalando. Apart
from Travel Tech, Fintech, and more broadly, SaaS are huge sectors in Dublin and these companies are building product
side by side with multinational companies – Google, Amazon, Etsy, Dropbox, LinkedIn and more – in the same sectors and
hugely influenced by that. The cross pollination of tech talent that results in Dublin is pretty unique.</p>
<p>Our location also makes us unique, as we’re the bridge connecting Europe to the US. We’re quite US-centric, yet still
incredibly proud to be a European city. A lot of Venture Capitalists and Angel Investors from across the pond invest in
Irish companies thanks to the strong historical and cultural connections that have been established.</p>
<p><em>Zalando Tech:</em> What can Dublin learn from its competition in the startup space?</p>
<p><em>Niamh Bushnell:</em> We can learn a lot, definitely. What we’re unable to change in Dublin is our physical scale, but the
positive side of this is that it keeps us densely populated as a tech hub on top of being better connected. That said,
we're always looking at the larger scale hubs and seeing what they're doing, and how the ecosystem is adding value to
their startups and scale-ups in tech-related ways.</p>
<p>We're trying to encourage our startups to connect at an earlier stage with Venture Capital companies and Angel Investors
here in Ireland and internationally. It's something I sense comes more naturally to companies in a larger city. Here
companies feel like they don't want to be on the radar until they're "ready". But, funding is a relationship game and we
need to play it from the start to win!</p>
<p>I also sense that in mainland Europe there's more cross border R&D collaboration between companies in the ICT sector. I
think this is very positive and the more collaboration, the better.</p>
<p>Speaking of the US, the key startup cities like San Francisco, New York, and Boston are older and more mature than
Dublin, so there's an opportunity to learn from them. In this office, we’re constantly asking ourselves, is what we’re
doing really valuable to our startups from an education, funding, or markets perspective? I’m always trying to make sure
we don’t get caught up in the ‘Startup Industrial Complex’ as I call it, where being busy doesn’t always cut it.
Starting with questions about value is incredibly important and at the same time hard to quantify.</p>
<p><em>Zalando Tech:</em> What positive actions have you observed to support greater diversity, specifically in tech?</p>
<p><em>Niamh Bushnell:</em> The activity around female founders has really played a role in recognising the value of diversity and
how it contributes to innovation. Women receiving investment, celebrating female-led companies, all of these actions
have generated a lot of press which is a great way to foster role models. Role models are what breed the next generation
of entrepreneurs, and there aren't enough of them right now, but there’s definitely more recognition and acknowledgement
than there used to be.</p>
<p>People come to professions via different paths and through varied experiences, meaning your education doesn’t always fit
the standard criteria of a four year degree. It’s exciting that the workplace is opening up to new ways of recognising
experience, as this is a vital element in how greater diversity can be achieved.</p>
<p>These days, when you see a photo from big companies showcasing a certain team, it's odd to see them all belonging to one
gender or one race. You know that something is wrong with that picture. We have more than 1,200 startups in Dublin and
boast multinational companies that have become household names – constant cross-pollination and communication between
these spheres is a positive indication of greater diversity. Dublin is a great representation of the melting pot that is
tech, and we’ve become a go-to place in Europe for diverse influences.</p>
<p><em>Zalando Tech:</em> How important is trust in a start-up environment, especially in the context of engineers and management?</p>
<p><em>Niamh Bushnell:</em> Trust is everything, and not just in the context of interpersonal relationships at work. It’s trusting
the process, trusting in failure, experimentation, and risk, plus trusting in the culture. You’re buying into a tech
process when you enter these environments, you’re taking on a team, a culture, a lifestyle.</p>
<p>I would say passion has the same level of importance as trust. If passion and trust aren’t alive and kicking in your
daily work, then your company won’t function. Trusting in teams and the company vision needs to happen, as it touches
everyone in an organisation – trust needs to be the common denominator in the workplace.</p>
<p>A big thank you to Niamh for taking the time to speak with us before the panel. To RSVP for the event, which includes
plenty of time for questions after the discussion, check out our meetup page
<a href="http://www.meetup.com/Zalando-Tech-Events-Dublin/events/230203407/">here</a>.</p>Zester – Unit Tests on Steroids2016-04-18T00:00:00+02:002016-04-18T00:00:00+02:00Sebastian Montetag:engineering.zalando.com,2016-04-18:/posts/2016/04/zester-mutation-testing.html<p>Are the tests you write good at catching bugs? Sebastian Monte introduces Zalando's Zester.</p><p>The Zalando Testing Team creates tools that improve code quality for development teams. One way to improve code quality
is to write tests – and we do write a lot of tests! But are these tests good at catching bugs? If code is refactored and
the test suite is still green, can we be confident that we didn’t break anything? This post introduces a new tool that
verifies whether or not tests are actually able to find errors in the code.</p>
<h3>Mutation Testing</h3>
<p>We typically use code coverage to gain insight into how well software is tested. This is a rather bad metric: Code can
have a high coverage percentage even without any assertions in tests!
<a href="http://www.linozemtseva.com/research/2014/icse/coverage/">Research</a> also suggests that there isn’t a strong correlation
between coverage and test suite effectiveness.</p>
<p>One promising approach to measure test quality is mutation testing. The idea is simple: Faults are injected into the
source code and tests are run against faulty versions. If the tests start to fail, the tests are good at catching
errors. If everything is still green, the tests are doing a poor job.</p>
<p>A faulty version of the software is called a <em>mutant</em>. A <em>mutant</em> can, for example, contain a negated conditional:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span><span class="w"> </span><span class="ss">(</span><span class="nv">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nv">b</span><span class="ss">)</span><span class="w"> </span>{
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="nv">some</span><span class="w"> </span><span class="nv">logic</span>...
}
</code></pre></div>
<p>Which will be changed to:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span><span class="w"> </span><span class="ss">(</span><span class="nv">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">b</span><span class="ss">)</span><span class="w"> </span>{
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="nv">some</span><span class="w"> </span><span class="nv">logic</span>...
}
</code></pre></div>
<p>If a test fails for a <em>mutant</em>, the <em>mutant</em> is said to be <em>killed</em>. After mutation tests have been run, a mutation
score is calculated:</p>
<p><em>mutation score = number of mutants killed / total number of mutants</em></p>
<p>A higher ratio indicates a more effective test suite.</p>
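<p>To make this concrete, here is a small, entirely hypothetical Java example (class and method names are mine, not from any Zalando codebase): a conditional that a tool like PIT could negate, together with checks that fail for — and therefore kill — that mutant.</p>

```java
// Hypothetical example: a conditional a mutation tool could negate,
// plus checks on both branches that kill the negated-conditional mutant.
public class MutationExample {

    // Business rule: an exact price match earns a small discount.
    static int applyDiscount(int price, int threshold) {
        if (price == threshold) {   // a mutant negates this to 'price != threshold'
            return price - 5;
        }
        return price;
    }

    public static void main(String[] args) {
        // A test with no assertions would stay green for the mutant;
        // asserting concrete results on both branches kills it.
        if (applyDiscount(100, 100) != 95) throw new AssertionError("equal case");
        if (applyDiscount(90, 100) != 90) throw new AssertionError("unequal case");
        System.out.println("both branches asserted");
    }
}
```

<p>If this were the only mutant generated and both checks ran, the class would score 1 killed / 1 generated = 1.0.</p>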
<p><a href="http://pitest.org/">PIT</a> is a popular Java mutation testing tool. It provides Maven integration, command line tools,
and Ant support out of the box. Unfortunately, running PIT this way breaks a developer’s workflow, and I sincerely hope to see
mutation testing become part of a developer’s everyday work in future. An IDE seems like the natural place for a mutation test
runner! To that end, I’d like to introduce Zester, an IntelliJ IDEA plugin that makes running mutation tests a pleasant
experience for developers.</p>
<h3>Zester</h3>
<p><a href="https://github.com/zalando/zester">Zester</a> was developed in order to improve the quality of tests inside Zalando. It is
an IntelliJ IDEA plugin that uses PIT under the hood. It focuses on ease of use: Running mutation tests should be as
easy as running JUnit or TestNG tests inside an IDE.</p>
<p>After installing Zester, you can right click a Java file or a package in the Project navigator. The context menu offers
you an option to run mutation tests. After starting the tests, a run configuration is saved in similar fashion to JUnit
or TestNG configurations. If you need to provide more detailed configuration, you can always open the complete
configuration from the “Edit configurations…” menu.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6acc33cb1e8e3f3eef5e7d963bb34b66b6d48927_zester-image-1.png?auto=compress,format"></p>
<p>When mutation tests are executing, the console is updated with current progress.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ca8a1061c15a54907fc555d1a08014aaf01b0b24_zester-image-2.png?auto=compress,format"></p>
<p>Once mutation tests are finished, a link to the mutation test report is displayed in the console.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/500259b987fb545243d8cdcd3175a5e57f09b7ba_zester-image-3.png?auto=compress,format"></p>
<p>From the report we can see that the tests are still passing even with the negated condition at line 14. There is no test
to protect you from mistakenly modifying the condition. In larger projects, mistakes like this happen and your tests
should be able to detect these errors!</p>
<h3>Conclusion</h3>
<p>Today’s developers have a wide range of tools available that help them to test code. Mutation testing measures whether
those tests can actually catch faults and therefore should be part of a developer's toolkit. With Zester and PIT,
developers can write tests and refactor with confidence!</p>Stack Overflow questions you should read if you program in Java2016-04-15T00:00:00+02:002016-04-15T00:00:00+02:00Peter Lawreytag:engineering.zalando.com,2016-04-15:/posts/2016/04/stack-overflow-questions-java.html<p>Peter Lawrey shares some advice as the man with the most answers for Java on Stack Overflow.</p><p>There are common questions which come up repeatedly in Java. Even if you know the answer, it's worth getting a more
thorough understanding of what is happening in these cases.</p>
<h3>How do I compare Strings?</h3>
<p>The more general question is, how do I compare the contents of an Object? What is surprising when you use Java for the
first time is that a variable of a type like <em>String</em> holds a reference to an Object, not the Object itself. This
means when you use <em>==</em> you are only comparing references. Java has no syntactic sugar to hide this fact, so <em>==</em> only
compares references, not the contents of the Objects they refer to.</p>
<p>If you’re in any doubt, Java only has primitives and references for data types up to Java 9 (in Java 10, value types
might be added). The other type is void, which is only used as a return type.</p>
<p>Some other questions you should check out:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/40480/is-java-pass-by-reference-or-pass-by-value">Is Java Pass by Reference or Pass by
Value?</a></li>
<li><a href="http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java">How do I compare Strings?</a></li>
<li><a href="http://stackoverflow.com/questions/1700081/why-does-128-128-return-false-but-127-127-return-true-in-this-code">Why does 128 <em>==</em> 128 return false but 127 <em>==</em> 127 return
true?</a></li>
</ul>
<h3>How do I avoid lots of != null?</h3>
<p>Checking for null is tedious; however, unless you know a variable can't be null, there is still a chance it will be null.
There are <em>@NotNull</em> annotations available for FindBugs and IntelliJ, which can help you detect null values where they
shouldn't be without extra coding. Optional can also play a role now.</p>
<p>Say you have:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="ss">(</span><span class="nv">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">null</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nv">a</span>.<span class="nv">getB</span><span class="ss">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">null</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nv">a</span>.<span class="nv">getB</span><span class="ss">()</span>.<span class="nv">getC</span><span class="ss">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">null</span><span class="ss">)</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">a</span>.<span class="nv">getB</span><span class="ss">()</span>.<span class="nv">getC</span><span class="ss">()</span>.<span class="nv">doSomething</span><span class="ss">()</span><span class="c1">;</span>
<span class="w"> </span>}
</code></pre></div>
<p>Instead of checking for null, you can write:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nt">Optional</span><span class="p">.</span><span class="nc">ofNullable</span><span class="o">(</span><span class="nt">a</span><span class="o">)</span>
<span class="w"> </span><span class="p">.</span><span class="nc">map</span><span class="o">(</span><span class="nt">A</span><span class="p">::</span><span class="nd">getB</span><span class="o">)</span>
<span class="w"> </span><span class="p">.</span><span class="nc">map</span><span class="o">(</span><span class="nt">B</span><span class="p">::</span><span class="nd">getC</span><span class="o">)</span>
<span class="w"> </span><span class="p">.</span><span class="nc">ifPresent</span><span class="o">(</span><span class="nt">C</span><span class="p">::</span><span class="nd">doSomething</span><span class="o">);</span>
</code></pre></div>
<p>See the answer for “<a href="http://stackoverflow.com/questions/271526/avoiding-null-statements">Avoiding != null statements</a>” for
more information.</p>
<p>Other useful hints:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/309424/read-convert-an-inputstream-to-a-string">Converting an InputStream to a
String</a></li>
<li><a href="http://stackoverflow.com/questions/363681/generating-random-integers-in-a-specific-range">How to generate numbers in a specific
range</a></li>
<li><a href="http://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist">When to use a LinkedList instead of an
ArrayList</a></li>
<li><a href="http://stackoverflow.com/questions/215497/difference-between-public-default-protected-and-private">Difference between public, default, protected and
private</a></li>
<li><a href="http://stackoverflow.com/questions/1128723/how-can-i-test-if-an-array-contains-a-certain-value">How to test if an Array contains a
value</a></li>
<li><a href="http://stackoverflow.com/questions/541487/implements-runnable-vs-extends-thread">Why you should implement Runnable rather than extend
Thread</a></li>
<li><a href="http://stackoverflow.com/questions/65035/does-finally-always-execute-in-java">Does finally always execute?</a></li>
<li><a href="http://stackoverflow.com/questions/604424/convert-a-string-to-an-enum-in-java">How to convert a String to an Enum</a></li>
<li><a href="http://stackoverflow.com/questions/886955/breaking-out-of-nested-loops-in-java">Breaking out of nested loops</a></li>
</ul>
<p><a href="http://stackoverflow.com/questions/46898/how-to-efficiently-iterate-over-each-entry-in-a-map">How to efficiently iterate over a
map</a> – Note that in Java 8
you can use:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nf">map</span><span class="p">.</span><span class="n">forEach</span><span class="p">((</span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">});</span>
</code></pre></div>
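<p>A self-contained sketch contrasting the classic <em>entrySet()</em> loop with the Java 8 <em>forEach</em> (the map contents here are just illustrative):</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapIteration {
    public static void main(String[] args) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("S", 36);
        sizes.put("M", 38);

        // Iterate over entries directly: one lookup per entry,
        // instead of iterating keys and calling get() for each one.
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // Java 8 equivalent using a lambda.
        sizes.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```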
<p>For more posts about Java, see my blog <a href="http://vanillajava.blogspot.de/">Vanilla Java</a>. You can also contact me on
Twitter <a href="https://twitter.com/PeterLawrey">@PeterLawrey</a>.</p>Ana Peleteiro takes us on a data science tour of Dublin2016-04-12T00:00:00+02:002016-04-12T00:00:00+02:00Dr. Ana Peleteiro Ramallotag:engineering.zalando.com,2016-04-12:/posts/2016/04/dublin-data-science-tour.html<p>Get the rundown from one of our data scientists about her conference journey across Dublin.</p><p>Last month I had the opportunity to present an introduction to data science at the Women Who Code meetup in Dublin and
the R/Data mini-conference, as well as an introduction to Machine Learning at the Women TechMakers event. It was great
fun and such a good opportunity that I wanted to share my learnings with you.</p>
<p>The first stop of the month was at the <a href="http://www.meetup.com/Women-Who-Code-Dublin/">Women Who Code Dublin</a> meetup,
which is a branch of <a href="https://www.womenwhocode.com/">Women Who Code</a>. This is an organisation that has chapters all over
the world and inspires women to excel in the technology sector. This month, the meetup for the Dublin group was held in
our Zalando office in the city, with a great turnout overall (approx. 50 people). I gave an introduction into what data
science is, the main problems that we tackle in our daily jobs, and the different solutions and algorithms involved.
This presentation combined theory with some code examples in order to make it more interactive. There was a lot of
engagement on the topic and a huge amount of questions at the end of the talk. Not only that, but I was delighted to see
how people came early and stayed late to talk to each other, exchange ideas, ask more questions, and learn about
Zalando… this is the real spirit of meetups such as these.</p>
<p>My second speaking engagement was at the R/Data mini-conference that was organised by <a href="https://www.codinggrace.com/">Coding
Grace</a>, who are a group of developers based in Ireland that love to code and participate
in other geeky and not-so-crafty activities. The event took place at <a href="http://dogpatchlabs.com/">DogPatch Labs</a>, which is
a coworking space for scaling technology startups, located in the heart of Dublin’s Digital Docklands inside the
historic Chq building. The conference was a full day event where several speakers presented on different topics related
to data, as well as having several workshops running in parallel. I presented my introduction to data science, and was
once again thrilled to see that people were very engaged.</p>
<p>The third and final stop of the month was at the <a href="https://www.womentechmakers.com/">Women TechMakers</a> event, which took
place at Google’s offices in Dublin. In this presentation I focused on Machine Learning, explaining what it is and where
it can be useful. This was the biggest and most diverse crowd of the three events, and there were also a lot of
questions at the end of my presentation.</p>
<p>If you have any questions about data science or machine learning, please feel free to contact me on Twitter via
<a href="https://twitter.com/PeleteiroAna">@PeleteiroAna</a>.</p>Continuous Delivery pipelines of ghe-backups with Lizzy2016-04-11T00:00:00+02:002016-04-11T00:00:00+02:00Lothar Schulztag:engineering.zalando.com,2016-04-11:/posts/2016/04/ci-pipelines-with-lizzy.html<p>Lothar Schulz is back to focus on the Continuous Delivery of ghe-backup instances on AWS.</p><p>In my first post about <a href="https://tech.zalando.com/blog/multi-aws-github-enterprise-backup/">GitHub Enterprise backup</a>, I
outlined why the open source <a href="https://github.com/zalando/ghe-backup">ghe-backup project</a> was started: To provide <a href="https://help.github.com/enterprise/11.10.340/admin/articles/backing-up-enterprise-data/">GitHub
Enterprise Backups</a> based on
<a href="https://aws.amazon.com">AWS</a> with the <a href="https://stups.io/">STUPS</a> platform. The following post will focus on the
Continuous Delivery of <a href="https://github.com/zalando/ghe-backup">ghe-backup</a> instances on AWS.</p>
<p>Why do we care about Continuous Delivery of ghe-backups? Well, there are regular changes in
<a href="https://github.com/zalando-stups/taupage/commits/master">Taupage</a> and ghe-backups have to run an up-to-date version of
Taupage. Every important fix or change, as well as the recurring expiration of Taupage, requires a new deployment of
ghe-backup. Hence, a Continuous Delivery pipeline makes perfect sense.</p>
<p>In my role as Delivery Lead, my aim is to use the services provided by the teams that I’m accountable for. One of these
teams provides several managed <a href="http://jenkins-ci.org/">Jenkins</a> automation servers on AWS for Zalando Tech internal
customers. Naturally, I went for one of these setups to implement a Continuous Delivery pipeline.</p>
<p>The current delivery pipeline looks like this:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c6f3350f32932f67f7adefb6ed7652b2dc877664_ci-pipeline-1.png?auto=compress,format"></p>
<p>The pipeline view is based on the <a href="https://wiki.jenkins-ci.org/display/JENKINS/Build+Pipeline+Plugin">Jenkins Build Pipeline
Plugin</a>. This will most likely be moved to <a href="https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+2.0#Jenkins2.0-RoughTimeline">Jenkins
2.0</a> once production ready.</p>
<p>Not every pipeline run requires a deployment. That's why the pipeline contains four manually triggered jobs
(highlighted above in blue). These are divided into two branches, one per AWS account to deploy to.</p>
<p>Successful deployments to the two separate AWS accounts results in the view below:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5366a9fe6bc0e3a13773669567a94423111141d1_ci-pipeline-2.png?auto=compress,format"></p>
<p>Basically, there are three common jobs for every pipeline run:</p>
<ul>
<li>Testing</li>
<li>Docker image creation</li>
<li>Docker image push</li>
</ul>
<p>Branching the pipeline is necessary since deployments are AWS-account specific. Different AWS accounts also require
different senza yaml configuration files. This <a href="https://gist.github.com/lotharschulz/c10f184d210cf984583c">gist</a>
provides a default sample senza yaml for ghe-backup.</p>
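<p>For orientation, here is a heavily simplified sketch of what such a senza definition can look like. All names, the instance type, and the Docker registry URL are illustrative assumptions on my part; the linked gist remains the authoritative example.</p>

```yaml
# Illustrative sketch only -- see the linked gist for a real ghe-backup definition.
SenzaInfo:
  StackName: ghe-backup
  Parameters:
    - ImageVersion:
        Description: "Docker image tag to deploy"
SenzaComponents:
  - Configuration:
      Type: Senza::StupsAutoConfiguration   # discovers VPC subnets automatically
  - AppServer:
      Type: Senza::TaupageAutoScalingGroup  # EC2 instances booting the Taupage AMI
      InstanceType: t2.micro
      TaupageConfig:
        source: "registry.example.org/myteam/ghe-backup:{{Arguments.ImageVersion}}"
```

<p>Because the account-specific details (subnets, registry, instance sizing) live in this file, each AWS account in the pipeline gets its own variant of it.</p>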
<p>The actual deployment happens using <a href="https://github.com/zalando/lizzy/">Lizzy</a>. Lizzy, as well as Jenkins based on AWS,
are both developed under teams that I’m responsible for.</p>
<p>Lizzy is an HTTPS wrapper around <a href="https://stups.io/senza/">senza</a> (a command line deployment tool to create and execute
AWS CloudFormation templates) that allows deployments across AWS accounts. Whenever the <a href="https://github.com/zalando/lizzy-client">Lizzy
agent</a> is rolled out in an AWS account, the person responsible for that account defines
which permissions the Lizzy agent gets. This way, you can control which permissions deployments may use within
your AWS account.</p>
<p>The last deployment step of the pipeline above calls Lizzy in the AWS account to deploy to. Lizzy starts the deployment
and reports back to the deployment job about its success (hopefully) or failures.</p>
<p>With the Continuous Delivery pipeline described above, we are now able to deploy any change within minutes – be it in
ghe-backup code, in Taupage across AWS accounts, or to both AWS accounts that run GitHub Enterprise backups. If
required, we could even configure the pipeline to deploy automatically, without us intervening.</p>
<p>If you have any further questions about the pipeline, I’d be happy to answer them!</p>We’re finalists for the 2016 SAP HANA Innovation Award!2016-04-08T00:00:00+02:002016-04-08T00:00:00+02:00Zalando Technologytag:engineering.zalando.com,2016-04-08:/posts/2016/04/sap-innovation-award-finalist.html<p>Cast your vote now to ensure Zalando remains a world class innovator.</p><p>We’ve just received the great news that Zalando has been selected as a Finalist for the 2016 SAP HANA Innovation Award!
With over 100 entries this year, we have been picked among the top three in the Process Simplifier category. This is
truly an honour, as being a Finalist shows that Zalando is recognised as being a world class innovator.</p>
<p>We’re creating the foundation for digital transformation moving forward, and you can be part of the journey. While the
judges have determined 80% of the vote, the rest is up to you! Public voting has just kicked off and we need your
support to come out on top. Here’s how you can make a difference:</p>
<p>Register and vote for Zalando <a href="https://ideas.sap.com/D32298">here</a> for your chance to win a sizable donation to charity
– only votes on Finalist entries count and you can vote once per category. Public voting closes on April 21st, 2016, so
after you <a href="https://ideas.sap.com/D32298">vote</a>, why not Tweet about it?</p>
<p>Use the hashtag <a href="https://twitter.com/search?q=%23hanastory&src=typd">#HANAStory</a> to broadcast your vote for Zalando.
We’ll be doing it too! Be sure to follow us on Twitter as well: <a href="https://twitter.com/Zalando">@Zalando</a> and
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>.</p>
<p>Need some background on how we’ve become a Finalist?</p>
<p>With Zalando transforming itself into a platform, we had to prepare our systems for even more mass data processing and
high performance. In June 2015 we started to think about SAP HANA and the required Enterprise Resource Planning (ERP)
migration to HANA to link with a long-term Smart Data Strategy. We decided to set up a proof of concept for ERP on HANA
with two objectives:</p>
<ol>
<li>To find out which FI-CA operations run faster and - more importantly - which operations might slow down</li>
<li>To obtain an understanding of the entire migration process and all additional activities required</li>
</ol>
<p>Regarding the performance of different business transactions, we found several customer-specific enhancements which
severely degraded performance when run on HANA. Furthermore, we discovered some SAP FI-CA standard functionality which didn’t perform as
expected. We fixed our customer-specific programs and SAP provided us with eight new notes in total for FI-CA, which
then tuned the standard functions.</p>
<p>After one test migration, we migrated ERP to HANA successfully by the end of November 2015, becoming the first SAP
FI-CA customer worldwide to do so.</p>
<p>So what are you waiting for? <a href="https://ideas.sap.com/D32298">VOTE NOW!</a></p>EasyDI – Who wants some cake?2016-04-07T00:00:00+02:002016-04-07T00:00:00+02:00Eric Torreborretag:engineering.zalando.com,2016-04-07:/posts/2016/04/easy-di-library.html<p>Eric Torreborre presents another approach for DI that is both simple and flexible.</p><p>Scala developers have lots of options when it comes to doing <a href="https://en.wikipedia.org/wiki/Dependency_injection">Dependency
Injection</a> (or DI). The usual Java libraries can be used, like
<a href="http://docs.spring.io/autorepo/docs/spring/3.2.x/spring-framework-reference/html/beans.html">Spring</a>, or
<a href="https://github.com/google/guice">Guice</a> for Play developers.</p>
<p>But Scala being Scala, there are other options. You can use libraries leveraging macros, like
<a href="https://github.com/adamw/macwire">MacWire</a>, or use the Scala type system and the infamous <a href="http://jonasboner.com/real-world-scala-dependency-injection-di">“Cake
pattern”</a> and its <a href="http://www.warski.org/blog/2010/12/di-in-scala-cake-pattern">many
variations</a> using traits, with or without self-types.</p>
<p>Unfortunately, many teams have been burnt by the Cake Pattern and are looking for other solutions, often returning to
more or less involved DI libraries.</p>
<p>I want to take a step back and propose another approach for DI that is both simple and flexible, using basic Scala
features and … a special library, which is not a DI library at all!</p>
<h3>Back to requirements</h3>
<p>What are the basic things we can expect from DI?</p>
<ul>
<li>A way to define components</li>
<li>A way to configure them from files</li>
<li>The possibility to substitute some of them for testing, regardless of their nesting</li>
<li>The possibility to declare some of them as singletons</li>
</ul>
<p>Can we do that with some simple Scala? With just case classes and traits used as interfaces? Or to put it differently:
What’s wrong with <a href="https://en.wikipedia.org/wiki/Dependency_injection#Constructor_injection">constructor injection</a>
again?</p>
<h3>Define components</h3>
<p>With case classes and constructor injection we can declare our components like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">ZalandoNotifier</span><span class="p">(</span><span class="nl">config</span><span class="p">:</span><span class="w"> </span><span class="n">NotifierConfig</span><span class="p">,</span><span class="w"> </span><span class="nl">email</span><span class="p">:</span><span class="w"> </span><span class="n">EmailService</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="k">only</span><span class="w"> </span><span class="n">send</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">having</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">high</span><span class="w"> </span><span class="k">level</span><span class="w"> </span><span class="k">of</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">priority</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">defined</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">config</span><span class="w"> </span><span class="k">file</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">orderReady</span><span class="p">(</span><span class="k">order</span><span class="err">:</span><span class="w"> </span><span class="k">Order</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">Status</span><span class="o">]</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">config</span><span class="p">.</span><span class="n">sendFor</span><span class="p">(</span><span class="k">order</span><span class="p">.</span><span class="n">getPriority</span><span class="p">))</span><span class="w"> </span><span class="n">email</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="k">order</span><span class="p">.</span><span class="n">emailAddress</span><span class="p">,</span><span class="w"> </span><span class="ss">"Your order is ready!"</span><span class="p">)</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">Future</span><span class="p">.</span><span class="n">delay</span><span class="p">(</span><span class="n">Status</span><span class="p">.</span><span class="n">ok</span><span class="p">)</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">scalaz</span><span class="w"> </span><span class="n">Future</span>
<span class="err">}</span>
<span class="o">//</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">trait</span><span class="w"> </span><span class="n">used</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">simple</span><span class="w"> </span><span class="n">interface</span>
<span class="n">trait</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">send</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">given</span><span class="w"> </span><span class="n">address</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Ok</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="n">could</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">sent</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">send</span><span class="p">(</span><span class="nl">address</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="nl">body</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">Status</span><span class="o">]</span>
<span class="err">}</span>
<span class="o">//</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">specific</span><span class="w"> </span><span class="n">implementation</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">EmailService</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">JavaMailEmailService</span><span class="p">(</span><span class="nl">smtpHost</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="nl">smtpPort</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">)</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">send</span><span class="p">(</span><span class="nl">address</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="nl">body</span><span class="p">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">Future</span><span class="o">[</span><span class="n">Status</span><span class="o">]</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="vm">???</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">use</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">JavaMail</span><span class="w"> </span><span class="n">api</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">implement</span><span class="w"> </span><span class="n">this</span>
<span class="err">}</span>
</code></pre></div>
<h3>Create components</h3>
<p>Creating the <em>ZalandoNotifier</em> service from a configuration file is not very hard. Let’s pretend we have the following
classes to read from configuration files:</p>
<div class="highlight"><pre><span></span><code><span class="o">//</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">configuration</span><span class="w"> </span><span class="k">file</span>
<span class="n">trait</span><span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="k">get</span><span class="o">[</span><span class="n">A</span><span class="o">]</span><span class="p">(</span><span class="k">key</span><span class="err">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigError</span><span class="w"> </span><span class="n">Xor</span><span class="w"> </span><span class="n">A</span>
<span class="err">}</span>
<span class="o">//</span><span class="w"> </span><span class="n">Abstract</span><span class="w"> </span><span class="k">data</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">possible</span><span class="w"> </span><span class="n">errors</span>
<span class="o">//</span><span class="w"> </span><span class="n">happening</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">reading</span><span class="w"> </span><span class="n">configuration</span><span class="w"> </span><span class="n">files</span>
<span class="n">sealed</span><span class="w"> </span><span class="n">trait</span><span class="w"> </span><span class="n">ConfigError</span>
<span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">MissingKey</span><span class="p">(</span><span class="k">key</span><span class="err">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">extends</span><span class="w"> </span><span class="n">ConfigError</span>
<span class="k">object</span><span class="w"> </span><span class="n">ConfigError</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">render</span><span class="p">(</span><span class="nl">e</span><span class="p">:</span><span class="w"> </span><span class="n">ConfigError</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="n">MissingKey</span><span class="p">(</span><span class="k">key</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">s</span><span class="ss">"Key $key not found"</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>The <em>Xor</em> type is an <em>Either</em> type from the <em>cats</em> library. See
<a href="http://underscore.io/blog/posts/2015/06/10/an-introduction-to-cats.html">here</a> for an introduction to <em>cats</em>, <em>Xor</em> and
the <em>|@|</em> notation used below.</p>
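<p>A rough, self-contained stand-in (not the article's code) for what <em>ConfigError Xor A</em> and <em>|@|</em> buy us: here the standard <em>Either</em> plays the role of <em>Xor</em>, and a hypothetical <em>map2</em> helper plays the role of <em>|@|</em>, combining two possibly-failing values and short-circuiting on the first error:</p>

```scala
// illustrative sketch: Either in place of Xor, map2 in place of |@|
sealed trait ConfigError
case class MissingKey(key: String) extends ConfigError

case class JavaMailEmailService(smtpHost: String, smtpPort: Int)

// combine two possibly-failing values, short-circuiting on the first error
def map2[A, B, C](xa: Either[ConfigError, A], xb: Either[ConfigError, B])
                 (f: (A, B) => C): Either[ConfigError, C] =
  for { a <- xa; b <- xb } yield f(a, b)

val ok: Either[ConfigError, JavaMailEmailService] =
  map2(Right("smtp.example.com"), Right(25))(JavaMailEmailService.apply)

val bad: Either[ConfigError, JavaMailEmailService] =
  map2(Left(MissingKey("smtp-host")): Either[ConfigError, String],
       Right(25))(JavaMailEmailService.apply)
```

<p>The success path combines both values through the constructor; a missing key surfaces as the error unchanged.</p>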
<p>Creating the <em>ZalandoNotifier</em> service from the configuration file requires reading the <em>NotifierConfig</em> and
the <em>EmailService</em> from it, then creating a <em>ZalandoNotifier</em> from those two instances:</p>
<div class="highlight"><pre><span></span><code><span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">NotifierConfig</span><span class="p">(</span><span class="nl">priority</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">sendFor</span><span class="p">(</span><span class="nl">p</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="k">Boolean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">priority</span>
<span class="err">}</span>
<span class="o">//</span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">NotifierConfig</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="k">file</span>
<span class="k">object</span><span class="w"> </span><span class="n">NotifierConfig</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">fromConfig</span><span class="p">(</span><span class="nl">config</span><span class="p">:</span><span class="w"> </span><span class="n">Config</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigError</span><span class="w"> </span><span class="n">Xor</span><span class="w"> </span><span class="n">NotifierConfig</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">config</span><span class="p">.</span><span class="k">get</span><span class="o">[</span><span class="n">Int</span><span class="o">]</span><span class="p">(</span><span class="ss">"priority"</span><span class="p">).</span><span class="k">map</span><span class="p">(</span><span class="n">NotifierConfig</span><span class="p">.</span><span class="n">apply</span><span class="p">)</span>
<span class="err">}</span>
<span class="o">//</span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="k">file</span>
<span class="k">object</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">The</span><span class="w"> </span><span class="k">default</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="n">instance</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">Java</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="n">API</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">fromConfig</span><span class="p">(</span><span class="nl">config</span><span class="p">:</span><span class="w"> </span><span class="n">Config</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigError</span><span class="w"> </span><span class="n">Xor</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="p">(</span><span class="n">config</span><span class="p">.</span><span class="k">get</span><span class="o">[</span><span class="n">String</span><span class="o">]</span><span class="p">(</span><span class="ss">"smtp-host"</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="err">@</span><span class="o">|</span>
<span class="w"> </span><span class="n">config</span><span class="p">.</span><span class="k">get</span><span class="o">[</span><span class="n">Int</span><span class="o">]</span><span class="p">(</span><span class="ss">"smtp-port"</span><span class="p">)).</span><span class="k">map</span><span class="p">(</span><span class="n">JavaEmailService</span><span class="p">.</span><span class="n">apply</span><span class="p">)</span>
<span class="err">}</span>
<span class="o">//</span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">ZalandoNotifier</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">its</span><span class="w"> </span><span class="n">components</span>
<span class="k">object</span><span class="w"> </span><span class="n">ZalandoNotifier</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">fromConfig</span><span class="p">(</span><span class="nl">config</span><span class="p">:</span><span class="w"> </span><span class="n">Config</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigError</span><span class="w"> </span><span class="n">Xor</span><span class="w"> </span><span class="n">EmailService</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="p">(</span><span class="n">NotifierConfig</span><span class="p">.</span><span class="n">fromConfig</span><span class="p">(</span><span class="n">config</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="err">@</span><span class="o">|</span>
<span class="w"> </span><span class="n">EmailService</span><span class="p">.</span><span class="n">fromConfig</span><span class="p">(</span><span class="n">config</span><span class="p">)).</span><span class="k">map</span><span class="p">(</span><span class="n">ZalandoNotifier</span><span class="p">.</span><span class="n">apply</span><span class="p">)</span>
<span class="err">}</span>
</code></pre></div>
<p>Assembling the <em>ZalandoNotifier</em> from its components is very simple here: we just combine the two parts into one. We
could have a more complex <em>fromConfig</em> method using a <em>for</em> comprehension and injecting different <em>EmailService</em>
instances based on a configuration parameter.</p>
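<p>A self-contained sketch of that idea (not the article's code: <em>Config</em> is stubbed with a <em>Map</em>, <em>Either</em> stands in for <em>Xor</em>, and the <em>"email-kind"</em> key is hypothetical), choosing the <em>EmailService</em> implementation from configuration in a <em>for</em> comprehension:</p>

```scala
// illustrative sketch with stubbed types
sealed trait ConfigError
case class MissingKey(key: String) extends ConfigError

case class Config(values: Map[String, String]) {
  def get(key: String): Either[ConfigError, String] =
    values.get(key).toRight(MissingKey(key))
}

trait EmailService
case class JavaMailEmailService(smtpHost: String) extends EmailService
case class LoggingEmailService() extends EmailService
case class ZalandoNotifier(email: EmailService)

// pick the EmailService implementation based on a configuration parameter
def fromConfig(config: Config): Either[ConfigError, ZalandoNotifier] =
  for {
    kind <- config.get("email-kind")
    email <- kind match {
      case "javamail" => config.get("smtp-host").map(JavaMailEmailService.apply)
      case _          => Right(LoggingEmailService()): Either[ConfigError, EmailService]
    }
  } yield ZalandoNotifier(email)
```

<p>Any missing key fails the whole construction with a <em>MissingKey</em> error, exactly as before.</p>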
<p>We have now assembled components from values in a configuration file, taking care of possible configuration errors. What
is the next difficulty?</p>
<h3>Testing</h3>
<p>The next difficulty is testing. Suppose we have a bigger application using the <em>ZalandoNotifier</em> and we want to swap
out the <em>JavaMailEmailService</em> for a mock implementation. Using a naive approach to constructor injection, we
might want to do this:</p>
<div class="highlight"><pre><span></span><code>def createApplication(
orderService: OrderService,
notifierConfig: NotifierConfig,
emailService: EmailService): Application =
Application(orderService, ZalandoNotifier(notifierConfig, emailService))
def testApplication = {
// use a mocked email service
val app = createApplication(OrderService(), NotifierConfig(10), MockedEmailService())
// test the application now
}
</code></pre></div>
<p>This is really problematic: to build the exact component graph we want and substitute individual components, we seem to
need to expose all the components and their dependencies at the top level of the application. This is tedious and breaks
all encapsulation.</p>
<p>Fortunately, there exists a library precisely suited to replacing objects in a graph:
<a href="https://bitbucket.org/inkytonik/kiama">Kiama</a>.</p>
<h3>Kiama</h3>
<p><a href="https://bitbucket.org/inkytonik/kiama">Kiama</a> is a library for language processing, a toolbox for parsing computer
languages, analysing and interpreting them.</p>
<p>We are going to use one feature of Kiama: tree rewriting. As you probably know, computer languages are parsed into
<a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Trees</a> (ASTs). Most of the time these trees are
then rewritten into simpler trees, in order to remove some syntactic sugar or to optimise some constructs. For example, a
tree representing collection operations might rewrite two consecutive <em>map</em> operations into one for efficiency (a
<em>fusion</em> operation).</p>
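<p>That fusion rewrite can be shown on a toy AST (illustrative types, not Kiama's API): two nested <em>map</em> nodes collapse into one, without changing what the expression evaluates to:</p>

```scala
// a toy AST over lists of Ints
sealed trait Expr
case class Source(xs: List[Int]) extends Expr
case class MapOp(f: Int => Int, in: Expr) extends Expr

// xs.map(g).map(f) == xs.map(g andThen f): fuse nested maps bottom-up
def fuse(e: Expr): Expr = e match {
  case MapOp(f, in) =>
    fuse(in) match {
      case MapOp(g, inner) => MapOp(g.andThen(f), inner)
      case other           => MapOp(f, other)
    }
  case other => other
}

def eval(e: Expr): List[Int] = e match {
  case Source(xs)   => xs
  case MapOp(f, in) => eval(in).map(f)
}

def depth(e: Expr): Int = e match {
  case Source(_)    => 0
  case MapOp(_, in) => 1 + depth(in)
}

val tree  = MapOp((x: Int) => x + 1, MapOp((x: Int) => x * 2, Source(List(1, 2, 3))))
val fused = fuse(tree)
```

<p>The fused tree traverses the list once instead of twice, but evaluates to the same result.</p>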
<p>And what is an “application”, if not a tree of services and configuration objects? So, with the help of Kiama, we can
“rewrite” the application to replace some of its parts:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// return true if A implements the list of types defined by a given class tag */</span>
<span class="n">def</span><span class="w"> </span><span class="nf">implements</span><span class="p">(</span><span class="n">a</span><span class="o">:</span><span class="w"> </span><span class="n">Any</span><span class="p">)(</span><span class="n">implicit</span><span class="w"> </span><span class="n">ct</span><span class="o">:</span><span class="w"> </span><span class="n">ClassTag</span><span class="p">[</span><span class="n">_</span><span class="p">])</span><span class="o">:</span><span class="w"> </span><span class="kt">Boolean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">types</span><span class="o">:</span><span class="w"> </span><span class="n">List</span><span class="p">[</span><span class="kt">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]]</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">ct</span><span class="p">.</span><span class="n">runtimeClass</span><span class="w"> </span><span class="o">+:</span><span class="w"> </span><span class="n">ct</span><span class="p">.</span><span class="n">runtimeClass</span><span class="p">.</span><span class="n">getInterfaces</span><span class="p">.</span><span class="n">toList</span>
<span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">forall</span><span class="p">(</span><span class="n">t</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">isAssignableFrom</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">getClass</span><span class="p">))</span>
<span class="p">}</span>
<span class="c1">// a Kiama Strategy to replace any node having the same type as T</span>
<span class="c1">// with another instance</span>
<span class="n">def</span><span class="w"> </span><span class="n">replaceStrategy</span><span class="p">[</span><span class="n">T</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">ClassTag</span><span class="p">](</span><span class="n">t</span><span class="o">:</span><span class="w"> </span><span class="n">T</span><span class="p">)</span><span class="o">:</span><span class="w"> </span><span class="n">Strategy</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">strategy</span><span class="p">[</span><span class="n">Any</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">v</span><span class="w"> </span><span class="no">if</span><span class="w"> </span><span class="no">implements</span><span class="p">(</span><span class="no">v</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="no">Some</span><span class="p">(</span><span class="no">t</span><span class="p">)</span>
<span class="w"> </span><span class="no">case</span><span class="w"> </span><span class="no">other</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="no">None</span>
<span class="w"> </span><span class="err">}</span>
</code></pre></div>
<p>A Strategy in Kiama is more or less a partial function taking one of the nodes in the tree and returning
<em>Some(result)</em> if it succeeds and <em>None</em> if it doesn’t. You can then use combinators to define where and how you want to
apply this Strategy. For example:</p>
<div class="highlight"><pre><span></span><code><span class="o">//</span><span class="w"> </span><span class="k">use</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">strategy</span><span class="w"> </span><span class="n">everywhere</span><span class="w"> </span><span class="n">you</span><span class="w"> </span><span class="n">can</span><span class="p">,</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="k">top</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">down</span>
<span class="n">def</span><span class="w"> </span><span class="n">replaceWithStrategy</span><span class="o">[</span><span class="n">G</span><span class="o">]</span><span class="p">(</span><span class="nl">strategy</span><span class="p">:</span><span class="w"> </span><span class="n">Strategy</span><span class="p">,</span><span class="w"> </span><span class="nl">graph</span><span class="p">:</span><span class="w"> </span><span class="n">G</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">G</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">rewrite</span><span class="p">(</span><span class="n">everywheretd</span><span class="p">(</span><span class="n">strategy</span><span class="p">))(</span><span class="n">graph</span><span class="p">)</span>
<span class="n">replaceWithStrategy</span><span class="p">(</span><span class="n">replaceStrategy</span><span class="o">[</span><span class="n">EmailService</span><span class="o">]</span><span class="p">(</span><span class="n">mock</span><span class="p">),</span><span class="w"> </span><span class="n">application</span><span class="p">)</span>
</code></pre></div>
<p>Or, with a bit of syntactic sugar:</p>
<div class="highlight"><pre><span></span><code><span class="n">application</span><span class="p">.</span><span class="nf">replace</span><span class="o">[</span><span class="n">EmailService</span><span class="o">]</span><span class="p">(</span><span class="n">mock</span><span class="p">)</span>
</code></pre></div>
<p>It really isn’t that difficult to write tests!</p>
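<p>To see what is going on without pulling in Kiama, here is a hand-rolled sketch of the same replacement, specialised to one fixed tree shape (Kiama's <em>everywheretd</em> does this generically over any case-class graph; all names here are illustrative):</p>

```scala
import scala.reflect.ClassTag

trait EmailService
case class JavaMailEmailService(smtpHost: String) extends EmailService
case class MockedEmailService() extends EmailService
case class ZalandoNotifier(email: EmailService)
case class Application(notifier: ZalandoNotifier)

type Strategy = Any => Option[Any]

// succeed with the replacement on any node implementing T
def replaceStrategy[T](t: T)(implicit ct: ClassTag[T]): Strategy =
  (v: Any) => v match {
    case _: T => Some(t)
    case _    => None
  }

// try the strategy on every node of this shape, keeping nodes it rejects
def replaceWithStrategy(s: Strategy, app: Application): Application = {
  def step[A](a: A): A = s(a).fold(a)(_.asInstanceOf[A])
  val notifier = step(app.notifier)
  Application(ZalandoNotifier(step(notifier.email)))
}

val app     = Application(ZalandoNotifier(JavaMailEmailService("smtp.example.com")))
val patched = replaceWithStrategy(replaceStrategy[EmailService](MockedEmailService()), app)
```

<p>The strategy only matches nodes implementing <em>EmailService</em>, so the rest of the graph is rebuilt unchanged.</p>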
<p>You can also devise much more targeted replacements:</p>
<div class="highlight"><pre><span></span><code>case class Leg(foot: String)
case class Robot(left: Leg, right: Leg)
case class Application(robot: Robot, house: House)
// just replace one leg of the robot!
application.replace {
case Robot(left, right) => Robot(Leg("repaired"), right)
}
</code></pre></div>
<p>Now that we know how to deal with testing, what else do we want to do? Singletons!</p>
<h3>Singletons</h3>
<p>Applications are not just simple trees, but rather directed acyclic graphs where some nodes are shared. For
example, many Scala applications need to share an <em>ExecutionContext</em> (and/or an <em>ActorSystem</em> and a <em>Materializer</em> if
working with <a href="http://akka.io">Akka</a>). Duplicating these contexts would waste resources.</p>
<p>When we build the application from the configuration file, we use independent <em>fromConfig</em> methods where each component
knows how to read the file and create itself. What if 2 components need an <em>ExecutionContext</em>?</p>
<div class="highlight"><pre><span></span><code>//<span class="w"> </span>Delay<span class="w"> </span>the<span class="w"> </span>evaluation<span class="w"> </span>of<span class="w"> </span>the<span class="w"> </span>ExecutionContext
case<span class="w"> </span>class<span class="w"> </span>ExecutionService(ec:<span class="w"> </span>Eval[ExecutionContext])
object<span class="w"> </span>ExecutionService<span class="w"> </span>{
<span class="w"> </span>def<span class="w"> </span>create(config:<span class="w"> </span>Config):<span class="w"> </span>ExecutionService<span class="w"> </span>=<span class="w"> </span>{
<span class="w"> </span>lazy<span class="w"> </span>val<span class="w"> </span>system:<span class="w"> </span>ActorSystem<span class="w"> </span>=<span class="w"> </span>ActorSystem("xxx",<span class="w"> </span>config)
<span class="w"> </span>lazy<span class="w"> </span>val<span class="w"> </span>executionContext:<span class="w"> </span>ExecutionContext<span class="w"> </span>=<span class="w"> </span>system.dispatcher
<span class="w"> </span>ExecutionService(Eval.later(executionContext))
<span class="w"> </span>}
}
//<span class="w"> </span>First<span class="w"> </span>component
case<span class="w"> </span>class<span class="w"> </span>Service1(config:<span class="w"> </span>C1,<span class="w"> </span>es:<span class="w"> </span>ExecutionService)
object<span class="w"> </span>Service1<span class="w"> </span>{
<span class="w"> </span>def<span class="w"> </span>fromConfig(config:<span class="w"> </span>Config):<span class="w"> </span>ConfigError<span class="w"> </span>Xor<span class="w"> </span>Service1<span class="w"> </span>=
<span class="w"> </span>C1.fromConfig(config).map(c1<span class="w"> </span>=><span class="w"> </span>Service1(c1,<span class="w"> </span>ExecutionService.create)
}
//<span class="w"> </span>Second<span class="w"> </span>component
case<span class="w"> </span>class<span class="w"> </span>Service2(config:<span class="w"> </span>C2,<span class="w"> </span>es:<span class="w"> </span>ExecutionService)
object<span class="w"> </span>Service2<span class="w"> </span>{
<span class="w"> </span>def<span class="w"> </span>fromConfig(config:<span class="w"> </span>Config):<span class="w"> </span>ConfigError<span class="w"> </span>Xor<span class="w"> </span>Service2<span class="w"> </span>=
<span class="w"> </span>C2.fromConfig(config).map(c2<span class="w"> </span>=><span class="w"> </span>Service2(c2,<span class="w"> </span>ExecutionService.create)
}
//<span class="w"> </span>The<span class="w"> </span>main<span class="w"> </span>Application
case<span class="w"> </span>class<span class="w"> </span>Application(s1:<span class="w"> </span>Service1,<span class="w"> </span>s2:<span class="w"> </span>Service2)
object<span class="w"> </span>Application<span class="w"> </span>{
<span class="w"> </span>def<span class="w"> </span>fromConfig(config:<span class="w"> </span>Config):<span class="w"> </span>ConfigError<span class="w"> </span>Xor<span class="w"> </span>Application<span class="w"> </span>=
<span class="w"> </span>(Service1.fromConfig(config)<span class="w"> </span>|@|
<span class="w"> </span>Service2.fromConfig(config)).map(Application.apply)
}
</code></pre></div>
<p>Here we are duplicating the <em>ExecutionService</em> component in the <em>Application</em> instance, and this can’t be good.</p>
<p>It could be even worse: so far no resources have actually been consumed, because the <em>ExecutionContext</em> is encapsulated
in an <em>ExecutionService</em> using the <em>cats.Eval</em> type to delay its evaluation, but as soon as both copies are forced we
would create two <em>ActorSystem</em>s. Is there a way to
use only one instance of the <em>ExecutionService</em>? Yes, indeed, with one more Strategy:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// store the first instance of the target type and replace all nodes of the same type with that one</span>
<span class="n">def</span><span class="w"> </span><span class="n">singletonStrategy</span><span class="p">[</span><span class="n">T</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">ClassTag</span><span class="p">]</span><span class="o">:</span><span class="w"> </span><span class="n">Strategy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">t</span><span class="o">:</span><span class="w"> </span><span class="n">Option</span><span class="p">[</span><span class="n">T</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">None</span>
<span class="w"> </span><span class="n">strategy</span><span class="p">[</span><span class="n">Any</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">v</span><span class="w"> </span><span class="no">if</span><span class="w"> </span><span class="no">implements</span><span class="p">(</span><span class="no">v</span><span class="p">)(</span><span class="no">implicitly</span><span class="p">[</span><span class="no">ClassTag</span><span class="p">[</span><span class="no">T</span><span class="p">]])</span><span class="w"> </span><span class="o">=></span>
<span class="w"> </span><span class="no">t</span><span class="w"> </span><span class="no">match</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="no">case</span><span class="w"> </span><span class="no">Some</span><span class="p">(</span><span class="no">singleton</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="no">Some</span><span class="p">(</span><span class="no">singleton</span><span class="p">)</span>
<span class="w"> </span><span class="no">case</span><span class="w"> </span><span class="no">None</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="no">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">Some</span><span class="p">(</span><span class="no">v</span><span class="p">.</span><span class="no">asInstanceOf</span><span class="p">[</span><span class="no">T</span><span class="p">])</span>
<span class="w"> </span><span class="no">Some</span><span class="p">(</span><span class="no">v</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="no">case</span><span class="w"> </span><span class="no">other</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="no">None</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</code></pre></div>
<p>And with a bit of syntactic sugar, creating the full application becomes:</p>
<div class="highlight"><pre><span></span><code><span class="k">object</span><span class="w"> </span><span class="n">Application</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">fromConfig</span><span class="p">(</span><span class="nl">config</span><span class="p">:</span><span class="w"> </span><span class="n">Config</span><span class="p">)</span><span class="err">:</span><span class="w"> </span><span class="n">ConfigError</span><span class="w"> </span><span class="n">Xor</span><span class="w"> </span><span class="n">Application</span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="p">(</span><span class="n">Service1</span><span class="p">.</span><span class="n">fromConfig</span><span class="p">(</span><span class="n">config</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="err">@</span><span class="o">|</span>
<span class="w"> </span><span class="n">Service2</span><span class="p">.</span><span class="n">fromConfig</span><span class="p">(</span><span class="n">config</span><span class="p">)).</span><span class="k">map</span><span class="p">(</span><span class="n">Application</span><span class="p">.</span><span class="n">apply</span><span class="p">).</span><span class="k">map</span><span class="p">(</span><span class="n">_</span><span class="p">.</span><span class="n">singleton</span><span class="o">[</span><span class="n">ExecutionService</span><span class="o">]</span><span class="p">)</span>
<span class="err">}</span>
</code></pre></div>
<h3>Conclusion</h3>
<p>The technique presented here is really minimal: No annotations, <a href="https://github.com/google/guice/wiki/Bindings">no
modules/bindings</a>, no type system trickery. In addition, we can use other
tools in the Kiama toolbox, like <a href="http://wiki.kiama.googlecode.com/hg-history/02d6b58d5156633aea5e0dac9b5dec4dd0461d4a/papers/SCP11.pdf">attribute
grammars</a>, to
implement a topological sort in a few lines and start services in order.</p>
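Such a topological sort is indeed only a few lines. Below is a minimal Python sketch of ordering services so that each one starts after its dependencies; the service names and the dependency map are hypothetical, and cycle detection is omitted for brevity:

```python
def topo_sort(deps):
    """Return service names ordered so each service starts after its dependencies.

    deps maps a service name to the set of services it depends on.
    No cycle detection: a cyclic graph would recurse forever in this sketch.
    """
    ordered, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for dep in deps.get(node, set()):
            visit(dep)          # start dependencies first
        ordered.append(node)

    for node in deps:
        visit(node)
    return ordered

# Hypothetical service graph: Application depends on Service1 and Service2,
# and Service2 depends on a shared ExecutionService.
deps = {
    "Application": {"Service1", "Service2"},
    "Service2": {"ExecutionService"},
}
order = topo_sort(deps)
```

Starting the services in `order` guarantees no service is started before the services it depends on.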
<p>The main drawback is the necessity to instantiate a full graph of objects before being able to modify it (the
<em>fromConfig</em> methods). On the other hand, specifying how every component can be instantiated from the configuration file
(and testing that!) needs to be done anyway.</p>
<p>I wouldn’t be surprised if large applications had new, unforeseen requirements that I’m not addressing right now, but
simple applications can definitely use this technique and larger applications could explore more Kiama strategies.</p>
<p>I also hope that this post gave you an incentive to explore Kiama (in version 2.0.0 as of now), as it’s an awesome
library full of possibilities.</p>Our multi AWS account GitHub Enterprise backup2016-04-05T00:00:00+02:002016-04-05T00:00:00+02:00Lothar Schulztag:engineering.zalando.com,2016-04-05:/posts/2016/04/multi-aws-github-enterprise-backup.html<p>A look at our GitHub Enterprise backup with a High Availability configuration based on AWS.</p><p>Last year Zalando Tech offered its engineers the option to host their company source code on <a href="https://enterprise.github.com">GitHub
Enterprise</a>. This was based on the feedback we received from <a href="https://tech.zalando.com/blog/zalando-techs-github-enterprise-and-stash-workshop/">Zalando Tech’s GitHub
Enterprise and Stash Workshop</a>.</p>
<p>At Zalando Tech, we’ve set up GitHub Enterprise with a <a href="https://help.github.com/enterprise/2.4/admin/guides/installation/high-availability-configuration/">High Availability
configuration</a> based
on <a href="https://aws.amazon.com">AWS</a>, including
<a href="https://help.github.com/enterprise/2.4/admin/guides/installation/backups-and-disaster-recovery/">backup</a>. Whenever the
main GitHub Enterprise instance is down, you have two options with which to proceed:</p>
<ol>
<li>Promote the replication instance to a new main instance.</li>
<li>Restore from the backup instance.</li>
</ol>
<p>We were faced with two challenges to include an HA setup in Zalando Tech:</p>
<ol>
<li>Using the <a href="http://stups.readthedocs.org/en/latest/components/taupage.html">Taupage AMI</a> with a Docker container on
top is mandatory for company compliance on AWS. Taupage is an Amazon Machine Image created by Zalando STUPS. Our
first backup approach on AWS was just an Ubuntu box.</li>
<li>Source code might be tampered with if the AWS account where the GitHub Enterprise instances run gets compromised. Often,
source code is a company’s intellectual property and you don’t want it to be exploited. On top of that, tampering
may result in malicious code being added without anyone noticing.</li>
</ol>
<p>AWS delivers scalable cloud-based IT infrastructure services and resources, and Docker images are designed as a
composition of layers. Because only changed layers need to be transferred, a minimal amount of data is sent over the
network when deploying ghe-backup to AWS.</p>
<p>Hence, both technologies make perfect sense for a GitHub Enterprise backup approach.</p>
<p>Based on this idea, we created the open source <a href="https://github.com/zalando/ghe-backup">ghe-backup project</a>. This will
help people out there facing the same challenges we had. Deploying updated backup instances into different AWS accounts
might be one such use case. However, using Taupage will not be a requirement for people outside of Zalando.</p>
<p>The basic idea of ghe-backup is to have two backup instances based on Taupage running in two AWS accounts:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4bf04b8d77cb5fee76f8a06551b7a34665c72e75_slide1.png?auto=compress,format"></p>
<p>Both backup instances fetch the latest changes every hour and persist the data on EBS volumes within the AWS accounts.</p>
<p>In order to connect a GitHub Enterprise backup host with the master, the backup host requires a private SSH key that
matches the public SSH key present on the GitHub Enterprise primary instance.</p>
<p>For security reasons, the private SSH key can’t be shipped with the actual deployment; therefore
<a href="https://aws.amazon.com/de/kms/">KMS</a> is used to encrypt the private key. The KMS key is defined within ghe-backup’s
senza yaml file:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nl">kms_private_ssh_key:</span><span class="w"> </span><span class="s">"aws:kms:myAWSregion:123456789:key/myrandomstringwithnumbers123456567890"</span>
<span class="w"> </span><span class="p">...</span>
</code></pre></div>
<p>See <a href="https://gist.github.com/lotharschulz/c10f184d210cf984583c">here</a> for a full sample senza yaml file.</p>
<p>Once a deployment is triggered, the encrypted private ssh key value is fetched from KMS,
<a href="https://github.com/zalando/ghe-backup/blob/master/python/decryptkms.py">decrypted</a>, and written to a file. Most of this
is defined in the appropriate <a href="https://github.com/zalando/ghe-backup/blob/master/Dockerfile">Dockerfile</a>. Afterwards, a
<a href="https://github.com/zalando/ghe-backup/blob/master/cron-ghe-backup">cron job</a> triggers the actual hourly backups within
<a href="https://github.com/zalando/ghe-backup/blob/master/cron-ghe-backup">extended business hours</a>.</p>
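The decrypt-and-write step can be sketched as follows. This is a hedged illustration, not the project's actual decryptkms.py: the "aws:kms:" prefix handling and the base64 wrapping of the ciphertext are assumptions, and the boto3 call needs AWS credentials at runtime:

```python
import base64

KMS_PREFIX = "aws:kms:"

def strip_kms_prefix(value: str) -> str:
    """Strip the marker prefix from an encrypted value (format assumed)."""
    if not value.startswith(KMS_PREFIX):
        raise ValueError("not a KMS-encrypted value")
    return value[len(KMS_PREFIX):]

def decrypt_to_file(value: str, path: str) -> None:
    """Decrypt the KMS-encrypted private SSH key and write it to a file."""
    import boto3  # requires AWS credentials at runtime
    kms = boto3.client("kms")
    plaintext = kms.decrypt(
        CiphertextBlob=base64.b64decode(strip_kms_prefix(value))
    )["Plaintext"]
    with open(path, "wb") as f:
        f.write(plaintext)
```

In the real setup the resulting file is the key the backup host uses to authenticate against the GitHub Enterprise primary.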
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3f9ce739498ce15c47c9c34c98bf759814e5edf1_blog-post-proposal--multi-aws-account-github-enterprise-backup---encrypted-private-ssh-key-deployment-flow.png?auto=compress,format"></p>
<p>On deployment, the KMS key defined in the senza yaml is used to fetch the encrypted private SSH key, which is
decrypted and put into the Docker container running inside Taupage.</p>
<p>With the above method, we have GitHub Enterprise backups in separate AWS accounts, fetched in a company-compliant way using Taupage.</p>Joel Spolsky holds the fort at Zalando Tech2016-04-01T00:00:00+02:002016-04-01T00:00:00+02:00Natali Vlatkotag:engineering.zalando.com,2016-04-01:/posts/2016/04/joel-spolsky-at-zalando-tech.html<p>The CEO of Stack Overflow and creator of Trello gets cosy with Zalando Tech in Berlin.</p><p>When the opportunity to sit down and pick the brain of <a href="http://stackoverflow.com/">Stack Overflow</a> co-founder Joel
Spolsky came up, we here at Zalando Tech jumped at the chance. Hosting him in our very own Innovation Lab amongst a
developer-heavy audience, we had our VP of Engineering Eric Bowman ask him all manner of questions about programming,
the beginnings of Stack Overflow, and how his web-based project management app <a href="https://trello.com/">Trello</a> came to be.</p>
<p>We live tweeted the session via the hashtag <a href="https://twitter.com/hashtag/ZalandoTechxSO?src=hash">#ZalandoTechxSO</a> and
had an absolute blast. Spolsky is a confident and entertaining speaker who gave us an insight into the early ideas of
Stack Overflow and how it developed into the legendary tome of knowledge it is today. He also touched on issues such as
documentation, communication, and the all-holy element of flow in a developer's work process.</p>
<p>See how the session played out in our recording below. A big thanks to all parties involved for making this such a
successful event!</p>Apache Showdown: Flink vs. Spark2016-03-31T00:00:00+02:002016-03-31T00:00:00+02:00Javier Lopeztag:engineering.zalando.com,2016-03-31:/posts/2016/03/apache-showdown-flink-vs.-spark.html<p>Why we chose Apache Flink for the Saiki Data Integration Platform.</p><p><a href="https://tech.zalando.com/blog/data-integration-in-a-world-of-microservices/">Saiki</a> is Zalando’s next generation data
integration and distribution platform in a world of
<a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">microservices</a>. Saiki ingests data
generated by operational systems and makes it available to analytical systems. In this context, the opportunity to do
near real time business intelligence has presented itself and introduced the task of finding the right stream processing
framework. In this post, we will describe the evaluation and decision process, and show why Apache Flink best fulfilled
our requirements, as opposed to Spark.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7fab50904edff18f00a54f3f27866ce8d8a62040_saiki-blog-image-1.png?auto=compress,format"></p>
<p>Zalando’s operational systems continuously produce events and publish those to Saiki’s unified log (Apache Kafka). From
there, Saiki stores them on a data lake (Amazon S3 based) or pushes them directly to consuming analytical systems. The
data lake is a centralised, secure, cost efficient data storage solution and is accessed for retrieval by our data
warehouse (Oracle) and other analytical systems. This architecture enables us to do near real time business intelligence
for, but not limited to, the following use cases:</p>
<p><strong>Business process monitoring.</strong> A business process, in its simplest form, is a chain of business events. These
represent actions performed within the whole Zalando platform. For example, when a customer places an order, the
business event “ORDER CREATED” is generated by the responsible microservice. When said order is successfully processed
and the pertaining shipments sent, the event “PARCEL_SHIPPED” is generated. Our team needs to monitor these business
processes in order to quickly detect and act on anomalies. Continuing the example: one anomaly could be that the
aforementioned events occurred within an unexpectedly high time interval, exceeding a previously specified threshold.</p>
<p><strong>Continuous ETL.</strong> As our Oracle data warehouse struggles with increasingly high loads, we need to relinquish some of
its resources by doing a part of the ETL in a different system, to secure our future growth and our ability to scale
dynamically. The main cost factor is the joining of data belonging to different sources, e.g. order, shipping and
payment information. As this information is written to our unified log via event streams, we want to join these into an
integrated stream. Another aspect to consider is that we want to provide this data integration not only for the data
warehouse, but also for other analytical downstream systems.</p>
<p>For the evaluation process, we quickly came up with a list of potential candidates: Apache Spark, Storm, Flink and
Samza. All of them are open source top-level Apache projects. Storm and Samza struck us as too inflexible because of
their lack of batch processing support. Therefore, we shortened the list to two candidates: Apache Spark and Apache
Flink. For our evaluation we picked the stable versions of the frameworks available at that time: Spark 1.5.2 and Flink
0.10.1.</p>
<h2>Requirements</h2>
<p>We formulated and prioritised our functional and non-functional requirements as follows:</p>
<p>First and foremost, we were looking for a highly performant framework with the ability to process events at a
consistently high rate with relatively low latency. As more and more of our operational systems are migrating to the
cloud and sending data to Saiki, we aimed for a scalable solution. The framework should be able to handle back pressure
(i.e. spikes in throughput) gracefully and without user interaction. For both of our use cases, we expected the need to use
stateful computations extensively. Therefore storing, accessing and modifying state information efficiently was crucial.</p>
<p>We were also looking for a reliable system that would be capable of running jobs for months and remain resilient in the
event of failures. A high availability mode where shadow masters can resume the master node’s work upon failure was
needed. For the stateful computations, we require a checkpointing mechanism. Thus, the re-computation of a whole Kafka
topic would not be necessary and a job can resume its work from where it left off before a failure.</p>
<p>Further important aspects for us were the expressivity of the programming model and the handling of out-of-order events.
For the former, a rich and properly implemented operator library was of relevance. The programming model should enable
simple but precise reasoning on the event stream and on event times, e.g. the time when an event occurred in the real world.
The latter aspect assumes imperfect streams of events, with events arriving to the system not in the order they
occurred. It implies that the system itself can take care of out-of-order events, thus relieving the user from
additional work.</p>
<p>Other notable functional requirements were the “exactly once” event processing guarantee, Apache Kafka and Amazon S3
connectors, and a simple user interface for monitoring the progress of running jobs and overall system load.</p>
<p>The non-functional requirements included good open source community support, proper documentation, and a mature
framework.</p>
<h2>Spark vs. Flink – Experiences and Feature Comparison</h2>
<p>In order to assess if and how Spark or Flink would fulfill our requirements, we proceeded as follows. Based on our two
initial use cases we built proofs of concept (POC) for both frameworks, implementing aggregations and monitoring on a
single input stream of events. Due to its similarity in requirements regarding state handling and due to time
limitations, we did not implement POCs for the join operation.</p>
<p>For the aggregation use case, events containing information on items belonging to orders are generated and published to
a single Kafka topic. These are read by the stream processing framework. The result contains the summed item prices and
the average item price for each order. The result is written back to a different Kafka topic. We use the state to store
and update the current sum and average item price for each order.</p>
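The keyed state this use case relies on can be illustrated outside any framework. A minimal Python sketch of the per-order running sum and average (the event shape and field names are hypothetical, not Saiki's actual schema):

```python
from collections import defaultdict

# Keyed state: order_id -> (summed item prices, item count)
state = defaultdict(lambda: (0.0, 0))

def on_item_event(order_id, item_price):
    """Update the running sum/average for one order and emit the new result."""
    total, count = state[order_id]
    total, count = total + item_price, count + 1
    state[order_id] = (total, count)
    # In the real pipeline this result is written back to a Kafka topic.
    return {"order_id": order_id, "sum": total, "avg": total / count}

results = [on_item_event("order-1", 10.0),
           on_item_event("order-1", 30.0),
           on_item_event("order-2", 5.0)]
```

Each incoming item event touches only the state entry for its own order, which is exactly the access pattern the frameworks' state backends are compared on below.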
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bd8bd1951f0099ab43e087f8fad6166f0a2dee45_saiki-blog-image-2.png?auto=compress,format"></p>
<p>For the monitoring use case, the generated input stream contains pairs of correlated events. The first event in the pair
represents the previously mentioned “ORDER CREATED” business event and the second “PARCEL_SHIPPED” event. The time
difference between the timestamps of the first and last event is set to a random number of days. The events are
differentiated according to the time difference and a threshold. The event stream is split into two streams: error and
normal. The error stream contains all events for which the specified threshold has been exceeded, while the normal
stream contains the rest. These streams are then written back to Kafka.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/891cdbdf83743298d5b672390cea62cd9c87afaa_saiki-blog-image-3.png?auto=compress,format"></p>
<p>The aggregation use case has been implemented successfully for both frameworks. Implementing the monitoring use case has
been more intuitive in Flink, mainly because of the existence of the split operator, for which there was no equivalent
in Spark.</p>
<p>Regarding the performance of both frameworks, Flink outperformed Spark for our stream processing use cases. Flink
offered a consistently lower latency than Spark at high throughputs. Increasing the throughput has a very limited effect
on the latency. For Spark, there is always a trade off between throughput and latency. The user must manually tune
Spark’s configuration depending on the desired outcome. This of course also incurs redeployment costs. Our experiences
were consistent with the <a href="https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at">Yahoo!
benchmark</a>.</p>
<p>Spark 1.5.2 and Flink 0.10.1 state implementations were dissimilar and delivered expectedly different results. Spark
implements the state as a distributed dataset, while Flink employs a distributed in memory key/value store. With
increasing state, Spark's performance constantly degrades, as it scans its entire state for each processed microbatch.
It remains reliable and does not crash. Flink only has to look up and update the stored value for a specific key. Its
performance is consistently high, but it may throw OutOfMemoryErrors and fail the computation, because it could not
spill the state to disk. This issue has been discussed and addressed by the software company <a href="http://data-artisans.com/">data
Artisans</a>. The current 1.0.0 version offers the possibility to use an out-of-core state based
on RocksDB.</p>
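A toy comparison makes the performance difference concrete: the Spark-1.5-style update rebuilds the whole state per microbatch, scanning every key, while the Flink-style update touches a single key. This is an illustrative sketch of the two access patterns, not either framework's actual implementation:

```python
def update_by_scan(state_pairs, key, value):
    """Spark-1.5-style state: rebuild the distributed dataset, scanning every key."""
    return [(k, value if k == key else v) for k, v in state_pairs]  # O(total keys)

def update_by_lookup(state_map, key, value):
    """Flink-style keyed state: look up and update only the affected key."""
    state_map[key] = value  # O(1)
    return state_map

scan_state = [("a", 1), ("b", 2)]
kv_state = {"a": 1, "b": 2}
scan_state = update_by_scan(scan_state, "b", 9)
kv_state = update_by_lookup(kv_state, "b", 9)
```

With millions of orders in state, the scan cost is paid on every microbatch, which is why Spark's performance degraded as state grew.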
<p>Based on our experiences, we summarised and assessed the features most relevant to our requirements of Spark and Flink
in the following table:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8faf9353a1213b224d0285135c2b6ab6bce97ffe_saiki-table.png?auto=compress,format"></p>
<p>Notes:</p>
<ol>
<li>After our evaluation, Spark 1.6 introduced a key/value store for the state.</li>
<li>The current Flink 1.0.0 version offers the possibility to use an out-of-core state based on RocksDB.</li>
</ol>
<h2>Flink for Saiki</h2>
<p>Why did we end up choosing Apache Flink to be Saiki’s stream processing framework? Here are our reasons:</p>
<ul>
<li>Flink processes event streams at high throughputs with consistently low latencies. It provides an efficient, easy to
use, key/value based state.</li>
<li>Flink is a true stream processing framework. It processes events one at a time and each event has its own time
window. Complex semantics can be easily implemented using Flink’s rich programming model. Reasoning on the event
stream is easier than in the case of micro-batching. Stream imperfections like out-of-order events can be easily
handled using the framework’s event time processing support.</li>
<li>Flink’s support is noticeably better than Spark’s. We are in direct contact with its developers and they are eager to
improve their product and address user issues like ours. Flink originates from Berlin’s academia, and a steady flow
of graduates with Flink skills from Berlin’s universities is almost guaranteed.</li>
</ul>
<p>Our team is currently working on implementing a solution for near real time business process monitoring with Flink. We
are continuously learning from the Flink community and we’re looking forward to being an active part of it. The release
of <a href="http://flink.apache.org/news/2016/03/08/release-1.0.0.html">Flink 1.0.0</a> has only strengthened our efforts in
pursuing this path.</p>Your favourite franchises are having an open source love affair with tech2016-03-29T00:00:00+02:002016-03-29T00:00:00+02:00Natali Vlatkotag:engineering.zalando.com,2016-03-29:/posts/2016/03/your-favourite-franchises-are-having-an-open-source-love-affair-with-tech.html<p>The coupling of pop culture and tech via accessible data is on our radar.</p><p>The tech world has been slowly permeating popular culture for some time now, with TV shows like Mr Robot and Silicon
Valley making their mark, all the way to Apple's Worldwide Developers Conference being treated like a rock concert
instead of the collection of release announcements it is. Apple is also further extending its reach and <a href="http://www.techtimes.com/articles/133308/20160213/apple-is-filming-dark-drama-original-tv-series-called-vital-signs-starring-dr-dre-report.htm">creating an
original TV
series</a>
called ‘Vital Signs’ starring Dr. Dre. Cool, right? With everyone's fingers in every pie, it's only a matter of time
before your favourite books, movies, and TV shows become inundated by tech's iterations.</p>
<p>Two franchises that have seen a lot of love in recent times are the revived Star Wars trilogies and the violent,
incestuous, fantasy-series-turned-TV-leviathan Game of Thrones. And thanks to their growing popularity, they now have
their own APIs.</p>
<p>Released <a href="https://www.reddit.com/r/asoiaf/comments/45lt0o/spoilers_everything_introducing_an_api_of_ice_and/">last
month</a>, the <a href="https://anapioficeandfire.com/">Game of
Thrones API</a> is an open source collection of quantified and structured data granting
access to most books, characters, and family houses of the series. The term “most” is the give-away here: The project is
open source, meaning it also needs further contributions for the data to be complete. To take part, you'll need to fork
the repository, implement your new functionality (or bug fix), write tests, and submit a pull request for the master
branch to be updated. Because the API is open, no authentication is required to query it for data; however, it only
supports GET requests. The API automatically paginates responses, too.</p>
<p>Joakim Skoog, the API's creator, has called for fellow Game of Thrones aficionados to commit language-specific wrappers
and libraries to the project. A <a href="https://github.com/alexwebb2/node-api-iceandfire">Node.js library</a> has already been
created, which supplements the original <a href="https://msdn.microsoft.com/en-us/library/dn448365(v=vs.118).aspx">ASP.NET Web API
2</a> and <a href="https://msdn.microsoft.com/en-us/data/ef.aspx">Entity
Framework</a> the API is built on. There are also additional libraries for
<a href="https://github.com/murphb52/IceAndFireKit">Swift</a>, <a href="https://github.com/afram/iceandfire-graphql">GraphQL</a>, and
<a href="https://github.com/joakimskoog/anapioficeandfire-python">Python</a>.</p>
<p>Skoog was inspired by the <a href="http://swapi.co/">Star Wars API</a>, affectionately known as “SWAPI”. Here, all seven Star Wars
films have been categorically quantified and made programmatically-accessible through an HTTP API. Encodings are
provided in JSON and libraries exist for Python, JavaScript, Java, Go, Ruby, Angular and Objective-C. All resources
support JSON Schema and rate limiting has been implemented to ensure the service can handle a potentially large amount
of traffic.</p>
<p>The API delivers data via a “RESTish” implementation using Django and the <a href="https://django-rest-framework.org/">Django REST
Framework</a>. Paul Hallett, who developed the API around Christmas 2014, had
previously assembled the <a href="http://pokeapi.co/">Pokémon API</a>, and says that “if you provide data easily, someone will
consume it”. APIs like this provide developers with the resources they need to enjoy their favourite pastimes in a way
crafted specifically for them, taking their experience with these popular shows to an almost personal level.</p>
<p>This coupling of pop culture and tech via accessible data opens up a new way to consume our favourite books and movies,
on top of allowing the uninitiated to learn more about series canon. Anyone can get involved in these projects, making
access the key theme that connects pop culture and evolving tech. Here at Zalando, we love a bit of franchise action, so
don't be surprised if you find our techies contributing sometime soon!</p>How far will Apps take the shopping experience?2016-03-23T00:00:00+01:002016-03-23T00:00:00+01:00Nuzhat Naweedtag:engineering.zalando.com,2016-03-23:/posts/2016/03/how-far-will-apps-take-the-shopping-experience.html<p>Apps are pushing the envelope when it comes to a richer shopping experience.</p><p>Apps have earned their place amongst the smartphone-savvy, but Monday’s Apple announcement signaled a newfound
importance in the way consumers will interact with apps in the future. The recently unveiled Apple TV upgrade will allow
customers to control their apps in new ways, which begs the question of where our Fashion Store app fits into the
equation.</p>
<p>Tim Cook <a href="http://fortune.com/2016/03/21/apple-tv-software-update-event/">announced</a> that over 5,000 apps are now
available via the App Store on Apple TV, and last year we were greeted with the news that shopping using e-commerce apps
was now a possibility on the Apple TV. While the major product news in Apple's Keynote centered on smaller products, the
allure of the big screen isn't lost on us. We believe that shopping through your TV may soon become the norm, but the
gravity of mobile's effect on the market shouldn't be sidestepped just yet. We want to be with our customers on every
step of their shopping journey, and right now their feet are firmly planted in the mobile world.</p>
<p><a href="https://tech.zalando.com/blog/oh-appy-day/">Earlier this year</a> we relaunched our Fashion App with dedicated mobile-only
content, as a push towards our mobile-first approach. We want customers to engage with an app that feels personal and
inspirational, which is why we've created quality content specifically for mobile users. Videos, editorials, and
scrollable look-books submitted by style influencers have been specially curated, with the Zalando Mobile Team behind
the app ensuring that all of our 15 markets are recognised and captivated.</p>
<p>Apple added a dedicated <a href="http://techcrunch.com/2015/11/06/apple-debuts-a-new-shopping-category-on-the-app-store/">Shopping
category</a> to the App store in
November 2015, which includes apps for mobile banking and coupons. Apps that support Apple Pay are also included. This
is a significant change, moving shopping apps out of the Lifestyle section and placing m-commerce in the spotlight.
What's important to note here is that this new category doesn't merely contain apps where you can shop, but also
features apps where customers can follow the latest trends, find influential content and product reviews, and be
inspired to discover new brands. This singular category's concept is exactly what we aim to deliver with our Zalando
Fashion Store app, where the customer's needs and desires are our top priority.</p>
<p>More than 60% of Zalando customers visit the Fashion Store from mobile devices, with that number increasing every
quarter. Our customers are growing more accustomed to the dynamic experience of mobile, and we're gearing up to make
those experiences more magnetic and engaging.</p>
<p>As a mobile-first company, we want to drive a mobile mindshift more than ever. We'll be diving deep into the app build
in the coming months to show you just how committed we are to this ethos, and ensuring our place in Europe's m-commerce
market is stronger than ever.</p>Selenium WebDriver Explained2016-03-16T00:00:00+01:002016-03-16T00:00:00+01:00Sebastian Montetag:engineering.zalando.com,2016-03-16:/posts/2016/03/selenium-webdriver-explained.html<p>Providing the best UI automation support for development teams.</p><p>The testing team at Zalando is using and extending the Selenium library in order to provide the best UI automation
support for development teams. Selenium is a big library and there are many technologies bundled within it: browser
drivers, WebDriver Protocol, Selenium clients and so on. If you feel lost in the Selenium jungle, you should keep on
reading. After this post you’ll feel a bit more confident about using Selenium, with a basic understanding of how it
works. This post will demonstrate what Selenium does under the hood when we request a URL and get the title
of an HTML page.</p>
<h2>Our Simple Test</h2>
<p>Let’s start with a simple test to check the page title of <a href="https://www.zalando.de">https://www.zalando.de</a>:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/acefbd7a02daec98d94a828012a6af2c3d8964a7_screen-shot-2016-03-16-at-10.22.16.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d04f97dc5b86010dafacb226c7450dcc4668dbe9_screen-shot-2016-03-16-at-10.23.05.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5a9956e92473abb36afc7f11498a562f3ab300a7_screen-shot-2016-03-16-at-10.23.59.png?auto=compress,format"></p>
<p>But when we try to run the test, it doesn’t work:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/100515881ace29861c72effeb6764c5648c81d46_screen-shot-2016-03-16-at-10.25.44.png?auto=compress,format"></p>
<p>Selenium wants to know the path to a ChromeDriver. But what is a ChromeDriver, exactly?</p>
<p>The release of Selenium 2 included the introduction of WebDriver, a tool that’s responsible for controlling the browser
running the automated tests. The WebDriver for Chrome can be downloaded
<a href="https://sites.google.com/a/chromium.org/chromedriver/downloads">here</a>. When you execute the driver, this message
appears in the terminal:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4f618fb614134a06b1598d74baf458674ecb1580_screen-shot-2016-03-16-at-10.27.26.png?auto=compress,format"></p>
<p>The ChromeDriver reserves a port on the machine it’s running on; in this case port 9515. All WebDrivers work in a
similar fashion: they start up a server that receives commands for controlling the browser (for example, setting a
browser cookie). Commands must follow the <a href="https://w3c.github.io/webdriver/webdriver-spec.html#protocol">WebDriver
Protocol</a> format, which defines a RESTish JSON API that
all WebDrivers must implement. Most programming languages have decent libraries for JSON and HTTP, so clients
interacting with the WebDriver are relatively easy to implement.</p>
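Since the protocol is plain JSON over HTTP, a raw client is easy to sketch. The following Python sketch only constructs the requests, so nothing is sent over the network and no running ChromeDriver is needed; the paths follow the pre-W3C JSON Wire Protocol the post walks through below:

```python
import json

BASE = "http://localhost:9515"  # default ChromeDriver port

def new_session_request():
    """POST /session starts a browser and returns a sessionId."""
    return ("POST", BASE + "/session",
            json.dumps({"desiredCapabilities": {"browserName": "chrome"}}))

def navigate_request(session_id, url):
    """POST /session/:id/url makes the browser navigate, like a real user."""
    return ("POST", f"{BASE}/session/{session_id}/url", json.dumps({"url": url}))

def title_request(session_id):
    """GET /session/:id/title reads the current page title."""
    return ("GET", f"{BASE}/session/{session_id}/title", None)

method, url, body = navigate_request("e4b3adf2", "https://www.zalando.de")
```

Sending these three requests in order with any HTTP library reproduces the session-create, navigate, and get-title exchange shown in the screenshots below.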
<p>Now let's see how we can use the WebDriver Protocol to get the page title of <a href="https://www.zalando.de">https://www.zalando.de</a>. First, we
execute the ChromeDriver and wait for the server to start. When the server is ready, we can send the following request
to create a new browser session:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3f8d4e5cd8f151b1582c389587a8af9d562d979c_screen-shot-2016-03-16-at-10.30.23.png?auto=compress,format"></p>
<p>Body:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/44e521485f0160ea7c56a15f783858094cba60b4_screen-shot-2016-03-16-at-10.31.08.png?auto=compress,format"></p>
<p>Response:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/278deda381b750e75cc3a0573808d80157b8fe70_screen-shot-2016-03-16-at-10.31.47.png?auto=compress,format"></p>
<p>The server responds with a sessionId—in this case, e4b3adf2fe9b10fbabd6611d1bb50c93. Next, we want the browser to
navigate to <a href="https://www.zalando.de">https://www.zalando.de</a>:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a541e2e9e9bbc122b7af5a269eb331e456b1ca85_screen-shot-2016-03-16-at-10.33.33.png?auto=compress,format"></p>
<p>Body:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ef2e5dfd9ab10ea5942ba748c91e21e06833c23c_screen-shot-2016-03-16-at-10.34.18.png?auto=compress,format"></p>
<p>When the command is sent, the browser navigates to https://www.zalando.de/ just as a real user would. Our final step is
to GET the page title:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/58389dbe72fd696b53b7e8ef75680cb8925498b0_screen-shot-2016-03-16-at-10.34.53.png?auto=compress,format"></p>
<p>The response:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/007ce7aaabcadb3cd2e9be5d2e4daf62c77fe09b_screen-shot-2016-03-16-at-10.35.28.png?auto=compress,format"></p>
<p>And there we have it, our title :)</p>
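<p>Behind the scenes, a client has to pull that <code>sessionId</code> out of the first response and splice it into every later URL. As a rough sketch (the response shape here is simplified, and a real client would use a proper JSON library):</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SessionIdParser {
    // Extracts the sessionId field from a WebDriver Protocol JSON response.
    static String extractSessionId(String json) {
        Matcher m = Pattern.compile("\"sessionId\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String response = "{\"sessionId\":\"e4b3adf2fe9b10fbabd6611d1bb50c93\",\"status\":0,\"value\":{}}";
        String sessionId = extractSessionId(response);
        // The title command is then sent to /session/<sessionId>/title.
        System.out.println("GET http://localhost:9515/session/" + sessionId + "/title");
    }
}
```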
<p>Usually developers do not use this API directly, but instead download a client library for their programming language.
The languages that <a href="http://www.seleniumhq.org/download/">Selenium officially supports</a> are Java, C#, Ruby, Python and
JavaScript (Node).</p>
<h2>Java Client</h2>
<p>Our team mostly works with Java, so let’s take a deeper look at the Java client. Now that we have covered how
Selenium works with the REST API, you hopefully have a better idea of how a Selenium client works: basically, it acts
as an HTTP client for the WebDriver server. When we create a new ChromeDriver with:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f0dfe76b9d44280e049f3ea50ec5aae1b8720d7b_screen-shot-2016-03-16-at-10.38.35.png?auto=compress,format"></p>
<p>We end up launching the Chrome driver that we downloaded earlier. The call goes through the <a href="https://github.com/SeleniumHQ/selenium/blob/master/java/client/src/org/openqa/selenium/remote/RemoteWebDriver.java#L238">startSession(Capabilities
desiredCapabilities, Capabilities
requiredCapabilities)</a>
method, and a new browser session is created. Eventually this method makes a POST request to:
http://localhost:9515/session.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/50b200fe6d03c8cf9785b39f04bddae3ac64bb87_screen-shot-2016-03-16-at-10.40.29.png?auto=compress,format"></p>
<p>The above method will result in an HTTP POST request to: http://localhost:9515/session/:sessionId/url. If you are
interested, you can check how Java code is mapped to WebDriver Protocol URLs in the
<a href="https://github.com/SeleniumHQ/selenium/blob/master/java/client/src/org/openqa/selenium/remote/http/JsonHttpCommandCodec.java">JsonHttpCommandCodec</a>
class.</p>
<h2>Inside the Chrome WebDriver</h2>
<p>So far we have covered what goes on in the client side. Now, let’s look at what happens inside the Chrome WebDriver
server when it accepts the WebDriver Protocol messages. For example, what happens when we do a request like this:
http://localhost:9515/session/e4b3adf2fe9b10fbabd6611d1bb50c93/title?</p>
<p>To find out, I cloned the <a href="https://chromium.googlesource.com/chromium/src">chromium repo</a>. After some searching for code
that handles incoming WebDriver Protocol requests, I found the actual file at:
chrome/test/chromedriver/server/http_handler.cc. The file maps different WebDriver Protocol URLs to C++ functions. In
the case of /session/:sessionId/title, the following function gets executed:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/338d4d81aeba5abb8552e326cb63f6508ca9fbc9_screen-shot-2016-03-17-at-11.24.06.png?auto=compress,format"></p>
<p>The document title is retrieved with a JavaScript function. The web_view->CallFunction call uses the <a href="https://developer.chrome.com/devtools/docs/protocol/1.1/index">Remote Debugging
Protocol</a> to send a command to the Chrome browser. This works
in a similar fashion to the WebDriver Protocol in that the commands are sent in JSON format, but in this case they’re
targeted at Chrome’s remote debugging port.</p>
<p>Here’s an example of a Remote Debugging Protocol message that gets the title of a page:</p>
<div class="highlight"><pre><span></span><code>{ "id": 2, "method": "Runtime.evaluate", "params": { "expression": "(function() { return document.title;}).apply(null, [null, null, null])" }}
</code></pre></div>
<p>It is possible to evaluate any JavaScript expression on the global object with the Runtime.evaluate method. No magic here!</p>
<h2>Conclusion</h2>
<p>Many things happen when we verify that a page has the correct title. Luckily for us, Selenium offers simple-to-use
WebDriver APIs for different browsers in various languages that hide all the complexity. Want to learn more about
Selenium WebDriver? Tweet us <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a> to let us know what you want our team to
write about next!</p>Portfolio advice for UX Interaction Designers2016-03-15T00:00:00+01:002016-03-15T00:00:00+01:00Jay Kaufmanntag:engineering.zalando.com,2016-03-15:/posts/2016/03/portfolio-advice-for-ux-interaction-designers.html<p>Top-line tips for Interaction Designers to convey what they do.</p><p>Do you do UX design? Is it obvious with one quick glance at your portfolio?</p>
<p>I recently reviewed and rejected an application to our “UX Designer for B2B solutions” job opportunity because the
conceptual experience wasn’t evident to me. I wrote in the rejection letter:</p>
<div class="highlight"><pre><span></span><code>“Your portfolio shows screens rather than how you got there. We have a role distinction at Zalando between UX Interaction Design and Visual UI Design. You look like a strong UI VD, but on the B2B team this role is already filled.”
</code></pre></div>
<p>The designer wrote back thanking me for feedback, clarifying that she does in fact have 5 years of UX design experience
and asking for advice about how to improve her portfolio.</p>
<p>Here is some top-line advice specifically to UX IxDs about how to present your work:</p>
<ul>
<li><strong>Make a PDF</strong> or descriptive website. Seeing a portfolio on Behance or Dribbble implies that the designer is focused
on Visual UI Design. These platforms have a clear focus, so even if you hack them, it's just the wrong place to
strut your IxD stuff. Besides being surrounded by the wrong crowd, on Dribbble you don't get enough room to explain the
back story, and creating a narrative is difficult.</li>
<li><strong>Show raw artefacts</strong> from the process -- enough to give hiring managers insight into your conceptual, IA, and
interaction design work. UX-focused recruiter <a href="http://www.experiencetalent.co.uk/blog/ux-portfolio-guide.html">Sean
Pook</a> once told me that he advises designers to “put
some pictures of Post-Its” into their portfolios. Wireframes, flow diagrams and colourful clusters of sticky notes
do in fact make it clear to me at a glance that you do a range of UX work.</li>
<li><strong>Customize your portfolio</strong> to the company and role you are applying to and really hone in on the work most
relevant to the job. Personally I don’t absolutely need to see specific domain knowledge (i.e. fashion, eCommerce,
or a WYSIWYG CMS) but I do want to see something relevant (i.e. complex interaction patterns as opposed to "simple"
or "marketing" web design work).</li>
</ul>
<p>Note that this advice is to make UX Designers, Interaction Designers, Information Architects, or Full-Stack Product
Designers stand out from their Visual UI Designer colleagues.</p>
<p>Are you a Visual Designer or UI Designer? Ignore most of the above. We do appreciate if you share your work with the
community. And we prefer pixels over Post-Its. And we especially want to talk to you. <a href="https://tech.zalando.com/jobs/ux/">Here’s who we’re looking
for</a>.</p>Streaming Huge Databases Using Logical Decoding2016-02-23T00:00:00+01:002016-02-23T00:00:00+01:00Oleksandr Shulgintag:engineering.zalando.com,2016-02-23:/posts/2016/02/streaming-huge-databases-using-logical-decoding.html<p>Practical aspects of extracting consistent data snapshots from a PostgreSQL database.</p><p>Zalando’s database engineering team recently spoke at <a href="http://fosdem2016.pgconf.eu/">FOSDEM PGDay</a>, an event hosted by
the PostgreSQL community the day before the <a href="https://fosdem.org/2016/">FOSDEM</a> conference in Brussels. I had the
opportunity to share insights on Streaming Huge Databases using Logical Decoding.</p>
<p>Logical decoding is a new feature of PostgreSQL (since version 9.4) that allows streaming database changes in a custom
format. My talk explains the potential problems one may encounter while extracting a consistent snapshot of a big
database and approaches to mitigate these problems. A performance comparison of different existing logical decoding
plugins is discussed, and a unified interface to stream pre-existing data through a plugin output function is proposed.
Finally, I discussed some potential performance bottlenecks and how to address them.</p>
<p>Watch my talk below and let me know your thoughts:</p>
<p><strong><a href="https://www.slideshare.net/AlexanderShulgin3/streaming-huge-databases-using-logical-decoding" title="Streaming huge databases using logical decoding">Streaming huge databases using logical
decoding</a></strong>
from <strong><a href="http://www.slideshare.net/AlexanderShulgin3">Alexander Shulgin</a></strong></p>Student CVs for UX careers: Tips & tricks2016-02-23T00:00:00+01:002016-02-23T00:00:00+01:00Carina Kuhrtag:engineering.zalando.com,2016-02-23:/posts/2016/02/student-cvs-for-ux-careers-tips--tricks.html<p>Here we’d like to share some insights from our experience reviewing myriad UX applications.</p><p>Are you a student looking to get into user experience (UX) design or research?
We recently hosted local students at the Zalando Human Factors Student Day -- offering a CV critique and Q&A session
about applying to UX jobs.</p>
<p>Inspired by their questions, we’d like to share here some insights from our experience reviewing myriad UX applications.</p>
<p>The goal? To help you create the best CV/resume for launching your career.</p>
<h3><strong>What are UX employers looking for?</strong></h3>
<p>If you’re just starting out in UX, it’s important to hone your message. Since you likely won’t have a lot of work to
show, what do hiring managers want to see to convince them you can break into UX?</p>
<ul>
<li><strong>Motivation to work in UX.</strong> Make your interest crystal clear through an objective statement and/or clear
prioritization of relevant experiences throughout your resume.</li>
<li><strong>Relevant academic background.</strong> A broad range of studies could lead you to a career in User Experience:
Psychology, Sociology, Communication Design, etc. Some universities offer programs in Human Factors or
Human-Computer Interaction.</li>
<li><strong>Experience in methods.</strong> Seek out internships, class case studies and personal projects so that you can include
some keywords around relevant processes. Conducted a survey? Done a card sort? Created a wireframe? Tell us!</li>
</ul>
<h3><strong>How to present yourself?</strong></h3>
<p>How can you possibly summarize yourself, with all your experiences and potential, into a 2-3 page CV? What angle will
leave a great first impression? Here are our practical tips for letting your profile shine:</p>
<ul>
<li><strong>Use your name as a headline.</strong> About half the CVs that we saw at our Student Day had “Curriculum Vitae” as the
headline. The first, largest information on your resume should be your name. The format tells us it’s a CV.</li>
<li><strong>Put education first.</strong> If your work experience to date consists only of internships, create clarity by putting the focus on
your current studies. If you’re applying for an entry-level job, include your expected graduation date so we
understand your availability at a glance. Keep education relevant: we don’t care what elementary school you
attended. After you’ve had your first job, reverse your CV to put experience first.</li>
<li><strong>The most recent is the most important.</strong> We’re glad to see that students have seen the value of reverse
chronological order. Most of the student CVs we see have rejected the classic German Lebenslauf (which uses
chronological order) in favor of the new international standard of listing your most recent experience or education
first.</li>
<li><strong>Describe your experience.</strong> Give some context for every station in your CV by writing a very brief description or
including up to 3 bullet points explaining what you did or learned. If you have lots of experience, you can leave
the description off of the oldest if the title is clear.</li>
<li><strong>Put your education in context.</strong> Is your program of study clear? Human Factors is relatively unknown in Germany,
so explain briefly what the program is about. You could write, for instance, “M.Sc. Human Factors (Human-Computer
Interaction)”.</li>
<li><strong>Rate your language proficiency.</strong> Don’t forget to list your native language. We see a lot of international resumes,
and it’s not always clear at a glance what your native language is. Latin? Sorry, we don’t care. Leave it off
your CV. Level C1 or B2 might mean something to someone, but not to all of us, so also use words like “fluent” or
“basic”.</li>
<li><strong>Quantify your skills.</strong> If you list software skills or coding languages, tell us how proficient you are. We assume
you can work in MS Office and are more interested in prototyping, design, eye-tracking or statistics software here.
Coding skills? Nice to know.</li>
<li><strong>Highlight the important information.</strong> In scanning your professional experience we will look out for job role
first, then the company. Don’t make everything bold.</li>
<li><strong>Use design, but keep it basic.</strong> Make your CV scannable by using meaningful line breaks, bolding names of
companies or universities, making use of white space, abbreviating names of months, etc. Use clean typography and
don’t use underlining, which looks sloppy.</li>
</ul>
<p>To get your start in UX, you’ll need more than a great CV. But a convincing resume paired with a personalized cover
letter can open the first doors for you.</p>
<p>Still a student? We’re looking for a <a href="https://tech.zalando.com/jobs/ux/77235-ux-student-in-residencem-f/">UX Student in
Residence</a>. Read it through. If you agree with us
that it’s a student’s dream job, please apply!</p>
<p>Got feedback? Shoot us an <a href="mailto:jay.kaufmann@zalando.de">email</a>.</p>Integrating Amazon DynamoDB into your development process2016-02-20T00:00:00+01:002016-02-20T00:00:00+01:00Aliaksandr Kavalevichtag:engineering.zalando.com,2016-02-20:/posts/2016/02/integrating-amazon-dynamodb-into-your-development-process.html<p>How to start development with DynamoDB, Gradle and Spring.</p><p>In this article I would like to talk about the integration of Amazon DynamoDB into your development process. I will not
try to convince you to use Amazon DynamoDB, as I will assume that you have already made the decision to use it and have
several questions about how to start development.</p>
<p>Development is not only about production code - it should also include integration tests and support different
environments for making more complex tests. How would you achieve it with SaaS? Especially for integration tests and
local development, Amazon provides local installation of DynamoDB. You can use it for your tests and local development,
it will save you a lot of money, and it will also increase the execution speed of your integration tests. In this post,
I'll show you how to write your production code and integration tests, and how to separate different environments with
Java, Spring Boot, and Gradle.</p>
<p>Let's start with a simple example. First we will need to create a Gradle build file with all the needed dependencies
included:</p>
<div class="highlight"><pre><span></span><code><span class="nv">apply</span><span class="w"> </span><span class="nv">plugin</span>:<span class="w"> </span><span class="s1">'java'</span>
<span class="nv">apply</span><span class="w"> </span><span class="nv">plugin</span>:<span class="w"> </span><span class="s1">'spring-boot'</span>
<span class="nv">buildscript</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">repositories</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">mavenCentral</span><span class="ss">()</span>
<span class="w"> </span><span class="nv">maven</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">url</span><span class="w"> </span><span class="s2">"https://plugins.gradle.org/m2/"</span>
<span class="w"> </span>}
<span class="w"> </span>}
<span class="w"> </span><span class="nv">dependencies</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">classpath</span><span class="w"> </span><span class="s2">"org.springframework.boot:spring-boot-gradle-plugin:1.3.2.RELEASE"</span>
<span class="w"> </span>}
}
<span class="nv">repositories</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">mavenCentral</span><span class="ss">()</span>
}
<span class="nv">jar</span><span class="w"> </span>{
<span class="w"> </span><span class="k">baseName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'application-gradle'</span>
<span class="w"> </span><span class="nv">version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'0.1.0'</span>
}
<span class="nv">dependencies</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">compile</span><span class="ss">(</span><span class="s1">'org.springframework.boot:spring-boot-starter-web:1.3.2.RELEASE'</span><span class="ss">)</span>
<span class="w"> </span><span class="nv">compile</span><span class="w"> </span><span class="s1">'com.amazonaws:aws-java-sdk-dynamodb:1.10.52'</span>
<span class="w"> </span><span class="nv">compile</span><span class="w"> </span><span class="s1">'com.github.derjust:spring-data-dynamodb:4.2.0'</span>
<span class="w"> </span><span class="nv">testCompile</span><span class="w"> </span><span class="s1">'junit:junit:4.12'</span>
<span class="w"> </span><span class="nv">testCompile</span><span class="w"> </span><span class="s1">'org.springframework.boot:spring-boot-starter-test'</span>
}
<span class="nv">bootRun</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">addResources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">false</span>
<span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'org.article.Application'</span>
}
<span class="nv">test</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">testLogging</span><span class="w"> </span>{
<span class="w"> </span><span class="nv">events</span><span class="w"> </span><span class="s2">"passed"</span>,<span class="w"> </span><span class="s2">"skipped"</span>,<span class="w"> </span><span class="s2">"failed"</span>
<span class="w"> </span>}
}
</code></pre></div>
<p>Two main dependencies for using DynamoDB are:</p>
<div class="highlight"><pre><span></span><code>compile 'com.amazonaws:aws-java-sdk-dynamodb:1.10.52'
compile 'com.github.derjust:spring-data-dynamodb:4.2.0'
</code></pre></div>
<p>Those dependencies include Amazon DynamoDB support for us. The first one includes a standard client for DynamoDB from
AWS, and the second adds Spring-Data support for DynamoDB.</p>
<p>The next step is creating a Spring Boot configuration class to configure the connection to DynamoDB. It should look
like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">package</span> <span class="n">org</span><span class="o">.</span><span class="n">article</span><span class="o">.</span><span class="n">config</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.apache.commons.lang3.StringUtils</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.socialsignin.spring.data.dynamodb.core.DynamoDBOperations</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.socialsignin.spring.data.dynamodb.core.DynamoDBTemplate</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.socialsignin.spring.data.dynamodb.repository.config.EnableDynamoDBRepositories</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.beans.factory.annotation.Value</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.context.annotation.Bean</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.context.annotation.Configuration</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.regions.Regions</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.services.dynamodbv2.AmazonDynamoDB</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperConfig</span><span class="p">;</span>
<span class="nd">@EnableDynamoDBRepositories</span><span class="p">(</span><span class="n">basePackages</span> <span class="o">=</span> <span class="s2">"org.article.repo"</span><span class="p">,</span> <span class="n">dynamoDBOperationsRef</span> <span class="o">=</span> <span class="s2">"dynamoDBOperations"</span><span class="p">)</span>
<span class="nd">@Configuration</span>
<span class="n">public</span> <span class="k">class</span> <span class="nc">DynamoDBConfig</span> <span class="p">{</span>
<span class="nd">@Value</span><span class="p">(</span><span class="s2">"$</span><span class="si">{amazonDynamodbEndpoint}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">amazonDynamoDBEndpoint</span><span class="p">;</span>
<span class="nd">@Value</span><span class="p">(</span><span class="s2">"$</span><span class="si">{environment}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">environment</span><span class="p">;</span>
<span class="nd">@Value</span><span class="p">(</span><span class="s2">"$</span><span class="si">{region}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">region</span><span class="p">;</span>
<span class="nd">@Bean</span>
<span class="n">public</span> <span class="n">AmazonDynamoDB</span> <span class="n">amazonDynamoDB</span><span class="p">()</span> <span class="p">{</span>
<span class="n">final</span> <span class="n">AmazonDynamoDBClient</span> <span class="n">client</span> <span class="o">=</span> <span class="n">new</span> <span class="n">AmazonDynamoDBClient</span><span class="p">();</span>
<span class="n">client</span><span class="o">.</span><span class="n">setSignerRegionOverride</span><span class="p">(</span><span class="n">Regions</span><span class="o">.</span><span class="n">fromName</span><span class="p">(</span><span class="n">region</span><span class="p">)</span><span class="o">.</span><span class="n">getName</span><span class="p">());</span>
<span class="k">if</span> <span class="p">(</span><span class="n">StringUtils</span><span class="o">.</span><span class="n">isNotEmpty</span><span class="p">(</span><span class="n">amazonDynamoDBEndpoint</span><span class="p">))</span> <span class="p">{</span>
<span class="n">client</span><span class="o">.</span><span class="n">setEndpoint</span><span class="p">(</span><span class="n">amazonDynamoDBEndpoint</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">client</span><span class="p">;</span>
<span class="p">}</span>
<span class="nd">@Bean</span>
<span class="n">public</span> <span class="n">DynamoDBOperations</span> <span class="n">dynamoDBOperations</span><span class="p">()</span> <span class="p">{</span>
<span class="n">final</span> <span class="n">DynamoDBTemplate</span> <span class="n">dynamoDBTemplate</span> <span class="o">=</span> <span class="n">new</span> <span class="n">DynamoDBTemplate</span><span class="p">(</span><span class="n">amazonDynamoDB</span><span class="p">());</span>
<span class="n">final</span> <span class="n">DynamoDBMapperConfig</span><span class="o">.</span><span class="n">TableNameOverride</span> <span class="n">tableNameOverride</span> <span class="o">=</span> <span class="n">DynamoDBMapperConfig</span><span class="o">.</span><span class="n">TableNameOverride</span> <span class="o">.</span><span class="n">withTableNamePrefix</span><span class="p">(</span><span class="n">environment</span><span class="p">);</span>
<span class="n">dynamoDBTemplate</span><span class="o">.</span><span class="n">setDynamoDBMapperConfig</span><span class="p">(</span><span class="n">new</span> <span class="n">DynamoDBMapperConfig</span><span class="p">(</span><span class="n">tableNameOverride</span><span class="p">));</span>
<span class="k">return</span> <span class="n">dynamoDBTemplate</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Here we've created an Amazon DynamoDB client for a specified region. It’s important to notice that Amazon provides
DynamoDB in different regions, and those DBs are completely separate instances, therefore it’s important to specify the
region. By default, a client will use region "us-east-1". We've also added the possibility to change the DynamoDB
endpoint. For production code, you don’t need to specify this endpoint, since the client provided by Amazon will create
the appropriate URL itself. For test purposes you only need to specify the URL of your local DynamoDB installation.</p>
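<p>For example, a local test profile might set the three properties injected via <code>@Value</code> in the configuration class above (the values here are illustrative; 8000 is the default port of the local DynamoDB distribution):</p>

```properties
# application-test.properties (hypothetical test profile)
amazonDynamodbEndpoint=http://localhost:8000
environment=test
region=eu-west-1
```

<p>With an empty <code>amazonDynamodbEndpoint</code>, the same configuration class falls back to the real AWS endpoint for the given region.</p>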
<p>Another decision concerns environment separation. For each AWS account, you have only one DynamoDB
instance per region. There are two ways to run several environments (e.g. production and stage) in
Amazon DynamoDB.</p>
<p>The first approach is to have two separate accounts - one per environment. The main benefit of this approach is that you
have two completely separate environments. The main disadvantage however is that you have to maintain two accounts and
switch between them during development. There can be quite a big overhead for this task.</p>
<p>The second approach is to separate environments using table name prefixes. For example, for table "User" there will be
no real table with the name "User" in DynamoDB. Instead, there will be table names like "prodUser", "stageUser". The
main benefit just so happens to be the main disadvantage of the previous approach: You don’t have to switch between
accounts.</p>
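<p>The <code>TableNameOverride</code> configured earlier implements exactly this scheme. Its effect can be illustrated with a toy method (this is not the SDK code, just the naming rule it applies):</p>

```java
class TableNamePrefixDemo {
    // Mimics DynamoDBMapperConfig.TableNameOverride.withTableNamePrefix:
    // the effective table name is the environment prefix plus the annotated name.
    static String effectiveTableName(String environmentPrefix, String tableName) {
        return environmentPrefix + tableName;
    }

    public static void main(String[] args) {
        System.out.println(effectiveTableName("prod", "User"));   // prodUser
        System.out.println(effectiveTableName("stage", "User"));  // stageUser
    }
}
```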
<p>Now it's time to create a Java entity. It should look like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">package</span> <span class="n">org</span><span class="o">.</span><span class="n">article</span><span class="o">.</span><span class="n">domain</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable</span><span class="p">;</span>
<span class="nd">@DynamoDBTable</span><span class="p">(</span><span class="n">tableName</span> <span class="o">=</span> <span class="s2">"User"</span><span class="p">)</span>
<span class="n">public</span> <span class="k">class</span> <span class="nc">User</span> <span class="p">{</span>
<span class="nd">@DynamoDBHashKey</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">userName</span><span class="p">;</span>
<span class="nd">@DynamoDBAttribute</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">firstName</span><span class="p">;</span>
<span class="nd">@DynamoDBAttribute</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">lastName</span><span class="p">;</span>
<span class="n">public</span> <span class="n">String</span> <span class="n">getUserName</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">userName</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">setUserName</span><span class="p">(</span><span class="n">final</span> <span class="n">String</span> <span class="n">userName</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">userName</span> <span class="o">=</span> <span class="n">userName</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">String</span> <span class="n">getFirstName</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">firstName</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">setFirstName</span><span class="p">(</span><span class="n">final</span> <span class="n">String</span> <span class="n">firstName</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">firstName</span> <span class="o">=</span> <span class="n">firstName</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">String</span> <span class="n">getLastName</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">lastName</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">setLastName</span><span class="p">(</span><span class="n">final</span> <span class="n">String</span> <span class="n">lastName</span><span class="p">)</span> <span class="p">{</span>
<span class="n">this</span><span class="o">.</span><span class="n">lastName</span> <span class="o">=</span> <span class="n">lastName</span><span class="p">;</span>
<span class="p">}</span>
<span class="nd">@Override</span>
<span class="n">public</span> <span class="n">boolean</span> <span class="n">equals</span><span class="p">(</span><span class="n">final</span> <span class="n">Object</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">this</span> <span class="o">==</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="err">!</span><span class="p">(</span><span class="n">o</span> <span class="n">instanceof</span> <span class="n">User</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">User</span> <span class="n">forum</span> <span class="o">=</span> <span class="p">(</span><span class="n">User</span><span class="p">)</span> <span class="n">o</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">userName</span> <span class="o">!=</span> <span class="n">null</span> <span class="err">?</span> <span class="err">!</span><span class="n">userName</span><span class="o">.</span><span class="n">equals</span><span class="p">(</span><span class="n">forum</span><span class="o">.</span><span class="n">userName</span><span class="p">)</span> <span class="p">:</span> <span class="n">forum</span><span class="o">.</span><span class="n">userName</span> <span class="o">!=</span> <span class="n">null</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">firstName</span> <span class="o">!=</span> <span class="n">null</span> <span class="err">?</span> <span class="err">!</span><span class="n">firstName</span><span class="o">.</span><span class="n">equals</span><span class="p">(</span><span class="n">forum</span><span class="o">.</span><span class="n">firstName</span><span class="p">)</span> <span class="p">:</span> <span class="n">forum</span><span class="o">.</span><span class="n">firstName</span> <span class="o">!=</span> <span class="n">null</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">lastName</span> <span class="o">!=</span> <span class="n">null</span> <span class="err">?</span> <span class="n">lastName</span><span class="o">.</span><span class="n">equals</span><span class="p">(</span><span class="n">forum</span><span class="o">.</span><span class="n">lastName</span><span class="p">)</span> <span class="p">:</span> <span class="n">forum</span><span class="o">.</span><span class="n">lastName</span> <span class="o">==</span> <span class="n">null</span><span class="p">;</span>
<span class="p">}</span>
<span class="nd">@Override</span>
<span class="n">public</span> <span class="nb">int</span> <span class="n">hashCode</span><span class="p">()</span> <span class="p">{</span>
<span class="nb">int</span> <span class="n">result</span> <span class="o">=</span> <span class="n">userName</span> <span class="o">!=</span> <span class="n">null</span> <span class="err">?</span> <span class="n">userName</span><span class="o">.</span><span class="n">hashCode</span><span class="p">()</span> <span class="p">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">result</span> <span class="o">=</span> <span class="mi">31</span> <span class="o">*</span> <span class="n">result</span> <span class="o">+</span> <span class="p">(</span><span class="n">firstName</span> <span class="o">!=</span> <span class="n">null</span> <span class="err">?</span> <span class="n">firstName</span><span class="o">.</span><span class="n">hashCode</span><span class="p">()</span> <span class="p">:</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">result</span> <span class="o">=</span> <span class="mi">31</span> <span class="o">*</span> <span class="n">result</span> <span class="o">+</span> <span class="p">(</span><span class="n">lastName</span> <span class="o">!=</span> <span class="n">null</span> <span class="err">?</span> <span class="n">lastName</span><span class="o">.</span><span class="n">hashCode</span><span class="p">()</span> <span class="p">:</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The User entity looks like a usual POJO. In addition to that, we have a couple of annotations. The DynamoDBTable
annotation shows that this class corresponds to the table named “User”. The table should have exactly one hash key;
to specify it we use the DynamoDBHashKey annotation, so we marked the field userName as the hash key. We also have
two attributes, firstName and lastName, annotated with DynamoDBAttribute. Please note that this entity has a
partition key without a sort key.</p>
<p>After the entity, we should create the UserRepository object, which is just an interface extending CrudRepository. We
specify the entity and the type of its id – and then it's done! Now we have basic CRUD operations implemented for us:</p>
<div class="highlight"><pre><span></span><code><span class="n">package</span> <span class="n">org</span><span class="o">.</span><span class="n">article</span><span class="o">.</span><span class="n">repo</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.article.domain.User</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.socialsignin.spring.data.dynamodb.repository.EnableScan</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.data.repository.CrudRepository</span><span class="p">;</span>
<span class="nd">@EnableScan</span>
<span class="n">public</span> <span class="n">interface</span> <span class="n">UserRepository</span> <span class="n">extends</span> <span class="n">CrudRepository</span><span class="o">&lt;</span><span class="n">User</span><span class="p">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="p">{</span> <span class="p">}</span>
</code></pre></div>
<p>At the moment, we have both the User entity and UserRepository with basic CRUD operations implemented, so it’s time to
check them with an integration test. First, we need to change our Gradle build to run integration tests. Local DynamoDB
should start before the tests and stop right after. We also need to create tables in DynamoDB. Although it doesn’t have a
schema in the usual sense, you still need to create tables and specify the partition key and, if needed, the sort key. To
start local DynamoDB, create tables, and stop the local DynamoDB instance, there’s a nice Maven plugin
<a href="https://github.com/jcabi/jcabi-dynamo">here</a>. Its main disadvantage is that it can create tables only
for local DynamoDB instances, but not for the real Amazon environment. As you'll need to create tables for the
production environment anyway, I believe this should be done in exactly the same way as for your local instance
(that's why I don't use this plugin). What I like to do instead is start a local DynamoDB instance from a Docker container. If
you don't have Docker yet, you can find setup instructions
<a href="https://docs.docker.com/engine/installation/">here</a>.</p>
<p>The first Gradle task that we need is to start the local DynamoDB instance:</p>
<div class="highlight"><pre><span></span><code>task<span class="w"> </span>startDB<span class="w"> </span>(type:Exec)<span class="w"> </span>{
<span class="w"> </span>commandLine<span class="w"> </span>"bash",<span class="w"> </span>"-c",<span class="w"> </span>"docker<span class="w"> </span>run<span class="w"> </span>-p<span class="w"> </span><span class="cp">${</span><span class="n">dbPort</span><span class="cp">}</span>:<span class="cp">${</span><span class="n">dbPort</span><span class="cp">}</span><span class="w"> </span>-d<span class="w"> </span>tray/dynamodb-local<span class="w"> </span>-inMemory<span class="w"> </span>-sharedDb<span class="w"> </span>-port<span class="w"> </span><span class="cp">${</span><span class="n">dbPort</span><span class="cp">}</span>"
}
</code></pre></div>
<p>This will start DynamoDB on the port specified with the dbPort property. We used two parameters to start the DB: the first
one, “inMemory”, tells DynamoDB that it should run completely in memory. The second one, “sharedDb”, makes sure there
isn’t any region separation in the DB.</p>
<p>The next step is to create tables. We will keep each table description in JSON format in the database directory, so
for now there is just the one User.json file.</p>
<div class="highlight"><pre><span></span><code> {
"AttributeDefinitions": [
{
"AttributeName": "userName",
"AttributeType": "S"
}
],
"TableName": "User",
"KeySchema": [
{
"AttributeName": "userName",
"KeyType": "HASH"
}
],
"ProvisionedThroughput": {
"ReadCapacityUnits": 10,
"WriteCapacityUnits": 10
}
}
</code></pre></div>
<p>We also need to add a Gradle task to create the User table in DynamoDB:</p>
<div class="highlight"><pre><span></span><code>task<span class="w"> </span>deployDB(type:Exec)<span class="w"> </span>{
<span class="w"> </span>mustRunAfter<span class="w"> </span>startDB
<span class="w"> </span>def<span class="w"> </span>dynamoDBEndpoint;
<span class="w"> </span>if<span class="w"> </span>(amazonDynamodbEndpoint<span class="w"> </span>!=<span class="w"> </span>"")<span class="w"> </span>{
<span class="w"> </span>dynamoDBEndpoint<span class="w"> </span>=<span class="w"> </span>"--endpoint=<span class="cp">${</span><span class="n">amazonDynamodbEndpoint</span><span class="cp">}</span>"
<span class="w"> </span>}<span class="w"> </span>else<span class="w"> </span>{
<span class="w"> </span>dynamoDBEndpoint<span class="w"> </span>=<span class="w"> </span>""
<span class="w"> </span>}
<span class="w"> </span>commandLine<span class="w"> </span>"bash",<span class="w"> </span>"-c",<span class="w"> </span>"for<span class="w"> </span>f<span class="w"> </span>in<span class="w"> </span>\$(find<span class="w"> </span>database<span class="w"> </span>-name<span class="w"> </span>\"*.json\");<span class="w"> </span>do<span class="w"> </span>aws<span class="w"> </span>--region<span class="w"> </span><span class="cp">${</span><span class="n">region</span><span class="cp">}</span><span class="w"> </span>dynamodb<span class="w"> </span>create-table<span class="w"> </span><span class="cp">${</span><span class="n">dynamoDBEndpoint</span><span class="cp">}</span><span class="w"> </span>--cli-input-json<span class="w"> </span>\"\$(cat<span class="w"> </span>\<span class="nv">$f</span><span class="w"> </span>|<span class="w"> </span>sed<span class="w"> </span>-e<span class="w"> </span>'s/TableName\":<span class="w"> </span>\"/TableName\":<span class="w"> </span>\"<span class="cp">${</span><span class="n">environment</span><span class="cp">}</span>/g')\";<span class="w"> </span>done"
}
</code></pre></div>
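The interesting part of the deployDB task is the sed substitution: it prefixes the TableName value of every JSON file with the environment name, so each environment gets its own set of tables. In isolation (with an inline JSON fragment and the environment hard-coded to local purely for illustration), the substitution behaves like this:

```shell
# Prefix the TableName value with the environment, as the deployDB task does.
environment=local
echo '"TableName": "User"' | sed -e "s/TableName\": \"/TableName\": \"${environment}/g"
# → "TableName": "localUser"
```

With environment=prod the same table definition would produce a prodUser table instead.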
<p>You’ll notice that with this task we can create tables both for local and real AWS environments. We can also
create tables for different environments in the cloud. To do this, all we need is to pass the right parameters. To
deploy to the real AWS environment, execute the following command:</p>
<div class="highlight"><pre><span></span><code>gradle deployDB -Penv=prod
</code></pre></div>
<p>Here the env parameter is the name of the property file to use. Next, we need a task to stop DynamoDB.</p>
<div class="highlight"><pre><span></span><code>task<span class="w"> </span>stopDB<span class="w"> </span>(type:Exec)<span class="w"> </span>{
<span class="w"> </span>commandLine<span class="w"> </span>"bash",<span class="w"> </span>"-c",<span class="w"> </span>"id=\$(docker<span class="w"> </span>ps<span class="w"> </span>|<span class="w"> </span>grep<span class="w"> </span>\"tray/dynamodb-local\"<span class="w"> </span>|<span class="w"> </span>awk<span class="w"> </span>'{print<span class="w"> </span>\$1}');if<span class="w"> </span>[[<span class="w"> </span>\<span class="cp">${</span><span class="nb">id</span><span class="cp">}</span><span class="w"> </span>]];<span class="w"> </span>then<span class="w"> </span>docker<span class="w"> </span>stop<span class="w"> </span>\<span class="nv">$id</span>;<span class="w"> </span>fi"
}
</code></pre></div>
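The stopDB task finds the running container by grepping the docker ps output for the image name and extracting the container id from the first column with awk, stopping it only if an id was found. The extraction step in isolation (with a fabricated docker ps line purely for illustration) looks like this:

```shell
# Extract the container id (first column) from a docker-ps-style line, as stopDB does.
echo 'f3a9c01d2e4b  tray/dynamodb-local  "java -jar DynamoDB"  Up 2 minutes' \
  | grep "tray/dynamodb-local" | awk '{print $1}'
# → f3a9c01d2e4b
```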
<p>Let's configure those tasks to start DynamoDB before an integration test and to stop it right after:</p>
<div class="highlight"><pre><span></span><code>test.dependsOn startDB
test.dependsOn deployDB
test.finalizedBy stopDB
</code></pre></div>
<p>Before we can execute our first integration test, we need to create two property files – the first one for production
usage and the second one for the integration tests:</p>
<div class="highlight"><pre><span></span><code><span class="n">src</span><span class="o">/</span><span class="n">main</span><span class="o">/</span><span class="n">resources</span><span class="o">/</span><span class="n">prod</span><span class="p">.</span><span class="n">properties</span>
<span class="n">amazon</span><span class="p">.</span><span class="n">dynamodb</span><span class="p">.</span><span class="n">endpoint</span><span class="o">=</span>
<span class="n">environment</span><span class="o">=</span><span class="n">prod</span>
<span class="n">region</span><span class="o">=</span><span class="n">eu</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mh">1</span>
<span class="n">dbPort</span><span class="o">=</span>
<span class="n">AWS_ACCESS_KEY</span><span class="o">=</span><span class="n">realValue</span>
<span class="n">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span><span class="n">realValue</span>
<span class="n">src</span><span class="o">/</span><span class="n">test</span><span class="o">/</span><span class="n">resources</span><span class="o">/</span><span class="n">application</span><span class="p">.</span><span class="n">properties</span>
<span class="n">amazon</span><span class="p">.</span><span class="n">dynamodb</span><span class="p">.</span><span class="n">endpoint</span><span class="o">=</span><span class="nl">http:</span><span class="c1">//localhost:7777</span>
<span class="n">environment</span><span class="o">=</span><span class="n">local</span>
<span class="n">region</span><span class="o">=</span><span class="n">eu</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mh">1</span>
<span class="n">dbPort</span><span class="o">=</span><span class="mh">7777</span>
<span class="n">AWS_ACCESS_KEY</span><span class="o">=</span><span class="n">nonEmpty</span>
<span class="n">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span><span class="n">nonEmpty</span>
</code></pre></div>
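The post doesn't show how these property files are wired into the Gradle build, so here is one hypothetical sketch of how it could look in build.gradle; the file lookup path, the default environment name, and the ext property names are assumptions, not part of the original setup:

```groovy
// Hypothetical: resolve -Penv to a property file and expose its values
// (amazonDynamodbEndpoint, environment, region, dbPort) to the tasks above.
def envName = project.hasProperty('env') ? project.env : 'application'
def props = new Properties()
file("src/main/resources/${envName}.properties").withInputStream { props.load(it) }

ext.amazonDynamodbEndpoint = props.getProperty('amazon.dynamodb.endpoint') ?: ''
ext.environment = props.getProperty('environment')
ext.region = props.getProperty('region')
ext.dbPort = props.getProperty('dbPort')
```

Under these assumptions, `gradle deployDB -Penv=prod` would read src/main/resources/prod.properties, matching the deploy command shown earlier.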
<p>Now everything is ready to write our first integration test. It is very simple: we save a User entity and then
fetch it by its ID:</p>
<div class="highlight"><pre><span></span><code><span class="n">package</span> <span class="n">org</span><span class="o">.</span><span class="n">article</span><span class="o">.</span><span class="n">repo</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">static</span> <span class="n">org</span><span class="o">.</span><span class="n">junit</span><span class="o">.</span><span class="n">Assert</span><span class="o">.</span><span class="n">assertEquals</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.article.Application</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.article.domain.User</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.junit.After</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.junit.Test</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.junit.runner.RunWith</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.beans.factory.annotation.Autowired</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.boot.test.SpringApplicationConfiguration</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.test.context.junit4.SpringJUnit4ClassRunner</span><span class="p">;</span>
<span class="nd">@SpringApplicationConfiguration</span><span class="p">(</span><span class="n">classes</span> <span class="o">=</span> <span class="n">Application</span><span class="o">.</span><span class="n">class</span><span class="p">)</span>
<span class="nd">@RunWith</span><span class="p">(</span><span class="n">SpringJUnit4ClassRunner</span><span class="o">.</span><span class="n">class</span><span class="p">)</span>
<span class="n">public</span> <span class="k">class</span> <span class="nc">UserRepositoryIT</span> <span class="p">{</span>
<span class="nd">@Autowired</span>
<span class="n">private</span> <span class="n">UserRepository</span> <span class="n">userRepository</span><span class="p">;</span>
<span class="nd">@After</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">tearDown</span><span class="p">()</span> <span class="p">{</span>
<span class="n">userRepository</span><span class="o">.</span><span class="n">deleteAll</span><span class="p">();</span>
<span class="p">}</span>
<span class="nd">@Test</span>
<span class="n">public</span> <span class="n">void</span> <span class="n">findByUserName</span><span class="p">()</span> <span class="p">{</span>
<span class="n">final</span> <span class="n">User</span> <span class="n">user</span> <span class="o">=</span> <span class="n">new</span> <span class="n">User</span><span class="p">();</span>
<span class="n">user</span><span class="o">.</span><span class="n">setUserName</span><span class="p">(</span><span class="s2">"userName"</span><span class="p">);</span>
<span class="n">user</span><span class="o">.</span><span class="n">setFirstName</span><span class="p">(</span><span class="s2">"firstName"</span><span class="p">);</span>
<span class="n">user</span><span class="o">.</span><span class="n">setLastName</span><span class="p">(</span><span class="s2">"lastName"</span><span class="p">);</span>
<span class="n">userRepository</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">user</span><span class="p">);</span>
<span class="n">final</span> <span class="n">User</span> <span class="n">actualUser</span> <span class="o">=</span> <span class="n">userRepository</span><span class="o">.</span><span class="n">findOne</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">getUserName</span><span class="p">());</span>
<span class="n">assertEquals</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">actualUser</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>We can now execute this test with Gradle from the command line:</p>
<div class="highlight"><pre><span></span><code> gradle clean build
</code></pre></div>
<p>And there you go! In this post, I've shown you how to manage a local instance of Amazon DynamoDB during the development
process using Spring Data and the Gradle build tool. I have also shown you how to create tables for a real AWS
environment and how to separate environments using a standard Java client for AWS. The source code for this procedure
can be found on our <a href="https://github.bus.zalan.do/akavalevich/dynamoDB-article">GitHub page</a>.</p>How Agile Coaches Scale Continuous Improvement2016-02-12T00:00:00+01:002016-02-12T00:00:00+01:00Samir Hannatag:engineering.zalando.com,2016-02-12:/posts/2016/02/how-agile-coaches-scale-continuous-improvement.html<p>Zalando’s Agile coaches deep dive into how they scale continuous improvement.</p><p>Since 2015, we have been scaling our continuous improvement practice, drawing inspiration from systemic thinking, Management
3.0 and agile retrospectives. You may wonder why we chose the retrospective as a concrete practice for implementation.
This is because teams inside Zalando’s Technology do what it takes to get things done: planning, coding, demoing,
managing backlogs, and much more. All whilst being bound up in day-to-day operations, which makes it difficult to continuously improve.
In our opinion, retrospectives are not only helpful, but they are vital to allow teams to work successfully and
productively. The retrospective is about reflection and evolution. Operating without reflecting is like the walking dead
- <strong>but we are not zombies, we are alive, we are thinking</strong>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8f4c26a33350e87c8222ae390c08091ef9802ff0_handful.jpg?auto=compress,format"></p>
<p>This post will provide you with some feedback on the concepts our team of Agile coaches used to scale continuous
improvement: the adoption lifecycle, continuous improvement as a product, our circle of facilitation, urgency, desire
and supervision.</p>
<h3><strong>Start with early adopters</strong></h3>
<p>Instead of trying to convince or force hands, the idea is to look within the organisation for innovators and early
adopters. These are teams and people actively seeking out retrospectives and ready to invest time to receive the
benefits.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b9fba6d8dc8d5b6ac373253662b51ba5936e8de8_adoption.jpg?auto=compress,format"></p>
<h3><strong>Implementing a high quality “retro app” product</strong></h3>
<p>Once we found three teams willing to work with us, we built a retrospective product (we called it retro app, because we
are mobile first ;-) ) that would tackle all the retrospective anti-patterns while being easy to use and appealing. The
retro app consists of a set of posters (click
<a href="https://drive.google.com/folderview?id=0B883V7riLW9lfmUyZVIwSkRxUnBFajhJdllXVmpKQjhXaTVJWFdRWE8zSGRMVzYtWndUbjA&usp=sharing">here</a>
to get the full HD version package - you can print them in A1):</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9b35641cb6672dba2e38e88f0fcb98fd13cc89fa_01_gather-data-collage.jpg?auto=compress,format"></p>
<p>This is how we use the posters:</p>
<p><strong>Gathering data:</strong> Warming up the brains and sharing the perspectives via a timeline.</p>
<p><strong>Generating insights:</strong> What can we improve? Focusing on the elements the team has leverage on.</p>
<p><strong>Deciding:</strong> Comparing the insights and selecting the low hanging fruits.</p>
<p><strong>Improving:</strong> Building a plan.</p>
<ul>
<li><strong>Clarify the issue</strong> and bring a sense of urgency by quantifying the impact.</li>
<li><strong>Define a target and create desire</strong>, so that the team is excited and wants to fix the issue.</li>
<li><strong>Build a concrete plan</strong> consisting of baby steps (with owners) to get closer to this vision.</li>
</ul>
<p><strong>Monitoring:</strong> Do what we plan, check that problems are not occurring anymore and enact decisions.</p>
<p><strong>Closing:</strong> With feedback, we improve everything including the retro app and the facilitation.</p>
<p>If this sounds familiar to you, we were inspired amongst others, from here:</p>
<ul>
<li><a href="http://www.estherderby.com/books">http://www.estherderby.com/books</a></li>
<li><a href="https://hbr.org/2013/11/stop-worrying-about-making-the-right-decision/">https://hbr.org/2013/11/stop-worrying-about-making-the-right-decision/</a></li>
<li><a href="https://hbr.org/2014/10/the-most-innovative-companies-dont-worry-about-consensus/">https://hbr.org/2014/10/the-most-innovative-companies-dont-worry-about-consensus/</a></li>
</ul>
<h3><strong>Circle of facilitation</strong></h3>
<p>To sustain and scale, a great tool is a must have. Teams need support in their continuous improvement efforts, but a
tool is not sufficient without skilled people. Whilst building the retro app, we turned team members into retrospective
facilitators by explaining how to use the retro app, equipping them with moderation skills, providing them with markers,
and post-its (read more on retrospective facilitation
<a href="http://lmsgoncalves.com/2015/06/07/retrospective-facilitator/">here</a>). These facilitators, a key element of the scaling
strategy, are all members of different teams with different roles and accountabilities: they manage roadmaps,
projects, write code, test, and more.</p>
<p>They produce, and therefore they both want and need to contribute to their retrospectives. We have learned that
moderating is not always the best friend of contributing, so we introduced a facilitation circle in the form of a
rotation. Rotation has exciting side effects: facilitators move from team to team, connect, break silos, discover new
perspectives, check how other teams work, see what they are doing well and what they are struggling with. They can then
bring back that knowledge to their own teams, and use the insights to grow.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2c5795f8302f2f78c491a89336e83e6cc85179c6_retro-gathering.jpg?auto=compress,format"></p>
<p>At this stage, we have a system that is continuously improving, tackling problems, and reaching decisions. It is a small
system, but a scalable one without dependencies that is self-replicating and autonomous.</p>
<p>At one point scaling becomes easier. Something similar to viral marketing happens and our early adopters naturally pull
the early majority. Growing is one thing, sustainability is another. We observe that with every single practice: initial
enthusiasm slowly starts to drop off. We have learned that the keys to sustainability are a sense of urgency, desire and
maintaining high quality standards.</p>
<h3><strong>Sense of urgency and desire</strong></h3>
<p>Understanding <em>why</em> you are doing something is a great way to make it sustainable. Our team of Agile coaches failed a
couple of times here by pushing for a retrospective that worked in the beginning, but was not sustainable. Therefore,
when we receive a request, our first step is to clarify it: get to the root cause, interview all the stakeholders,
collect all the perspectives and manage their expectations. This clarification exercise sometimes already solves the
problems. Just having people together and providing them with a safe and efficient environment (trust, moderation,
whiteboard, mind map) for discussion and reflection results in decision, ideas and actions.</p>
<p>During these sessions we observe two phases: the first being <em>“negative”</em>, where teams share their pains and clarify
to all parties how urgent it is to process all the tensions. The second phase being <em>“positive”</em>, where teams
understand how to process the tensions and where they could be if these tensions were solved (also known as desire). We
observed that with a shared sense of urgency and desire, retrospectives are more likely to be sustainable.</p>
<h3><strong>Maintain quality aka supervision</strong></h3>
<p>We now have a high quality standard approach based on a great product (the retro app), and trained facilitators. But how
do we maintain this while scaling? Every tool can be misused, every facilitator can make mistakes, and you can’t plan
for everything. The strategy here was to enable feedback loops. Our Agile team set up a facilitator guild chat for fast
feedback and Q&A. On a monthly basis, facilitators talk about their retrospectives: what went well, what can be improved
and what issues they encountered. Then the whole process comes full cycle, and together we do a retrospective of our
retrospectives using the retro app!</p>
<h3><strong>What’s next?</strong></h3>
<p>After five months of work, we are in the middle of our journey. The innovators pulled the early adopters, who pulled the
early majority. Around 30 teams are now in our circle, exchanging facilitators, learning from each other and
improving efficiently. It is a cultural shift to a continuous improvement mindset. But is our work complete? Of course
not! We’re just halfway through our journey, and we will spend the next few months answering questions that arise, such
as:</p>
<ul>
<li>We got rid of the Agile coach as a facilitation bottleneck; how do we get rid of us as trainers for new teams?</li>
<li>Teams that have been using the retro app for months are starting to get bored with the repetitive steps. How do we develop new
activities, and who can do that?</li>
<li>We have a working retrospective approach and a network of facilitators… so what's next? Can we build something bigger?</li>
</ul>
<p>There are probably blind spots we have not discovered yet, can we solve them with the ingredients we have at the moment?</p>
<p>We will figure it out and let you know.</p>From Monolith to Microservices (Video)2016-02-10T00:00:00+01:002016-02-10T00:00:00+01:00Kave Bishogotag:engineering.zalando.com,2016-02-10:/posts/2016/02/from-monolith-to-microservices-video.html<p>Managing our growing Microservices Ecosystem.</p><p>The <a href="http://microxchg.io/2016/index.html">MicroXchange 2016</a> conference took place on February 4-5 in Berlin, bringing
together technologists from all around the world with a passion for microservices architecture. Our Head of Engineering,
Rodrigue Schaefer, took the stage to describe how Zalando engineers are taking a microservices approach to frontend as
well as backend through a rebuild of the company’s “shop”—the unit that includes 15 country-specific, customer-facing
websites.</p>
<p>Rodrigue’s talk explains how our 85+ engineering teams have managed to tackle the challenges that come with this
transition through a combination of organisational decisions, such as our approach to autonomous teams, and the development of
new tools, such as <a href="https://stups.io/">STUPS</a>, our open source platform as a service for using AWS and Docker. Find the
talk below for more insights.</p>
<p>Follow us on <a href="https://twitter.com/ZalandoTech">Twitter</a> to find out where we’re heading next!</p>
<p>More on Microservices:</p>
<ul>
<li><a href="https://tech.zalando.com/blog/data-integration-in-a-world-of-microservices/">Data Integration in a World of
Microservices</a></li>
<li><a href="https://tech.zalando.com/blog/video-scala-microservices-at-zalando/">Scala Microservices at Zalando</a></li>
</ul>Zalando’s Patroni: a Template for High Availability PostgreSQL2016-02-03T00:00:00+01:002016-02-03T00:00:00+01:00Oleksii Kliukintag:engineering.zalando.com,2016-02-03:/posts/2016/02/zalandos-patroni-a-template-for-high-availability-postgresql.html<p>A customized, high-availability PostgreSQL solution using Python and a distributed configuration store.</p><p>Last week members of Zalando’s database engineering team spoke at <a href="http://fosdem2016.pgconf.eu/">FOSDEM PGDay</a>, an event
hosted by the PostgreSQL community the day before the <a href="https://fosdem.org/2016/">FOSDEM</a> conference in Brussels. PGDay
was a big success, with a record number of attendees and lots of great talks. I presented on
<a href="https://github.com/zalando/patroni">Patroni</a>, Zalando’s Python-based PostgreSQL controller to provide automatic
failover functionality for PostgreSQL. Check out my slides below! You can find out more about Patroni and our other open
source projects on <a href="https://zalando.github.io/#repositories">our GitHub page</a>.</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/high-availability-postgresql-with-zalando-patroni" title="High Availability PostgreSQL with Zalando Patroni">High Availability PostgreSQL with Zalando
Patroni</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>
<p>If you’re heading to PGConf in Moscow this week, come say hi. And don’t miss the talk by Zalando Head of Data
Engineering Valentine Gogichashvili on <a href="https://pgconf.ru/en/2016/89648">Data integration in the World of
Microservices</a>.</p>Trust Instead of Control2016-01-28T00:00:00+01:002016-01-28T00:00:00+01:00Sheetal Josephtag:engineering.zalando.com,2016-01-28:/posts/2016/01/trust-instead-of-control.html<p>Making Security Awareness the Fun Factor in a Tech Organisation</p><p>We recently changed how we do security at Zalando, from a command and control mode, to enabling people to make the
right decisions and do the right thing at the right time. Our focus is now strongly on people, and we believe this
should be true in any organisation employing even the best technical solutions.</p>
<p><em>“Security is both a feeling and a reality. And they're not the same… security is also a feeling, based not on
probabilities and mathematical calculations, but on your psychological reactions to both risks and countermeasures.”</em>
- <a href="https://www.schneier.com">Bruce Schneier</a>, Cryptographer, Security Technologist and Author</p>
<p>Sometimes, security is reduced to our psychological reaction to a given situation. How we change this reaction can make
all the difference to a security program. From the cracking of the German Enigma cryptosystem, to the Stuxnet worm, to
last year’s much-discussed Ashley Madison hack, human error remains the one consistent contributor to most security
incidents. Though people can be the weakest link in security, they can also be trained to be its
greatest strength.</p>
<p>For these reasons, our security team at Zalando intends to focus on people and create a security mindset. To generate
security awareness in a meaningful and entertaining way, we’re creating more interactive and rewarding experiences with
our employees instead of subjecting them to trainings, videos and tutorials on security. We have termed these programs
our “Fun Factor.” We hope our engineers at Zalando will wholeheartedly enjoy these exercises that utilise their
engineering mindset of “breaking and fixing stuff.”</p>
<p>Here are some of the initiatives we have introduced:</p>
<h3>1. Internal Bug Bounty program</h3>
<p>The primary aim of this program is to get every engineer at Zalando to participate in the process of finding and
reporting security bugs in our internal and external systems, and get rewarded in the process! Apart from motivating
every employee to participate in security activities and thereby creating an awareness of what threats they themselves
could be susceptible to, the program also has a number of other benefits including:</p>
<ul>
<li>Increased awareness of security issues in our operating environment</li>
<li>Improvement of the company’s security posture</li>
<li>Reduction in the level of insider threats</li>
</ul>
<h3>2. The Security Champion Program</h3>
<p>In this program, we invite one person from every team at Zalando Technology to champion the cause of security. These
champions are the sentinels and security guides of their teams. We train champions on all concepts of security that
teams need to be cautious about while building products. Topics range from various credit card security requirements and
data protection laws, to security concepts regarding secure coding and secure design principles and even what to be
careful about when dealing with third parties.</p>
<p>We also encourage the security champions to collaborate with each other, exchange their thoughts on the day-to-day
security issues they face and create their own agendas for further action every week. As a company we benefit from this
because we empower teams with decision making capability from the very start of a project.</p>
<h3>3. Security “Capture the Flag” Contests</h3>
<p>We organised a very successful capture the flag hacking contest during the recent <a href="https://tech.zalando.com/blog/hack-week-4---the-video/">Zalando Hack
Week</a> in December 2015. Capture the Flag contests are an
excellent way to simulate a real-world hacking experience and to help participants understand security loopholes.
Participants can then use these learnings to build more secure products.</p>
<h3>4. Security Movie Nights</h3>
<p>We organise security movie nights where we showcase movies that have a focus on security. The aim is for people to come
together in a relaxed atmosphere with free beer and pizza, enjoy the movie and get an opportunity to directly interact
and speak with the Security Team throughout the evening.</p>
<h3>5. Security Workshops</h3>
<p>Security workshops are two-hour workshops given by experts in the Security Team on various hacking techniques like SQL
injection, cross-site scripting and the intricacies of an attack. Employees have shown a great amount of interest in
these topics, and we think this will again lead to safer coding practices.</p>
<p>We believe these programs will create an environment and mentality that is required to build products that are secure by
default. We want every engineer at Zalando to be a Security Superhero! We are curious to know what other companies are
doing in this regard, and look forward to collaborating with you to make security fun. Start the conversation with us on
<a href="https://twitter.com/ZalandoTech">Twitter</a>!</p>Oh Appy Day!2016-01-26T00:00:00+01:002016-01-26T00:00:00+01:00Zalando Technologytag:engineering.zalando.com,2016-01-26:/posts/2016/01/oh-appy-day.html<p>The revamp of our fashion store app is ready to install.</p><p>Mobile commerce is dominating globally. Earlier last year, <a href="http://ecommercenews.eu/key-figures-mobile-commerce-europe-revealed/">Ecommerce
news</a> wrote that during 2015 it was expected that
Europeans would spend around 45 billion euros via mobile devices, an increase of 88.7 percent from 2014. This
shows just how important it is for companies to develop a strong mobile strategy.</p>
<p>We are dedicated to being a mobile-first company, which is why we are constantly striving to create the best apps in a very
competitive market. Over the last quarter of 2015, our Zalando Apps Team has been revamping the official fashion store
app to provide customers with a more mobile-driven product.</p>
<p>The updated version is available now on both Android and iOS. The app now includes dedicated mobile-only content such as
videos, trending editorial features and scrollable lookbooks for a richer experience that does not just focus on
products, but also content that is relevant to the customer’s journey.</p>
<p>Head of Mobile Product and Engineering at Zalando, Nuzhat Naweed, said: “The update focuses around two things:
great experience and quality content. We want our customers to be inspired and discover all that Zalando has to offer,
as well as immersing themselves in the content that has been solely created for the app.”</p>
<p>Make sure that you update or download the app today on the <a href="https://itunes.apple.com/de/app/zalando-fashion-shopping/id585629514?l=en&mt=8">App
Store</a> and <a href="https://play.google.com/store/apps/details?id=de.zalando.mobile&hl=en">Google
Play</a>. Next month we’ll be featuring a deep dive
into the Zalando Mobile Team looking at how they built the app for 15 markets over the last quarter. Stay tuned for
more.</p>Hack Week #4 - the video!2016-01-20T00:00:00+01:002016-01-20T00:00:00+01:00Kave Bishogotag:engineering.zalando.com,2016-01-20:/posts/2016/01/hack-week-4---the-video.html<p>Highlights of a full week of Hacking at Zalando Tech.</p><p>First of all, we want to wish you all a Happy New Year from the team here at Zalando Tech! We can feel the excitement
and buzz in the corridors, and we are convinced that 2016 is going to be a thrilling, adventurous year full of
surprises.</p>
<p>But, before we get started, we wanted to reminisce a little bit on a special week in December, a.k.a <a href="https://tech.zalando.com/blog/hack-week-4-begins/">HackWeek
#4</a>.</p>
<p>It was our first international Hack Week, which brought together technologists from our Tech Hubs in
<a href="https://tech.zalando.com/locations/#berlin">Berlin</a>, <a href="https://tech.zalando.com/locations/#dublin">Dublin</a>,
<a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a>, <a href="https://tech.zalando.com/locations/#erfurt">Erfurt</a>,
<a href="https://tech.zalando.com/locations/#hamburg">Hamburg</a>,
<a href="https://tech.zalando.com/locations/#moenchengladbach">Mönchengladbach</a> and
<a href="https://tech.zalando.com/locations/#helsinki">Helsinki</a>, to work on projects of their choice. The goal of Hack Week at
Zalando is to drive and encourage a culture of innovation, but also to allow our technologists to live their passions at
work. Not only did teams think out of the box, but they threw the box away completely!</p>
<p>Hack Week #4 reminded us how much creativity and innovation can be achieved when individuals team up for a cause.
Teaming up also created a whole new atmosphere and experience for individuals with different backgrounds and skills,
working on projects with a common goal in mind. We loved every moment of it, and look forward to sharing some highlights
with you. Have a look below, and let us know on <a href="https://twitter.com/ZalandoTech">Twitter</a> what you think of our Hack
Week!</p>Reactive Design Patterns2016-01-19T00:00:00+01:002016-01-19T00:00:00+01:00Ha Linh Mia Truongtag:engineering.zalando.com,2016-01-19:/posts/2016/01/reactive-design-patterns.html<p>A Talk by Typesafe’s Roland Kuhn (Slides)</p><p>Last week Zalando’s Tech Hub in Berlin had the great pleasure of hosting a talk by Dr. <a href="https://twitter.com/rolandkuhn">Roland
Kuhn</a>: leader of Typesafe’s <a href="http://akka.io/">Akka project</a>, and coauthor of the book
<a href="https://www.manning.com/books/reactive-design-patterns">Reactive Design Patterns</a> and the <a href="http://www.reactivemanifesto.org/">Reactive
Manifesto</a>. For a standing-room-only crowd, Roland highlighted the importance of
making reactive software: of considering responsiveness, maintainability, elasticity and scalability from the outset of
development. He explored several architecture elements that are commonly found in reactive systems, such as the circuit
breaker, various replication techniques, and flow control protocols. These patterns are language-agnostic and also
independent of the abundant choice of reactive programming frameworks and libraries. Check out his slides below:</p>
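One of the patterns mentioned, the circuit breaker, can be sketched in a few lines of Python. This is an illustrative, language-agnostic rendering of the idea, not code from the talk, and the class and parameter names are invented:

```python
import time


class CircuitBreaker:
    """Minimal circuit-breaker sketch: after max_failures consecutive
    failures the circuit opens and calls fail fast until reset_timeout
    seconds pass, after which one trial call is let through (half-open)."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

By failing fast while open, the breaker shields a struggling downstream service from further load and keeps the caller responsive.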
<p><strong><a href="https://www.slideshare.net/ZalandoTech/reactive-design-patterns-a-talk-by-typesafes-dr-roland-kuhn" title="Reactive Design Patterns: a talk by Typesafe's Dr. Roland Kuhn">Reactive Design Patterns: a talk by Typesafe's Dr. Roland
Kuhn</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>Meet Connexion: Our REST Framework for Python2016-01-11T00:00:00+01:002016-01-11T00:00:00+01:00Joao Santostag:engineering.zalando.com,2016-01-11:/posts/2016/01/meet-connexion-our-rest-framework-for-python.html<p>Automagically handle your REST API requests based on Swagger/OpenAPI 2.0 Specification files in YAML.</p><p>In transitioning from a monolith to microservices architecture, Zalando Tech has adopted “ <a href="https://tech.zalando.com/blog/on-apis-and-the-zalando-api-guild/">API
First</a>” as one of our key engineering principles. API
First ensures that our APIs are RESTful, robust, consistent, general, and abstracted from specific implementation and
use cases. <a href="http://swagger.io/">Swagger</a>, a specification for describing REST APIs in a language-agnostic manner, has
become vitally important in our efforts to make API First a reality.</p>
<p>When our team tried to implement this principle for the first time, however, we faced some difficulty due to the lack of
related Python frameworks. Several frameworks produce a Swagger definition from an implementation, but none that we
found did the reverse. To fill the gap, my team and I recently developed Connexion: an open-source, REST framework for
Python, built on top of Flask and based on Swagger, and targeted for microservice development.</p>
<p>Connexion automagically handles request routing, OAuth2 security, request parameter validation and response
serialization based on a Swagger 2.0 Specification file in YAML. This eliminates the need to repeatedly write
boilerplate code across our microservices. Because it is based on Flask, Connexion supports everything that Flask does,
including deployment options and extensions. <a href="https://github.com/zalando/connexion">Visit our GitHub page to take a
look</a>.</p>
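A minimal Swagger 2.0 fragment gives a feel for how Connexion wires a spec to handler functions via <code>operationId</code>. The path, names and module (<code>app</code>) below are invented for illustration:

```yaml
# Illustrative Swagger 2.0 fragment for a Connexion-driven service.
swagger: "2.0"
info:
  title: Hello API
  version: "1.0"
paths:
  /greeting/{name}:
    get:
      operationId: app.get_greeting   # Connexion routes this to get_greeting() in app.py
      parameters:
        - name: name
          in: path
          type: string
          required: true
      responses:
        200:
          description: A personalised greeting
```

On the Python side, <code>connexion.App(__name__).add_api('swagger.yaml')</code> is enough to serve this spec, with <code>get_greeting(name)</code> implemented as a plain function returning a serialisable value.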
<p>In related news: <a href="http://swagger.io/introducing-the-open-api-initiative/">Go here</a> to learn about the new Open API
Initiative, which creates an open governance model around the Swagger Specification under the Linux Foundation!</p>Using Elm to Create a Fun Game in Just Five Days2016-01-07T00:00:00+01:002016-01-07T00:00:00+01:00Kolja Wilcketag:engineering.zalando.com,2016-01-07:/posts/2016/01/using-elm-to-create-a-fun-game-in-just-five-days.html<p>Learn about 404 Elm Street, our open-source browser game made with Elm.</p><p><a href="http://elm-lang.org/">Elm</a> is getting a lot of traction these days. It is a purely functional language with a strong
type system that compiles to JavaScript, so naturally we’ve been wanting to learn it! During our recent Hack Week, we
found the perfect opportunity after realizing that Zalando was in dire need of an awesome game on our 404 page. Using
Elm, we created <a href="https://zalando.github.io/elm-street-404/">404 Elm Street</a>: a browser game in which you play Joe, a
courier who delivers packages to Zalando customers and shuttles returns back to the company. Players have to plan Joe’s
route carefully to prioritize deliveries and pick-ups at top speed. It’s a challenge!</p>
<p>Among the four of us who developed 404 Elm St., one of us had <a href="https://github.com/w0rm/elm-flatris">used Elm to develop a Flatris
clone</a>, one was new to functional programming, and two were experienced functional
programmers eager to try Elm. Here’s how we got the job done while learning a new language on the fly.</p>
<h3>Day One: Basic Framework and Sprite Rendering</h3>
<p>At Zalando, we always try to put ourselves in our customer's shoes. With 404 Elm St., we wanted to put our customers in
our shoes for a change. After brainstorming a minimum set of features, our team got to work.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/805bc167467ec5eacb265685aca9020ff726975e_elm-street-2.png?auto=compress,format"></p>
<ul>
<li>Kolja (senior front-end engineer, ClojureScript enthusiast, illustrator) created the first animated sprites</li>
<li>Andrey (senior front-end engineer, author of the aforementioned Flatris clone) put the raw framework together and
the code to animate the sprites</li>
<li>Vignesh (senior front-end engineer) got up to speed with Elm</li>
<li><a href="https://tech.zalando.com/blog/building-our-own-open-source-http-routing-solution/">Arpad</a> (front-end engineer)
worked on an A* algorithm to generate a pathfinder for Joe</li>
</ul>
<p>At the end of the day, we had created a map featuring houses and an animated fountain.</p>
<h3>Day Two: A* Pathfinding FTW</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9f7cb0d6e9567159547c9a550bbf36203e0ca910_elm-street-3.png?auto=compress,format"></p>
<p>Arpad finished his work on the algorithm; Andrey coded a mock page for pathfinding that used SVG to draw a calculated
path; Andrey and Vignesh pair-programmed and came up with a code model and game logic; and Kolja sketched.</p>
<p>The results: A working pathfinding algorithm and a map with actual game objects (houses, warehouses, trees, fountains
and Joe).</p>
<h3>Day Three: Game Logic Focus</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6b647d547a3e9b5b67eee72cca281ac9c6fbd035_elm-street-4.png?auto=compress,format"></p>
<p>We rendered different items and made sure that Joe was mobile. Andrey continued working on the game logic and event
handling, and Kolja continued drawing. By day’s end, Joe could move around in eight different directions and follow
paths.</p>
<h3>Day Four: Putting Things Together</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/21ab04c8f3547a48206baabdbd9bacc1e6926a72_elm-street-5.png?auto=compress,format"></p>
<p>We <a href="https://github.com/zalando/elm-street-404">went open source on GitHub</a>! We also added click-handling and other
components, and Kolja finished drawing the customers.</p>
<h3>Day Five: Final Presentation</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1dda33602d0a18377ce1724af5b818d09b9ca239_elm-street-6.png?auto=compress,format"></p>
<p>We set up a booth at the Hack Week project fair and demonstrated the game for our colleagues. Thanks to Vignesh’s
implementation of continuous deployment, we were able to add features and fix bugs during the fair. When our booth
visitors refreshed the screen, they saw our improvements.</p>
<h3>Elm Discoveries</h3>
<p>Elm can be quite challenging, and not everyone on our team was able to understand it completely. Adhering to a purely
functional paradigm was something new for us, even if we’d previously written functional code in JavaScript. In most
cases, we could agree on a function’s signature and make it return a hardcoded value; then one of us would take care of
implementation. We also organized our data structures to reduce the scope of the functions. Instead of keeping multiple
lists of articles for each location (be it a delivery person, a warehouse or a house), we placed all articles in one
list and used location as a tag. This allowed us to quickly iterate over and filter out the articles that we needed,
instead of collecting them from different locations.</p>
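The single-list-with-location-tag idea translates roughly like this. A Python sketch with made-up names, since the game itself is written in Elm:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    name: str
    location: str  # e.g. "warehouse", "courier", or a house id

# One flat list of all game articles, tagged by where they currently are.
articles = [
    Article("boots", "warehouse"),
    Article("scarf", "courier"),
    Article("hat", "house-3"),
    Article("jacket", "courier"),
]


def articles_at(location, items):
    """Filter the single article list by its location tag."""
    return [a for a in items if a.location == location]
```

Moving an article between locations is then a single field update on one record instead of a delete from one list and an insert into another.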
<p>Coming from JavaScript backgrounds, our team had to adjust to Elm’s immutability. Some of us even wrote functions that
compare records by checking the equality of each field, when a simple “==” operator was enough. Immutability actually helped us
a lot in linking game objects together; a game object effectively became its own foreign key inside another object. One
thing we learned to keep in mind, however, was that changing an original object necessitated updating all of its copies.</p>
<p>Finally, we learned that our algorithm really needed to be streamlined, or else the code would turn out ugly and
complicated. Elm makes it obvious what needs to be refactored or rewritten in the future.</p>
<p>Once we’d established all the necessary primitives around the <a href="https://github.com/evancz/elm-architecture-tutorial/">Elm
architecture</a>, our productivity increased quickly. Using
JavaScript would have required setting up the boilerplate code of Webpack, Babel etc., and would have stolen much more
time.</p>
<p>The Elm compiler helped us greatly — allowing us to toss code around quickly, rename functions, and change types, while
remaining confident about the results. The compiler also helped us resolve a cyclic dependency issue by showing <a href="https://twitter.com/krisajenkins/status/677572860915941376">nice
ASCII art</a> of the dependency loop.</p>
<h3>Conclusion</h3>
<p>We had a great time making the game, and are still improving it; <a href="https://github.com/zalando/elm-street-404/issues">go
here</a> to contribute, share issues, and send pull requests. Some of our
plans:</p>
<ul>
<li>create “Start” and “Game Over” screens</li>
<li>add sound (if you’re a sound engineer, drop us a line at andrey.kuzmin at zalando dot de!)</li>
<li>put this thing on an actual 404 page!</li>
</ul>
<p>We’re also continuing our work with the Elm language, and have created a guild that will allow us to pursue things
further.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7076d864a6ff9badd6315f4acc01a2b9556e8ec5_elm-street-team.jpg?auto=compress,format"></p>Mobile Trends for 20162015-12-22T00:00:00+01:002015-12-22T00:00:00+01:00Kristina Walcker-Mayertag:engineering.zalando.com,2015-12-22:/posts/2015/12/mobile-trends-for-2016.html<p>An overview of mobile trends for the coming year, and how Zalando is tackling these</p><ol>
<li>Shopping straight from social media</li>
</ol>
<p>Social platforms, such as Instagram, Facebook and Pinterest, came up with the “Buy Button” this year so users can shop
selected items without leaving the respective social platform. Mobile users spend most of their time in social apps:
according to Nielsen, around 85% of it. This offers customers a convenient way to order their favorite
items straight away.</p>
<p>In general, we assume that the mobile landscape will become more consolidated next year: the mobile leaders (social
platforms, messenger apps, etc.) will take over most of the consumer’s mobile moments. For brands, this means they
have to hook up with big players in order to keep up with their customers, while still providing their own mobile services and
solutions.</p>
<p>Read more <a href="https://tech.zalando.com/blog/how-zalandos-app-makes-instagram-images-shoppable/">here</a> about how we made
Instagram shoppable within our Zalando fashion store app.</p>
<ol start="2">
<li>Creating mobile-first experiences is crucial for companies.</li>
</ol>
<p>For even more companies, mobile has become a core pillar of their global strategy in 2015, and this will only grow
further in 2016. Mobile is not only a channel, but a core strategic component for every step of the shopping lifecycle.
The conversation has changed in boardrooms from: “Can we afford to get into mobile?” to “We can’t afford not to get into
mobile”. Only 20% of companies were creating mobile-first experiences in 2015, but this will change in the new year.</p>
<p>Mobile is already seen as an important engagement tool for the personal customer relationship. Brands should expect
consumers to become more empowered and, therefore, demand instant gratification, whether through up to date content or
purchases.</p>
<p>Employees need to understand that the shift from desktop to mobile is happening faster than most can imagine.
The culture of product development needs to shift from desktop-first to mobile-first. This cannot be achieved by a
single mobile team alone; the entire company needs to be part of the shift.</p>
<p>Read more about what we did to drive the Mobile Mindshift within Zalando
<a href="https://tech.zalando.com/blog/why-zalando-is-celebrating-mobile-first-day/">here.</a></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ad16de398feb11016884b20ccdb29d32f6dc283a_shutterstock_169338014.jpg?auto=compress,format"></p>
<ol start="3">
<li>Internet of Things and Smart Objects</li>
</ol>
<p>In the future, the average household will have several smart appliances, such as light bulbs that can be switched on and off
via smart devices, heating controls, sports and health tools, and toys that can be controlled via apps.</p>
<p>Wearables, such as smartwatches and Smart TVs, were a big topic this year. But these are only a small fraction of the
wide selection of smart objects and connected household devices. Larry Page mentioned earlier this year: “We are no
longer in a mobile-first world, we are in a mobile-only world.”</p>
<p>In 2016, we can expect the Internet of Things and the mobility trend to peak with new offerings that will connect
devices to a vast array of gadgets, vehicles and further equipment.</p>
<p>Read more <a href="https://tech.zalando.com/blog/what-we-learned-while-making-zalandos-apple-watch-app/">here</a> on what we learnt
whilst developing Zalando’s Apple Watch App</p>
<ol start="4">
<li>Mobile Payment</li>
</ol>
<p>In 2016, it can be expected that the number of mobile payment opportunities will increase. People are becoming more
comfortable paying via mobile devices, which will only become easier and safer. We’re currently facing a big shift in
financial transactions in the digital sphere. Swiping, waving or tapping are just some examples of gestures that can be
used to pay with your handset.</p>
<p><a href="https://corporate.zalando.com/en/zalandocouk-launches-apple-pay-and-my-returns-0">Here</a> you can find out more on how we
launched Apple Pay in our app in the UK as one of the first Apple Pay partners in Europe.</p>
<ol start="5">
<li>Mobile will transform more businesses & industries</li>
</ol>
<p>Mobile services, such as UBER, Shazam, Google Maps or local transportation apps have improved the customer-experience in
so many ways, transformed entire industries and come up with completely new customer propositions. Remember those times
when you needed to call a cab, but you didn’t have the number of the taxi operator at hand, or a clue where you were?
Or you heard a song on the radio, and you desperately wanted to find out what it was?</p>
<p>Mobile plays a crucial role in transforming the customer experience, and drives business benefits such as new
revenues and cost savings, as well as customer engagement, increasing loyalty through satisfaction and delight.
Context is essential to a fully personal and seamless experience: the user’s time, location and situation
need to be taken into account to fulfill the consumer’s needs.</p>
<p>Watch the <a href="https://www.youtube.com/watch?v=fFXvjUrc-Wk">video</a> on how we want to reshape the future of e-commerce at
Zalando.</p>
<p>What do you think about our mobile trends for next year? Share your comments below.</p>
<p>Tweet the author: @mobilegeekgirl</p>Hack Week #4: Let’s Talk About Code, Baby!2015-12-21T00:00:00+01:002015-12-21T00:00:00+01:00Claudia Thiniustag:engineering.zalando.com,2015-12-21:/posts/2015/12/hack-week-4-lets-talk-about-code-baby.html<p>How to integrate non-devs into our Zalando Tech world.</p><p>Zalando’s many tech-driven projects and initiatives — from logistics to customer care to marketing analysis — are
cross-departmental. To help our thousands of non-tech employees better understand our many tech topics and terms, one of
our Hack Week teams created a training program called Let’s Talk About Code, Baby. The project aims to help us all to
better understand each other and create a more collaborative environment.</p>
<p>To create Let’s Talk, the team prototyped a virtual tour of our nine-story tech headquarters in Berlin and developed a
modular training concept. Team members conducted interviews and surveys with non-tech departments to find out which
types of information were most essential and relevant; a first glance at the results suggested four primary modules:</p>
<ul>
<li>Welcome to the Tech World: general tech info, from “how many bits fit in a byte?” to “how to write easy SQL
statements”</li>
<li>Zalando Technology: our teams, people and collaborations</li>
<li>Tech communication: a glossary of tech terms and internal lingo</li>
<li>Learn to code: an online tutorial</li>
</ul>
<p>For Matthias Noll, Inhouse Trainer for Logistics, Hack Week was like an intense, five-day workshop on gathering new
insights into different training methods. He also got to hone some of his technical skills. “I really liked the Design
Thinking approach that our team used and think it was very effective,” he says. “We must have gone through at least 200
Post-its.”</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/430c9264720e13ab4728666dc031c39feb109785_pasted-image-0.png?auto=compress,format"></p>
<p>Although Let’s Talk About Code, Baby is for Zalando’s internal use, some of the modules could eventually prove
meaningful for external user groups: for example, refugees who would like to learn to code might find it helpful. We’re
eager to see the final product!</p>Hack Week #4: From Dublin to Dortmund2015-12-18T00:00:00+01:002015-12-18T00:00:00+01:00Humberto Coronatag:engineering.zalando.com,2015-12-18:/posts/2015/12/hack-week-4-from-dublin-to-dortmund.html<p>Hack Week is like a big reunion with your friends.</p><p>Hack Week #4 is the first for most of us who work in Zalando’s Fashion Insights Centre in Dublin, which opened earlier
this year. Our crew has heard a lot about how fun and creative past HWs in Berlin have been; with all the stories of
<a href="http://thespaceshoe.com">space shoe launches</a>, parties, and other historic moments capturing our imaginations, it’s no
wonder that all of our lunch conversations over the past few weeks have focused on our hacking plans. So when the day
finally came for the 14 of us from Dublin to fly to
<a href="https://tech.zalando.com/blog/hack-week-4-dortmundwillhackthis/">Dortmund</a>, we were all very happy.</p>
<p>What happens when you bring together 100 Zalandos from seven different tech offices and give them almost unlimited
resources and freedom to build whatever they want, however they want, and with whomever they want? Well, pretty much
everything and anything! HW feels like an early holiday gift for the hacker inside each of us. Close to 20 different
projects are underway in Dortmund: Some teams are busy making contributions to open source, one is studying better
alternatives to streaming, and others are building new ways to reach our customers. We also have some good-looking
hardware hacks in progress.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7afdc96bec552760219f6ed278fcd86290f13752_image-18-12-15-at-11.00.jpg?auto=compress,format"></p>
<p><em>“Hack Week is like a big reunion with your friends — you look forward to it all year long. This year’s HW hit it out of
the ballpark.”</em></p>
<p>The environment here is amazing! People are really excited about what they are building, working really hard, and
spontaneously applauding whenever one of us reaches a new milestone. Getting to work with colleagues from so many other
different locations and teams is very inspiring. We all have very different backgrounds and experiences, but when we sit
down and start building something, we start learning together … and our creativity bursts! Most of us had never met each
other before, but started collaborating online a few weeks ago: forming teams, pitching and planning our project ideas
and talking to other teams. After about ten minutes of hacking together, my team members and I were already friends.</p>
<p>Besides working on our projects, we’re spending our time becoming rock band legends, exploring Dortmund, and playing
games (we even have our own piñata). I am also a judge for the Inclusion Ninja Award (read more
<a href="https://tech.zalando.com/blog/hack-week-4-awards/">here</a>), so I’ve been reaching out to many different teams to ask
them about their work.</p>
<p>It’s hard to believe that the week is nearly over, but I’m already looking forward to next year’s Hack Week.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/38532cf66d29bae7373e18cff3bac4ed66bf413c_image-18-12-15-at-11.01.jpg?auto=compress,format"></p>
<p><em>A sneak peek at a project that combines hardware-hacking with light Graffiti.</em></p>Hack Week #4: Hacking for Social Good2015-12-18T00:00:00+01:002015-12-18T00:00:00+01:00Kave Bishogotag:engineering.zalando.com,2015-12-18:/posts/2015/12/hack-week-4-hacking-for-social-good.html<p>How Hack Week helps to address social needs.</p><p>Zalando technologists use their skills and expertise for <a href="https://tech.zalando.com/blog/doing-data-science-for-social-good/">social
good</a> all year round — including <a href="https://tech.zalando.com/blog/one-last-thing-before-we-call-it-a-year-hack-week-4/">Hack
Week</a>. Some of our teams have been
spending the past week working on projects that address particular social needs. Let’s take a look at a few of them!</p>
<p><strong>“The Refugee Clothing Points” App</strong></p>
<p>Clothing is one of the greatest needs faced by the large number of refugees entering Germany from the Middle East. This
mobile app points refugees to the nearest distribution location and provides them with listings on available clothing,
including type of garment (shirt, pants, etc.) and intended recipient (man, woman, or children). It also connects
Zalando customers who wish to make donations to nearby distribution points. The app will be available in two main
languages, English and Arabic, and refugees will be able to search for what they need even without WiFi.</p>
<p><strong>“ZENgage”</strong></p>
<p>Developed by one of our <a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a> teams, ZENgage is a website that
matches non-governmental organizations (NGOs) in Germany with people who are looking for volunteering opportunities. To
find the right fit, ZENgage asks aspiring volunteers to create user profiles listing their skills and experience, which
will automatically match them with an NGO that fits their profile and, more importantly, one that needs their expertise.</p>
<p><strong>Second Fit</strong></p>
<p>This website matches Zalando employees and customers with NGOs that provide services — food, toys, clothing, even
cosmetics — to disenfranchised people living in Berlin.</p>
<p>All of these are candidates for our <a href="https://tech.zalando.com/blog/hack-week-4-awards/">Hack Week Do Award</a>, which
recognizes projects built for social good. Whether they win or not, most of the teams will continue working on their
projects in the new year.</p>Hack Week #4: Turn it up to Eleven2015-12-18T00:00:00+01:002015-12-18T00:00:00+01:00Selina McCarthytag:engineering.zalando.com,2015-12-18:/posts/2015/12/hack-week-4-turn-it-up-to-eleven.html<p>Hack Week is not all about projects and hard work. Check out what else goes on during Hack Week here:</p><p>Hack Week is not all about coding, innovating and working hard. Throughout the week we’ve hosted parties and side events
to help the teams really let their hair down and relax. The theme of <a href="https://tech.zalando.com/blog/one-last-thing-before-we-call-it-a-year-hack-week-4/">Hack Week
#4</a> is Rockstarz, so we’ve
organized lots of Rock n’ Roll-themed activities.</p>
<p>Since Monday, our Rockbandz competition has run daily at lunchtime in
<a href="https://tech.zalando.com/locations/#berlin">Berlin</a> and <a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a>. Each
team includes a guitarist (or two), a lead singer and a drummer, who compete on instrument simulators. The competition
began with seven bands, and today the final two will compete for the ultimate title of Champion Rock Band!</p>
<p><img alt="Rockbandz!" src="https://images.prismic.io/zalando-jobsite/f81a31684b1442e19e44278b18741a6b61a44b85_rockbandz.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e216e9d6d07eeb29d2dc138ee187a6cee2c25b98_rockbandz-dortmund.jpg?auto=compress,format"></p>
<p>There are two other ongoing championships that end today: Helldiverz, a shooting game with challenges that have gotten
progressively harder over the week; and Rally Mario Da Kartz, a “Mario Kart on Speed” that spans all 32 original tracks
from Mario Kart 8 — but with the CPU set to “Hard” and the items set to “Frantic.”</p>
<p>Yesterday we hosted a Mini Rockerz event for Zalando Tech parents to bring their youngest mini-me’s to the office for
the day:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2c3e7e3a164ad04eda0f0b34b7fd3b9c6e048e0a_img_1479.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d9546e83e0fce3f5616500a6b7695e4ed2541fcf_tonyhauptphoto-zalandohackweek-wednesday16122015_71.jpg?auto=compress,format"></p>
<p>Parents in Berlin also had the chance to bring slightly older mini-me’s to our Mini Hackerz event, where a team of
Zalandos taught them some basic coding skills:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c3193ad0df8a1555bfaee16737c1af2a1bbaf622_img_1517.jpg?auto=compress,format"></p>
<p>On Tuesday, our tech execs — including SVP of Technology Philipp Erler and VP Engineering Eric Bowman — took a turn as
Chefs du Jour and served waffles and mulled wine to the teams:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d40d514c24f2a3d616140fbead324062eee2023e_sam_7786.jpg?auto=compress,format"></p>
<p>Wednesday’s fun included rock-band Jamz and hard rock Karaoke. Three of our homegrown Zalando bands took to the stage to
entertain the masses, who dined on hot dogs and <a href="https://en.wikipedia.org/wiki/Dark_%27N%27_Stormy">Dark 'N Stormies</a>:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4d4ce73e346377e339e0fa3ca66944a17e26e927_tonyhauptphoto-zalandohackweek-wednesday16122015_44.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c26b75ef12aac753f602c8545cf3b729180f4809_tonyhauptphoto-zalandohackweek-wednesday16122015_35.jpg?auto=compress,format"></p>
<p>Piñatas, rock-themed movie nights, pool and table-tennis competitions, and lots of beer and hanging out completed the
Hack Week experience for many of us:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7d78eca2184051160451c1fbac261b6861627934_dsc_3985.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6a9368b81f8a042f4ad45951b1c5d2b21e3b7306_2015-12-17-um-11-43-37.jpg?auto=compress,format"></p>
<p>Tonight’s award ceremony will lead into a final office-wide celebration for 2015. We owe a massive thank you to the
Zalando Hack Week organizers for making the week so incredibly cool. Rock on, dear organizers!</p>Hack Week #4: Zinder2015-12-18T00:00:00+01:002015-12-18T00:00:00+01:00Petteri Lappalainentag:engineering.zalando.com,2015-12-18:/posts/2015/12/hack-week-4-zinder.html<p>Zalando - Connecting people!</p><p>Seven members of Zalando’s Helsinki team (“Zelsinkis”) meet at the Helsinki airport on a Sunday afternoon. It doesn’t
take long for the first challenge to hit us in the face — or feet, I should say. Our airline decides to change our
departure gate, and we’re on the other side of the airport. Gotta love it. Got to RUN!</p>
<p>Our team — five engineers, one <a href="https://tech.zalando.com/jobs/tech/75772-producer/">producer</a> (that’s me!), and one team
assistant/community manager — safely arrive in Dusseldorf and start looking for info on how to find the right train to
Dortmund, and how to purchase our tickets. Past visits to Germany have taught us to bring cash along; using credit cards
everywhere remains a utopian fantasy here. We eventually reach the train, which is packed, and after a few hours arrive
safely to Dortmund. Let the hacking begin!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/efb264b4e2ad61959234fca7b7a16a1e2b1a1fd5_zinder.jpg?auto=compress,format"></p>
<p>My team and I have been spending our Hack Week working on Zinder: an iPhone application that takes all of the
information from our <a href="https://github.com/zalando/cube">internal tech personnel database</a>, converts it into user
profiles, and then shows the information in a Tinder-like format. If you like someone’s profile and they “like” you
back, you can start messaging each other instantly via HipChat.</p>
<p>To create Zinder, we put our internal personnel information in Redis and used Scala and Play Framework on the backend.
Our team includes two backend engineers (one of whom remained back in Helsinki), one iOS developer, and me. It’s been a
few years since I last wrote code, so I took up graphics design duty and generated our icons, logos and other assets. I
also learned a lot about <a href="https://developer.apple.com/xcode/">Xcode</a>, the iOS developer environment. On our second hack
day we received a lot of help from a veteran member of our Berlin tech team, who helped us create our login feature and
advised us on infrastructure. Beta testers from our Dortmund, Helsinki, and Dublin tech hubs are helping us to find and
fix issues.</p>
<p>For now, the purpose of Zinder is to help our colleagues have some fun and put names to faces. If we decide to formally
propose it for internal adoption, we’ll carefully rethink its concept and UX. Our team has talked about adding
additional features to help Zalandos find colleagues and teams with particular skills or interests.</p>
<p>Hack Week has been a great success for me personally: From Helsinki to Hack Week, from producer to UX designer, from
facing a challenge to overcoming a challenge. Like Bill and Ted, I’ve just had an excellent adventure. And so much
fun!</p>Hack Week #4: #DortmundWillHackThis2015-12-17T00:00:00+01:002015-12-17T00:00:00+01:00Vivi Brooketag:engineering.zalando.com,2015-12-17:/posts/2015/12/hack-week-4-dortmundwillhackthis.html<p>How Hack Week hit Dortmund.</p><p>With our transformation from fashion retailer into <a href="https://tech.zalando.com/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">fashion
platform</a>, our
amazing business success, and our implementation of <a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a>, Zalando Tech has grown
exponentially in 2015. Nearly 900 technologists work from <a href="https://tech.zalando.com/locations/">seven different
locations</a>: Berlin (tech headquarters), Dortmund, Mönchengladbach, Erfurt, Hamburg,
Dublin and Helsinki. With such a huge team, we’ve had to open a second Hack Week location for the first time in Hack
Week history. We picked our shiny new office in <a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a>, with
colleagues from other hubs either joining in person or working remotely.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/18fb9bae3f92d77e29dce24ce7c70355e1fd5236_img_3285.jpg?auto=compress,format"></p>
<p>This week our 60 Dortmund colleagues have been hosting 100 Zalandos from our various hubs (including Berlin) — creating
a vibrant international atmosphere in which creativity, innovation and (most importantly) fun can be unleashed!
Dortmunders have welcomed us (I’m from the Helsinki office) with open arms, and our multicultural teams are all busy
hacking away on more than 15 different projects. After a hard day’s hacking, we continue hanging out together — playing
pool, partying at the local holiday market, and letting loose. This unique atmosphere has really brought about genuine
innovation and helped us forge strong bonds. An unforgettable week for all involved. Dortmund hackers will rock on!!!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3b00294e014332c66127932c0d67a3346a1a330d_img_3230.jpg?auto=compress,format"></p>
<p>As one of Zalando Tech’s most important and highly anticipated events of the year, Hack Week always gets its own theme:
this year’s is “Rockstarz.” Rock band competitions, crazy karaoke, rock’n’roll-themed movie nights, and other awesome
side events have pumped us up and kept our energy flowing. Dortmund’s multicultural, diverse and tightly-knit
environment has inspired our teams to produce some really excellent projects that we fully expect will win some
<a href="https://tech.zalando.com/blog/hack-week-4-awards/">awards</a> at tomorrow’s ceremony.</p>Hack Week #4: Hack Week How-To2015-12-17T00:00:00+01:002015-12-17T00:00:00+01:00Carina Kuhrtag:engineering.zalando.com,2015-12-17:/posts/2015/12/hack-week-4-hack-week-how-to.html<p>Tips From Our Organizers</p><p>Five days, 100 projects, 900 participants: Hack Week #4 is our biggest and most international edition yet! I asked two
members of our organizing committee (aka “the Orga Crew”) — Ellen Nagel (Manager Executive Projects and Culture) and
Bastian Gerhard (Head of Innovation & Enablement) — how to throw a successful Hack Week of this scope and size. If you’ve
ever thought about organizing your own similar event, read on.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5596cfdda26674c83ed7e9f06f166b48bf74d3cc_ellenandbastian.jpg?auto=compress,format"></p>
<p><strong>Carina: How many Hack Weeks have you organized?</strong>
<strong>Ellen:</strong> This is #4 for Bastian and me.</p>
<p><strong>Carina: Where did the idea originally come from?</strong>
<strong>Ellen:</strong> In 2013, we wanted to create a fun event that would drive innovation and improve company culture. We saw what
<a href="https://www.youtube.com/watch?v=jrrCSnOFyaY">Dropbox</a> and other companies were doing, got inspired, and decided to give
it a try. Luckily, our SVP of Technology Philipp Erler was a big fan of this idea.
<strong>Bastian:</strong> We wanted to pull our technologists out of their comfort zones and see what they were passionate about — see
if that passion would lead to some undiscovered, cool ideas. Keep in mind that in 2013, Zalando was moving very quickly.
Hack Week required letting 360 engineers work on something besides their usual work for a whole week. The risk of doing
that was high.</p>
<p><strong>Carina: Was it difficult to convince the Management Board of Hack Week’s potential value?</strong>
<strong>Ellen:</strong> Not at all, because Philipp pushed the idea. The management board’s support was the biggest gift. With HW
#1, it was a bit unclear what our exact expectations were — not only from top management, but from everybody. What
would happen with all the projects afterwards? How much guidance would the teams need? Was it worth it to pull 360
employees out of their daily work for an entire week? It was an incredible investment. But we got all the support we
needed.
<strong>Bastian:</strong> There were certain things we didn’t get budget for; looking back, this was advantageous. Initially we
thought, for example, that we’d need external coaches for the project teams. In the end, the teams were completely
autonomous and capable. We would have never found this out if we had hired externals.</p>
<p><strong>Carina: Zalando remains a startup at heart, which sometimes requires making big things happen on short notice and
small budgets. Does this present a challenge?</strong>
<strong>Ellen:</strong> Sure, but in the end it has many positive side effects. It forces us to be creative.
<strong>Bastian:</strong> Often you just need to transfer ownership, distribute the work. Colleagues organize their own side events,
the organizing crew builds the awards instead of buying trophies, top management makes the intro videos and serves
snacks. It helps us to stay a bit humble.</p>
<p><strong>Carina: For HW #1, how did you tackle organization and figure out what you needed to do?</strong>
<strong>Ellen:</strong> We broke it down into smaller building blocks and leveraged the knowledge we’d gained from either setting up
or attending hackathons and other tech events. We asked ourselves, “which of these building blocks do we need to make
Hack Week a success?” One was branding: We needed some video production, a cool logo design, awards.
<strong>Bastian:</strong> In the typical “data-driven” Zalando manner, we also thought about KPIs: What would be the output — number
of projects, return on investment — and outcome? How many Twitter followers would we gain? For HW #1, we created a
business case with a huge document full of concepts. By HW #2, we realized this wasn’t necessary and focused more on
the content. But I think that huge document was necessary for us to define scope.
<strong>Ellen:</strong> We experimented a lot. With everything.
<strong>Bastian:</strong> Oh yes. And it was really chaotic, and we needed to improvise a lot.</p>
<p><strong>Carina: How does the HW committee work together?</strong>
<strong>Ellen:</strong> For HW #1 we were only four or five people, and we didn’t have roles. It made things a bit chaotic. Now the
committee includes 20 people who focus on autonomous work streams. Every workstream starts with a simple project charter
created by the respective team members, who collaborate on identifying their goals and purpose and aligning on
requirements, scope and deliverables. There are many interdependencies between the work streams, so they have to
constantly communicate and make sure that they are on the same page. But we’re there to give them a safety net and
guidance. (The HW#4 team is doing a fabulous job — big thanks to all of them!)
<strong>Bastian:</strong> We encourage the Orga Crew to come up with their own ideas. We don’t dictate anything, or even give them
budget estimations. We want them to think big, not focus on budget. We often find that we need to realize their ideas
with a quarter of the budget though.
<strong>Ellen:</strong> Our teams don’t let budget restrictions stop them — they find creative solutions to make their projects
possible. That is what Zalando Tech culture is about.</p>
<p><strong>Carina: HW seems like a lot of work. Do you organize it on top of your daily duties?</strong>
<strong>Ellen:</strong> HW is part of our job, so it’s our daily work already. This year our Crew included several newbies, so I
think it felt a bit overwhelming for them at times. But if you identify with the project and want to achieve something
great and big, it’s an amazing opportunity. The team really loved preparing and wanted to make everything perfect.
That's why they stayed till late last Friday, decorating the office.
<strong>Bastian:</strong> I totally agree. This project lives and dies with the passion of the employees.</p>
<p><strong>Carina: It’s mid-December. When did you start preparing?</strong>
<strong>Bastian:</strong> At the beginning of Q4.
<strong>Ellen:</strong> Yes, around October. We need 2.5 to three months lead time. We have bi-monthly alignment meetings and the
workstreams start working in parallel.</p>
<p><strong>Carina: What is the first thing you need to decide?</strong>
<strong>Ellen:</strong> The motto, because a lot of things evolve from that — especially decorations and side events.
<strong>Bastian:</strong> The motto unifies the whole event and gives it a stronger identity; it makes it more memorable. It sticks,
during and afterwards. And with a motto, every Hack Week feels new.
<strong>Ellen:</strong> A motto is a good framework for branding not only the event itself, but also for the teams — they love to
refer to it in their project presentations. What I really love is to walk around our office and seeing all these
artifacts of past Hack Weeks everywhere.</p>
<p><strong>Carina: Why should companies conduct Hack Weeks?</strong>
<strong>Ellen:</strong> From a cultural perspective, it’ll strengthen your employees’ identification with your company and its brand.
Giving your teams five days and the necessary tools to work on a project that’s really fun for them is a big compliment
to them — it’s a tremendous show of appreciation. Additionally, it gives your employees a great opportunity to network
outside of their own teams. And your office space becomes really personal: Traces of every HW remain, and teams really
love that. No interior designer in the whole world could build such a personalized team space.
<strong>Bastian:</strong> It creates a strong team spirit. I mean, last year one team launched a <a href="https://tech.zalando.com/blog/we-launched-it-the-zalando-space-shoe-video/">shoe into
space</a> — that’s something they will tell
their grandchildren. People can experiment, try out the newest technologies, and by the end of the week they have
achieved something through teamwork; I think this is the biggest benefit. Another reason is that Hack Week is a good
source of bottom-up innovation.
<strong>Ellen:</strong> After HW #1, we were surprised by how many projects had a real business case or true customer focus. We
never told the teams that a project needed to offer business value for Zalando, or that it had to solve a customer
problem, but in the end this was often the case. I think it’s important to not reject any project idea — just let the
teams work on whatever they want. We actually only rejected a project once: because the team wanted to build a swimming
pool in our Sky Lounge!</p>Hack Week #4: The Knitting Machine2015-12-17T00:00:00+01:002015-12-17T00:00:00+01:00Claudia Thiniustag:engineering.zalando.com,2015-12-17:/posts/2015/12/hack-week-4-the-knitting-machine.html<p>How to turn an old knitting machine into a yarn-based printer.</p><p>Buy an old knitting machine from the 1980s and turn it into a yarn-based printer! This is what one of our Hack Week
teams has done — producing the latest knitted fashions, geek-style! Their key ingredients: some online shopping luck, a
few tutorials on how to operate the machine, an <a href="https://www.arduino.cc/en/Main/ArduinoBoardUno">Arduino Uno</a>, and the
open source hard- and software from <a href="http://ayab-knitting.com">All Yarns Are Beautiful</a>. Check it out!</p>Hack Week #4: Building the Best Conference App2015-12-16T00:00:00+01:002015-12-16T00:00:00+01:00Hayley Baldwintag:engineering.zalando.com,2015-12-16:/posts/2015/12/hack-week-4-building-the-best-conference-app.html<p>How we are building a high-performance conference app during Hack Week.</p><p>Nowadays it’s common for tech conferences to offer their own event-specific apps with schedules, location information,
and other essential details. While convenient, these apps are often sub-optimal — buggy and slow, even for the most
prestigious tech events. So with this in mind, one of our <a href="https://tech.zalando.com/blog/hack-week-4-begins/">Hack Week</a>
teams is aiming to build a high-performance event app for Zalando.</p>
<p>“We want to redefine Zalando fashion, music and tech events of the future,” says Zalando Community Manager Joanna
Buchmeyer.</p>
<p>Developing the app required Joanna and the team to understand how attendees physically move through conferences and
relate to each other. While fashion is <a href="https://tech.zalando.com/blog/how-zalandos-app-makes-instagram-images-shoppable/">very much
online</a>, fashion events still take
place in the three-dimensional world. To solve these puzzles, the project team includes test engineers, UX designers,
two frontend engineers, and members of the Zalando Tech community team, experiential marketing and <a href="https://tech.zalando.com/blog/zalando-opens-new-playground-for-tech-innovation/">Innovation
Lab</a> team.</p>
<p>Vivien Leung gives us a look into the concept that started the project:</p>
<h3>Building the App</h3>
<p>Zalando frontend engineer Ahmed Sayeed Abbas explains that the development team is using React Native to show that it’s
possible to build — in less than a week — a production-ready app that applies one logic for both Android and iOS. “The
community support is really strong for <a href="https://facebook.github.io/react-native/">React Native</a>,” he says, “so we get a
fast response to questions and issues we might have.”</p>
<p>As for features and usability, the app addresses the fashion world’s emphasis on personal brand building and networking
by facilitating connections between people and opportunities. Some of its features include:</p>
<ul>
<li>scheduling, which event organizers can live-update</li>
<li>scheduling for users</li>
<li>making reservations so users can add themselves to guest lists for satellite events, exhibits and shows</li>
<li>a built-in, user-curated social engagement view</li>
<li>connectivity to third-party applications like Twitter and Instagram</li>
<li>beacon integration for offers, prizes, and announcements</li>
</ul>
<p>The overarching idea is to mobilize, inspire and engage attendees while helping them to get the most out of their
conference experience.</p>Hack Week #4: Onboarding Goes Hack Week!2015-12-16T00:00:00+01:002015-12-16T00:00:00+01:00Janine Schneidertag:engineering.zalando.com,2015-12-16:/posts/2015/12/hack-week-4-onboarding-goes-hack-week.html<p>How a Hack Week project helps Zalando Tech newbies get on board.</p><p>What if you’ve just moved from Amsterdam to Berlin to join Zalando, and are now in desperate need of information about
life and the bureaucratic jungle in your new home country? If you’re Backend Engineer Daniel Franke, you take a
pragmatic approach and create Zalando Concierge: an onboarding app that aims to make life for Zalando newbies as easy as
possible.</p>
<p>Zalando Concierge provides new Zalandos with “all the right information at the right time,” to make onboarding and
adjusting to Z-life more fun than anxiety-producing. Some examples: a pre-relocation checklist; a to-do list for new
arrivals (registering at the Bürgeramt, getting health insurance, opening a bank account, etc.) and other practical
matters. The team is focusing on Berlin-related info during Hack Week, but will include information on all other Zalando
locations over the coming months. To populate the app with content and keep the info updated and relevant, we’ll ask
recently onboarded Zalandos to add tips, tricks and reviews of past questions and answers. That way, the team can take
full advantage of our newbies’ insights and wisdom and share it with the “next generations” of Zalando Tech.</p>
<p>Zalando Concierge has received support from our tech onboarding team and People & Organization department. New people
from around the world join Zalando’s tech hubs in Berlin, Dortmund, Helsinki or Dublin every week. Everyone who has
moved from one country to another knows: Finding all the basic but essential (and accurate) information to set up a new
life in a new place can be challenging. Soon, Zalando Concierge will be here to help!</p>Accelerating Warehouse Operations with Neural Networks2015-12-15T00:00:00+01:002015-12-15T00:00:00+01:00Calvin Sewardtag:engineering.zalando.com,2015-12-15:/posts/2015/12/accelerating-warehouse-operations-with-neural-networks.html<p>How we've been using deep neural networks to steer operations at Zalando’s fashion warehouses.</p><p>Recent advances in deep learning have enabled research and industry to master many challenges in computer vision and
natural language processing that were out of reach until just a few years ago. Yet computer vision and natural language
processing represent only the tip of the iceberg of what is possible. In this article, I will demonstrate how my
colleagues Dr. <a href="https://de.linkedin.com/in/sebastianheinz/en">Sebastian Heinz</a> (data scientist), <a href="https://de.linkedin.com/in/rolandvollgraf">Roland
Vollgraf</a> (data science expert) and
<a href="https://de.linkedin.com/in/calvinseward">I</a> (data scientist) used deep neural networks in steering operations at
Zalando’s fashion warehouses.</p>
<p>As Europe’s leading online fashion platform, we have many exciting opportunities to apply the latest results from data
science, statistics and high-performance computing. Zalando’s vertically integrated business model means that I have
dealt with projects as diverse as computer vision, fraud detection, recommender systems and, of course, warehouse
management.</p>
<p>To solve the warehouse management problem I’ll be discussing, we trained a neural network that very accurately estimates
the length of the shortest possible route that visits a set of locations in the warehouse. I’ll demonstrate how we used
this neural network to greatly accelerate a processing bottleneck, which in turn enabled us to more efficiently split
work between workers.</p>
<p>The core idea is to use deep learning to create a fast, efficient estimator for a slow and complex algorithm. This is an
idea that can (and will) be applied to problems in many areas of industry and research.</p>
<h3>The Picker Routing Problem</h3>
<p>I'll restrict the scope of this article to a very simplified warehouse control situation in which I consider a warehouse
that consists of only one zone with a “rope ladder” layout. The rope ladder layout means that items are stored in
shelves, and the shelves are organized in multiple rows with aisles and cross aisles. Some of these shelves contain
items that customers have ordered and must therefore be retrieved by a worker. See Figure 1 for a schematic
representation of the situation.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/622c2a6801c88f95465328aa0e551ed3c187f67a_calvin-seward-post.png?auto=compress,format"></p>
<p><em>Figure 1: Two schematics of a rope ladder warehouse zone with picks. The blue shelves denote shelves with items to be
picked, so the goal is to find the shortest possible route that allows a worker to visit all blue shelves while starting
and ending at the depot.</em></p>
<p>In 2013 we tackled the so-called “picker routing” problem: given a list of items that a worker should retrieve from the
warehouse (the “pick list”), find the most efficient route or “pick tour” for the worker to walk. In a pick tour, the
worker starts with an empty cart at the depot, walks through the warehouse and places items into the cart, and finishes
with the full cart at the depot. This is essentially a special case of <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem">the traveling salesman
problem</a> (TSP). Although the TSP is, in general, NP-hard, by
exploiting the rope ladder layout, the optimal solution to our picker routing problem can be found in linear time in the
number of aisles. The details of the exact algorithm are discussed in [1] and [2], which details an even simpler
case.</p>
<p>Our contribution to the problem was to come up with the OCaPi algorithm, short for Optimal Cart Pick. This algorithm
finds the optimal pick tour not just for the worker, but also for the movements of the worker’s cart. A worker in the
warehouse is no different from a shopper in a supermarket; the slow and heavy cart is sometimes left in the cross aisle
as the worker picks until no more items can be carried. Only then does the worker return to the cart and deposit the
items. A nice explanation of the project can be found
<a href="https://tech.zalando.com/blog/defeating-the-travelling-salesman-problem-for-warehouse-logistics">here</a>, and we’ll be
publishing the algorithm soon. All this enabled us to quit using the S-Shape routing heuristic [3] and route the
workers and their carts in the optimal way. See Figure 2 for an example of S-Shape and OCaPi routes.</p>
<p><img alt="s-shape and optimal
tour" src="https://images.prismic.io/zalando-jobsite/0dda6464d1df5a59cdc0a118d0faa1af2886b473_s_shape_and_ocapi.png?auto=compress,format"></p>
<p><em>Figure 2: S-Shape and optimal (OCaPi) pick tours. The blue circles denote items that must be picked, the arrows the
tour that the worker walks (including trips that are necessary for cart handling), and the thick gray line denotes the
path of the cart. The figure on the left is the so-called S-Shape heuristic, and the figure on the right is the OCaPi
pick tour.</em></p>
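<p>To make the baseline concrete, here is a toy version of the S-Shape heuristic’s walk-length calculation. The geometry (aisle length, aisle spacing, depot at the leftmost aisle) is a hypothetical simplification for illustration only; the real warehouse model and the OCaPi algorithm are far more involved.</p>

```python
# Toy S-Shape walk length for a "rope ladder" zone: the worker fully
# traverses every aisle that contains at least one pick, snaking from the
# leftmost such aisle to the rightmost. The geometry below (aisle length,
# aisle spacing, depot at aisle 0) is a hypothetical simplification.
def s_shape_tour_length(pick_aisles, aisle_length=30.0, aisle_spacing=3.0):
    """pick_aisles: iterable of (0-based) aisle indices containing picks."""
    aisles = sorted(set(pick_aisles))
    if not aisles:
        return 0.0
    # Cross-aisle travel: depot -> leftmost pick aisle -> rightmost -> depot.
    cross = (aisles[0] + (aisles[-1] - aisles[0]) + aisles[-1]) * aisle_spacing
    # Every visited aisle is walked end to end.
    walk = len(aisles) * aisle_length
    # With an odd number of visited aisles the worker finishes at the far
    # cross aisle and needs one extra traversal to return to the depot side.
    if len(aisles) % 2 == 1:
        walk += aisle_length
    return cross + walk
```

<p>With the default parameters, picks in aisles 2 and 5 give a tour length of 90.0 (units are arbitrary). OCaPi improves on this baseline by also optimizing where the cart is parked along the way.</p>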
<h3>The Batching Problem</h3>
<p>At Zalando’s scale of operations, thousands of new orders are placed every hour, and each order must be assigned to a
pick list. Only when a pick list contains a certain number of items are the items collected and packaged for the
customer. For our idealized example, we assume that the following rules must be followed when splitting orders into pick
lists:</p>
<ul>
<li>A pick list may not exceed a certain length.</li>
<li>Items in an order may not be split between pick lists. In this way all the items in an order are already together
when the cart’s contents are sent to be packaged for shipping.</li>
<li>The sum of the travel times (time walking plus time pushing the cart) for all pick lists should be as small as
possible.</li>
</ul>
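<p>The first two rules can be checked mechanically if a split assigns whole orders to pick lists. The data model below (an <code>orders</code> mapping and an <code>assignment</code> mapping) is a hypothetical sketch, chosen only to make the example self-contained:</p>

```python
from collections import Counter

# Sketch of the first two batching rules: `orders` maps an order id to its
# items, and `assignment` maps each order id to the index of the pick list
# it was placed on. Assigning whole orders to pick lists enforces rule 2
# (no order is ever split) by construction, so only rule 1's length cap
# needs checking.
def valid_split(orders, assignment, max_items=10):
    load = Counter()
    for oid, items in orders.items():
        load[assignment[oid]] += len(items)
    return all(n <= max_items for n in load.values())
```

<p>For the running example of ten two-item orders and a ten-item cart, any assignment of five orders per list is valid, while putting all ten orders on one list is not. Rule 3, minimizing total travel time, is the actual optimization objective.</p>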
<p>For example, assume that we have 10 orders, each consisting of two items. Further assume that a worker can fit only 10
items into the cart. Then the orders must be split into two equal-sized pick lists. See Figures 3 and 4 for two possible
splits of the orders into pick lists. This is a highly idealized situation; [4] presents a more complete picture.</p>
<p><img alt="median pick list
split" src="https://images.prismic.io/zalando-jobsite/4bcba37412b386e7114b52aef56546e50bb58236_median.png?auto=compress,format"></p>
<p><em>Figure 3: OCaPi pick tours for ten orders of two items each randomly split between two pick lists. The items here are
color-coded by order; for example, the two brown items ‘v01’ and ‘v02’ on the left both belong to the same order. These
two items must therefore be picked together. The items with a ‘skp’ are items that need to be picked, but are contained
in the other pick list. It’s clear that this split isn’t optimal, for example on the top right of the left picture, we
see that the worker walks past two yellow colored items (‘v07’ and ‘v08’ in the right picture) and could have easily
collected those in the tour, and collected ‘v01’ and ‘v02’ during the other pick tour.</em></p>
<p><img alt="optimal pick list
split" src="https://images.prismic.io/zalando-jobsite/d5e16d4593cfd3b9eea9c72153a630fb4c338a79_best.png?auto=compress,format"></p>
<p><em>Figure 4: OCaPi pick tours of the optimal split of the ten orders from Figure 3. We see that this is much more
efficient than the split in Figure 3. For example, the list on the right contains items only on the right-hand side of
the warehouse zone. The optimal split shown here has a calculated travel time of 320.0 seconds versus the travel time of
346.6 seconds for the random split in Figure 3.</em></p>
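<p>To get a feel for the combinatorics (my illustration, not from the original post): splitting <em>2k</em> orders into two unlabeled, equal-sized pick lists can be done in <em>C(2k, k)/2</em> ways, which is small for the ten-order example but explodes quickly.</p>

```python
from math import comb

def equal_splits(n_orders):
    """Ways to split n_orders into two unlabeled, equal-sized pick lists."""
    return comb(n_orders, n_orders // 2) // 2

print(equal_splits(10))  # 126 possible splits for the ten-order example
print(equal_splits(40))  # 68923264410, i.e. roughly 6.9e10 for forty orders
```

<p>Even at 126 splits, evaluating each one with an OCaPi run that takes seconds is already slow; at tens of billions it is out of the question, which motivates the fast estimator below.</p>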
<p>In theory, finding near-optimal splits of orders into pick lists should be easy enough: just split the orders into pick
lists, use the OCaPi algorithm to calculate travel times for all lists, and optimize with something like <a href="https://en.wikipedia.org/wiki/Simulated_annealing">simulated
annealing</a> to find the minimum travel time split. One major problem
with this idea is the OCaPi run time. At a few seconds per pick list, the OCaPi algorithm is just too slow for
real-world batching problems (think splitting orders between thousands of pick lists).</p>
<h3>Neural Network OCaPi Travel Time Estimator</h3>
<p>The OCaPi algorithm is nothing more than a very complicated, highly non-linear function:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9da6fa1e0285087bf8de361f75c0afb4c99fc4f1_screen-shot-2017-08-01-at-16.15.26.png?auto=compress,format"></p>
<p>that, given <em>n</em> items, maps <em>n</em> aisle-and-position warehouse coordinates to a positive real number, the total travel
time for these coordinates. In Figure 5 you can see a two-dimensional slice of this function.</p>
<p><img alt="ocapi
topology" src="https://images.prismic.io/zalando-jobsite/7c1d75583cab7cc2bb06459e49980241979dab9c_landscape_by_calvin.png?auto=compress,format"></p>
<p><em>Figure 5: Two-dimensional slice of the OCaPi travel time function. To create this plot, we distributed 10 items through
a warehouse zone, and used OCaPi to calculate what the travel time would be if the 11th pick was in a specific position.
Note that the travel time doesn’t increase near the cross aisles or the depot and increases sharply if an item must be
picked from the back corner, far from all other items.</em></p>
<p>From Figure 5, and by thinking about the problem, it is easy to see (and can be proven) that <em>f</em> is</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Lipschitz_continuity">Lipschitz-continuous</a> in the real-valued arguments, with the
Lipschitz constant equal to the worker’s walking speed;</li>
<li>Piecewise linear in the real-valued arguments, with slope either flat or equal to the worker’s walking speed;</li>
<li>Locally sensitive, meaning that the route a worker and cart take at a specific location is more strongly influenced
by nearby items than faraway items.</li>
</ul>
<p>Therefore, since <em>f</em> is a locally sensitive linear combination of many individual linear functions, it is the perfect
candidate to be modeled by convolutional neural networks with rectified linear units.</p>
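<p>A minimal sketch (my illustration, not the production model) of the key observation: a network of rectified linear units computes exactly a piecewise-linear function of its inputs, the class of functions the properties above describe.</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical tiny one-hidden-layer network mapping a 2-D item
# coordinate (aisle, position) to a scalar travel-time contribution.
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2 = np.abs(rng.normal(size=(1, 8)))  # non-negative: times don't cancel

def estimate(coords):
    return float(W2 @ relu(W1 @ np.asarray(coords) + b1))

# Within a region where no hidden unit switches on or off, the network
# is exactly affine, so the midpoint value equals the endpoint average.
a, b = np.array([0.50, 0.50]), np.array([0.51, 0.51])
same_pattern = np.array_equal(W1 @ a + b1 > 0, W1 @ b + b1 > 0)
if same_pattern:
    mid = estimate((a + b) / 2)
    assert abs(mid - (estimate(a) + estimate(b)) / 2) < 1e-9
```

<p>The real estimator is convolutional and trained, but the picture is the same: many ReLU pieces stitched into a locally sensitive, piecewise-linear surface like the one in Figure 5.</p>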
<p>To reduce OCaPi calculation times from seconds to milliseconds, we generated 1 million random pick lists, and used OCaPi
to give each list a “label”: the calculated travel time. Then we fed the coordinates of the pick lists along with the
travel times into a convolutional neural network. To train the networks we used the popular <a href="https://github.com/BVLC/caffe">Caffe neural network
framework</a> linked with NVIDIA’s <a href="https://developer.nvidia.com/cuDNN">cuDNN</a> convolutional
neural network library, running on two NVIDIA Tesla K80 GPU Accelerators (total four GPUs). By training four models in
parallel (one on each GPU), we were able to find a neural network architecture that was very accurate with just a few
weeks of effort. The network estimation of travel times is off by an average of 32.25 seconds for every hour of
calculated travel time — a negligible amount when one considers all the factors that influence actual pick performance.
See Figure 6 for more notes on accuracy.</p>
<p><img alt="estimation
error" src="https://images.prismic.io/zalando-jobsite/95bd6e212a5f5e6b7531d88f45c5a18c9977bced_estimation_error.png?auto=compress,format"></p>
<p><em>Figure 6: A histogram of the relative error of the OCaPi travel time estimator, meaning estimated travel
time/calculated travel time for 5000 pick lists. The neural network estimate is only off by 0.895% on average.</em></p>
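<p>The two accuracy figures quoted above are mutually consistent, a quick check worth making explicit (my arithmetic, not the author's):</p>

```python
# The estimator's error is quoted two ways above: 32.25 seconds per hour
# of calculated travel time, and a 0.895% average relative error.
seconds_per_hour_error = 32.25
relative_error_pct = seconds_per_hour_error / 3600 * 100
print(round(relative_error_pct, 3))  # 0.896, matching the quoted 0.895%
```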
<h3>Training and Computing Time Improvement</h3>
<p>The whole point of this exercise was to make the OCaPi travel time estimation faster. So how did we do? We ran these
experiments on a machine with two Intel Xeon E5-2640 and two NVIDIA Tesla K80 accelerators. We linked Caffe against
cuDNN_v2 and OpenBLAS compiled from source.</p>
<p>The first compute-time hurdle was the training. With the Tesla K80 accelerators, we were able to update the network with
one million training examples in just 52.6 seconds compute time, a speedup of a factor of 20 compared to the CPU.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/65cec219ae8c6850353e59cda018a908b7cd5e93_screen-shot-2015-12-14-at-6.24.16-pm.png?auto=compress,format"></p>
<p>For the travel time estimate, which is just a forward pass through the network (also known as a neural network
inference),
we found that, since the network is fairly small, we don’t get a significant speedup by using the GPU. This test should
be taken with a grain of salt, since we didn’t link against <a href="https://software.intel.com/en-us/intel-mkl/l">Intel’s MKL</a>
or <a href="https://developer.nvidia.com/cuDNN">cuDNN_v3</a>, the latest CPU and GPU libraries.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b3a564f0f5a8c0624c0cf4e09759f35bd5aedc8e_screen-shot-2015-12-14-at-6.24.27-pm.png?auto=compress,format"></p>
<p><em>Table 2: Forward pass times (in seconds) per pick list, for varying batch sizes. We see that, with parallelization, all
three methods became faster, but the neural network is always much faster.</em></p>
<h3>Bringing It All Together</h3>
<p>There are many places in the warehouse management process where this fast and accurate OCaPi travel time estimator can
be applied; here, I’ll use it to demonstrate how to solve the batching problem from the example above. I wrote
a very simple optimization algorithm based on <a href="https://en.wikipedia.org/wiki/Simulated_annealing">simulated annealing</a>
that starts with 40 two-item orders split randomly between two pick lists. For 40 orders, there are <em>40! /
(2 · 20! · 20!) ≈ 6.9 · 10<sup>10</sup></em> (about 69 billion) different ways to split the orders over two pick lists.</p>
<p><img alt="performance
speedup" src="https://images.prismic.io/zalando-jobsite/004ecda3d9597e49bf0fc61c732140047e23b1e1_sim_an_run.png?auto=compress,format"></p>
<p><em>Figure 7: Relative increase in estimated and calculated travel performance (so inverse of travel time) over a randomly
initialized pick list split during a simulated annealing experiment. Starting from the randomly initialized pick list
split, simulated annealing steps are performed to achieve new (and generally better) pick lists splits. The simulated
annealing algorithm optimizes the estimated travel times (the blue curve) as a proxy for what should actually be
optimized: the calculated travel time (the green curve).</em></p>
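<p>The experiment above can be sketched in a few dozen lines. Note this is a toy reconstruction: the <code>travel_time</code> function below is a made-up bounding-box surrogate standing in for the neural-network estimator, and every parameter (order count, step count, temperature schedule) is hypothetical.</p>

```python
import math
import random

random.seed(1)

# Twenty hypothetical two-item orders with random (aisle, position) coords.
orders = [[(random.random(), random.random()) for _ in range(2)]
          for _ in range(20)]

def travel_time(pick_list):
    # Stand-in for the fast neural-network travel-time estimator:
    # half the bounding-box perimeter of all items on the list.
    xs = [x for order in pick_list for (x, _) in order]
    ys = [y for order in pick_list for (_, y) in order]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def cost(split):
    a = [o for o, in_a in zip(orders, split) if in_a]
    b = [o for o, in_a in zip(orders, split) if not in_a]
    return travel_time(a) + travel_time(b)

def anneal(split, steps=2000, t0=0.5):
    cur = best = cost(split)
    for k in range(steps):
        t = t0 * (1 - k / steps) + 1e-9  # linear cooling schedule
        # Move: swap one order between lists, keeping both equal-sized.
        i = random.choice([n for n, x in enumerate(split) if x])
        j = random.choice([n for n, x in enumerate(split) if not x])
        split[i], split[j] = split[j], split[i]
        new = cost(split)
        if new <= cur or random.random() < math.exp((cur - new) / t):
            cur = new
            best = min(best, new)
        else:
            split[i], split[j] = split[j], split[i]  # reject: undo swap
    return best

split = [True] * 10 + [False] * 10
initial, final = cost(split), anneal(split)
```

<p>Because each move swaps exactly one order between the two lists, both stay at ten orders throughout, mirroring the equal-sized split that cart capacity forces in the example.</p>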
<p>For the setting above and a realistic zone layout, optimized batches allowed the workers to decrease their travel time
per item picked by an average of 11%, compared with a random batch. Clearly the actual benefit in production depends
highly on the order pool, the number of zones, and other factors. What we see here is not real-life improvement, but a
very informative academic exercise.</p>
<h3>Application to Any Black Box Algorithm</h3>
<p>At first glance, this post would suggest that the key takeaway is our travel time estimation speedup, and the better
batches that can then be created. However, the same approach is applicable to many fields of industry and research: We
were able to take an algorithm and, by treating it as a black box problem, transform it into a neural network that is
very fast and ready to be deployed at scale on both CPU and GPU architectures. I am confident there are many other
problems where this method can be applied, and I look forward to reading about exciting new breakthroughs powered by
Neural Networks and GPUs.</p>
<hr>
<p>[1] Kees Jan Roodbergen, René de Koster, Routing order pickers in a warehouse with a middle aisle, <em>European Journal
of Operational Research</em>, Volume 133, Issue 1, 16 August 2001, Pages 32-43, ISSN 0377-2217.</p>
<p>[2] H. Donald Ratliff and Arnon S. Rosenthal, Order-Picking in a Rectangular Warehouse: A Solvable Case of the
Traveling Salesman Problem, <em>Operations Research</em>, Vol. 31, No. 3 (May - Jun., 1983), pp. 507-521.</p>
<p>[3] Kees Jan Roodbergen and René de Koster, Routing methods for warehouses with multiple cross aisles, <em>International
Journal of Production Research</em> 39(9), 2001, pp. 1865-1883.</p>
<p>[4] Sebastian Henn, Sören Koch and Gerhard Wäscher, Order Batching in Order Picking Warehouses: A Survey of Solution
Approaches, January 2011, ISSN 1615-4274.</p>Hack Week #4: Awards2015-12-15T00:00:00+01:002015-12-15T00:00:00+01:00Selina McCarthytag:engineering.zalando.com,2015-12-15:/posts/2015/12/hack-week-4-awards.html<p>Zalando's Head of Innovation and Enablement explains the Hack Week judging process.</p><p>Yesterday I caught up with <a href="https://www.linkedin.com/in/bastiangerhard">Bastian Gerhard</a>: Zalando’s esteemed Head of
Innovation and Enablement, <a href="https://tech.zalando.com/blog/one-last-thing-before-we-call-it-a-year-hack-week-4/">Hack
Week</a> organizer/old hand, and 2015
project judge. I asked him about our award categories and judging processes.</p>
<p><strong>Hi, Bastian! Thanks for taking the time to talk. Can you enlighten me on the judging process?</strong></p>
<p>Sure. What do you want to know?</p>
<p><strong>Well, firstly: How did you select the judging panels?</strong></p>
<p>Each member of the tech management team was given the chance to participate: SVPs, VPs, and heads. Once chosen, they
were assigned to categories based on their experience and interests. Then, other judges were selected from other
departments where appropriate — for example, the category “Do” is based on a People and Organization project, so it made
sense for us to invite the head of P&O to judge that particular category.</p>
<p>This year there are panels of 3-4 people per award category. It’s our first time doing it like this. Previously,
everyone judged everything for every category. As you can imagine, this took forever and ended up causing big arguments,
so we decided to mix things up this year.</p>
<p><strong>Has anything else changed?</strong></p>
<p>Yes: We’ve also removed the rule that each team can only win one award. This is because firstly, we think it’s fairer.
Each award will be judged separately, so it makes sense for us to do it this way. Otherwise, each panel would have to
align before deciding — and that would skew the results, and probably cause lots of unnecessary arguments.</p>
<p><strong>Awesome! So can you tell me about some of the new categories there are this year?</strong></p>
<p>CUSTOMER-CENTRICITY: This is based on the Zalando objective to be more customer-centric. The award is for projects that
focus on the customer by truly looking at the customer’s needs. Often we start with a great solution without looking for
a need. This award celebrates projects that work the other way around.</p>
<p>DO: We wanted to encourage employees to do good and work on social good projects. This Award is for projects that
provide something to the outside world, or internally. They could be an internal CSR topic — e.g., training courses for
beginner coders, or an external topic like providing aid to refugees.</p>
<p>INCLUSION NINJA: This award developed around Zalando’s Diversity Guild [an informal group that promotes diversity and
inclusion discussions in tech]. It awards projects that encourage inclusive behavior. It celebrates diverse teams that
work across different departments, cultural backgrounds, hierarchies.</p>
<p>TGIM — THANK GOODNESS IT’S MONDAY: This awards projects that make us feel good on a Monday morning, and make us feel
excited to come back to the office after the weekend. The main contenders will be projects with a positive impact on the
working environment — ones that boost morale. It could be, for example, something like innovative decorations at the
office, or a tool that makes an internal process easier (like booking a meeting room), or something fun.</p>
<p><strong>Great. And what other Zalando Hack Week categories are there (for those who are new to Zalando Hackweek)?</strong></p>
<p>BSD 4.3: This is the award for best software. It needs to be something that solves a tough challenge. The project that
wins will be complex, difficult, some sort of extreme hack.</p>
<p>MARS ROVER: This is similar to BSD, but for hardware. For example, projects based around electrical engineering.</p>
<p>NIKOLA TESLA: So, this is the award for best innovation. It’s going to be judged on viability, feasibility, and impact.</p>
<p>QUICK WIN (previously “Cheap and Cheerful”): This is for the most cost-efficient projects, or projects that create the
biggest cost benefit. Projects in this category should solve a big problem with the least amount of effort, generating
the best ROI.</p>
<p>ONE ZALANDO: This award is based on team set-up. Teams that think outside of the box in terms of talking to other
departments, bringing in people from zLabels, Legal, etc.</p>
<p>ANDY WARHOL: This is voted on by the audience. There will be an award at both
<a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a> and <a href="https://tech.zalando.com/locations/#berlin">Berlin</a> Hack
Weeks.</p>
<p>DUKE NUKEM FOREVER: The game stands for an epic failure after raising great expectations. So that's exactly the type of
project that could win this category. It’s symbolic though and is not always awarded.</p>
<p>SLINGSHOT: The winning project will be chosen based on technical feasibility, desirability [of their products] for
customers, and business viability. This award is run in conjunction with <a href="https://tech.zalando.com/blog/zalando-opens-new-playground-for-tech-innovation/">the Shuttle team (the Tech Innovation
Lab)</a>.</p>
<p><strong>Wow, there’s really an impressive spread of award categories.</strong></p>
<p>Well, we tried to cover every area of Zalando Tech, and include our core values.</p>
<p><strong>You did a great job — well done! Thanks again for taking the time to talk to me today, Bastian. And good luck with the
rest of the week!</strong></p>
<p>Thanks!</p>
<hr>
<p>You can find details of last year's awards
<a href="https://tech.zalando.com/blog/hackweek-december-2014-how-the-jury-decides-for-the-awards/">here</a></p>
<p>Zalando's top secret Hackweek trophies will be revealed at the end of the week. I tried to get a sneak peak but they're
heavily guarded. Keep an eye on this blogpost - I'll update it with the details of the trophies as the week goes on.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/be0cf91f21426f017368fb25c674591641e3655c_protectors.jpg?auto=compress,format"></p>Hack Week #4 Begins!2015-12-14T00:00:00+01:002015-12-14T00:00:00+01:00Hayley Baldwintag:engineering.zalando.com,2015-12-14:/posts/2015/12/hack-week-4-begins.html<p>We will, we will hack you!</p><p>Hack Week #4 has officially kicked off! This year's edition is <a href="https://tech.zalando.com/blog/one-last-thing-before-we-call-it-a-year-hack-week-4/">bigger than
ever</a>, with more Zalando
technologists, more tech offices, and more members of our news room to bring you daily coverage of the week's events. In
addition to being rock star-themed, this is also our first "international" Hack Week, with technologists from our
<a href="https://tech.zalando.com/locations/#dublin">Dublin</a> and <a href="https://tech.zalando.com/locations/#helsinki">Helsinki</a> hubs
joining our <a href="https://tech.zalando.com/locations/#berlin">Berlin</a> and
<a href="https://tech.zalando.com/locations/#dortmund">Dortmund</a> teams to create and innovate.</p>
<p>Our technology VPs officially launched Hack Week this morning by sharing their musical talents with us:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite%2Fc1b8b24c-e9a9-4a17-abae-c914357676f1_vr2o6.gif?auto=compress,format"></p>
<p>+ our SVP of Technology, Philipp, in an image we'll never let him forget.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite%2F6c69d865-a6f6-4568-a7a0-d14b53590135_vr29t.gif?auto=compress,format"></p>
<p>We’ll be featuring project videos, images and interviews all week long. Follow us on
<a href="https://twitter.com/ZalandoTech">Twitter</a> and <a href="https://www.instagram.com/zalandotech">Instagram</a> to experience Hack
Week with us.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6dc622130d9fbed8e8a99b8d60ef0139298b13b0_dsc_3927.jpg?auto=compress,format"></p>One Last Thing Before We Call It a Year: Hack Week #42015-12-10T00:00:00+01:002015-12-10T00:00:00+01:00Bastian Gerhardtag:engineering.zalando.com,2015-12-10:/posts/2015/12/one-last-thing-before-we-call-it-a-year-hack-week-4.html<p>Zalando Tech's annual innovation week takes place this December. Bigger and more relevant than ever!</p><p>It’s <a href="https://tech.zalando.com/blog/zalando-hack-week---making-innovation-visible/">Hack Week</a> time! Our annual week of
open innovation and experimentation takes place December 14-18 in our Berlin and Dortmund tech offices and features more
participants (900+) from more locations (seven) than ever before. To celebrate our team’s growth and accomplishments
over the past year — and to look forward to an even more electrifying, headbanging-worthy 2016 — the theme of Hack Week
#4 is “rock stars”!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ad256b2a5b995c279753fd33442d86dcc0230a1d_mollstrtechnology.jpg?auto=compress,format"></p>
<p>Hack Week launched in 2013 with a very simple premise: For one full week, every Zalando technologist pauses their daily
business to pursue an idea or project that somehow contributes to our business or work culture. This year’s HW will
include a flashy project fair where each team will demo and describe their creations. We’ll wrap up with a ceremony
where we’ll honor the best projects with <a href="https://tech.zalando.com/blog/hackweek-december-2014-how-the-jury-decides-for-the-awards/">awesome
awards</a>.</p>
<p>Hack Week is the perfect time of the year for Zalandos to play, prototype, and experiment with new technologies. It
allows everyone in tech to stretch our brains, have fun and collaborate with colleagues from across the tech
organization. At Hack Week #3, two project teams built a <a href="https://tech.zalando.com/blog/hack-week-taking-the-shopping-experience-to-the-next-level/">virtual dressing
room</a> and an automated shoe
recognition solution powered by a Microsoft Kinect motion sensor. Two other teams built autonomous robots and <a href="https://tech.zalando.com/blog/hack-week-fashion-meets-tech---smart-wearables/">smart
wearables</a> out of electrical components.
Others successfully launched a women’s shoe <a href="https://tech.zalando.com/blog/we-launched-it-the-zalando-space-shoe-video/">into
space</a> (outfitted with a GoPro to record the
shoe’s journey). Some HW projects have had a substantial and direct impact on our business: for example, a smart
warehouse trolley equipped with LEDs to quickly load items where they belong that’s now used at our logistics centers,
and a photo-search feature on Zalando’s mobile
<a href="https://itunes.apple.com/de/app/zalando-fashion-shopping/id585629514?l=en&mt=8">app.</a> Quite a few of the software tools
developed during past Hack Weeks are open-source and available on <a href="https://github.com/zalando">our GitHub page</a>.</p>
<h2>More Important Than Ever in Our Platform World</h2>
<p>In early 2015 we announced an ambitious <a href="https://www.siliconrepublic.com/enterprise/2015/09/30/zalando-ceo-robert-gentz-we-are-building-the-aws-of-the-fashion-world">new business
strategy</a>
that will transform Zalando from an online shop into a comprehensive “fashion platform” featuring many new products and
services. The change will require a massive expansion of our business functions. We'll serve a wider range of customers
in different ways, build new technologies to support our platform architecture and continue to innovate as we expand.</p>
<p>Becoming a platform company also required a radical change in the way we work and build products. We put our trust in
self-directed, highly autonomous and agile teams: Zalando’s engineers enjoy maximum freedom to take full ownership of
the software they build.</p>
<p>Hack Week contributes greatly to all of this. It also unlocks a lot of creativity, as everyone gets to experiment with
latest technologies in a playful way. It’s a source of game-changing ideas that will help to reinforce our platform
strategy and shape the future of Zalando.</p>
<h2>Innovating Like a Star: What’s New in 2015</h2>
<p>Hack Week #4 will put hardware projects in a brighter spotlight than in years past. Our newly opened <a href="https://tech.zalando.com/blog/zalando-opens-new-playground-for-tech-innovation/">Innovation Lab
(a.k.a. The Shuttle)</a> will provide a
dedicated 450sqm digital fabrication “makerspace” with Arduino, 3D printers, sewing and soldering equipment,
latest-model mobile devices, and virtual and augmented reality gear. Z-technologists will be able to make almost
anything here.</p>
<p>It's international! In 2015, Zalando’s Technology department has grown to over 900 people. We have opened two
international Tech hubs in <a href="https://tech.zalando.com/blog/working-at-zalando-dublin/">Dublin</a> and
<a href="https://tech.zalando.com/blog/hello-helsinki/">Helsinki</a>. We also expanded to Hamburg and relocated to a bigger
building with our Dortmund team. We're flying in the teams from all of <a href="https://tech.zalando.com/locations/">our Tech
Hubs</a> for the week.</p>
<p>On the awards front, we’ve added some brand-new
<a href="https://tech.zalando.com/blog/hackweek-december-2014-how-the-jury-decides-for-the-awards/">categories</a> that reward
projects’ relevance to improving our business and work culture: inclusion/diversity, social good, and customer
empowerment.</p>
<p>Another first for this year: Zalando will provide support and resources for innovative ideas to develop and evolve
beyond Hack Week. Highly promising projects that complement our platform strategy will become eligible for Slingshot:
our intrapreneurship and acceleration program in which Zalando teams can spend three months of paid working time in our
Innovation Lab, fine-tuning and testing their radical concepts.</p>
<p>Members of our Hack Week newsroom will report live from the field all next week. Subscribe to our <a href="https://tech.zalando.com/blog/rss-all.xml">RSS
feed</a> and follow us <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a> to
receive updates and learn more about Hack Week at Zalando!</p>Video: Reactive RESTful APIs with Akka HTTP and Slick2015-12-08T00:00:00+01:002015-12-08T00:00:00+01:00Lauri Appletag:engineering.zalando.com,2015-12-08:/posts/2015/12/video-reactive-restful-apis-with-akka-http-and-slick.html<p>Why we're building on top of reactive technologies like Akka HTTP and Slick.</p><p>Zalando engineers are currently <a href="https://tech.zalando.com/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">rebuilding our
“shop”</a> — the unit that
includes our 15 country-specific, customer-facing websites—to transform it from a monolith into microservices. As part
of this work, we've developed open-source tools like
<a href="https://tech.zalando.com/blog/building-our-own-open-source-http-routing-solution/">Innkeeper</a>: a simple, RESTful route
management API built on top of reactive technologies like <a href="http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/http/">Akka
HTTP</a> and
<a href="http://slick.typesafe.com/">Slick</a>. It will help our fleet of router instances to keep their routes in sync.</p>
<p>In this presentation for <a href="http://www.meetup.com/Zalando-Tech-Events-Dortmund/">Zalando Tech's Dortmund meetup</a> group,
Zalando Senior Software Engineer <a href="https://twitter.com/danpersa">Dan Persa</a> discusses Innkeeper’s architecture and how,
by choosing a reactive stack of technologies, we can stream database records back to the browser. He also talks about
how (and why) we secured our API using OAuth, and gives insights on how we use Docker and the <a href="http://stups.io/">STUPS</a>
tools to auto-scale this API on top of AWS. Watch:</p>
<p>And here are Dan's slides:</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/building-a-reactive-restful-api-with-akka-http-slick" title="Building a Reactive RESTful API with Akka Http & Slick">Building a Reactive RESTful API with Akka Http &
Slick</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>Building System Packages from Python Modules (with Dependencies Included)2015-12-07T00:00:00+01:002015-12-07T00:00:00+01:00Sören Königtag:engineering.zalando.com,2015-12-07:/posts/2015/12/building-system-packages-from-python-modules-with-dependencies-included.html<p>Learn about the benefits of wrapping Python’s virtualenvs in system packages.</p><p>In <a href="https://tech.zalando.com/blog/building-system-packages/">our last post on packaging</a>, my colleague Felix Mueller
talked about why it’s good to manage all your software with your system’s native package management tools. He also
discussed how to build packages in an automated, consistent way. Now I’d like to describe the benefits of wrapping
Python’s virtualenvs in system packages.</p>
<p>Why is it particularly useful to package and ship Python modules as native system packages? For the same reasons that
apply to all other distributed software: to achieve reliable, atomic, reproducible and predictable deployments.</p>
<p>Python is Zalando Tech’s bread-and-butter language for writing tools and scripts for provisioning, managing and
maintaining servers. But deploying Python software can cause a lot of headaches — for example, when two packages require
different versions of the same dependency, or when you need to recompile C extensions on every single server (hello,
PyCrypto!). The latter case requires the whole build-essential toolchain on every target machine.</p>
<p>Wrapping our Python tools in native system packages wouldn't entirely solve the problem. We’d have to port all
dependencies (and their dependencies) to Debian- or RedHat-land. This is where Python's virtualenvs come into play. The
idea is to combine the best of both worlds: self-contained and dependency-less virtualenvs, and the manageability of
native system packages.
Admittedly, this idea is not that new: Berlin-based engineer <a href="https://hynek.me/">Hynek Schlawack</a> has scripted <a href="https://hynek.me/articles/python-app-deployment-with-native-packages/">his own
solution</a>, and Spotify have made efforts in this
direction with their <a href="https://github.com/spotify/dh-virtualenv">dh-virtualenv</a> extension to debhelper. But it works.</p>
<h3>How We Do It at Zalando</h3>
<p><a href="https://github.com/zalando/package-build">Package-build</a>, our open-source setup, combines the package builders fpm and
<a href="https://github.com/zalando/fpm-cookery">fpm-cookery</a> with our own script,
<a href="https://github.com/zalando/package-build/blob/master/cook-recipe.sh">cook-recipe.sh</a>. Luckily, fpm now includes support
for both <a href="https://virtualenv.readthedocs.org/en/latest/">virtualenvs</a> and fpm-cookery, the latter thanks to
<a href="https://github.com/bernd/fpm-cookery/commit/338eb810d8b2928d52b9c3cc043ec1f76954ea25">contributions from Zalando
engineers</a>. The previous version
of package-build used Vagrant as the base for the package build environment. Our current version replaces the heavier,
VirtualBox-backed Vagrant with lightweight Docker containers — excellent for providing short-lived packaging
environments.</p>
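<p>For reference, here is a hedged sketch of what fpm's virtualenv support enables when invoked directly, outside our recipe tooling. The package name and version are hypothetical, and exact flags may differ by fpm version, so check <code>fpm --help</code> before relying on this.</p>

```shell
# Illustrative only: build a .deb whose payload is a self-contained
# virtualenv for a (hypothetical) PyPI package.
fpm -s virtualenv -t deb \
    --name zalando-mytool-virtualenv \
    --version 1.0.0 \
    mytool==1.0.0
```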
<p>Our cook-recipe.sh script runs inside Docker containers created ad hoc, and takes one or more recipe folder names as
parameters. If no parameters are given, it runs all recipes in all subfolders in recipes/. Within those subfolders, the
script looks for <code>./prepare.sh</code> (see <a href="https://github.com/zalando/package-build/blob/master/cook-recipe.sh#L25">the relevant part of cook-recipe.sh</a>).</p>
<p>Now all the essential tools for provisioning are already available, and we don’t have to install them during every new
build — saving us time. This script can be run standalone — for example:</p>
<div class="highlight"><pre><span></span><code>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="cp">${</span><span class="n">PWD</span><span class="cp">}</span>:/data<span class="w"> </span>package_build/centos6<span class="w"> </span>/data/cook-recipe.sh<span class="w"> </span>zalando-zcloud-virtualenv
</code></pre></div>
<p>Of course, you must publish the created packages in some of your repositories (we use <a href="http://www.aptly.info/">aptly</a>
for managing our .deb repositories), which can be automated with tools like <a href="http://www.fabfile.org/">Fabric</a>.</p>
<h3>An Example of Our Virtualenv Recipe Usage</h3>
<p>Comparing some code snippets from the old and new versions of package-build will show you how much cleaner the new
version looks. Here are the relevant parts of its
<a href="https://github.com/zalando/package-build/blob/master/recipes/zalando-zcloud/recipe.rb">recipe.rb</a> file:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">ZalandoZcloud</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">FPM</span><span class="p">::</span><span class="n">Cookery</span><span class="p">::</span><span class="n">Recipe</span>
<span class="w">  </span><span class="n">description</span><span class="w"> </span><span class="s2">"Package containing CLI, agent and additional scripts for installing nodes via zCloud"</span>
<span class="w">  </span><span class="n">name</span><span class="w"> </span><span class="s2">"zalando-zcloud"</span>
<span class="w">  </span><span class="n">version</span><span class="w"> </span><span class="s2">"0.2.8"</span>
<span class="w">  </span><span class="n">source</span><span class="w"> </span><span class="s2">"https://stash.zalando.net/scm/pymodules/zalando-zcloud.git"</span><span class="p">,</span><span class="w"> </span><span class="p">:</span><span class="n">with</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">:</span><span class="n">git</span><span class="p">,</span><span class="w"> </span><span class="p">:</span><span class="n">tag</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="s2">"#{version}"</span>
<span class="w">  </span><span class="n">build_depends</span><span class="w"> </span><span class="s2">"python-setuptools"</span>

<span class="w">  </span><span class="n">platforms</span><span class="w"> </span><span class="p">[:</span><span class="n">ubuntu</span><span class="p">,</span><span class="w"> </span><span class="p">:</span><span class="n">debian</span><span class="p">]</span><span class="w"> </span><span class="n">do</span>
<span class="w">    </span><span class="n">depends</span><span class="w"> </span><span class="s2">"zalando-cmdb-client"</span><span class="p">,</span><span class="w"> </span><span class="s2">"python-paramiko >= 1.7.0"</span>
<span class="w">  </span><span class="n">end</span>

<span class="w">  </span><span class="n">platforms</span><span class="w"> </span><span class="p">[:</span><span class="n">centos</span><span class="p">]</span><span class="w"> </span><span class="n">do</span>
<span class="w">    </span><span class="n">depends</span><span class="w"> </span><span class="s2">"zalando-cmdb-client"</span><span class="p">,</span><span class="w"> </span><span class="s2">"python-paramiko >= 1.7.0"</span>
<span class="w">  </span><span class="n">end</span>

<span class="w">  </span><span class="k">def</span><span class="w"> </span><span class="n">build</span>
<span class="w">    </span><span class="n">safesystem</span><span class="w"> </span><span class="s1">'python setup.py build'</span>
<span class="w">  </span><span class="n">end</span>

<span class="w">  </span><span class="k">def</span><span class="w"> </span><span class="n">install</span>
<span class="w">    </span><span class="n">safesystem</span><span class="w"> </span><span class="s1">'python setup.py install --root=../../tmp-dest --no-compile'</span>
<span class="w">  </span><span class="n">end</span>
<span class="n">end</span>
</code></pre></div>
<p>The ZalandoZcloud recipe class defines the package metadata (description, name, version and source repository), declares its
dependencies per target platform, and implements the build and install steps explicitly.</p>
<p>Now compare this to our new
<a href="https://github.com/zalando/package-build/blob/master/recipes/zalando-zcloud-virtualenv/recipe.rb">zalando-zcloud-virtualenv</a>
variant:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="n">ZalandoZcloud</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">FPM</span><span class="p">::</span><span class="n">Cookery</span><span class="p">::</span><span class="n">VirtualenvRecipe</span>
<span class="w">  </span><span class="n">description</span><span class="w"> </span><span class="s2">"Package containing CLI, agent and additional scripts for installing nodes via zCloud"</span>
<span class="w">  </span><span class="n">name</span><span class="w"> </span><span class="s2">"zalando-zcloud"</span>
<span class="w">  </span><span class="n">version</span><span class="w"> </span><span class="s2">"0.2.8"</span>
<span class="w">  </span><span class="n">build_depends</span><span class="w"> </span><span class="s2">"python-setuptools"</span>
<span class="w">  </span><span class="n">virtualenv_fix_name</span><span class="w"> </span><span class="bp">false</span>
<span class="w">  </span><span class="n">virtualenv_install_location</span><span class="w"> </span><span class="s2">"/opt/"</span>
<span class="n">end</span>
</code></pre></div>
<p>This recipe class is derived from FPM::Cookery::VirtualenvRecipe, so the base class takes care of creating the virtualenv
and installing the package into it; the recipe itself only has to declare the package metadata and a couple of virtualenv
options.</p>
<h3>Some Final Thoughts</h3>
<p>With a few simple scripts, we can build isolated, self-contained packages from our own software; provide them in our
internal repos; and not worry about deployment and dependencies. We can even use these scripts to package tarballs that
are randomly dropped into a web folder. Because a simple shell script performs the actual package-building, we can
easily use the same commands in a continuous integration context — i.e., to automatically build packages every time a
recipe changes or a new one has been added.</p>Video: “Scala Microservices at Zalando”2015-12-02T00:00:00+01:002015-12-02T00:00:00+01:00Lauri Appletag:engineering.zalando.com,2015-12-02:/posts/2015/12/video-scala-microservices-at-zalando.html<p>A Zalando delivery lead describes his team's work with Scala.</p><p>Zalando technologists have been using Scala in production since 2014, when we began transforming our monolithic
architecture into microservices. Much of our Scala development is done by the engineers in our <a href="https://corporate.zalando.com/sites/default/files/mediapool/05_brand_solutions_0.pdf">Brand
Solutions</a> department, which
builds products and services to support Zalando’s brand-partners. At the October meetup of <a href="http://www.meetup.com/SF-Scala/">SF
Scala</a>, the primary Scala meetup group in San Francisco, Brand Solutions Delivery Lead
<a href="https://twitter.com/koze">Alexander Kops</a> gave a brief talk describing his teams’ ongoing Scala efforts. These include
development of analytics tools and the creation of <a href="https://github.com/zalando/play-swagger">Play-Swagger</a>, an
open-source collaboration with Typesafe Tech Lead James Roper:</p>
<p>Update: Here are the slides from Alex's latest version of the talk, delivered for the Zurich and Bern JUGs in November
2015:</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/zalando-tech-from-java-to-scala-in-less-than-three-months" title="Zalando Tech: From Java to Scala in Less Than Three Months">Zalando Tech: From Java to Scala in Less Than Three
Months</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>Building Our Own Open-Source HTTP Routing Solution2015-12-01T00:00:00+01:002015-12-01T00:00:00+01:00Arpad Ryszkatag:engineering.zalando.com,2015-12-01:/posts/2015/12/building-our-own-open-source-http-routing-solution.html<p>Enabling microservices deployment while decoupling routing from service logic.</p><p><a href="https://github.com/zalando/skipper">Skipper</a> is the open-source HTTP router we’ve created to help us rebuild the
infrastructure behind Zalando’s customer-facing <a href="https://tech.zalando.com/blog/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store/">Fashion
Store</a> (“the Shop”).
Developed with Go, it’s <a href="https://godoc.org/github.com/zalando/skipper">“go get” compatible</a> and serves as a common entry
point in front of the Shop’s service components. Skipper can be useful for non-Zalando teams who are trying to deploy a
microservices architecture and want (or need) to decouple routing from service logic.</p>
<p>Skipper’s role is similar to the in-process router component in a typical MVC web application. It handles incoming
requests by first selecting a route based on the request attributes — typically the method and the path — and then
executing the associated controller action. What's different about Skipper is that the "controller" layer is moved to a
set of independent services.</p>
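<p>As a rough illustration of that dispatch step, the sketch below selects a backend service from a routing table based on the request's method and path. All names here are hypothetical and the linear prefix scan is heavily simplified; this is not Skipper's actual implementation:</p>

```go
package main

import (
	"fmt"
	"strings"
)

// route pairs a match condition with a backend service endpoint.
type route struct {
	method  string
	prefix  string
	backend string
}

// selectBackend returns the backend of the first route whose method
// and path prefix match the incoming request attributes.
func selectBackend(routes []route, method, path string) (string, bool) {
	for _, r := range routes {
		if r.method == method && strings.HasPrefix(path, r.prefix) {
			return r.backend, true
		}
	}
	return "", false
}

func main() {
	routes := []route{
		{method: "GET", prefix: "/api/", backend: "https://api.example.org"},
		{method: "GET", prefix: "/ui/", backend: "https://ui.example.org"},
	}
	backend, ok := selectBackend(routes, "GET", "/api/products/42")
	fmt.Println(backend, ok)
}
```

<p>In a real router the matched entry would then drive a reverse proxy that forwards the request to the selected service; the sections below describe how Skipper makes this lookup fast.</p>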
<h3>Why We Created Skipper</h3>
<p>One of our goals is to establish an effective development and deployment cycle for multiple engineering teams working on
the same product. Our approach is to run independently maintained and deployed services for web pages, so that we can
compose them into a single website. Skipper is a supporting component of this architecture.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b5561084b6f469054a73b340021309cd4ebe30cf_skipper-image.png?auto=compress,format"></p>
<p>Initially, we sought an existing router solution that would fulfill our three main requirements:</p>
<ul>
<li>Support detailed request matching when selecting routes. This includes choosing routes not only by request path, but
also by other request attributes like method and headers.</li>
<li>Provide fair performance at scale. This covers the high traffic that periodically hits the Fashion Store, as well as
the high number of routes we need to support our feature-rich website.</li>
<li>Be able to continuously reconfigure the routing table for new settings without downtime or temporary performance
penalty.</li>
</ul>
<h3>Why Not Vulcand?</h3>
<p>Before building Skipper, we evaluated a few existing solutions. A strong candidate was
<a href="https://github.com/vulcand/vulcand">Vulcand</a>, a great project by Mailgun that we used in our earliest rounds of
prototyping. We really liked how we could extend it with our own logic in the form of small pieces of middleware. Here,
we apply the term “middleware” as it’s used in some well-known web MVC frameworks: as software logic applied to a
general request flow or subset of routes (to change or augment requests and responses); or to execute related tasks.
Think of these middleware as filters in signal processing, but applied to HTTP traffic.</p>
<p>A simplified, schematic representation might look like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">route1</span><span class="o">:</span><span class="w"> </span><span class="n">request</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">filterA</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">filterB</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">http</span><span class="o">://</span><span class="n">service1</span><span class="o">.</span><span class="na">example</span><span class="o">.</span><span class="na">org</span>
<span class="n">route2</span><span class="o">:</span><span class="w"> </span><span class="n">request</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">filterA</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">filterC</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">http</span><span class="o">://</span><span class="n">service2</span><span class="o">.</span><span class="na">example</span><span class="o">.</span><span class="na">org</span>
</code></pre></div>
<p>For a few weeks we ran Vulcand in our prototyping setup while we focused on other Fashion Store components. In the end,
we realized that we needed a better mechanism for streaming and dynamically updating the routing configuration. That’s
when we decided to create our own version of the router by taking the best parts of Vulcand’s design and implementing
the features we needed. This is how Skipper came to be.</p>
<h3>Request Matching</h3>
<p>At the core of Skipper's mechanism is a simple reverse proxy that forwards incoming requests to different target service
endpoints. The attributes of a particular HTTP request are used to find the matching entry in the configured routing table
and so decide the correct endpoint for that request. This lookup is the single most expensive operation applied to every
route.</p>
<p>In trying different solutions for request matching, we found that path-based <a href="https://en.wikipedia.org/wiki/Radix_tree">radix
tree</a> implementations produced the best results in terms of CPU time and
memory footprint. This matched our requirements well, because most of our routes are identified primarily by path. The
best-performing, off-the-shelf library we tried was <a href="https://github.com/julienschmidt/httprouter">httprouter</a>, which in
the Go world is a widely used router in web applications. It's an awesome piece of code, but fell short in meeting all
our requirements.</p>
<p>The next candidate was <a href="https://github.com/dimfeld/httptreemux">httptreemux</a>: an open-source, embeddable router
package with an easy-to-integrate interface. It satisfied our requirements, but at the cost of somewhat higher memory
usage. We only needed its tree lookup implementation, so we forked it, stripped off its unnecessary wrapping, added the
logic to match the rest of the HTTP request attributes (method, headers, etc.), and continued using it in Skipper. We
call this solution <a href="https://github.com/zalando/skipper/tree/master/pathmux">Pathmux</a>.</p>
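<p>To illustrate why tree-based lookup performs well here, the following sketch builds a minimal path tree in Go. It is a plain segment trie, without the node compression a real radix tree such as Pathmux performs, and all names are hypothetical; the point is that lookup cost grows with the depth of the request path, not with the number of stored routes:</p>

```go
package main

import (
	"fmt"
	"strings"
)

// node is one path segment in a simplified path tree (a full radix
// tree would compress chains of single-child nodes; omitted here).
type node struct {
	children map[string]*node
	value    string // backend endpoint; "" means no route ends here
}

func newNode() *node { return &node{children: map[string]*node{}} }

// insert stores a backend value under the given slash-separated path.
func (n *node) insert(path, value string) {
	cur := n
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		child, ok := cur.children[seg]
		if !ok {
			child = newNode()
			cur.children[seg] = child
		}
		cur = child
	}
	cur.value = value
}

// lookup walks one tree level per path segment, so matching a request
// is independent of how many routes the table contains.
func (n *node) lookup(path string) (string, bool) {
	cur := n
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		child, ok := cur.children[seg]
		if !ok {
			return "", false
		}
		cur = child
	}
	return cur.value, cur.value != ""
}

func main() {
	root := newNode()
	root.insert("/api/products", "https://api.example.org")
	v, ok := root.lookup("/api/products")
	fmt.Println(v, ok)
}
```

<p>A real implementation additionally has to match wildcards and the remaining request attributes (method, headers), which is the logic Skipper layered on top of the forked tree lookup.</p>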
<h3>Innkeeper: Route Configuration</h3>
<p>Our Fashion Store features many promotions and custom pages, so frequent updates of our custom routes are necessary.
Manually maintaining the data for all of these updates would be too laborious for a single team, so we created
<a href="https://github.com/zalando/innkeeper">Innkeeper</a>: a service that offers an integration point for Skipper and for any
other services that also need to deal with Fashion Store routes. Innkeeper provides an easy-to-use JSON/HTTP API as well
as OAuth2-based permission management for both humans and programs, and relies on our company-wide authentication
system.</p>
<p>As an alternative route storage solution, Skipper also supports <a href="https://github.com/coreos/etcd">etcd</a> clusters for
simpler scenarios. For the very simplest cases, it can read configuration from the file system.</p>
<h3>Conditioning Traffic with Filters</h3>
<p>Skipper filters are the middleware in the request-processing pipeline. While most of the request handling is supposed to
take place in the services behind Skipper, filters condition or augment the requests and responses for all routes and
subsets of routes. This functionality covers generating session IDs, custom logging, XSRF protection, securing cookies
and modifying the path, among other things. The mechanism is simple:</p>
<ul>
<li>the object representing the incoming request is passed to each filter of the route in the order of definition;</li>
<li>then the proxy request is made to the backend service;</li>
<li>the received response is passed to each filter in reverse order;</li>
<li>and, finally, the response is returned to the original client.</li>
</ul>
<p>One special case is when a filter handles the request on its own, and no proxy request is made to the backend.
Filters form the main extension point of Skipper for adding any custom logic. One needs only to implement a simple
interface for filter specifications in Go, deploy it with their custom-built version of Skipper, and run it.</p>
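<p>The two-phase flow above can be sketched with a small filter interface. This is a hypothetical, simplified shape rather than Skipper's exact API; what it demonstrates is the pipeline order: request filters run in definition order, response filters in reverse:</p>

```go
package main

import "fmt"

// filterContext carries request/response state through the chain;
// a simplified stand-in for a real filter context (hypothetical).
type filterContext struct {
	headers map[string]string
	log     []string
}

// filter mirrors the two-phase contract: Request is called on the way
// to the backend, Response on the way back to the client.
type filter interface {
	Request(*filterContext)
	Response(*filterContext)
}

// headerFilter sets a header on the request and records when it ran.
type headerFilter struct{ key, value string }

func (f headerFilter) Request(c *filterContext) {
	c.headers[f.key] = f.value
	c.log = append(c.log, "request:"+f.key)
}

func (f headerFilter) Response(c *filterContext) {
	c.log = append(c.log, "response:"+f.key)
}

// run applies the filters in definition order for the request and in
// reverse order for the response, as in Skipper's pipeline.
func run(filters []filter, c *filterContext) {
	for _, f := range filters {
		f.Request(c)
	}
	// ... the proxy request to the backend service happens here ...
	for i := len(filters) - 1; i >= 0; i-- {
		filters[i].Response(c)
	}
}

func main() {
	c := &filterContext{headers: map[string]string{}}
	run([]filter{
		headerFilter{key: "X-Flow-Id", value: "abc"},
		headerFilter{key: "X-Trace", value: "1"},
	}, c)
	fmt.Println(c.log)
}
```

<p>The special case mentioned above, where a filter answers the request itself, would simply short-circuit before the backend step in the middle of <code>run</code>.</p>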
<h3>Eskip: Our Own Routing Language</h3>
<p>Earlier we showed a schematic description of a simple routing table, including the request matching, filtering and the
backend service endpoints. During Skipper's development, we discovered that we could easily turn this notation into a
formal syntax and use it as a more readable way of describing routes. To support the syntax we created Eskip: a tool
and language that parses, displays and updates routes. As a supplement to the default JSON interfaces, which primarily
target programmatic clients, Eskip gives DevOps users a friendlier alternative to manual setups.</p>
<p>A simple routing table example in Eskip format:</p>
<div class="highlight"><pre><span></span><code><span class="n">apiRoute</span><span class="o">:</span><span class="w"> </span><span class="n">Path</span><span class="o">(</span><span class="s2">"/api/*_"</span><span class="o">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">flowId</span><span class="o">()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="s2">"https://api.example.org"</span><span class="o">;</span>
<span class="n">uiRoute</span><span class="o">:</span><span class="w"> </span><span class="n">Path</span><span class="o">(</span><span class="s2">"/ui/*_"</span><span class="o">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">flowId</span><span class="o">()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">auth</span><span class="o">()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="s2">"https://ui.example.org"</span><span class="o">;</span>
</code></pre></div>
<p>When installing the Skipper packages, also install the tool in the cmd/eskip subdirectory, then run
<code>eskip -help</code> for usage hints.</p>
<h3>What’s Next</h3>
<p>In the upcoming months we'll go live gradually with the new infrastructure. Skipper will get hit by a fair amount of
traffic soon, and we’ll let you know how it holds up!</p>Read About Zalando UX in Smashing Magazine2015-11-27T00:00:00+01:002015-11-27T00:00:00+01:00Katy Campbelltag:engineering.zalando.com,2015-11-27:/posts/2015/11/read-about-zalando-ux-in-smashing-magazine.html<p>To attract motivated designers and user researchers, keep your eye on the why.</p><p><em>"Modern motivational psychology says that employees in the knowledge economy are motivated not by carrots and sticks
but by purpose, not by tasks but by mastery. Similarly, Simon Sinek, author of the bestseller</em> Start With Why<em>, argues
that the problem with typical job ads is that they skip over the purpose: 'They are all about the what, and not the
why.'"</em></p>
<p>That's an excerpt from " <a href="http://www.smashingmagazine.com/2015/11/writing-inspiring-job-descriptions-for-ux/">How To Write Inspiring Job Descriptions For
UX</a>" — a piece recently written by
Zalando UX Talent Lead Jay Kaufmann for <em>Smashing Magazine</em>, the leading portal for web designers and developers. Jay
draws upon his years of experience hiring UX professionals to provide tips and tricks for successfully advertising your
UX-related job openings. To truly speak to designers, he writes, it's necessary to write in a way that shows your
understanding of how designers think.</p>
<p>Designers don't just want a job, Jay reiterates, but a <em>quest:</em> "a great experience in improving other people’s
experiences." Designers, he adds, "want to take on a mission that aligns with their own values." <a href="http://www.smashingmagazine.com/2015/11/writing-inspiring-job-descriptions-for-ux/">Read the full article
here</a>.</p>Video: Swagger Creator Mentions Zalando Open Source2015-11-25T00:00:00+01:002015-11-25T00:00:00+01:00Lauri Appletag:engineering.zalando.com,2015-11-25:/posts/2015/11/video-swagger-creator-mentions-zalando-open-source.html<p>Tony Tam talks about Connexion, our open-source Swagger framework.</p><p>As part of Zalando Tech's <a href="https://tech.zalando.com/blog/on-apis-and-the-zalando-api-guild/">API First</a> approach to
software development, many of our teams are building open-source tools for working with <a href="http://swagger.io/">Swagger</a>.
One of these projects is <a href="https://github.com/zalando/connexion">Connexion</a>: A Python Flask framework for automatically
handling REST API requests based on Swagger 2.0 specification files in YAML. We were excited to learn that Swagger
creator Tony Tam talked about Connexion in his presentation at <a href="http://austin2015.apistrat.com/">APIStrat Austin</a> last
week:</p>How to Prepare for Your Zalando Tech Interview2015-11-24T00:00:00+01:002015-11-24T00:00:00+01:00Lauri Appletag:engineering.zalando.com,2015-11-24:/posts/2015/11/how-to-prepare-for-your-zalando-tech-interview.html<p>Our VP Engineering shares some advice to help you succeed at your Zalando Tech interview.</p><p><em>Think you've got what it takes to <a href="http://tech.zalando.com/jobs">work for Zalando's technology team</a>? Here's some advice from Zalando VP
Engineering Eric Bowman on what we're looking for.</em></p>
<p>At Zalando, we are looking for people who:</p>
<ul>
<li>Make it happen</li>
<li>Grow stronger every day</li>
<li>Always team up and empower others</li>
<li>Change the game</li>
</ul>
<p>Our tech interviewing process is designed to reveal the skills that candidates have (and whether you have represented
them honestly and accurately); to understand whether a candidate is someone who will thrive at Zalando; and to determine
whether Zalando will be a better place for having hired them.</p>
<p>This isn't an easy place to be: you'll be expected to work hard and be learning all the time. Everyone here is just
slightly outside their comfort zone, which can be overwhelming at times. But if you are someone who is passionate about
having an impact; driven to build amazing software that you are proud of, and that surprises and delights its users; and
committed to working together with others to get the job done, then this is probably a great place for you to be.</p>
<p>We have put in place something we call <a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a>, which is aimed toward building
systems and teams that scale while enabling Zalando Tech to maintain the energy and speed of a startup. We do everything
we can to create an environment where teams are autonomous, and where there’s the space to master not just what you are
already pretty good at, but also new topics. We want purpose to drive the decisions we make, so we encourage everyone to
focus on the <em>why</em> behind what we do – not just the how, and the what.</p>
<p>We start from a position of trusting people to make good, peer-reviewed and fact-based decisions. We have built the
precondition of trust into both the organization and the system itself. Reasonable mistakes are not only tolerable, but a
source of strength. If you don’t create an environment where mistakes are OK, then you can’t have an environment where
innovation can happen. The ability to innovate is one of the things that keeps me loving coming to work every day. You
can't plan it, and you don't know what form it will take, but when you create the right conditions for it, for everyone,
it just happens.</p>
<p>So: To prepare for an interview, understand the above. If you are an honest, authentic person who treats others well, is
passionate about what you do, is humble about what you do not know, and can talk comfortably about all these things, I
can't imagine why we wouldn't want to hire you.</p>How I Created My Own Ecommerce App Without Leaving Zalando2015-11-23T00:00:00+01:002015-11-23T00:00:00+01:00Luis Juarrostag:engineering.zalando.com,2015-11-23:/posts/2015/11/how-i-created-my-own-ecommerce-app-without-leaving-zalando.html<p>Want to work at Zalando, but also start your own company? It can be done!</p><p><a href="http://kokowinka.com/">KOKOWINKA</a>* is a <a href="https://tech.zalando.com/blog/why-zalando-is-celebrating-mobile-first-day/">Mobile
First</a> web app I’ve created that sells
fashion items at a discount (starting at 30 percent) to Spanish consumers. It works like this: A customer purchases an
item featured by an online retailer like Zalando, then the retailer pays me a reward (up to 15 percent of the selling
price). As the affiliate marketer, I don’t have to worry about purchase payments, refunds or shipping.</p>
<p>KOKOWINKA isn’t my primary job (yet): I spend my days as a <a href="https://tech.zalando.com/jobs/69964/">Zalando QA Engineer</a>.
What’s been amazing is that Zalando has supported and helped me throughout the development of my new business, for which
I feel incredibly lucky. For you employers out there, this post will hopefully inspire you to encourage and assist your
most entrepreneurial technologists in achieving great things. And for my fellow entrepreneurs: I conclude below with
some advice based on my own experiences.</p>
<h3>In the Beginning ...</h3>
<p>Let me start with a story: Once upon a time, there was a guy who had lots of business ideas, but who kept dismissing
them. One day, after browsing a technology blog, he stumbled across a post about a company that was starting to achieve
success for developing an app based on one of his previously discarded ideas. The man realized that this scenario plays
out for many people every day, then decided to pursue his next great idea. He would not miss another opportunity, he
told himself.</p>
<p>Of course, that guy was me. Now comes the more exciting part of the story, in which I get down to work — at work!</p>
<h3>Zalando’s Support</h3>
<p>This came in two primary forms: On-the-job learning, and encouragement from the company to pursue my dream.</p>
<p>— Learning: Many of the skills I’ve learned at Zalando — working with <a href="https://tech.zalando.com/blog/on-apis-and-the-zalando-api-guild/">RESTful
APIs</a>, testing, and creating continuous integration
environments — helped me to create KOKOWINKA.</p>
<p>— Encouragement: My first step was to ask the head of my department for some time off to build KOKOWINKA, and he was
enthusiastic — telling me that it would be great for my professional development. Then I asked some of my colleagues,
who also endorsed the idea. Finally, I checked in with our People Services department to let them know of my plans. In
just six weeks, everything (including my financials and health insurance) was in place for me to get started. During my
time away, many Zalandos checked in to ask for status reports and the URL so they could see my progress and offer
feedback; this support is crucial, I think, when you want to start your own business endeavor.</p>
<p>I also talked to Zalando's affiliation department to clarify some aspects of our company's business terms and
conditions.</p>
<h3>Business Advice</h3>
<p>As part of developing KOKOWINKA, I wanted to learn some brand-new technical skills. I started prototyping using the
<a href="http://meanjs.org/">MEAN</a> stack, which I’d never used before; it was easy to write the JavaScript code, but I found
that I needed to do a lot of SysOp tasks. After a few days, I gave up on MEAN and decided to go with
<a href="https://www.meteor.com/">Meteor</a>, which made prototyping go so much faster.</p>
<p>After a few days of using Meteor, I had something to show to others. I got my first opportunity at the <a href="http://www.meetup.com/Meteor-Berlin/events/222836532/">Meteor meetup in
Berlin</a>, where I refactored some parts of my application and
received lots of valuable feedback from the participants. Days later, I deployed the first version of KOKOWINKA. The
first fuckup — “out of memory” — appeared just two hours after deployment. I read some blog posts about Meteor memory
requirements and decided to increase the RAM of the server from 512MB to 1GB. Problem solved!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0ba6283c9927526e8867c09e8498eae61025411e_meteor-berlin-meetup.jpg?auto=compress,format"></p>
<p><em>Presenting the first prototype on the “Meteor Meetup Berlin”</em></p>
<p>Developing KOKOWINKA was the easy part. Once it was deployed, I turned my focus to SEO, SEA and other marketing-oriented
things I didn’t really know too much about (yet). I read hundreds of blog articles about Google Keyword positioning and
took my best shot. This field is something I am still learning about; it is fascinating how the Google algorithm works.</p>
<h3>How’s It Going?</h3>
<p>KOKOWINKA’s success has been better than I’d expected. I haven’t spent any money on advertisement, yet the app is
attracting about 50 users/day; most traffic comes directly from Google searches. I'm working on a Christmas campaign
that will include Facebook advertising. As for product development: In 2016 I want to add more functionality, more shops
to the catalog, email subscriptions and mobile apps (iOS, Android) using <a href="http://ionicframework.com/">ionic</a>.</p>
<h3>Lessons Learned</h3>
<p>Here’s some valuable insight I’ve gained while working on KOKOWINKA:</p>
<ul>
<li>If you don’t have much time to invest in the project, make sure you plan and prototype it thoroughly before
starting.</li>
<li>Plan to spend at least 15 percent of your time addressing unforeseen issues. They always appear!</li>
<li>Spread the word. Share your idea with as many people as possible. You will receive very valuable feedback this way.</li>
<li>Don’t work alone. There will be days when you doubt the viability of your project. You will need someone who
encourages you to go forward and not to give up.</li>
<li>You should love your idea. If not, you will easily procrastinate and lose focus.</li>
<li>I cannot explain how amazing it is to see an idea that once resided solely in your brain transform into something
touchable/browsable. Keep working so you can experience this magic.</li>
</ul>
<p>Now that you know how I followed my dreams, hopefully you will follow yours too!</p>
<p><em>* I created the name “KOKOWINKA” with an online name generator.</em></p>How Zalando's Using Clojure+Spark (Slides)2015-11-20T00:00:00+01:002015-11-20T00:00:00+01:00Kave Bishogotag:engineering.zalando.com,2015-11-20:/posts/2015/11/how-zalandos-using-clojurespark-slides.html<p>Our talk from this year's Clojure/Conj, which we sponsored.</p><p>Senior Software/Data Engineer Hunter Kelly of Zalando’s Dublin <a href="https://tech.zalando.com/blog/working-at-zalando-dublin/">Fashion Insights
Centre</a> presented at this week’s
<a href="http://clojure-conj.org/">Clojure/Conj</a> on how we’re using Clojure and MLlib to understand fashion in a more
quantitative manner. Hunter’s talk — "Using Clojure+Spark to Find All the Topics on the Interwebs"— centered on our
efforts to harness data from crowd-sourced websites such as <a href="https://www.dmoz.org/">DMOZ</a> and <a href="https://commoncrawl.org/">Common
Crawl</a>, which enables us to explore methods of categorising the web into a set of known
fashion-related topics. Then we can better answer such questions as: How many fashion-related topics are there on the
Internet? How closely are they related to each other, or to other non-fashion topics? Furthermore, what topic
hierarchies exist in this landscape?</p>
<p>Zalando was honored to participate in this year’s Clojure/Conj, which took place in Philadelphia, as both presenters and
sponsors. Check out Hunter’s slides from the talk ( <a href="https://www.youtube.com/watch?v=ARBiyYyW4Ow&feature=youtu.be">video version
here</a>):</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/spark-clojure-for-topic-discovery-zalando-tech-clojureconj-talk" title="Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk">Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj
Talk</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>Why "Open Source First"2015-11-17T00:00:00+01:002015-11-17T00:00:00+01:00Raffaele Di Faziotag:engineering.zalando.com,2015-11-17:/posts/2015/11/why-open-source-first.html<p>The sooner you think about open-sourcing, the better.</p><p>In a recent meeting of Zalando Tech’s Open Source Guild — our informal, bimonthly gathering of
<a href="https://en.wikipedia.org/wiki/Free_and_open-source_software">FOSS</a> advocates and evangelists — we discussed our
now-published open <a href="https://tech.zalando.com/blog/zalando-techs-new-open-source-first-principles/">source principles</a>
and the importance of Principle #1: “Do ‘Open Source First.’” In other words, if your Zalando project can also be
useful to the tech world at-large, “release it as open source from the start.”</p>
<p>In creating <a href="https://github.com/zalando-techmonkeys/chimp">chimp</a>, a new deploy tool for my team, I didn’t take an Open
Source First approach. In retrospect, this was a big mistake. While developing the project, I repeatedly told myself
that I would “do it soon,” but never got around to doing it at all. The reasons: It was my first Golang project, and I
didn’t even know how to write a <strong>for</strong> loop over a map or do other pretty basic things. I was uncomfortable about
showing my Golang coding skills to the world, and consequently was uncomfortable about showing the project. Moreover,
chimp started as a hack (wait — what project <em>doesn’t</em> start as a hack?) and at first I wrote a lot of bad, useless
code.</p>
<p>In hindsight, developing chimp with an Open Source First mindset would have made my concerns moot. One reason is that
developing for “the world” can be a great incentive for getting things right the first time. When you tell yourself,
“I’ll make this better for open-sourcing later on,” you postpone taking steps and making improvements to ensure high
quality.</p>
<p>Furthermore, you lose out on a lot of positive feedback. At Zalando, our engineering teams are small and busy, and
relying on colleagues for everything testing- and review-related isn’t always practical. There’s a lot of knowledge on
the Web we can take advantage of by making our work open. At minimum, people will comment on projects they find
interesting and will suggest improvements.</p>
<p>Finally, open-sourcing projects from the beginning is good for increasing security. Thinking from the outset about
protecting the secrets and sensitive data touched by your project reduces the chances of making public something you
don’t want to share.</p>
<p>If I could do it all over again, I would make <a href="https://github.com/zalando-techmonkeys/chimp">chimp</a> open from Day One.
Meantime, I advise you to read our <a href="https://zalando-open-source-principles.readthedocs.org/en/latest/">principles</a> and
make your code Open Source First!</p>
<p><em>Image via</em> <em>Jessica Duensing/Open Source Way.</em></p>Achieving Correct Bloat Estimates of JSON Data in PostgreSQL2015-11-13T00:00:00+01:002015-11-13T00:00:00+01:00Oleksandr Shulgintag:engineering.zalando.com,2015-11-13:/posts/2015/11/achieving-correct-bloat-estimates-of-json-data-in-postgresql.html<p>How we improved PostgreSQL's perception of space occupied by JSON data.</p><p>Recently my team realized that tables in one of our PostgreSQL databases were showing colossal amounts of bloat: up to
40 out of 54 GB total (75%) for bigger tables, and up to 95% of 17 GB total for smaller ones. Having accurate estimates
is important to make certain database administration decisions aimed at reclaiming disk space and improving database
performance, but these numbers were something very unusual, so we wanted to investigate first.</p>
<p>For the table and index bloat estimations, we used
<a href="https://github.com/zalando/PGObserver/blob/master/sql/data_collection_helpers/bloated_tables_and_indexes.sql">queries</a>
based on the <a href="https://wiki.postgresql.org/wiki/Show_database_bloat">PostgreSQL Wiki’s</a> database-bloat query. Only
certain tables were affected, pointing to problems with the tables themselves.</p>
<p>To investigate possible causes for these really unusual bloat estimation numbers, we used the
<a href="https://www.postgresql.org/docs/current/pgstattuple.html">pgstattuple</a> module, which performs a full table
scan and reports exact tuple-level figures instead of estimates. The exact figures looked perfectly normal, so the
estimation itself had to be wrong.</p>
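<p>For illustration, a minimal <code>pgstattuple</code> invocation looks like this (the table name here is a
placeholder; the extension must be installed on the instance):</p>
<div class="highlight"><pre><span></span><code>CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Full scan: returns exact, not estimated, space usage
SELECT table_len, dead_tuple_percent, free_percent
  FROM pgstattuple('some_bloated_table');
</code></pre></div>
<p>Because it scans the whole table, this is expensive on large relations, but it gives ground truth to compare the
estimation queries against.</p>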
<p>We then turned our attention to the actual table schema. Our discovery: All the bloated tables included columns of type
JSON.</p>
<p>As you can see from the estimation queries, <code>pg_stats.avg_width</code> is the key input for computing the
expected table size: the estimated row width is the sum of the average widths of all columns. When a column has no
statistics entry at all, its width contributes nothing, so the expected size comes out far too small, and the
difference to the actual size is reported as bloat.</p>
<p>Checking the <code>avg_width</code> values for the affected tables confirmed the suspicion: there were no statistics
entries at all for the JSON columns.</p>
<p>After reading the code in <code>analyze.c</code>, the reason became clear: PostgreSQL gathers statistics for a column
only if the column's type has an equality operator, and type JSON does not define one.</p>
<h3><strong>Verifying Assumptions</strong></h3>
<p>In order to verify that statistics aren’t gathered for columns of type JSON, create a test table like this one —</p>
<div class="highlight"><pre><span></span><code>CREATE TABLE testjson(
id INT,
tx TEXT,
js JSON);
</code></pre></div>
<p>— and, after populating it with some data and running <code>ANALYZE</code> on it, query the <code>pg_statistic</code>
catalog for the table's columns:</p>
<div class="highlight"><pre><span></span><code>=# SELECT staattnum, stawidth
FROM pg_statistic AS s
JOIN pg_class AS c ON s.starelid = c.oid
WHERE c.relname = 'testjson'
ORDER BY 1;
staattnum | stawidth
-----------+----------
1 | 4
2 | 5
(2 rows)
</code></pre></div>
<p>The <code>pg_stats</code> view, a human-readable wrapper around <code>pg_statistic</code>, likewise shows no entry
for the <code>js</code> column: statistics for the JSON column were never gathered.</p>
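<p>The same gap can be seen through the view directly, reusing the test table from above:</p>
<div class="highlight"><pre><span></span><code>SELECT attname, avg_width
  FROM pg_stats
 WHERE tablename = 'testjson';

-- rows come back for id and tx only; js is absent
</code></pre></div>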
<p>PostgreSQL 9.4 and later come with type JSONB which, among other interesting properties, has the equality
operator defined. In our problem case, the database in question had been upgraded to 9.4. But switching ~100 GB of
tables over to a different column type — and that’s only on a single database shard, out of eight — was not a viable
option.</p>
<p>If only for the sake of statistics calculation, you can enhance your plain-text JSON type with the equality operator.
Just ensure that the operator is exclusively reserved for this particular use case so that your queries produce an
error, and not a meaningless result for the accidental JSON comparison.</p>
<h3><strong>The Straightforward Way</strong></h3>
<p>Create a schema with a descriptive name, and revoke public access to it:</p>
<div class="highlight"><pre><span></span><code>CREATE SCHEMA dontuse;
REVOKE USAGE ON SCHEMA dontuse FROM PUBLIC;
</code></pre></div>
<p>Now you need a function to create an operator on top of it. JSON is just a text type with a syntax check on input, so we
can cast the parameters to TEXT and compare:</p>
<div class="highlight"><pre><span></span><code><span class="nt">CREATE</span><span class="w"> </span><span class="nt">OR</span><span class="w"> </span><span class="nt">REPLACE</span><span class="w"> </span><span class="nt">FUNCTION</span><span class="w"> </span><span class="nt">dontuse</span><span class="p">.</span><span class="nc">jsoneq</span><span class="o">(</span><span class="nt">a</span><span class="w"> </span><span class="nt">JSON</span><span class="o">,</span><span class="w"> </span><span class="nt">b</span><span class="w"> </span><span class="nt">JSON</span><span class="o">)</span>
<span class="nt">RETURNS</span><span class="w"> </span><span class="nt">BOOL</span><span class="w"> </span><span class="nt">AS</span>
<span class="o">$$</span>
<span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">a</span><span class="p">::</span><span class="nd">TEXT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">b</span><span class="p">::</span><span class="nd">TEXT</span><span class="o">;</span>
<span class="o">$$</span>
<span class="nt">LANGUAGE</span><span class="w"> </span><span class="nt">SQL</span><span class="w"> </span><span class="nt">SECURITY</span><span class="w"> </span><span class="nt">INVOKER</span><span class="w"> </span><span class="nt">IMMUTABLE</span><span class="o">;</span>
</code></pre></div>
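<p>A quick sanity check of the function (run as the owner, since <code>USAGE</code> on the schema is revoked from
<code>PUBLIC</code>):</p>
<div class="highlight"><pre><span></span><code>SELECT dontuse.jsoneq('{"a": 1}'::JSON, '{"a": 1}'::JSON);  -- true
SELECT dontuse.jsoneq('{"a": 1}'::JSON, '{"a": 2}'::JSON);  -- false
</code></pre></div>
<p>Note that this is purely textual equality: the same JSON document with different whitespace or key order compares as
unequal. For the purpose of statistics gathering that is perfectly acceptable.</p>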
<p>And now, for the actual operator:</p>
<div class="highlight"><pre><span></span><code><span class="n">CREATE</span> <span class="n">OPERATOR</span> <span class="n">dontuse</span><span class="p">.</span><span class="o">=</span> <span class="p">(</span>
<span class="kr">PROCEDURE</span><span class="o">=</span><span class="n">dontuse</span><span class="p">.</span><span class="n">jsoneq</span><span class="p">,</span>
<span class="n">LEFTARG</span><span class="o">=</span><span class="n">JSON</span><span class="p">,</span>
<span class="n">RIGHTARG</span><span class="o">=</span><span class="n">JSON</span><span class="p">);</span>
</code></pre></div>
<p>The final step is to create an operator class. An operator class for the hash access method needs only one operator:</p>
<div class="highlight"><pre><span></span><code>CREATE OPERATOR CLASS dontuse.json_ops
DEFAULT FOR TYPE JSON USING hash AS
OPERATOR 1 dontuse.= ;
</code></pre></div>
<p>Declare the operator class as the default for this type. Otherwise, <code>ANALYZE</code> will not make use of the new
operator and will keep skipping the JSON columns.</p>
<p>After re-analyzing the table, you can finally find the missing statistics entry for the column.</p>
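<p>A minimal check, repeating the catalog query from the test above:</p>
<div class="highlight"><pre><span></span><code>ANALYZE testjson;

SELECT staattnum, stawidth
  FROM pg_statistic AS s
  JOIN pg_class AS c ON s.starelid = c.oid
 WHERE c.relname = 'testjson'
 ORDER BY 1;

-- the js column (staattnum 3) now has an entry as well
</code></pre></div>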
<h3><strong>Case Closed?</strong></h3>
<p>If you now try to analyze a reasonably-populated table (in our test, two million rows), the process will take about 19
seconds(!), whereas it takes only 200 <em>milliseconds</em> when the <code>js</code> column is not analyzed at all.
Practically all of that time is spent in our SQL comparison function.</p>
<p>If we rewrite the comparison function in PL/pgSQL instead —</p>
<div class="highlight"><pre><span></span><code><span class="nt">CREATE</span><span class="w"> </span><span class="nt">OR</span><span class="w"> </span><span class="nt">REPLACE</span><span class="w"> </span><span class="nt">FUNCTION</span><span class="w"> </span><span class="nt">dontuse</span><span class="p">.</span><span class="nc">jsoneq</span><span class="o">(</span><span class="nt">a</span><span class="w"> </span><span class="nt">JSON</span><span class="o">,</span><span class="w"> </span><span class="nt">b</span><span class="w"> </span><span class="nt">JSON</span><span class="o">)</span>
<span class="nt">RETURNS</span><span class="w"> </span><span class="nt">BOOL</span><span class="w"> </span><span class="nt">AS</span><span class="w"> </span><span class="o">$$</span>
<span class="nt">BEGIN</span>
<span class="w"> </span><span class="nt">RETURN</span><span class="w"> </span><span class="nt">a</span><span class="p">::</span><span class="nd">TEXT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">b</span><span class="p">::</span><span class="nd">TEXT</span><span class="o">;</span>
<span class="nt">END</span><span class="o">;</span>
<span class="o">$$</span><span class="w"> </span><span class="nt">LANGUAGE</span><span class="w"> </span><span class="nt">PLPGSQL</span><span class="w"> </span><span class="nt">SECURITY</span><span class="w"> </span><span class="nt">INVOKER</span><span class="w"> </span><span class="nt">IMMUTABLE</span><span class="o">;</span>
</code></pre></div>
<p>— we can reduce the time spent in <code>ANALYZE</code> considerably, though it remains far above the baseline without
the JSON column.</p>
<h3><strong>Can We Do Better?</strong></h3>
<p>Well, there are at least two possibilities: Either write the comparison function in C, taking advantage of the fact that
JSON type’s internal representation is the same as that of type TEXT; or force the use of TEXT type’s equality
comparison function for the defined operator.</p>
<p>Writing an external function is not hard, but it requires making an external loadable module, installing it on all
affected systems, maintaining it, etc.</p>
<p>Using the existing function sounds more promising. Unfortunately, this cannot be done directly with the <code>CREATE
OPERATOR</code> command, because it requires a function whose argument types match the operator's exactly:</p>
<div class="highlight"><pre><span></span><code><span class="o">=#</span> <span class="n">CREATE</span> <span class="n">OPERATOR</span> <span class="n">dontuse</span><span class="p">.</span><span class="o">=</span> <span class="p">(</span>
<span class="kr">PROCEDURE</span><span class="o">=</span><span class="n">pg_catalog</span><span class="p">.</span><span class="n">texteq</span><span class="p">,</span>
<span class="n">LEFTARG</span><span class="o">=</span><span class="n">JSON</span><span class="p">,</span>
<span class="n">RIGHTARG</span><span class="o">=</span><span class="n">JSON</span><span class="p">);</span>
<span class="n">ERROR</span><span class="p">:</span> <span class="n">function</span> <span class="n">pg_catalog</span><span class="p">.</span><span class="n">texteq</span><span class="p">(</span><span class="n">json</span><span class="p">,</span> <span class="n">json</span><span class="p">)</span> <span class="n">does</span> <span class="n">not</span> <span class="n">exist</span>
</code></pre></div>
<p>What <em>does</em> work, however, is direct modification of the <code>pg_operator</code> system catalog (a trick you
should not try at home, let alone in production):</p>
<div class="highlight"><pre><span></span><code><span class="o">=</span><span class="p">#</span><span class="w"> </span><span class="n">BEGIN</span><span class="p">;</span>
<span class="o">=</span><span class="p">#</span><span class="w"> </span><span class="n">UPDATE</span><span class="w"> </span><span class="n">pg_operator</span><span class="w"> </span><span class="n">SET</span><span class="w"> </span><span class="n">oprcode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">'</span><span class="n">pg_catalog</span><span class="p">.</span><span class="n">texteq</span><span class="p">'</span><span class="o">::</span><span class="n">regproc</span>
<span class="w"> </span><span class="n">WHERE</span><span class="w"> </span><span class="n">oprleft</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">114</span>
<span class="w"> </span><span class="n">AND</span><span class="w"> </span><span class="n">oprright</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">114</span>
<span class="w"> </span><span class="n">AND</span><span class="w"> </span><span class="n">oprname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sc">'='</span><span class="p">;</span>
<span class="n">UPDATE</span><span class="w"> </span><span class="mh">1</span>
</code></pre></div>
<p>In the example above, 114 is the OID of type JSON. If you test <code>ANALYZE</code> now, it runs about as fast as for
a plain TEXT column: the built-in C function <code>texteq</code> does the actual comparison.</p>
<h3><strong>Digging Even Deeper</strong></h3>
<p>Potential performance problems can result if the statistics target is increased, either per column with <code>ALTER
TABLE ... ALTER COLUMN ... SET STATISTICS</code> or globally with <code>default_statistics_target</code>.</p>
<p>If the statistics target changes from 100 (the default) to 1000, for example, <code>ANALYZE</code> not only samples
ten times as many rows: the number of equality comparisons grows much faster than that, because the algorithm used with
a hash operator class compares every sampled value against the list of candidate most-common values. With a btree
operator class, <code>ANALYZE</code> can switch to a sort-based algorithm instead:</p>
<div class="highlight"><pre><span></span><code>CREATE OPERATOR CLASS dontuse.json_ops
DEFAULT FOR TYPE JSON USING btree AS
OPERATOR 1 dontuse.< ,
OPERATOR 3 dontuse.= ,
FUNCTION 1 dontuse.jsoncmp(JSON, JSON);
</code></pre></div>
<p>The exact definition of the functions behind the operators = and &lt; is not important here. For the purposes of
<code>ANALYZE</code>, only the support function matters: it is the one called to sort the sampled values.</p>
<p>If we go the safe route again and write the support function in SQL, we might end up with ~2.5 seconds on our default
statistics target of 100. If we go the hacky way and override the support function to use the one provided for type
TEXT, we can get the reasonable 300 milliseconds on statistics target 100 — and we’re back to only 2.5 seconds with
target 1000.</p>
<p>For completeness, here’s a command that you also should not try at home:</p>
<div class="highlight"><pre><span></span><code><span class="p">=</span><span class="err">#</span><span class="w"> </span><span class="nx">BEGIN</span><span class="p">;</span>
<span class="p">=</span><span class="err">#</span><span class="w"> </span><span class="nx">UPDATE</span><span class="w"> </span><span class="nx">pg_amproc</span><span class="w"> </span><span class="nx">SET</span><span class="w"> </span><span class="nx">amproc</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">'</span><span class="nx">pg_catalog</span><span class="p">.</span><span class="nx">bttextcmp</span><span class="err">'</span><span class="o">::</span><span class="nx">regproc</span>
<span class="w"> </span><span class="nx">WHERE</span><span class="w"> </span><span class="nx">amproclefttype</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">114</span>
<span class="w"> </span><span class="nx">AND</span><span class="w"> </span><span class="nx">amprocrighttype</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">114</span>
<span class="w"> </span><span class="nx">AND</span><span class="w"> </span><span class="nx">amprocnum</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="nx">UPDATE</span><span class="w"> </span><span class="mi">1</span>
</code></pre></div>
<h3><strong>The Final Chord</strong></h3>
<p>Since PostgreSQL 9.2, you can provide a second support routine for the btree operator class, a so-called sort support
function, which can speed up the sorting even further by avoiding generic function-call overhead.</p>
<p>Yet another possibility is to provide a custom <code>ANALYZE</code> function for the type, via the
<code>typanalyze</code> attribute of <code>pg_type</code>, and skip the expensive value comparisons altogether.</p>
<p>Now, the proper fix can only be implemented in PostgreSQL core: even if a type doesn’t provide an equality comparison
operator, the average column width can still be estimated. The upcoming version of PostgreSQL 9.5 will have the <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=82e1ba7fd6cc9ac3fb1d9b819dc7295b268d3703">fix
incorporated</a>.</p>Doing Data Science for Social Good2015-11-11T00:00:00+01:002015-11-11T00:00:00+01:00Hayley Baldwintag:engineering.zalando.com,2015-11-11:/posts/2015/11/doing-data-science-for-social-good.html<p>A weekend DataDive inside Zalando Tech!</p><p>We recently teamed up with the Data Science for Social Good Berlin ( <a href="http://dssg-berlin.org/">DSSG</a>) chapter to host
their first-ever DataDive in our Shuttle Innovation Lab. More than 60 data scientists, data enthusiasts and other
technologists from the Berlin tech community spent Friday night through Sunday afternoon working with
<a href="https://hackpad.com/DSSG-Berlin-2015-Data-Dive-NDw8vqyduYJ">datasets</a> provided by the nonprofit organizations
<a href="http://www.streetfootballworld.org/">streetfootballworld</a>, <a href="http://en.jambo-bukoba.com/">Jambo Bukoba</a> and
<a href="http://datalook.io/about/">DataLook</a>. “Data Divers” represented diverse skillsets and backgrounds (e.g. NHST, linear
models, data visualization, NLP, sentiment analysis, and more), chose their own projects to work on and developed their
own workflows. The event also held spots for “observers” — myself included.</p>
<p>Have you heard of <a href="http://www.datakind.org/">DataKind</a>? Well, DSSG Berlin works similarly. They provide data to help
nonprofits prove that their programs are effective, and predict trends for future programs. In other words, they put the
numbers behind the narratives for social organizations that don’t have data scientists or other technologists on staff.
DSSG Berlin linked up to us via Zalando product specialist Kevin Wong, who’s an ambassador to the group and was helping
them to find an event location. As it turns out, many of our 60+ data scientists were already volunteering for the
initiative in their free time.</p>
<p>In addition to data-hacking, the dive included final presentations and breakout sessions. Zalando Data Scientist
Katharina Rasch delivered one such session about her participation in the <a href="http://dssg.io/">Data Science for Social Good
Fellowship</a> at the University of Chicago this past summer. Katharina was invited to spend three months
working with 41 other fellows, project managers, technical mentors and the organizers on data science projects for
nonprofits and government organizations.</p>
<p>“I had a grand time this summer,” she said, “and I am excited to see that the idea of using our skills for social good is
also taking off in Germany. You can learn more about the projects we worked on <a href="http://dssg.io/projects/">here</a>.”</p>
<p>Volunteer DSSG Ambassadors helped the data divers manage their time and work flows, and overcome problem-oriented
roadblocks. The non-profits’ founders and team members also provided support and helped participants navigate the data
(jump to 51:02 in the video below for a very heartfelt thank you from Jambo Bukoba founder Clement Mulokozi).</p>
<p>Watch the final presentations of the results here:</p>
<p>Written Results:</p>
<ul>
<li><a href="https://hackpad.com/Jambo-Bukoba-Dataset-gTEZotNwLQr#:h=Results">Jambo Bukoba</a></li>
<li><a href="https://hackpad.com/Streetfootballworld-UwhpkzuhGqn#:h=Results">streetfootballworld</a></li>
<li><a href="https://hackpad.com/DataLook-Bike-Accident-Dataset-rDyq5U36ENl#:h=Results">DataLook</a></li>
</ul>
<p><em>Note: not all data has been linked yet.</em></p>
<p>The DataLook team livestreamed the event and produced a truly awesome video of their findings, which you can watch
<a href="https://www.youtube.com/watch?v=XeUosRRcL_U">here</a>.</p>
<p>You can learn more about DSSG’s past and future projects on their <a href="https://github.com/dssg-berlin">GitHub</a> page. A big
thanks to the DSSG Berlin team and their volunteers for taking the time to put on such a great event — and for including
us! <a href="https://www.facebook.com/media/set/?set=a.917382561649200.1073741830.860748467312610&type=3&notif_t=like">Go here for event
photos</a>.</p>Zalando Tech's New Open Source Principles2015-11-10T00:00:00+01:002015-11-10T00:00:00+01:00Lauri Appletag:engineering.zalando.com,2015-11-10:/posts/2015/11/zalando-techs-new-open-source-principles.html<p>Zalando Tech's Open Source Guild outlines a vision for promoting open source development.</p><p>Two important parts of Zalando Technology's culture — our self-organized guilds, and our support for open source—
intersect via our Open Source Guild. The Guild is an informal group
of engineers and tech evangelists dedicated to strengthening <a href="http://github.com/zalando">our open source</a> culture both
internally and externally. Every two weeks, Guild members meet to brainstorm ideas for promoting our projects to the
tech community; collaborate on internal policies related to licensing and other relevant topics, and trade insights and
best practices.</p>
<p>One of the Guild's biggest projects of the past few months has been drafting a vision statement and set of "open source
principles" to encourage everyone on our team to think in an "Open Source First" way. After much commenting,
copy-tweaking and debate, here they are. Feel free to customize and adopt them for your own team's purposes:</p>
<p><strong>Vision</strong>: <em>We strongly believe that open source software benefits the tech community, and that providing broadly
useful code to the world is a virtue. We strive to work in an open source way to the betterment of Zalando and the
world.</em></p>
<ul>
<li><strong>Do “Open Source First”</strong>: If your Zalando project can also be useful to non-Zalandos, release it as open source
from the start.</li>
<li><strong>Take Ownership</strong>: Your team is responsible for ensuring that it’s possible to open source your project. Your
delivery lead is available for guidance.</li>
<li><strong>Share Your Code</strong>: All code shared between teams must be open source.</li>
<li><strong>Be Safe</strong>: To ensure the broadest possible use of your project, use the MIT License only.</li>
<li><strong>Deliver Quality</strong>: Provide a great out-of-the-box experience.</li>
<li><strong>Provide Documentation</strong>: Include a clear README and default working configuration.</li>
<li><strong>Stay Secure</strong>: Make sure your project doesn’t include Zalando specifics, such as credentials and private
identifiers.</li>
<li><strong>Ask for Help</strong>: Find colleagues to brainstorm ideas for your project and to review your work.</li>
<li><strong>Promote</strong>: Tell the world about your project via blog posts, social media and conference talks.</li>
<li><strong>Join the Open Source Guild</strong>: Help us make open source stronger at Zalando!</li>
</ul>
<p>In true open source fashion, these principles remain a work in progress; we'll make revisions and updates
<a href="https://github.com/zalando/zalando-howto-open-source">here</a>.</p>Video: "A Tale of Automation and Legacy"2015-11-09T00:00:00+01:002015-11-09T00:00:00+01:00Rodrigo Reistag:engineering.zalando.com,2015-11-09:/posts/2015/11/video-a-tale-of-automation-and-legacy.html<p>Learn how we're tackling Identity and Access Management in the cloud.</p><p>Last month my colleague Igor Ramadas and I attended <a href="http://summits.forgerock.com/london/">ForgeRock’s Identity Summit</a>
in London, where we presented a talk on Zalando’s efforts to enable business continuity while migrating to the cloud.
With automation in mind, we showed how using (some) coding tweaks makes it possible for two different worlds to
communicate securely: Our new cloud apps, based on <a href="https://stups.io/">STUPS</a> and Amazon Web Services; and our legacy
apps, which reside in our datacenters. We also discussed the challenges lying ahead in regard to Digital Identity and
Access Management (IAM).</p>
<p>Check out our presentation to learn more about how Zalando’s IAM Team is creating a unified and secure solution for our
IT environment:</p>
<p>You can also review the slides here:</p>
<p><strong><a href="https://www.slideshare.net/ForgeRock/identity-summit-uk-keep-talking-lessons-learned-during-our-migration-from-legacy-iam-to-forgerock" title="Identity Summit UK: KEEP TALKING: LESSONS LEARNED DURING OUR MIGRATION FROM LEGACY IAM TO FORGEROCK">Identity Summit UK: KEEP TALKING: LESSONS LEARNED DURING OUR MIGRATION FROM LEGACY IAM TO
FORGEROCK</a></strong>
from <strong><a href="http://www.slideshare.net/ForgeRock">ForgeRock</a></strong></p>Watch: "How to Auto-Scale Your API" (Video)2015-11-05T00:00:00+01:002015-11-05T00:00:00+01:00Kave Bishogotag:engineering.zalando.com,2015-11-05:/posts/2015/11/watch-how-to-auto-scale-your-api-video.html<p>Learn about our API strategy from two Zalando software engineers.</p><p>Given Zalando’s rapid growth, adopting an <a href="https://api.zalando.com/">API</a> First approach has proven to be one of the
most effective ways to manage big data and deliver an easier, more effective customer experience to a larger audience.
Last month Zalando Delivery Lead Sean Floyd and Software Engineer Luis Mineiro traveled to Belgrade, Serbia to speak at
the <a href="https://voxxeddays.com/belgrade15/#np-855">Voxxed Days Belgrade</a> conference about API First — focusing specifically
on “ <a href="https://tech.zalando.com/blog/auto-scaling-your-api-tips-from-zalando-slides/">How to Auto-Scale Your API</a>.” Watch
their talk:</p>
<p>Voxxed Days focuses on Java, web, mobile and JVM languages. The Belgrade edition brought together more than 500
participants and renowned speakers from different engineering communities. The next Voxxed Days will take place in
Berlin on January 28-29, 2016; learn more <a href="https://voxxeddays.com/berlin16/">here</a>.</p>Zalando Tech Screens "Big Dream" on Nov. 102015-11-04T00:00:00+01:002015-11-04T00:00:00+01:00Kave Bishogotag:engineering.zalando.com,2015-11-04:/posts/2015/11/zalando-tech-screens-big-dream-on-nov.-10.html<p>Zalando screens this women in tech documentary for free in Berlin.</p><p>As part of Zalando’s efforts to promote diversity in tech, we’ll screen the documentary <a href="http://www.bigdreammovement.com/">Big
Dream</a> on November 10 at our Berlin technology HQ. This documentary film, directed by
Kelly Cox and Iron Way Films and co-produced by Microsoft, follows the stories of seven women who overcome obstacles in
following their STEM (Science, Technology, Engineering and Math)-related career paths. This is the first showing of the
film in Berlin, so <a href="https://www.eventbrite.com/e/big-dream-screening-by-big-dream-movement-tickets-19113004511">RSVP</a> to
join us!</p>
<p><a href="http://www.mckinsey.com/insights/organization/why_diversity_matters">Diversity adds value to business</a>, yet the
challenge of closing the gender, age, ethnicity, religion and personality gaps remains wide. At Zalando Tech, initiatives
like our Diversity Guild create opportunities for technologists and others company-wide to discuss diversity topics in
the tech industry and our workplace. Our team is proud of the diversity we’ve achieved — more than 50 nations and
counting are represented on our tech team — and we wish to strengthen our open culture even more. Movies like <em>Big Dream</em>
remind us of the importance of this work.</p>
<p>Watch a clip here:</p>How to Web Summit2015-11-02T00:00:00+01:002015-11-02T00:00:00+01:00Selina McCarthytag:engineering.zalando.com,2015-11-02:/posts/2015/11/how-to-web-summit.html<p>A native Dubliner and Web Summit alumni shows us the way.</p><p>“In 4 years, Web Summit has grown from 400 attendees to over 22,000 from more than 110 countries. It’s been called ‘the
best technology conference on the planet.’ But we just think it‘s different. And that difference works for our
attendees, ranging from Fortune 500 companies to the world’s most exciting startups.” — websummit.com</p>
<p>Zalando is participating in this year's <a href="https://websummit.net/">Web Summit</a> in Dublin, Ireland with a booth, a fireside
chat on the mainstage featuring our cofounder Robert Gentz, a panel on the new Fashion track, and multiple satellite
events. Dublin is home to our <a href="https://tech.zalando.com/blog/working-at-zalando-dublin/">Fashion Insights Centre</a>,
opened in May 2015, and almost half of our Summit-attending team are locals. Summit attendees can stop by our booth and
meet our product specialists and UX designers, data scientists, software engineers, recruiters, and marketing
specialists.</p>
<p>One event ticket gives you access to 21 “Summits” — a total of 1,000 speakers, not including infinite opportunities to
meet representatives from the world’s leading tech companies. This year’s Summit will draw an estimated 30,000
attendees. It’s a lot of good stuff to pack into three days.</p>
<h2>How to Get the Most from Your Summit Experience</h2>
<p>Before joining Zalando I was part of the Web Summit planning team, and I have organized many events with the team ever
since. I’ve also been to the event as a regular attendee and have developed a pretty good understanding of how to get
the most out of it. Recently I shared my top tips at a meetup I organized and hosted at Zalando’s Berlin tech HQ. Here
they are now for your enjoyment:</p>
<ul>
<li>Reach out to interesting people over Linkedin before the event. Let them know that you’ll be at the Summit, and try
to set up a short meeting with them.</li>
<li>Make your own <a href="https://websummit.net/speakers">schedule of talks</a> you want to attend, and set alarms in your phone
so you don’t miss them.</li>
<li>Networking is important, but don’t get caught up and miss the talks. Check out the different stages and the
different industries.</li>
<li>There are tons of great satellite events and they’re not always explicitly promoted. Networking will often result in
an invite.</li>
<li>In general, people are much more open to meeting new people at conferences than they usually are. Don’t wait for
people to come to you — reach out first.</li>
<li>When you meet people, don’t immediately talk about work. Everyone does this, and it gets old quickly. Discuss
something else that you genuinely find interesting, like the key takeaway from a talk you attended.</li>
<li>Keep your expectations in check and don’t forget to enjoy the event! Networking is like dating: The more desperate
you are, the less attractive you are.</li>
<li>Sleep a lot in advance. You will be going round the clock!</li>
</ul>
<p>In addition to being a Summit team alumnus, I also happen to be a native Dubliner. Here are a few things that you need
to know about the city:</p>
<ul>
<li>Weather is unpredictable, but predictably bad. Bring a coat and an umbrella.</li>
<li>Public transport is not comparable to many European cities. You might have to dish out some money for a cab, and you
can’t walk everywhere.</li>
<li>City bikes are popular and found everywhere. Try one!</li>
<li>Check out <a href="http://lovindublin.com/">lovindublin.com</a> and <a href="http://dublin.lecool.com/">dublin.lecool.com</a> for more
insider Dublin info</li>
</ul>
<p>If you’re coming to see us at the event, here’s where we’ll be:</p>
<h3>Fireside Chat: Main stage, Wednesday, November 4, 2:15 PM</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b2e131100e977f85065f1046b061c4c0c351ca5b_screen-shot-2015-11-02-at-19.21.46.png?auto=compress,format"></p>
<h3>Fashion Stage Panel: Thursday, November 5, 10:20 AM</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/97152ef9a4f17bbb7d404f18efec5b5eccf84347_screen-shot-2015-11-02-at-19.22.21.png?auto=compress,format"></p>
<p>On <a href="https://twitter.com/ZalandoTech">Twitter</a>, we'll use the hashtag #WebSummit. Find our booth in the “builders area”
complete with Club Mate (a Zalando Tech staple beverage), Nerf guns, a coding game our new starter engineers built, and
a ton of swag. Looking forward to meeting you!</p>
<p><em>Photos courtesy of</em> <em>websummit.com</em></p>Attention Tech Entrepreneurs!2015-10-28T00:00:00+01:002015-10-28T00:00:00+01:00Benjamin Wörnertag:engineering.zalando.com,2015-10-28:/posts/2015/10/attention-tech-entrepreneurs.html<p>Zalando-Helsinki launches a new residency for startups that connect people to fashion.</p><p><em>Update: Due to the extremely positive feedback we received during</em> <em>SLUSH, we want to give more teams the chance to
apply for the Zalando Tech Residency Program and are extending the application deadline to Monday, Nov. 30, 2015 at noon
CET. All teams who have already applied will hear from us soon.</em></p>
<p>Zalando’s story begins in 2008 with two guys shipping shoes out of their student apartment. Co-founders Robert Gentz and
David Schneider wanted to improve people's online shopping experiences by providing a greater assortment of products,
plus convenient cost-free delivery and returns. Since then we’ve become Europe’s largest online fashion platform —
offering services in brand solutions, smart logistics, performance marketing, wholesale pricing, and many other areas of
ecommerce. Our success story includes all the mistakes, pivots, surprises and risks that many startups face. Now, we
want to share our learnings and insights by offering the next generation of tech startups in the fashion and lifestyle
space a three-month residency at our new <a href="https://tech.zalando.com/blog/hello-helsinki/">Helsinki Tech Hub</a>.</p>
<p>We opened our Helsinki office this August as our <a href="https://tech.zalando.com/blog/working-at-zalando-dublin/">second</a> Tech
Hub outside of Germany. Following the opening of our <a href="https://www.youtube.com/watch?v=7vpjFMXBdzc">Fashion Insights Centre in
Dublin</a>, we announced that our Helsinki team would focus on developing new
apps and mobile products. Helsinki is also home to a vibrant and active startup community that has truly impressed us,
and was a major factor in our decision.</p>
<p>Through this new program, our Helsinki team will host startup teams for three months — offering expertise, guidance and
resources to help participants build their businesses. Applicants should be founders of startups that help connect
people to fashion in some way. Helsinki Site Lead <a href="https://angel.co/tuomas-kytomaa">Tuomas Kytömaa</a> — a Silicon Valley
veteran whose background includes tech and entrepreneurship — will serve as primary mentor for our program
participants.</p>
<p>Interested in participating? Here are the minimum qualifications:</p>
<ul>
<li>You’re a team of 2-5 founding members with an innovative business idea and an already existing prototype/beta
(ideally with first traction)</li>
<li>Your product solves a problem related to some area of e-commerce: e.g. customer insights, smart logistics, adtech,
product and brand discovery, fashion-related content</li>
<li>You’re already based in Helsinki, or willing to relocate for the duration of the program</li>
</ul>
<p>At the beginning of the three-month program we’ll jointly define what is needed to fully develop your product, develop a
go-to-market strategy, and reach a critical mass of users.</p>
<p>Program benefits include:</p>
<ul>
<li>Three-month program beginning in Q1 2016.</li>
<li>Completely equipped workplace for up to five team members</li>
<li>Full access to our facilities and amenities: meeting rooms, chill-out areas, free drinks and snacks</li>
<li>Regular mentoring sessions with our Head Coach Tuomas</li>
<li>Access and coaching sessions with Zalando experts from all areas of our business</li>
<li>No strings attached, all this is 100% free for startups</li>
</ul>
<p>If you're ready for the challenge, you can use this
<a href="https://docs.google.com/a/zalando.de/forms/d/1fuhf7ikzmvyKnLpNbhp2zLNI_xH7ALG6LNCKNS1o-2I/viewform">form</a> to apply. If
we like your idea, we’ll schedule a video interview so we can get to know each other. Any questions? Reach out to
startups@zalando.fi.</p>From Jimmy to Microservices: Rebuilding Zalando’s Fashion Store2015-10-16T00:00:00+02:002015-10-16T00:00:00+02:00Dan Persatag:engineering.zalando.com,2015-10-16:/posts/2015/10/from-jimmy-to-microservices-rebuilding-zalandos-fashion-store.html<p>How Zalando's Spearheads team is facilitating our monolith-to-microservices evolution.</p><p>One of the largest clusters of engineering teams driving Zalando is our Fashion Store department, which develops and
maintains our 15 country-specific, customer-facing websites. The Fashion Store focuses on creating great online
experiences for Zalando’s +16 million active customers — developing great features like <a href="https://www.zalando.co.uk/catalog/">the
catalog</a> and the <a href="https://www.zalando.co.uk/only-sons-monkey-print-t-shirt-blue-os322o006-k11.html">product detail
page.</a></p>
<p>Our Fashion Store engineers are organized into small, independent, autonomous teams that own their code. These teams take
responsibility for the whole application development cycle: programming, assuring quality, deploying and operating their
applications. My team, the Spearheads, formed in May 2015 to completely rebuild the Fashion Store. In this post — the
first in a series — I’ll discuss the Spearheads’ motivation and goals, following up in future installments with more
detailed insights into our architectural decisions.</p>
<h3>Spearheading Change</h3>
<p>The key goal of Team Spearheads is to create a new, state-of-the-art architecture that enables engineers to work
autonomously and fosters innovation. Our team — seven engineers and a UX/UI designer — decided to replace “Jimmy”, our
monolithic shop application, with microservices built by multiple autonomous teams. Under Jimmy, all of the Shop teams
shared the same code base, lacked true ownership or the freedom to make decisions, and couldn’t move quickly because of
Jimmy’s slow startup time and our overly complicated deployment processes.</p>
<h2>Spearhead Key Principles</h2>
<h3>Enable Team Autonomy</h3>
<p>Autonomy allows every team, including those belonging to the Fashion Store, to execute a
full product cycle on currently active features without being dependent on any other teams. We can also introduce new
features autonomously. If dependencies are unavoidable, we try to limit them as much as possible.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f55c07d732742810c966c319ded205dc58e635fc_the-spearheads.jpg?auto=compress,format"></p>
<h3>Rapid Feature Development</h3>
<p><a href="http://martinfowler.com/articles/microservices.html">Microservices</a> allow teams to independently develop new features
and put them into production. Being able to quickly bring new features live will motivate our product team to come up
with innovative ideas, experiment, and “play” with adding and removing features — producing wins for our growing
customer base.</p>
<h3>Create a Consistent User Experience</h3>
<p>We need to balance our drive for team autonomy and experimentation with the need to provide a consistent user
experience. Although different teams will develop and deploy each page of our new Fashion Store, the end result should
appear as if a single person designed them. This requires the same visual style for each part of the website, as well as
coherent handling of user input, animation timings, wording, and even performance of each individual page.</p>
<h3>High Perceived Performance</h3>
<p>By targeting perceived performance instead of absolute numbers, our team can focus more effort on customer experience
rather than raw system numbers. With our new architecture, we can immediately stream markup from different microservices
to the browser and arrange it on the client side (see <a href="https://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919">Facebook’s Big
Pipe</a>).
For some of our pages this might not be enough, as we need the content to be ordered on the server side. For this
particular use case, we’ve developed a new approach called Smart Pipe (we’ll get into more detail about it in a future
blog post).</p>
<h3>Cutting-Edge Technologies</h3>
<p>In choosing which technologies to use, the Spearheads want to keep current Fashion Store team members happy and engaged
while also attracting new talent. As an example, we’ve decided not to support non-JavaScript browsers — eliminating lots
of overhead that would have made our architecture slower and harder to implement. We enjoy writing software in many
languages; being polyglot is both fun and challenging. We started writing our prototypes in Scala, Go, JavaScript
(Node.js). Open sourcing the code we write is a priority for us. That’s how nice projects like
<a href="https://github.com/zalando/skipper">Skipper</a>, <a href="https://github.com/zalando/innkeeper">Innkeeper</a> and
<a href="https://github.com/zalando/beard">Beard</a> were born.</p>
<h3>Simplicity</h3>
<p><em>"The effort required to design something is inversely proportional to the simplicity of the result." — Roy Fielding</em></p>
<p>Simple is beautiful! Spearheads strive to eliminate any unnecessary complexity from the new architecture. In many of our
earliest brainstorming sessions, we tried to find the right balance between enabling team autonomy and creating a
consistent user experience. We took into account many of the possible solutions for each of the problems we were facing
and, by discussing each one, eliminated the weakest ideas. We developed prototypes for the viable solutions and chose
the ones which best fit our use cases. When we needed to create a router for the Fashion Store, we developed two prototypes: one written in Scala, and Skipper, written in Go. In the end we chose to continue with the Go solution and open sourced it.</p>
<h3>Communication</h3>
<p>Internal communication and feedback are very important to us. Having a new architecture is pointless if the other
Fashion Store teams don’t find it helpful. We regularly present on our progress with the whole tech team. We consider
other teams’ feedback and document all of our decisions on the internal tech wiki.</p>
<p>That’s it for today! In my next blog post, I’ll disclose more details about my team and on the new Fashion Store
architecture.</p>"Choosing the Right Components": Zalando at HelsinkiJS2015-10-15T00:00:00+02:002015-10-15T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-10-15:/posts/2015/10/choosing-the-right-components-zalando-at-helsinkijs.html<p>Zalando Senior Frontend Engineer Dmitriy Kubyshkin helps you to avoid picking the wrong ones.</p><p>Zalando Senior Frontend Engineer <a href="https://twitter.com/d_kubyshkin">Dmitriy Kubyshkin</a> recently visited our <a href="https://tech.zalando.com/locations/helsinki/">new Helsinki
tech hub</a> and had some time to head out to the local JavaScript meetup,
<a href="http://helsinkijs.org/">HelsinkiJS</a>, to share his engineering expertise. Watch his talk here:</p>How to Use Parameter Names in SQL Functions2015-10-15T00:00:00+02:002015-10-15T00:00:00+02:00Andras Vaczitag:engineering.zalando.com,2015-10-15:/posts/2015/10/how-to-use-parameter-names-in-sql-functions.html<p>Use a 'new' PostgreSQL 9.2 feature to make your life better!</p><p>At Zalando, <a href="https://tech.zalando.com/blog/watch-fashion-is-hard-postgresql-is-easy/">we store</a> most of our valuable
data in <a href="https://tech.zalando.com/blog/analyzing-extreme-distributions-in-postgresql/">PostgreSQL</a>. When we want to
access it, we typically use a layer of PostgreSQL functions. With every release, we roll out a new set of functions,
neatly organized into versioned API schemas. The application then reads and changes the data by calling the functions of
the current API schema. This means we have many functions in our databases — and a relatively large team of developers
who write them.</p>
<p>If you have been developing PostgreSQL functions for a long time, you know that, until relatively recently, there used
to be a pain point when writing query language functions (better known as SQL functions). On the one hand were the
PL/pgSQL functions, where parameter names could be used in the function body; on the other hand were the poor
SQL functions, where this was impossible. We had no option but to use the positional parameter notation:
$1, $2, and so on.</p>
<p>But now the misery is over! From <a href="http://www.postgresql.org/docs/9.2/static/sql-createfunction.html#AEN68193">PostgreSQL 9.2
onwards</a>, even SQL language functions can
make use of parameter names. To see the difference, let's pick a function from Zalando’s codebase (with a minor tweak
from me):</p>
<div class="highlight"><pre><span></span><code><span class="nv">CREATE</span><span class="w"> </span><span class="nv">OR</span><span class="w"> </span><span class="nv">REPLACE</span><span class="w"> </span><span class="nv">FUNCTION</span><span class="w"> </span><span class="nf">get_something_to_process</span><span class="p">(</span>
<span class="w"> </span><span class="nv">p_offer</span><span class="w"> </span><span class="nv">TEXT</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_template_id</span><span class="w"> </span><span class="nv">INTEGER</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_valid_to</span><span class="w"> </span><span class="nv">TIMESTAMP</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_valid_from</span><span class="w"> </span><span class="nv">TIMESTAMP</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_name</span><span class="w"> </span><span class="nv">TEXT</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_code</span><span class="w"> </span><span class="nv">TEXT</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_limit</span><span class="w"> </span><span class="nv">INTEGER</span><span class="w"> </span><span class="nv">DEFAULT</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="nv">p_offset</span><span class="w"> </span><span class="nv">INTEGER</span><span class="w"> </span><span class="nv">DEFAULT</span><span class="w"> </span><span class="mi">0</span>
<span class="p">)</span><span class="w"> </span><span class="nv">RETURNS</span><span class="w"> </span><span class="nv">SETOF</span><span class="w"> </span><span class="nv">request</span><span class="w"> </span><span class="nv">AS</span>
<span class="p">$</span><span class="nv">BODY</span><span class="p">$</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="nv">WHERE</span><span class="w"> </span><span class="nv">r_offer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">$</span><span class="mi">1</span>
<span class="w"> </span><span class="nv">AND</span><span class="w"> </span><span class="nv">r_template_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">$</span><span class="mi">2</span>
<span class="w"> </span><span class="nv">AND</span><span class="w"> </span><span class="nv">r_valid_from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">$</span><span class="mi">3</span>
<span class="w"> </span><span class="nv">AND</span><span class="w"> </span><span class="nv">r_valid_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">$</span><span class="mi">4</span>
<span class="w"> </span><span class="nv">AND</span><span class="w"> </span><span class="nv">r_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">$</span><span class="mi">5</span>
<span class="w"> </span><span class="nv">AND</span><span class="w"> </span><span class="nv">r_code</span><span class="w"> </span><span class="nv">IS</span><span class="w"> </span><span class="nv">NOT</span><span class="w"> </span><span class="nv">DISTINCT</span><span class="w"> </span><span class="nv">FROM</span><span class="w"> </span><span class="p">$</span><span class="mi">6</span>
<span class="w"> </span><span class="nv">LIMIT</span><span class="w"> </span><span class="p">$</span><span class="mi">7</span><span class="w"> </span><span class="nv">OFFSET</span><span class="w"> </span><span class="p">$</span><span class="mi">8</span><span class="p">;</span>
<span class="p">$</span><span class="nv">BODY</span><span class="p">$</span>
<span class="nv">LANGUAGE</span><span class="w"> </span><span class="o">'</span><span class="nv">sql</span><span class="o">'</span><span class="w"> </span><span class="nv">STABLE</span><span class="w"> </span><span class="nv">SECURITY</span><span class="w"> </span><span class="nv">DEFINER</span><span class="p">;</span>
</code></pre></div>
<p>Can you tell what's wrong? Compare it to the following, dollar-free version:</p>
<div class="highlight"><pre><span></span><code>...
WHERE r_offer = p_offer
AND r_template_id = p_template_id
AND r_valid_from = p_valid_to
AND r_valid_to = p_valid_from
AND r_name = p_name
AND r_code IS NOT DISTINCT FROM p_code
LIMIT p_limit OFFSET p_offset;
</code></pre></div>
<p>Now it's probably obvious why you didn't get the expected result. I swapped two parameters: p_valid_from and p_valid_to.</p>
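<p>For completeness, here is the same snippet with the swap undone; with named parameters the mistake is easy to spot and the fix stays local:</p>
<div class="highlight"><pre><span></span><code>...
WHERE r_offer = p_offer
AND r_template_id = p_template_id
AND r_valid_from = p_valid_from
AND r_valid_to = p_valid_to
AND r_name = p_name
AND r_code IS NOT DISTINCT FROM p_code
LIMIT p_limit OFFSET p_offset;
</code></pre></div>
<p>With positional notation, spotting this bug would mean counting arguments back in the function signature for every $n.</p>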
<p>And that's all. Use this 'new' feature to make your life better!</p>How Zalando’s App Makes Instagram Images Shoppable2015-10-13T00:00:00+02:002015-10-13T00:00:00+02:00Marcel Daaketag:engineering.zalando.com,2015-10-13:/posts/2015/10/how-zalandos-app-makes-instagram-images-shoppable.html<p>Delivering inspiration through user-generated content.</p><p>I’ve been a product manager on the Zalando mobile apps team since 2013, and for as long as I can remember our customers
have been telling us that they derive inspiration for their fashion purchases via Instagram. With <a href="http://fortune.com/2014/12/10/instagram-leaves-twitter-in-the-dust-with-300-million-active-users/">more than 300 million
active users</a> sharing
more than 70 million photos and videos every day, Instagram has become the perfect spot to easily share and discover new
outfits and street styles.</p>
<p>To capitalize on Instagram’s popularity and utility, my team recently joined up with Zalando’s in-house
<a href="https://tech.zalando.com/blog/agile-ux-testing-at-zalando/">UserLab</a> and content and social media teams on a project
exploring the wide field of user-generated content (UGC). The goal: to discover ways to inspire and engage our users
even more. The result? A new iOS feature that makes Instagram shoppable for Zalando customers.</p>
<p><strong>The Architects of Our Idea: You</strong></p>
<p>As with any project we launch, my team invited a group of Zalando customers — including “power users,” who use our
features regularly — to drive our creative process. Every month, we invite selected users to our UserLab to test our new
apps and the usability of specific features. We also receive direct feedback through our app email addresses ( <a href="mailto:ios@zalando.de">ios at
zalando.de</a>, <a href="mailto:android@zalando.de">android at zalando.de</a>, and <a href="mailto:winapps@zalando.de">winapps at
zalando.de</a>), app store reviews (
<a href="https://itunes.apple.com/de/app/zalando-fashion-shopping/id585629514?mt=8">ios</a>,
<a href="https://play.google.com/store/apps/details?id=de.zalando.mobile&hl=de">android</a>,
<a href="https://www.microsoft.com/de-de/store/apps/zalando-shopping/9wzdncrdfsv7">windows</a>) and internal data-tracking systems.
At times, we even take a guerilla approach and ask random people on the street about our ideas.</p>
<p>For the UGC project, we invited six customers to the UserLab for a four-hour session. We asked them about how they use
and interact with platforms like Instagram and Pinterest; the aspects of those platforms that inspire them and keep
them returning; and which media outlets, websites, and personalities they follow, and why. We gave them space to come up
with their own (sometimes wild!) ideas about features, apps, and platforms, and this really helped us to understand
their insights and experiences.</p>
<p>After hours of intensive brainstorming, questioning and competitor analysis, my team learned a lot about what mattered
most when concepting a UGC feature. It became clear that our users love Instagram because it’s personal, populated with
high-quality images, and puts fashion items into context (who is wearing what? how do they wear it? how can one combine
new items with current favorites?). We also learned that customers get frustrated when they see something they love but
can’t buy it directly. Instagram lacks a simple, one-click method of purchasing items. My team and I came up with a
concept to make Instagram images shoppable via the Zalando app and got to work.</p>
<p><strong>How We Delivered</strong></p>
<p>Our new iOS feature displays the best Instagram images in our mobile app and connects those images to products
from our shop. Our customers already use #zalandostyle or #shareyourstyle to tag outfits they want to share with the
Instagram community; using basic image recognition and manual curation, we link these hashtagged images to our products.
From time to time, we also include other hashtags in the selection process.</p>
<p>The most inspiring tagged images go live in the app, which creates a fun challenge for our customers as they try to
become featured on Zalando. This selection also encourages high-quality content, which we and our customers value
greatly. (Our social media team chooses and approves new user-generated images every weekday. If you have an outfit to
share, post it on Instagram with the hashtags #zalandostyle or #shareyourstyle and check the app some hours later to
see if you were selected!)</p>
<p>A new section of our app shows the very best of what users share on Instagram, and is accessible directly from the start
page. Tap on a style to see products that are either exact matches or very similar recommendations.</p>
<p>So that customers only see items they can actually purchase, we don’t feature sold-out items. For these, we’ve
implemented a simple logic: If there is no product connected to an image anymore, it doesn’t appear.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5346fe0b9f3b8e84840637015cac33b1ffa45de9_screen-shot-2015-10-13-at-12.17.14.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/819dd605dee185aada2b94b8951d4f82f0e815aa_screen-shot-2015-10-13-at-12.18.56.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/92f693e53c018ae6d98329e498156a81655cb2be_screen-shot-2015-10-13-at-12.19.44.png?auto=compress,format"></p>
<p>This feature is still in its infancy. We are generating our first insights and monitoring how users behave
with it. If you have ideas how to develop it further, please drop us a line at ios@zalando.de.</p>My Droidcon Greece Experience2015-10-09T00:00:00+02:002015-10-09T00:00:00+02:00Raymond Chenontag:engineering.zalando.com,2015-10-09:/posts/2015/10/my-droidcon-greece-experience.html<p>Eating, touring Thessaloniki, and presenting a talk about our Android app fails.</p><p>Last month Thessaloniki welcomed the <a href="http://2015.droidcon.gr/">first Droidcon in Greece</a>. It was also my first time in
Greece, and I was fortunate to be invited as a speaker. Aside from the high quality of the conference itself, the
organizers (special thanks to <a href="https://plus.google.com/u/0/+ElizaCamber">Eliza Camber</a>) did a great job of spoiling us.
It started at the airport, where a taxi was waiting for me. On Droidcon Eve, the other speakers and I were given a bus
tour of Thessaloniki.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/27b0ebccc4840f94fb0f1aa05708546033d84c03_arch_galerius2-2.jpg?auto=compress,format"></p>
<p>My fellow Droidcon speakers and I stopped at the Arch of Galerius for a photo (above) and at a Bougatsa shop around the
Byzantine Walls. We tried both the salty and sweet versions of the pastry; my vote went to the sweet Bougatsa with
cinnamon, pictured below:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/1dd755df6c132c56f7468852b333fccfe8b4d1c6_bougatsa.jpg?auto=compress,format"></p>
<p>Before lunch, we visited a local market and tried some mezze (ham, cheese, and the biggest olives I’ve ever seen). Then
we were treated to a seafood lunch where, again, we experienced that extremely good Greek hospitality and cuisine.</p>
<h3>First Sustenance, Now Substance: My Favorite Droidcon Talks</h3>
<p>Droidcon took place at Noesis ( ΝΟΗΣΙΣ ), a science park on the outskirts of Thessaloniki. All of the talks were
interesting, but these takeaways were the most memorable:</p>
<p>Google developer advocate <a href="http://2015.droidcon.gr/session/damien-mabin/">Damien Mabin</a>, a former game developer,
offered his advice on game monetization: Even if a user spends just $1 on your game, it's still more than what you would
earn from having them watch ads in your game/app. As soon as a user spends money on your app, remove the ads. The
"whales" (big spenders) still generate more revenue than the average user who spends less than $5.</p>
<p>Damien also noted that time-constraint challenges incentivize players to spend more time in the game. These challenges
generate a nice surge in traffic from the same users. (This point applies to non-game apps as well.)</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6b5df07ea9b748dcd840bd847d964933b405b771_talk_google_damien_mabin_side.jpg?auto=compress,format"></p>
<p>Developer and evangelist <a href="http://2015.droidcon.gr/session/svetlana-isakova/">Svetlana Isakova</a> of JetBrains (whose IntelliJ platform underpins Android Studio) demoed Kotlin, JetBrains’ homegrown programming language. Nicknamed “the Swift of Android,”
<a href="http://kotlinlang.org/">Kotlin</a> is considered to be less verbose but more readable than Java, and it supports
immutability, nullable types and lambdas. The easiest way to start using it is by automatic conversion of a Java file to a Kotlin file, done with an Android Studio/IntelliJ plugin. Java can interoperate with Kotlin methods and vice versa. In
one of my pet projects, I’ve found Kotlin to be very concise for POJO classes with setter and getter methods.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/84f54aad30e56754b97914c0ba09887a6a8c4610_talk_kotlin.jpg?auto=compress,format"></p>
<p><a href="http://2015.droidcon.gr/session/josh-skeen/">Josh Skeen</a>, an instructor at Big Nerd Ranch, led a great workshop on
RxJava. Rx stands for Reactive, and you may have heard of Retrofit + RxJava in combination. Josh explained the
fundamental concepts of <a href="https://github.com/ReactiveX/RxJava">RxJava</a>, a library for composing asynchronous and
event-based programs by using observable sequences. We tried to solve his <a href="https://github.com/mutexkid/rxjava-koans">RxJava
Koans</a> in order to reach RxJava enlightenment (photo below).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/51d853cccc029ccc191eceb7c96168d91048e254_talk_rxjava_josh.jpg?auto=compress,format"></p>
<p><a href="http://2015.droidcon.gr/session/savvas-dalkitsis/">Savvas Dalkitsis</a> from Shazam discussed how aspect-oriented
programming can isolate code related to analytics and ads from your business logic. It reminded me of our code base at
Zalando, which uses a similar technique (tracking) for analytics.</p>
<p>As for my own talk (my second Droidcon presentation — I got to speak at Droidcon Berlin earlier this year), I focused on
“App Fails and Retrospectives.” The topic didn’t quite fit the SDK track where I appeared, since it mostly offered failure-related anecdotes based on real-world Zalando experiences. Judging by the audience’s reaction, our
“best” failure was the time <a href="https://play.google.com/store/apps/details?id=de.zalando.mobile">our app</a> was about to
become the Editor’s Choice on Google Play, and on the same evening I had to revert to the previous app version because
of an untested crash that was going to affect millions of users.</p>
<p>I came up with the idea of “epic fails” from an active Zalando group chat called “#guild-fuckup,” one of the most
active of Zalando Tech’s +100 guilds. I found inspiration there the day before the Droidcon Berlin deadline, submitted
the talk ... and was accepted! I couldn’t offer expertise on trendy topics like the Internet of Things, wearables, or Android Auto, but failure and mistake-making were something I thought every dev or dev team could relate to, big company or small startup.</p>
<h3>All Good Things Come to an End</h3>
<p>After the very last talk, we took a “family” photo and finished the event with a Greek dinner (of course).</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/acdc6ce0758d72acbeec291126b4cf64ef93d3f4_group_photo.jpg?auto=compress,format"></p>
<p>Before leaving Greece, I bought a Terkenlis chocolate cake (tsourekia) for my team in Berlin. A Greek colleague
recognized the cake and thanked me for bringing the best back from Thessaloniki. I thought it was the least I could do
for all the hospitality I enjoyed there! It was definitely worth the trip.</p>What We Liked and Learned at JSConf.eu2015-10-05T00:00:00+02:002015-10-05T00:00:00+02:00Andrey Kuzmintag:engineering.zalando.com,2015-10-05:/posts/2015/10/what-we-liked-and-learned-at-jsconf.eu.html<p>Microsoft Edge, Mozilla Flame, WYSISWYM, JSCodeShift, and more.</p><p>Late last month brought the seventh edition of <a href="http://2015.jsconf.eu/">JSConf.eu</a>, Berlin’s largest JavaScript
conference, and my colleague <a href="https://twitter.com/unsoundscapes">Andrey Kuzmin</a> and I were lucky enough to attend. Aside
from offering (always!) great food, the conference also featured an array of interesting talks over two tracks. Here’s
what we enjoyed the most:</p>
<h3>Microsoft Edge</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7d7f38603895ba1ee27d5c0dddb9a6c3ed7aa6a1_microsoft-edge.jpg?auto=compress,format"></p>
<p><a href="https://twitter.com/kaijaeger">Kai Jäger</a> of Microsoft gave an introduction to Microsoft’s most recent — and joyfully
anticipated — browser. Edge seems designed specifically to silence the developer community who would otherwise say, “of
course the website doesn’t render properly in [this MS browser]”; it even picks up Webkit vendor prefixes, if
necessary. Jäger explained that the Microsoft team had a hard time testing for compatibility because web servers showed
them dumbed-down webapps; consequently, Edge’s user agent string now mentions every major browser.</p>
<p>Perhaps the most important aspect of Edge is that Microsoft is being particularly transparent about it. As an example,
we’d been asking developers to vote for <a href="https://wpdev.uservoice.com/forums/257854/suggestions/6263916">SVG External
Content</a> support for cross-browser <a href="https://tech.zalando.com/blog/creating-bulletproof-svg-icons/">SVG
icons</a>; last week, Microsoft delivered on this feature.
Developers can still suggest features (and <a href="https://dev.modern.ie/platform/status/">track the progress</a> of their
suggestions).</p>
<h3>Building a Robust WYSIWYM Editor</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a3e4cd5940e8c2222edd25d4b1def28f1430b1b1_building-robust-wysiwym.jpg?auto=compress,format"></p>
<p><a href="https://twitter.com/marijnjh">Marijn Haverbeke</a>, the creator of <a href="http://prosemirror.net/">ProseMirror</a>, spoke about
WYSIWYM, or “what you see is what you mean" — his approach to editing formatted text in the browser. WYSIWYM’s key
difference from WYSIWYG (what you see is what you get) is that, while the latter allows you to style text, it doesn’t
enable you to structure the text semantically. Marijn spoke about different approaches to editing formatted text in the
browser: using the contentEditable browser feature, for instance, or implementing everything — including the cursor and
selection — from scratch.</p>
<p>Both approaches have their pros and cons, so in many situations applying a combination of the two works best. Use
contentEditable, but then intercept default events and write your own code to update the DOM and overcome browsers’
inconsistencies.</p>
<h3>Perceived Performance</h3>
<p>None of our hard metrics (Time-to-First-Byte, for example) are of much use when we fail to take human psychology into
account. How much do you have to speed up your app before your users even notice a difference? Or, inversely, by how
much can you fuck up a production deployment? <a href="https://twitter.com/mishunov">Denys Mishunov</a> suggests using a 20 % rule
to simplify the science behind it: If your app previously took one second to load, users won’t notice a 200
milliseconds’ difference in either direction. Luckily, you don’t have to wait for the video of Denys’s talk, because he
has been exploring this topic in great detail at <a href="http://www.smashingmagazine.com/2015/09/why-performance-matters-the-perception-of-time/">Smashing
Magazine</a>.</p>
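<p>The rule of thumb reduces to a one-liner. A minimal sketch (the helper name is ours, for illustration; it is not code from the talk):</p>

```javascript
// Sketch of the 20% rule: a change in load time only registers with users
// once it exceeds roughly 20% of the previous duration.
function isNoticeable(previousMs, currentMs) {
  return Math.abs(currentMs - previousMs) > 0.2 * previousMs;
}

console.log(isNoticeable(1000, 800)); // false: 200 ms sits exactly at the threshold
console.log(isNoticeable(1000, 650)); // true: a 350 ms improvement is perceivable
```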
<h3>RegExp Unicode Flag</h3>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d6427ec587601f34b4aff6a308e6060d7af8c698_regexp-unicode-flag.jpg?auto=compress,format"></p>
<p>ES6 will include a “u” flag for regular expressions. Finally, you will be able to safely write /[😸-🙀]/gu to
match all cat faces! This wasn’t possible before, because these emoji lie outside the Basic Multilingual Plane and are
encoded as two UTF-16 code units (surrogate halves); ES5 would interpret the above as [\uD83D\uDE38-\uD83D\uDE40]. <a href="https://twitter.com/mathias">Mathias
Bynens</a> from Opera has created <a href="https://github.com/mathiasbynens/regexpu">regexpu</a>: a
transpiler that makes it possible to use the “u” flag right now. It’s already integrated in
<a href="https://babeljs.io/">Babel</a>, so you should start using the “u” flag for all new Regexes. Please be careful, though,
when adding the flag to old Regexes, because it may alter their behavior in unexpected ways.</p>
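<p>A minimal sketch of the difference, using the cat-face range from the talk:</p>

```javascript
// With the "u" flag, the character class spans whole code points
// (U+1F638 through U+1F640); without it, the same pattern is read as
// surrogate halves and does not form a valid range.
const catFaces = /[😸-🙀]/u;
console.log(catFaces.test("😺")); // true: U+1F63A falls inside the range
console.log(catFaces.test("A"));  // false
```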
<h3>JSCodeShift</h3>
<p>Another big-potential project from Facebook. JSCodeShift is (basically) scripted refactorings of your code, called
“codemods.” Presenter <a href="https://twitter.com/cpojer">Christoph Pojer</a> used it to update a project from React.createClass
to ES6 classes in mere seconds. Facebook plans to release codemods along with API changes, which could relieve
developers from a lot of churn (even more so if the practice spreads to other projects). Internally, it uses a parser
from Babel and a clever diffing algorithm to ensure that the correct indentation is preserved, and that the least number
of source lines is touched by the transformation. The plan is to make the Babel parser sensitive to whitespace to make
JSCodeShift even more robust. Play around with <a href="http://felix-kling.de/esprima_ast_explorer/">AST explorer</a> to get a
better idea.</p>
<h3>Disconnected Networking</h3>
<p>Adobe Engineer <a href="https://twitter.com/razvancaliman">Razvan Caliman</a>’s very inspiring talk reminded us that we might not
actually need HTTP or WebSockets to do stuff. He demo’ed several data transmissions via ultrasound (or regular sound),
low-fi gesture detection using ultrasound and the Doppler effect, using Ambient Light Events to detect Morse code, and
some other creative applications of visible light communication.</p>
<h3>Time Zones</h3>
<p>Time zones are always fun, as long as you don’t deal with them yourself. (You probably know the excellent “falsehoods
programmers believe about time” already, but just in case: <a href="http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time">Part
1</a>, <a href="http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time">Part
2</a>.) What really stuck with us as the
perfect example of how messy they are: Morocco switches time zones not twice but four times a year, because they change
it before and after Ramadan, too. As the Islamic calendar is a lunar calendar, Ramadan shifts through the Gregorian
calendar. Oh, and Sydney had a one-off change before the Olympics. Good luck coding around that! The main takeaway from
timezone-talking <a href="https://twitter.com/iamnotyourbroom">Gilmore Davidson</a>: Avoid dealing with timezones. Think hard if
you really, really need them. “In the end, they still don’t tell you whether someone is awake or at work.”</p>
<h3>Consequences of an Insightful Algorithm</h3>
<p>The closing talk by <a href="http://www.callbackwomen.com/home.html">CallbackWomen</a> founder <a href="https://twitter.com/cczona">Carina C.
Zona</a> was about algorithms that kinda work for the majority of people, but fail horribly for
the rest. An example: Facebook’s “Year in Review” app, which showed technologist Eric Meyer pictures of his daughter —
captioned “Anne turned 7!” — who <a href="http://meyerweb.com/eric/thoughts/2014/12/24/inadvertent-algorithmic-cruelty/">had died that
year</a>. Another: When Google Photos tagged
<a href="http://mashable.com/2015/07/01/google-photos-black-people-gorillas/#EZkuVGsmtGqk">black people as gorillas</a>. Yet
another: Fitbit tracking people’s sexual activity and <a href="http://techcrunch.com/2011/07/03/sexual-activity-tracked-by-fitbit-shows-up-in-google-search-results/">making the data public by
default</a>.
Carina’s message was a reminder to the audience that any of us developers could one day create a product that causes a
similar type of damage. We are making decisions all the time, often quickly. Yet we need to be mindful at all times
about the impact of potential false-positives or false-negatives, in order to do no harm.</p>
<h3>Firefox OS Workshop</h3>
<p>We got to have a quick hands-on with a Firefox OS phone. If you don’t know, this is a mobile phone operating system
built on web standards, so that you can inspect your Home Screen with Firefox’s developer tools (and see that it’s all
just divs and spans).</p>
<h3>Mozilla Flame</h3>
<p>The “getting started” experience was quite pleasant and easy to follow. Andrey, always living on the edge, immediately
tested support for <strong>marquee</strong> and <strong>blink</strong> tags (the former worked, the latter didn’t). Support for web standards is
so good that one Mozilla employee suggested that I just use <strong>getUserMedia</strong> instead of the non-web Camera API to access
the camera.</p>
<h3>Honorable Mentions</h3>
<p>The guys from <a href="http://www.audiotool.com/">audiotool.com</a> generate JavaScript code out of their JVM byte code. Apparently
they have one Java codebase that they cross-compile to Android, iOS, and web.</p>
<p><a href="https://twitter.com/wa7son">@wa7son</a> created several npm packages to work with Apple’s Airplay.</p>
<p><a href="https://twitter.com/_munter_">@_munter_</a> made a <a href="https://github.com/Munter/fusile">task runner</a> that’s easy to
understand and doesn’t need much configuration.</p>
<p><a href="http://horsedrawingtycoon.com/">Horse Drawing Tycoon!</a></p>Building an OpenVPN Cluster, Zalando-Style2015-10-01T00:00:00+02:002015-10-01T00:00:00+02:00Sebastian Bärtag:engineering.zalando.com,2015-10-01:/posts/2015/10/building-an-openvpn-cluster-zalando-style.html<p>A Zalando DevOps Engineer describes how we did it.</p><p>Since Zalando’s earliest days, members of our technology team have been able to use OpenVPN to work anywhere, anytime.
Back then (the late 2000s), we had only a single (though state-of-the-art) instance to work with, and a lot of manual
maintenance to perform. Last year, in light of a years-long period of explosive growth, our team realized that we needed
to build something scalable, fully redundant, and easier to maintain for hundreds of users. In other words, our own
OpenVPN cluster!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/dc7dba759c4ce8f8503cf97be84c2e15a27ed00b_building-a-vpn-cluster-zalando.jpg?auto=compress,format"></p>
<p>The above diagram illustrates the structure of our new cluster, which I built using typical network models as guides.
Achieving greater reliability was the primary goal behind this design, which makes it possible to scale up easily and
add as many VPN servers as we need. It’s also easier to add many more IPs. Our team decided to use six servers in order
to achieve reliability and redundancy at each site. For better scaling, we put the servers behind a load balancer
for the external datacenters (DC1 + DC2). The internal servers (DC3) are not load-balanced because they are running on
high-availability hosts. The config looks like this:</p>
<div class="highlight"><pre><span></span><code> client
dev tun?
proto udp
hand-window 10
remote 10.1.1.1 1194
remote 10.1.1.2 1194
remote 1.2.3.4 1194
remote 5.6.7.8 1194
resolv-retry infinite
nobind
persist-key
persist-tun
ca trusted_chain.pem
cert user.crt
key user.key
comp-lzo
verb 3
route-method exe
route-delay 2
</code></pre></div>
<p>On the server, the user config is very simple — we only add the IP address:</p>
<div class="highlight"><pre><span></span><code> vpnXX:/etc/openvpn/ccd# cat username
ifconfig-push 192.168.178.10 192.168.178.9
ifconfig-ipv6-push fd7a:6ca6:e640:8000::192.168.178.10
</code></pre></div>
<p>The rest of the config is similar for all the servers in the cluster:</p>
<div class="highlight"><pre><span></span><code><span class="nx">mode</span><span class="w"> </span><span class="nx">server</span>
<span class="nx">tls</span><span class="o">-</span><span class="nx">server</span>
<span class="nx">tls</span><span class="o">-</span><span class="nx">cipher</span><span class="w"> </span><span class="nx">TLS</span><span class="o">-</span><span class="nx">DHE</span><span class="o">-</span><span class="nx">RSA</span><span class="o">-</span><span class="nx">WITH</span><span class="o">-</span><span class="nx">AES</span><span class="o">-</span><span class="mi">256</span><span class="o">-</span><span class="nx">GCM</span><span class="o">-</span><span class="nx">SHA384</span><span class="p">:</span><span class="nx">TLS</span><span class="o">-</span><span class="nx">DHE</span><span class="o">-</span><span class="nx">RSA</span><span class="o">-</span><span class="nx">WITH</span><span class="o">-</span><span class="nx">AES</span><span class="o">-</span><span class="mi">256</span><span class="o">-</span><span class="nx">CBC</span><span class="o">-</span><span class="nx">SHA</span><span class="p">:</span><span class="nx">TLS</span><span class="o">-</span><span class="nx">DHE</span><span class="o">-</span><span class="nx">DSS</span><span class="o">-</span><span class="nx">WITH</span><span class="o">-</span><span class="nx">AES</span><span class="o">-</span><span class="mi">256</span><span class="o">-</span><span class="nx">CBC</span><span class="o">-</span><span class="nx">SHA</span>
<span class="err">#</span><span class="w"> </span><span class="nx">some</span><span class="w"> </span><span class="nx">encryption</span><span class="w"> </span><span class="nx">hardening</span>
<span class="nx">push</span><span class="w"> </span><span class="s">"topology net30"</span>
<span class="nx">topology</span><span class="w"> </span><span class="nx">net30</span>
<span class="nx">port</span><span class="w"> </span><span class="nx">XXXXX</span><span class="w"> </span><span class="err">#</span><span class="nx">chose</span><span class="w"> </span><span class="nx">your</span><span class="w"> </span><span class="nx">port</span>
<span class="nx">proto</span><span class="w"> </span><span class="nx">udp</span>
<span class="nx">dev</span><span class="w"> </span><span class="nx">vpninterface</span>
<span class="nx">dev</span><span class="o">-</span><span class="k">type</span><span class="w"> </span><span class="nx">tun</span>
<span class="nx">ca</span><span class="w"> </span><span class="nx">keys</span><span class="o">/</span><span class="nx">trusted_chain</span><span class="p">.</span><span class="nx">pem</span>
<span class="nx">cert</span><span class="w"> </span><span class="nx">keys</span><span class="o">/</span><span class="nx">XXX</span><span class="p">.</span><span class="nx">crt</span>
<span class="nx">key</span><span class="w"> </span><span class="nx">keys</span><span class="o">/</span><span class="nx">XXX</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nx">This</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">should</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="nx">kept</span><span class="w"> </span><span class="nx">secret</span>
<span class="nx">dh</span><span class="w"> </span><span class="nx">keys</span><span class="o">/</span><span class="nx">XXX</span><span class="p">.</span><span class="nx">pem</span>
<span class="nx">crl</span><span class="o">-</span><span class="nx">verify</span><span class="w"> </span><span class="nx">keys</span><span class="o">/</span><span class="nx">XXX</span><span class="p">.</span><span class="nx">pem</span>
<span class="nx">ifconfig</span><span class="w"> </span><span class="m m-Double">192.168.178.1</span><span class="w"> </span><span class="m m-Double">192.168.178.2</span>
<span class="nx">script</span><span class="o">-</span><span class="nx">security</span><span class="w"> </span><span class="mi">2</span>
<span class="nx">learn</span><span class="o">-</span><span class="nx">address</span><span class="w"> </span><span class="o">/</span><span class="nx">etc</span><span class="o">/</span><span class="nx">openvpn</span><span class="o">/</span><span class="nx">route_add</span><span class="p">.</span><span class="nx">sh</span>
<span class="nx">client</span><span class="o">-</span><span class="nx">disconnect</span><span class="w"> </span><span class="o">/</span><span class="nx">etc</span><span class="o">/</span><span class="nx">openvpn</span><span class="o">/</span><span class="nx">route_delete</span><span class="p">.</span><span class="nx">sh</span>
<span class="nx">keepalive</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="mi">10</span>
<span class="nx">comp</span><span class="o">-</span><span class="nx">lzo</span>
<span class="nx">persist</span><span class="o">-</span><span class="nx">key</span>
<span class="nx">persist</span><span class="o">-</span><span class="nx">tun</span>
<span class="nx">status</span><span class="w"> </span><span class="o">/</span><span class="nx">logs</span><span class="o">/</span><span class="nx">openvpn</span><span class="o">-</span><span class="nx">status</span><span class="p">.</span><span class="nx">log</span>
<span class="nx">verb</span><span class="w"> </span><span class="mi">3</span>
<span class="nx">client</span><span class="o">-</span><span class="nx">config</span><span class="o">-</span><span class="nx">dir</span><span class="w"> </span><span class="nx">ccd</span>
<span class="nx">ccd</span><span class="o">-</span><span class="nx">exclusive</span>
<span class="err">##</span><span class="w"> </span><span class="nx">duplicate</span><span class="o">-</span><span class="nx">cn</span>
<span class="nx">log</span><span class="o">-</span><span class="nx">append</span><span class="w"> </span><span class="o">/</span><span class="nx">logs</span><span class="o">/</span><span class="nx">openvpn</span><span class="p">.</span><span class="nx">log</span>
<span class="nx">management</span><span class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">x</span><span class="p">.</span><span class="nx">x</span><span class="p">.</span><span class="nx">x</span><span class="w"> </span><span class="nx">port</span>
<span class="err">##</span><span class="nx">routen</span>
<span class="nx">push</span><span class="w"> </span><span class="s">"dhcp-option DNS 8.8.8.8"</span>
<span class="nx">push</span><span class="w"> </span><span class="s">"dhcp-option DOMAIN "</span>
<span class="nx">push</span><span class="w"> </span><span class="s">"route 8.8.8.8 255.255.255.255"</span>
<span class="nx">push</span><span class="w"> </span><span class="err">“</span><span class="nx">route</span><span class="w"> </span><span class="nx">X</span><span class="p">.</span><span class="nx">X</span><span class="p">.</span><span class="nx">X</span><span class="p">.</span><span class="nx">X</span><span class="w"> </span><span class="nx">Y</span><span class="p">.</span><span class="nx">Y</span><span class="p">.</span><span class="nx">Y</span><span class="p">.</span><span class="nx">Y</span><span class="err">”</span>
</code></pre></div>
<p>Our configuration offers the same IP to the user every time, which greatly simplifies rule setting. However, a new
problem arises in such a cluster design: the datacenter network needs to know which server a
user is currently logged on to. To handle this, we’ve created two simple scripts so that the cluster can learn by
itself where each user is logged in: route_add.sh adds a static route for the logged-in user to the VPN server;
route_delete.sh removes it after logout. Two examples:</p>
<p><strong>route_add.sh</strong></p>
<div class="highlight"><pre><span></span><code> #!/bin/bash
if [ ! -z "$ifconfig_pool_remote_ip" ]; then
ip route add "$ifconfig_pool_remote_ip" dev $dev
ip route add "::$ifconfig_pool_remote_ip" dev $dev
fi
exit 0
</code></pre></div>
<p><strong>route_delete.sh</strong></p>
<div class="highlight"><pre><span></span><code> #!/bin/sh
 # mirror route_add.sh: remove both the IPv4 and IPv6 routes on disconnect
 if [ ! -z "$ifconfig_pool_remote_ip" ]; then
 ip route del "$ifconfig_pool_remote_ip"
 ip route del "::$ifconfig_pool_remote_ip"
 fi
 exit 0
</code></pre></div>
<p>After adding the route to the server’s routing table, the only necessary thing to do is to announce it. We use a dynamic
routing protocol in the backend of our datacenter, and <a href="http://www.nongnu.org/quagga/">Quagga</a> on every VPN server to
announce the routes. Depending on your infrastructure, you can use one of the preferred routing protocols: Routing
Information Protocol (RIP), Border Gateway Protocol (BGP) or Open Shortest Path First (OSPF).</p>
<p>It’s important to activate IP forwarding on the VPN server:</p>
<div class="highlight"><pre><span></span><code> echo 1 > /proc/sys/net/ipv4/ip_forward
</code></pre></div>
<p>Then configure OSPF in Quagga so that the routes created by our scripts are announced:</p>
<div class="highlight"><pre><span></span><code> router ospf
 redistribute kernel
 redistribute connected # this is the important line
 network X.X.X.X area 0.0.0.0
 area 0.0.0.0 authentication message-digest
</code></pre></div>
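<p>If BGP is the preferred protocol in your backend, the equivalent announcement in Quagga’s bgpd looks roughly like this (the AS number and neighbor address below are placeholders, not our production values):</p>

```
router bgp 64512
 redistribute kernel
 redistribute connected
 neighbor 10.0.0.254 remote-as 64512
```

<p>As with the OSPF variant, redistributing the kernel routes is what propagates the per-user routes added by route_add.sh.</p>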
<p>Users can log on to and off of any server, and their IP address will be announced to the entire network.</p>
<p>For security and access control, we chose to use <a href="http://www.fwbuilder.org/">Firewall Builder</a>, an open-source iptables
manager that allows us to build a ruleset and install it on all servers in the cluster simultaneously. Since making
improvements to our user management system earlier this year, we’ve reduced the complexity of our iptables setup from
hundreds of rules (10,000 lines) to 50 rules, managed through a much more readable interface.</p>
<p>With this setup, we have about 800 users (150 in parallel) and haven’t faced any performance issues at any time, from
anywhere!</p>
Zalando new hires spend their first few weeks (usually four) being onboarded in Berlin. Pretty cool, right? But not so
optimal if you and your girlfriend are expecting a baby at any moment, as was true in my case.</p>
<p>As any new hire might be, I was a bit hesitant to tell my new team that I had to say “no” to something as crucial as
onboarding. I needn’t have worried: My delivery lead (the team member primarily responsible for overseeing the
technical/engineering aspects of my work) said I could do the onboarding at a later date. This outcome reassured me that
I made the right decision by joining Zalando: Despite its size, the company remains flexible and respects employees’
human needs. I still got to do some onboarding — but much closer to home, in the Dortmund team’s shiny new office.</p>
<p>Despite all the usual newbie presentations, introductions and paperwork, I was able to be productive pretty quickly —
getting down to work in my very first week. My first Zalando project is a microservice that exposes a REST interface
(described with <a href="http://swagger.io/">Swagger</a> and implemented with Spring-Boot) to route certain types of requests to
specific endpoints. Earlier this year, Zalando began migrating applications to <a href="https://aws.amazon.com">AWS</a> — packaging
them with <a href="https://www.docker.com/">Docker</a> and deploying them with our own open-source Platform as a Service,
<a href="https://stups.io">STUPS</a>. A lot about the project was unfamiliar to me, but as someone who learns by doing, I was soon
comfortable with the work.</p>
<p>Dortmund newbies spend our first three months with three different teams to get familiar with the office, check for
“chemistry,” and decide what we’d like to work on most. With autonomous teams, every team gets to decide which
technologies and languages to apply to their projects; in Dortmund, many teams choose Scala. Additionally, you receive a
powerful laptop and can choose between two models.</p>
<p>Aside from getting great hardware and even greater engineering challenges to work on, I’ve been enjoying our monthly
“Funny Fridays” — Dortmund Tech team gatherings with beer, pizza and conversation. I’ve been learning from our internal
tech talks and Thursday morning “Tech All-Hands” sessions, broadcast live from our Berlin office. And I’ve enjoyed the
comfortable surroundings and amenities — Club Mate, fresh snacks —available in our new digs.</p>
<p>My girlfriend and I are still waiting for our delivery date, but thanks to Zalando I'll get two weeks off as soon as
Baby arrives!</p>Order Processing at Scale with Camunda (Slides)2015-09-23T00:00:00+02:002015-09-23T00:00:00+02:00Jörn Horstmanntag:engineering.zalando.com,2015-09-23:/posts/2015/09/order-processing-at-scale-with-camunda-slides.html<p>Learn more about how Zalando's using the Camunda engine.</p><p>Last week a colleague and I presented at <a href="https://network.camunda.org/meetings/55">Camunda Community Day</a>: An all-day event
highlighting the latest developments in the Camunda BPM engine. This wasn't my first talk at Camunda: In
March, some of my colleagues and I <a href="https://tech.zalando.com/blog/camunda-meets-cassandra-at-zalando/">presented our prototype for running Camunda on
Cassandra</a>. This time around, I focused on how we've
been using Camunda to implement our sales order process within our
<a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">evolving</a> microservice architecture.</p>
<p>Running the process engine at Zalando's scale has brought some challenges. But BPMN allows a clearly documented process
that is easy to understand for both product specialists and developers, and is never out of sync with our actual running
code. I talked about how we integrated the process engine into our highly available system, how we customized it to
increase performance, and what our deployment and architecture looks like. Learn more about our work with Camunda <a href="https://network.camunda.org/whitepaper/26">via
this brand-new case study</a>. Then check out my slides below:</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/order-processing-at-scale-zalando-at-camunda-community-day" title="Order Processing at Scale: Zalando at Camunda Community Day">Order Processing at Scale: Zalando at Camunda Community
Day</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>Why Zalando Is Celebrating “Mobile First Day”2015-09-23T00:00:00+02:002015-09-23T00:00:00+02:00Richard Nebtag:engineering.zalando.com,2015-09-23:/posts/2015/09/why-zalando-is-celebrating-mobile-first-day.html<p>The many steps we’re taking to become a #MobileFirst company.</p><p>Pretend for a moment that you work in a dynamic, multinational ecommerce company, and that you are tasked with getting
your 9,000+ coworkers to think, support and believe in “Mobile First.” How, and where, do you begin? Zalando’s Mobile
First team has been wrestling with this idea all year, and has come up with three key ingredients:</p>
<p><strong>A solid, easy-to-understand definition of what it means to be “Mobile First.”</strong> The majority of our customers now
visit Zalando via smartphones and tablets. In the future, mobile will be the primary touchpoint. To remain
customer-centric, we need to innovate across devices.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f5759839cd996b4e22c471b58032f34f6312e0a9_screen-shot-2015-09-23-at-10.03.27.png?auto=compress,format"></p>
<p><strong>Compelling statistics.</strong> More than half of Zalando’s 16+ million active customers shop via their mobile devices, and
soon mobile purchases will surpass desktop. The Zalando app has now been downloaded more than 11 million times and
Mobile now makes up 57% of Zalando’s overall traffic. These stats make it clear that we can’t grow our business by
simply relying on the success of our desktop website. We need to listen to our users, and adapt our overall business
strategy by shifting our focus to the latest devices and channels accordingly.</p>
<p><strong>Fun.</strong> This brings us to <strong>#MobileFirst Day</strong>, Zalando’s latest celebration of our Mobile First initiative. On
September 24, we’ll host a full day of presentations featuring internal and external (Facebook, Instagram, Uber) mobile
experts. Each Zalando location in Berlin will hold its own Mobile Fair featuring booths from all our departments—from
fashion search to finance—that showcase their mobile activities; our Zalando teams in Dortmund, Dublin and Helsinki will
tune in via livecast. We’re also asking our employees to test our current apps and beta versions of new app projects,
and to share their feedback with our mobile teams.</p>
<p><strong>Behind our Mobile First Movement.</strong>
In crafting our internal Mobile First campaign, we’ve followed a few key mantras:</p>
<p><strong>Take a bottom-up approach.</strong>
In March 2015 we launched our M-Bassador program, in which every Zalando department holds workshops to define their own
Mobile First-specific activities and appoints a Mobile First ambassador to liaise with Zalando Mobile.</p>
<p><strong>Assume nothing.</strong>
Because it’s so new, Mobile First development offers no guarantees. To help our M-Bassadors become more comfortable with
so much uncertainty, our team hosted an intro workshop to identify high-level roadblocks to going Mobile First and
taught them how to run workshops themselves.</p>
<p><strong>Go multi-pronged.</strong>
Other Mobile First initiatives we’ve implemented in recent months:</p>
<ul>
<li><strong>Letting employees choose their mobile.</strong> Zalandos can upgrade to their preferred device: iPhone 6,
Android, HTC ONE, or Sony Xperia.</li>
<li><strong>Fun, easy and mobile focused internal contests and campaigns.</strong> One example is our mobile video selfie contest. We
asked Zalandos to send us a selfie-video in which they told us how mobile has changed their lives. Everyone who
participated had the chance to win an iPad.</li>
<li><strong>Getting support from leadership.</strong> The members of our management board have been strong allies—using their regular
internal talks to update all of Zalando on our Mobile First progress. They’re great at asking us provocative
questions that encourage us to think critically about our work.</li>
<li><strong>A curated newsletter</strong> with news of our ongoing apps development and different teams’ Mobile First activities.</li>
<li><strong>Rental services</strong> for employees who want to try out different mobile devices during off-hours, or even while they’re
on vacation.</li>
<li><strong>Multi-device mobile playgrounds at each Zalando location.</strong> These resemble the display area at your local Apple
store—except that they include Android and Windows in addition to iOS—but similarly offer a great user experience:
No passwords or extra steps are required to try out new devices, download apps or browse the web.</li>
<li><strong>Mobile Workshop kits</strong> that include 30 ready-to-use iOS and Android tablets and WiFi boxes for meetings or
workshops.</li>
<li><strong>Wireless meeting rooms</strong> that include Apple TVs and presentation devices compatible with mobile devices.</li>
<li><strong>Mobile Responsive Design workshops</strong> for all Zalando designers.</li>
</ul>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e1dd8c943bbdca8855c830eeb8663f9000b70a41_screen-shot-2015-09-23-at-10.15.10.png?auto=compress,format"></p>
<p><em>Mobile Workshop kit rollout.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/577cf4a68cc2784d2b53b809cf66445e2c55646f_screen-shot-2015-09-23-at-10.18.36.png?auto=compress,format"></p>
<p><em>A Mobile Playground in our Berlin Tech HQ.</em></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/123f349c34ecad2d7a881bf626a12f22fa2b5070_screen-shot-2015-09-23-at-10.20.33.png?auto=compress,format"></p>
<p><em>M-Bassador Workshop in action!</em></p>
<p>In the words of Eric Schmidt, "The trend has been mobile was winning; it's now won." Mobile is here to stay, and we’re
excited about where our Mobile First movement will take us.</p>
<hr>
<p><a href="https://twitter.com/MobileGeekGirl">Kristina Walcker-Mayer</a> is a Mobile Apps Product Specialist and Mobile First
Evangelist at Zalando. She’ll speak more on this topic at the upcoming MobileApp Europe Conference in Potsdam, Germany.</p>
<p><a href="https://twitter.com/RichardNeb">Richard Neb</a> is a Business Development Manager Customer Experience and part of
Zalando’s Mobile First team.</p>Data Integration in a World of Microservices2015-09-21T00:00:00+02:002015-09-21T00:00:00+02:00Olaf Melchiortag:engineering.zalando.com,2015-09-21:/posts/2015/09/data-integration-in-a-world-of-microservices.html<p>Read about Saiki: our open-source, cloud-based, microservices-friendly data integration infrastructure.</p><p>There’s not much one can say in favor of big, monolithic enterprise applications. Not only does the typical monolith
become an inextricable mess of interdependencies — it also hinders development teams from moving quickly when trying out
new ideas. Yet the monolith does present one advantage: Its consolidation of data in a single place makes business
intelligence fairly simple. For example, if you want to analyze yesterday’s sales you can — at least in principle —
simply query your operational database.</p>
<p>The more modular your enterprise application becomes, the more work you have to do to bring all of your data together.
At Zalando, our technology infrastructure has been fairly modular for quite some time. Our adoption of
team autonomy and microservices enables us to facilitate large-scale growth of our massive business while reinforcing
the agility of a startup.</p>
<p>For those of us on Zalando’s Business Intelligence team, microservices have brought about some interesting challenges
in terms of how we manage our data. As part of our learning process, we recently designed and built <strong>Saiki</strong>: a
scalable, cloud-based data integration infrastructure that makes data from our many microservices readily available for
analytical teams. Named after the Javanese word for “queue,” Saiki is built mostly in Python and includes components
that provide a scalable Change Data Capture infrastructure, consume PostgreSQL replication logs, and perform other
relevant tasks.</p>
<h3>Why Saiki</h3>
<p>Even before we adopted microservices at scale, questions like “how many shoes did we sell yesterday?” presumed the prior
integration of data distributed over a significant number of sources. Article as well as order data is horizontally
sharded over eight PostgreSQL databases, so there is no way to simply fire up some ad hoc SQL to do a quick analysis.
Before analyzing the data, we have to move it to a single place. Our core, Oracle-based data warehouse, where
information from numerous source systems is integrated, has always been a critical component of Zalando’s data
infrastructure. Without it, all but the simplest analytical tasks are futile.</p>
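<p>To make the fan-out concrete, here is a minimal sketch of hash-based shard routing and why even a simple aggregate must touch all eight databases. The routing function is our assumption for illustration only, not Zalando's actual sharding scheme:</p>

```python
from hashlib import sha1

NUM_SHARDS = 8  # article and order data are sharded over eight PostgreSQL databases

def shard_for(order_id: str) -> int:
    """Map an order id to one of the shards (illustrative hash routing)."""
    return int(sha1(order_id.encode()).hexdigest(), 16) % NUM_SHARDS

def orders_sold_yesterday(order_ids) -> int:
    """Even a simple count must fan out over every shard and merge the
    partial results; there is no single database to query ad hoc."""
    per_shard = [0] * NUM_SHARDS
    for oid in order_ids:
        per_shard[shard_for(oid)] += 1  # in reality: one COUNT(*) per shard
    return sum(per_shard)
```

<p>This is exactly the kind of merge work a central warehouse (or Saiki) takes off the analyst's hands.</p>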
<p>With each service owning its data, data is spread over a significantly larger number of cloud-based microservices — each of which
can use individual persistence mechanisms and storage technologies. No small challenge. Adding to the complexity is that
Zalando is fiercely data-driven: At any given moment, several of our teams are working on large-scale data analysis
projects using a vast number of different systems and tools. Meanwhile, other teams are busy exploring ways to better
distribute this data across multiple applications. Finally, we need to make the right data available to our various
customers at the right times, and in the right formats.</p>
<p>Enter Saiki, which manages all of this data integration and distribution with Amazon Web Services (AWS). It makes use of
<a href="https://stups.io/">STUPS</a>, our open-source Platform as a Service (PaaS), which allows for a secure and audit-compliant
handling of the data involved.</p>
<h3>How Saiki Works</h3>
<p>We no longer live in a world of static data sets, but are instead confronted with an endless stream of events that
constantly inform us about relevant happenings from all over the enterprise. Whenever someone creates a sales order, or
packs an item in one of our warehouses, a dedicated event notification will be created by the respective microservice.
Handling this stream of events involves roughly four main tasks: Accessing these events, storing them in an appropriate
structure, processing them, and finally distributing them to various targets. We built Saiki and its components to do
all these tasks.</p>
<ul>
<li><strong>Accessing</strong>: Typically a microservice or application has to push event notifications to one of our APIs. When
programmatically pushing messages to our API is not an option, we can use Saiki Banyu — a tool that listens to
PostgreSQL's logical replication stream and converts it into a series of JSON objects. These objects are then pushed
to our central API. This approach allows for a non-intrusive and reliable Change Data Capture of PostgreSQL
databases.</li>
<li><strong>Storing</strong>: As the backbone for queueing the stream of event data, we chose <a href="https://kafka.apache.org/">Apache
Kafka</a>, a system with impressive capabilities. We deployed Kafka and Apache Zookeeper
in AWS to make our event data available for downstream systems.</li>
<li>Having access to an event stream opens up a lot of new options for data <strong>processing</strong>. We plan to move our
near-real-time business process monitoring application to this new system, and hope to become less dependent on huge,
nightly ETL batch runs—moving closer to near-real-time data processing.</li>
<li><strong>Distributing:</strong> We’re also investigating possible data sinks for data distribution. Nganggo, a project now
underway, is a working prototype of a fast materialization engine that writes event data to a central Operational
Data Store powered by PostgreSQL. We’re also working on a service that makes change data available for batch imports
into other systems both inside and outside of AWS (for instance, our core data warehouse). Finally, we plan to use
our S3 data lake to provide standardized data sets for further large-scale analyses.</li>
</ul>
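<p>The Change Data Capture step above can be sketched as follows. This is a hedged illustration: the wal2json-style field names (<code>kind</code>, <code>table</code>, <code>columnnames</code>, <code>columnvalues</code>) and the <code>change_to_event</code> helper are assumptions for the sketch, not Banyu's actual format or API:</p>

```python
import json

def change_to_event(change: dict) -> str:
    """Convert one wal2json-style change record from PostgreSQL's logical
    replication stream into a flat JSON event (illustrative; Saiki Banyu's
    real output format may differ)."""
    event = {
        "event_type": change["kind"],  # "insert", "update" or "delete"
        "table": change["table"],
        "data": dict(zip(change["columnnames"], change["columnvalues"])),
    }
    # A downstream worker would push this JSON to the central API, from
    # where it is queued in Kafka for processing and distribution.
    return json.dumps(event, sort_keys=True)
```

<p>The non-intrusive property comes from reading the replication log rather than instrumenting the source database's application code.</p>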
<p>Our team is polishing the first parts of Saiki for production use, but our data integration adventure has only just
begun. We will update you as our work progresses!</p>Working at Zalando Dublin2015-09-18T00:00:00+02:002015-09-18T00:00:00+02:00Selina McCarthytag:engineering.zalando.com,2015-09-18:/posts/2015/09/working-at-zalando-dublin.html<p>An inside look at our Fashion Insights Centre.</p><p>Recently I was lucky enough to spend a week working in Zalando’s gorgeous <a href="http://dublin.zalando.com/">Fashion Insights
Centre</a>, located in Dublin’s Silicon Docks. If you’re not familiar with Dublin, the Silicon
Docks are the beating heart of the city’s vibrant tech scene: Google, Facebook, Linkedin, Amazon, Twitter and many other
international tech giants house their European headquarters there. Our office is nestled directly on the waterfront,
right in front of Google.</p>
<p>Open since April 2015 (and with the <a href="https://www.youtube.com/watch?v=7vpjFMXBdzc">blessing of the Irish prime
minister</a>), our Fashion Insights Centre focuses on using machine learning,
engineering, and research & development to generate data-related insights that our technologists can use across our
entire platform. Our Dublin team of 16 engineers, data scientists and other tech professionals are almost all brand-new
but are settling in quickly and absorbing Zalando's tech culture. Ongoing office upgrades, an ever-expanding
range of amenities—including a pool table and height-adjustable desks (which double-up as a great bar on Friday
evenings!)—and newbies joining every week give the office that manic buzz of a budding startup.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5c21337b32c61427873c85b8ed5c0d6eb3b4abc8_screen-shot-2015-09-18-at-18.17.38.png?auto=compress,format"></p>
<p>In terms of setup, Dublin has self-organized into two teams. “Team Buffalo” is building our Smart Product Store: a
fundamental component of our <a href="https://tech.zalando.com/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">new product
platform</a> that
requires de-duplication and enrichment of product data (often incomplete upon arrival, and with conflicting forms and
formats). The team is tackling difficult data science and engineering problems with NoSQL databases, Spark, Kafka and EC2.
Meanwhile, “Team Dougal” is building a large-scale web crawler to capture all fashion-related content on the Internet.
They’ll use machine learning and text mining to detect and predict emerging fashion trends and topics.</p>
<p>Zalando doesn’t ship products to Dublin, so any on-the-ground knowledge of our operations comes thanks to our tech team.
The Dublin crew has been active giving talks at meetups and conferences, hosting <a href="https://www.facebook.com/coderdojodublin">Coder Dojo
Dublin</a> every Saturday, and speaking to media. Earlier this month, most of the
Dublin team represented for us at Career Zoo, a recruitment and networking fair in Dublin that we sponsored (find media
coverage of our participation <a href="http://www.rte.ie/news/2015/0912/727405-career-zoo/">here</a> and
<a href="https://www.siliconrepublic.com/careers/2015/08/31/zalando-twitter-sponsor-career-zoo">here</a>). Dublin Head of
Engineering David O’Donoghue told the audience more about our tech culture, and Head of Data Engineering Valentine
Gogichashvili flew in from Berlin to appear on the main stage Big Data panel. The rest of our technologists challenged
our booth visitors to Nerf gun battles and, of course, excited and inspired potential candidates about our positions.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/63794e06acf66988876e5c854d27c354679f1421_screen-shot-2015-09-18-at-18.18.37.png?auto=compress,format"></p>
<p>Speaking of our Dublin tech opportunities, we are still seeking talented data scientists and engineers to grow our team.
If you’re ready for the challenge of your life, eager to work autonomously, have an impact and solve complex problems,
visit our jobs page. We’re aiming to revolutionize how the world connects to fashion from all angles, and realize that
what we’re trying to do is ambitious (and, to many, crazy). If that appeals to you,
<a href="https://tech.zalando.com/jobs/">apply</a> and take the first step toward joining us.</p>ZMON: Zalando's Open Source Monitoring Tool (Slides)2015-09-10T00:00:00+02:002015-09-10T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-09-10:/posts/2015/09/zmon-zalandos-open-source-monitoring-tool-slides.html<p>The slides from our recent talk at DevOps Dublin.</p><p>Recently Zalando Database Engineer Jan Mußler stopped by our <a href="https://tech.zalando.com/locations/dublin/">Fashion Insights Centre in
Dublin</a> to tell the <a href="http://www.meetup.com/DevOps-Ireland/">DevOps Dublin</a>
crowd about <a href="https://github.com/zalando/zmon">ZMON</a>: Zalando's open source monitoring solution. In late 2013, Zalando’s
technology team maxed out on our Icinga/Nagios infrastructure, both performance-wise and (especially) in terms of
manageability. Taking advantage of our annual Hackweek—a weeklong event for Zalando technologists to build, play and
experiment independently—Jan and other Zalando engineers built ZMON to provide our teams with a performant, reliable,
and flexible tool for monitoring all levels of our platform.</p>
<p>ZMON is equipped to monitor low-level system metrics via SNMP/NRPE and HTTP requests to exposed metrics, as well as
higher-level KPIs via SQL and much more using Python expressions as tasks. Its base components include a scheduler,
Redis for state and queue, and a distributed set of workers responsible for evaluating checks and alerts. On top of
these are a frontend component for Dashboards and Alerting (it includes Grafana to make the most of your time series
data) and KairosDB as a metric store. Trying out ZMON is as easy as spinning up a Vagrant box!</p>
<p>Check out Jan's slidedeck here:</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/zmon-monitoring-zalandos-engineering-platform" title="ZMON: Monitoring Zalando's Engineering Platform">ZMON: Monitoring Zalando's Engineering
Platform</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>Meet Zalando Tech at Career Zoo2015-09-09T00:00:00+02:002015-09-09T00:00:00+02:00Selina McCarthytag:engineering.zalando.com,2015-09-09:/posts/2015/09/meet-zalando-tech-at-career-zoo.html<p>Zalando is a main sponsor of this career and networking event for Dublin technologists.</p><p>Since the April 2015 opening of Zalando’s <a href="https://tech.zalando.com/locations/dublin/">Fashion Insights Centre</a> at the
Silicon Docks, Zalando’s tech team in Dublin has grown faster than we had anticipated. But our search for the best data
science and software engineering talent remains in full-speed-ahead mode, which is why we’re proudly sponsoring this
year’s <a href="http://www.careerzoo.ie/">Career Zoo</a>: Ireland’s leading event for tech networking and hiring.</p>
<p>In addition to a panel discussion featuring Zalando VP of Data Andreas Antrup, Zalando’s Career Zoo activities include
hosting a mega-booth equipped with a Nerf Gun shooting range and a bit of “Z-Kino”—focusing on footage from our yearly
Hack Week projects. Zalando technologists from our Dublin and Berlin offices will be on hand at the booth to answer any
questions you have about our tech stack, operations and <a href="https://tech.zalando.com/jobs/">exciting opportunities</a> in
Dublin, Germany (Berlin, Dortmund, Monchengladbach, and Erfurt), and Helsinki. Talk to the men and women who have helped
Zalando evolve from a two-person, online retail shop into a +9,000-employee, publicly traded fashion platform doing
business in 15 European countries!</p>
<p>We’ll also give a <a href="http://www.careerzoo.ie/techbox/">Tech Box talk</a> on our culture of “<a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a>,” which emphasizes Autonomy,
Mastery, Purpose and Trust. This presentation will give you a great sense of what it’s like to work at Zalando and how
we’ve adapted our culture to transform from an online shop into a fashion <em>platform</em>. Our Dublin hub will play a key
role in this transition by focusing on research and development around recommendations, personalization, and real-time
insights.</p>
<p>Check out our Dublin team at last week’s Career Zoo launch:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d8130b6b862cb28bfee9f562065af5dea32e734d_screen-shot-2015-09-09-at-01.19.48.png?auto=compress,format"></p>
<p>Hope to see you there!</p>Zalando Opens New Playground for Tech Innovation2015-09-04T00:00:00+02:002015-09-04T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-09-04:/posts/2015/09/zalando-opens-new-playground-for-tech-innovation.html<p>“The Shuttle” launches inside our Tech HQ</p><p>Ever wish you could build an Arduino or play with Legos on the job? If you’re a Zalando technologist, now you can!
Recently we launched The Shuttle, a dedicated place for members of our tech team to stretch themselves creatively, let
our imaginations run wild, and build things.</p>
<p>Housed on the ground floor of our tech office in Berlin, the Shuttle is an initiative designed with our <a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a> approach in mind: a dedicated
place to enable group and individual innovation and ideation.</p>
<p>Some of its objectives include:</p>
<ul>
<li>supporting a Zalando team’s product discovery process to test assumptions with real-world customers</li>
<li>facilitating research and development in Internet of Things, augmented reality and other cutting-edge technologies</li>
<li>3D printing and modelling (first project: an airplane!)</li>
<li>prototyping of all sorts</li>
</ul>
<p>At the heart of this work is <a href="https://en.wikipedia.org/wiki/Design_thinking">Design Thinking</a>, an approach adopted and
promoted by the Shuttle team via workshops and ideation sessions.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/544c0e5ac1247b50506aaac232d78ab385355ca2_kopie-von-zalando_lab_0261.jpg?auto=compress,format"></p>
<p>To celebrate the Shuttle’s opening, Zalando Tech hosted “Tech Innovation Week” featuring a launch party and 19
“snackable” internal workshops and presentations. Topics included “Lego Serious Play,” “Intro to Arduino,” “End-to-End
Product Building,” an hour-long iOS app-making tutorial, and a demo of <a href="https://vimeo.com/104618362">Lightfight</a>, a live
light-writing program. The week closed with an eight-hour ideation workshop for our Business Excellence and Zalon teams.</p>
<p>Some photos of our workshops in action:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/82688b2cf4e10b2219823947c7a08045a7fe2359_sam_5384.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b438bf9ebc9340f9d820dd608b466657f6119437_sam_5466-1.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a52465002bf238e1fd2e57dc6965e97fa3e8b7a5_sam_5782.jpg?auto=compress,format"></p>
<p>The name “Shuttle” was inspired by the “flying shuttle” - a tiny but transformative component added to weaving looms in
Great Britain. The addition of the shuttle has been said to have sparked the rapid acceleration of the textile industry
during the Industrial Revolution.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/32b7ab873ff75dda270b3d16b6db29c20997140d_kopie-von-zalando_lab_0286-1.jpg?auto=compress,format"></p>
<p>If you pass our Tech HQ at Mollstr. 1 (near Berlin’s Alexanderplatz), you can catch a glimpse of what’s happening inside
the Shuttle via its big storefront windows. You also might get a chance to visit it in the upcoming months, as we hope
to use the space for hosting Tech meetups, workshops and barcamps. Join our Zalando Tech Events <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/">Berlin Meetup
Group</a> to stay up-to-date on any Shuttle-related events that are open
to the public.</p>On APIs and the Zalando API Guild2015-09-03T00:00:00+02:002015-09-03T00:00:00+02:00Dr. Thomas Frauensteintag:engineering.zalando.com,2015-09-03:/posts/2015/09/on-apis-and-the-zalando-api-guild.html<p>Building high-quality, long-lasting APIs has never been more important.</p><h3>Working at Scale</h3>
<p>Our engineering teams enjoy end-to-end ownership of their work, which
enables us to preserve our start-up spirit and ability to experiment in the face of explosive growth (+800 technologists
and counting). They’re also responsible for all software provisioning and quality-related aspects — including design,
implementation, code review, testing, continuous integration, and operation. Teams can challenge the purpose of their
work, make technical decisions autonomously, and pick up new skills via their individualized development plans.</p>
<h3>About RESTful APIs, API First and API Quality</h3>
<p>Zalando’s software architecture centers around decoupled microservices that provide functionality via RESTful APIs with
a JSON payload. Small teams own and deploy these microservices, which are run in AWS team accounts.
Microservice development begins with defining the API outside the code and getting peer review feedback on it.</p>
<p>Our APIs most purely express what our systems do, and are therefore highly valuable business assets. With this in mind,
we’ve adopted “<a href="https://opensource.zalando.com/restful-api-guidelines/#api-first">API First</a>” as one of
our key engineering principles. API First encompasses a set of quality-related standards we encourage our teams to
follow to ensure that our APIs:</p>
<ul>
<li>are easy to understand and learn</li>
<li>are general and abstracted from specific implementation and use cases</li>
<li>are robust and easy to use</li>
<li>have a common look and feel</li>
<li>follow a consistent RESTful style and syntax</li>
<li>are consistent with other teams’ APIs and our global architecture</li>
</ul>
<p>Ideally, all Zalando APIs should look like the same author created them.</p>
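<p>Some of these look-and-feel rules can even be checked mechanically during review. The specific rules and the <code>lint_path</code> helper below are our assumptions in the spirit of these goals, not Zalando's actual tooling:</p>

```python
import re

# One lowercase kebab-case path segment, or a {parameter} placeholder.
SEGMENT = re.compile(r"^[a-z][a-z0-9-]*$|^\{[a-z][a-z0-9_]*\}$")

def lint_path(path: str) -> list:
    """Return human-readable convention violations for an API resource path."""
    problems = []
    if not path.startswith("/"):
        problems.append("path must start with '/'")
    if path != "/" and path.endswith("/"):
        problems.append("no trailing slash")
    for segment in filter(None, path.split("/")):
        if not SEGMENT.match(segment):
            problems.append("segment %r is not lowercase kebab-case" % segment)
    return problems
```

<p>Automating the mechanical part of a review leaves the human reviewers free to focus on naming, abstraction and backward compatibility.</p>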
<p>Designing high-quality, long-lasting APIs has become even more critical for us since we started developing our <a href="https://tech.zalando.com/blog/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz./">new,
open platform
strategy</a>,
which transforms Zalando from an online shop into an expansive fashion platform. Our strategy emphasizes lots of public
APIs used by our external business partners via third-party applications. Good API design is hard work, takes time and
ideally involves ample code review. We can only evolve our APIs by providing backward compatibility with robust clients.
We cannot afford to break our APIs, and must approach large-scale changes cautiously. API First helps to keep us on
track.</p>
<h3>Zalando’s API Guild</h3>
<p>In March 2015, some of us created an API Guild for teams to share their
experiences and discuss how to ensure API quality. The API Guild, like all of our internal guilds, is an informal group
of Zalando technologists that meets regularly to advance topics of interest. Members represent diverse teams across our
organization. In its early days the API Guild attracted only a few members and focused on REST. Since then the Guild has
grown significantly, with around 20 dedicated members who focus on:</p>
<ul>
<li>shared knowledge and best practices around API design and API implementation in our polyglot environment (Scala,
Java, Clojure, Python, etc.)</li>
<li>standards and guidelines</li>
<li>quality assurance via API peer review feedback</li>
</ul>
<p>Many members have gained experience in designing RESTful APIs, and use the Guild to share their knowledge or serve as
team ambassadors. In addition to developing a RESTful API practices document, we’ve discussed examples of good API
design and implementation, picked “APIs of the Month,” and organized <a href="https://tech.zalando.com/blog/designing-restful-apis-a-zalando-coder-dojo/">RESTful API Coder
Dojos</a>, among other activities.</p>
<p>The Guild has been a valuable forum for us to increase awareness of great APIs, design techniques and best practices.
Members meet bimonthly to discuss API design topics, make decisions on guidelines, and improve our documentation. All
meetings, documents, reviews and chats are public and open for all engineers. If API issues come up during peer review
or in discussions, they go on the Guild’s meeting agenda. If they can’t be clarified in a time-box, someone takes
responsibility for further research and follow-up.</p>
<h3>Avoiding the API Review Bottleneck</h3>
<p>We want outside team peers to review all APIs, so we have adopted an open review procedure that’s as lightweight as
possible. The API Guild is invited to all reviews; even more importantly, so are the teams that use the APIs. Here, the
Guild is not acting as an “approval board,” but as a sounding board and experienced review resource that helps us to avoid
bottlenecks.</p>
<p>Despite our strides, API review still involves a lot of work. To ensure valuable Guild-generated feedback of all
reviews, we encourage members to commit around 10 percent of their week to API design and best-practice
sharing—emphasizing the Guild’s purpose and value, and asking them to record their contributions in their personal
and/or team OKRs. We’ve also introduced a weekly stand-up that focuses specifically on review coordination, and we
enable new hires to pair up with experienced members.</p>
<h3>Status and Outlook</h3>
<p>The API Guild has proven to be a valuable tool for sharing knowledge and best practices of API design and
implementation. It supports our autonomous engineering teams in achieving overarching alignment and high quality of the
APIs they own. With the Guild’s help we make RESTful API design, implementation and review go smoothly and steadily for
us. The Guild’s work is especially important for our autonomous teams and our emerging microservice landscape, which is
driven by REST, SaaS, cloud, API First, and peer review.</p>
<p>With our increasing knowledge and experience, the API Guild will focus more and more on reviews to achieve overarching
design consistency and quality. For the time being, there is still much about REST that we have to learn—for example,
supporting microservice and API discovery. Another challenging topic is HATEOAS; we don’t have specific recommendations
for a Zalando standard yet, and expect this to emerge via experimentation.</p>
<p><em>To learn more about API engineering at Zalando, please also check out the more recent tech blog post
<a href="https://engineering.zalando.com/posts/2019/04/developing-zalando-apis.html">Developing Zalando APIs</a>
and the InfoQ interview with Dr. Thomas Frauenstein
<a href="https://bit.ly/3ssFJ8O">How Zalando Delivers APIs (InfoQ)</a>.</em></p>A Zalando Tops “Most Read Data Science Articles” List2015-09-02T00:00:00+02:002015-09-02T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-09-02:/posts/2015/09/a-zalando-tops-most-read-data-science-articles-list.html<p>A Zalando Delivery Lead’s post comes in at #1!</p><p>This week the blog/e-newsletter <a href="http://www.datascienceweekly.org/blog/30-most-read-data-science-articles-2015-so-far?utm_content=buffer91a22&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer">Data Science
Weekly</a>
released their list of the “Most Read Data Science Articles of 2015.” We were pretty excited to see that our very own
Mikio Braun topped the list with his article, “Three Things About Data Science You Won’t Find In the Books.”</p>
<p>Mikio joined Zalando in August 2015 as a Delivery Lead, Recommender Systems & Search. His post offers candid and
easy-to-understand insights on approaching Data Science that apply to both a beginners and experienced audience.</p>
<p>“Having lots of data by itself does not mean that you really need all the data,” he writes. “The question is much more
about the complexity of the underlying learning problem. If the problem can be solved by a simple model, you don’t need
that much data to infer the parameters of your model.”</p>
<p>You can read Mikio’s full article <a href="http://blog.mikiobraun.de/2015/03/three-things-about-data-science.html">here</a>.</p>Hello, Helsinki!2015-08-28T00:00:00+02:002015-08-28T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-08-28:/posts/2015/08/hello-helsinki.html<p>Zalando opens another international tech hub —this time, in Scandinavia!</p><p>Zalando opened a dedicated Tech Hub in Helsinki this week. Our Helsinki office is our second international Tech office
following the <a href="http://blog.zalando.com/en/blog/zalando-tech-dublin-opening-our-fashion-insights-center">Dublin Fashion Insights
Centre</a> we opened in April. On
Wednesday night, 250+ guests, partners and friends from the Helsinki tech and start-up community met in our new space to
celebrate.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b6141726b0dd5a1af39e8a1e1b65ee72f7d8d931_hka_6667.jpg?auto=compress,format"></p>
<p>While our Dublin hub offers insights into all areas of the Zalando platform through data, our Helsinki hub will
focus primarily on mobile and UX products. In keeping with our mobile first approach and <a href="https://www.youtube.com/watch?v=yF5gLkbpMW4&feature=youtu.be">platform
strategy</a>, the first product will be a new app enabling
customers to discover a massive fashion and lifestyle assortment, while creating direct connections between brands,
retailers and consumers. To support this growth, we’re actively recruiting full-stack and mobile developers, UX
specialists, data scientists, and product owners from across the industry, as well as in academia.</p>
<p>Last night’s party included two Fireside chats. Our Head of Tech Expansion <a href="http://startupextreme.co/speaker/marc-lamik/">Marc
Lamik</a> and <a href="http://www.helsinkibusinesshub.fi/people/micah-gland/">Micah
Gland</a>, Deputy CEO of Helsinki Business Hub sat down to discuss
our decision to go North and why we chose to expand our tech operations in Helsinki.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/248f35d6f56ff4655e63b2be6ce8a734c0a12ac1_hka_6867.jpg?auto=compress,format"></p>
<p>Zalando VP of Engineering <a href="https://twitter.com/ebowman">Eric Bowman</a> and Product Owner Daniel Schneider teamed up to
talk about the specific products we’ll be building in Helsinki and the role that our <a href="https://www.youtube.com/watch?v=fJl2adFWpG4&feature=youtu.be">Radical
Agility</a> approach will play in their development.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2ef357703798db28b1f3b2fa2e35ce67054210c7_hka_6935_2-1.jpg?auto=compress,format"></p>
<p>We’ll share the <a href="https://tech.zalando.com/blog/watch-the-fireside-chats-from-our-helsinki-office-opening/">video of the
talks</a> soon, but here’s a look
at the new space.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2a10d72e99b35dcde76b72c1e750e7a87548979e_img_0352.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/009e8bc1d14b35ea23b4779d6121466a22367593_image1-2.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8778c4d6cb016abd142ce7f0c69b7e17d0fdb00f_image2-1.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7884bffb2f91ce910cf1ba44a2ef79af069a8161_hka_6507.jpg?auto=compress,format"></p>
<p>A new office means that we’re building an amazing new team in Helsinki. Have a look at our <a href="https://tech.zalando.com/jobs/">open
positions</a>! For press inquiries reach us at: press@zalando.de and tweet us anytime
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a></p>Tech.EU Catches up with Zalando2015-08-21T00:00:00+02:002015-08-21T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-08-21:/posts/2015/08/tech.eu-catches-up-with-zalando.html<p>Zalando Cofounder Robert Gentz chats with Tech.EU about our growth.</p><p>During the recent <a href="https://tech.zalando.com/blog/zalando-did-tech-open-air/">Tech Open Air</a> festivities,
<a href="http://tech.eu/features/5632/zalando-robert-gentz-video-interview/">Tech.EU</a> Editor <a href="https://twitter.com/robinwauters">Robin
Wauters</a> sat down with Zalando Cofounder Robert Gentz to talk about our growth,
platform strategy and the Berlin tech scene. Take a look:</p>Designing RESTful APIs: A Zalando Coder Dojo2015-08-20T00:00:00+02:002015-08-20T00:00:00+02:00Dan Persatag:engineering.zalando.com,2015-08-20:/posts/2015/08/designing-restful-apis-a-zalando-coder-dojo.html<p>How we helped our technology team to learn API design. Plus: Tips!</p><p>The Zalando Tech team organizes internal <a href="https://tech.zalando.com/blog/test-driven-development-a-zalando-coder-dojo/">Coder
Dojos</a> to foster collaboration and soft
skills-learning. Dojos encourage us to leave our comfort zones and work together in different configurations, to achieve
results faster. They’re fun and challenging for everyone involved.</p>
<p>Our third and most recent coder dojo focused on designing RESTful APIs. One of the fundamental principles of <a href="https://tech.zalando.com/blog/so-youve-heard-about-radical-agility...-video/">Radical
Agility</a> is to take an <a href="https://tech.zalando.com/blog/auto-scaling-your-api-tips-from-zalando-slides/">API
First</a> approach to software development,
which means creating first-class APIs that are truly RESTful. Designing RESTful APIs might look easy on the surface, but
any experienced API designer will warn you that there are many pitfalls. With the Dojo, we wanted to give our team the
chance to practice API design and enhance their skills.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c60677c9f00c971ce71ff053ede6bb524f645037_zalando-api-first-coder-dojo-2.jpg?auto=compress,format"></p>
<h3>Our API Dojo Format</h3>
<p>To bring our participants up to speed, we reviewed all of the phases of designing three RESTful APIs — analyzing our
design decisions and considering tradeoffs. When designing the APIs, we split the group into smaller groups of different
sizes and then asked for feedback on how productive the participants thought they were, based on the size of their
group. Productivity decreased significantly when the group reached six people; the larger the group, the more group
members tended to concentrate on small, unimportant and time-consuming points, and lose sight of the big picture. In the
smaller groups, members realized they could participate more actively in the discussions and find solutions faster.</p>
<p>Another interesting but expected finding was that our people learned the new API-related concepts really quickly. This
wasn’t so evident during our first session, as the participants didn’t identify solutions before their time was up.
(Timeboxing is a coder dojo rule for us, which participants didn’t like; going forward, we’ll make the first session
longer.) The second session, however, brought solutions and compromises as the participants learned to self-organize
more efficiently. Suddenly, the “timeboxed” rule was not a problem anymore! By the third session, participants were
beating the clock. I liked that the groups adapted so well, and that the “timeboxed” constraint helped them become
faster and better.</p>
<h3>How to Run Your Own API Coder Dojo</h3>
<p>How do you replicate our experience? Here are some practical tips:</p>
<ul>
<li>Timebox every activity: This will enable you to
try more things and include more sessions.</li>
<li>Throw away the code: The goal of a Coder Dojo is to focus on process, not to deliver something. Eliminate pressure
as much as possible.</li>
<li>Do a retrospective after each session: Practicing is not enough. Learning requires reflecting on what you just did.
Sharing your personal experience with the group will help everybody improve.</li>
<li>Get feedback from participants: Create a feedback board so that your participants feel more engaged, and so you can
improve for next time. The best part about feedback is that it’s free! People are glad to give it—you only have to
ask for it. :)</li>
<li>Pizza and beer are the secret ingredients: In the morning, engineers are powered by coffee. In the evening, slices
and beer fuel us.</li>
<li>Make it FUN: If it's not fun, we're doing it wrong!</li>
</ul>
<p>By the end of the Coder Dojo, our participants understood how to successfully tackle and resolve some key API design
problems. We also gained a clearer understanding of what distinguishes a mediocre API from a great one.</p>
<p>Looking forward to the next Dojo!</p>PostgreSQL Backups Done Right (Video)2015-08-14T00:00:00+02:002015-08-14T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-08-14:/posts/2015/08/postgresql-backups-done-right-video.html<p>Longtime PostgreSQL contributor Devrim Gündüz speaks at Zalando’s Sky Lounge.</p><p>Earlier this week the <a href="http://www.meetup.com/PostgreSQL-Meetup-Berlin/">PostgreSQL Meetup Group Berlin</a> and Zalando
welcomed EnterpriseDB Principal Engineer and longtime PostgreSQL contributor <a href="https://twitter.com/devrimgunduz">Devrim
Gündüz</a> to our Sky Lounge. Devrim offered his insights on best practices and tools for
PostgreSQL backup and recovery—watch his helpful (and funny) presentation to catch what you missed:</p>
<p>Devrim obviously knows a lot about Postgres, but it’s his Postgres tattoo that truly puts him in the elite class.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bf533f501f397f2597b4bf38c00aa6fd4bd79dd9_img_0230-1.jpg?auto=compress,format"></p>
<p>Zalando is one of the world’s largest and most enthusiastic Postgres users. Find out more about our work by reading
these articles from our data engineering team:</p>
<p><a href="https://tech.zalando.com/blog/analyzing-extreme-distributions-in-postgresql/">Analyzing Extreme Distributions in
PostgreSQL</a>
<a href="https://tech.zalando.com/blog/watch-fashion-is-hard-postgresql-is-easy/">Fashion is Hard, PostgreSQL is easy (Video)</a>
<a href="https://tech.zalando.com/blog/the-perils-of-modifying-postgresql-system-catalogs/">The Perils of Modifying PostgreSQL System
Catalogs</a></p>How Zalando Helps Brands to Win Online (Video)2015-08-12T00:00:00+02:002015-08-12T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-08-12:/posts/2015/08/zalandos-vp-brand-solutions-presents-at-the-july-2015-fashtech-konferenz..html<p>Zalando's VP Brand Solutions presents at the July 2015 Fashtech-Konferenz.</p><p>Last month the Fashtech-Konferenz in Berlin invited Zalando VP Brand Solutions Christoph Lange to present a talk on how
his team is enabling VF Corporation, Nike and other top fashion brands to reach customers like never before. Watch
Christoph in action:</p>
<p>Thanks to <a href="http://excitingcommerce.de/2015/08/12/die-zalando-brand-solutions-fur-nike-und-40-weitere-labels/">Exciting
Commerce</a> for
covering Christoph's talk!</p>Gearing up for Zalando’s Mario Kart Championship2015-08-11T00:00:00+02:002015-08-11T00:00:00+02:00Florian Wagnertag:engineering.zalando.com,2015-08-11:/posts/2015/08/gearing-up-for-zalandos-mario-kart-championship.html<p>Hot tarmac, screeching tires and banana peels on the speedway!</p><p>Fasten your seat belts: It’s MARIO KART CHAMPIONSHIP FEVER at Zalando Tech’s Berlin HQ! No need to switch on the traffic
news, though: You’ll be safe on the streets. We’re keeping the action contained to our brand-new gaming area, where
we’ll be playing Mario Kart 8 on our Wii U!</p>
<p>Zalando’s first Mario Kart Championship took place in December 2014 during <a href="https://corporate.zalando.com/en/zalando-technology-startet-mit-ueber-100-projektideen-die-dritte-hack-week">our Hack
Week</a>
festivities. We were overwhelmed by our team’s enthusiastic participation and the thrilling atmosphere, so no doubt—a
follow-up was in order! Mario Kart fans of all levels are all welcome to compete for the highly-in-demand, lovingly
handcrafted Winner’s Cup. If you’re a Z-Tech who enjoys the satisfying moment when a red turtle shell hits an opponent,
you’re in.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/b9e621a544c0de4485ae7f4963a9dfee860b0b96_screen-shot-2015-08-11-at-18.12.34.png?auto=compress,format"></p>
<p>Turtle shells are not the only frightening things on our racetrack. Would you mess with a team named “Les Fous Du
Volant”? How about the sinister-sounding “Pizza Connection”? Such a team might throw pizzas on the track in order to
hypnotize opponents with yumminess—making them stop immediately to eat the bait. (Indeed, pizzas and beer are an
essential part of our Zalando Tech culture, and our finalists and audience will be able to enjoy plenty of both.)
Qualifying rounds for the Mario Kart Championship Cup run every week in August.</p>
<p>What will the future bring? In this world, nothing is certain—except that Zalando Tech is going to maximize our Mario
Kart thrills!</p>
<p><img alt="thrill meme" src="http://media.giphy.com/media/14uEUh3v44Rm5q/giphy.gif"></p>
<p>*Theresa is Zalando Tech’s community manager. Together she and Zalando Software Engineer Florian Wagner brought this
legendary Mario Kart Tournament to life. Theresa and our community team organize the many events and initiatives that
keep our 800+ technologists healthy and happy. To learn more about life at Zalando Tech, visit our <a href="https://tech.zalando.com/jobs/">jobs
page</a>.</p>Meet Zalando at the First OpenTechSchool Conference2015-08-05T00:00:00+02:002015-08-05T00:00:00+02:00Hendrik Mittroptag:engineering.zalando.com,2015-08-05:/posts/2015/08/meet-zalando-at-the-first-opentechschool-conference.html<p>Go back to school with us this August 15-16 in Dortmund, Germany</p><p>From August 15-16 the first-ever <a href="https://otsconf.com/">OpenTechSchool (OTS) conference</a> will take place in Dortmund,
located in the <a href="https://en.wikipedia.org/wiki/Ruhr">Ruhr-Region</a> of Germany and home to Zalando's <a href="https://tech.zalando.com/locations/dortmund/">second German Tech
Hub</a>. Zalando is excited to be a sponsor and participate. My colleagues
Petra Lehrmann, Ingo Weinzerl, Andreas Rueppert, Stefan Muehlenbaeumer, Janine Lehmann and I will all be there to talk
to you about anything related to technology. Come say hi, or just have fun and listen to the talks!</p>
<p>If you’re not familiar with OTS, it’s a global movement that aims to offer free tech education to everyone. OTS brings
together technology enthusiasts of all genders, backgrounds and experience levels who want to either coach others or
learn in a friendly environment. Its online community shares and collectively improves its learning materials, which are
available for anyone worldwide to use to organise new OTS chapters.</p>
<p>The OTS conference will include talks on functional programming, hardware hacking, UX, evolutionary algorithms and
diversity in open source, among other topics. Find all talk abstracts <a href="https://otsconf.com/#talks">here</a>. The second day
includes—in good old OTS tradition— <a href="http://blog.otsconf.com/post/121620705005/our-6-workshops">a day of workshops</a> and
hands-on learning opportunities.</p>
<p>But it wouldn't be an OpenTechSchool event if there weren’t extra efforts made to ensure inclusivity and respect for
diversity. The conference comes with a clear <a href="https://otsconf.com/#coc">Code of Conduct</a> that everyone—including
attendees, speakers, sponsors and the organizing team—must agree to, and takes place in a completely accessible venue
with a guidance system for the visually impaired. Vegan and vegetarian meal options, child care, and discounted tickets
for students and the OTS community will all be available. Oh, and there will be cheesecake. What more can you ask for?</p>
<p>You can find all information and get your tickets to the conference <a href="https://otsconf.com/">here</a>. Hope to see you!</p>
<p>*Photo courtesy of OpenTechSchool. Image by <a href="http://www.hanneswoidich.de/">Hannes Woidich</a>, OpenTechSchool logo by
<a href="http://jimmymorris.co.uk/">James Morris</a>.</p>Mobile Testing Challenges at Zalando + 6Wunderkinder2015-07-31T00:00:00+02:002015-07-31T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-07-31:/posts/2015/07/mobile-testing-challenges-at-zalando--6wunderkinder.html<p>The Mobile Quality Crew Meetup</p><p>Recently Zalando Tech hosted the <a href="http://www.meetup.com/Berlin-Mobile-Quality-Crew/events/223890333/">Mobile Quality
Crew</a> meetup on “Mobile Testing Challenges at
Zalando and 6WunderKinder.” Mobile Quality Crew-Berlin organizers Hannes Lenke (CCO & CFO of TestObject) and Oliver
Rupnow (QA Expert & Trainer at Díaz & Hilterscheid) do a great job of seeking out relevant topics and speakers, while
maintaining an engaged and open-minded community of mobile developers and QA specialists. We were excited to host them.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c3d0d592927210f4e48a552cced71fadb5bce86a_mqc_20150723_06.jpg?auto=compress,format"></p>
<p>6Wunderkinder QA Lead Justin Ison gave a great talk focusing on testing automation, and our own Hendrik Seffler
(Delivery Lead, Engineering Productivity) and <a href="https://tech.zalando.com/blog/speeding-up-xcode-builds/">Dmitry Bespalov</a>
(iOS Developer) teamed up to share our testing setup and automation for iOS and Android. Have a look at Zalando’s
presentation here:</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/mobile-quality-crew-meetup-51094746" title="Mobile Testing Challenges at Zalando Tech">Mobile Testing Challenges at Zalando
Tech</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>
<p>*Mobile Devs - check out Dmitry’s post on speeding up our xcode builds
<a href="https://tech.zalando.com/blog/speeding-up-xcode-builds/">here</a>.</p>
<p>The guestlist was a good mix of mobile developers, founders, QA Leads and Analysts. It was a full house - and we
definitely recommend attending the next one.</p>Analyzing Extreme Distributions in PostgreSQL2015-07-30T00:00:00+02:002015-07-30T00:00:00+02:00Stefan Litschetag:engineering.zalando.com,2015-07-30:/posts/2015/07/analyzing-extreme-distributions-in-postgresql.html<p>The rare things matter.</p><p><img alt="null" src="https://images.prismic.io/zalando-jobsite/714d5139cf5e65e3cfc5ee5ac53261513e580a79_pic1.png?auto=compress,format"></p>
<p>Recently my team and I observed a sporadic increase in the execution time of stored procedures in our PostgreSQL
databases (see the graph above). Running <strong>analyze</strong> on the referenced table often resolved the issue. In our
case, fluctuations in the execution plan caused statement timeouts, which led to errors in our applications.</p>
<p>We wanted to understand this behavior better. Which circumstances prompted more frequent plan fluctuations? How exactly
could we influence the system to be more reliable? To find answers, we tested how different configurations of PostgreSQL
influenced the results of the query planner. This post shares the results of our tests.</p>
<p>One of the query planner’s strengths is that it enables PostgreSQL to use statistics about the distribution of your data
in the database tables to decide which of the various possible execution plans is the cheapest. The database system
regularly runs <strong>analyze</strong> commands to update those statistics. In order to keep the costs of this operation low with
respect to time and the space required for storing those data, <strong>analyze</strong> takes a limited, random sample of the table.</p>
<p>The reliability of the statistical information about your data distribution depends on the ratio of the <em>size of the
sample taken</em> to the <em>size of the entire table</em>. As the PostgreSQL
<a href="http://www.postgresql.org/docs/current/static/sql-analyze.html">documentation</a> already states, this might lead to
less-than-optimal query plans.</p>
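<p>One way to observe this sampling directly is <strong>analyze verbose</strong>, which reports how many pages and rows went into the sample. The table name and the numbers in the comments below are illustrative, not output from our actual test run:</p>
<div class="highlight"><pre><span></span><code>ANALYZE VERBOSE my_table;   -- placeholder table name
-- INFO:  analyzing "public.my_table"
-- INFO:  "my_table": scanned 30000 of 110000 pages, containing
--        3540000 live rows and 0 dead rows; 30000 rows in sample,
--        12980000 estimated total rows
</code></pre></div>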
<p>A database user can query the <strong>pg_stats</strong> view to examine the data processed by the query planner. The <strong>pg_stats</strong> view
offers a number of attributes for each column of every table. For instance, it shows the fraction of null values in a
column, or the correlation between the physical and logical ordering of rows (see
<a href="http://www.postgresql.org/docs/current/static/view-pg-stats.html">documentation</a> for more information). For non-unique
columns, it shows the most common values with their frequencies and histogram boundaries for all other values. The
histogram boundaries split the range of values into buckets of equal size. Given the number of rows and the number of
buckets, the query planner can calculate the selectivity of a given attribute. With histogram buckets, we can properly
deal with the uniform distribution of data in our tables.</p>
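<p>As a concrete sketch, these statistics can be pulled from <strong>pg_stats</strong> for a single column; the table and column names here are only placeholders:</p>
<div class="highlight"><pre><span></span><code>SELECT null_frac,          -- fraction of NULL values in the column
       n_distinct,         -- estimated number of distinct values
       most_common_vals,   -- the MCV list
       most_common_freqs,  -- frequency of each most common value
       histogram_bounds,   -- bucket boundaries for the remaining values
       correlation         -- physical vs. logical ordering of rows
FROM   pg_stats
WHERE  schemaname = 'public'
  AND  tablename  = 'my_table'   -- placeholder table name
  AND  attname    = 'status';    -- placeholder column name
</code></pre></div>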
<p>PostgreSQL also addresses non-uniform distributions: <a href="http://www.postgresql.org/docs/9.2/static/row-estimation-examples.html">the histogram covers only the fraction of rows not
already represented by the most common values</a>. This
allows PostgreSQL to make more precise estimations. It also allows the query planner to avoid index scans for queries
that affect a big portion of the rows on an indexed predicate.</p>
<p>The <a href="http://www.postgresql.org/docs/current/static/planner-stats.html">default statistics target parameter</a> defines how
many values are stored in the list of most common values, and also determines the number of rows to be inspected
(the value of this parameter, multiplied by 300). The statistics target defaults to 100, which means that PostgreSQL
will inspect a maximum of 30,000 rows.</p>
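<p>The current target can be inspected and changed at session level; the numbers in the comments follow directly from the 300x rule above, and the table name is a placeholder:</p>
<div class="highlight"><pre><span></span><code>SHOW default_statistics_target;        -- 100 by default
-- ANALYZE samples up to target * 300 rows, i.e. 30,000 at the default
SET default_statistics_target = 500;   -- this session: up to 150,000 rows
ANALYZE my_table;                      -- placeholder table name
</code></pre></div>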
<p>We can use this <strong>pg_stats</strong> view to investigate which values would change in the event of a fluctuating execution
plan. Database applications often use a status field such as NEW, PROCESSING, or DONE to mark the processing status of a
row. This results in a typical distribution of the status field values: a small set of new rows to be processed (NEW),
some rows already being processed (PROCESSING), and a large number of rows remaining in their final state (DONE). In a
constantly growing table, rows of unprocessed states represent a constantly decreasing portion of the table.</p>
<p>To understand in which cases those query plans happen to flip, we created a table with test data that differ in their
distribution. We chose four different values to represent the status of the real data. In different columns of our test
table, we generated different distributions of the status. In this set of test columns, the fraction of the NEW status
decreases from 1 per 2,700 rows to 1 per 286,000 rows.</p>
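<p>A minimal sketch of how such a skewed column can be generated with <strong>generate_series</strong>. This is an assumption about the setup, not the exact script we used; the moduli only approximate the frequencies shown in the <strong>pg_stats</strong> output below, and the actual test table carried several such columns:</p>
<div class="highlight"><pre><span></span><code>-- One status column where value 0 (NEW) is the rarest and
-- value 3 (DONE) dominates. Scale down generate_series for quicker tests.
CREATE TABLE sli_stat_testdata AS
SELECT CASE
         WHEN i % 135000 = 0 THEN 0   -- NEW: roughly 1 per 135,000 rows
         WHEN i % 1000   = 0 THEN 1   -- about 0.1% of the rows
         WHEN i % 10     = 0 THEN 2   -- about 10% of the rows
         ELSE 3                       -- DONE: roughly 90% of the rows
       END AS log4_fewval
FROM generate_series(1, 13000000) AS s(i);
CREATE INDEX ON sli_stat_testdata (log4_fewval);
ANALYZE sli_stat_testdata;
</code></pre></div>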
<p>Here’s what some values from <strong>pg_stats</strong> view look like for one column of our test table after we execute analyze:</p>
<div class="highlight"><pre><span></span><code>─[ RECORD 1 ]─────┬──────────────────────────────
attname │ log4_fewval
most_common_vals │ {3,2,1}
most_common_freqs │ {0.8978,0.101233,0.000966667}
histogram_bounds │
</code></pre></div>
<p>Given that this <strong>log4_fewval</strong> column is indexed, and that we want to fetch rows with the least frequent values, we
get the following execution plan:</p>
<div class="highlight"><pre><span></span><code><span class="n">local_test_db</span><span class="o">=</span><span class="p">#</span><span class="w"> </span><span class="n">explain</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="mh">1</span><span class="p">)</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">sli_stat_testdata</span><span class="w"> </span><span class="n">where</span><span class="w"> </span><span class="n">log4_fewval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">QUERY</span><span class="w"> </span><span class="n">PLAN</span>
<span class="err">──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="w"> </span><span class="n">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">5.59</span><span class="p">.</span><span class="mf">.5.60</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">0</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">Only</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">using</span><span class="w"> </span><span class="n">sli_stat_testdata_log4_fewval_idx</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">sli_stat_testdata</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.43</span><span class="p">.</span><span class="mf">.5.58</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">0</span><span class="p">)</span>
<span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="nl">Cond:</span><span class="w"> </span><span class="p">(</span><span class="n">log4_fewval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">0</span><span class="p">)</span>
<span class="p">(</span><span class="mh">3</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span>
</code></pre></div>
<p>If we repeat the <strong>analyze</strong>, we might come up with the following result from <strong>pg_stats</strong>:</p>
<div class="highlight"><pre><span></span><code>─[ RECORD 1 ]─────┬────────────
attname │ log4_fewval
most_common_vals │ {3}
most_common_freqs │ {0.898933}
histogram_bounds │ {0,2,2}
</code></pre></div>
<p>The execution plan for our query also changes:</p>
<div class="highlight"><pre><span></span><code><span class="n">local_test_db</span><span class="o">=</span><span class="p">#</span><span class="w"> </span><span class="n">explain</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="mh">1</span><span class="p">)</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">sli_stat_testdata</span><span class="w"> </span><span class="n">where</span><span class="w"> </span><span class="n">log4_fewval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">QUERY</span><span class="w"> </span><span class="n">PLAN</span>
<span class="err">──────────────────────────────────────────────────────────────────────────────────────────────────────────────</span>
<span class="w"> </span><span class="n">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">104657.96</span><span class="p">.</span><span class="mf">.104657.97</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">0</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Bitmap</span><span class="w"> </span><span class="n">Heap</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">sli_stat_testdata</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">6218.30</span><span class="p">.</span><span class="mf">.103827.68</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">332111</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">0</span><span class="p">)</span>
<span class="w"> </span><span class="n">Recheck</span><span class="w"> </span><span class="nl">Cond:</span><span class="w"> </span><span class="p">(</span><span class="n">log4_fewval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">0</span><span class="p">)</span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Bitmap</span><span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">sli_stat_testdata_log4_fewval_idx</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.6135.27</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">332111</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">0</span><span class="p">)</span>
<span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="nl">Cond:</span><span class="w"> </span><span class="p">(</span><span class="n">log4_fewval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">0</span><span class="p">)</span>
<span class="p">(</span><span class="mh">5</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span>
</code></pre></div>
<p>The estimated costs are 20,000 times higher!</p>
<p>In the first case, the value 0 was represented neither in the list of most common values nor in histogram bounds. As we
can see in the execution plan, the planner assumed that our WHERE clause will affect 1 row. In the second example, the
histogram bounds array represents our value of interest. This means that more than 10% of the rows have values between 0
and 2. Because there are only two buckets, every bucket represents almost 5% of the data. The query planner has to
assume that more than 2.5% of the rows have the value 0 (rows=332111), far more than the 98 rows that actually do.</p>
<p>The query planner overestimated the amount of data affected by this query. This caused the planner to choose a <a href="http://www.postgresql.org/message-id/12553.1135634231@sss.pgh.pa.us">Bitmap Heap
Scan</a> with a re-check of the filter
condition. On top of the larger estimated number of rows, the executor re-examined all rows in the matched pages, further
slowing things down. That’s why we finally encountered the statement timeout.</p>
<p>Ideally, all four status values are found in the list of most common values (MCV), since the default statistics target
of 100 should give us up to 100 different values. Due to the small sample size, some of the values were not seen during
<strong>analyze</strong>, which led to wrong statistics. If the analyze process misses some of the rare values, it has no data to
estimate the distribution of those rare values. For our use case, that means: If not all of the distinct values of our
status are represented in the list of most common values, Postgres assumes that those values are distributed uniformly.</p>
<p>We tested, for different distributions, how often all four distinct values are represented in the MCV list. The
following diagram shows that, down to a frequency of 0.04%, all values are represented in the MCV list. The further we
decrease the frequency of the status 0 value, the more often it is missing from the MCV list in our <strong>pg_stats</strong> view.
In the extreme case that no row has status 0, the average number of values in the MCV list is 3.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/774f32401efde6a64ade9ba2aabdd55a16a3d1c4_pic2.png?auto=compress,format"></p>
<p>The administrator can address this issue by adjusting the statistics target for the affected column. This raises the
question: to which value do we have to raise the statistics target?</p>
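<p>The per-column override itself looks like this; the target of 1000 is only an example value, not a recommendation, and the table and column names follow the test setup shown earlier:</p>
<div class="highlight"><pre><span></span><code>-- Raise the statistics target for one column, then refresh its statistics.
-- With a target of 1000, ANALYZE samples up to 300,000 rows.
ALTER TABLE sli_stat_testdata ALTER COLUMN log4_fewval SET STATISTICS 1000;
ANALYZE sli_stat_testdata;
</code></pre></div>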
<p>We measured what percentage of rows the analyze has to inspect so that all distinct values are represented in the MCV
list. The following diagram shows, for different distributions, the statistics target at which the MCV list reaches
repletion.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e34a5e87fe05c81a09cd10251baef9d2d1eb7b28_pic3.png?auto=compress,format"></p>
<p>In some cases, anything short of a drastic increase of the statistics target makes things worse. In the case “1 per
286,000,” for example, we did not achieve repletion even when we inspected 12% of the rows. In such cases it might be
better to archive processed data, so that the fraction of new rows remains at a higher level.</p>
<p>In many cases, the standard configuration of PostgreSQL <strong>analyze</strong> works very well. Once tables grow beyond several
million records, however, you should examine the distribution of your data. In any case, increasing the statistics targets
can only mitigate those issues: uncertainty will remain, because the approach is statistical. We can only tilt the
probabilities in our favor.</p>
<p>From this analysis, we learned the precise values to which we had to adjust the configuration. Since then, our application
runs stably, without the previously observed statement timeouts.</p>Zalando's Traveling Prototyping Team2015-07-28T00:00:00+02:002015-07-28T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-07-28:/posts/2015/07/zalandos-traveling-prototyping-team.html<p>From Dublin to Dortmund, Zalando's prototyping team is on the move—and mapping our future.</p><p>In recent months, Zalando technologists have been working hard on our platform strategy: our shift from an online fashion
platform to a multi-service platform that offers fashion as a service. Put another way, Zalando is no longer satisfied
with being “an online shop.” We are in the process of a massive integration of all our components and services that
connect people to fashion, and fashion to people. For this plan to work, our engineering team must create an
architecture for innovation.</p>
<p>Zalando’s prototyping project began with a vision of how the global architecture of our new platform strategy might
look. Topics for discussion range from efficient processing of high data and traffic volumes to compliance. Leading the
project are nine senior engineers and four product owners. Each team member has allocated specific time slots for their
everyday jobs of managing teams, and have spent the remainder of their time working specifically on prototyping.</p>
<p>To allow for intense focus and collaboration, the team spent two weeks working in our <a href="http://dublin.zalando.com/">Dublin Fashion Insights
Centre</a>, then came back to Berlin for a few days. This week (and last) they are working from
<a href="https://tech.zalando.com/locations/dortmund/">our Dortmund office</a>.</p>
<p>Here they were a few weeks ago in Dublin:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/29139f2ec440ff149f989d8ec3bae250b20bf48c_dublin2.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2c508e140e6e08fceee502a4255a470fc83b1864_dublin-prototyping.jpg?auto=compress,format"></p>
<p>They got creative with some fruit:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4f5b8f22bbaf3e87458b2f7b42d8e534a20fae37_20150708_224450-1.jpg?auto=compress,format"></p>
<p>During their off-hours, our travelling prototyping team explored Ireland—visiting the Cliffs of Moher on the western
side and snapping some photos for the rest of us:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/2839e6730543e897d250cc3c1be9e56ce614f193_20150705_113212.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/545ae49ba97ef3631e9d3a871a9302f7ba1f0320_prototyping-team-2.jpg?auto=compress,format"></p>
<p>The prototyping team has been sending the rest of us weekly updates and their to-do lists so we can keep up with their
progress. When they’re ready to show us something they’ll present their outcomes (concepts and working prototypes) in an
internal tech talk. Then, the implementation will begin. Exciting times ahead!</p>"Using Git Hooks to Help Your Team Work Autonomously" (Video)2015-07-23T00:00:00+02:002015-07-23T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-07-23:/posts/2015/07/using-git-hooks-to-help-your-team-work-autonomously-video.html<p>Watch the video from Zalando's presentation at EuroPython 2015!</p><p>Like Java and Scala, Python is essential to Zalando's technology operations. It's what we used to create many of the
components and tools of
<a href="https://tech.zalando.com/blog/radical-agility-with-autonomous-teams-and-microservices-in-the-cloud/">STUPS</a>, our open
source Platform as a Service (PaaS), and is key to our <a href="https://tech.zalando.com/jobs/70050/?gh_jid=70050">data
engineering</a> and
<a href="https://tech.zalando.com/jobs/65981/?gh_jid=65981">devops</a> efforts. To spread the word about our Python work and give
back to the amazing Python community, Zalando hosted this year's <a href="https://ep2015.europython.eu/en/">EuroPython</a>
conference in Bilbao, Spain. As an added bonus, Software Engineer Joao Santos was invited to present a talk on how his
team has been using Git hooks to work more autonomously. Joao's talk begins at 1:32:30:</p>Zalando Did Tech Open Air2015-07-21T00:00:00+02:002015-07-21T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-07-21:/posts/2015/07/zalando-did-tech-open-air.html<p>The Recap.</p><p>This was the first year that <a href="https://tech.zalando.com/blog/zalando-does-tech-open-air/">Zalando Tech sponsored Tech Open Air
(TOA)</a>, and the <a href="http://toa.berlin/">festival</a> was a big deal
for our team. We enjoyed every minute of it, and had fun connecting with all the great people we met.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/9a93e1b800b5c6002892bdcca268012058035934_img_0058.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f4b38a0fa29e11fae45bfe37174cf4222cb8d3d1_img_0048.jpg?auto=compress,format"></p>
<h2>Some TOA highlights:</h2>
<p><strong>The Nerf Gun Challenge at our Unconference Booth:</strong> We didn't think we'd meet so many people as passionate about Nerf
games as we are! A special shout-out to our repeat visitors, who always pretended it was their first time.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f64b1f2c4a6f95b03e80937b3b4d87673b99f313_img_0077.jpg?auto=compress,format"></p>
<p><strong>Zalando Co-founder Robert Gentz' Fireside Chat:</strong> Our tech culture, explained.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8fea76414268f655ac8ebb2c71534fcdca295da8_dsc_0944.jpg?auto=compress,format"></p>
<p><strong>Zalando’s “Diversity in Tech” Panel:</strong> An honest, genuine and thought-provoking conversation by our five panelists,
and some excellent questions by our audience. A big thanks to all of you who tweeted excerpts to us
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>. This panel will be the first of many events at Zalando; stay tuned for
details.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3f5d70b637f30be61ec10ebcea93067d9bfb1dc8_imag0567.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0d8221fb4c2af6570e511b6f7b3cb97677194520_img_0163.jpg?auto=compress,format"></p>
<p><strong>The Afterparty:</strong> <a href="https://twitter.com/jamesonwhiskey">Jameson Irish Whiskey</a> sponsored an amazing night for us. Let's
do it again soon! During the party clean-up, we discovered that our Storm Trooper and Darth Vader figurines had
disappeared. Whoever you are: when we see you, we'll be ready with our Nerf guns.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/724d4568767df97b4dfe80ef480207cd5858da97__mg_61478_1080px.jpg?auto=compress,format"></p>
<p>The people who really stole the show? Our tech recruiters, who talked to tons of people about our <a href="https://tech.zalando.com/jobs/">open
opportunities</a> in Germany, Helsinki and Dublin. You can meet them at any of our <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/">Zalando
meetups</a>.</p>
<p>If you didn't meet us and still want to connect, do it in person by attending one of our upcoming tech meetups at our
Zalando Tech HQ at Mollstr. 1. You can join our Meetup group <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/">here</a>.
This week's agenda:</p>
<ul>
<li>Tonight: AXChange - Axure Meetup. <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/events/223673260/">RSVP here</a></li>
<li>Thursday: Mobile Quality Crew - “The Challenges of Mobile Testing.” <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/events/223893672/">RSVP
here</a></li>
<li>29 July: Scala User Group Berlin-Brandenburg - “Composable Validations on Rich Variably-Nested Domain Classes.” <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/events/224044828/">RSVP
here</a></li>
</ul>The Perils of Modifying PostgreSQL System Catalogs2015-07-14T00:00:00+02:002015-07-14T00:00:00+02:00Oleksii Kliukintag:engineering.zalando.com,2015-07-14:/posts/2015/07/the-perils-of-modifying-postgresql-system-catalogs.html<p>You shouldn’t modify tables under the pg_catalog schema without first consulting the pgsql-hackers mailing list.</p><p>PostgreSQL is a very flexible database system. Its flexibility derives from its way of storing metadata. Unlike many
other databases, Postgres involves no “magic”: Every database object — table, type, function or cast — is described by a
row in a special table, called a system catalog. There are multiple system catalogs: pg_class stores information about
tables, pg_type describes types. And from pg_proc one can extract all properties of functions — including, for those
written in interpreted languages, the source code. One can also obtain exhaustive information about the enum types:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># SELECT * FROM pg_enum WHERE enumtypid = 'city'::regtype;</span>
<span class="w"> </span><span class="n">enumtypid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumsortorder</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumlabel</span>
<span class="o">-----------+---------------+-----------</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Berlin</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dortmund</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dublin</span>
<span class="p">(</span><span class="mi">3</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span>
</code></pre></div>
<p>It might be tempting to modify the database metadata by using DML commands to change the values in the catalogs. But
this is dangerous, and likely to cause data loss. Let’s explore some commands that you should not try to run in
production.</p>
<p>Let’s try to add a new label directly. Before PostgreSQL introduced ALTER TYPE ADD VALUE syntax for enums in version
9.1, this sort of statement was very popular. It is still used from time to time, like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># INSERT INTO pg_enum VALUES('city'::regtype, 4, 'Brieselang');</span>
<span class="n">INSERT</span><span class="w"> </span><span class="mi">16487</span><span class="w"> </span><span class="mi">1</span>
</code></pre></div>
<p>And immediately use it in a table:</p>
<div class="highlight"><pre><span></span><code>okliukin=# CREATE TABLE meeting(m_id serial, m_city city, m_time timestamp);
CREATE TABLE
…
okliukin=# select * from meeting order by m_city;
m_id | m_city | m_time
------+------------+----------------------------
1 | Dublin | 2015-07-14 07:50:34.976061
2 | Brieselang | 2015-07-07 13:50:34.976061
(2 rows)
</code></pre></div>
<p>Why would Dublin be ahead of Brieselang, you may ask? The answer is in the ‘enumsortorder’ column, which defines how
the values of the given enum are sorted. By naively updating the catalog and ignoring the implementation details, we
made the database system deliver results that one would not normally expect.</p>
<p>Now, what if we try to modify the sort order?:</p>
<div class="highlight"><pre><span></span><code><span class="nx">okliukin</span><span class="p">=</span><span class="err">#</span><span class="w"> </span><span class="nx">UPDATE</span><span class="w"> </span><span class="nx">pg_enum</span><span class="w"> </span><span class="nx">SET</span><span class="w"> </span><span class="nx">enumsortorder</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nx">WHERE</span><span class="w"> </span><span class="nx">enumtypid</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">'</span><span class="nx">city</span><span class="err">'</span><span class="o">::</span><span class="nx">regtype</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="nx">enumlabel</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">'</span><span class="nx">Brieselang</span><span class="err">'</span><span class="w"> </span><span class="p">;</span>
<span class="nx">ERROR</span><span class="p">:</span><span class="w"> </span><span class="nx">duplicate</span><span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="nx">violates</span><span class="w"> </span><span class="nx">unique</span><span class="w"> </span><span class="kd">constraint</span><span class="w"> </span><span class="s">"pg_enum_typid_sortorder_index"</span>
<span class="nx">DETAIL</span><span class="p">:</span><span class="w"> </span><span class="nx">Key</span><span class="w"> </span><span class="p">(</span><span class="nx">enumtypid</span><span class="p">,</span><span class="w"> </span><span class="nx">enumsortorder</span><span class="p">)=(</span><span class="mi">16481</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="nx">already</span><span class="w"> </span><span class="nx">exists</span><span class="p">.</span>
</code></pre></div>
<p>Alas, sort order positions are unique for any given type (*). We may discover that there is a proper command to add a
new type value:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># ALTER TYPE city ADD VALUE 'Brieselang' AFTER 'Berlin';</span>
<span class="n">ERROR</span><span class="p">:</span><span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="s2">"Brieselang"</span><span class="w"> </span><span class="n">already</span><span class="w"> </span><span class="n">exists</span>
</code></pre></div>
<p>As one can see from the output of the command above, enum labels are unique as well.</p>
<p>A brave DBA might decide to remove the old value altogether and add a new one in the correct place. It would be nice to
have an ALTER TYPE DELETE VALUE command. Unfortunately, enums still aren’t first-class citizens in the PostgreSQL world,
and there is no way to remove or rearrange values. Since there is no matching SQL command, a DBA might decide to modify
system catalogs directly:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># DELETE FROM pg_enum WHERE enumtypid = 'city'::regtype AND enumlabel = 'Brieselang';</span>
<span class="n">DELETE</span><span class="w"> </span><span class="mi">1</span>
</code></pre></div>
<p>Adding the new value works:</p>
<div class="highlight"><pre><span></span><code>okliukin=# ALTER TYPE city ADD VALUE 'Brieselang' AFTER 'Berlin';
ALTER TYPE
</code></pre></div>
<p>And we can visualise the new content of pg_enum:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># SELECT * FROM pg_enum WHERE enumtypid = 'city'::regtype;</span>
<span class="w"> </span><span class="n">enumtypid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumsortorder</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumlabel</span>
<span class="o">-----------+---------------+-----------</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Berlin</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dortmund</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dublin</span>
<span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1.5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Brieselang</span>
<span class="p">(</span><span class="mi">4</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span>
</code></pre></div>
<p>Another interesting implementation detail: the sort order column has a floating-point type that allows you to add new
values in the middle of the enum without changing any existing ones.</p>
<p>But now, when we select data from our table, we get this:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># SELECT * FROM meeting;</span>
<span class="n">ERROR</span><span class="p">:</span><span class="w"> </span><span class="n">invalid</span><span class="w"> </span><span class="n">internal</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="k">enum</span><span class="p">:</span><span class="w"> </span><span class="mi">16487</span>
</code></pre></div>
<p>What happened, and where does the value 16487 come from?</p>
<p>For many system catalogs, PostgreSQL actually stores a hidden column called OID. This integer column contains a
cluster-wide unique number in direct correspondence to the row it represents. One can get this value by including OID
directly in the SELECT statement. Such columns are not covered by ‘SELECT *’:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># select oid, * from pg_enum ;</span>
<span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumtypid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumsortorder</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">enumlabel</span>
<span class="o">-------+-----------+---------------+------------</span>
<span class="w"> </span><span class="mi">16482</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Berlin</span>
<span class="w"> </span><span class="mi">16484</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dortmund</span>
<span class="w"> </span><span class="mi">16486</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dublin</span>
<span class="w"> </span><span class="mi">16495</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16481</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1.5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Brieselang</span>
<span class="p">(</span><span class="mi">4</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span>
</code></pre></div>
<p>There is no row with OID 16487; it corresponds to the label ‘Brieselang’ that we inserted manually and then
deleted. The ‘meeting’ table does not store the enum values directly; instead, it contains references to the OIDs of the
enum labels in the pg_enum catalog. We can change the OID for the Brieselang label to match the one stored in the
‘meeting’ table:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># UPDATE pg_enum SET oid = 16487 WHERE enumtypid = 'city'::regtype AND enumlabel = 'Brieselang';</span>
<span class="n">ERROR</span><span class="p">:</span><span class="w"> </span><span class="n">cannot</span><span class="w"> </span><span class="n">assign</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">system</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="s2">"oid"</span>
<span class="n">LINE</span><span class="w"> </span><span class="mi">1</span><span class="p">:</span><span class="w"> </span><span class="n">UPDATE</span><span class="w"> </span><span class="n">pg_enum</span><span class="w"> </span><span class="n">SET</span><span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">16487</span><span class="w"> </span><span class="n">WHERE</span><span class="w"> </span><span class="n">enumtypid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'city'</span><span class="p">::</span><span class="n">reg</span><span class="o">...</span>
</code></pre></div>
<p>Users cannot assign values to OID columns, so we are stuck. You can rescue some data by omitting the enum
values that correspond to the ‘deleted’ labels, e.g.:</p>
<div class="highlight"><pre><span></span><code><span class="n">okliukin</span><span class="o">=</span><span class="c1"># SELECT * FROM meeting WHERE m_city IN (SELECT enumlabel::city FROM pg_enum WHERE enumtypid = 'city'::regtype AND oid != 16487);</span>
<span class="w"> </span><span class="n">m_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">m_city</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">m_time</span>
<span class="o">------+------------+----------------------------</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dublin</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">14</span><span class="w"> </span><span class="mi">08</span><span class="p">:</span><span class="mi">14</span><span class="p">:</span><span class="mf">07.288819</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="n">row</span><span class="p">)</span>
</code></pre></div>
<p>In general, however, there is no obvious way to actually retrieve the data in the m_city column for all table rows
without either hacking PostgreSQL source code or changing the binary data in the physical files representing the table
content. It might be an easy recovery task in our example, where we lost only one label. But what if your enum has tens
or hundreds of labels, each representing a state of an order state machine, and you delete and then reinsert half of them —
only to change their order? (**) This is why one should never modify PostgreSQL system catalogs directly.</p>
<p>(*) As a side note, even if the UPDATE succeeded, the new sort order would not take effect in the same session due to
caching effects. The ALTER TYPE ADD VALUE command, on the other hand, handles this properly. In addition, the PostgreSQL
documentation states that the new OID should be an odd number; otherwise, the ‘enumsortorder’ column is ignored. The
correct solution to the problem of changing the sort order of enum values is to first create a new enum type with the
same labels in a different order, then convert the table columns to the new type USING ::text::new_enum, and finally
rename the new enum type to the old name, dropping (or renaming) the old type beforehand.</p>
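<p>The reordering recipe above can be sketched roughly as follows, reusing the ‘city’ type and ‘meeting’ table from our examples. Treat this as an illustrative sketch rather than a production script: the type conversion rewrites the column and takes an exclusive lock on the table, and the names and label order must be adapted to your own schema:</p>
<div class="highlight"><pre><span></span><code>-- Create a replacement type with the labels in the desired order.
CREATE TYPE city_new AS ENUM ('Berlin', 'Brieselang', 'Dortmund', 'Dublin');

-- Convert the column, going through text so labels are matched by name.
ALTER TABLE meeting
    ALTER COLUMN m_city TYPE city_new
    USING m_city::text::city_new;

-- Retire the old type and take over its name.
DROP TYPE city;
ALTER TYPE city_new RENAME TO city;
</code></pre></div>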
<p>(**) You can change the recorded data type of the enum column to oid in the pg_attribute system catalog, update the OID
values of the enum labels that cause the error to the values stored in the table, and then change the pg_attribute entry back, but see the
title of this blog post.</p>Zalando Does Tech Open Air2015-07-09T00:00:00+02:002015-07-09T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-07-09:/posts/2015/07/zalando-does-tech-open-air.html<p>Superpower Revelations at Tech Open Air.</p><p>"Technology is a transformative power that disrupts entire industries and touches every angle of life." -
<a href="http://toa.berlin/">toa.berlin</a></p>
<p>Zalando is participating in this year's <a href="http://toa.berlin/">Tech Open Air</a> (TOA) in a big way, with a massive
"super-booth" at the festival's two-day <a href="http://toa.berlin/unconference/">Unconference</a> and multiple satellite events.
We're also throwing a massive afterparty on July 16th (RSVP
<a href="https://www.eventbrite.de/e/whats-your-superpower-the-zalando-toa-afterparty-tickets-17670088711">here</a> - password:
#ZuperPower.) Unconference attendees can stop by our booth and watch footage of our yearly Hack Week event, learn about
our <a href="https://github.com/zalando/">open source</a> projects, ask about our upcoming <a href="http://www.retaildetail.eu/en/eu-m-tail/item/17629-zalando-opens-fashion-gallery-and-innovation-lab-in-berlin">Innovation
Lab</a>, and
network with Zalando's superhero-engineers. Talk to the men and women who have helped transform Zalando from a
two-person online retail shop to a publicly traded fashion platform doing business across Europe with nearly 8,000
employees! (Speaking of employees, maybe you'd like to become one? Our recruiters will be taking resumes and conducting
mini-interviews from our booth; visit <a href="https://tech.zalando.com/jobs/">our jobs page</a> to read about our openings.)</p>
<p>Launched in 2012, TOA is a three-day, Berlin-based art, music and technology festival that has become massively
popular worldwide — think SXSW, but with a stronger focus on tech. Of all the great events that take place in Berlin
each year, TOA is perhaps the biggest and buzziest, garnering attention from WIRED, The Next Web, <em>Rolling Stone</em> and
TechCrunch.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/5761c621e6f9169d58f1bc9db9348ca585a3d515_10580136_635373156559366_6483002187383474509_n.jpg?auto=compress,format"></p>
<p>This year's TOA includes a two-day <a href="http://toa.berlin/unconference/">Unconference</a> and a day of satellite events hosted
by Berlin tech companies; see Zalando's events <a href="http://www.meetup.com/Zalando-Tech-Events-Berlin/?scroll=true">here</a>.
This is the perfect format: With so much to see, attendees won’t miss out on any of the installations, talks, demos and
workshops on offer. Even if you’re stuck “at work” during the event, you can take advantage of TOA's many chill-out
areas and pods. And with its super-relaxed, inclusive and unpretentious vibe, TOA offers something for everyone;
attendees include engineers and founders as well as artists, designers, students and musicians.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3457ab22cd15a3461801870c996086883d8693fb_10526129_635371219892893_8551693812153177222_n.jpg?auto=compress,format"></p>
<p>On <a href="https://twitter.com/ZalandoTech">Twitter</a>, we'll use the hashtag <strong>#ZuperPower</strong> to highlight our activities,
photos and announcements. We invite you to use it, too!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/399df7c84fc76f6bf7374afae8e34687bcbe6e63_10525887_635364313226917_6066847729807085656_n.jpg?auto=compress,format"></p>
<p>And don’t miss the fireside chat with Zalando Co-founder Robert Gentz, which takes place on July 16 at 2:00 inside the
Alte Teppiche Fabrik (below), first floor.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/eb1511f3bfab1a09cb63d3fe0762230044579748_10525779_635354469894568_1129557750886786032_n.jpg?auto=compress,format"></p>
<p>The full rundown of Zalando Tech at TOA:</p>
<p><strong>Wednesday, July 15</strong></p>
<ul>
<li>9:00 - end: Super-booth at the Unconference. All day, both days!</li>
</ul>
<p><strong>Thursday, July 16</strong></p>
<ul>
<li>9:00 - end: Super-booth at the Unconference. All day, both days!</li>
<li>2:00: Fireside chat with Robert Gentz on our tech culture: From Retail to Tech</li>
<li>21:00: What’s your Superpower? The Zalando + TOA Afterparty (RSVP
<a href="https://www.eventbrite.de/e/whats-your-superpower-the-zalando-toa-afterparty-tickets-17670088711">here</a> - password:
#ZuperPower)</li>
</ul>
<p><strong>Friday, July 17 - Satellite Events</strong></p>
<p><strong>Zalando Tech Skylounge, Mollstr. 1</strong>
12:00: Diversity in Tech Panel & Lunch (Skylounge at Zalando Tech HQ, Mollstr. 1 in Berlin)
<a href="http://www.eventbrite.com/e/diversity-in-tech-panel-lunch-tickets-17658990516?aff=es2">details + RSVP here</a>
14:00: Zalando Tech Superheroes Showcase (also Skylounge at Zalando Tech HQ, Mollstr. 1 in Berlin)
<a href="http://www.eventbrite.com/e/zalando-tech-superheroes-showcase-tickets-17622022945?aff=es2">details + RSVP here</a></p>
<p>If you haven’t been to TOA, you need to know that it’s incredible and absolutely worth attending. If you can’t make the
Unconference you can still catch us at our three satellite events above — no TOA ticket required for those. Hope to see
you!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/38e060f7f0dc109ae40c88ef76530fe16d1f8739_10537333_635363696560312_6253118854077273660_n.jpg?auto=compress,format"></p>
<p><em>All photos courtesy of Tech Open Air.</em></p>AWS Summit 2015: Zalando Keynote2015-07-07T00:00:00+02:002015-07-07T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-07-07:/posts/2015/07/radical-agility-on-aws-video.html<p>Zalando's VP Engineering tells the AWS Summit crowd how we're using AWS.</p><p>Last week Zalando VP Engineering Eric Bowman was one of the featured speakers at the <a href="https://aws.amazon.com/summits/berlin/">AWS
Summit</a> in Berlin: a full day of talks, hands-on labs, orange swag (we felt
right at home!) and networking opportunities hosted by <a href="https://aws.amazon.com/summits/berlin/">AWS</a>. Eric was part of
the keynote by Amazon CTO Dr. Werner Vogels, who invited industry leaders to describe how cloud computing has
transformed business. Eric took the opportunity to tell the (huge!) audience of technologists about Zalando's adoption
of AWS and our open source
<a href="https://tech.zalando.com/blog/radical-agility-with-autonomous-teams-and-microservices-in-the-cloud/">STUPS</a>, which
provides a convenient and audit-compliant Platform-as-a-Service (PaaS) for multiple autonomous teams on top of AWS. Some
choice quotes from Eric:</p>
<ul>
<li>"The best people want to work in an autonomous way."</li>
<li>We really give our teams a lot of space and time to learn not only how to use AWS, but also to learn how to build
complex distributed applications that have massive throughput, high performance, high availability."</li>
<li>"We think of teams as a unit of trust ... we've tried to create the context where people can make mistakes and it's
not fatal."</li>
</ul>
<p>And here are Eric's slides:</p>
<p><strong><a href="https://www.slideshare.net/kikibobo/aws-summit-berlin-2015-zalando-keynote-50009384" title="AWS Summit Berlin 2015 Zalando Keynote">AWS Summit Berlin 2015 Zalando
Keynote</a></strong>
from <strong><a href="http://www.slideshare.net/kikibobo">Eric Bowman</a></strong></p>Zalando goes to ReactEurope Paris2015-07-07T00:00:00+02:002015-07-07T00:00:00+02:00Andrey Kuzmintag:engineering.zalando.com,2015-07-07:/posts/2015/07/zalando-goes-to-reacteurope-paris.html<p>3 Zalandos withstand 35° with no AC for the love of React</p><p>Created and open-sourced by Facebook, React has attracted a productive community of engineers who tend to rethink the
old ways of building front-ends and iterate on newer concepts. At Zalando, we’ve used React to create our <a href="https://tech.zalando.com/blog/shop-the-look/">Shop the
Look</a> feature and some internal tools, and are considering whether to
adopt it more broadly across our organization. To help ourselves reach a more informed decision, a few of us attended
ReactEurope: a two-day (and Zalando-sponsored) conference held in Paris last week.</p>
<p><img alt="An intense 1:1 with our Head of Engineering Brand Solutions, Rodrigue Schaeffer.
" src="https://images.prismic.io/zalando-jobsite/efc449519bfb586fd4b5bf92b998ca1f5976bdd5_img_20150702_164242-1.jpg?auto=compress,format"></p>
<p>With an agenda heavy on presentations by Facebook's React core team members, ReactEurope gave us a nice overview of the
ideas that have evolved in the React community. Some of these topics were quite controversial: For example, <a href="https://twitter.com/chantastic">Michael
Chan</a>, Web Developer at Ministry Centered Technologies, presented on using Inline Styles
in React components instead of CSS. Others were hilarious: e.g. Productive Mobile’s <a href="https://github.com/azproduction">Mikhail
Davydov’s</a> how-to on building Text UI in React, in which he basically recreated
<a href="https://www.midnight-commander.org/">Midnight Commander</a> in the browser and proxied it into the terminal.</p>
<p>Many of the presenters used the new term “Developer Experience” (DX), which applies user experience ideas to software
engineering. <a href="https://twitter.com/dan_abramov?lang=en">Dan Abramov</a>, a front-end engineer at Stampsy, brought React hot
module reloading to the next level with his iteration over the Flux pattern. Called Redux, Abramov’s project applies a
functional approach to stores from the Flux pattern and turns them into reducers. The live demo was really awesome, so I
advise you to check out <a href="https://www.youtube.com/watch?v=xsSnOQynTHs">the video</a>. And speaking of demos, the one by
React core member <a href="https://twitter.com/_chenglou">Cheng Lou</a> was also impressive: He proposed a way of creating seamless
animations in React that basically deprecates ReactCSSTransitionGroup.</p>
<p><img alt="Dan Abramov takes the stage
" src="https://images.prismic.io/zalando-jobsite/11dc43f68df596e1815920dfcf60f42fc69cd8ad_img_6765.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/3559e36b32c7c3c0f7be59f7e17a2d474e9e3c76_img_6860.jpg?auto=compress,format"></p>
<p>Over several talks, the React core team debuted a reference implementation of
<a href="https://github.com/graphql/graphql-js">GraphQL</a>—a language for describing data requirements in a declarative way—and
showed how a GraphQL server can be built on top of existing REST APIs. They also demonstrated Relay (soon to be open
sourced), which brings the ideas behind GraphQL into React components—allowing them to specify what data is needed from
the server. Can’t wait to try them out!</p>
<p>I also enjoyed <a href="https://twitter.com/sebmarkbage">Sebastian Markbåge’s</a> presentation on the DOM and its flaws. Like
Markbåge, I also really hope that browsers will one day provide us a more low-level way of interacting with the
platform. On the other hand, I’m not worried anymore about the future of JavaScript, because we have Babel — a project
by <a href="https://twitter.com/sebmck">Sebastian McKenzie</a> that transpiles future JavaScript into the current version.</p>
<p>ReactEurope was a great opportunity to talk to fellow engineers and the speakers. Choppy WiFi couldn’t withstand the
onslaught of the crowd, and this encouraged people to spend more time talking with each other. I, for one, had a chance
to talk with Hootsuite engineer <a href="https://twitter.com/skidding">Ovidiu Cherecheș</a>, who created
<a href="https://github.com/skidding/cosmos">Cosmos</a> — a tool to develop encapsulated React components, and which can be used
together with image regression tests. My team and I are currently experimenting with Cosmos, so getting tips from Ovidiu
was pretty handy.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/47f1be8d1900664b71e942a57ccba4d9db1ae64a_img_20150702_182315.jpg?auto=compress,format"></p>
<p>A side note: Testing as a topic wasn’t really covered at ReactEurope. (Maybe in 2016?) But this didn’t detract from what
was a great conference. Looking forward to the next one!</p>
<p>You can follow Andrey at <a href="https://twitter.com/unsoundscapes">@unsoundscapes</a> and
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a> for more tech news and events.</p>Watch: "From Java to Scala in Less Than Three Months"2015-07-03T00:00:00+02:002015-07-03T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-07-03:/posts/2015/07/watch-from-java-to-scala-in-less-than-three-months.html<p>Where can you find video from Zalando's Scala Days 2015 presentation? We know the answer.</p><p>Scala has acquired a reputation for being difficult to learn. But as Zalando Head of Logistics Engineering Daniel Nowak
and Brand Solutions Delivery Lead Alexander Kops told the audience at <a href="http://event.scaladays.org/scaladays-amsterdam-2015">Scala
Days</a> last month, a bit of communication goes a long way in
onboarding teams to deliver working Scala code quickly and relatively painlessly. <a href="https://www.parleys.com/tutorial/from-java-scala-less-than-three-months">Go here to watch their
talk</a> — and pick up some tips on how to get
your own engineering team up to Scala speed.</p>Auto-Scaling Your API: Tips from Zalando (Slides)2015-06-30T00:00:00+02:002015-06-30T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-06-30:/posts/2015/06/auto-scaling-your-api-tips-from-zalando-slides.html<p>Learn more about Zalando's "API First" approach — and find out who Jimmy is.</p><p>Over the weekend a delegation of Zalandos were in Barcelona, Spain to enjoy the first-ever
<a href="http://www.jbcnconf.com/">JBCNConf</a>: the Barcelona Java Users Group's conference for Java and JVM
enthusiasts. Two of our crew — Delivery Lead Sean Floyd and Software Engineer Luis Mineiro — went the extra step and
gave a presentation on Zalando's "API First" approach. As Luis and Sean explained in their talk, Zalando is in the
middle of a paradigm shift: an evolution from online fashion shop to fashion platform. As part of this evolution, our
engineering team has gone “API First”: publishing APIs that other companies can use to take advantage of our massive
amounts of data and build their own applications.</p>
<p>Zalando’s <a href="https://api.zalando.com">primary public API</a> is implemented with Java using Spring and RestEasy, and offers
programmers access to the web shop. It also allows for basic operations such as searching for articles, categories,
filters and brands.</p>
<p>Making our APIs public has presented some very exciting technical challenges: specifically, how do we auto-scale our API
as our audience grows? Sean and Luis gave answers to these and other questions by sharing insights into the
architecture-related choices we have made as part of our API First shift. Check out their slides here:</p>
<p><strong><a href="https://www.slideshare.net/ZalandoTech/jbcnconf" title="Auto-scaling your API: Insights and Tips from the Zalando Team">Auto-scaling your API: Insights and Tips from the Zalando
Team</a></strong>
from <strong><a href="http://www.slideshare.net/ZalandoTech">Zalando Tech</a></strong></p>RSVP for Recommenders.ie’s July Meetup at Zalando-Dublin2015-06-23T00:00:00+02:002015-06-23T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-06-23:/posts/2015/06/rsvp-for-recommenders.ies-july-meetup-at-zalando-dublin.html<p>Zalando's Dublin office hosts our second meetup.</p><p>When it comes to hosting tech meetups, Zalando’s brand-new Fashion Insights Centre in Dublin is officially in full
swing. Earlier this month we hosted the <a href="https://tech.zalando.com/blog/zalando-dublin-hosts-cassandra-in-focus-meetup/">Dublin Cassandra
Users</a>, our very first tech meetup in our
spacious Silicon Docks digs—and on July 1 we’re opening our doors to
<a href="http://www.meetup.com/recommenders-ie">recommenders.ie</a>, a new group for recommender systems enthusiasts. The night
will include two talks:</p>
<ul>
<li>Gracenote Research Engineer Cameron Summer will explain how to enhance entertainment data systems using algorithmic
analysis of big data generated from Internet radio</li>
<li>Benjamin Heitmann, a post-doctoral researcher at the Insight Centre for Data Analytics at NUI Galway, will discuss
how to balance privacy concerns and personalization</li>
</ul>
<p>In addition to great talks, we’ll have pizza and beer. Join us by <a href="http://www.meetup.com/recommenders-ie/events/223253683/">RSVPing
here</a>—but hurry, as seats are going fast.</p>
<p><strong>Zalando is hiring data scientists and software engineers in Dublin.</strong> <strong>Have a look at our current openings.</strong></p>What We Learned While Making Zalando's Apple Watch App2015-06-23T00:00:00+02:002015-06-23T00:00:00+02:00Kateryna Gridinatag:engineering.zalando.com,2015-06-23:/posts/2015/06/what-we-learned-while-making-zalandos-apple-watch-app.html<p>Zalando Mobile's tips and shortcuts to help you develop with Apple's new WatchKit framework.</p><p>In May the Zalando mobile team updated our iOS app to include several extra features, including navigation and a new
product detail page. We also included Apple Watch support for the very first time, to make the Zalando app more
user-friendly and innovative. Because it’s so new, <a href="https://developer.apple.com/watchkit/">the WatchKit framework</a> is
likely to present some challenges even for highly experienced iOS developers. With this in mind, I'd like to share some
insights regarding our team’s experience of working with it to help you avoid some pitfalls and save time.</p>
<p>Initially our idea was to use the WatchKit extension to implement a custom push notification. To do this, we needed to
create an Apple Watch app with the Notification scene. Our key learnings:</p>
<p><strong>It's not possible to send a push notification from the real environment to the Apple Watch simulator.</strong>
The only way to send push notifications on the simulator is to use PushNotificationPayload.apns inside a WatchKit
extension. This file contains the example JSON payload that is sent to the Apple Watch.</p>
<p><strong>A "category" field in PushNotificationPayload.apns should have the same name as the “notification category” name in a
Storyboard.</strong>
You can create a few notification categories with different interfaces in the same project. According to the category
name in the payload, the application itself decides which interface to present.</p>
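<p>As an illustration, a PushNotificationPayload.apns along these lines drives the simulator; the category name and alert text below are hypothetical, and the "category" value must match the notification category name defined in the Storyboard:</p>

```json
{
    "aps": {
        "alert": { "body": "Your order has shipped!" },
        "category": "orderUpdate"
    },
    "WatchKit Simulator Actions": [
        { "title": "View Order", "identifier": "viewOrderAction" }
    ]
}
```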
<p><strong>To test a push notification on the simulator, configure your targets.</strong>
To do this, our team consulted the “<a href="https://developer.apple.com/library/ios/documentation/General/Conceptual/WatchKitProgrammingGuide/ConfiguringYourXcodeProject.html">The Build, Run, and Debug
Process</a>”
section of the iOS Developers Library.</p>
<p><strong>There is no guarantee that your dynamic notification will arrive.</strong>
We added two types of notifications in the Apple Watch App: dynamic and static. During testing, however, only static
notifications arrived on a real device. We investigated this issue and learned that dynamic notifications arrive only in
special cases, such as when the Apple Watch is being worn and its battery level is above 50%. Yet even if these
conditions are met, you can’t guarantee that your dynamic notification will arrive; the system decides, and there's no
way to influence it.</p>
<p>It should be possible for developers to test dynamic notifications. Hopefully Apple will correct this soon.</p>
<p><strong>You can’t create a WatchKit Extension separate from an Apple Watch App.</strong>
You cannot customize the push-notification appearance without also creating a dedicated Apple Watch app, even though you
might not need it. The only way to support custom notifications is to have both an app and extension.</p>
<p><strong>You can’t run the iPhone app from Apple Watch.</strong>
It initially seemed possible on the simulator, but when we tested our results on an actual Apple Watch we learned that
we couldn’t launch a parent app from an Apple Watch app. Epic fail.</p>
<p><strong>Unlike the iPhone, the Apple Watch does not support animation.</strong>
By adding a set of images in Images.xcassets, we implemented a custom loading indicator. Unfortunately, this is
currently the only way to display animations on the watch. Neither Core Graphics nor Core Animation are available with
the watch.</p>
<p><strong>Do not use the main thread in handleWatchKitExtensionRequest:reply.</strong>
All interactions between the iPhone and Apple Watch are performed in the background. Calling openParentApplication:reply
inside the Apple Watch app does not open the parent app, as the team initially assumed from the name; the method works
only in background threads. I advise you to ensure that all the methods you invoke in the
handleWatchKitExtensionRequest:reply: inside an AppDelegate do not work on the main thread. Otherwise, some problems may
occur on a locked iPhone or when the main app is in the background.</p>
<p><strong>Choose the right setup of provisioning profiles to avoid difficulties during app uploading.</strong>
Our team’s work really started to get interesting when we began configuring our provisioning profiles. The biggest
challenge was to set up our configs and make them work. Before making an app package, make sure that:</p>
<ul>
<li>bundleId of the WatchKit Extension is in the format bundleId(of main app).watchkitextension</li>
<li>bundleId of the Apple Watch app is in the format bundleId(of main app).watchkitapp</li>
<li>WKAppBundleIdentifier (in the .plist of the WatchKit Extension) is in the format bundleId(of main app).watchkitapp</li>
<li>WKCompanionAppBundleIdentifier (in .plist of the Apple Watch app) is the same as bundleId of the main app</li>
</ul>
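<p>With a hypothetical main-app bundleId of com.example.shop, the identifiers from the checklist above line up like this:</p>

```text
com.example.shop                    # main app (also the WKCompanionAppBundleIdentifier value)
com.example.shop.watchkitextension  # WatchKit Extension bundleId
com.example.shop.watchkitapp        # Apple Watch app bundleId (also the WKAppBundleIdentifier value)
```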
<p>We created separate provisioning profiles for the WatchKit Extension and Apple Watch app and configured them in Code
Signing with the same Team name. A few odd bugs remained even after we followed these steps, but after some
manipulations we were able to generate the application package.</p>
<p>In the end, we created a fully-functioning Apple Watch app. A simple list of current sales from the Zalando shop would
have been underwhelming, so we added a detailed view, custom loading indicator and both hands-free and "force touch"
event handling. The end result:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/8296e1bb7a650155d026567bbc023160905e93d1_apple-watch.jpg?auto=compress,format"></p>
<p>While developing the Apple Watch app, our team faced many difficulties due to the lack of documented examples to follow
and learn from and lack of real devices for testing. Many behaviors that worked in a certain way on the simulator worked
completely differently on the real device. But in the end, it was definitely a great learning experience!</p>Zalando Goes to GOTO Amsterdam 20152015-06-17T00:00:00+02:002015-06-17T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-06-17:/posts/2015/06/zalando-goes-to-goto-amsterdam-2015.html<p>Zalando is a proud silver sponsor and presenter at GOTO Amsterdam 2015!</p><p>For the second time in two weeks, a posse of Zalando technologists are donning our "We Dress Code" hoodies, packing up
our stickers and swag, and traveling to Amsterdam to attend a first-class tech conference. (Last week's event: <a href="http://scaladays.org/">Scala
Days</a>.) This week's major event is the localized installment of GOTO, the international software
development conference series hosted and organized by <a href="http://trifork.com/">Trifork</a>. If you happen to be attending the
conference, be sure to stop by our booth to meet some of our engineers and team. And definitely don't miss Zalando Cloud
Engineer/"<a href="http://stups.io/">STUPS</a> Hacker" Henning Jacobs' presentation, "<a href="http://gotocon.com/amsterdam-2015/presentation/A%20Cloud%20Infrastructure%20for%20Scaling%20Innovation%20Across%20Autonomous%20Teams">A Cloud Infrastructure for Scaling
Innovation Across Autonomous
Teams</a>"
(Solutions Track, Friday at 3:50 PM).</p>Zalando Hosts PostgreSQL Meetup Group Berlin #22015-06-15T00:00:00+02:002015-06-15T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-06-15:/posts/2015/06/zalando-hosts-postgresql-meetup-group-berlin-2.html<p>Two great PostgreSQL speakers present at Zalando on June 18!</p><p>For the second installment of the <a href="http://www.meetup.com/PostgreSQL-Meetup-Berlin/events/223181762/">PostgreSQL Meetup
Group</a>, we have a special surprise: Devrim Gündüz, a
major contributor to PostgreSQL and a well-known community member, is in town from Istanbul, Turkey and will give us
some insights into backup and recovery solutions for Postgres. We’ll also have long-standing Postgres community member
Susanne Schmidt present on pgTAP, a unit-testing framework for your Postgres database. She'll give an overview of
pgTAP's usage, its capabilities and other positive aspects.</p>
<p><a href="http://www.meetup.com/PostgreSQL-Meetup-Berlin/events/223181762/">RSVP at the event page</a> to claim your spot for the
meetup, which takes place on June 18 at Zalando's technology office in Mitte.</p>Zalando-Dublin Presents "Cassandra in Focus"2015-06-11T00:00:00+02:002015-06-11T00:00:00+02:00Lauri Appletag:engineering.zalando.com,2015-06-11:/posts/2015/06/zalando-dublin-hosts-cassandra-in-focus-meetup.html<p>Our first meetup in Dublin focuses on Cassandra, one of our favorite big data technologies.</p><p>Next week Zalando Senior DBA Jan Mußler will travel north to our new Dublin office to present a talk on how we use
Cassandra in our <a href="https://tech.zalando.com/blog/monitoring-the-zalando-platform/">monitoring</a> and analytics work--and we
couldn't be more excited, because this will be Z-Dublin's first-ever tech meetup. Jan's talk is part of <a href="http://www.meetup.com/Dublin-Cassandra-Users/events/222659125/">the Dublin
Cassandra Users meetup</a>, which will also include a
presentation by DataStax evangelist Christopher Batey. Regular readers of our tech blog know that we've recently <a href="https://tech.zalando.com/blog/camunda-meets-cassandra-at-zalando/">built
a prototype using Cassandra</a> and the Camunda BPM
engine, so we're looking forward to chatting and trading insights with the 60 "Cassandra Rebels" who have signed up to
attend.</p>
<p>The meetup takes place on June 17 and is full, but add your name to the waitlist and maybe you'll get a spot. Stranger
things have happened ...</p>Speeding up Xcode Builds2015-06-05T00:00:00+02:002015-06-05T00:00:00+02:00Dmitry Bespalovtag:engineering.zalando.com,2015-06-05:/posts/2015/06/speeding-up-xcode-builds.html<p>How Zalando Mobile achieved 80% faster Xcode builds and 30% faster compilation speeds.</p><p>A few months ago <a href="https://twitter.com/JanGorman">Jan Gorman</a>, delivery lead for Zalando’s mobile engineering team,
published <a href="http://tech.zalando.com/blog/mobile-engineering-at-zalando/">a great overview of our team’s development
processes</a>. I’d like to drill down a bit deeper on one
topic of great importance to us: development speed.</p>
<p>My interest in this subject was recently piqued after I read <a href="https://labs.spotify.com/2013/11/04/shaving-off-time-from-the-ios-edit-build-test-cycle/">Spotify’s blog
post</a> about how their mobile
team sped up their Xcode build time in the Edit-Build cycle. I started to wonder if Zalando might achieve similar
results. So I experimented a bit, changing our Xcode build settings and playing around with different Swift coding
styles. The result: 80% faster build times, which in turn saved time for every Zalando iOS dev.</p>
<h3>How I Did It</h3>
<p>Thanks to some insight from
<a href="https://stackoverflow.com/questions/1027923/how-to-enable-build-timing-in-xcode/2801156#2801156">StackOverflow</a>, the
first thing I did was to enable the build-duration setting in Xcode. This produced new measurements that I could easily
compare with the old:</p>
<div class="highlight"><pre><span></span><code> $ defaults write com.apple.dt.Xcode ShowBuildOperationDuration YES
</code></pre></div>
<p>The Zalando mobile team uses Cocoapods, so we can add a post-installation phase to a Podfile to change the Debug
Information Format setting in all targets:</p>
<div class="highlight"><pre><span></span><code> post_install do |installer|
puts("Update debug pod settings to speed up build time")
Dir.glob(File.join("Pods", "**", "Pods*{debug,Private}.xcconfig")).each do |file|
File.open(file, 'a') { |f| f.puts "\nDEBUG_INFORMATION_FORMAT = dwarf" }
end
end
</code></pre></div>
<p>After making this change in a separate branch, I compared before and after build times for:</p>
<ul>
<li>changes in Swift source code (add a println() statement)</li>
<li>changes in Objective-C source code (add NSLog(@“”) statement)</li>
<li>builds after cleaning the workspace</li>
</ul>
<p>The results were surprising: Turning on the “dwarf” setting improved our Edit-Build time by 70-80%, and reduced our clean
build time by 15% (22 seconds). Not bad.</p>
<h3>A Second Experiment</h3>
<p>I wanted more speed. After several Edit-Build cycles, I noticed that compiling Swift files takes an eternity to
complete. As it turns out, it is slower to compile changes in Swift source code than to simply make changes in
Objective-C source code. More specifically, making changes to Swift files triggers recompilation of most of Obj-C files
because we are importing a Xcode-generated Swift header file in many places across the app. Looks like Objective-C gives
the compiler better hints than Swift does.</p>
<p>After identifying the files slowest to compile, and conducting some additional experiments, I found that using
extensions too generously -- for example, having a class extension for each protocol implementation -- increases build
time. So I converted most of the extensions in Swift files to simple class methods, which resulted in a four-second
median improvement (seven-second average) in compilation time. The change was simple -- I just merged all class
extensions into one class:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Before</span>
<span class="n">class</span><span class="w"> </span><span class="n">SizeViewController</span><span class="p">:</span><span class="w"> </span><span class="n">UIViewController</span><span class="w"> </span><span class="p">{</span>
…
<span class="p">}</span>
<span class="n">extension</span><span class="w"> </span><span class="n">SizeViewController</span><span class="p">:</span><span class="w"> </span><span class="n">UITableViewDataSource</span><span class="w"> </span><span class="p">{</span>
…
<span class="p">}</span>
<span class="n">extension</span><span class="w"> </span><span class="n">SizeViewController</span><span class="p">:</span><span class="w"> </span><span class="n">UITableViewDelegate</span><span class="w"> </span><span class="p">{</span>
…
<span class="p">}</span>
<span class="n">extension</span><span class="w"> </span><span class="n">SizeViewController</span><span class="p">:</span><span class="w"> </span><span class="n">SizeCellDelegate</span><span class="w"> </span><span class="p">{</span>
…
<span class="p">}</span>
…
<span class="c1">// After</span>
<span class="n">class</span><span class="w"> </span><span class="n">SizeViewController</span><span class="p">:</span><span class="w"> </span><span class="n">UIViewController</span><span class="p">,</span>
<span class="w"> </span><span class="n">UITableViewDataSource</span><span class="p">,</span><span class="w"> </span><span class="n">UITableViewDelegate</span><span class="p">,</span>
<span class="w"> </span><span class="n">SizeCellDelegate</span><span class="w"> </span><span class="p">{</span>
…
<span class="p">}</span>
</code></pre></div>
<p>Of course, there are always trade-offs. Although you’ll get faster build time by merging all extensions into one class,
at the same time you sacrifice code readability and ease of maintenance. For the sake of experiment, I’ve decided to
continue with one big class.</p>
<p>I’ve written a small benchmark that compares the compilation time of a class with N methods against that of a
class with N single-method extensions. You can find the <a href="https://github.com/diamondsky/swift-extensions-performance">benchmark code on
GitHub</a>:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f8a86b0fdb0f5c7967fa90f7473d59874a87817c_bespalovxcodebuildschart.png?auto=compress,format"></p>
<p>The compilation time of extensions increases after 3,000 methods. Although the chart suggests that even 3,000 extensions
compile as quickly as a single class with 3,000 methods, it usually doesn’t work this way in a real project. This is
because of the method’s structure: for example, using instance variables will increase your compilation time. In the
end, I hope our compilation times will improve with newer versions of Xcode and Swift.</p>RSVP for New Relic’s Free Workshop with Zalando Mobile2015-06-04T00:00:00+02:002015-06-04T00:00:00+02:00Hayley Baldwintag:engineering.zalando.com,2015-06-04:/posts/2015/06/rsvp-for-new-relics-free-workshop-with-zalando-mobile.html<p>Sign up ASAP for New Relic's free workshop with Zalando's Jan Gorman in Berlin!</p><p>Zalando Mobile Delivery Lead <a href="https://twitter.com/JanGorman">Jan Gorman</a> is teaming up with <a href="https://newrelic.com/">New
Relic</a> to give a free workshop on how Zalando’s mobile team uses New Relic to improve our
customer experience. As Jan <a href="https://blog.newrelic.com/2015/04/27/ecommerce-zalando-berlin/">recently told New Relic’s Manesh
Tailor</a>, Zalando mobile has been using New Relic for
crash reporting and to monitor the performance of our third-party APIs.</p>
<p>In his workshop, Jan will discuss how to:</p>
<ul>
<li>use New Relic to achieve business results</li>
<li>provide a high-quality customer experience across software applications</li>
<li>gain insights and increase visibility into your digital channels</li>
</ul>
<p>Organized by New Relic, the workshop takes place on July 1 at the Amano Hotel in Berlin and is geared towards working
software engineers and managers. <a href="https://www.eventbrite.com/e/new-relic-workshop-deeper-insights-into-your-digital-channels-registration-17149284973">Register
here</a>.</p>
<p>Follow us at <a href="https://twitter.com/ZalandoTech">@ZalandoTech</a> for more event updates and announcements!</p>How to Fix What You Can't Kill: Undead PostgreSQL queries2015-04-20T00:00:00+02:002015-04-20T00:00:00+02:00Sandor Szücstag:engineering.zalando.com,2015-04-20:/posts/2015/04/how-to-fix-what-you-cant-kill-undead-postgresql-queries.html<p>The standard way to kill a TCP connection in PostgreSQL is to use pg_terminate_backend($PID). However, in some situations this function does not work. To help you avoid negative outcomes when closing such connections, here is a simple hack.</p><p>The standard way to kill a TCP connection in PostgreSQL is to use pg_terminate_backend($PID). However, in some
situations this function does not work. To help you avoid negative outcomes when closing such connections, here is a
simple hack.</p>
<h2>Undead queries</h2>
<p>The Zalando team relies on PostgreSQL for almost all backend applications and we manage more than a hundred database
clusters reliably storing terabytes of data.</p>
<p>Recently we noticed that a few of our queries were running for hours or even days without terminating. Because our team
sets most of our databases to terminate queries after 10 minutes (with statement_timeout set to '10m'), this outcome
was completely unexpected.</p>
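<p>For reference, such a limit can be set per database; the database name below is hypothetical:</p>

```sql
-- Cancel any statement that runs longer than 10 minutes.
ALTER DATABASE eventlog SET statement_timeout = '10m';
```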
<p>We started to investigate and discovered that:</p>
<ul>
<li>in most cases, these never-ending queries were returning a lot of data (sometimes even megabytes);</li>
<li>the recipients of the data no longer existed;</li>
<li>the query could not be killed with a select pg_terminate_backend($PID) call;</li>
<li>the process of that query was waiting for a send() syscall to finish;</li>
<li>the underlying TCP connection was in the TCP ESTABLISHED state, but the client was already gone, so no data was
being transmitted over it.</li>
</ul>
<h2>The problem</h2>
<p>Such an undead query introduced at least two major issues:</p>
<ul>
<li>it is impossible to shut down the cluster nicely: postgres will wait for query termination (or will send a software
termination signal (TERM) to all running queries and still wait until they terminate), so the only way to stop a
cluster with an undead query is the --immediate option, which effectively sends a non-ignorable KILL signal to all
the processes and crashes the server;</li>
<li>long-running transactions (and such an undead query is a transaction from the point of view of PostgreSQL) stop the
event horizon (the transaction ID of the oldest running query) from advancing, which in turn prevents (AUTO)VACUUM
from cleaning up any records that have been modified after the beginning of the oldest running query.</li>
</ul>
<h2>What is happening?</h2>
<p>It looks like the undead queries are the result of situations in which the send() system call waits for data to be
transferred over the TCP connection, but the recipient never receives it. There are several possibilities here:</p>
<ul>
<li>the client host died (for example, from a power failure) or there was a network issue, and the server side of the
TCP connection did not notice. In this case the TCP keepalive mechanism will kick in and eventually detect that the
connection is dead (see
<a href="https://web.archive.org/web/20171029021513/http:/tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html">http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html</a>);</li>
<li>the client application is hanging (or paused) and does not receive any data from the server. In this case keepalive
sees a live connection and the send() syscall will never end, even when a TERM signal is sent, because PostgreSQL
uses the SA_RESTART flag for signal processing and SO_SNDTIMEO is not used at all (see man 7 signal).</li>
</ul>
<h2>What to do?</h2>
<p>First of all, you should probably reduce the keepalive detection timeout to a more reasonable value (the default is 2 hours
+ 9 * 75 sec, or about 2 hours and 12 minutes). You can do that by changing the default system settings or by tuning
the postgres configuration parameters (see
<a href="https://web.archive.org/web/20171029021513/http:/www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE">http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE</a>)</p>
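<p>Where the default figure comes from: the detection window is the idle time before the first probe plus the number of probes times the probe interval. A quick sketch, using the Linux defaults and a hypothetical tuned configuration (for example tcp_keepalives_idle=300, tcp_keepalives_count=4, tcp_keepalives_interval=30; these are illustrative values, not recommendations):</p>

```shell
# Detection window = idle time before first probe + probes * probe interval.
default=$(( 7200 + 9 * 75 ))   # Linux defaults -> 7875 s (~2 h 12 min)
tuned=$(( 300 + 4 * 30 ))      # hypothetical tuned values -> 420 s (7 min)
echo "$default $tuned"         # prints: 7875 420
```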
<p>But when you already have an undead query running and you are sure that the client no longer exists, the solution is to
forcefully close the TCP connection.</p>
<p>To do that you can either</p>
<ul>
<li>send a TCP packet with a FIN flag</li>
<li>send a TCP packet with an RST flag</li>
</ul>
<p>As we do not expect the client to answer the FIN flag, sending an RST flag will do the nasty job of closing
our ESTABLISHED TCP connection without waiting for a response from the client.</p>
<h2>How to Send an RST Flag</h2>
<p>To send a correct RST packet, collect all the information you need to break into a TCP stream:</p>
<ul>
<li>SRC IP</li>
<li>SRC TCP port</li>
<li>DST IP -> DB-Host</li>
<li>DST TCP port -> 5432</li>
<li>Sequence number</li>
</ul>
<p>Because we have full control of our database host, as well as the PID of the process that holds the connection (in this
case, 34140), we can easily collect all unknown information:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span><span class="c1"># DB-Host</span>
$<span class="w"> </span>ps<span class="w"> </span>fauxww<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span><span class="m">34140</span>
postgres<span class="w"> </span><span class="m">34140</span><span class="w"> </span><span class="m">0</span>.5<span class="w"> </span><span class="m">0</span>.0<span class="w"> </span><span class="m">13042260</span><span class="w"> </span><span class="m">9040</span><span class="w"> </span>?<span class="w"> </span>Ss<span class="w"> </span>Apr01<span class="w"> </span><span class="m">5</span>:13<span class="w"> </span><span class="se">\_</span><span class="w"> </span>postgres:<span class="w"> </span>robot<span class="w"> </span>prod_eventlog_db<span class="w"> </span><span class="m">10</span>.161.137.203<span class="o">(</span><span class="m">50166</span><span class="o">)</span><span class="w"> </span>SELECT
</code></pre></div>
<p>As you can see, the SRC IP is 10.161.137.203 and the SRC TCP port is 50166.</p>
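<p>The same connection details can also be read from inside Postgres instead of grepping ps output (the PID is the one from the example above):</p>

```sql
-- client_addr and client_port identify the client side of this backend's connection.
SELECT client_addr, client_port FROM pg_stat_activity WHERE pid = 34140;
```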
<p>Now we have to get the current sequence number to attack the target TCP stream. You might have to wait a while to see a
packet -- this will depend on the keepalive settings (if the default values are used, then not longer than 2 hours):</p>
<div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">DB</span><span class="o">-</span><span class="k">Host</span>
<span class="err">$</span><span class="w"> </span><span class="n">tcpdump</span><span class="w"> </span><span class="o">-</span><span class="n">vvni</span><span class="w"> </span><span class="ow">any</span><span class="w"> </span><span class="k">host</span><span class="w"> </span><span class="mf">10.161.137.203</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">port</span><span class="w"> </span><span class="mi">50166</span>
<span class="mi">10</span><span class="err">:</span><span class="mi">08</span><span class="err">:</span><span class="mf">02.679268</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">123</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">10348</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">41</span><span class="p">)</span>
<span class="mf">10.161.137.203.50166</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.10.116.76.5432</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xcaaa</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">130742508</span><span class="err">:</span><span class="mi">130742509</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">2921339488</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">1</span>
</code></pre></div>
<p>Our sequence number is 130742508, which we’ll now use to send a spoofed TCP packet and stop the stream. hping3 can send
arbitrary packets via RAW sockets and will help us stop the stream:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>hping3<span class="w"> </span>-a<span class="w"> </span><span class="m">10</span>.161.137.203<span class="w"> </span>-s<span class="w"> </span><span class="m">50166</span><span class="w"> </span>-p<span class="w"> </span><span class="m">5432</span><span class="w"> </span>--rst<span class="w"> </span>-M<span class="w"> </span><span class="m">130742508</span><span class="w"> </span><span class="m">10</span>.10.116.76
</code></pre></div>
<p>As you can see, in the open tcpdump session the packet was successfully received:</p>
<div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">running</span><span class="w"> </span><span class="n">tcpdump</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">DB</span><span class="o">-</span><span class="k">Host</span>
<span class="mi">10</span><span class="err">:</span><span class="mi">25</span><span class="err">:</span><span class="mf">41.225359</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">64</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">24896</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">none</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">40</span><span class="p">)</span>
<span class="mf">10.161.137.203.50166</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.10.116.76.5432</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">R</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0x41f5</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">130742508</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">512</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span>
</code></pre></div>
<p>Postgres then closes the backend process: the TCP reset packet we sent signals that the client no longer knows about this
connection.</p>
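<p>For illustration only, the sequence number can also be pulled out of the tcpdump output programmatically. The following helper is our own sketch, not part of the original workflow; the function name and regex are assumptions, and it expects absolute sequence numbers as printed by <code>tcpdump -S</code>:</p>

```python
import re

def extract_seq(tcpdump_line):
    """Pull the first TCP sequence number out of a tcpdump flow line.

    Expects absolute sequence numbers (tcpdump -S), e.g.
    '... seq 130742508:130742509, ack 2921339488, ...'
    """
    match = re.search(r"\bseq (\d+)", tcpdump_line)
    if match is None:
        raise ValueError("no sequence number found in tcpdump line")
    return int(match.group(1))

line = ("10.161.137.203.50166 > 10.10.116.76.5432: Flags [.], cksum 0xcaaa (correct), "
        "seq 130742508:130742509, ack 2921339488, win 0, length 1")
print(extract_seq(line))  # prints 130742508
```

<p>The extracted value is then what gets passed to hping3’s <em>-M</em> option.</p>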
<p>We hope this post helps you to fix edge cases with connections to Postgres and avoid frustration along the way. Tell us
if it works for you by pinging us on Twitter at @ZalandoTech.</p>We Launched It! The Zalando Space Shoe (Video)2015-03-03T00:00:00+01:002015-03-03T00:00:00+01:00Rodrigo Reistag:engineering.zalando.com,2015-03-03:/posts/2015/03/we-launched-it-the-zalando-space-shoe-video.html<p>Zalando launched a lone Zign shoe to space on the 21st of February, 2015.</p><p>February 21, 2015 marked another milestone in the success story of Zalando: we sent a shoe into the stratosphere!</p>
<p>After a <a href="http://tech.zalando.com/posts/hackweek-december-2014-zalando-space-launch.html">failed launch</a> in mid-December
during our Zalando Hack Week, the Space Shoe Team resolved to persevere and try again. Over the following weeks we made
some adjustments to our <a href="http://tech.zalando.com/posts/hackweek-december-2014-zalando-space-launch.html">prototype</a> and
planned a new attempt. Our fine-tuning included buying a more resistant rope for holding the ship; improving the
coordination around releasing the balloon and craft so that we could avoid overstretching the rope; and checking the
weather forecast--paying close attention to predicted wind conditions. After making these changes, we knew we were good
to go again!</p>
<p>So, on what was a beautiful Saturday in Poland, we accomplished our mission. The details:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/08089bf09a4e42191a92c03c64126c0689562832_balloontripkm.jpg?auto=compress,format"></p>
<p><strong>08:00</strong> - Team meets at our departure point: Zalando’s tech office in Mollstraße 1, Berlin. Six members of our
14-member team showed up--not bad at all for 8 AM on a Saturday. A lot of sleepy eyes in our group, but we had high
hopes nevertheless. We gathered all the equipment and loaded everything into our cars.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/52f5f3e8144610703eab292884b3da6242ae1dba_ship_duke.jpg?auto=compress,format"></p>
<p><strong>08:30</strong> - Start of our journey to the launch site at Kostrzyn, Poland, about 100km from Berlin. We chose Kostrzyn
because we knew that it has a wide, open field perfect for activities such as launching golden women’s shoes into space.
:)</p>
<p><strong>10:05</strong> - Arrival in Kostrzyn. After unloading our equipment, we inflate the balloon with helium and set up (and turn
on!) the <a href="http://de.gopro.com/">GoPro</a> camera, smartphone camera and GPS tracker. We seal everything inside except for
the shoe, which we attach to an outer metal structure and tie to the balloon.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c591933d3225bfafdf42d00f220bb2cc742742d1_balloon.jpg?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/27dc9d414f928ce09143b8b3b255be8057314139_ship_all.jpg?auto=compress,format"></p>
<p><strong>11:05</strong> - Prepare for launch. 3… 2… 1… and… LIFTOFF! And this time, the ship really blasts into space! We hear the
first burst of excitement from the Space Shoe team!</p>
<p><strong>11:16</strong> - We lose contact with the ship. This means that the ship has either traveled too far above the earth’s
surface to get GSM reception, or--we hope--has been launched successfully. Fingers crossed.</p>
<p><strong>11:20</strong> - The team takes a short lunch break to cool down and process our emotions.</p>
<p><strong>11:42</strong> - We begin our search journey. Although we haven’t yet restored contact with the ship, we take off in our car
toward the expected landing site - identified by the simulation we did on <a href="http://predict.habhub.org/">this cool
website</a>.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/beaa7fde44767bf04ba5a1be88aec3a13dfe3155_prediction.jpg?auto=compress,format"></p>
<p><strong>13:47</strong> - The ship is alive! We’ve restored contact and now the Space Shoe team releases a second burst of excitement. We
can now precisely track the ship as it soars farther and farther toward the heavens.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/936a5bd7b73dad3f9a2c58539fa7974566442f98_shoe_sms_final-1.jpg?auto=compress,format"></p>
<p><strong>16:29</strong> - Arrive at the closest point in the road near the landing site. Apparently the ship had landed in the middle
of the woods, but when we walk over to the spot the GPS tracker points us to, we spot it easily. There it is, in
perfect shape! Commence the team’s third burst of excitement.</p>
<p><strong>16:59</strong> - Our goal achieved, we get back in our cars and start our drive back to Berlin.</p>
<p><strong>18:30</strong> - The car in front of us suddenly stops on the side of the road, and someone gets out of the back seat. He
motions for us to pull over. What is happening? Oh: It is Christian Kunert, a security engineer from Zalando’s platform
team, who has already downloaded the flight video from the GoPro camera and wants to give us a preview. We’ve
successfully captured the ship’s entire journey, as well as some beautiful footage of Earth. Cue the team’s fourth burst
of excitement.</p>
<p><strong>20:59</strong> - We arrive back at the Zalando tech office. It’s time to watch the footage and listen to some victory music,
eat some victory pizza and drink some victory beers. We’ve made it, and it feels good.
Our Space Shoe launch day was surely one of the most exciting moments that Team Space Shoe and I have had at Zalando.
Please watch our ship’s journey video below and marvel at the grandeur of our planet from the perspective of a Zalando
shoe.</p>Hack Week: Zalando 3D printing2014-12-19T00:00:00+01:002014-12-19T00:00:00+01:00Nick Muldertag:engineering.zalando.com,2014-12-19:/posts/2014/12/hack-week-zalando-3d-printing.html<p>Make your 3D printer work.</p><p>Imagine a world where everyone could print their own shoes. These guys are working on it. We're exploring 3D printing as
a new way to quickly prototype designs.</p>Behind the scenes: Zalando Space Launch2014-12-18T00:00:00+01:002014-12-18T00:00:00+01:00Nick Muldertag:engineering.zalando.com,2014-12-18:/posts/2014/12/behind-the-scenes-zalando-space-launch.html<p>Sending a shoe into space is no easy task. But we did it!</p><p>Sending a shoe into space is no easy task. In just a few days, a handful of developers, engineers, and product managers
created this prototype that they will launch tomorrow! Here we take a behind-the-scenes look at the team and the equipment the
guys are using to send their shoe into space! Up, up and away!! Get the latest updates from the <a href="http://thespaceshoe.com/">official space shoe
website</a></p>Hack Week: 3D Item View with cardboard like Virtual Reality Kit2014-12-17T00:00:00+01:002014-12-17T00:00:00+01:00Nick Muldertag:engineering.zalando.com,2014-12-17:/posts/2014/12/hack-week-3d-item-view-with-cardboard-like-virtual-reality-kit.html<p>Hack Week: Virtual reality is all the rage right now.</p><p>Virtual reality is all the rage right now. And we have tons of spare Zalando cardboard boxes laying around. These
creative folks put two and two together and are exploring new ways Zalando can use virtual reality.</p>Hack Week: Ask Zalanda2014-12-17T00:00:00+01:002014-12-17T00:00:00+01:00Nick Muldertag:engineering.zalando.com,2014-12-17:/posts/2014/12/hack-week-ask-zalanda.html<p>Using Artificial Intelligence to create Zalanda.</p><p>Customer satisfaction is key to Zalando. We're always looking at new ways to improve our service. This group uses
Artificial Intelligence to create Zalanda, a friendly voice which you can take with you wherever you go.</p>Hack Week: A Short Introduction2014-12-16T00:00:00+01:002014-12-16T00:00:00+01:00Daniel Nowaktag:engineering.zalando.com,2014-12-16:/posts/2014/12/hack-week-a-short-introduction.html<p>Hack Week 3 starts today! Interested in what will happen the next days?</p><p>Hack Week 3 starts today so it is now my pleasure to give you a short introduction what Hack Week is and what will
happen at Zalando Technology in the next days.</p>
<h2>The idea</h2>
<p>If I have to summarize Hack Week in one sentence, I’d say it is similar to the Google Friday but here we have it for an
entire week. The topics range from new digital shopping experiences like <a href="http://tech.zalando.com/blog/hack-week-taking-the-shopping-experience-to-the-next-level/">virtual reality dress
rooms</a>, trying on clothes
digitally, robots, delivery by drones, and <a href="http://tech.zalando.com/blog/hack-week-zalando-3d-printing/">3D-printing</a> or
experimenting with new technologies. The topics are all quite interesting, but most important is that it's a week in
the spirit of innovation, freedom and motivation that spreads across all Zalando tech facilities.</p>
<h2>The process</h2>
<p>Three weeks before every Hack Week the ideaboard is opened and pitching begins. This is a place where everyone can share
their concepts or find projects that they would like to support. In these three weeks we additionally hold pitch
sessions where everyone can promote their projects and find other people to join them.</p>
<p>Then during the week all regular work stops and everybody starts hacking: programmers, designers, and product managers
all come together in the name of technology. It's not all work though, with various social activities like office
mini-golf and table soccer tournaments running in parallel.</p>
<p>The cream of the crop is the week’s final presentation (and party) where every team presents their hacks. Every three
minutes a new project takes the stage, leaving you speechless. At the end the best teams get awards in different categories like
best hardware project, most innovative project, best teamwork and many more.</p>
<p>So that was a short introduction, now I have to join my team as well. ;-)</p>
<h2>Our Hack Week Video</h2>Hack Week: Fashion Meets Tech - Smart Wearables2014-12-12T00:00:00+01:002014-12-12T00:00:00+01:00Nick Muldertag:engineering.zalando.com,2014-12-12:/posts/2014/12/hack-week-fashion-meets-tech---smart-wearables.html<p>Fashion designers meet engineers!</p><p>Fashion designers meet engineers! Our fashion designers from zLabels join hands with our engineers to take on smart
wearables. If smart wearable electronics, circuit boards and soldering is your thing be sure to take a look at how one
team tackles this futuristic problem.</p>HACK WEEK: Reverse Engineering with Zalando parcels2014-12-06T00:00:00+01:002014-12-06T00:00:00+01:00Carina Kuhrtag:engineering.zalando.com,2014-12-06:/posts/2014/12/hack-week-reverse-engineering-with-zalando-parcels.html<p>It's Hack Week again and here's all about our cardboard furniture project.</p><p>Usually you can identify the best Zalando customers among your colleagues by the number of Zalando boxes they collect
under their desks :) Some creative colleagues got inspired at the sight of stockpiled parcels and saw more in them than
just packages. They started the Hack Week project “cardboard furniture”, in which they are currently building stuff out of
old cardboard boxes. If you visit them in their creative space you already get to see various pieces of art like a
cardboard deer head, a T-Rex skeleton, a chair or a coffee table. Their belief: “everything that works with wood, also
works with cardboard.”</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d7192887c2b79909c9a17cc7de9b0ce9b51aeb1f_14204312460_c9c50a0517_b.jpg?auto=compress,format"></p>
<p>Big ideas start with small prototypes</p>
<p>Around ten people are gathered in the DIY-corner of the Zalando office at Mollstraße, cutting templates and puzzling
over stencils. Some of the team members downloaded instructions and templates for woodwork; others just build what comes
to mind.</p>
<p>So obviously, Hack Week is not only about coding and building features for Zalando’s customers, but about being creative
and having fun. When you take a closer look at their cardboard art development process, however, the team’s
technological background becomes visible. Being the techies they are, they build every piece as a miniature prototype
first to see what it will look like before going to the production stage, and they apply reverse engineering to figure out how
to copy the cool stuff they found :) The models look pretty promising so far and we will keep you updated. Follow us on
our <a href="https://www.flickr.com/photos/zalandotech/">flickr</a> page or via twitter (
<a href="https://twitter.com/ZalandoTech">@ZalandoTech</a>)</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/64ceab961ddea64ed31d482142b98899f526af16_14374666706_cfff478fe6_b.jpg?auto=compress,format"></p>
<p>Work in progress</p>HACK WEEK: The Great Unpacking Experience2014-12-06T00:00:00+01:002014-12-06T00:00:00+01:00Rushil Davetag:engineering.zalando.com,2014-12-06:/posts/2014/12/hack-week-the-great-unpacking-experience.html<p>How do you feel when you receive your favorite shoes from Zalando?</p><p>How do you feel when you receive your favorite shoes from Zalando? I am sure you would be very excited to put them on as
quickly as possible. But in order to get those shoes out of the box, you must first unpack the Zalando box. What if
Zalando could make you happier and capture your emotions when you unpack that box as well?
A team at Hack Week is working hard to find out what would be the best unpacking experience for Zalando customers. The
idea is to make them feel special, sometimes also by surprising them, when they get their favorite items out of Zalando
boxes. This experiment also shows how creative and innovative we are at Zalando where we put our efforts in tiny details
and greater user experience.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e9054fa79ba3d6ad053e2e14a8ee86e403328c13_1234.jpg?auto=compress,format"></p>
<p>The customer is king</p>
<p>The team members started this project with design thinking in focus, to find out how Zalando can improve its current
unpacking experience. They gathered details on various box types, different kind of package content and also evaluated
technical feasibility for the project. They spoke to the internal experts from logistics and also interviewed some
customers.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/402e8808621d4da8c2b1a98e80dd688a05f5b80a_12345.jpg?auto=compress,format"></p>
<p>“Paula” to keep screaming with joy</p>
<p>After that, they created a user persona called “Paula”, as is popular in design thinking methodology. They started
brainstorming and experimenting with different ideas, like adding fragrances or putting puzzles inside the box, which
could make “Paula” happy when she unpacks that box. They created five different prototypes and tested them with
Zalando users within the company. They asked users to give a “+1” to the prototypes they liked and also give their overall
feedback. In the end, they will analyze these results in order to find out the best unpacking experience for Zalando
customers.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/78a30cc0c80aa5a18ef59cfc01ed0343100ce03d_156.jpg?auto=compress,format"></p>
<p>Surprise surprise!!!</p>
<p>As excited as the team is, I would expect our users to say “Ohh nice! I would like that again!”, when they unpack their
Zalando boxes. Of course, all these to keep “Paula” screaming with joy!!!</p>Zalando Hack Week - Making Innovation Visible2014-12-02T00:00:00+01:002014-12-02T00:00:00+01:00Bastian Gerhardtag:engineering.zalando.com,2014-12-02:/posts/2014/12/zalando-hack-week---making-innovation-visible.html<p>One week to brainstorm and execute your own ideas without limits!</p><p>Five years of extremely fast growth and continuous success lie behind us. Always forward-focused, we hardly had time to
look left or right. But for one week in late December, our Technology department made a full stop and let the year wind
down with playful innovation and experimentation. Say “Hi” to Hack Week – one week for over 400 people to brainstorm and
execute their very own ideas without limits across teams, functions and several office locations.</p>
<p>Our software release train and the project business halted for an entire week. Collaborative hacking took the place of
releases and project launches. The term “hack” thereby did not refer to hackers and was not limited to software or
hardware development. Instead, anything in one way or another contributing to Zalando’s core business or the improvement
of our everyday office life was welcome.</p>
<p>In order to push the creativity within the teams, eight beautifully handmade trophies were announced early on and then
awarded to the best projects at the closing event (see below for a list of the award categories and final winners).
Furthermore, a first among our established prototyping processes, the <a href="https://en.wikipedia.org/wiki/Design_thinking">design
thinking</a> method was utilized extensively by some of the teams to better
understand the context of a problem. This systematic innovation process enabled us to generate more meaningful insights
and creative solutions.</p>
<p>Hack Week was accompanied by numerous side shows, including daily warm-ups, video games, infusions (our platform for
various types of knowledge sharing) as well as a movie night, foosball tournament involving dozens of teams, and an
office chariot derby. On Wednesday night, to celebrate half-time achievements, we all gathered at Zalando’s Sky Lounge
over the roofs of Berlin for a little Christmas party – only genuine with almond cookies and Glühwein (traditional
mulled wine).
Ultimately, this long week of insane prototyping concluded with the final award ceremony. In sum, the project
presentations were much more entertaining than any feature-length game show out there and created a great spirit before
everyone left into the holidays. The most memorable moments of the week were captured on camera (see video below).</p>
<h2>Impressive outcome</h2>
<p>It is fair to say that the big turnout and passion displayed for Hack Week exceeded our initial expectations. At the
outset of the event, almost 150 project proposals had been submitted. Two-thirds of them turned into active teams. And half of
those presented their achievements on stage at the closing event. Aside from all the fun and late night action, we now
see numerous follow-up initiatives that will very likely help to improve Zalando’s customer experience, underlying
business processes and our office culture as a whole.
The award winners should provide a rough understanding about the overall quality of Hack Week’s outcome:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/efd16189651297ce04416831ba5e35eb176af49e_screen-shot-2015-05-27-at-17.42.13.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f1ca24cc258ea7514c876ef8864962e7c2be176f_screen-shot-2015-05-27-at-17.42.29.png?auto=compress,format"></p>
<h2>What we learned for next time</h2>
<p>Hack Week brought a lot of creative potential to the surface and helped us to reinforce Zalando’s innovation culture. We
have seen a great passion among the teams combined with an eagerness to improve various aspects of the business and to
demonstrate personal achievements. In this respect, Hack Week promoted team-building and provided an opportunity to
recognize and reward ideas for improved staff satisfaction.
For next time, however, we are planning to encourage more interaction with other Zalando departments to amplify the
overall outcome. By providing more business insights to the teams, we believe we can create an even better alignment of
customer experience and cutting-edge technology.
More impressions of Hack Week are available on <a href="https://www.flickr.com/photos/zalandotech/">Flickr</a>.</p>HACK WEEK: Design Thinking Applied2014-06-12T00:00:00+02:002014-06-12T00:00:00+02:00Carina Kuhrtag:engineering.zalando.com,2014-06-12:/posts/2014/06/hack-week-design-thinking-applied.html<p>Learn about Design Thinking principles.</p><p>When I was walking around in the Zalando offices searching for a Hack Week project to write about, I passed by a meeting
room that instantly caught my attention. The door to the room had been left open and the team seemed to have gone out
for lunch. I had actually already decided what to write about and wanted to visit the team that I had looked up in our
Hack Week Wiki and interview them.</p>
<p>But then I stopped and took a closer look at the room. The walls were full of colored Post-its, posters, workflows and
other sorts of map-like visualizations. I tried to figure out what this project could be about, but all I understood was
that the team had made some effort in trying to understand what Zalando customers want. I saw interview insights that
were clustered in a matrix, posters of personas and what seemed to be a detailed collection of all possible
stakeholders. This is the stuff that makes my user research heart beat faster.</p>
<p>Since it was lunchtime and the room was deserted I started asking around if somebody knew what kind of project was
taking place here, and eventually I found two project team members who had just come back from lunch and had 10 minutes
to give me a brief intro to the project. So apparently the purpose of the project is to create a new shopping
experience with the Zalando catalog filters. With the support of a Design Thinking coach they are using the four days of
Hack Week to come up with ideas for the experience they want to design. And what made me really happy was that they
involve users from day 1. They showed me their schedule for the four days of Hack Week and explained how the Design
Thinking principles are applied in their project process which can be summarized as “Understand, Observe,
Synthesize/Ideate, Prototype and Test”. They told me how they started on Tuesday with a detailed mapping of their
stakeholders and went on with recruiting and (telephone-) interviewing different types of customers. They also collected
internal information from Zalando colleagues and market research findings. When I was talking to them they had just
finished the first two phases and were about to start synthesizing their data. They invited me to join the session and I
understood that now was the time to aggregate all the insights they had gathered into some sort of leading question which in
turn will form the basis for the next phase.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/6772a06d25dc1fe0b6ac87045a4811a03c8d7245_14404953622_6a7c1f74c8_b.jpg?auto=compress,format"></p>
<p>As soon as the team was complete after lunch break they started a discussion in which they recapped their learnings from
the stakeholder mapping, telephone interviewing different customer types and tapping other sources of information with
the goal of finding out as much as possible about the customers they want to design for. They had already figured out
the biggest needs and fears of their typical customer and now the challenge was to combine this to a statement for the
following design phases. At this point the so-called “how-might-we”-phrase comes into play. This seems to be a simple
but powerful technique, as it brings problem-solving into the focus of design. It reads: “How might we help _______
(name) to _____ (need), _______ (problem/barrier)”.
As the team had already discussed and agreed on the motivations, needs, fears and possible problems of their Zalando persona
it didn't take long until they came up with a good “how-might-we” phrase and were ready for the next phase: ideation.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7d666b9518e6f2c73bddb8de7c88a6f59c0de26a_14404953012_59c2ce8399_b.jpg?auto=compress,format"></p>HACK WEEK: Taking the Shopping Experience to the next level2014-06-11T00:00:00+02:002014-06-11T00:00:00+02:00Martin Tschitschketag:engineering.zalando.com,2014-06-11:/posts/2014/06/hack-week-taking-the-shopping-experience-to-the-next-level.html<p>Users could try-on Zalando's products with our "KINECT Virtual Dressing” project.</p><p>One of the most visible and successful projects outside Zalando was a project at last Hack Week that initially seemed
just like a nice gadget. As a result of this “KINECT Virtual Dressing (for Xbox)” project, users could try on Zalando's
products using the sensor bar, called KINECT, which is a widely known extension for the Microsoft game console Xbox.
Fashion apparel pictures are projected onto a live image of the user, who can see himself on a TV screen. When the
user moves, the virtual dress is supposed to follow the body on-screen. During the last Hack Week back in December, a working
prototype was created; however, KINECT has evolved in the meantime.</p>
<p>The first prototype of its kind was not at the forefront when it came to usability and design. It is the project team’s
current challenge to tackle the implementation of a nice user interface for the virtual dressing room.</p>
<p>This time, thanks to our good relations with Microsoft, we gained direct support from them and they provided the newest
pre-release version of KINECT to support this project. In turn, we will develop a presentable version and Microsoft will
show it at the event accompanying the football World Cup in June in their showroom in Berlin. Also, the Zalando app
could become one of the first shopping apps on a growing entertainment platform like Xbox One, entering uncharted waters.</p>
<p>We will continue to follow on this blog how this exciting highlight project develops during Hack Week.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/d1073b9ca39961afff289b1ba6cc5adcdf9827dc_14211220080_be9a25575e_b.jpg?auto=compress,format"></p>
<p>KINECT is not just for gaming! We do fashion with it!</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/bc6dfad885ea224cbd5a2a338b2c3f21ec655503_14367771006_6e02f7a8c7_b.jpg?auto=compress,format"></p>
<p>Ahh!! Good idea. Let's discuss.</p>HACK WEEK: Let’s Hack!2014-06-10T00:00:00+02:002014-06-10T00:00:00+02:00Carsten Ernsttag:engineering.zalando.com,2014-06-10:/posts/2014/06/hack-week-lets-hack.html<p>Business as usual? Not this week! Let's Hack!</p><p>Business as usual? Re-inventing the Zalando shopping experience? Optimizing our backend and Logistics? Not this week! At
least not exactly :-)</p>
<p>Zalando’s second Hack Week has officially been kicked off. If you are wondering what Hack Week is all about: it’s an
event where Zalando Technology staff create, innovate and participate in various projects and events that are not
necessarily connected to their daily work. The emphasis is on having fun, but there is also a great side-effect: Some
projects will even make it to a production stage.</p>
<h2>Who is Working on What And Why?</h2>
<p>More than 400 Zalando techies from diverse backgrounds, like Product Managers, Developers, Quality Assurance Managers,
and Designers - just to name a few - are working on their own projects. In preparation for this event, people came up with
creative ideas and drew up more than 80 projects during the last couple of weeks, and these are currently being
developed.</p>
<p>If you walk through the Zalando Technology office now, you will find cross-functional as well as cross-departmental groups
of geeks coding, painting, playing, doing whatever it takes to achieve their project visions and goals.</p>
<p>Of course it will be a lot of fun, but some techies are also going for victory. This coming Friday, trophies in
ten different categories will be awarded to the best project teams. To make sure that the trophies go to the deserving
teams, a distinguished jury will decide who will finally climb the winner’s podium. We will keep you posted on this.</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/f7be245b29b121b2f278864b45fc95ee60db5b2c_1.jpg?auto=compress,format"></p>
<p>And it started cooking @Hack Week: Ideas, innovations, technology and fun</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/68c5f309b0f880bba36176ae8c6a493dfb30bdfc_2.jpg?auto=compress,format"></p>
<p>Design thinking in action</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0f5e089d67d1b82045139ac0895068ac0483e51a_3.jpg?auto=compress,format"></p>
<p>And of course have fun with a great smile! Zalandoites have it all.</p>
<p><a href="http://tech.zalando.com/posts/grand-prix-de-la-hack-week.html">We keep you updated on this</a></p>Writing Python command line scripts2014-03-31T00:00:00+02:002014-03-31T00:00:00+02:00Henning Jacobstag:engineering.zalando.com,2014-03-31:/posts/2014/03/writing-python-command-line-scripts.html<p>Python is great for writing command line scripts.</p><p>Python is great for writing command line scripts and we use it a lot for internal tools and scripts at Zalando. Before
extending a three line Bash script I usually rethink and implement it in Python. This post summarizes some conventions
and best practices I recommend.</p>
<h2>Command Line Options</h2>
<p>Do you know the command line options of GNU tar? Probably not all of them. Does <em>-v</em> just show the version or does it
enable verbose mode? Defining some standard options avoids confusion and will let you focus on the more important
aspects: writing the actual script logic. I use the following standard options for my scripts:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/a0f6c7dfc468545dd673a9b1c336b8f8b0794e40_screen-shot-2015-05-27-at-18.02.44.png?auto=compress,format"></p>
<p>The Python <em>argparse</em> module is excellent for command line option parsing:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/c93afb5296aea332bdd9c3f05156a313736eddce_screen-shot-2015-05-27-at-18.02.51.png?auto=compress,format"></p>
<p>By using <em>add_mutually_exclusive_group</em>, the user is prevented from accidentally specifying both <em>-v</em> and <em>-q</em> (they
would contradict each other).</p>
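<p>The parser in the screenshot above is not reproduced in text form; a minimal sketch of such a parser (the option names beyond <em>-v</em>/<em>-q</em> are illustrative assumptions, not necessarily those in the image) could look like this:</p>

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    # -v and -q contradict each other, so put them in one exclusive group
    group = parser.add_mutually_exclusive_group()
    group.add_argument('-v', '--verbose', action='store_true',
                       help='print debug output')
    group.add_argument('-q', '--quiet', action='store_true',
                       help='only print warnings and errors')
    # example of an additional standard option (hypothetical)
    parser.add_argument('--dry-run', action='store_true',
                        help='do not apply any changes')
    return parser.parse_args(argv)


args = parse_args(['-v', '--dry-run'])
```

<p>Passing both flags, e.g. <em>-v -q</em>, makes <em>argparse</em> print a usage error and exit.</p>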
<p>There is something to watch out for when defining command line options: often you will need sensitive data passed
into your script, e.g. a user and password to connect to a database. As command line options are visible in shell history,
in CRON logs, CRON mails and even remotely via SNMP (!), <strong>it should never be necessary to pass credentials via command
line options!</strong> To avoid passing passwords on the command line, you have the following options:</p>
<ul>
<li>require a config file for your script --- but a config file should only be used if the configuration is complex
enough</li>
<li>use some other authentication mechanism (e.g. Kerberos) --- this is often not possible</li>
<li>use an existing credentials store (e.g. ~/.pgpass for PostgreSQL when using
<a href="https://pypi.python.org/pypi/psycopg2">psycopg2</a>) --- if you can, use this solution (esp. for psycopg2)</li>
<li>allow passing passwords via special files</li>
</ul>
<p>I often use the last approach, allowing the password to be loaded from a file if the option value starts with "@":</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/0fb4c99a59145ef077c4d805220862284028db37_screen-shot-2015-05-27-at-18.09.48.png?auto=compress,format"></p>
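<p>Since the code is only shown as an image, here is a minimal sketch of the technique (function and option names are illustrative, not necessarily those in the original screenshot):</p>

```python
import argparse


def read_password(value):
    # If the value starts with "@", treat the rest as a file name
    # and read the actual password from that file instead.
    if value.startswith('@'):
        with open(value[1:]) as fd:
            return fd.read().strip()
    return value


parser = argparse.ArgumentParser()
parser.add_argument('--password', type=read_password,
                    help='database password or @FILE to read it from FILE')
```

<p>The user can then invoke the script with <em>--password @/path/to/secret.txt</em> and keep the actual password out of the shell history.</p>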
<p>Another pitfall is printing/logging the options passed to the script: it should go without saying that complete
database connection strings or similar sensitive information must never be printed.</p>
<h2>Logging</h2>
<p>Sprinkling your script with <em>print</em> statements is a bad idea. By using the standard <em>logging</em> module we get log levels,
string formatting and exception printing for free:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4bf7f80bf07916a574eb6b7dd57880c0c51ecc7b_screen-shot-2015-05-27-at-18.09.57.png?auto=compress,format"></p>
<p>I recommend the following guidelines for log levels:</p>
<p><strong>DEBUG</strong>:
Details about the script's execution that are normally not necessary to see. DEBUG lines are only printed/logged if the
<em>--verbose (-v)</em> command line argument is used.</p>
<p><strong>INFO</strong>:
This should be the default expected output of your script. The script's main tasks should be appropriately reflected by
INFO log lines.</p>
<p><strong>WARN</strong>:
Warnings should be "fixable" by the user. For scripts run via CRON, WARN log entries also show up in CRON mails, i.e.
they indicate conditions that should be fixed (for consistency) but have no real impact. WARN log entries are printed
even if the <em>--quiet (-q)</em> command line flag is used.</p>
<p><strong>ERROR</strong>:
Any error state that requires the user's attention and potentially prevents the script from completing
successfully.</p>
<h2>Header and Structure</h2>
<p>By adding the right shebang we can make the script executable (it still needs <em>chmod +x</em>, of course). The
encoding declaration is important for non-ASCII string literals:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/e4042aa974a2024bb10d883806e5430bcd9d2552_screen-shot-2015-05-27-at-18.10.06.png?auto=compress,format"></p>
<p>By using a docstring instead of a regular comment we can easily reuse it in different places, e.g. we can pass it as a
description parameter to the <em>ArgumentParser</em> class.</p>
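<p>A minimal sketch of such a script header (the docstring text is a made-up example, not the one from the screenshot):</p>

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Example script: check the availability of a given URL.

As the module docstring, this text can be reused, e.g. as
the description shown by --help.
'''

import argparse

parser = argparse.ArgumentParser(description=__doc__)
```

<p>Running the script with <em>--help</em> then prints the docstring as part of the usage message.</p>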
<p>Split your main script logic from argument parsing and use the <em>__name__</em> check to allow importing your script:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/4d626efc50d4a787d20c885a43d98e1731969bc1_screen-shot-2015-05-27-at-18.10.12.png?auto=compress,format"></p>
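<p>A sketch of this structure (names such as <em>parse_args</em> and <em>main</em> are conventional, not necessarily identical to the screenshot):</p>

```python
#!/usr/bin/env python
'''Example of splitting argument parsing from the main script logic.'''

import argparse
import logging


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('name', nargs='?', default='World')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    logging.info('Hello %s!', args.name)
    return 0


if __name__ == '__main__':
    # a real script would use sys.exit(main()) to propagate the return code
    main()
```

<p>Because the logic lives in <em>main()</em> behind the <em>__name__</em> check, importing the module has no side effects, and <em>main(['Alice'])</em> can be called directly from a REPL or a test.</p>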
<p>Now you can use the standard Python REPL or <a href="http://ipython.org/">ipython</a> to import and test your script:</p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/84ba25146db5854d6463e1e5727fd352db1f8b57_screen-shot-2015-05-27-at-18.10.29.png?auto=compress,format"></p>
<h2>DOs and DON'Ts</h2>
<ul>
<li>DO use the <em>argparse</em> module</li>
<li>DO allow specifying all configurations via arguments (if they are not overly complicated)</li>
<li>DO use the logging module and follow <em>logging</em> guidelines</li>
<li>DO check your code with <em>pyflakes</em></li>
<li>DO format your code according to <a href="https://www.python.org/dev/peps/pep-0008/">PEP8</a></li>
<li>DO use meaningful return codes <em>(sys.exit(retcode))</em></li>
<li>DON'T (never!) pass sensitive credentials <em>(passwords)</em> via command line options</li>
<li>DON'T (never!) print information which could contain sensitive information (e.g. database connection strings)</li>
<li>DON'T use print statements, use standard logging instead</li>
<li>DON'T use old-style string formatting (the <em>%</em> operator); use built-in logging format strings or <em>"{}".format(..)</em>.</li>
</ul>
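<p>To illustrate the last two points: logging's built-in format strings defer interpolation until the record is actually emitted, while <em>str.format</em> covers regular strings (variable names here are illustrative):</p>

```python
import logging

logger = logging.getLogger('demo')
user = 'alice'
count = 3

# DON'T: eager %-formatting, the string is built even if the
# record is filtered out by the log level:
# logger.debug('processing %d rows for %s' % (count, user))

# DO: logging format string, arguments are only interpolated
# if the record is actually emitted:
logger.debug('processing %d rows for %s', count, user)

# DO: str.format for regular strings:
message = 'processing {} rows for {}'.format(count, user)
```
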
<p>For pyflakes and code formatting (PEP8-like) I use my <a href="https://github.com/hjacobs/codevalidator">codevalidator.py</a>
script.</p>
<h2>Example Script</h2>
<p><a href="http://tech.zalando.com/listings/example-command-line-script.py.html">example-command-line-script.py</a></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/7df93b6e0eb3405eba9fb1b6bfdaa2d184405f9c_screen-shot-2015-05-27-at-18.22.40.png?auto=compress,format"></p>
<p><img alt="null" src="https://images.prismic.io/zalando-jobsite/ef259fe30c2f95b288128be218e4130100cdbbc1_screen-shot-2015-05-27-at-18.22.54.png?auto=compress,format"></p>
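<p>The full listing is available at the link above; since the screenshots are images, here is a minimal sketch in the same spirit (not the original file's exact contents) combining the conventions from this post:</p>

```python
#!/usr/bin/env python
'''
Minimal example script following the conventions from this post.
'''

import argparse
import logging


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    group = parser.add_mutually_exclusive_group()
    group.add_argument('-v', '--verbose', action='store_true',
                       help='print debug output')
    group.add_argument('-q', '--quiet', action='store_true',
                       help='only print warnings and errors')
    parser.add_argument('names', nargs='*', default=['World'],
                        help='who to greet')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # map the standard -v/-q flags to log levels
    if args.verbose:
        level = logging.DEBUG
    elif args.quiet:
        level = logging.WARN
    else:
        level = logging.INFO
    logging.basicConfig(level=level, format='%(levelname)s: %(message)s')
    for name in args.names:
        logging.info('Hello %s!', name)
    return 0


if __name__ == '__main__':
    # a real script would use sys.exit(main()) to return a meaningful exit code
    main()
```
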