Direct Data Sharing using Delta Sharing - Introduction: Our Journey to Empower Partners at Zalando
In this post, we explain how we transformed fragmented partner data sharing at Zalando by implementing Delta Sharing, evolving from a pilot solution to an organization-wide platform that enables real-time, secure data access across our partner ecosystem.
The Challenge That Started It All
Picture this: You're a partner working with Zalando, trying to understand how your products are performing on one of Europe's largest fashion platforms. You need insights to make strategic decisions about inventory, pricing, and assortment planning. But instead of getting seamless access to the data you need, you find yourself juggling multiple systems, formats, and manual processes just to piece together a coherent view of your business performance. This was the reality our partners faced, and it was a problem we couldn't ignore.
At Zalando's Partner Tech division within our Data Foundation pillar, we share data and insights with partners across three distinct business models to help them steer their business:
- Traditional wholesale relationships where Zalando purchases and resells products,
- Partner Program enabling direct-to-consumer sales, and
- Connected Retail linking brick-and-mortar stores to our platform.
Each of these partnerships generates valuable data, but getting that data into partners' hands in a useful format had become a significant challenge.
This introductory article covers the following:
- Brief overview of the problem statement
- Brief overview of existing solutions and partner needs
- Journey of identifying a potential solution
- Why we chose Delta Sharing
- From pilot to platform
- Lessons learned
- Looking ahead
This article will not cover:
- An in-depth explanation of Delta Sharing's technical architecture
- Databricks and Unity Catalog capabilities
The Wake-Up Call: Understanding the Real Impact
Our journey began with what seemed like routine partner interviews, but the conversations quickly revealed a sobering reality. Through months of discussions, we identified critical pain points undermining our partner relationships:
The Fragmented Data Landscape forced partners to juggle SFTP transfers, CSV downloads, self-service reports, and API calls. Each method served a purpose, but together they created a complex web requiring expertise across multiple systems just to get a complete business view.
Manual Data Processing had become a hidden tax: partners were allocating 1.5 FTE per month solely for data extraction and consolidation. Strategic talent was stuck wrestling with data formats instead of analyzing trends and making business decisions.
Limited Data Accessibility meant our UIs weren't designed for heavy data downloads that sophisticated partners needed. Time restrictions on data availability often blocked access to historical information during critical planning cycles.
KPI Inconsistency was eroding trust. The same metric could show different values depending on which system partners used, leading to confusion and hesitation to rely on our data for important decisions.
The Monetization Opportunity was clear: every hour partners spent on manual processing was an hour not spent on strategic activities that could grow their business and ours.
Beyond addressing these pain points, we recognized a significant opportunity. As the owner of a vast volume of commercial data across Europe's fashion ecosystem, Zalando is uniquely positioned to unlock our partners' full potential. Rather than simply fixing data access issues, we could transform how partners leverage insights to grow their businesses and strengthen our collaborative relationships.
Mapping the Partner Landscape
As we dug deeper, we realized that our "one-size-fits-all" approach wasn't serving anyone well. Our partner ecosystem spans thousands of active partners, from small retailers managing a few hundred SKUs to major brands with catalogs exceeding tens of thousands of products. Data volumes vary dramatically: some partners work with megabytes of weekly sales data while others require terabyte-scale historical datasets for strategic planning. In total, we manage 200+ datasets with sizes ranging up to 200 TB, and the usage of these data assets helps steer our >€5 billion GMV commercial partner platform business.
Large partners operated like well-oiled machines, seeking programmatic access through secure, automated pipelines. They had the technical sophistication to handle complex integrations but needed the data to flow seamlessly into their existing analytics infrastructure.
Medium-sized partners lived in a hybrid world, comfortable with dashboards and periodic data pulls but not necessarily equipped for real-time streaming solutions. They needed flexibility without overwhelming complexity.
Small partners often relied on familiar tools like spreadsheets and required ad-hoc access to specific datasets. For them, simplicity and accessibility trumped technical sophistication.
Meanwhile, the data requirements were equally diverse. Some partners craved real-time insights to react quickly to market changes, while others needed comprehensive historical datasets for long-term trend analysis. Some required incremental updates to keep their systems synchronized, while others preferred batch processing aligned with their internal workflows. Our existing solutions (APIs, SFTP, S3 buckets, and email) each addressed some of these needs, but none provided a comprehensive answer. We were solving point problems while missing the bigger picture.
The Quest for a Better Solution
Armed with this understanding, we embarked on a systematic search for a solution that could address our partners' diverse analytical needs without creating yet another siloed system. We knew we needed something that would stand the test of time and scale with our growing partner ecosystem.
Our evaluation criteria were ambitious but necessary. The solution needed to align with Zalando's broader data strategy while being cloud-agnostic enough to work with partners' varied infrastructure. It had to support the full spectrum of partner ecosystems, from small businesses running on spreadsheets to enterprise operations with sophisticated data pipelines.
Performance and scalability were non-negotiable: we needed to handle terabyte-scale datasets efficiently. Security couldn't be an afterthought; we required granular access controls, data encryption, and comprehensive auditing capabilities. The solution also needed to support the full range of data access patterns our partners required: real-time streaming, batch updates, incremental and delta changes, and historical analysis.
Perhaps most importantly, we needed something that wouldn't lock us into a corner. The solution had to be extensible and compatible with open tools, ensuring our partners could integrate it with their existing workflows rather than forcing them to adopt entirely new processes.
Discovering Delta Sharing: The Game Changer
Our research led us to Delta Sharing, and the more we learned, the more excited we became. Here was an open protocol specifically designed for secure data sharing across organizations, exactly what we needed. But it wasn't just the technical capabilities that caught our attention; it was the philosophy behind it.
Delta Sharing promised zero-copy access to data, meaning partners could work with live datasets without the overhead of constant data transfers. It supported access through programmatic interfaces, business intelligence tools, and yes, even spreadsheets, covering all our partner segments in one solution. The protocol could handle massive datasets efficiently while maintaining security through design, not as an add-on feature.
When we discovered Databricks' managed Delta Sharing service, the decision became clear. While we appreciated the open-source nature of the protocol, the managed service offered something invaluable: the operational excellence we needed for a production system serving critical partner relationships.
The managed solution provided robust governance through Unity Catalog integration, built-in security features, comprehensive audit logging, and most importantly, it freed our team from the operational overhead of maintaining yet another infrastructure component. We could focus on delivering value to partners rather than troubleshooting servers.
The architecture was elegantly simple yet powerful. Partners could access shared data through token-based authentication (the method we support in the initial phases) combined with credential files, providing security without complexity. The system supported both open sharing for all partners and Databricks-to-Databricks sharing for partners who already use Databricks in their data landscape, giving us flexibility as our needs evolved.
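To make the credential file concrete, here is a minimal sketch of how a partner might sanity-check one before using it. It assumes the profile file layout defined by the open Delta Sharing protocol (`shareCredentialsVersion`, `endpoint`, `bearerToken`, and an optional `expirationTime`). The endpoint, token, and the `check_profile` helper are hypothetical placeholders for illustration, not values or tooling from our setup.

```python
import json
from datetime import datetime, timezone

# Illustrative values only: endpoint and token are placeholders.
EXAMPLE_PROFILE = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<redacted-token>",
  "expirationTime": "2030-01-01T00:00:00Z"
}
"""

REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def check_profile(profile_text: str, now: datetime) -> dict:
    """Parse a credential file; fail early if it is incomplete or expired."""
    profile = json.loads(profile_text)
    missing = [field for field in REQUIRED_FIELDS if field not in profile]
    if missing:
        raise ValueError(f"credential file is missing fields: {missing}")
    expires = profile.get("expirationTime")
    if expires is not None:
        # Replace the trailing "Z" so older Python versions can parse it.
        expiry = datetime.fromisoformat(expires.replace("Z", "+00:00"))
        if expiry <= now:
            raise ValueError(f"token expired at {expires}")
    return profile


profile = check_profile(EXAMPLE_PROFILE, datetime(2025, 1, 1, tzinfo=timezone.utc))
print("endpoint:", profile["endpoint"])
```

A check like this is optional, but it turns an expired token into a clear error message rather than a failed request later in a pipeline.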
Taking the First Steps: Our Proof of Concept
Being the first team at Zalando to implement Delta Sharing meant we were venturing into uncharted territory. We approached this with the methodical mindset that had served us well in identifying the problem: careful testing, thorough evaluation, and honest assessment of limitations.
However, we didn't tackle this challenge alone. Success required close collaboration with key stakeholders across Zalando's technical organization. Our central Data Foundation team provided crucial guidance on Unity Catalog integration and governance frameworks, helping us understand how Delta Sharing would fit into Zalando's broader data architecture. Their expertise proved invaluable in navigating the complexities of our existing data infrastructure.
Equally important was our partnership with the AppSec and IAM teams. Given that we were essentially creating new pathways for external data access, security considerations were paramount. The teams helped us evaluate authentication mechanisms, assess potential attack vectors, and ensure our implementation met Zalando's stringent security, authentication, and identity standards from the ground up.
We conducted a comprehensive proof of concept to understand both the capabilities and constraints of Delta Sharing in our specific environment. This collaborative approach allowed us to identify critical limitations early and develop mitigation strategies.
Our POC revealed both the promise and the practical challenges of implementation. The integration with Unity Catalog, while powerful, introduced operational complexities around permissions and access management that required careful coordination with our Data Foundation colleagues. The lack of self-service APIs for token management meant we initially had to handle partner onboarding manually. That was not ideal for scale, but manageable for our pilot phase with AppSec's guidance on secure token distribution.
These discoveries didn't discourage us; they informed our implementation strategy and strengthened our cross-team relationships. Every pioneering effort encounters obstacles, and having the right collaborative framework allowed us to turn these challenges into learning opportunities that would benefit future implementations across Zalando.
Simplified Architecture: How It Works at Zalando
With our proof of concept validated, we moved forward with a streamlined architecture that demonstrates the core principles of Delta Sharing.
Understanding Delta Sharing Terminology:
- Delta Share: A logical container that groups related tables for secure distribution to external recipients
- Recipient: A digital identity representing each partner in our Delta Sharing system
- Activation Link: A secure URL that allows partners to download their authentication credentials
Step 1: Data Preparation and Centralization
We prepare datasets based on partner needs and store them in a scalable storage system. These are then cataloged in a central metadata and governance layer, which ensures consistency, control, and acts as a single source of truth.
Step 2: Access Configuration
We create access points (recipients) for each partner and assign the appropriate permissions. These access points act as logical groupings for related data, allowing for secure and organized distribution. Each access point generates a unique activation link, which is then securely provided to the respective partner.
Step 3: Direct Data Access
When partners receive their activation link, they use it to establish a secure connection to the data distribution system. Once authenticated, partners can make direct requests to access the underlying data.
This approach delivers several key benefits:
- Partners get direct access to live data without the overhead of data copying.
- The authentication mechanism ensures security through time-limited, partner-specific access credentials.
- Because the data remains in its original location, we avoid storage duplication and ongoing synchronization challenges.
Implementation steps: Simplified
The typical steps involved in sharing datasets externally are:
1. Prepare the final datasets (delta tables) via Unity Catalog.
2. Create a 'Share' (logical container).
3. Add the 'delta tables' to the 'Share'.
4. Create a recipient for each partner.
5. Grant permissions to the recipient for accessing the share.
Databricks provides extensive documentation covering the technical details of Delta Sharing and the APIs available for building a solution on top of it.
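As a rough illustration, steps 2 to 5 map onto a handful of Databricks SQL statements. The sketch below only builds the statements as strings; in a Databricks notebook they could be executed one by one, for example with `spark.sql(statement)`. The share, table, and recipient names are hypothetical, and the `sharing_statements` helper is our illustration rather than part of any Databricks API.

```python
from typing import List, Sequence


def sharing_statements(share: str, tables: Sequence[str], recipient: str) -> List[str]:
    """Build the SQL for steps 2-5: share, tables, recipient, grant."""
    # Step 2: create the share (logical container).
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Step 3: add each Delta table, given as catalog.schema.table.
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    # Step 4: create a recipient representing the partner.
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    # Step 5: allow the recipient to read everything in the share.
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


# Hypothetical names, for illustration only.
for statement in sharing_statements(
    share="partner_sales_share",
    tables=["main.partner.weekly_sales", "main.partner.returns"],
    recipient="partner_acme",
):
    print(statement + ";")
```

Keeping the statements in a small function like this makes onboarding repeatable per partner, which matters once the number of recipients grows beyond a pilot.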
Bridging the Gap: Making Partner Adoption Seamless
Building an elegant technical solution was only half the challenge; the other half was ensuring our partners could actually use it effectively. We developed comprehensive user guides with step-by-step instructions for accessing shared data through familiar tools like Pandas and Apache Spark.
The guides included practical examples and troubleshooting scenarios, enabling partners to go from receiving their activation link to pulling their first dataset in minutes. By providing clear documentation for Delta Sharing connector APIs, partners could integrate our data directly into their existing analytics pipelines without disrupting established workflows.
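For a sense of what such a guide boils down to on the partner side, here is a minimal sketch using the open-source `delta-sharing` Python connector. The file name `config.share` and the share, schema, and table names are hypothetical. The connector is imported inside the function so the URL helper can be used without installing it.

```python
from typing import Optional


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-file>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_table(profile_path: str, share: str, schema: str, table: str,
               limit: Optional[int] = None):
    """Load a shared table into a pandas DataFrame."""
    import delta_sharing  # assumes: pip install delta-sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table), limit=limit
    )


# Hypothetical names: config.share is the downloaded credential file.
print(table_url("config.share", "partner_sales_share", "partner", "weekly_sales"))
# With the connector installed and a valid credential file in place:
# df = load_table("config.share", "partner_sales_share", "partner", "weekly_sales", limit=100)
```

Partners working in Spark can pass the same address to the connector's `load_as_spark` function instead, provided the Delta Sharing Spark connector is available in their environment.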
From Pilot to Platform: The Ripple Effect
Word of our Delta Sharing pilot began spreading through Zalando's internal networks, generating inquiries from teams across the organization. Other departments working with partners started reaching out, recognizing the potential for their own data sharing challenges.
This interest validated our approach and presented an opportunity to avoid fragmentation. Rather than having each team build their own implementation, we collaborated to evolve our solution into a comprehensive platform for recipient management across Zalando.
Building the Platform: From Solution to Service
This realization sparked our next evolution: transforming our pilot into a comprehensive platform for recipient management across Zalando. Instead of being a single-use solution for Partner Tech, we're building the infrastructure that will enable any team at Zalando to implement secure, scalable data sharing through Delta Sharing.
We're not just building technology; we're building expertise. Our platform includes comprehensive guidance for teams preparing their datasets, ensuring they align with platform expectations and can scale effectively. We're codifying the lessons we learned during our proof of concept and pilot phases, transforming our hard-won knowledge into reusable best practices.
As we scale beyond our initial partner use case, we're looking into making data access for partners more efficient by exploring Databricks OIDC federation capabilities. This would allow some partners to directly access their data, protected by their own identity infrastructure and without generating an intermediate token.
The Challenges of Scale
Scaling from a single-team pilot to an organization-wide platform brings its own set of challenges. We're not just multiplying our current solution; we're reimagining it for diverse use cases we haven't fully explored yet. Different teams have different data governance requirements, varying security constraints, and unique integration needs.
The technical architecture that worked for our Partner Tech use case needs to be flexible enough to accommodate everything from real-time operational data sharing to periodic analytical exports. We're essentially building a data-sharing platform that can serve as the foundation for multiple teams while maintaining the performance, security, and reliability standards each team requires.
This expansion also means deeper collaboration with Zalando's data governance frameworks. As more teams adopt Delta Sharing through our platform, we need to ensure consistent policies around data access, audit trails, and compliance reporting. The platform needs to be sophisticated enough to handle complex multi-tenant scenarios while remaining simple enough that teams can adopt it without extensive training.
Lessons Learned: Key Insights from the Delta Sharing Journey
Our transformation from fragmented data sharing to a unified Delta Sharing platform taught us valuable lessons that extend beyond technical implementation.
Start with Deep Partner Understanding, Not Technology
Our biggest revelation was that the technology choice wasn't the starting point; it was the outcome of truly understanding our partners' pain points. The months we spent in partner interviews weren't just research; they were the foundation of everything that followed. The 1.5 FTE per month that partners were spending on manual data processing represented strategic talent being wasted on operational tasks.
One Size Doesn't Fit All, and That's Okay
We needed one platform that could serve different partner segments in different ways. Large partners needed programmatic access, medium partners wanted flexibility, and small partners required simplicity.
Cross-Team Collaboration Is Non-Negotiable
Being the first team at Zalando to implement Delta Sharing taught us that pioneering new technology requires strong partnerships across the organization. Our success depended entirely on collaboration with the central Data Foundation team for Unity Catalog expertise, the AppSec team for security guidance, and the IAM team for identity and authentication guidance.
Manual Processes Are Okay for Pilots, But Plan for Scale
Our initial manual token management approach worked fine for our pilot phase, but we quickly realized it would become a bottleneck as we scaled. We treated this as a learning opportunity that informed our platform development priorities: every manual step in our pilot became a feature requirement for our platform.
Internal Demand Validates External Value
The unexpected internal interest in our Delta Sharing platform was one of our most important validation signals. When teams across Zalando started asking how they could leverage similar capabilities, we knew we had built something with broader applicability than our original scope.
Security and Governance Can't Be Afterthoughts
Working with the AppSec and IAM teams from the beginning taught us that security considerations need to be embedded in the architecture from day one. The time we invested in understanding authentication mechanisms and access controls upfront saved us from significant refactoring later.
Documentation Is a Product Feature
Our comprehensive user guides weren't just nice-to-have documentation; they were critical product features that determined adoption success. Partners needed to go from activation link to first data pull in minutes, not hours.
Operational Excellence Matters More Than Perfect Technology
Our decision to use Databricks' managed Delta Sharing service rather than building our own implementation reflected a crucial lesson: operational excellence often trumps technical purity. The managed service freed us to focus on partner value rather than infrastructure maintenance.
Looking Ahead: The Future of Partner Data at Zalando
Our journey from solving a specific partner data problem to building an organization-wide data-sharing platform illustrates something important about innovation: the best solutions often have applications far beyond their original scope. What began as a focused effort to reduce partner frustration with fragmented data access has evolved into a cornerstone of Zalando's data-sharing strategy.
As we continue building this platform, we're guided by the same principle that led us to Delta Sharing in the first place: deep understanding of user needs. Whether those users are external partners trying to optimize their product performance or internal teams seeking to collaborate more effectively, the fundamental challenge remains the same: getting the right data to the right people at the right time with the right level of security.
The shift from fragmented, manual data processes to seamless, real-time data sharing represents more than a technical upgrade; it's a fundamental change in how we enable data-driven decision making across our entire ecosystem. By reducing friction in data access, we're not just improving operational efficiency; we're creating new possibilities for insight and collaboration that didn't exist before.
Our commitment to continuous evolution means this story is far from over. As we gather feedback from the growing community of internal users and external partners, we'll continue iterating on both the technology and the processes around it. The future of data sharing at Zalando isn't just about better technology; it's about better partnerships, more informed decisions, and ultimately, better experiences for the millions of customers who rely on our platform every day.
We're hiring! Do you like working in an ever-evolving organization such as Zalando? Consider joining our teams as a Data Engineer!