Medallion architecture is the most common pattern we implement for enterprise Fabric deployments. But most of what's written about it stops at the concept level: ingest raw data into Bronze, clean it in Silver, aggregate it in Gold. That's true, but it's not useful.
What's missing is the nuance — how to structure lakehouses, when to use shortcuts vs. pipelines, how Delta handles schema evolution, and what breaks when you have 50 teams writing to the same lakehouse. That's what this guide covers.
What Medallion Architecture Actually Is
Medallion (also called multi-hop) is a pattern for organizing data through progressive quality layers:
- Bronze — Raw, immutable data exactly as it arrived from the source. No transformations. No schema enforcement. Think of it as your audit log.
- Silver — Cleansed, validated, deduplicated data. Schema enforced. Business keys resolved. This is your reliable foundation.
- Gold — Business-ready aggregations, denormalized tables, and serving layer datasets optimized for specific use cases (reporting, ML, APIs).
In Microsoft Fabric, each layer typically maps to a separate Lakehouse item within a workspace — though there are valid reasons to deviate from this, which we'll cover.
Workspace Design: One Workspace or Many?
This is the first decision most teams get wrong. The common mistake is creating one workspace per team or department, then trying to share data across workspaces via exports. That creates data duplication, consistency problems, and no single source of truth.
The pattern we use at enterprise scale:
- A Central Data Platform workspace — Contains Bronze, Silver, and core Gold lakehouses. This is the authoritative layer. Controlled access. Owns the master data.
- Domain workspaces — Finance, HR, Operations, etc. Each has its own Gold lakehouse built on top of the central Silver layer using OneLake shortcuts (more on this below). Teams own their domain models but don't duplicate raw data.
- Development/Staging workspaces — Mirrors of production for testing. Git-integrated, with Fabric items promoted across environments via deployment pipelines.
This structure gives you enterprise governance (central Bronze/Silver with strict access) while enabling team autonomy (domain Gold layers). The central team doesn't become a bottleneck for every data request.
OneLake Shortcuts: The Fabric Superpower Most Teams Underuse
OneLake Shortcuts let you reference data in another lakehouse, storage account, or even an external source like S3 or ADLS Gen2 — without copying it. The data stays where it is; Fabric reads it as if it's local.
This changes the medallion pattern significantly:
- Silver layer data can be exposed to domain workspaces via shortcuts — domain teams query Silver data without it being duplicated
- External data sources (Azure Blob, S3, ADLS Gen2, Google Cloud Storage) can be mounted directly into your Bronze lakehouse — no ingestion pipeline needed for the initial load
- Dataverse data can be surfaced in Fabric via shortcuts, enabling Fabric analytics on Power Platform operational data
The catch: shortcuts give the consuming workspace read-only access to the target data. If domain teams need to write enriched data back, they need their own writable tables in their domain lakehouse — not the shortcut.
Delta Lake: Why It Matters for Medallion
All Fabric lakehouses store data in Delta format. This isn't just a file format choice — Delta is what makes the medallion pattern work reliably at scale.
The features you'll lean on:
- ACID transactions — Multiple pipelines can write to the same table without corruption. Critical when you have parallel ingestion streams landing in Bronze.
- Schema evolution — Source systems change their schemas. Delta handles adding new columns without breaking existing queries; the `mergeSchema` option in Spark makes this seamless.
- Time travel — Every Delta table maintains a transaction log. You can query as-of-yesterday with `VERSION AS OF` or `TIMESTAMP AS OF`. This is invaluable for debugging and regulatory requirements.
- Z-ordering and liquid clustering — Fabric's managed lakehouse tables support liquid clustering, which automatically reorganizes data for the most common query patterns. For large tables (100M+ rows), this can cut query times by 60–80%.
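Both features are close at hand in a Fabric notebook. A minimal PySpark sketch — the table name and `new_batch_df` are illustrative, and this assumes a notebook session where `spark` is already defined:

```python
# Schema evolution: append a batch whose schema adds new columns.
# mergeSchema tells Delta to evolve the table schema instead of failing.
(new_batch_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("silver_customers"))

# Time travel: read the table as it looked at an earlier version or time.
v42 = spark.read.option("versionAsOf", 42).table("silver_customers")
jan14 = spark.read.option("timestampAsOf", "2025-01-14").table("silver_customers")

# Equivalent SQL:
#   SELECT * FROM silver_customers VERSION AS OF 42;
#   SELECT * FROM silver_customers TIMESTAMP AS OF '2025-01-14';
```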
One important decision: managed tables vs. unmanaged (external) tables in your lakehouse. Managed tables live under the `Tables/` path and are fully controlled by Fabric — easy to work with, and they show up in the SQL analytics endpoint automatically. External tables point to files in `Files/` or in external storage. We use managed tables for Gold (final serving layer, well-defined schema) and external tables for Bronze (preserving raw files exactly as received).
Ingestion Patterns for Bronze
Bronze should be append-only whenever possible. You're creating an audit trail. If a source system sends bad data, you still want the bad data in Bronze — so you can debug the Silver transformation logic later.
Common ingestion patterns in Fabric:
- Data Factory pipelines — Best for batch ingestion from databases, APIs, SaaS sources (Salesforce, Dynamics, SAP). Use Copy Activity for simple moves, Dataflow Gen2 for transformations during ingestion.
- Eventstream — For real-time data: IoT telemetry, Azure Event Hubs, Kafka. Lands directly in Bronze, or bypasses Bronze into a KQL database for sub-second analytics.
- Notebooks — For complex ingestion logic that outgrows Dataflow capabilities. Use PySpark for anything involving custom parsing, complex joins, or ML-based enrichment during ingestion.
- Shortcuts from source — For sources where data already exists in cloud storage (ADLS, S3). Mount it directly — no ingestion job needed.
Add metadata to every Bronze table: an `ingested_at` timestamp, a `source_system` string, and a `batch_id` for traceability. This costs almost nothing but saves enormous debugging time.
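In PySpark this is a few `withColumn` calls at the end of the ingestion step. A sketch — the source path, table name, and `salesforce` label are illustrative, and it assumes a Fabric notebook where `spark` is defined:

```python
import uuid
from pyspark.sql import functions as F

batch_id = str(uuid.uuid4())  # one id per pipeline run

raw_df = spark.read.json("Files/landing/orders/")  # illustrative source path

bronze_df = (raw_df
    .withColumn("ingested_at", F.current_timestamp())
    .withColumn("source_system", F.lit("salesforce"))
    .withColumn("batch_id", F.lit(batch_id)))

# Append-only: Bronze is an audit log, never overwritten.
bronze_df.write.format("delta").mode("append").saveAsTable("bronze_orders")
```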
Silver: Where Real Engineering Happens
Silver is where most of the engineering work lives. You're going from "raw data that arrived" to "data we can trust."
Standard Silver transformations:
- Deduplication — Sources often send duplicate records. Delta's `MERGE` statement handles upserts cleanly: update if the record exists, insert if it doesn't. Define your business key clearly before you start.
- Type casting and validation — Enforce column types. Reject or quarantine records that fail validation. Write rejects to a `_rejected` table with a reason code — never silently drop bad data.
- Null handling — Standardize null representation. Decide whether nulls in source mean "unknown" or "not applicable" — this matters for aggregations in Gold.
- Business key resolution — If the same entity has different IDs across source systems (customer ID in Salesforce vs. account number in SAP), Silver is where you resolve to a canonical key.
- Slowly Changing Dimensions (SCD) — For entities that change over time (customer address, product price), decide your SCD type. SCD Type 2 (keep full history) is safest but requires careful key management. Delta's `MERGE` + time travel make this tractable.
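In a Fabric notebook the upsert itself is Delta's `MERGE` (`DeltaTable.merge` in PySpark), but the rule is easy to state. A plain-Python sketch of the semantics, with `customer_id` as a hypothetical business key:

```python
def merge_upsert(target_rows, incoming_rows, business_key):
    """In-memory sketch of MERGE semantics: when the business key
    matches an existing row, update it; otherwise insert it."""
    by_key = {row[business_key]: row for row in target_rows}
    for row in incoming_rows:
        by_key[row[business_key]] = row  # matched -> update, unmatched -> insert
    return list(by_key.values())

silver = [{"customer_id": 1, "city": "Oslo"}]
batch = [
    {"customer_id": 1, "city": "Bergen"},     # existing key -> update
    {"customer_id": 2, "city": "Stavanger"},  # new key -> insert
]
result = merge_upsert(silver, batch, "customer_id")
# result holds two rows: customer 1 updated, customer 2 inserted
```

The point of defining the business key first: whatever you pass as `business_key` is what "the record exists" means, and changing it later silently changes your dedup behavior.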
Organize Silver notebooks into reusable patterns. We use a base transform class that handles logging, error quarantine, and Delta MERGE boilerplate — domain-specific notebooks just define the transformation logic. This pays off enormously when you have 50+ Silver tables to maintain.
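A minimal sketch of that base-class pattern — all names here are hypothetical, and a real version would wrap PySpark DataFrames and Delta `MERGE` rather than Python rows:

```python
import logging

class BaseTransform:
    """Handles logging and reject quarantine so that subclasses
    only define validate() and transform()."""

    def __init__(self, table_name):
        self.table_name = table_name
        self.log = logging.getLogger(table_name)
        self.rejected = []  # stand-in for the _rejected Delta table

    def validate(self, row):  # override per table
        return True

    def transform(self, row):  # override per table
        raise NotImplementedError

    def run(self, rows):
        clean = []
        for row in rows:
            if not self.validate(row):
                # Quarantine with a reason code, never silently drop.
                self.rejected.append({**row, "reject_reason": "validation_failed"})
                continue
            clean.append(self.transform(row))
        self.log.info("%s: %d clean, %d rejected",
                      self.table_name, len(clean), len(self.rejected))
        return clean

class CustomerTransform(BaseTransform):
    """Domain notebook only supplies the table-specific logic."""
    def validate(self, row):
        return row.get("email") is not None

    def transform(self, row):
        return {**row, "email": row["email"].lower()}

t = CustomerTransform("silver_customers")
clean = t.run([{"email": "A@B.COM"}, {"email": None}])
```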
Gold: Purpose-Built for Consumers
Gold layers are not generic. Each Gold table exists for a specific consumer: a Power BI semantic model, an ML feature store, an operational API. Optimize accordingly.
Common Gold patterns:
- Star schema tables — Fact and dimension tables ready for Power BI import mode. Pre-computed, pre-aggregated, pre-filtered to the reporting grain. Import mode semantic models over star schema Gold tables are the fastest Power BI experience possible.
- Wide flat tables — For operational dashboards that need low-latency Direct Lake access. Denormalized, single scan, partitioned by date. Direct Lake reads directly from the Delta files without import — near-real-time freshness with near-import-mode performance.
- Feature tables — For ML teams. Pre-computed features joined across multiple Silver entities. Versioned, timestamped for training reproducibility.
- Aggregated summary tables — Weekly/monthly rollups for executive reporting. Materialized in Gold rather than computed at query time — much faster for large tenants.
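The rollup idea in miniature: group daily facts to a monthly grain ahead of query time, so reports read a handful of rows instead of scanning the fact table. Plain Python for illustration; in Fabric this would be a Spark `groupBy` writing a Gold Delta table, and the column names are hypothetical:

```python
from collections import defaultdict

def monthly_rollup(daily_rows):
    """Pre-aggregate daily revenue facts into a monthly summary."""
    totals = defaultdict(float)
    for row in daily_rows:
        month = row["date"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[month] += row["revenue"]
    return [{"month": m, "revenue": r} for m, r in sorted(totals.items())]

summary = monthly_rollup([
    {"date": "2025-01-03", "revenue": 100.0},
    {"date": "2025-01-17", "revenue": 50.0},
    {"date": "2025-02-01", "revenue": 75.0},
])
# summary: [{'month': '2025-01', 'revenue': 150.0},
#           {'month': '2025-02', 'revenue': 75.0}]
```

Materializing this once per load window is what makes executive reports fast on large tenants: the expensive scan happens in the pipeline, not at query time.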
Common Mistakes at Enterprise Scale
After implementing this pattern across retail, banking, oil & gas, and media organizations, here's what breaks:
- No workspace governance — Anyone can create a workspace, so they do. 6 months in, you have 200 workspaces, no naming convention, no ownership, and Bronze data duplicated in 15 places. Define workspace governance before you start.
- Skipping the Bronze immutability rule — Teams start "fixing" Bronze data directly when they find issues. This destroys your audit trail. Bronze is sacred — transformations only happen in Silver.
- Single lakehouse for all layers — Putting Bronze, Silver, and Gold in the same lakehouse makes access control impossible. Separate workspaces and lakehouses per layer, linked by shortcuts.
- Not partitioning Delta tables — Large tables without partitioning cause full scans. Partition by date (ingestion date for Bronze, business date for Silver/Gold). With liquid clustering, this becomes less critical, but partitioning is still cheaper for time-range queries.
- No monitoring on pipelines — Notebooks and pipelines fail silently. By the time someone notices a report is stale, the pipeline has been failing for a week. Use Fabric's built-in monitoring + alerts, or wire failures to a Slack/Teams channel via Power Automate.
Where to Start
If you're building a Fabric medallion architecture from scratch, start with a single domain — not the whole enterprise. Pick one business process with a clear consumer (a report, a dashboard, an application). Build Bronze → Silver → Gold for that domain. Get it working, monitored, and trusted. Then expand.
The most common mistake is trying to design the entire enterprise data model before landing a single row of data in Fabric. You'll discover things in production that no whiteboard session will surface. Build fast, learn fast.
If you have an existing Power BI environment you're migrating into Fabric, your semantic models are a useful guide to what Gold needs to look like — work backwards from there.
And if you want an expert eye on whether your current or planned architecture will hold up at scale, that's exactly what our Fabric Architecture Accelerator is for — we design your medallion architecture in a day, using your actual data sources and reporting requirements as inputs.