Pipeline has failed 3 of the last 5 runs with a connection timeout to the on-premises SQL Server. Data freshness SLA breached by 26 hours.
Average duration increased from 20 min to 68 min over the past 2 weeks. Root cause: a new GL table with 4.2M rows was added without incremental load logic.
Pipeline fails on 2 of 7 days with OutOfMemory errors during the feature aggregation step. Memory usage spikes to 95% during window functions.
Only ETL-Sales-Daily and HR-Data-Refresh have retry policies configured. The other pipelines fail permanently on the first error.
ETL-Sales-Daily, ETL-Inventory-Sync, HR-Data-Refresh, and Finance-Consolidation all run between 2:00 and 3:00 AM, competing for capacity.
Pipeline failures are only discovered when downstream reports show stale data. No proactive notification system.
Pipeline runs on a fixed schedule without validating that upstream Sales data has refreshed. Occasionally produces reports with stale data.
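
The fixed-schedule finding above could be addressed with a freshness gate that runs before the pipeline starts. The following is a minimal sketch, not the configured behavior of any pipeline here: the function names and the 24-hour `MAX_UPSTREAM_AGE` threshold are hypothetical, and it assumes the orchestrator can supply the upstream pipeline's last successful refresh time as a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: the oldest upstream refresh we still accept.
MAX_UPSTREAM_AGE = timedelta(hours=24)


def upstream_is_fresh(
    last_refresh: datetime,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_UPSTREAM_AGE,
) -> bool:
    """Return True if the upstream refresh finished within max_age of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_refresh <= max_age


def should_run(last_refresh: datetime, now: Optional[datetime] = None) -> str:
    """Decide whether to run the downstream pipeline or skip and alert."""
    if upstream_is_fresh(last_refresh, now):
        return "run"
    # Skipping makes the staleness visible instead of silently reporting old data.
    return "skip_and_alert"
```

As an illustration, if the upstream is assumed to be ETL-Sales-Daily with a last run of 2026-03-29T02:15:00Z, a check at 05:00Z the same day would pass the gate, while a last run two days earlier would not.
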
| name | workspace | status | last_run | avg_duration_min | success_rate_7d |
|---|---|---|---|---|---|
| ETL-Sales-Daily | Sales Analytics | healthy | 2026-03-29T02:15:00Z | 22 | 100 |
| ETL-Inventory-Sync | Operations | failing | 2026-03-28T03:00:00Z | 45 | 40 |
| Finance-Consolidation | Finance | degraded | 2026-03-29T04:30:00Z | 68 | 85 |
| HR-Data-Refresh | HR Analytics | healthy | 2026-03-29T01:00:00Z | 8 | 100 |
| Marketing-Attribution | Marketing | healthy | 2026-03-29T05:00:00Z | 35 | 95 |
| ML-Feature-Pipeline | Data Science | degraded | 2026-03-28T06:00:00Z | 120 | 71 |
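
Since failures are currently discovered only through stale downstream reports, the status table could drive a proactive check. The sketch below is one possible shape, under stated assumptions: the `PipelineHealth` type, the `pipelines_needing_alert` function, and the 90% `ALERT_THRESHOLD` are hypothetical, and the rows are copied by hand from the table rather than read from a real monitoring API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineHealth:
    name: str
    status: str
    success_rate_7d: int  # percent of successful runs over the last 7 days


# Rows copied from the status table above.
PIPELINES = [
    PipelineHealth("ETL-Sales-Daily", "healthy", 100),
    PipelineHealth("ETL-Inventory-Sync", "failing", 40),
    PipelineHealth("Finance-Consolidation", "degraded", 85),
    PipelineHealth("HR-Data-Refresh", "healthy", 100),
    PipelineHealth("Marketing-Attribution", "healthy", 95),
    PipelineHealth("ML-Feature-Pipeline", "degraded", 71),
]

# Hypothetical threshold: alert when the 7-day success rate drops below this.
ALERT_THRESHOLD = 90


def pipelines_needing_alert(
    pipelines: List[PipelineHealth], threshold: int = ALERT_THRESHOLD
) -> List[str]:
    """Return names of pipelines that are not healthy or fall below the threshold."""
    return [
        p.name
        for p in pipelines
        if p.status != "healthy" or p.success_rate_7d < threshold
    ]
```

With the rows above, this flags ETL-Inventory-Sync, Finance-Consolidation, and ML-Feature-Pipeline. Sending the actual notification (email, chat webhook, paging) is left out, since the report does not say which channel is available.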