🔧 Pipeline Health Check
Assessment Report
Client...
March 29, 2026 11:18 AM
gaston@thepowermates.com
Overall Health Score: 64 (Grade: D)

Findings by severity: 1 CRITICAL, 3 HIGH, 2 MEDIUM, 1 LOW

total_pipelines: 6
healthy: 3
failing: 1
degraded: 2
avg_success_rate: 82%
total_activities: 42
avg_duration_min: 50

Findings (7)

ETL-Inventory-Sync failing: 60% failure rate
CRITICAL
Reliability

Pipeline has failed 3 of last 5 runs with connection timeout to on-premises SQL Server. Data freshness SLA breached by 26 hours.

→ Check on-premises data gateway health and network connectivity. Increase connection timeout to 120s. Add retry policy with exponential backoff.
Finance-Consolidation duration up 240% (3.4x baseline)
HIGH
Performance

Average duration increased from 20 min to 68 min (3.4x) over the past 2 weeks. Root cause: a new GL table with 4.2M rows was added without incremental load logic.

→ Implement incremental load pattern for GL table using watermark column (LastModifiedDate). Consider partitioning by fiscal period.
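
The watermark pattern recommended above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database, not the pipeline's actual implementation: the table names `gl_source`, `gl_target`, and `watermarks`, the `id` and `amount` columns, and both function names are hypothetical. Only the `LastModifiedDate` column comes from the finding. It assumes timestamps are stored as ISO 8601 strings in one consistent format, so that string comparison matches chronological order.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with hypothetical source/target tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gl_source (id INTEGER PRIMARY KEY, amount REAL, LastModifiedDate TEXT);
        CREATE TABLE gl_target (id INTEGER PRIMARY KEY, amount REAL, LastModifiedDate TEXT);
        CREATE TABLE watermarks (pipeline TEXT PRIMARY KEY, last_value TEXT);
        """
    )
    return conn


def incremental_load(conn, pipeline="Finance-Consolidation"):
    """Copy only rows modified since the stored watermark; return rows copied."""
    cur = conn.cursor()
    row = cur.execute(
        "SELECT last_value FROM watermarks WHERE pipeline = ?", (pipeline,)
    ).fetchone()
    old_mark = row[0] if row else "1900-01-01T00:00:00Z"

    # Fix the upper bound before copying, so rows that arrive mid-run
    # are picked up by the next run rather than skipped.
    new_mark = cur.execute("SELECT MAX(LastModifiedDate) FROM gl_source").fetchone()[0]
    if new_mark is None or new_mark <= old_mark:
        return 0

    window = (old_mark, new_mark)
    predicate = "LastModifiedDate > ? AND LastModifiedDate <= ?"
    copied = cur.execute(
        f"SELECT COUNT(*) FROM gl_source WHERE {predicate}", window
    ).fetchone()[0]

    # Upsert changed rows, then advance the watermark in the same transaction.
    cur.execute(
        "INSERT OR REPLACE INTO gl_target (id, amount, LastModifiedDate) "
        f"SELECT id, amount, LastModifiedDate FROM gl_source WHERE {predicate}",
        window,
    )
    cur.execute(
        "INSERT OR REPLACE INTO watermarks (pipeline, last_value) VALUES (?, ?)",
        (pipeline, new_mark),
    )
    conn.commit()
    return copied
```

After the first full load, each run touches only rows changed since the previous watermark, which is the mechanism expected to bring the duration back down.
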
ML-Feature-Pipeline intermittent OOM failures
HIGH
Reliability

Pipeline fails on 2 of 7 days with OutOfMemory errors during feature aggregation step. Memory usage spikes to 95% during window functions.

→ Increase Spark pool memory or split the aggregation into 2 stages. Add checkpointing between heavy transformations.
No retry policy on 4 of 6 pipelines
HIGH
Resilience

Only ETL-Sales-Daily and HR-Data-Refresh have retry policies configured. Others fail permanently on first error.

→ Add retry policies to all pipelines: 3 retries, exponential backoff starting at 5 minutes.
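
The recommended policy (3 retries, exponential backoff starting at 5 minutes) can be illustrated as below. In practice the orchestrator's own retry settings would normally be used. This sketch only shows the timing logic, and the function names, the doubling factor, and the injectable `sleep` parameter are assumptions made for the example.

```python
import time


def backoff_delays(retries=3, first_delay_s=300, factor=2):
    """Delay before each retry, in seconds: 5 min, 10 min, 20 min by default."""
    return [first_delay_s * factor ** i for i in range(retries)]


def run_with_retry(activity, retries=3, first_delay_s=300, sleep=time.sleep):
    """Call activity(); on failure wait and retry, re-raising after the last retry."""
    delays = backoff_delays(retries, first_delay_s)
    for attempt in range(retries + 1):
        try:
            return activity()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the original error
            sleep(delays[attempt])
```

Passing a different `sleep` callable keeps the wait injectable, so the policy can be exercised without actually pausing for minutes.
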
4 pipelines scheduled at same time window
MEDIUM
Performance

ETL-Sales-Daily, ETL-Inventory-Sync, HR-Data-Refresh, and Finance-Consolidation all run between 2:00-3:00 AM, competing for capacity.

→ Stagger schedules: Sales at 1:00 AM, HR at 1:30 AM, Finance at 2:00 AM, Inventory at 2:30 AM.
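
As a quick sanity check on the proposed stagger, the sketch below combines the recommended start times with the avg_duration_min values from the Pipelines table and lists any run windows that would still intersect. The `find_overlaps` helper is hypothetical, and average durations are only a rough proxy for actual run times.

```python
from datetime import datetime, timedelta

# Proposed start time and current average duration (minutes) per pipeline.
SCHEDULE = {
    "ETL-Sales-Daily": ("01:00", 22),
    "HR-Data-Refresh": ("01:30", 8),
    "Finance-Consolidation": ("02:00", 68),
    "ETL-Inventory-Sync": ("02:30", 45),
}


def find_overlaps(schedule):
    """Return pairs of pipelines whose expected run windows intersect."""
    windows = []
    for name, (start, minutes) in schedule.items():
        begin = datetime.strptime(start, "%H:%M")
        windows.append((begin, begin + timedelta(minutes=minutes), name))
    windows.sort()  # earliest start first

    overlaps = []
    for i, (_, end_a, name_a) in enumerate(windows):
        for begin_b, _, name_b in windows[i + 1:]:
            if begin_b < end_a:  # later pipeline starts before the earlier one ends
                overlaps.append((name_a, name_b))
    return overlaps
```

With the current 68-minute average, Finance-Consolidation would still be running when ETL-Inventory-Sync starts at 2:30 AM. The overlap disappears if the Finance duration drops to roughly 25 minutes, so the stagger appears to depend on the incremental-load fix landing first.
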
No alerting configured for pipeline failures
MEDIUM
Monitoring

Pipeline failures are only discovered when downstream reports show stale data. No proactive notification system.

→ Configure Teams/email alerts on pipeline failure events. Set up Data Activator rules for SLA monitoring.
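
The rule logic behind such alerts might look like the sketch below. It covers only the decision of when to alert. Delivery through Teams, email, or Data Activator is left out. The `build_alerts` function, the record layout, and the 24-hour freshness threshold are assumptions for illustration: the report does not state the actual SLA window, and `last_run` is treated here as the freshness timestamp.

```python
from datetime import datetime, timedelta, timezone


def build_alerts(runs, now, sla_hours=24):
    """Return alert messages for failing pipelines and stale data."""
    alerts = []
    for run in runs:
        # Normalize the trailing "Z" so fromisoformat parses it as UTC.
        last_run = datetime.fromisoformat(run["last_run"].replace("Z", "+00:00"))
        if run["status"] == "failing":
            alerts.append(f"{run['name']}: status is failing")
        if now - last_run > timedelta(hours=sla_hours):
            alerts.append(f"{run['name']}: last run older than {sla_hours}h")
    return alerts
```

Running a check like this on a schedule would surface failures before downstream reports go stale.
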
Marketing-Attribution missing dependency chain
LOW
Orchestration

Pipeline runs on fixed schedule without validating that upstream Sales data has refreshed. Occasionally produces reports with stale data.

→ Add pipeline dependency: trigger Marketing-Attribution after ETL-Sales-Daily completes successfully.
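
The dependency gate can be sketched as a small runner that skips a downstream pipeline unless its upstream succeeded in the same cycle. The `run_chain` function and the idea of modelling each pipeline as a callable returning True on success are assumptions for the example. A real orchestrator would express this through its own trigger or dependency settings.

```python
def run_chain(pipelines, dependencies):
    """Run pipelines in the given order, gating each on its upstream result.

    pipelines: dict of name -> callable returning True on success; an
        upstream must be listed before anything that depends on it.
    dependencies: dict of downstream name -> upstream name.
    Returns a dict of name -> "succeeded", "failed", or "skipped".
    """
    results = {}
    for name, task in pipelines.items():
        upstream = dependencies.get(name)
        if upstream is not None and results.get(upstream) != "succeeded":
            results[name] = "skipped"  # upstream missing, failed, or skipped
            continue
        results[name] = "succeeded" if task() else "failed"
    return results
```

Skipping rather than running on stale input means Marketing-Attribution never publishes a report built from an unrefreshed Sales dataset.
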

Recommendations

1. Fix ETL-Inventory-Sync immediately: SLA breach is active; check gateway connectivity
2. Implement incremental load for Finance GL table to reduce duration from 68 min to ~25 min
3. Add retry policies to all 4 pipelines missing them
4. Stagger overnight schedules to eliminate resource contention
5. Set up failure alerting via Teams and Data Activator
6. Increase Spark pool memory for ML-Feature-Pipeline or split aggregation step
7. Add pipeline dependency chain: Sales → Marketing

Pipelines

name                   workspace        status    last_run              avg_duration_min  success_rate_7d (%)
ETL-Sales-Daily        Sales Analytics  healthy   2026-03-29T02:15:00Z  22                100
ETL-Inventory-Sync     Operations       failing   2026-03-28T03:00:00Z  45                40
Finance-Consolidation  Finance          degraded  2026-03-29T04:30:00Z  68                85
HR-Data-Refresh        HR Analytics     healthy   2026-03-29T01:00:00Z  8                 100
Marketing-Attribution  Marketing        healthy   2026-03-29T05:00:00Z  35                95
ML-Feature-Pipeline    Data Science     degraded  2026-03-28T06:00:00Z  120               71
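
The summary figures at the top of the report can be cross-checked against this table. The report does not say how they were computed. The sketch below assumes a simple unweighted mean rounded to the nearest integer, which happens to reproduce the stated values. The `summarize` helper is hypothetical.

```python
from collections import Counter

# (name, status, avg_duration_min, success_rate_7d) from the Pipelines table.
PIPELINES = [
    ("ETL-Sales-Daily", "healthy", 22, 100),
    ("ETL-Inventory-Sync", "failing", 45, 40),
    ("Finance-Consolidation", "degraded", 68, 85),
    ("HR-Data-Refresh", "healthy", 8, 100),
    ("Marketing-Attribution", "healthy", 35, 95),
    ("ML-Feature-Pipeline", "degraded", 120, 71),
]


def summarize(rows):
    """Recompute the headline metrics from the per-pipeline rows."""
    n = len(rows)
    return {
        "total_pipelines": n,
        "status_counts": dict(Counter(status for _, status, _, _ in rows)),
        "avg_duration_min": round(sum(r[2] for r in rows) / n),
        "avg_success_rate": round(sum(r[3] for r in rows) / n),
    }
```

The unrounded means are 298 / 6 ≈ 49.7 minutes and 491 / 6 ≈ 81.8%, consistent with the reported 50 and 82%.
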