Findings (6)
Cost
KQL database 'RawTelemetry' has no retention policy (no soft-delete period) configured. At 12.5 GB/day, storage grows without bound and costs escalate.
→ Set a retention policy: .alter-merge table RawTelemetry policy retention softdelete = 90d. Archive older data to OneLake for cold queries.
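As a rough sketch of the retention fix, the commands below apply the 90-day soft-delete period and then read the policy back. This assumes RawTelemetry is the table name (as in the recommendation above) and that each management command is run separately. At 12.5 GB/day, a 90-day window holds about 12.5 × 90 = 1,125 GB of ingested data at steady state, before compression.

```kusto
// Set a 90-day soft-delete period on the table; other retention
// properties (such as recoverability) keep their current values.
.alter-merge table RawTelemetry policy retention softdelete = 90d

// Read back the policy now in effect (run as a separate command).
.show table RawTelemetry policy retention
```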
Performance
Event ingestion is configured for immediate flush (MaximumBatchingTimeSpan=00:00:00). This causes excessive small file writes and high CPU usage.
→ Set batching: MaximumBatchingTimeSpan=00:00:30, MaximumNumberOfItems=10000. This reduces write amplification by ~80%.
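One way the batching settings could be applied is through a table-level ingestion batching policy, sketched below. The time span and item count come from the recommendation; the MaximumRawDataSizeMB value of 1024 is an assumption added for completeness and is not part of the findings.

```kusto
// Seal a batch after 30 seconds, 10,000 items, or 1024 MB of raw data,
// whichever limit is reached first.
// MaximumRawDataSizeMB = 1024 is an assumed value, not from the findings.
.alter table RawTelemetry policy ingestionbatching @'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 10000, "MaximumRawDataSizeMB": 1024}'

// Read back the policy now in effect (run as a separate command).
.show table RawTelemetry policy ingestionbatching
```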
Reliability
Alert rules fire on every qualifying event, producing duplicate notifications. During a sensor failure, users received 847 alerts in 10 minutes.
→ Add suppression windows: minimum 15-minute cooldown between alerts for the same rule+entity combination.
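The effect of a 15-minute window can be approximated in a query, as sketched below. This is only an illustration, not the alerting product's own suppression setting: AlertEvents, RuleId, and EntityId are hypothetical names, and fixed 15-minute buckets are a simplification of a true cooldown that starts at the first alert.

```kusto
// Hypothetical table AlertEvents(Timestamp, RuleId, EntityId): one row per fired alert.
// Collapse alerts to one row per rule+entity per 15-minute bucket and
// count how many notifications would have been suppressed.
AlertEvents
| summarize FirstFired = min(Timestamp), Suppressed = count() - 1
    by RuleId, EntityId, bin(Timestamp, 15m)
```

Under this bucketing, a burst of alerts for one rule and entity inside a 10-minute span yields at most two rows (two only when the burst crosses a bucket boundary).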
Performance
Dashboard queries re-aggregate raw events by minute and hour on every execution, so the same computation is repeated thousands of times daily.
→ Create materialized views for 1-min and 1-hour aggregations: .create materialized-view HourlyAgg on table RawTelemetry { ... }
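A possible shape for the hourly view is sketched below. The view body is not given in the findings, so the DeviceId and Value columns and the chosen aggregates are hypothetical placeholders; only the view name, source table, and Timestamp column come from this report.

```kusto
// Hypothetical columns: DeviceId (string), Value (real).
// A materialized view is defined by a single summarize over the source table.
.create materialized-view HourlyAgg on table RawTelemetry
{
    RawTelemetry
    | summarize EventCount = count(), AvgValue = avg(Value)
        by DeviceId, bin(Timestamp, 1h)
}
```

A 1-minute view would follow the same pattern with bin(Timestamp, 1m). Without a backfill option, a view created this way covers only records ingested after its creation.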
Reliability
Single-region deployment with no follower database or backup strategy. A region outage would cause complete data loss for in-flight events.
→ Configure follower database in a secondary region. Set up continuous export to OneLake as additional backup.
Performance
3 of 5 dashboard tiles use full table scans without time filters. Query latency averages 8.2 seconds.
→ Add time range filters (where Timestamp > ago(1h)) to all dashboard queries. Use query hints for better partition pruning.
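For illustration, a tile query with the recommended time filter might look like the sketch below. The per-minute count is an assumed example aggregation; the point is that the where clause on Timestamp comes first, so only recent data is scanned.

```kusto
// Restrict the scan to the last hour before aggregating.
RawTelemetry
| where Timestamp > ago(1h)
| summarize EventCount = count() by bin(Timestamp, 1m)
| order by Timestamp asc
```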
Recommendations
1. Set 90-day retention policy on RawTelemetry (critical for cost control)
2. Configure ingestion batching (30s window, 10K items) to reduce write amplification
3. Create materialized views for minute/hour aggregations
4. Add 15-minute suppression windows to Data Activator alert rules
5. Set up follower database in secondary region for DR
6. Optimize dashboard queries with time range filters