Telltale: Netflix Application Monitoring Simplified

Our teams at Netflix need to detect, diagnose, and remediate problems quickly. The Telltale application health model yields intelligent monitoring and intelligent alerting. Netflix service owners get alerts they can trust with little configuration and no need for constant tuning. When health problems strike, Telltale presents only the most relevant context and suggests possible causes.

An alert fires and you get paged in the middle of the night. A metric crossed a threshold. You’re half awake and wondering, “Is there really a problem, or is this just an alert that needs tuning? When was the last time somebody adjusted our alert thresholds? Maybe it’s due to an upstream or downstream service?” This is a critical application, so you drag yourself out of bed, open your laptop, and start poring over dashboards for more info. You’re not yet convinced there’s a real problem, but you’re also aware that the clock is ticking as you dig through a mountain of data looking for clues.

Source: netflixtechblog