This session, titled “Next generation monitoring: moving beyond Nagios” by Jenni Snyder and Josh Snyder at Yelp, was my first session after lunch, which is always a tough time slot for speakers. It covered how they migrated from Nagios to Sensu. Sensu is great because it is specialized, versus Nagios, which does a lot of things well. The session started with an overview of some of the pains associated with Nagios at Yelp. The big pain points were related to scaling and a lack of HA. There were also pains associated with the Nagios GUI. One issue: if you acknowledge a warning, the alert will not fire again if it then becomes critical.
Sensu uses the Nagios plugin interface, so existing Nagios checks work in Sensu. Sensu uses RabbitMQ to decouple the clients from the server. You can use standalone checks (the agent decides when to run them) or server-directed checks. Yelp uses Nagios, and then Sensu, to monitor Sensu. They deploy Sensu using Puppet and deploy each component three times, with HAProxy handling failover. They also use Puppet to configure the checks in Sensu.
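Since the plugin interface is what makes existing checks portable, here is a minimal sketch of what such a check looks like. Under the Nagios plugin convention a check prints one status line and exits with 0 (OK), 1 (warning), 2 (critical), or 3 (unknown). The disk-usage check itself, the function names, and the 80/90 percent thresholds are my own illustration, not something shown in the session.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to (exit_code, one-line status message).

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if used_percent >= crit:
        return CRITICAL, f"CRITICAL - disk {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"WARNING - disk {used_percent:.1f}% used"
    return OK, f"OK - disk {used_percent:.1f}% used"


def main(path="/"):
    """Run the check against `path`, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        print(f"UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    used_percent = 100.0 * usage.used / usage.total
    code, message = evaluate_disk(used_percent)
    print(message)
    return code


# When installed as a check script, finish with:
#     import sys; sys.exit(main())
```

The exit code is what the monitoring system acts on; the printed line is only what a human sees in the alert. That is why a script like this should run under either Nagios or Sensu without changes.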
Yelp uses Sensu for host-based condition monitoring, not for graphing. They have A LOT of checks!!! They like the tight integration between Sensu and Puppet. Sensu only runs checks, generates events, and then calls handlers, so it does a lot less than Nagios, but it does it really well. Yelp has open sourced several of their handlers. Handlers include PagerDuty, JIRA, IRC announcements, and email. They also have handlers for Graphite, aws_prune, and OpsGenie. You can stash Sensu events or a host to disable checks for maintenance or reboots. They use Uchiwa as a GUI for Sensu. They also shared some information on how they build parameterization into checks, which they basically achieve with config files and symlinks.
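To make the check, event, handler flow a bit more concrete, here is a rough sketch of a handler. It assumes Sensu's pipe-style handlers, which receive the event as JSON on standard input. The field names (`client.name`, `check.name`, `check.status`, `check.output`) reflect the Sensu event format as I understand it, so verify them against your Sensu version. This is not one of Yelp's handlers, and the one-line message format is made up.

```python
import json
import sys

# Check status codes follow the Nagios plugin convention.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def format_event(event):
    """Turn a Sensu-style event dict into a one-line notification string.

    The key names are assumptions about the event layout; missing keys
    fall back to placeholder values instead of raising.
    """
    client = event.get("client", {}).get("name", "unknown-client")
    check = event.get("check", {})
    name = check.get("name", "unknown-check")
    status = STATUS_NAMES.get(check.get("status"), "UNKNOWN")
    output = str(check.get("output", "")).strip()
    return f"{status}: {client}/{name}: {output}"


def handle(stream):
    """Read one JSON event from `stream` and return the formatted message."""
    return format_event(json.load(stream))


def main():
    # A real handler would send this to IRC, email, PagerDuty, and so on.
    print(handle(sys.stdin))


# When installed as a pipe handler, finish with:
#     main()
```

Keeping the formatting separate from the delivery is a design choice of this sketch: the same `format_event` function could feed an IRC announcement or an email body.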
Overall, a good session. We are pretty happy with Zenoss right now, but with our migration to Puppet, Sensu may become a viable replacement.