There are features in software which you learn to appreciate the very moment you understand why they are there. Until now, running Nagios internally to monitor the availability of server hardware and processes, we had disabled a feature named flap detection which, to cut things short, “recognizes” whenever a service seems to change its state too quickly (in other words, is “flapping”). What’s the point? A while ago, in the course of integrating Nagios into our environment, we disabled flap detection after we missed some messages about services being left in the “flapping” state; we decided we would rather be notified every time a service goes down or comes back up.
Why is this a bad idea? See for yourself, and pay particular attention to the Date column in my inbox this morning:
This is a pretty good example of what flapping means in practice: a very effective way to fill up your mailbox or, worse, the text message store of your cell phone. So, let’s have a look at how to set up flap detection properly.
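To make "changing state too quickly" a bit more concrete, here is a minimal Python sketch of the general idea: keep a sliding window of recent check results, work out what percentage of the possible transitions were actual state changes, and compare that against a high and a low threshold. This is a simplified illustration, not Nagios code. The class name `FlapDetector`, the window size and the threshold values are all assumptions made up for this example, and as far as I understand the Nagios documentation, the real calculation additionally gives more weight to recent state changes, which this sketch leaves out.

```python
from collections import deque


class FlapDetector:
    """Simplified, unweighted flap detection over a sliding window of check results.

    All names and default values here are illustrative assumptions.
    """

    def __init__(self, window=21, low=20.0, high=30.0):
        self.history = deque(maxlen=window)  # most recent check results
        self.low = low      # below this percentage, flapping is considered over
        self.high = high    # at or above this percentage, flapping starts
        self.flapping = False

    def percent_state_change(self):
        states = list(self.history)
        if len(states) < 2:
            return 0.0
        # Count how often two consecutive check results differ.
        changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
        # Relate that to the number of transitions a full window can hold.
        return 100.0 * changes / (self.history.maxlen - 1)

    def record(self, state):
        """Store a new check result and return the current flapping status."""
        self.history.append(state)
        pct = self.percent_state_change()
        if not self.flapping and pct >= self.high:
            self.flapping = True
        elif self.flapping and pct < self.low:
            self.flapping = False
        return self.flapping


# A service alternating between OK and CRITICAL on every check
detector = FlapDetector()
for i in range(21):
    detector.record("OK" if i % 2 == 0 else "CRITICAL")
print(detector.flapping)  # prints True
```

The two separate thresholds are what keeps the flapping status itself from flapping: a service has to calm down well below the level that triggered the status before it is considered stable again. While a service is marked as flapping, its individual up and down notifications can be held back, which is exactly what would have saved my inbox this morning.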