My interest here is to help determine whether an OS issue is causing the apparently longer-duration outages.
I don’t ping the device, but I am sending messages to it 24×7 at a rate of about one every 30s. Since I don’t see these extended outages, only occasional missed messages, it seems unlikely that the outages are the result of the OS being busy doing its normal tasks. It seems more likely that the OS network stack is becoming ‘busy’ because it is not handling some combination of events correctly.
We know that two people who are using tools that ‘ping’ the OS at some rate see similar symptoms. It seems useful to understand what these tools are doing in more detail.
A quick look at the Domotz FAQ:
Q: What is the difference between a missed Heartbeat and a device being Down?
A: The way the monitoring of devices work is that we send six (6) pings every 30 seconds to each device on the network. If one or more of those pings are returned, then we count this as a Heartbeat. A device can “miss” three Heartbeats without being marked as Down. If a device doesn’t return any of the pings within a two (2) minute period, then the device is marked as Down. If the device starts to respond to pings again, it gets a new Heartbeat and is marked as Up again.
Putting aside the question of why you’d want to send 6 pings every 30 seconds to every device in a home network, this info provides some view into what Domotz is doing. A quick look at PRTG didn’t yield as much insight.
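To make the FAQ rule concrete, here is a rough Python model of it as I read it. This is only my interpretation, not Domotz code: I'm assuming "two minute period" means four consecutive 30-second rounds with no replies, and the names (device_states, heartbeat) are made up for illustration.

```python
from typing import Iterable, List

ROUND_SECONDS = 30        # FAQ: one batch of six pings every 30 seconds
DOWN_AFTER_SECONDS = 120  # FAQ: no replies within two minutes means Down


def heartbeat(replies: int) -> bool:
    """A round counts as a Heartbeat if at least one ping was answered."""
    return replies >= 1


def device_states(replies_per_round: Iterable[int]) -> List[str]:
    """Return "Up" or "Down" after each round of pings.

    Assumption: Down is declared once four consecutive rounds
    (120s / 30s) have produced no Heartbeat, so three misses are tolerated.
    """
    rounds_to_down = DOWN_AFTER_SECONDS // ROUND_SECONDS
    missed = 0
    states = []
    for replies in replies_per_round:
        if heartbeat(replies):
            missed = 0  # any reply gives a new Heartbeat and resets the count
        else:
            missed += 1
        states.append("Down" if missed >= rounds_to_down else "Up")
    return states
```

For example, device_states([1, 0, 0, 0, 0, 0, 2]) stays "Up" through three missed rounds, reports "Down" on the fourth and fifth, and returns to "Up" as soon as a reply arrives. If this reading is right, a device has to ignore at least 24 pings in a row before Domotz marks it Down.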
The common symptom seems to be that sending regular pings to the OS causes it to stop responding to pings for a variable period of time. I’ve started an experiment where I’m sending regular pings to my OS (one every 30s). The ‘ping’ tool logs the success or failure of each ping. My automation app is also running and collecting its own statistics. If there are ping failures, it will be interesting to see 1) whether I also see extended outages, and 2) whether my app shows missed messages that correspond with the ping failures.
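For anyone who wants to run the same kind of test, here is a minimal Python sketch of the idea. It is not the tool I'm using. The ping flags assume Linux iputils (-c for count, -W for timeout in seconds) and will differ on Windows or macOS, and all function names and the example address are hypothetical.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, bool]        # (unix timestamp, ping succeeded)
Outage = Tuple[float, float, int]  # (first failure, last failure, failures)


def ping_once(host: str) -> bool:
    """Send one echo request with the system ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def collect(host: str, count: int, interval: float = 30.0,
            probe: Callable[[str], bool] = ping_once,
            sleep: Callable[[float], None] = time.sleep,
            clock: Callable[[], float] = time.time) -> List[Sample]:
    """Ping `host` `count` times, `interval` seconds apart, logging each result."""
    samples = []
    for i in range(count):
        ok = probe(host)
        samples.append((clock(), ok))
        if i < count - 1:
            sleep(interval)
    return samples


def outages(samples: Sequence[Sample]) -> List[Outage]:
    """Group consecutive failed pings into outage runs."""
    runs = []
    start = last = 0.0
    failed = 0
    for ts, ok in samples:
        if not ok:
            if failed == 0:
                start = ts
            last = ts
            failed += 1
        elif failed:
            runs.append((start, last, failed))
            failed = 0
    if failed:  # log ended while still failing
        runs.append((start, last, failed))
    return runs


def missed_during(runs: Sequence[Outage], missed: Sequence[float],
                  slack: float = 30.0) -> List[float]:
    """Return the app's missed-message timestamps that fall within an outage."""
    return [t for t in missed
            if any(s - slack <= t <= e + slack for s, e, _ in runs)]
```

Something like outages(collect("192.168.1.50", 240)) would cover two hours at one ping every 30s. A run with many failures would be an extended outage (question 1), and missed_during lines those runs up against the app's missed-message times (question 2).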
If I don’t see any issues after a few hours, I’ll increase the ping rate.
I’ll keep you posted.