I just posted a rather lengthy response based on my network sniffing efforts but it seems to have disappeared… If it reappears, my applogies for any duplication.
In sumamry, the problem appears to be with the ‘type’ parameter in the ‘jl’ command. The command being sent to the controller to view log entries is of the form:
GET /jl?pw=a6d82bced638de3def1e9bbb4983225c&type=wl&start=1517662800&end=1518353940&_=1518315941796 HTTP/1.1..Host: <controller IP address>..Accept-Encoding: gzip, deflate..Connection: keep-alive..Accept: applicat
I’ve never used the ‘type’ parameter in my API requests, so I’ve never encountered the problem there, but if I include “type=wl” in any API Command I send to my problem controller, I end up with the same result: no log entries. Leave out the “type=wl” parameter, and log entries appear. Any other value for ‘type’ is fine too, it’s only “type=wl” that’s a problem.
As noted, this is only a problem for one of my (five) controllers, but this is the most heavily configured unit, currently also with the largest log. The others have only been in operation for about four days, they are not as heavily configured, and they haven’t yet accumulated many log entries.
When I monitor the packet stream on the network, I can see the request (as above) sent to the controller (whether via the web interface or an API request that includes the “type=wl” parameter), I can see the controller acknowledge receipt of the relevant packet, but then nothing for about 10 seconds. Then the controller sends an ARP request to itself, then an IGMP Group Membership report. My Mac then sends a string of ARP requests to the controller until it responds. Provided the web interace isn’t hung up waiting for the never-to-be-received log entires, the controller then continues chattering as it appears to do under normal circumstances, sending perioidic status updates.
But the display log request appears to have been lost, which is not surprising if the controller has somehow lost its network connection. This is what appears to have happened, for some reason. It recovers OK, and fairly quickly, but not without loss of the connection that was being used for the disply log request. Unfortunately, the browser software doesn’t timeout or pick up the fact that the connection has been lost and just sits there waiting.
The problem controller does hang off a WiFi range extender, but so too does another (off the same extender) that is operating without any problem.
It might be tempting to suggest that this is a hardware problem, controller or network, since only one of my controllers is showing the fault. But all I have to do is remove the “type=wl” parameter from any request and it works as expected. And to date it is only ‘jl’ requests that include the “type=wl” parameter that trigger the connection failure and subsequent ARP requests.
But it’s pretty flakey. Even they ‘jl’ requests that include “type=wl” don’t always fail on the problem controller… Up until this morning, they were fine as long as the date range stayed within 7 days. But something [else?] happened today, and here we are.