Forum Replies Created
-
AuthorPosts
-
PeteParticipantIf all things in life could be so simple! That worked just fine.
Thanks.
PeteParticipantThat would be 5mm longer, wouldn’t it? 😉
PeteParticipantSorry, if I’d searched a little more widely first, I might have noticed that the following post (https://opensprinkler.com/forums/topic/flow-meter-monitoringlogging/) seems to answer my question.
PeteParticipantMy apologies if this post is a duplicate… The original post seems to have been deleted when I edited it to be notified of follow-up replies…
I’ve just hooked up a 3-wire [Nymet] flow sensor to one of my controllers (2.1.7(2) firmware). All of my controllers are v3.0 hardware and I’m assuming that 3-wire sensors are now ‘fully’ supported with that platform because it includes external connectors for a +5V supply as well as GND and two sensor inputs (SN1 & SN2). Having read that SN2 is not currently used, I have connected my flow sensor to SN1 (as well as +5V/GND).
My flow sensor is connected to the controller that is located next to my pump, and when that controller opens a valve, appropriate messages appear in the log. There is a problem displaying log messages since installing the flow sensor, but I’ll raise that matter in another post.
The main issue I have at the moment is that the flow sensor doesn’t seem to trigger any events if there’s no program running on the local controller. If valves connected to any of my other controllers are opened, and the pump starts running as a result, the flows do not appear to be recorded.
Is that the way things are meant to work? I can reason that it may not be desirable to be writing a constant stream of log messages, but it would be of value to be writing messages when non-zero flows were detected.
My application is a rural one and, apart from monitoring normal pump output regardless of whether or not the local controller is running a program, I was also hoping to use this data to detect, and ultimately [automatically] respond to, abnormal events, like a blown hose joint or a burst pipe.
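As a rough illustration of the kind of check described above, here is a minimal sketch that flags a sustained jump in flow rate. It assumes flow readings are already available as a list of numbers in whatever unit the sensor reports; the function name, the `expected`, `tolerance` and `min_run` parameters, and the sample values are all hypothetical and are not part of the OpenSprinkler firmware.

```python
def abnormal_flow(samples, expected, tolerance=0.25, min_run=3):
    """Return True if at least min_run consecutive samples exceed
    expected * (1 + tolerance).

    All parameters are hypothetical tuning values for illustration.
    """
    limit = expected * (1 + tolerance)
    run = 0
    for rate in samples:
        # Extend the run while readings stay above the limit, else reset it.
        run = run + 1 if rate > limit else 0
        if run >= min_run:
            return True
    return False


normal = abnormal_flow([20, 21, 19, 20], expected=20)
burst = abnormal_flow([20, 31, 33, 35, 34], expected=20)
print(normal, burst)
```

Requiring several consecutive high readings is one way to avoid reacting to a single noisy sample; a real detector would need thresholds tuned to the actual pump and pipework.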
PeteParticipantSorry if this is a dumb newbie question…
I think I managed to get everything to compile OK:
Linking /tmp/mainArduino/mainArduino.bin
Versions: , 2.2.0-dirty
Memory usage
Ram: 41053 bytes
Flash: 328683 bytes
Build complete. Elapsed time: 22 seconds
I can see a whole stack of stuff in the /tmp/mainArduino directory, but how do I get whatever needs to be onto the OpenSprinkler? I ran the makefile from within the Arduino/libraries/OpenSprinkler directory, and I did specify what I believe to be the correct UPLOAD_PORT. Was the image supposed to get uploaded as part of the make process?
I didn’t want to just play around at this point because I didn’t want to trash whatever was currently on the OpenSprinkler without some level of confidence that I’d be able to put something functional back.
PeteParticipantYep, I pretty much worked that out between discovering that only some of my controllers would do the ‘Access Point restart’, the comment in the other thread that the problem had been identified and fixed, and the fact that one of my controllers was now obviously sending out more than 4k of log data.
I have indeed swapped the problem controller for one with the new software and put the older controllers on the lines with the least load. That should at least keep the problem at bay until I can get them upgraded.
I had already downloaded the code and was intending to play with that in the future. I just wanted everything working properly and stable before I started fiddling. It’s probably even more useful that I have controllers with the new code because I can leave them alone for the time being as a reference and just fiddle with the older ones.
Thanks for your help though. As usual, you learn a lot more about things when they go wrong than when they don’t…
PeteParticipantAnd maybe I should have gone back and searched a bit further as this saga unfolded, because it appears that the problem has both been identified and fixed, and we are just waiting on the firmware ‘release’… (https://opensprinkler.com/forums/topic/restart-when-i-read-the-log/)
PeteParticipantI’m not sure if we shouldn’t be starting a new thread for this query but I’ll ask it here anyway…
Your suggestion of how to start the controller up in Access Point mode “by pressing Button 2 while holding down Button 3” only works on two of my controllers. I note that these are the two with the highest ‘serial numbers’/hardware addresses (B8BB62 and B8D9B0).
If I try this on any of my other three controllers (081153, 08259B and B8B8B8) I just get the option to manually start a program. Am I just coincidentally doing something wrong in these latter cases, or has something changed with the higher serial numbers? If the latter, is there a way to upgrade older controllers to be able to do this? Or, alternatively, is there another way to start the older controllers up in Access Point mode?
PeteParticipantThe only time I’ve seen the controller reset is when I send a log request. I had it running on my desktop for several days before I actually installed it, connected first to my core WiFi network, then through the WiFi extenders. Never once did I see it ‘spontaneously’ reset.
Now, I can stand in front of the controller, which I am yet to ever see ‘spontaneously’ reset, and trigger a reset simply by entering a View Log request from my iPad or laptop. Every time. 100% consistency. No reset at any other time. Nonetheless, I changed the power supply to a 3A model, and I see exactly the same results.
Having said all that though, one of my other controllers has now accumulated enough log entries to fill a 4k buffer (if we were suspecting this to be the problem), and I can see that it is sending out what looks to be a second buffer load of data without any problem (I can view its log without that controller’s resetting). This controller is also configured with an expansion module, but as yet I have no stations connected to the expansion module.
I’ll wait a couple more days and see if this second controller is still displaying its log OK, then I’ll try swapping the two controllers over and see what happens. It will then take another week or so to fill up the logs sufficiently to see if any of our problems reappear. Stay tuned…
PeteParticipantGiven the original commentary in this thread, I can’t believe I overlooked the fact that the controller might have been resetting (it is out in the yard, some way away from where I am normally controlling things though)! This is indeed why it’s losing its network connection.
At the moment, I can quite consistently cause a controller reset by attempting to view the log from my iPad. It usually resets two or three times before it comes back again, [based on the OLED display messages] resetting sometimes when connecting to the network and sometimes when the “NTP syncing…” message is displayed.
I have also managed to ‘step through’ all of the type=wl messages in the log by requesting suitably small date ranges. Whether it’s the number of log messages or something else, I can typically only retrieve a maximum of about 195 log entries before I cause a reset.
The type doesn’t seem to matter now. The only reason that type=wl messages seemed to be the cause is that there were always more of them. I can now (with another day’s log messages in store) reliably cause a reset simply by doing a normal (no type parameter) jl request through the API; I can no longer retrieve all of my ‘normal’ log messages in a single API request. The date range that I can use for a successful request does seem to be a lot smaller if I use the type=wl parameter though, so I’m not sure where that leaves us with regard to all API requests being treated in much the same way.
Just for the record, the RSSI values for my five devices are: -51 (problem controller), -70, -75, -21, -69.
And yes, I could unplug everything and just bring the controller into the house, but this raises another question I’ve had. My WiFi extenders use discrete SSIDs, that are different to my main WiFi network. Is there any way to get the OpenSprinkler to pick up a different SSID than the one nominated when it first starts up?
But I should point out that I have taken both my iPad and my laptop out to the controller, connected through the same WiFi extender to the controller, and observed the described problem. I’m afraid I’m not familiar enough with WiFi networks to know what actually happens with a WiFi extender, but I would expect ‘local’ traffic to stay local (i.e. I would expect that my iPad or laptop will simply communicate through the extender, like a local hub or switch, to the controller and that packets will not be sent all the way back to the router, not because it’s a router per se, but because it is the ‘core’ of the WiFi network).
Just picking up on one final point, you comment that the ESP8266 buffer size is 4k. I can see that the largest log requests that are successfully serviced fill just under three ethernet packets… that’s a little under 3 x 1518 (less headers), which is getting pretty close to 4k, or just one buffer of data…
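The arithmetic behind that estimate can be written out as a small sketch. The header sizes are assumptions that do not come from the capture itself: a standard Ethernet frame with an 18-byte header plus frame check sequence, and IPv4 and TCP headers of 20 bytes each with no options.

```python
# Rough payload arithmetic for "just under three Ethernet packets".
# Assumed, not measured: standard Ethernet framing, 20-byte IPv4 and
# 20-byte TCP headers with no options.
ETH_FRAME_MAX = 1518      # maximum frame size quoted in the post
ETH_OVERHEAD = 14 + 4     # Ethernet header + frame check sequence
IP_HEADER = 20
TCP_HEADER = 20
BUFFER = 4 * 1024         # the 4k buffer size mentioned in the thread

payload_per_frame = ETH_FRAME_MAX - ETH_OVERHEAD - IP_HEADER - TCP_HEADER
three_frames = 3 * payload_per_frame

print(payload_per_frame)        # 1460
print(three_frames)             # 4380
print(three_frames - BUFFER)    # 284
```

Under these assumptions, three completely full frames would carry slightly more than 4096 bytes, so a response that fills "just under" three frames is in the same range as a single 4k buffer.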
PeteParticipantYep, OK, when things work there are indeed three API commands as you suggest. The first, type=wl command returns a stack of records of the form [0,”wl”,100,1517618908], many more than the number of log messages returned by the ‘no type’ command. The second, type=fl command result is empty (just “[]”), and the third ‘no type’ command returns the log entries as I posted previously.
When things don’t work, the problem occurs with the first, type=wl command and never proceeds beyond that. If I use the API directly, the type=wl command hangs, as described previously, the type=fl command just returns the empty set “[]”, and the ‘no type’ command returns the log entries.
The problem certainly seems to be with the controller and/or its software, noting that the only thing that results in the ‘hang’ or ‘network disconnect’ is a jl command that includes the “type=wl” parameter.
The signal strength is fine as measured by the OS X Wireless Diagnostics tool—remember there are two controllers running off the same extender, and the one that is failing is only a couple of metres away from the extender (the other one is 10-15 metres away). Ping delay times are typically 5-10ms.
Unfortunately it’s not practical to move the controller anywhere else, because that’s where all the control wires come in. Remember, there’s another controller, using exactly the same network path, that is working just fine. But yes, pretty much the only thing left to do is to swap in another controller. But I’ll wait a couple more days and see if the same problem occurs with the other controllers when their logs fill up a little.
Now, in running those API tests above, I noted that there are many, many more “type=wl” entries than there are ‘regular’ log entries returned. In one of my controllers there are just four log entries returned, but 105 “wl” records. In another there are 62 log entries and 189 “wl” records. So just on a hunch, I reduced the Start time on the jl command I sent to the problem controller right down so that the command returned only a couple of log entries and the API “type=wl” command worked! There does not, however, seem to be any way of setting the Start date on the web interface without first initiating the View Log command, and that results in the ‘hang’.
So it seems that the problem is that, on the controller that is causing the problem, there are too many log entries in the default, one week period. This would be entirely consistent with the previous observation that I could not retrieve more than 7 days worth of log entries. The 7 days may well just be coincidental and it’s really the number of log entries that is the limiting factor, although they will pretty much increase, perhaps not exactly linearly, with the number of days. The problem is not the log entries themselves though, because we can retrieve them through the API. The real problem seems to be the number of entries generated by the “type=wl” parameter because, if we reduce the Start time sufficiently, the jl command will work with the “type=wl” parameter.
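To compare how many records of each kind a controller returns, the JSON bodies can be tallied with a few lines of code. This is only a sketch: the first record below has the shape quoted earlier in this post, but the remaining records are made up, and records from separate requests have been combined into one list for brevity. The label "station" for records whose second field is a number is my own shorthand, not an API term.

```python
import json
from collections import Counter

# Illustrative response body. Only the first record's shape comes from the
# thread; the other records are invented for the example.
body = '[[0,"wl",100,1517618908],[0,"wl",100,1517622508],[1,3,600,1517630000]]'


def count_by_kind(records):
    """Count records whose second field is a string tag (e.g. "wl")
    separately from those whose second field is a number."""
    counts = Counter()
    for rec in records:
        tag = rec[1]
        counts[tag if isinstance(tag, str) else "station"] += 1
    return counts


counts = count_by_kind(json.loads(body))
print(dict(counts))   # {'wl': 2, 'station': 1}
```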
PeteParticipantI just posted a rather lengthy response based on my network sniffing efforts but it seems to have disappeared… If it reappears, my apologies for any duplication.
In summary, the problem appears to be with the ‘type’ parameter in the ‘jl’ command. The command being sent to the controller to view log entries is of the form:
GET /jl?pw=a6d82bced638de3def1e9bbb4983225c&type=wl&start=1517662800&end=1518353940&_=1518315941796 HTTP/1.1..Host: <controller IP address>..Accept-Encoding: gzip, deflate..Connection: keep-alive..Accept: applicat
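For reference, the start and end values in a request like this appear to be Unix timestamps in seconds, so the requested range can be decoded with a short sketch. The password hash is replaced by a placeholder, the times are rendered as UTC, and the controller's own time zone handling is not considered.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# Request target copied from the capture above, with the password hash
# replaced by a placeholder.
target = "/jl?pw=PW_HASH&type=wl&start=1517662800&end=1518353940&_=1518315941796"

params = parse_qs(urlsplit(target).query)
start = int(params["start"][0])
end = int(params["end"][0])


def to_utc(ts):
    """Render a Unix timestamp (seconds) as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


start_utc = to_utc(start)
end_utc = to_utc(end)
span_days = (end - start) / 86400

print(params["type"][0], start_utc, end_utc, round(span_days, 2))
```

Decoded this way, the range runs from 2018-02-03 13:00 UTC to 2018-02-11 12:59 UTC, which is one minute short of eight full days.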
I’ve never used the ‘type’ parameter in my API requests, so I’ve never encountered the problem there, but if I include “type=wl” in any API Command I send to my problem controller, I end up with the same result: no log entries. Leave out the “type=wl” parameter, and log entries appear. Any other value for ‘type’ is fine too, it’s only “type=wl” that’s a problem.
As noted, this is only a problem for one of my (five) controllers, but this is the most heavily configured unit, currently also with the largest log. The others have only been in operation for about four days, they are not as heavily configured, and they haven’t yet accumulated many log entries.
When I monitor the packet stream on the network, I can see the request (as above) sent to the controller (whether via the web interface or an API request that includes the “type=wl” parameter), I can see the controller acknowledge receipt of the relevant packet, but then nothing for about 10 seconds. Then the controller sends an ARP request to itself, then an IGMP Group Membership report. My Mac then sends a string of ARP requests to the controller until it responds. Provided the web interface isn’t hung up waiting for the never-to-be-received log entries, the controller then continues chattering as it appears to do under normal circumstances, sending periodic status updates.
But the display log request appears to have been lost, which is not surprising if the controller has somehow lost its network connection. This is what appears to have happened, for some reason. It recovers OK, and fairly quickly, but not without loss of the connection that was being used for the display log request. Unfortunately, the browser software doesn’t time out or pick up the fact that the connection has been lost and just sits there waiting.
The problem controller does hang off a WiFi range extender, but so too does another (off the same extender) that is operating without any problem.
It might be tempting to suggest that this is a hardware problem, controller or network, since only one of my controllers is showing the fault. But all I have to do is remove the “type=wl” parameter from any request and it works as expected. And to date it is only ‘jl’ requests that include the “type=wl” parameter that trigger the connection failure and subsequent ARP requests.
But it’s pretty flaky. Even the ‘jl’ requests that include “type=wl” don’t always fail on the problem controller… Up until this morning, they were fine as long as the date range stayed within 7 days. But something [else?] happened today, and here we are.
PeteParticipantI pulled the records via the API (http://os_ip/jl?hist=21) and just cut & pasted them from the browser into the file that I forwarded.
I’ve looked at the Developer Tools under both Safari and Chrome and I can’t really see anything that seems relevant. Chrome throws up a warning about an SSL certificate that won’t be trusted in some future release, but there’s nothing in the Safari log.
And after all the fiddling around, the OpenSprinkler unit in question will now not show any log entries, regardless of the date range. I’ve even gone and power cycled the unit, but still no log display (at all), although everything’s still accessible via the API. I’m running five units and the other four are displaying their logs on request (same computer, same browser, just different tabs; same on an iPad using the App) just fine, although they only have five days of entries at the moment.
I have some network sniffer software somewhere, so I’ll see if I can dig that up and see if the network traffic provides any hint as to what might be happening.
Note that all of these are just OS V3.0 hardware, no OSPis.
PeteParticipantThanks. Log file attached.
I’ve just fiddled a little more with the display date range. I don’t think the Rain Delay entry has anything to do with anything. What I’ve discovered I am able to do now is display only up to a 7 day period. If I try to reset the Start date, effectively extending the period beyond 7 days, I get the spinning disc. However, if I alter the End date first, bringing it back so that the range is less than 7 days, say just 5, the shorter log is displayed OK. If I then alter the Start date to be no more than 7 days before the End date I get 7 days’ worth of log. If I try to set the Start date beyond 7 days before the End date, I once again get the spinning disc.
This seems to work consistently. If I steadily adjust the End date, then the Start date so that the two are never more than 7 days apart, I can get back to the start of the log.
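The same stepping can be done against the API rather than the web interface. Below is a minimal sketch that walks backwards through a time range in windows of at most 7 days and builds one jl request URL per window. The `start` and `end` parameters are the ones seen in requests from the web interface; the helper names, the `os_ip` host and the `PW_HASH` value are placeholders, and the 7-day limit is just the figure observed here, not a documented constraint.

```python
from urllib.parse import urlencode

DAY = 86400  # seconds in a day


def windows_back(first_ts, last_ts, max_days=7):
    """Yield (start, end) pairs walking back from last_ts to first_ts,
    each spanning at most max_days."""
    end = last_ts
    while end > first_ts:
        start = max(first_ts, end - max_days * DAY)
        yield start, end
        end = start


def jl_url(host, pw_hash, start, end):
    """Build a jl request URL; host and pw_hash are placeholders."""
    query = urlencode({"pw": pw_hash, "start": start, "end": end})
    return f"http://{host}/jl?{query}"


# Example: a 16-day log is split into windows of 7, 7 and 2 days.
spans = list(windows_back(0, 16 * DAY))
for s, e in spans:
    print(jl_url("os_ip", "PW_HASH", s, e))
```

Adjacent windows share a boundary timestamp, so a record logged at exactly that second could be returned twice; de-duplicating on the full record would take care of that.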
PeteParticipantI just let things go (i.e. didn’t delete log entries or anything) after my last post (Jan 22) and then, a couple of days later, the log entries started displaying again. Now I can display the last 7 days’ entries (the default), but if I try to display any more (i.e. extend the start date back to Jan 23, when I last cleared the log, or to anything more than 7 days before the current date; anything less than 7 days is fine) all I get is the grey spinning wheel. They can all be accessed via the API OK, so there are more [than just 7 days’ worth] there.
If I switch from Timeline to Table display mode, the Table is actually displaying entries for the last 9 days, even though the Options request only the last 7 days. It may be coincidental, but the oldest event in the Table list is a Rain Delay that I entered manually, 9 days ago. I do note that, in my case, Timeline mode seems to display its entries counting back from the current time, while Table mode seems to display entries counting back from the last log entry (which, in my case, is most often the previous day), but even that doesn’t explain why, in this case, Table mode is displaying data (albeit just the Rain Delay event) from 9 days ago.
Also, the ‘Table Station Events’ count in either Timeline or Table mode seems to be the number of events displayed in Table mode, which as noted above is often not the same as the number displayed in Timeline mode.
With all my recent testing, it seems that any time I encounter the spinning grey disc, I simply go back to the main screen, reload (which seems to reset the date range to 7 days), and select ‘View Logs’ again. At least that’s the way things seem to be working now. I could reason that that process would cause a reload of any code that’s got itself stuck in a loop somewhere, which is what seems to happen when I try to extend the display date beyond 7 days.
PeteParticipantIt’s happened again. That’s about 10 days again, for about 115 log entries.
I have been able to retrieve the log entries via the API, but the web interface loops indefinitely (displays the OpenSprinkler circling grey disc) and displays “No entries found…” if I try to switch between Timeline and Table.
I have now also just noticed this problem with a second OpenSprinkler that I’ve been using to fiddle around with generally. It’s not connected to anything, just idling away. I’ve been able to view its logs (which are just a couple of events I triggered manually) up until this morning. Its earliest entry is Jan 2 (again, I can retrieve them through the API), and there are only five of them in all.
On the off chance that someone suggests that this is a ‘host software’ problem, I’m using a Mac (OS X 10.11.6), and I’ve tried Safari (11.0), Firefox (57.0) and Chrome (63.0.3239.132), and they all do the same thing, so…?
PeteParticipantTen days ago I thought this thread had solved my problem, but now I seem to be back at square one.
I have a v3.0 DC controller configured with one expansion board, controlling 20 stations all up. I have six individual programs running on a seven day cycle, with a mix of sequential and non-sequential stations. I don’t think there’s anything else special about my configuration—no other sensors or anything like that.
As I recall, everything worked as expected for about the first 16 days (based on the log entries that I have managed to save). Then, on the 17th day, I was unable to display any log entries through the web interface. I didn’t know whether or not this may simply have been related to something I did in the set-up phase (the problem was with a newly installed controller) so, after retrieving the log entries via the API (they were all there, just not displaying through the web interface), I cleared the logs (per advice in this thread) and all was well… for about another 8 days…
This morning I went to check the log, as I have done on a more or less daily basis since I cleared the logs (on Jan 4) and received the “Nothing to display” message. I then tried to retrieve the logs through the API but this time there was nothing there either.
I ran a program manually, just to see if anything was actually being logged, and sure enough, there was a log entry, but only this one, most recent entry.
The previous log file I managed to save contained 199 entries. By my estimation there would have been fewer than 100 entries in the log as of yesterday.
If I knew I had to save the log file periodically, I could probably live with that (although it’d be nice not to have to do that more often than about once a month), but this doesn’t look like an ‘overflow’ problem. In the most recent case, the log has actually been deleted. Could I have deleted the log inadvertently? Well, anything’s possible, but I really don’t think I would have.
Has anyone else following this thread seen anything similar, or is it just me?
I do recall reading somewhere that the log entries are kept for something like one year. What happens then? Are just the year-old entries deleted? Could there be a problem with this process that is leading to premature deletion?