Forum Replies Created
August 6, 2020 at 9:30 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67790
Wendell (Participant): After updating to the July 18th firmware, I can now view large log files, so it looks like the fix you made for importing >2K of program data also fixes the problem with log files larger than 2K! Thanks!
August 6, 2020 at 11:23 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67766
Wendell (Participant): That must be it… I do have a static address defined. I’ll set it back to DHCP and try again. Thanks.
August 6, 2020 at 9:01 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67762
Wendell (Participant): Ray, I’m currently running the July 12th firmware. I’m having trouble getting my OS back into WiFi mode to update the firmware. If I disconnect the W5500 and power up the controller, it gets stuck in a loop trying to boot from the Ethernet module (which isn’t connected). Pressing B3+B2 doesn’t bring up the menu to reset to AP mode. If I reboot with the Ethernet module connected, B3+B2 does bring up the option to reset to AP mode, but it immediately reboots into the hardwired connection. I’ve tried powering down after the B3+B2 reset and rebooting without the Ethernet module, but it goes back to the endless loop of trying to boot from hardwire. How do I get it to go back to WiFi mode so I can load the July 18 firmware file?
August 5, 2020 at 7:17 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67743
Wendell (Participant): Okay, it looks like I get to 1883 characters of returned log data when it runs into errors, so I guess the remaining bytes in the 2K buffer are part of the packet formatting?
Does 2.1.9(7) solve this 2K buffer problem when using the W5500? I’m currently running 2.1.9(4).
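If the remaining bytes really are packet formatting, the arithmetic is 2048 - 1883 = 165 bytes of headers and framing per 2K buffer. One general way around such a limit is to send a large response in several buffer-sized pieces instead of building it all in one buffer. The sketch below shows only that splitting idea. BUFFER_SIZE, HEADER_BYTES and chunkPayload are hypothetical names, the 165-byte overhead is the estimate above rather than a measured firmware value, and I don't know whether the actual firmware fix works this way.

```javascript
// Sketch: split a large response body into pieces that each fit in a
// fixed-size send buffer. BUFFER_SIZE matches the 2048-byte buffer discussed
// in the thread; HEADER_BYTES is only an estimate (2048 - 1883) of the
// per-packet overhead, not a value taken from the firmware.
const BUFFER_SIZE = 2048;
const HEADER_BYTES = BUFFER_SIZE - 1883; // 165 (assumed overhead)

function chunkPayload(body, bufferSize = BUFFER_SIZE, headerBytes = HEADER_BYTES) {
  const room = bufferSize - headerBytes; // payload bytes that fit per send
  if (room <= 0) throw new RangeError("header overhead exceeds buffer size");
  const bytes = Buffer.from(body, "utf8");
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += room) {
    chunks.push(bytes.subarray(offset, offset + room));
  }
  return chunks;
}

// A 5000-byte log response would need three sends instead of overflowing one buffer.
const chunks = chunkPayload("x".repeat(5000));
console.log(chunks.map((c) => c.length)); // [ 1883, 1883, 1234 ]
```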
August 5, 2020 at 6:51 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67741
Wendell (Participant): Sorry this took so long to reply to… got lost in the shuffle. If I reduce the number of days of log entries I ask for, it is able to display them. I tried using the API debug script you provided, but I think I might need to go back more than 6 days to see a failure, because I’ve had the system disabled for a couple of days. All I can say is that from within the OS app (iOS or macOS), reducing the number of days eliminates the error.
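Until the buffer limit is fixed, the same workaround can be scripted: ask for the log one day at a time and combine the results. This sketch only builds the request URLs. It assumes the firmware's /jl (get log data) endpoint with pw (MD5 of the password), start and end (epoch seconds) parameters, as I understand the OpenSprinkler firmware API document; check that document for your firmware version. The host, hash placeholder and function name are hypothetical.

```javascript
// Sketch: request log data one day at a time instead of all at once, so each
// response stays small. The /jl endpoint and its pw/start/end parameters are
// assumptions based on the OpenSprinkler firmware API document; verify them
// for your firmware. Host and hash below are placeholders.
const DAY_SECONDS = 24 * 60 * 60;

function logWindowUrls(host, pwHash, startEpoch, endEpoch, spanSeconds = DAY_SECONDS) {
  const urls = [];
  for (let s = startEpoch; s <= endEpoch; s += spanSeconds) {
    const e = Math.min(s + spanSeconds - 1, endEpoch); // inclusive end, windows don't overlap
    urls.push(`http://${host}/jl?pw=${pwHash}&start=${s}&end=${e}`);
  }
  return urls;
}

// Example: 7 days of history becomes 7 smaller requests.
const end = 1596700800; // an arbitrary epoch time
const start = end - 7 * DAY_SECONDS + 1;
const urls = logWindowUrls("192.168.1.50", "<md5-of-password>", start, end);
console.log(urls.length); // 7
```

Each URL can then be fetched in turn (for example with the global fetch in Node 18 or later) and the returned arrays concatenated.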
And BTW, so far no lockups / freezes, so the W5500 sure looks like it has completely solved my crashing problem.
If you figure out what’s going on with the ENC28J60 module, do you think the older 2.3AC hardware might start working again? I have another sprinkler system I would like to convert to OS, and if I can re-use my old hardware that would be great.
Thanks Ray.
July 22, 2020 at 12:35 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67467
Wendell (Participant): Yep… that did the trick. So does this mean it’s caused by the 2048 byte buffer issue?
July 21, 2020 at 11:21 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67459
Wendell (Participant): Still no controller crashes, but I can no longer view the log entries. Whenever I ask OS (app or browser) to display the log entries, the cursor spins for several seconds and then it tells me “Error retrieving log data. Please refresh to try again.”, and refreshing doesn’t work either.
Before I try rebooting, I wanted to find out if there’s anything else I should try.
Is it possible this is related to the 2048 byte buffer issue?
July 20, 2020 at 8:54 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67440
Wendell (Participant): I have now been running for several days without any lockups / crashes, and there’s no way I would have made it this long with the original Ethernet module, so the W5500 / new firmware version seems to have resolved my frequent crashing problem! I will continue to monitor for problems and update this forum if anything changes, but right now I’m feeling pretty positive about the W5500.
The other thing I’ve noticed is that I no longer get periodic timeouts when I run continuous pings. This doesn’t surprise me, because the hardware TCP/IP handling should be pretty close to bulletproof when it comes to seeing and responding to pings, but it’s yet another sign that the new module has improved Ethernet communication reliability.
July 16, 2020 at 9:45 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67361
Wendell (Participant): I have managed to run in excess of 30 manual zones since installing the W5500 and it hasn’t crashed yet. With the original ethernet module it would have had a pretty high likelihood of crashing at least once by now, but until I’ve run for a few more days without lockups I’m not quite ready to say that the new module has fixed the problem, because once in a while the original module would make it a few days without locking up.
Most of the prior lockups resulted in the on-screen clock freezing and the buttons being unresponsive. It would usually still respond to pings, but any zone I had manually started would still be running if the lockup occurred while the zone was on. I think this may be similar to what John was seeing too.
What Bena describes above actually (mostly) reflects what I was seeing while I was using the WiFi interface for the last week or so. The majority of the times I tried to connect to the controller (using either the iOS app or the web interface) it would time out. It would often take upwards of 15 – 20 minutes of trying before I could get connected. What was different from what Bena reported is that pings would also time out. The controller’s on-screen display reported a strong WiFi signal, and it apparently had no trouble syncing NTP. Several times when I was unable to connect I would run a continuous ping (once every 2 seconds) from my iMac, and after some number (usually 150 – 300) of timed out pings it would start responding quite reliably to the pings and at that point I was able to connect from the app or browser right away. I never had the controller crash while on WiFi, but it was devilishly difficult to connect to it most of the times I tried. Several times I tried cycling power when I was unable to connect, and it would reboot without errors and sync the time, but that didn’t seem to help me get connected.
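For scale: at one ping every 2 seconds, 150 to 300 consecutive timeouts is 5 to 10 minutes without a response. A small script makes it easy to pull that number out of a long ping log. This sketch assumes the output of macOS ping (for example ping -i 2 <controller-ip> > ping.txt), where a lost packet prints "Request timeout for icmp_seq N" and a reply contains "bytes from ... icmp_seq="; other ping implementations word these lines differently. The function name is hypothetical.

```javascript
// Sketch: summarize a long-running macOS ping log. Counts replies and
// timeouts and finds the longest unbroken run of timeouts, converted to
// minutes using the ping interval (2 seconds by default).
function summarizePing(text, intervalSec = 2) {
  let replies = 0, timeouts = 0, streak = 0, longest = 0;
  for (const line of text.split("\n")) {
    if (line.startsWith("Request timeout")) {
      timeouts++;
      streak++;
      longest = Math.max(longest, streak);
    } else if (/bytes from .*icmp_seq=/.test(line)) {
      replies++;
      streak = 0; // a reply ends the current outage
    }
  }
  const total = replies + timeouts;
  return {
    replies,
    timeouts,
    lossPct: total ? (100 * timeouts) / total : 0,
    longestOutageMin: (longest * intervalSec) / 60,
  };
}

// 150 timeouts in a row, then a reply: a 5-minute outage at 2 s per ping.
const sample = [
  ...Array.from({ length: 150 }, (_, i) => `Request timeout for icmp_seq ${i}`),
  "64 bytes from 192.168.1.50: icmp_seq=150 ttl=255 time=2.817 ms",
].join("\n");
console.log(summarizePing(sample).longestOutageMin); // 5
```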
July 15, 2020 at 7:45 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67353
Wendell (Participant): Sorry for the delay getting back to you with the results of my W5500 upgrade. I just got back in town and am now working with the new ethernet module. I updated the OS controller firmware and got the new module plugged in, and it booted up fine. Apparently you are using a software-defined MAC address (because the MAC address is the same as it was for the older ethernet module)?
For the first couple of minutes after rebooting I wasn’t able to connect to OS through either the app or the browser, but then it started connecting and everything seems to be working now. I did not have to restore my configuration from the backup… everything seemed to survive the firmware update. I did notice that it wasn’t logging my 1 minute manually started test runs (and all of my previous log entries were gone), so I cleared the log (via the GUI option to do so) and after that it seems to be logging my test runs.
At this point I need to start running a bunch of manually started zones and see if it crashes like it did when using the original ethernet module. I will provide feedback as my tests progress. Fingers crossed!
July 15, 2020 at 3:26 pm in reply to: Controller lockups / crashes with wired Ethernet module #67342
Wendell (Participant): Ray, this is definitely interesting, but ultimately we need to figure out why the buffer overflow / Receive Error Flag causes the system to become unstable. Based on the seemingly random results I see when things go south, I am strongly suspecting that there’s a memory corruption problem. I’m wondering if the buffer overflow condition is allowing a memory overwrite to occur? Have you looked at the input buffer to see if it’s correctly declared? Does the code that writes bytes into the buffer check to see if the buffer has enough room for the current batch of bytes? If so, could there be a boundary condition error that’s causing a memory overwrite? I haven’t even attempted to look at the code in question, so I have no idea what it looks like, but based on my prior experience, I’m suspecting that a memory overwrite may be corrupting adjacent variables / code.
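The failure mode described above can be simulated even in JavaScript. Below, one Uint8Array stands in for RAM, with a 32-byte receive buffer followed immediately by an unrelated state byte. Copying a 33-byte packet without a length check silently changes the neighbor, while the checked copy refuses it. The memory layout and all names are invented for illustration; they are not taken from the OpenSprinkler or ENC28J60 library source.

```javascript
// Simulation only: a receive buffer sitting next to another variable in one
// block of "RAM". Layout and names are hypothetical.
const RAM = new Uint8Array(40);
const RX_START = 0;
const RX_SIZE = 32;    // receive buffer occupies bytes 0..31
const ZONE_STATE = 32; // an unrelated variable stored right after it

function copyUnchecked(packet) {
  // Writes through a view of all of RAM, like a raw pointer would in C.
  RAM.set(packet, RX_START);
}

function copyChecked(packet) {
  // Reject anything that does not fit. A check written against the wrong
  // size (for example RX_SIZE + 1) would let one byte through again.
  if (packet.length > RX_SIZE) return false;
  RAM.set(packet, RX_START);
  return true;
}

RAM[ZONE_STATE] = 0;                          // e.g. "zone off"
copyUnchecked(new Uint8Array(33).fill(0xff)); // one byte too long
console.log(RAM[ZONE_STATE]);                 // 255: neighbor silently changed

RAM[ZONE_STATE] = 0;
console.log(copyChecked(new Uint8Array(33).fill(0xff)), RAM[ZONE_STATE]); // false 0
```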
July 10, 2020 at 8:28 am in reply to: Controller lockups / crashes with wired Ethernet module #67246
Wendell (Participant): Rik, in addition to what Ray posted, if you are able to use WiFi instead of the ethernet module, you may be able to avoid the crashes. My system has been running via WiFi for several days now without crashes. I *do* have a lot of issues with not being able to connect to the controller, but the controller isn’t actually crashing, and whatever is causing it to time out while trying to connect eventually resolves itself without my intervention. It’s possible that my timeout issues are related to WiFi signal strength in my garage, so you may not even have to deal with this inconvenience in your setup.
Also, if your controller is connected to a switch that supports VLANs, you may be able to solve the problem without the hassle of setting up a secondary router as Ray outlined in #4 above. Another user reported that putting the controller on its own private VLAN solved the problem for him. I haven’t tried this yet myself because my controller is connected to a non-VLAN capable switch that’s downstream from my main switch that does support VLANs.
Wendell (Participant): Ray, that’s great! I just sent a new support ticket requesting one of the adaptor boards.
When you’re done designing the enclosure, will you post the file online for those of us who want to 3D print our own enclosure?
Thanks!
Wendell (Participant): John,
I forgot to mention one thing… I’m not convinced that the crashes I’m seeing are related to DNS timeouts. The forum member who posted about using a VLAN to solve the problem hypothesized that there’s some device on his network that sends a particular type of traffic that the software-based TCP/IP stack on the ethernet module chokes on, thus causing the crash. With all other network traffic removed from the link to the OS controller, it never sees the sequence that sends it into a tailspin, and it stops crashing. If taking your IP camera off the VLAN solves your problem, that would give us a huge clue about the problem, and make it much more likely we could isolate the cause of the crash.
Wendell (Participant): John,
The forum posts I saw were related to the controller crashing, and I’m not sure if that’s what you’re seeing happen when a valve gets stuck open on yours. When it happens to me, my controller is truly crashed and will not recover without a power cycle. The buttons on my controller become unresponsive, and the clock stops updating. I believe what I’m seeing is what the other forum member was seeing when they posted about the VLAN solution a couple of years ago. In their case, isolating the OS controller onto its own VLAN completely solved the crashing problem.
Have you ever seen a situation where the log files in the controller show incorrect runtimes for zones? I’ve actually seen “impossible” run times in my log files (i.e., a zone is reported to have run longer than it possibly could have based on the surrounding zone start and run times). This is one of those symptoms that leads me to believe there’s data corruption responsible for the controller crashing. Since you’ve noted that you have zones that run longer than they should but then finally shut down, I’m curious if your log files reflect that actual run time.
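When every zone runs in Sequential mode, no two log entries should overlap in time, so "impossible" run times can be found mechanically. The sketch below flags any entry that starts before an earlier entry has ended. It assumes each log record is an array [programId, stationId, durationSec, endEpoch], which is my understanding of the /jl log format; verify it against your firmware's API document. Zones configured to run in parallel would legitimately overlap and should be filtered out first. The function name is hypothetical.

```javascript
// Sketch: flag log entries that overlap in time. Record format
// [programId, stationId, durationSec, endEpoch] is an assumption.
function findOverlaps(records) {
  const runs = records
    .map(([pid, sid, dur, end]) => ({ pid, sid, start: end - dur, end }))
    .sort((a, b) => a.start - b.start);
  const problems = [];
  let latest = runs[0]; // run with the latest end time seen so far
  for (let i = 1; i < runs.length; i++) {
    const cur = runs[i];
    if (cur.start < latest.end) {
      problems.push({
        stations: [latest.sid, cur.sid],
        overlapSec: Math.min(latest.end, cur.end) - cur.start,
      });
    }
    if (cur.end > latest.end) latest = cur;
  }
  return problems;
}

const log = [
  [1, 0, 600, 1000600], // station 0: 1000000..1000600
  [1, 1, 600, 1001200], // station 1: 1000600..1001200 (back to back: fine)
  [1, 2, 900, 1001500], // station 2: 1000600..1001500, overlaps station 1
];
console.log(findOverlaps(log)); // [ { stations: [ 1, 2 ], overlapSec: 600 } ]
```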
You noted that you leave the app running in the background on your iPhone. What happens if you actually close the app after you start a manual zone? In my case, I’ve seen the controller crash even when there’s no iOS app running (and I don’t necessarily even have to be running a zone for the controller to crash). Based on my symptoms matching what the other forum member posted quite some time ago, I suspect a VLAN would solve my problem, but your issue sounds a little different. I’m currently running an ASUS router with several smart switches. I plan to switch to a UniFi system, but I’m waiting for them to release more of their WiFi 6 hardware before I take the plunge. I really like the UniFi system and the configurability / troubleshooting that you get with it.
Wendell (Participant): Ray, I don’t think my issues are directly related to anything in my router being out of whack, because I’ve rebooted my router before when I’ve seen issues and it hasn’t helped. Over the last few days that I’ve been running via WiFi my system has yet to lock up like it routinely did on the ethernet module, but I have had several instances where it is *very* slow to connect to the iOS app. It will repeatedly time out trying to connect, but if I leave it alone for a while it will eventually come back online. I’m not going to spend a bunch of time trying to track down this issue because it sounds like we’ll have a good ethernet solution once your pin converter boards arrive. I ordered one of the W5500 boards yesterday, so I should have it this week.
Out of curiosity, does the internal WiFi functionality in the controller have hardware TCP/IP handling or does it rely on the same code that your ethernet module does?
July 7, 2020 at 11:17 am in reply to: Controller lockups / crashes with wired Ethernet module #67196
Wendell (Participant): John,
I think you are correct that rebooting isn’t a perfect solution (although it’s better than a zone getting stuck “on” forever). In theory you could restore the unit to the last known state by periodically writing the current status to NVRAM and then reading that back out on each reboot, but from my experience this can lead to other problems. And if the lock-ups are occurring due to a buffer overrun corrupting the program or data space, you really wouldn’t want to try to restart from the last known state anyway.
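A minimal sketch of the save-and-restore idea, with a checksum so that a damaged record is discarded rather than restored. The nvram Map stands in for EEPROM or flash, and checksum() is a simple additive stand-in for a real CRC; all names are hypothetical and not part of the OS firmware.

```javascript
// Sketch: persist controller state with a checksum and refuse to restore a
// record whose checksum no longer matches.
function checksum(text) {
  let sum = 0;
  for (let i = 0; i < text.length; i++) sum = (sum + text.charCodeAt(i)) & 0xffff;
  return sum; // 16-bit additive checksum (weak; a CRC would be better)
}

function saveState(nvram, state) {
  const body = JSON.stringify(state);
  nvram.set("state", body);
  nvram.set("crc", checksum(body));
}

function restoreState(nvram, fallback) {
  const body = nvram.get("state");
  if (body === undefined || nvram.get("crc") !== checksum(body)) return fallback;
  try {
    return JSON.parse(body);
  } catch {
    return fallback; // unreadable record: start clean instead
  }
}

const nvram = new Map();
saveState(nvram, { runningZone: 3, endsAt: 1596700800 });
console.log(restoreState(nvram, null)); // { runningZone: 3, endsAt: 1596700800 }
nvram.set("state", '{"runningZone":9,"endsAt":1596700800}'); // simulate a damaged record
console.log(restoreState(nvram, null)); // null
```

The checksum only shows that the record was not damaged after it was written. If RAM was already corrupted when saveState ran, the bad state is saved with a valid checksum and restored faithfully, which is exactly the concern above.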
The variety of symptoms I’ve seen really makes me think that the crashes aren’t just one portion of the system (e.g., the ethernet module) locking up, and if I’m correct, it would be nearly impossible to detect the crash condition and reliably reboot from it. Years ago I experienced an issue where an embedded controller was doing really strange things at random times. It occurred on only a small percentage of the units we made, but when the crashes happened, they were completely random. In our case it was a hardware bug in an Atmel CPU chip that was causing it to literally execute code at random locations… pretty much the worst-case nightmare a programmer can run into. I suspect that the problem we’re seeing with the OS controller isn’t nearly this insidious, but until someone can figure out the actual cause of the crashes (versus simply detecting when they have occurred and trying to reboot), I don’t think there will be a good solution to the problem.
If Ray can pin down the actual cause of the problem then I agree that a software patch will likely solve all of the problems, but simply trying to detect when a problem has already occurred isn’t likely to be a viable solution. I’ll freely admit that I don’t know the architecture of the OS system (i.e. is everything running on the main CPU, or is there an additional microcontroller in the ethernet module?), so my hypothesis could be off due to not understanding how the various system components relate to one another, but if the entirety of the code is running on one CPU, the varied symptoms I’m seeing suggest that there is some type of widespread corruption of data at play.
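The limitation of the detect-and-reboot approach can be shown with a simulated watchdog that resets the controller only when the main loop stops checking in. A loop that freezes (like the frozen clock and dead buttons) gets reset, but a loop that keeps running on corrupted data, modeled here as two zones on in Sequential mode, keeps feeding the watchdog and is never reset. The Watchdog class, the 8-second timeout and simulate() are hypothetical, not OpenSprinkler code.

```javascript
// Sketch: a watchdog resets only if the main loop stops "kicking" it.
class Watchdog {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastKick = 0;
  }
  kick(nowMs) {
    this.lastKick = nowMs; // main loop says "still alive"
  }
  expired(nowMs) {
    return nowMs - this.lastKick > this.timeoutMs;
  }
}

// Simulate 20 seconds of a main loop that ticks once per second.
function simulate({ freezesAtMs = Infinity, corruptAtMs = Infinity }) {
  const wd = new Watchdog(8000);
  let zonesOn = 1;
  for (let now = 0; now <= 20000; now += 1000) {
    if (now < freezesAtMs) {
      if (now >= corruptAtMs) zonesOn = 2; // bad data, but the loop still runs
      wd.kick(now);
    }
    if (wd.expired(now)) return { reset: true, zonesOn };
  }
  return { reset: false, zonesOn };
}

console.log(simulate({ freezesAtMs: 5000 })); // { reset: true, zonesOn: 1 }
console.log(simulate({ corruptAtMs: 5000 })); // { reset: false, zonesOn: 2 }
```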
Regardless of whether the root cause can be found and corrected, I would rather be running a system that handles the TCP/IP stack in hardware, since it should result in more reliable network operations and free up the CPU to run only the application itself (potentially making it more responsive to user inputs). Implementing the W5500 chip sounds like a really good idea to me!
July 7, 2020 at 12:57 am in reply to: Controller lockups / crashes with wired Ethernet module #67184
Wendell (Participant): John,
That part you found on Amazon appears to be the same one I found on NewEgg, only Amazon seems to have it in stock in the US, so shipping time is a fraction of what NewEgg is quoting. Good find!
As for whether our varying symptoms all have the same root cause, I’m guessing they probably do. I suspect that the TCP/IP traffic that kills the software-based stack is somehow causing code or data corruption (i.e., a buffer overrun), which in turn leads to unpredictable execution of the main controller code. I’ve seen a somewhat wide variety of symptoms myself, from simple lockups that don’t seem to have other consequences, to 2 zones running simultaneously even though I have all of my zones set to Sequential mode. Based on these differing symptoms, I will be surprised if the root cause doesn’t turn out to be a buffer overflow (or similar coding error). It’s probably buried in the library that Ray is using.
July 6, 2020 at 11:48 pm in reply to: Controller lockups / crashes with wired Ethernet module #67180
Wendell (Participant): Ray,
Sounds encouraging! I’d be happy to do some testing for you once you get to the point where it’s ready for that. Given how many crashes I’ve been seeing with the current Ethernet module, it shouldn’t take more than a couple of days to verify that the new W5500 module solves the problem. Do you want me to order one of the W5500 modules from NewEgg? It looks like it will take a while to get one since they’re shipping from China.
July 6, 2020 at 11:21 pm in reply to: Controller lockups / crashes with wired Ethernet module #67176
Wendell (Participant): Ray, regarding your latest post… I’m not sure if you were describing the symptoms you’ve seen in your recent testing or what I reported from my testing, so I want to clarify what I’ve observed:
1) When I run long-term continuous ping tests (one every 2 seconds), I see a handful of timeouts every hour. On every other device I’ve run long ping tests against, it is *extremely* rare to see a ping timeout. I suspect this behavior from OS is due to the software-based TCP/IP stack, so it may have absolutely nothing to do with the crashes I’m seeing.
2) When my controller locks up, it usually freezes the on-screen clock at the lockup time, and the buttons are non-functional until a power cycle reboot. Many (or most) of the lockups have actually left the controller still responding to pings, so something is still alive in the controller even though I can’t log into it or start/stop watering cycles.
July 6, 2020 at 11:20 pm in reply to: Controller lockups / crashes with wired Ethernet module #67174
Wendell (Participant): John,
EDIT – **NOTE** – I didn’t see Ray’s above post until after I sent the info below.
I don’t know if there’s already a commercial module out there that uses the same pinout as the one Ray provides (but based on the more capable chip), so Ray might have to first design a new Ethernet module. From what Ray told me, the W5500 chip (https://www.wiznet.io/product-item/w5500/) is one that looks like it would be relatively easy to adapt the code to use.
I haven’t taken the cover off of the ethernet module that Ray provides, but I’m wondering if it’s this board inside:
If it is, then perhaps something like this could satisfy the hardware part of the equation:
Wendell (Participant): John,
At least in my case, I don’t think the lockups are related to the solenoids switching on. At first I thought that was the case, but then I started noticing that the controller was locking up when nothing was going on. When it was connected via ethernet, I had several instances of finding the controller unresponsive to its buttons even when it had just been sitting idle. Then I thought perhaps the act of connecting to it was causing the lockups, and while I’m not as sure about that now, I think it’s still a possibility. If Ray is correct that a DNS timeout causes problems, and if connecting to it causes it to refresh the weather data (which is the only thing in my setup that uses DNS), that could explain how making a connection to it could cause a lockup.
Interestingly, shortly after I posted my last reply I tried to access the controller from my iPad and it timed out trying to connect. Then I tried accessing it from my iPhone and it also timed out, so I went to the controller and the display indicated it was trying to connect to the WiFi and all of the buttons were responding as they should (so it wasn’t locked up). With my iPhone right next to the controller I could see that I had a strong signal. The controller continued trying to connect, then acted like it had connected only to revert back to showing it was trying to connect a second or two later. I reset the powerline WiFi extender that I think it’s connecting to (I can’t be sure because the extender uses the same SSID as my other access points), but the controller still wasn’t connecting, so I left it sitting there while I ran an errand. About an hour and a half later when I returned I tried connecting from my iPhone again and it worked, so somehow it got itself sorted out while I was gone.
This isn’t the first time since I’ve had it using WiFi that I’ve thought it was locked up because it wouldn’t connect, only to find out that the controller buttons were still working fine and then it would eventually allow me to connect again without me having to reset the controller. When it was connected via Ethernet, I think every time it failed to connect it was due to the controller being locked up. Obviously I would rather have it fail to connect without being due to a lockup that can leave a station running, but either way it’s frustrating that sometimes you can’t access it to see what’s still running or to fire off another zone. I think the ethernet connection is a more reliable connection in general, so if we can get past the lockups I would switch back to ethernet right away.
I agree that this must not be a really widespread problem, and that tends to give some credibility to the theory that the ethernet module is susceptible to specific unusual network traffic. I have quite a number of different devices in my network, and I tend to add new things periodically, so it’s entirely possible that I added a problematic device within the last 6 to 12 months (when I started seeing lockups on my 2.3AC controller). I’m thinking that a hardware based TCP/IP stack ethernet module would likely cure the problem. Ray seems to think that it may not be terribly difficult to implement this because apparently it uses the same API, so code changes should be very minimal. I don’t know what it would do to the cost of the ethernet module… that might be a big negative.
Wendell (Participant): John,
After doing quite a bit of troubleshooting I decided that the wired Ethernet controller was one of the suspects, so I unplugged the ribbon cable and switched to WiFi. Long term I don’t like this solution because it relies on a powerline extender in my garage, but I have been on WiFi for the last 2 days, and sure enough, it has yet to lock up like it had been quite frequently with the hard wired ethernet connection. I’m not 100% confident that the problem is completely gone (I’ll need more time for that), but it sure is looking like the issue may be caused by the ethernet module. My understanding is that the ethernet module uses a chipset that relies on a software based TCP/IP stack. I’m wondering if the built-in WiFi module has hardware based TCP/IP handling?
Yesterday while I was browsing some of the old forum posts I saw that some other people had been reporting lockups similar to what I was seeing, so I don’t think this is a new problem. At least one of the other reported lockup problems seemed to have started after a couple of years of reliable hardwired operation. That mirrors my situation. That person reported that setting up a VLAN with only the OS controller on it solved his problem, so that suggests that some other newly deployed equipment on the network may be sending some type of broadcast message that the software-based TCP/IP code chokes on. My equipment will support a VLAN, but I will have to rewire some things before I can make that change because I have a separate switch downstream from the switch that has the VLAN support.
Due to the random lockups I’ve been experiencing, I have been reluctant to enable my automatic programs. It’s interesting that you don’t seem to see the lockups unless you’re using manual watering. I think I’m going to continue with manual watering for now until I gain a better confidence that the WiFi interface isn’t susceptible to the lockup problem. I have noticed that sometimes it takes *much* longer than it should to connect to the controller from my iOS app, but I don’t know if this is related to WiFi connectivity issues or something else. At least it hasn’t locked up the controller yet!
I haven’t tried Ray’s above suggestions yet, but the only item in my setup he mentions that relies on DNS is the weather server. I would need to switch back to the hardwired connection to validate his suggestion, so for now I’m going to continue with WiFi to ensure that it isn’t susceptible to the bug in the first place.
Wendell (Participant): I’ve communicated with David via email, and the problem is that the webpage he was hosting to handle the settings is no longer working (something about Dropbox shutting down an account). Anyway, the code is on GitHub at https://github.com/littleprojects/Pebble_Opensprinkler, and you can pull this into CloudPebble (which you will have access to when you set up a customer account on the Pebble website). If you edit the app.js file in CloudPebble, you can enter your “link” (IP address of your OS), “pass” (MD5 hash of your password… Google for an online MD5 hash generator and key in your password… copy and paste the resulting hash into the “pass” variable in app.js), and “name” of your OS device (I think it defaults to “home”). In app.js in CloudPebble, look for a block of code that starts with
if(demo){
This is where you need to enter these 3 things. Now click the disk icon in the upper right to save your changes, then compile the app (from COMPILATION menu on left of CloudPebble screen), then click RUN BUILD, then select PHONE as your destination and click INSTALL AND RUN. Make sure the iOS Pebble app on your phone is set to “Developer (on)” from the … menu in the upper right corner.
If all went well, CloudPebble should connect to your phone and install the OpenSprinkler app on it. From there, you’re ready to use it. It looks pretty cool.
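If you would rather not type your password into an online MD5 generator, Node.js can compute the hash locally with its built-in crypto module. The settings object below only mirrors the three values described above (link, pass, name) with placeholder values; it is not the literal contents of the if(demo){ block in app.js.

```javascript
// Sketch: compute the MD5 hex digest for the "pass" variable locally.
const crypto = require("crypto");

function md5Hex(password) {
  return crypto.createHash("md5").update(password, "utf8").digest("hex");
}

const settings = {
  link: "192.168.1.50",          // IP address of your OS controller (placeholder)
  pass: md5Hex("your-password"), // MD5 hash of your OS password (placeholder password)
  name: "home",                  // device name
};
console.log(settings.pass); // paste this 32-character hex string into "pass"
```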
Wendell (Participant): I bought my OS hardware (it’s still the latest revision sold) quite a while ago and just got around to installing it yesterday. I’ve got it up and running, but I’d love to be able to use my Pebble Time to control it, and I’m running into a problem. Has anyone used David’s OpenSprinkler Pebble app with the current version of Pebble’s iOS app? I can download and install the OpenSprinkler app to my watch, but when I try to go into settings from the Pebble iOS app it tells me to check my internet connection and try again. I’ve got an internet connection, and the watch is connected to the phone correctly. I’m wondering if there’s something in the app that needs to be updated to support the more recent versions of software from Pebble? I’m current on my watch firmware and iOS app from Pebble. TIA for any help resolving this.
Also, on a side note, I’m considering buying an Apple Watch now that they have a waterproof version (I use a watch to track my swimming, so waterproof is essential for me anyway). Does anyone know of an Apple Watch app that controls the OpenSprinkler hardware? I suspect the Pebble watches might be a better fit for in-the-yard system debugging due to the Apple Watch not having a lot of buttons and the touch screen probably not working well with wet or muddy hands, but I can still see where control from an Apple Watch would be useful when you’re not actually out in the yard messing with the system but still want to check on status or run a zone.