Forum Replies Created
-
AuthorPosts
-
July 28, 2020 at 9:24 am in reply to: Controller lockups / crashes with wired Ethernet module #67581
Ray (Keymaster): After 2 days of non-stop debugging, I think I am finally getting closer to the bottom of the issue. There are two main discoveries:
1) The UIPEthernet library for ENC28J60 does seem to have trouble when there is a constant influx of UDP broadcasts. On some networks there isn’t much broadcast traffic, so it works fine; on other networks there is a lot of broadcast traffic, so the library eventually goes into a corrupted state, which is the source of the hanging issue.
Stefan’s firmware (osefw2194_20200722.bin) uses a tweak of the UIPEthernet library that disables incoming broadcasts and that’s why it’s not prone to the issue. This probably explains why it has lasted much longer on your network. However, completely disabling UDP broadcasts has a downside, which is the second discovery below.
2) DHCP relies on UDP broadcasts, so if UDP broadcast is disabled, DHCP renewal will fail, and that can lead to a stall. It leads to a stall because the Ethernet.maintain() function is called at every loop iteration:
https://github.com/OpenSprinkler/OpenSprinkler-Firmware/blob/master/main.cpp#L421
Its main job is to handle DHCP renewal requests. When DHCP renewal fails, each call stalls for 60 seconds; when control returns to the loop, the function is called again, which means another 60 seconds of non-responsiveness. The UIPEthernet library documentation never says how often Ethernet.maintain() should be called; it just says to call it at a regular interval. So the consequence of calling it at every loop iteration wasn’t immediately clear to me.
With these two discoveries, I’ve now modified the firmware and made firmware 2.1.9 revision(5). It has the following main changes:
A) Disable handling of UDP broadcasts most of the time, and enable it only temporarily during DHCP events. This way, most of the time the firmware is not affected by an influx of broadcast traffic.
B) Change the code to call Ethernet.maintain() only once per hour to process DHCP renewal requests. This way, even in the case of renewal failure, it won’t go into an infinite loop of stalling.
C) There are also a few other improvements, such as improving DNS functionality, cleaning up the send_http_request function, and, for OS 3.x, supporting dimming or turning off the LCD when the controller is idle (to help preserve the lifespan of OLED displays).
p.s. after I did A) above, I accidentally discovered that this is also what EtherCard library does:
https://github.com/njh/EtherCard/blob/master/src/dhcp.cpp#L421
that it only enables broadcast when processing DHCP. I am surprised that UIPEthernet doesn’t do this, but now I’ve added this feature.
D) If you are currently using Stefan’s firmware (osefw2194_20200722.bin), since it disables UDP broadcasts completely, you can either set a static IP on OpenSprinkler or set a DHCP reservation on your router, and then reboot your OpenSprinkler. This way it won’t incur DHCP renewal requests, so there is no need to handle broadcasts.
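The idea behind change B) can be illustrated with a small sketch. It is written in Python purely for illustration (the firmware itself is C++), and the class name, the callback, and the choice to start the timer at construction are all assumptions, not the firmware's actual code. It shows a maintenance call that is polled from every loop iteration but only runs once per interval, so a failing call cannot stall the loop back to back.

```python
import time

MAINTAIN_INTERVAL_S = 3600  # once per hour, as described in B)


class MaintainScheduler:
    """Runs a maintenance callback at most once per interval (hypothetical helper)."""

    def __init__(self, maintain, interval_s=MAINTAIN_INTERVAL_S, clock=time.monotonic):
        self._maintain = maintain      # stand-in for a call like Ethernet.maintain()
        self._interval_s = interval_s
        self._clock = clock
        self._last_run = clock()       # assumption: first run happens one interval after start

    def poll(self):
        """Call from every loop iteration; returns True only if maintain() ran."""
        now = self._clock()
        if now - self._last_run < self._interval_s:
            return False
        # Record the time before calling, so a slow or failed call
        # is not retried on the very next loop iteration.
        self._last_run = now
        self._maintain()
        return True
```

With this pattern, even if the callback blocks for 60 seconds on a failed renewal, the loop gets almost a full interval of normal operation before the next attempt.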
In any case, I’ve uploaded the current version of firmware 2.1.9(5) to the experimental firmware folder:
– for OS 3.2, it’s at: http://raysfiles.com/os_compiled_firmware/v3.0/experimental/ (note: the rev5_enc28j60 file, not the w5500 file!)
– for OS 2.3, it’s at: http://raysfiles.com/os_compiled_firmware/v2.3/experimental/
Feel free to give it a try and see if it addresses the hanging/locking issue. I’ve tested it myself for about 2 days now, but obviously it needs longer-term testing.
Notes: this firmware is largely meant for controllers with ENC28J60 ethernet module, which includes OS 2.3, and OS 3.2 with ENC28J60 wired Ethernet module. You do NOT need to try this firmware if you use WiFi only, or if you are using the experimental W5500 Ethernet module (so far only a very small number of users are trying out W5500 as far as I am aware).
Ray (Keymaster): Thanks for the list. My understanding is that RPi 1 Model A or B (which has 2×13 pins) has long been discontinued; I am not sure where to buy them, actually. All other RPis use a 2×20 pin header and should fit in the current enclosure. This includes RPi 1 Model A+, which also uses a 2×20 pin header.
Also, we started shipping OSPi version 1.5 a few weeks ago. Version 1.5 uses an improved switching regulator and can provide up to 2 A on the +5V line, so it should be able to drive all RPis without needing an external USB power supply.
Though, technically, actually delivering 2 A would draw quite a lot of current from the input 24VAC adapter as well. For example, to deliver 2 A @ 5V at an assumed switching regulator efficiency of 70%, it would need to draw about 600mA from the 24VAC input, which is a significant amount.
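The 600mA figure follows from a simple power balance. A minimal sketch of the arithmetic, assuming the 24VAC value is treated as an RMS voltage and ignoring power factor and rectifier losses (the function name is just for illustration):

```python
def input_current_amps(v_out, i_out, efficiency, v_in):
    """Input current needed for a given output load at a given converter efficiency."""
    p_out = v_out * i_out        # output power in watts: 5 V * 2 A = 10 W
    p_in = p_out / efficiency    # input power: 10 W / 0.70 ≈ 14.3 W
    return p_in / v_in           # input current: 14.3 W / 24 V ≈ 0.595 A


current = input_current_amps(v_out=5.0, i_out=2.0, efficiency=0.70, v_in=24.0)
print(f"{current * 1000:.0f} mA")  # prints "595 mA"
```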
July 27, 2020 at 12:25 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67549
Ray (Keymaster): @bena: it just occurred to me that when your controller lost connection, you should check whether the IP address may have changed. Maybe your router assigned a different IP to it upon DHCP renewal. You can always click B1 on the controller to find out. If the IP has indeed changed, you can either set a DHCP reservation on your router to reserve a fixed IP, or set a static IP on OpenSprinkler.
Ray (Keymaster): First, I suggest that you simply leave the flow pulse rate as 1. This really doesn’t matter; it’s just a scaling factor. The volume in the end is the number of pulses times the flow pulse rate, so by leaving the pulse rate as 1, the volume you read in the end is simply the pulse count, and you can multiply that by the correct pulse rate yourself.
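As a small illustration of the scaling described above (the pulse count and the 0.5 gallons-per-pulse figure are made-up example values, and the function name is hypothetical):

```python
def volume(pulse_count, pulse_rate=1.0):
    """Volume = number of pulses times the flow pulse rate."""
    return pulse_count * pulse_rate


# With the pulse rate left at 1, the reported volume is just the pulse count.
reported = volume(1200)

# Apply the meter's real pulse rate afterwards (hypothetical: 0.5 gallon per pulse).
actual_gallons = reported * 0.5
```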
The flow indicator you see on the screen always has a couple of seconds of delay, because the screen refreshes every second or so.
You should estimate roughly how many pulses the meter generates per second. If it generates a large number of pulses per second (by large, I mean more than 50 to 100 per second), then I am afraid it’s going to lose some clicks, because the firmware uses a combination of interrupt and polling to handle the flow sensor, so pulses that come too fast are not going to work very well. Maybe you should choose a flow meter that has fewer pulses per gallon.
July 23, 2020 at 12:27 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67489
Ray (Keymaster): Yes, it’s possibly related. I’ve just posted a script that can be used for diagnosing certain firmware issues:
https://opensprinkler.com/forums/topic/useful-script-for-testing-opensprinkler-api/
You can try the ‘Json log (jl)’ API: start with ‘Today’, then go back 1 day, 2 days and so on. It’s possible that at some point the json log data gets truncated and becomes invalid. If the 2K theory is correct, you can check the number of characters in the log data at that point, and it’s probably more than 2K. Updating your firmware to the Jul18 version should fix this issue.
Ray (Keymaster): This is typically because some zone is drawing too much current when opening, causing the power supply voltage to drop significantly and thus triggering a reboot of the controller. An easy way to tell is to unplug the zone terminal block, let the controller run without being physically connected to the zones, and see if this still happens.
July 23, 2020 at 11:02 am in reply to: Controller lockups / crashes with wired Ethernet module #67483
Ray (Keymaster): Since you didn’t specify which version of OS you have, I assume you have OS 3.0. If you have OS 2.3, the update is done in a different way (through the USB port). Next, assuming you have OS 3.0, please note that a firmware update on OS 3.0 can only be performed in WiFi mode; it cannot be performed when the wired Ethernet module is plugged in. This is explained in the update instructions:
https://openthings.freshdesk.com/support/solutions/articles/5000832310-opensprinkler-3-x-firmware-update-guide
July 22, 2020 at 7:37 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67464
Ray (Keymaster): Try changing the log period to a day. By default it’s 7 days. If you have a lot of log records over 7 days, it will not be able to retrieve that many. Restrict it to one day and see if it shows up.
July 20, 2020 at 5:05 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67435
Ray (Keymaster): @bena: the network register values (ESTAT.BUFFER and EIR.RXERIF) are only for ENC28J60. These don’t exist for W5500, which uses a hardware TCP/IP stack. I don’t know what the root cause of the issue with W5500 is, since so far I haven’t heard of a hanging issue with W5500 from other users.
One thing you can try is when the hanging happens, open a browser and type in:
http://x.x.x.x/ja
or
http://x.x.x.x/su
where x.x.x.x is your OpenSprinkler’s IP. If you use a port other than 80, you also need to specify the port number explicitly, such as x.x.x.x:port_number
See if you get a response. If so, the Ethernet controller is still working and the issue is somewhere else.
July 18, 2020 at 9:47 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67408
Ray (Keymaster): @John K: I’ve updated the firmware. It’s in the same folder, and I added today’s time stamp so we know this is the most recent version:
http://raysfiles.com/os_compiled_firmware/v3.0/experimental/
The Ethernet library’s write function is technically incomplete: it’s supposed to be able to send a buffer of any size, but apparently it only sends the first 2048 bytes of it, resulting in incomplete Json data. I’ve raised this issue in their Github library:
https://github.com/arduino-libraries/Ethernet/issues/141
and I’ve fixed it by adding a loop until the entire buffer has been sent out. I’ve tested the new firmware with 20 programs and it works fine so far.
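The fix described above (looping until the whole buffer is sent) can be sketched as follows. This is Python for illustration only, not the firmware's C++ code; `CappedSocket` is a hypothetical stand-in for a write function that sends at most 2048 bytes per call and returns how many bytes it actually sent.

```python
MAX_CHUNK = 2048  # per-call cap described in the post


class CappedSocket:
    """Hypothetical stand-in for a write() that sends at most MAX_CHUNK bytes per call."""

    def __init__(self):
        self.received = bytearray()

    def write(self, data: bytes) -> int:
        chunk = data[:MAX_CHUNK]
        self.received += chunk
        return len(chunk)  # number of bytes actually sent


def write_all(sock, data: bytes) -> int:
    """Keep calling write() until every byte of data has been sent."""
    sent = 0
    while sent < len(data):
        n = sock.write(data[sent:])
        if n <= 0:
            raise IOError("write made no progress")
        sent += n
    return sent
```

A single `write()` on an 8192-byte buffer would deliver only the first 2048 bytes; `write_all()` delivers all of it in four calls.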
@bena: given your description, the controller is at least not freezing; it just stops responding to web requests. There are several possibilities. One is that if you have a relatively large number of zones and/or programs, the issue you encountered may be the same as what John K reported. An easy way to tell is to export your configuration to a file and open it to check how many characters it has. If it’s larger than 2048, it’s very likely the same issue, so just flash the Jul18 version of the w5500 firmware. That should fix it.
July 18, 2020 at 8:36 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67405
Ray (Keymaster): @John K: I figured out why the Json data is incomplete when you have a large number of programs. Apparently, by default the Arduino Ethernet library (which supports W5500) limits each ‘send’ to a maximum of 2048 characters: even though the OpenSprinkler firmware’s send buffer is 8192, the library caps the number of characters sent each time at 2048. As a result, when you have a large number of programs, the data is truncated, so the Json data is incomplete (and therefore the homepage can’t render correctly). UIPEthernet (which supports ENC28J60) does not have this limit. Gotta love how different libraries make different assumptions. Anyways, it’s an easy fix, and I will update the firmware shortly to increase this limit, which should address the issue.
July 18, 2020 at 8:10 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67404
Ray (Keymaster): @bena: it would help if you can provide some details about the symptoms of the lockup, for example:
– does the controller respond to button clicks?
– is the time displayed on the LCD correct? If not, how far is it from the current time?
– does the controller respond to ping?
Ray (Keymaster): If you want the watering percentage to be adjusted on a day-to-day basis, I don’t think it’s feasible to implement this in the firmware, because it would require storing a full calendar and it would complicate the UI design. However, the easiest solution is to use an external script and the OpenSprinkler HTTP API to send the daily percentage value to OS. The API document is here:
https://openthings.freshdesk.com/support/solutions/articles/5000716363
To send the watering percentage / level, you can use the /co (change options) command with the parameter wl (water level), like this:
http://your_os_ip/co?pw=md5_hash_of_your_password&wl=value_you_want
Most scripting languages support sending HTTP commands. Similarly, a lot of custom features can be implemented by using an external script and leveraging the HTTP API, instead of requiring these features to be implemented in the firmware.
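A minimal sketch of such an external script in Python, using only the standard library. The function names, the example host, and the timeout are assumptions for illustration; only the /co command and the pw and wl parameters come from the post above.

```python
import hashlib
import urllib.parse
import urllib.request


def build_water_level_url(host, password, level):
    """Build the /co URL that sets the watering level (wl) on the controller."""
    pw_hash = hashlib.md5(password.encode("utf-8")).hexdigest()  # pw is the MD5 hash of the password
    query = urllib.parse.urlencode({"pw": pw_hash, "wl": level})
    return f"http://{host}/co?{query}"


def set_water_level(host, password, level, timeout=10):
    """Send the request and return the controller's raw response text."""
    url = build_water_level_url(host, password, level)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (hypothetical host and password), e.g. run once a day from a scheduler:
# print(set_water_level("192.168.1.50", "your_password", 80))
```

If the controller uses a port other than 80, the host string would include it, such as "192.168.1.50:8080".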
July 17, 2020 at 8:47 am in reply to: Controller lockups / crashes with wired Ethernet module #67381
Ray (Keymaster): Thanks, Stefan. I will give it a try shortly.
I went through all computers on my network that have Dropbox installed, and turned off the ‘Lan sync’ flag. Since then the ENC28J60 register values on my test OSE have been in a clean state (i.e. the two bits are 0) and I no longer observe the corrupted state. So it seems that, at least for me, the Dropbox ‘Lan sync’ is the culprit. In fact, I have further verified it by turning Lan sync back on, and almost instantly I observed the register bits get set to 1.
I think it’s because the current UIPEthernet library has trouble dealing with a large number of broadcast requests. As a result it is not able to clear register bits promptly, eventually leading to a lockup state. While we are still trying to modify the library to address this issue, other users who experience this issue can check whether Dropbox is installed on any computer. If so, try turning off the ‘Lan sync’ flag (in Preferences -> Network), then reboot OS so it starts in a clean state. The Lan sync feature is meant to allow computers on the same local network to sync files between each other faster, even if there is no Internet connection, so it’s ok to turn it off.
Ray (Keymaster): The current OpenSprinkler (revision 3.2) no longer has the pin header for the RF receiver, because that opening is now used to pass through the wired Ethernet module cable. So unfortunately you can no longer insert the RF receiver into it.
July 15, 2020 at 8:50 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67354
Ray (Keymaster): 1) Mac address: neither the ENC28J60 nor the W5500 chip has a hardware MAC, so yes, it needs to be software defined. The Mac is derived from the ESP8266’s Mac, with only the last byte differing:
https://github.com/OpenSprinkler/OpenSprinkler-Firmware/blob/master/OpenSprinkler.cpp#L442
so it’s not totally made up from nowhere, but it’s based on the true Mac of the ESP8266.
2) Configurations: the firmware only triggers a factory reset if upgrading across major revisions (like from 2.1.9 to 2.2.0). Within the same major revision, the configurations are preserved, because the flash structure and all file formats are the same, so there is no need to reset flash.
3) John reported issues when the configuration is large (i.e. a large number of programs and/or zones). I haven’t had time to look at it yet but will import some large configuration file to check. In the meantime, if you can see the homepage, that means it’s fine. I would suggest not adding more programs or zones until I figure out the issue.
July 15, 2020 at 3:54 pm in reply to: Controller lockups / crashes with wired Ethernet module #67350
Ray (Keymaster): Also, as I said, I have never been able to reproduce the symptoms you reported. When my test controller locks up, it still responds to button clicks, runs programs fine, and displays the time correctly; it just locks up w.r.t. web requests. I’ve now gone through three OS 3.0s and one OS 2.3, running my test script repeatedly on them with IFTTT notifications enabled. I’ve never seen the random zone running problem you reported. So I highly doubt it’s a common problem with the firmware (otherwise I would have heard more reports from other users). If it’s a firmware problem, I unfortunately cannot reproduce it, and without seeing it happen I can’t debug it and find out what’s going on.
July 15, 2020 at 3:44 pm in reply to: Controller lockups / crashes with wired Ethernet module #67346
Ray (Keymaster): Note that the ‘buffer overflow flag’ refers to the buffer on the ENC28J60 module; it has nothing to do with the microcontroller’s memory. I doubt there is any memory corruption issue on the microcontroller, otherwise it would have behaved strangely in WiFi mode as well, or would show up even if I isolate OS from the primary network. I suspect it has something to do with the UIPEthernet library not handling certain conditions correctly, like not clearing register bits, or not handling conditions that arise when there are too many broadcast messages, and so on.
What you mentioned about ‘look at the input buffer to see if it’s declared; check if buffer has enough room’: yes, of course, these are the basic steps anyone has to do when writing a C++ program. As I said, ‘buffer overflow’ is NOT referring to the microcontroller’s buffer; it refers to the ENC28J60’s receive buffer (hence the receive error flag is always raised together with the buffer overflow flag). This is a hardware buffer, not allocated by the program, that exists at a fixed size on the Ethernet chip. It’s not something that we can declare the size for.
July 15, 2020 at 3:41 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67345
Ray (Keymaster): No, this can’t be the buffer overflow issue, because the buffer in this firmware is large enough to fit the maximum number of programs and the maximum number of zones. I will check later this evening. Since the firmware was prepared in a hurry, I didn’t check what happens when the configuration is very large. But it should be easy to debug now that I know at least the symptom.
July 15, 2020 at 11:56 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67339
Ray (Keymaster): It sounds like the controller is alive but there is a corruption in the returned json data. To verify, you can try:
http://x.x.x.x/ja?pw=yyyyy
where x.x.x.x is your OpenSprinkler’s IP address (with :port if you use a port other than 80), and yyyyy is the md5 hash of your device password. You can use https://www.md5hashgenerator.com/ to generate the hash from your password.
The return should be a long json string. You can paste it to:
http://json.parser.online.fr/
to verify if there is anything invalid.
If something is corrupted, you will most likely have to do a factory reset (though, if you know how to use the HTTP API, you can issue an HTTP API command to reset just the option that is corrupted).
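The same check can also be done locally with a short Python sketch instead of the online tools. The function names and the example host are hypothetical; the MD5 hashing and JSON parsing use only the Python standard library.

```python
import hashlib
import json


def ja_url(host, password):
    """Build the /ja URL, with pw set to the MD5 hash of the device password."""
    pw_hash = hashlib.md5(password.encode("utf-8")).hexdigest()
    return f"http://{host}/ja?pw={pw_hash}"


def check_json(text):
    """Return (True, None) if text is valid JSON, else (False, where parsing failed)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as err:
        return False, f"{err.msg} at character {err.pos} of {len(text)}"


# Example: paste the text returned by the URL into check_json().
# A truncated response typically fails near the end of the string.
print(check_json('{"settings": {"devt": 1594'))
```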
July 15, 2020 at 12:18 am in reply to: Controller lockups / crashes with wired Ethernet module #67333
Ray (Keymaster): So I found something today which I think is really interesting. If you take a look at my post above (https://opensprinkler.com/forums/topic/controller-lockups-crashes/page/2/#post-67254), I suspect that two ENC28J60 register values, the buffer overflow error flag ESTAT.BUFFER and the receive error flag EIR.RXERIF, are indicators of whether the controller is in an erroneous state which, after some time of running, will eventually lead to a lockup. I will call the state when these two bits are 0 the ‘clean state’, and the state when these two are 1 the ‘corrupted state’.
My experiment was to find out what causes the corrupted state to happen in the first place. I know that when I use a secondary router to isolate OS from the rest of my primary WiFi network, it’s always in clean state (or at least during the testing started a few days ago, it has always been in the clean state). On the other hand, if OS is connected to my primary router, it goes into corrupted state shortly after booting.
So I started by unplugging / turning off all WiFi devices, leaving only OS on the primary router. Sure enough, it stays in the clean state. Then I turned each WiFi device back on, one after another, restarting OS each time to observe if and when it goes into the corrupted state. Interestingly, most devices don’t cause any trouble, except my two MacBooks and a Linux computer: as soon as I turn on WiFi on these computers, the test OS goes into the corrupted state.
What’s interesting is that I have two other Linux computers that don’t cause this symptom. Comparing what is installed on these computers led me to quickly discover that it’s Dropbox that makes the difference. This can be reliably reproduced: if I quit Dropbox, then reboot OS, it doesn’t go into the corrupted state; if I leave Dropbox on, OS goes into the corrupted state shortly after booting.
Using Wireshark, I saw that Dropbox sends a lot of Dropbox LAN sync discovery protocol (DB-LSP-DISC) packets. I have a strong feeling that this is the root cause of the problem. Although I haven’t found a way to address the issue yet, this at least gave me a way to reliably reproduce the corrupted state, which I highly suspect will eventually lead to the controller locking up. Apparently there is a way to turn off the sync protocol in Dropbox, so that’s what I am going to try next.
Anyways, I’m still digging into the issue and trying to figure out a solution, but at least I feel a bit closer.
Ray (Keymaster): Yes, it’s possible, but someone has to write code for it.
July 14, 2020 at 2:51 pm in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67322
Ray (Keymaster): Open a browser on your PC, and open the console. For example, in Chrome, go to Menu -> More Tools -> Developer tools, then click to open the Console tab. Then type in your controller’s IP address (and :port if you use a port other than 80). The console should show error messages.
July 14, 2020 at 10:04 am in reply to: Instructions for testing OS 3.2 with W5500 Ethernet module #67314
Ray (Keymaster): Post updated with 3D printed enclosure.
July 12, 2020 at 3:20 pm in reply to: Controller lockups / crashes with wired Ethernet module #67281
Ray (Keymaster): I’ve started a new thread with W5500 instructions:
https://opensprinkler.com/forums/topic/instructions-for-testing-os-3-2-with-w5500-ethernet-module/
including where to download the experimental firmware.