OpenSprinkler Forums Hardware Questions OpenSprinkler Controller lockups / crashes with wired Ethernet module

Viewing 25 posts - 151 through 175 (of 176 total)
  • #73023

    eddy
    Participant

    I just ordered a new OpenSprinkler (with the wired Ethernet adapter) yesterday because my old OpenSprinkler 2.3 (which uses wired Ethernet) has started to behave strangely. Now I’m wondering whether the new hardware will have the same issues!

    I upgraded to 2.1.9(9) a month or so ago, hoping the new Ethernet library would help, but no luck. I also added the special “auto reboot” schedule to reboot OS every morning at 1am, but even with that, my OS would 1) become unreachable via the API (but still pingable), 2) respond to API requests with {"result":2} (indicating a wrong password, even though it has always been the default), or 3) lock up completely and miss waterings.

    I thought I had solved the “unauthorized” issue by setting “Ignore Password” in the Advanced options, but I still run into the other issues.

    With the hot summer watering months ahead, I need a reliable watering system. Will your special fixed version based on 2.1.9(7) run on OS 2.3 hardware?

    #73306

    Ray
    Keymaster

    @Water_my_lawn: I apologize for not being able to stay up to date with this issue. Long story short, I’ve been dealing with severe health issues and haven’t been able to work on the firmware. I’ve got a bit more energy now to come back and look at this issue. Based on what you described, it sounds like UIPEthernet has fixed the issue for you: does the current version of UIPEthernet (2.0.12) work as is, or did you need to fix some code to make it work for you? I’ve been reading the GitHub issues thread, but I am confused about whether the fix that made it work for you is in the current UIPEthernet branch or whether it has been reverted.

    In other news, a little while back I made the first version of the firmware that uses lwIP, which is available in ESP8266 core 3.0.2. We’ve been testing it and it seems relatively reliable. While this will probably solve the hanging issues for wired Ethernet on OpenSprinkler 3.x, it only applies to ESP8266-based OpenSprinklers, so for OpenSprinkler 2.3 we still need to use either UIPEthernet or EthernetENC.

    In any case, let me know what has worked for you and we can obviously make a version of the firmware based on that for any user who is seeing the same problem.

    #73311

    Water_my_lawn
    Participant

    I have run the latest code, version 2.1.9(9), and had some hang conditions. I have been communicating with the author of EthernetENC, and he suggested some places to add debug code. I have created a version of 2.1.9(9) with the debug code but have not hit a hang yet after 2 months of running. I am now debugging with EthernetENC rather than the older UIPEthernet; both libraries have the same author.

    I wish you well on your health issues.

    #73315

    Ray
    Keymaster

    So you mean running the version with debug code has fixed the issue for you? That sounds puzzling to me: how does adding some debug code fix the issue?

    #73326

    Water_my_lawn
    Participant

    It is not uncommon for small changes in the code to mask a problem. This is not a fix, and it can make finding the actual problem a real trick. However, I have only been running for 2 months, and in the past I have run for longer before the hang occurred.

    If I could connect a debugger or get a crash dump, I could fix this problem easily. As it is, I can only make small changes and hope the little bit of information extracted points to the problem.

    #73372

    Mark Kananen
    Participant

    I got my OS in the mail yesterday and have been reading the forums before installing. Does anyone have any idea how widespread this problem is? I also noticed on the HA FB pages that some people are abandoning OS because of random crashes. I find it a bit unsettling that this problem has existed for so long.

    #73374

    John K
    Participant

    OS is a very good system with very flexible usage. I currently use 6 of these for commercial production, basically betting my life on them working when the temps are in the 90s and above… and they do.

    With that being said, yes, there are some random crashes, but they are not very widespread. If you are using it for your lawn at home and more or less set it and forget it, the worst that will happen (short of the unit actually breaking) is needing a hard reset (power switch) to get into the app. Even in those cases, the unit keeps functioning and programs keep running on time. There is also a program setting you can use to have the unit reset periodically, which fixes the access issue. I use this on only one of my units, which every once in a while seems to lose its app connection.

    Also, for my use case, I access these controllers from anywhere on an almost daily basis. I adjust and run programs as needed on the spot and have scheduled programs constantly running. I likely have one of the more intensive use cases these units deal with, and I wouldn’t want to change from this system. Yes, I would like to see these issues resolved, but this stuff comes up as hardware and software change over iterations.

    I cannot speak to the qualities of comparable systems, for the record.

    #73425

    Mark Kananen
    Participant

    I’m having quite a few wired network issues where it drops into WiFi setup mode a couple of times a day. I have been working with some remote stations, and I want to say the network issue became more obvious once I started manually starting remote stations. I cannot be sure, but it does seem related.

    #73459

    Mark Kananen
    Participant

    I was having tons of lockups, then I moved the OS to a different router and all the Ethernet problems seem to have gone away. In my case I have:

    ATT Router -> TPLink ER605 -> TPLink Switch

    When I moved the OS from the switch up to the ER605, all the Ethernet problems seemed to resolve. My intention was to put the OS on a VLAN, since the ER605 supports them. I never set up a separate VLAN for the sprinkler, though, and just moving it to a different router seems to have resolved the issues. At least for a few days now.

    #73462

    Mark Kananen
    Participant

    So, the lockups went away, but after further investigation, remote stations still do not work. As soon as I unplug the Ethernet adapter and switch to WiFi, everything works.

    #73504

    Ray
    Keymaster

    Following @Water_my_lawn’s suggestion, I replaced EthernetENC with UIPEthernet (version 2.0.12) and recompiled the firmware as 2.1.9(10) (i.e. 2.1.9 minor revision 10). So the only difference between minor revisions (9) and (10) is the Ethernet library. The firmware is available for OpenSprinkler hardware 2.3 and also for 3.2 with a wired Ethernet connection (it’s irrelevant if you use OpenSprinkler 3.2 only in WiFi mode). In any case, if you are on 2.1.9(9) and having frequent lockups or disconnections, give 2.1.9(10) a try and see if it solves the issue. The fact is that we don’t know why some users are encountering problems with (9); we have not been able to reproduce the issue ourselves, and without seeing the problem happen we can’t really debug it. I think it’s unlikely that any single firmware works for all users (I am sure some users will have problems with (10) as well), but hopefully one of (9) and (10) will work for you.

    #77396

    Darian
    Participant

    I have been having similar crashing problems with my hardware version 2.3 OpenSprinkler after upgrading from firmware 2.1.7 to both 2.1.9 and 2.2.0. After reading through this topic, I decided to recompile the firmware with every combination of 2.1.9 / 2.2.0, EthernetENC / UIPEthernet, and Debug ON / Debug OFF. There were 8 variations in all, and I tested each one until it failed. All of them failed for me after 3 to 4 days of use. I have uploaded them all to my fork of the firmware on GitHub, with each variation tagged separately, so you can see the differences and try them for yourself.

    https://github.com/darian-au/open-sprinkler-firmware/releases

    My only remaining option was to fall back to firmware 2.1.7 with the EtherCard library. This, however, left me with a different problem: the weather service call failed. My experience was similar to others reported on the forum.

    https://opensprinkler.com/forums/topic/weather-server-call-failures/

    To determine why it was failing I set up WireShark to inspect the traffic and discovered that the weather server was returning a “400 Bad Request”. On closer inspection I realised that the format of the Request sets the Host Header to “*”. Due to virtual hosting behind Cloudflare I believe it would use the Host Header to determine which backend host to pass traffic onto, so without it being set correctly, it would fail. Firmware 2.1.9 onwards and RPI builds appear to set the Host Header correctly. So I have modified the 2.1.7 code for the ATmega1284P hardware and rebuilt the firmware with a hard coded Host Header string set to “weather.opensprinkler.com” instead of “*”. This fixes the weather problem for me. I have uploaded the firmware to my fork on GitHub along with the tag so you can see the code change.

    https://github.com/darian-au/open-sprinkler-firmware/releases/tag/2.1.7(0)-interim
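
    To illustrate why “*” fails: HTTP/1.1 requires a Host header naming the target server, and a server is expected to reject a missing or invalid one with 400 Bad Request, which matches what Darian captured. The sketch below is not OpenSprinkler code (the firmware builds its requests through EtherCard’s browseUrl()); buildGetRequest and the “/” path are hypothetical, and the sketch only shows where the real hostname belongs in the request.

    ```cpp
    #include <cassert>
    #include <string>

    // Hypothetical helper (not firmware code): builds a minimal HTTP/1.1 GET request.
    // Virtual hosts and CDNs such as Cloudflare use the Host header to pick the backend,
    // so it must carry the real server name, never "*".
    std::string buildGetRequest(const std::string& host, const std::string& path) {
      std::string req;
      req += "GET " + path + " HTTP/1.1\r\n";
      req += "Host: " + host + "\r\n";   // e.g. "weather.opensprinkler.com"
      req += "Connection: close\r\n";
      req += "\r\n";                     // blank line terminates the header block
      return req;
    }
    ```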

    In order to build it I had to locate the instructions on the Way Back Machine.

    https://web.archive.org/web/20171021230257/https://openthings.freshdesk.com/support/solutions/articles/5000165132-how-to-compile-opensprinkler-firmware

    Hopefully this will help others with similar problems to myself.

    #78771

    starsoccer
    Participant

    I’m also having this issue, so thanks for the above info. There is also a GitHub thread about this issue: https://github.com/OpenSprinkler/OpenSprinkler-Firmware/issues/249. If I understand that thread correctly, there are basically two options: use a patched but unofficial firmware, or possibly buy a new module with an adapter. I posted in that issue seeking some clarity.

    #80054

    hplato
    Participant

    I have a new v3.3 with the hardware Ethernet module running the latest firmware, 2.2.1. It replaces an older v1 running 2.1.9. I poll the OpenSprinkler every 10 seconds, and the old one worked like a champ since 2017: no network issues, very reliable.

    After about 2 weeks, I found the v3.3 had dropped off the network. It could still be controlled with the local buttons but didn’t respond to any network requests. After a power cycle the device was back. Has anyone else seen this with the latest hardware and software, or have any suggestions or advice? There was one comment about network switches. I plug the OSP into a managed TP-Link switch (SG3428X). I don’t know whether that is an issue; it wasn’t for the older device.

    #80055

    Ray
    Keymaster

    One issue we see once in a while is switches or routers that have PoE (Power over Ethernet) turned on. The OpenSprinkler wired Ethernet module is apparently not compatible with PoE, so make sure PoE is turned off on the router/switch port that OpenSprinkler is plugged into.

    #80062

    starsoccer
    Participant

    Good to know. Any plans to add native PoE support?

    Currently I use my OpenSprinkler with a PoE splitter. With the old version and the old module it would hang, but since swapping to the new one, I’ve had no issues.

    #80066

    Ray
    Keymaster

    The PoE support issue is a hardware issue: the wired Ethernet module, which is an off-the-shelf part, uses an Ethernet jack that’s not compatible with PoE. This is not something a firmware update can resolve.

    #84058

    Bigmaxy
    Participant

    Apologies for dragging up this old thread but it appears to relate directly to the problem I’m having with my 2.3 hardware version (2.2.1 firmware currently) that has been playing up since last year.
    Was there a definitive confirmation that setting an isolated VLAN specific for the sprinkler fixes the issue or whether some other ethernet module can be used?

    Or.

    Is the best option to retire the unit and move to OS3.3/3.4 with Wifi?

    Thanks

    #84059

    Ray
    Keymaster

    If you have many wired devices on your network, the most effective solution is to put OpenSprinkler behind a dedicated router, which helps isolate it from the traffic of your other devices. After all, OpenSprinkler runs on a small embedded microcontroller that isn’t capable of processing a large number of network packets. An isolated VLAN probably serves the same purpose, though if your router doesn’t support VLANs, a secondary router is usually the more cost-effective solution, as a basic router is very cheap to buy.

    #84078

    Bigmaxy
    Participant

    Thanks Ray.
    I’ve moved the OpenSprinkler to its own VLAN with nothing else except the gateway (router). Unfortunately, I am not seeing any performance improvement using either the web interface or the app. I do use a non-standard port to access it, if that makes any difference, but it’s been configured this way for many years with no problem.

    Is there any value in rolling back to an older firmware? Are there any options to replace the Ethernet module with the same type or another?

    Thanks.

    #84079

    Ray
    Keymaster

    A VLAN can help but only if:
    – It is its own broadcast domain
    – You don’t leak all multicast and mDNS back in
    – Your router only allows the specific traffic OS actually needs

    As I said, if there are many wired devices on your network, the most effective solution is to put OpenSprinkler behind its own dedicated router.

    You said you have OpenSprinkler v2.3. The Ethernet controller on that version is soldered onto the circuit board, so unfortunately it is not a replaceable module. You can certainly downgrade the firmware if that helps.

    #85496

    StephenOz
    Participant

    I have OS 2.2 hardware (I prefer having a LAN port), and it has suffered random Ethernet lockups from day one. Due to updates on my home network, the Host header issue raised by Darian became a problem. Anyway, running the source code through AI and asking about network lockups produced:

    # OpenSprinkler Firmware 2.1.7 — Patch Summary
    **Target hardware:** ATmega644 / ENC28J60
    **Symptom addressed:** Unit stops responding on ethernet port, requiring power cycle to restore communications. Unit itself continues operating normally.

    ## Files Modified

    | File | Issues Fixed |
    |------|-------------|
    | enc28j60.cpp | #1, #2 |
    | main.cpp | #3, #4 |
    | OpenSprinkler.cpp | #3, #6 |
    | weather.cpp | #3, #6 + GetWeather host:port feature |

    ## Issue #1 — ENC28J60 Receive Buffer Pointer Errata (CRITICAL)
    **File:** enc28j60.cpp, packetReceive()

    **Root cause:** The ENC28J60 has a silicon errata (Microchip DS80349C, issue #14) requiring that the ERXRDPT register is always written with an **odd** value. The original code used gNextPacketPtr - 1 which can be even, corrupting the chip’s internal receive buffer pointer. Once corrupted, the chip stops receiving all packets until power-cycled — the AVR continues running normally, giving the exact symptom reported.

    **Fix:** ERXRDPT is now forced odd via (gNextPacketPtr - 1) | 0x0001, with a special case when gNextPacketPtr == RXSTART_INIT to wrap correctly to RXSTOP_INIT.
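
    A minimal sketch of the ERXRDPT calculation as the summary describes it, runnable off-target. RXSTART_INIT and RXSTOP_INIT are set to assumed values here; the real ones are macros in the firmware’s enc28j60.cpp. The function only computes the value; the SPI register write is left out.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Assumed receive-ring bounds for illustration only; check enc28j60.cpp in your tree.
    constexpr uint16_t RXSTART_INIT = 0x0000;
    constexpr uint16_t RXSTOP_INIT  = 0x19FF;  // the ring end must itself be odd

    // Value to write to ERXRDPT after consuming a packet, per the patch summary:
    // one below the next-packet pointer, forced odd, wrapping to the end of the
    // ring when the next packet starts at the beginning of the ring.
    uint16_t nextErxrdpt(uint16_t gNextPacketPtr) {
      if (gNextPacketPtr == RXSTART_INIT) {
        return RXSTOP_INIT;  // (start - 1) wraps around to the ring's last byte
      }
      return static_cast<uint16_t>((gNextPacketPtr - 1) | 0x0001);
    }
    ```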

    ## Issue #2 — No ENC28J60 Receive Overflow Recovery (HIGH)
    **File:** enc28j60.cpp, packetReceive()

    **Root cause:** The ENC28J60 sets the RXERIF flag and halts its receive engine when the internal RX buffer overflows. No code in the firmware ever checked or cleared this flag, so once triggered the chip would silently drop all incoming packets indefinitely.

    **Fix:** Added an RXERIF check at the top of packetReceive(). If the flag is set, the receive engine is reset (ECON1_RXRST) and re-enabled (ECON1_RXEN) before any packet processing continues.
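
    The recovery step can be sketched against a mock register file so the logic runs without hardware. The bit masks follow the ENC28J60 datasheet names and are assumed to match EtherCard’s definitions; MockEnc28j60 and recoverRxOverflow are hypothetical stand-ins, not firmware code.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Register bit masks as named in the ENC28J60 datasheet (assumed values).
    constexpr uint8_t EIR_RXERIF  = 0x01;  // receive error (buffer overflow) flag
    constexpr uint8_t ECON1_RXEN  = 0x04;  // receive enable
    constexpr uint8_t ECON1_RXRST = 0x40;  // receive logic reset

    // Hypothetical stand-in for the chip's registers.
    struct MockEnc28j60 {
      uint8_t eir = 0;
      uint8_t econ1 = ECON1_RXEN;
      int rxResets = 0;  // test aid: how many times RX logic was reset
    };

    // Intended for the top of packetReceive(): if the overflow flag is set, pulse the
    // receive reset, clear the flag and re-enable reception. Returns true if it acted.
    bool recoverRxOverflow(MockEnc28j60& chip) {
      if (!(chip.eir & EIR_RXERIF)) return false;
      chip.econ1 |= ECON1_RXRST;                             // hold RX logic in reset
      ++chip.rxResets;
      chip.econ1 &= static_cast<uint8_t>(~ECON1_RXRST);      // release the reset
      chip.eir &= static_cast<uint8_t>(~EIR_RXERIF);         // clear the overflow flag
      chip.econ1 |= ECON1_RXEN;                              // resume receiving
      return true;
    }
    ```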

    ## Issue #3 — Blocking Network Calls Starve the RX Buffer (HIGH)
    **Files:** main.cpp, OpenSprinkler.cpp, weather.cpp

    **Root cause:** Several outbound network operations (IFTTT push notifications, remote station control, weather checks) used a fixed for(int l=0;l<100;l++) ether.packetLoop(...) poll loop. This returned too quickly if the remote server was slow, leaving TCP sessions open and the state machine inconsistent. During these calls the main loop was blocked, allowing the ENC28J60 RX buffer to fill — directly triggering Issue #2. Additionally, the AVR GetWeather() path had no poll loop at all after browseUrl().

    **Fix:** All fixed-iteration poll loops replaced with a 3-second time-bounded loop that also calls wdt_reset() to prevent a spurious watchdog reboot during a slow server response:
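    ```cpp
    // Poll the network stack for up to 3 seconds after starting a request.
    // Comparing elapsed time (millis() - start) instead of an absolute deadline
    // keeps the loop correct when the 32-bit millis() counter wraps.
    // wdt_reset() comes from <avr/wdt.h>.
    const unsigned long start = millis();
    while (millis() - start < 3000UL) {
      wdt_reset();                              // feed the watchdog while waiting
      ether.packetLoop(ether.packetReceive());  // process any packets that arrived
    }
    ```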
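
    One caveat with time-bounded loops: on AVR, millis() is a 32-bit unsigned counter that wraps roughly every 49.7 days, so a check like `now < start + timeout` stops waiting immediately if the addition wraps, while `now - start < timeout` does not. The sketch below passes `now` in as a parameter (standing in for millis()) so both forms can be compared off-target; the function names are hypothetical.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Absolute-deadline form: fails when start + timeout wraps past zero.
    bool stillWaitingDeadline(uint32_t now, uint32_t deadline) {
      return now < deadline;
    }

    // Elapsed-time form: unsigned subtraction is modular, so it survives the wrap.
    bool stillWaitingElapsed(uint32_t now, uint32_t start, uint32_t timeoutMs) {
      return static_cast<uint32_t>(now - start) < timeoutMs;
    }
    ```

    For a request started 256 ms before the counter wraps, the deadline form returns false on the very first check, so the loop would exit without polling at all; the elapsed form keeps polling for the full 3 seconds.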

    ## Issue #4 — Watchdog Does Not Detect Network-Only Lockup (MEDIUM)
    **File:** main.cpp

    **Root cause:** The watchdog is configured in interrupt-only mode and only resets the device after ~120 seconds of total CPU hang. A network lockup (Issues #1/#2) leaves the AVR running normally so the WDT sees regular resets and never fires. The network check interval was also 10 minutes, meaning a lockup could persist for up to 10 minutes before any recovery was attempted.

    **Fix:**
    – CHECK_NETWORK_INTERVAL reduced from 601 seconds (10 min) to 121 seconds (~2 min) for faster lockup detection.
    – Added an ether.isLinkUp() fast-path check in check_network() before attempting a gateway ping. A genuine link-down failure is now recorded immediately without burning PING_TIMEOUT milliseconds on a doomed request.
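
    The link-up fast path amounts to a short-circuit before the ping. This is a sketch under assumptions: NetCheck and checkNetworkOnce are hypothetical names, and the ping is injected as a callable so the logic can be exercised without EtherCard.

    ```cpp
    #include <cassert>
    #include <functional>

    // Hypothetical outcome of one periodic network check.
    enum class NetCheck { LinkDown, PingFailed, Ok };

    // If the PHY reports no link, record the failure immediately instead of
    // spending the ping timeout on a request that cannot succeed.
    NetCheck checkNetworkOnce(bool linkUp, const std::function<bool()>& pingGateway) {
      if (!linkUp) return NetCheck::LinkDown;  // fast path: ping is never attempted
      return pingGateway() ? NetCheck::Ok : NetCheck::PingFailed;
    }
    ```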

    ## Issue #5 — Shared tmp_buffer (Design Smell — No Active Bug Found)
    **Files:** multiple

    **Assessment:** tmp_buffer (128 bytes on ATmega644) is a single global scratch buffer shared by almost every subsystem. After tracing all usage paths this was found to be safe in the ATmega644 code path:
    – push_message() uses its own static postval[] and static key[], not tmp_buffer.
    – make_logfile_name() uses the tail of tmp_buffer as a staging area; this is intentional and correct.
    – dns.cpp and dhcp.cpp operate entirely on Ethernet::buffer and never touch tmp_buffer.
    – The master station bit loops only use the first 6 bytes and nothing inside the loop overwrites the buffer.

    No fix required, but noted as a maintenance risk if the codebase grows.

    ## Issue #6 — Incorrect HTTP Host Header (LOW)
    **Files:** weather.cpp, OpenSprinkler.cpp

    **Root cause:** Several browseUrl() calls passed PSTR("*") as the HTTP Host: header rather than the actual server hostname. Some servers reject requests with Host: * with a 400 error, leaving the TCP session in a half-open state that can subtly degrade subsequent network operations.

    **Fix:**
    – GetWeather() (AVR path) now passes the parsed weatherHost string.
    – switch_remotestation() now builds a dotted-decimal host string from ether.hisip (already populated just before the call) and passes that instead.
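
    Building the dotted-decimal string is a single snprintf over the four address bytes (EtherCard stores ether.hisip as a 4-byte uint8_t array). formatHostIp is a hypothetical helper name; the buffer needs at least 16 bytes for “255.255.255.255” plus the terminator.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Formats a 4-byte IPv4 address as "a.b.c.d" for use as an HTTP Host header.
    void formatHostIp(const uint8_t ip[4], char* out, std::size_t outLen) {
      std::snprintf(out, outLen, "%u.%u.%u.%u",
                    static_cast<unsigned>(ip[0]), static_cast<unsigned>(ip[1]),
                    static_cast<unsigned>(ip[2]), static_cast<unsigned>(ip[3]));
    }
    ```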

    ## Feature: GetWeather() host:port Support
    **File:** weather.cpp (AVR #if defined(ARDUINO) path)

    **Change:** The ADDR_NVM_WEATHERURL value now supports both hostname and ip:port formats. The URL is parsed on each call:
    – If a colon is present, the string is split into host and port; port defaults to 80 if parsing fails.
    – If the host portion is a bare IPv4 address (detected by isIPAddress() — digits, dots, exactly 3 dots), ether.hisip is populated directly, bypassing DNS entirely.
    – If the host is a hostname, ether.dnsLookup() is called as before, and the call is aborted cleanly if DNS fails (previously a failed DNS lookup was silently ignored).
    – ether.hisport is set to the parsed port for the duration of the request and restored afterward.
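
    The parsing rules above can be sketched as follows. This desktop version uses std::string for readability, whereas the firmware would work on char buffers; parseWeatherUrl and HostPort are hypothetical names, and isIPAddress() here only mirrors the described rule (digits and dots, exactly three dots) without checking octet ranges.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstdlib>
    #include <string>

    // Digits and dots only, with exactly three dots (octet values are not range-checked).
    bool isIPAddress(const std::string& s) {
      if (s.empty()) return false;
      int dots = 0;
      for (char c : s) {
        if (c == '.') ++dots;
        else if (c < '0' || c > '9') return false;
      }
      return dots == 3;
    }

    struct HostPort { std::string host; uint16_t port; };

    // Splits "host:port"; the port falls back to 80 if it is missing or invalid.
    HostPort parseWeatherUrl(const std::string& url) {
      HostPort hp{url, 80};
      const std::string::size_type colon = url.find(':');
      if (colon == std::string::npos) return hp;
      hp.host = url.substr(0, colon);
      const std::string portStr = url.substr(colon + 1);
      char* end = nullptr;
      const long p = std::strtol(portStr.c_str(), &end, 10);
      if (!portStr.empty() && *end == '\0' && p > 0 && p <= 65535) {
        hp.port = static_cast<uint16_t>(p);
      }
      return hp;
    }
    ```

    A host that passes isIPAddress() would be copied straight into ether.hisip; anything else would go through ether.dnsLookup() as before.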

    ## Notes for Testing
    – Flash the firmware and monitor the serial debug output (DEBUG_BEGIN(9600)) — DNS failures and network check results are logged.
    – To exercise the ERXRDPT fix, sustained traffic (e.g. repeated browser refreshes) previously triggered the lockup within minutes to hours depending on packet alignment. This should no longer occur.
    – The network check now runs every ~2 minutes. If the gateway ping fails 3 consecutive times, start_network() is called to reinitialise the ENC28J60. If it fails 6 consecutive times, a safe reboot is scheduled (waits for no active program before rebooting).
    – To use a local weather server by IP: set WEATHERURL to 192.168.1.x:port — no trailing slash or path required as /weather is prepended by the firmware.
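
    The escalation in the third note can be modelled as a small counter. NetworkWatch, NetAction and mayRebootNow are hypothetical names; the thresholds (3 failures to reinitialise, 6 to schedule a reboot that waits for no active program) come from the notes above.

    ```cpp
    #include <cassert>

    enum class NetAction { None, Reinit, RebootWhenIdle };

    // Counts consecutive failed gateway checks and reports what to do next.
    class NetworkWatch {
     public:
      NetAction onCheck(bool ok) {
        if (ok) { failures_ = 0; return NetAction::None; }
        ++failures_;
        if (failures_ >= 6) return NetAction::RebootWhenIdle;
        if (failures_ == 3) return NetAction::Reinit;  // e.g. call start_network()
        return NetAction::None;
      }
      int failures() const { return failures_; }
     private:
      int failures_ = 0;
    };

    // A scheduled reboot only proceeds once no program is running.
    bool mayRebootNow(bool rebootPending, bool programActive) {
      return rebootPending && !programActive;
    }
    ```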

    Might a few of these be related? My problem is getting either the original or the modified code to compile. Side note: I did not ask about the Host header issue, but the AI noticed and patched it.

    #85497

    Bigmaxy
    Participant

    I went back to 2.1.6(2) and have not had a single problem since. Once I’ve made it to winter I may look at upgrading version by version to see if I can land on the latest stable for me. But for now, it’s been rock stable which has meant more than any other differences in the later firmware version.
    PS. I also put it back on the IoT VLAN with all of the other devices

    #86003

    ipilcher
    Participant

    I’m about at the end of my rope with this thing. The Ethernet module seems to completely lock up every time it rains, which almost always involves some thunder and lightning here in Texas. I’m using a 3.4 AC with firmware 2.2.1(3). (I haven’t bothered to update to 2.2.1(4) yet; there’s nothing in the changelog about the Ethernet module.) I never had problems like this with my old 2.3 AC unit.

    Any suggestions before I give up? (I hate to go down the proprietary, cloud-based route, but I need something that works.)

    #86005

    Ray
    Keymaster

    Honestly, this is the first time I’ve heard that the Ethernet lockup is associated with rain (which here involves thunder and lightning). If the correlation is real, it almost certainly points to some sort of interference issue. I don’t know what effectively solves it, since this is the first time I’ve heard of it, but our general wired Ethernet troubleshooting instructions are here:
    https://opensprinkler.github.io/OpenSprinkler-Firmware/troubleshooting/#connectivity
    scroll down to “Wired Ethernet connection issues”.

    Given that this could be an individual wired Ethernet module’s issue, you are welcome to submit a support ticket to get a replacement Ethernet module and see if it makes any difference.
