Network problems | OpenSprinkler

Tagged: IPV4

This topic has 31 replies, 2 voices, and was last updated 4 years, 5 months ago by Ray.


Author

Posts
April 14, 2016 at 12:36 pm #42034

Domenic
Participant

Hello,
I recently bought an OpenSprinkler DC (FW = 2.1.6 (1), HW = 2.3-DC) and hooked it up to my network (hardwired Ethernet network with a static IP). I haven’t done any programming on it, I simply configured the network settings and it’s just been sitting on my network for a couple weeks (until I get some time to actually hook it up to my irrigation system later this month). So basically it’s just been idle on my network. I have PRTG network monitor running on my network and have been getting alerts on a daily basis that the OS stops responding to pings. This lasts for roughly 5 to 30 minutes, and then the OS comes back up. The OS is connected to a UPS battery backup so power is not a problem. (I have also tried the usual basic stuff: swapped patch cable, tried a different switch port, tried a different switch.)
What could be the issue? I’m willing to troubleshoot if someone could provide some guidance.

April 17, 2016 at 11:28 am #42067

Samer
Keymaster

Honestly, this is sometimes seen and I believe it’s due to the Arduino network stack being used because I also experience small windows of downtime but they are usually short lived.

Personally I’m using a very old phone cable with 5 pairs of wires to handle Ethernet which probably explains the intermittent connection. Not sure if that helps but maybe a cable or something in the path isn’t reliable or maybe the software on the OS isn’t keeping up.

April 18, 2016 at 1:37 am #42098

Ray
Keymaster

The microcontroller-based OpenSprinkler runs a software TCP/IP stack on a 16MHz microcontroller, so it has limited capability to handle network requests. On top of that, it needs to perform certain tasks, such as periodically pinging the router to make sure it’s still connected, performing NTP sync, and checking the weather. While it’s doing these tasks, it won’t be able to respond to pings. I am not sure how frequently you are pinging the controller, but frequent pinging will obviously flood the controller, causing it to stop responding. In the end this is not a computer running a Linux system, it’s a small embedded system.

April 29, 2016 at 1:29 pm #42253

Domenic
Participant

I’m pinging every 5 minutes, which doesn’t seem excessive to me. To be frank, I’m surprised that the controller wouldn’t be able to keep up with these basic housekeeping tasks. It sounds vastly under-powered for the task. I mean, that’s the whole point of a network connected irrigation controller, no? I still consistently get an approx 10-20 minute “outage” on a daily basis, and that is with zero programs running. I don’t have it connected to my system yet, it’s just idle on the network. I’m concerned that once I put this into operation these outages will get longer and/or more frequent.

May 2, 2016 at 10:16 pm #42297

Ray
Keymaster

The firmware is set to perform a number of network-related tasks per minute. If at the time of pinging it happens to be doing one of these tasks, it won’t be able to respond to ping requests. It’s a single 8-bit microcontroller, not a Linux system.
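As a rough illustration of the point above (a sketch under stated assumptions, not the firmware's actual scheduling): if the controller is busy for some part of each minute and a ping arrives at a random moment, the chance that it goes unanswered is roughly the busy fraction. The 3-second busy time below is a made-up placeholder, not a measurement, and treating successive pings as independent is also an assumption.

```python
def miss_probability(busy_seconds_per_minute: float) -> float:
    """Chance that one ping, arriving at a random time, hits a busy window."""
    if not 0 <= busy_seconds_per_minute <= 60:
        raise ValueError("busy time must be between 0 and 60 seconds")
    return busy_seconds_per_minute / 60.0


def consecutive_miss_probability(p_miss: float, pings: int) -> float:
    """Chance that `pings` independent pings in a row all go unanswered."""
    return p_miss ** pings


# Hypothetical figure: assume the firmware is busy 3 s out of every 60 s.
p = miss_probability(3.0)
print(f"single ping missed: {p:.1%}")
print(f"three pings in a row missed: {consecutive_miss_probability(p, 3):.4%}")
```

Under this simple model an isolated missed ping is plausible, while several misses in a row from a poller that pings every few minutes become unlikely quickly.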

May 5, 2016 at 5:08 am #42326

Kevin
Participant

I have been noticing this since I started using a network monitor tool. As you can see in the screenshot, Opensprinkler loses network connectivity multiple times throughout the day. I really hope this can be fixed.

May 12, 2016 at 2:35 pm #42408

Domenic
Participant

Based on Ray’s response it looks like a fundamental limitation in the hardware, so I don’t expect this to be resolved. The fact that it’s so under-powered that it can’t respond to a ping when it’s doing other tasks boggles my mind. I would expect this from a $25 device, but not a $160 one (+ shipping + USD/CAD exchange). Unfortunately, I will be returning mine for a refund.

May 12, 2016 at 8:50 pm #42414

Ray
Keymaster

@Domenic:

Honestly, I thought you were joking when you said “I would expect this from a $25 device” — even a 4-station sprinkler controller with no network connectivity of any sort would cost more than $25…

Also, it’s unclear from your graph how frequently you are polling the unit. All that I see are messages such as heartbeat lost, heartbeat recovered, but how frequently are you polling? If it’s 1 loss out of 60 polls, that’s pretty normal in my mind.

May 12, 2016 at 8:56 pm #42415

DaveC
Participant

A different perspective, FWIW.

I’m using the Arduino OS with 24 zones. I run an automation app that talks to the OS 24/7. It polls OS at the rate of 17 messages every 5 minutes. Since I’m still tinkering with capability I keep stats on the message activity. Looking at total messages sent and # of messages that failed to get a response in the most recent uptime period:
Up 8.5days. 41616 poll messages sent. 26 message failures.

During this period of time I also used the phone app and the browser GUI to make programming changes so there was even more message traffic at times.

Yup, I think the OS makes server calls more frequently than needed (once an hour)
Yup, I wish that OS had an async notification method to let me know when to request updates. This would cut down drastically on polling.

However, the message failure rate is still very low and hasn’t caused any issues in sprinkler operation or my ability to monitor and control the device.
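The figures in this post can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above (17 messages every 5 minutes, 8.5 days of uptime, 41616 sent, 26 failed); the function names are illustrative.

```python
POLL_MESSAGES_PER_CYCLE = 17   # messages per polling cycle (from the post)
CYCLE_MINUTES = 5              # one cycle every 5 minutes
UPTIME_DAYS = 8.5
MESSAGES_SENT = 41616
MESSAGES_FAILED = 26


def expected_messages(per_cycle: int, cycle_minutes: float, days: float) -> float:
    """Messages sent over `days` at `per_cycle` messages every `cycle_minutes`."""
    cycles = days * 24 * 60 / cycle_minutes
    return per_cycle * cycles


def failure_rate(failed: int, sent: int) -> float:
    """Fraction of messages that got no response."""
    return failed / sent


expected = expected_messages(POLL_MESSAGES_PER_CYCLE, CYCLE_MINUTES, UPTIME_DAYS)
rate = failure_rate(MESSAGES_FAILED, MESSAGES_SENT)
print(f"expected messages: {expected:.0f}")
print(f"failure rate: {rate:.4%}")
print(f"roughly one failure per {MESSAGES_SENT / MESSAGES_FAILED:.0f} messages")
```

The expected count matches the reported 41616 messages, and the failure rate works out to about 0.06%, or roughly one failed message in 1,600.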

May 13, 2016 at 4:41 am #42434

Kevin
Participant

I’m not sure of the polling frequency. However, look at how long some of the down time is. Some are over 40 minutes. That’s not acceptable in my book.


May 13, 2016 at 8:21 pm #42448

Ray
Keymaster

I honestly don’t know what the cause is, and not knowing the polling frequency gives me little information about this ‘heartbeat’ app you are using. It could be due to your specific network setup. We just had a customer report a network issue, and after diagnosis it turned out to be due to an IP conflict (someone on the same network manually set a static IP that conflicts with OpenSprinkler’s IP). If you want, send it back to us and we can perform a thorough check to see what has caused the issue.

May 14, 2016 at 5:26 am #42453

Kevin
Participant

The app I am using is called Domotz (domotz.com). That app monitors my whole network and there aren’t any other devices on my network having this issue. Also, let’s remember I am not the only one having this issue, so I really don’t think it’s an issue with my particular opensprinkler device. Is it possible for you to set up an opensprinkler device in a networked environment and monitor it with a network monitor, so maybe you can see the issue and the cause? Thanks for the help.

May 14, 2016 at 7:35 am #42454

DaveC
Participant

My OS doesn’t show these outages, just the occasional expected message failures described in a previous post. Maybe it’s useful to look at things that are different in either the OS setup or what might be going on in the network that the OS may be reacting to. Here’s some info about my setup:

OS config: Static address, default http port, ntp sync, manual weather adj.
It’s connected via wire to a switch along with about 12 other devices: AP (phones, tablets, notebooks), automation controllers, NAS, A/V devices, printer, PC.
It’s a single sub-net with mixture of static, DHCP and DHCP reserved addresses. External access is via VPN (no port forwarding).

I don’t explicitly ping the OS. Communication with OS is periodic polling via the API as described previously.
In the past I’ve monitored the OS’s hourly server calls. Very infrequent failures.

I tried sending some regular pings to the OS to see if it caused any message failures. I didn’t do this for very long, though long enough for it to overlap with my normal polling cycle. Nothing.

I’m willing to experiment some though I don’t have any suggestions as to what network packets the OS might have an allergic reaction to.
Ray, got any ideas?

May 14, 2016 at 2:39 pm #42459

Domenic
Participant

I have a very standard setup with a number of devices connected to a Cisco sg200 switch. I also have a POE switch with a few IP cameras. All on static IPs. I use prtg to monitor the devices, typically pinging each device every 5 min. Prtg can also send http requests but I did not enable that for my OS. I can honestly say that the open sprinkler is my only device that has ever gone offline during normal operation. This includes my cheapo USB/network “print server”. I’ve changed ports and the patch cable, just in case, with no effect.
My goal was to integrate the OS into my homeseer HA system (there is a free plugin available) but it won’t fly with these frequent outages.
Reading through other forum posts tells me this is not an uncommon issue. My guess is that most people don’t run network monitoring tools so they just don’t notice it.

May 15, 2016 at 11:30 am #42467

DaveC
Participant

My interest here is to help see if there is an OS issue that is causing the apparent longer duration outages.

I don’t ping the device but I am sending messages to it 24×7 at a rate of about 1 every 30s. Since I don’t see these extended outages, only occasional missed messages, it seems unlikely that the outages are the result of OS being busy doing its normal tasks. It seems more likely that the OS network stack is becoming ’busy’ due to not handling some combination of events correctly.

We know that 2 people who are using tools that ‘ping’ the OS at some rate see similar symptoms. It seems useful to understand what these tools are doing in more detail.

A quick look at the Domotz FAQ:
Q: What is the difference between a missed Heartbeat and a device being Down?
A: The way the monitoring of devices work is that we send six (6) pings every 30 seconds to each device on the network. If one or more of those pings are returned, then we count this as a Heartbeat. A device can “miss” three Heartbeats without being marked as Down. If a device doesn’t return any of the pings within a two (2) minute period, then the device is marked as Down. If the device starts to respond to pings again, it gets a new Heartbeat and is marked as Up again.

Putting aside the question of why you’d want to send 6 pings every 30 seconds to every device in a home network, this info provides some view into what Domotz is doing. A quick look at PRTG didn’t yield as much insight.
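To make the FAQ rule concrete, here is a minimal sketch of how it could be read. This is an interpretation of the quoted text, not Domotz's actual code: each 30-second round sends six pings, any reply counts as a Heartbeat, and the device is marked Down once two minutes (four rounds) pass with no reply at all. The function name and constants are illustrative.

```python
from typing import Iterable, List

PINGS_PER_ROUND = 6        # from the FAQ
ROUND_SECONDS = 30         # from the FAQ
DOWN_AFTER_SECONDS = 120   # from the FAQ: no reply within two minutes
MISSES_BEFORE_DOWN = DOWN_AFTER_SECONDS // ROUND_SECONDS  # 4 rounds


def classify_rounds(replies_per_round: Iterable[int]) -> List[str]:
    """Return 'Up' or 'Down' after each 30-second round.

    `replies_per_round` holds how many of the six pings were answered
    in each round. Any reply counts as a Heartbeat and resets the count.
    """
    states = []
    missed = 0
    for replies in replies_per_round:
        if replies > 0:
            missed = 0
        else:
            missed += 1
        states.append("Down" if missed >= MISSES_BEFORE_DOWN else "Up")
    return states


# One good round, four silent rounds, then a single reply.
print(classify_rounds([6, 0, 0, 0, 0, 1]))
```

Read this way, a device shown as Down has failed to answer any of 24 consecutive pings, which is a much stronger signal than a single lost ping.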

The common symptom seems to be that sending regular pings to OS causes it to stop responding to pings for a variable period of time. I’ve started an experiment where I’m sending regular pings to my OS (1 every 30s). The ‘ping’ tool logs the success and failure of each ping. My automation app is also running with its statistics. If there are ping failures, it will be interesting to see 1) Do I also see extended outages? 2) Does my app show missed messages that correspond with the ping failures?

If I don’t see any issues after a few hours, I’ll increase the ping rate.

Keep you posted.
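For anyone who wants to repeat this kind of experiment without a monitoring product, a small logger along these lines would do. It is a sketch, not the tool used in this post: it shells out to the system `ping` with Linux-style flags (`-c` count, `-W` timeout in seconds), which is an assumption, and the address in the usage comment is a placeholder.

```python
import subprocess
import time
from datetime import datetime
from typing import Callable, List, Tuple


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo using the system `ping` (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_probe(
    host: str,
    count: int,
    interval_s: float,
    probe: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, bool]]:
    """Probe `host` `count` times and return (ISO timestamp, success) pairs."""
    log = []
    for i in range(count):
        ok = probe(host)
        stamp = datetime.now().isoformat(timespec="seconds")
        log.append((stamp, ok))
        print(f"{stamp} {host} {'ok' if ok else 'FAILED'}")
        if i < count - 1:
            sleep(interval_s)
    return log


# Example (placeholder address): one ping every 30 s for two hours.
# run_probe("192.168.1.50", count=240, interval_s=30)
```

The `probe` and `sleep` parameters are injectable so the loop can be exercised without touching the network.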

May 15, 2016 at 11:56 am #42469

Kevin
Participant

Thanks for your testing Dave. I appreciate the effort and I’m eager to hear your results. I will say this. My Opensprinkler device was going down before I started using the network monitor tool. I found this out because there were times I tried to connect to it via a browser or the app and it wouldn’t respond. I couldn’t figure out why until I started using the network monitor tool.

Just as an FYI, I have the Opensprinkler device connected via cat 5e cable to a POE switch (DHCP assigned). Also connected to that switch are various networking devices (ex. NAS, 4 IP cameras, Tivos, computers, sonos).

May 15, 2016 at 2:18 pm #42470

DaveC
Participant

I’ve run 2 hours pinging every 30s, then 1.5hrs pinging every 15 secs, with my app polling 17 messages every 5mins throughout. No failures and no missed messages. In addition to this, Kevin says that he had problems trying to connect and that is what led him to monitoring.

I think we can make the assumption that OS’s regular network task processing is unlikely to be the direct cause of the outages. I’m back to an hypothesis that some kind of network message/packet is the stimulus for OS to go busy. Since I never see this behavior, sniffing network activity on my network isn’t going to be of much help. Seems like we might be looking for some kind of broadcast or multi-cast message not specifically for OS but that it tries to process.

PRTG seems to have some form of packet sniffing. Domotz does not appear to have any. Perhaps PRTG might provide a clue of what was going on in the network when OS first stops responding.

It might also be useful to know the period of outages. How often they occur and what their duration is. Kevin, the data you provided above is a small window of this. It might be helpful to see it over a couple of days. Does Domotz have the option of providing a log in text form, like a .csv, that would be easy to digest?
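Given a log of timestamped probe results, the start and duration of each outage can be pulled out mechanically. The sketch below assumes the log has already been parsed into (time, answered) pairs; the sample data is invented for illustration and does not come from anyone's monitor in this thread.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Sample = Tuple[datetime, bool]                  # (time of probe, did it answer?)
Outage = Tuple[datetime, datetime, timedelta]   # (first failure, recovery, duration)


def find_outages(samples: Iterable[Sample]) -> List[Outage]:
    """Group consecutive failed probes into (first_fail, recovery, duration).

    Duration runs from the first failed probe to the first good probe
    after it, so it is only accurate to within about one probe interval.
    An outage still open at the end of the log is not reported.
    """
    outages = []
    started = None
    for when, ok in samples:
        if not ok and started is None:
            started = when
        elif ok and started is not None:
            outages.append((started, when, when - started))
            started = None
    return outages


# Invented sample: probes every 5 minutes, three failures in a row.
t0 = datetime(2016, 5, 15, 12, 0)
answers = [True, False, False, False, True, True]
samples = [(t0 + timedelta(minutes=5 * i), ok) for i, ok in enumerate(answers)]
for start, end, duration in find_outages(samples):
    print(f"outage from {start:%H:%M} to {end:%H:%M} ({duration})")
```

Running this over a couple of days of data would show whether the outages cluster at particular times or intervals.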

May 16, 2016 at 9:50 am #42477

Domenic
Participant

I actually noticed these outages first when I looked at my Homeseer logs (I had installed the Opensprinkler plugin, which polls the OS for status on a regular basis). While checking my Homeseer logs, I noticed a few “could not communicate with opensprinkler” errors. At this point, I actually disabled the Homeseer plugin, in case it was somehow overloading the OS. This is when I pointed PRTG at the opensprinkler, and started with the pings every 5 minutes. Per my original post, the outages occur about once per day from approx 5 to 30 minutes. I have observed this consistently at some point each day that I had it on my network (from mid March to last Friday). The timing of these events does not appear to have any regular periodicity.
I understand that the OS may respond differently to ping than other requests, and that not replying to a ping may not necessarily mean that the OS is “offline”. However, for me, the acid test is connectivity with my Homeseer installation. My intention was to integrate it into my HA system. For that, I need to be able to communicate with OS 100% of the time. I wish I had more time to play with this, but it’s the start of irrigation season here so I just want something that works.

May 18, 2016 at 10:39 pm #42527

Ray
Keymaster

Let me re-iterate that OpenSprinkler uses an 8-bit microcontroller with a classic Microchip Ethernet controller that doesn’t have a hardware TCP/IP stack. Instead, it uses a software-defined TCP/IP stack. The other devices on your network you are referring to are probably all based on 32-bit microprocessors running some sort of Linux variants. It’s really unfair to compare OpenSprinkler with these other devices.

And if you ask why OpenSprinkler is based on such underpowered hardware: it’s because it started as an open-source DIY kit, which users can buy and solder every single component to build their own OpenSprinkler. Today it’s still the only open-source sprinkler controller on the market. Look around and if there is another open-source sprinkler controller (which means both the hardware schematic and software are publicly available), let me know. The DIY kit restricted the number of options for choosing chips: only through-hole chips can be selected, and Microchip’s ENC28J60 is the only Ethernet controller that has a through-hole package. If it didn’t start as a DIY kit, I would have chosen a completely different set of components.

The Ethernet is based on the open-source EtherCard library. It’s not perfect, but it works reasonably well. I’ve personally done a lot of ping tests myself, and I’ve never seen the kind of ‘down’ times as you described. So I suspect the issue may be specific to your particular unit, and maybe to your particular network setup. Also, not understanding how these heartbeat and Homeseer apps work does not really help me figure out the cause: perhaps when a ping fails, they continuously send multiple pings trying to get an answer, and these multiple pings will only make the situation worse by saturating the controller.
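If a client does retry aggressively after a failed request (which is speculation here, not confirmed behavior of any tool named in this thread), one common way to avoid piling requests onto a slow device is to space the retries out with exponential backoff. A minimal sketch; the function names and delay values are illustrative only.

```python
import time
from typing import Callable, List


def backoff_delays(base_s: float, factor: float, retries: int, cap_s: float) -> List[float]:
    """Delays before each retry: base, base*factor, ... capped at cap_s."""
    return [min(base_s * factor ** i, cap_s) for i in range(retries)]


def probe_with_backoff(
    probe: Callable[[], bool],
    base_s: float = 5.0,
    factor: float = 2.0,
    retries: int = 4,
    cap_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry with growing pauses; True on first success."""
    if probe():
        return True
    for delay in backoff_delays(base_s, factor, retries, cap_s):
        sleep(delay)
        if probe():
            return True
    return False
```

With the defaults, a failing device is retried after 5, 10, 20, and 40 seconds rather than being hit with a burst of back-to-back requests.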

In the end, if you feel OpenSprinkler is not for you, simply return it and we are happy to issue a full refund. As much as I want it to be, I have to be honest and admit that the product is not a perfect solution for everyone.

May 19, 2016 at 4:55 am #42542

Kevin
Participant

Hi Ray,
Let me just say that I love my Opensprinkler. I was not complaining in any way. In fact, I would have never even posted about it, if it wasn’t for someone else mentioning it in the forums. I posted so that they knew they weren’t the only one having this issue and in hopes that maybe a fix could be had. Ray, is there any way we could enable a debug log that might capture what’s going on? Thanks.


May 19, 2016 at 10:23 am #42547

Domenic
Participant

Ray,
Thanks for the background, but here is my bottom line: I bought a NETWORKED irrigation controller. I put it on my network and found that communications to the device were intermittent. I did some of my own troubleshooting with no effect.
After posting my issue on the forum, I found a few other users were trying to be helpful, offering suggestions on how to debug and troubleshoot. The response I got from you, the developer of this product, was “well what did you expect, it’s basically a DIY toy with under-powered hardware”. For me this is a hard stop.
In terms of other units on the market, I did some more research and recently bought a RainMachine HD. It’s been on my network for the last 3 days and has been rock-solid, despite my hesitation with using wifi. It also has a fully documented API that I can use, and the best part is that it can still be programmed and operated locally without a network connection. Of course, it’s $140 more than the cost of an Opensprinkler, but it comes with a 24V adapter, has 12 zones and a very nice touchscreen interface.

May 19, 2016 at 10:01 pm #42553

DaveC
Participant

Over the past year, I’ve tried to participate a few times in this forum to find issues that users have described. In almost all cases the developer has ignored or shut down that activity. (I realize that it could be just me and that I’m not really being helpful.) In any case, I gave up for a while, but I got sucked back in to this thread because I thought Domenic’s post (May 12, 2016 at 2:35 pm) was off the mark. There might be an issue that would be useful to try to address and it wasn’t because the device was under-powered. It might not be a defect in OS, and my data suggests that OS performs quite well in some network environments, but the issue manifests itself in OS behavior. Putting myself in the developer’s shoes (a completely unfair thing for me to do), I would want to address anything that helped to make the embedded network stack more tolerant. I might even want to use it on another project like a thermostat or a garage door opener.

Ray’s last post helped me understand why my attempts to participate in the forum have been ineffectual. And Domenic’s last post was a reasonable summary. The developer appears to view this as a hobbyist’s platform. It’s not a product in the normal sense. Both the HW and SW are Open Source. If YOU want to make a product based on it, have at it. OK, I get it now.

I have an OS that works fine in my environment and I’ve successfully integrated it into my home automation environment. I have no need to find a different solution right now, but it was good to be reminded of the environment that I’ve chosen to use and that I should plan for the future.

When Google cut off Revolv, I thought, this could happen with OS. What happens when Ray moves on and shuts off the cloud connection? The server connection (aka the weather call) is not just for weather info, it’s integral to the system and is needed to get time zone and DST info. OS will still run, but no weather adjustments, no twice a year DST adjustments, and my OS will silently reboot once a day. I understand now that I must either find a new product for the future or modify the Open Source code to free me from this dependency and create an environment that I can depend on. I can do that because it’s, well, Open Source and because I’m a software developer. But I really don’t want to, I just wanted to help make it better.

Ray’s note also reminded me not to waste my time trying to contribute to solutions. I either have a solution and can provide it or I don’t. The developer isn’t interested in collaboration. Bummer.

May 23, 2016 at 9:57 pm #42599

Ray
Keymaster

@Domenic: no, you missed one of my important messages. Here is what I said: ‘I suspect the issue may be specific to your particular unit, and maybe to your particular network setup.’ If you submit a support ticket, we are happy to send you a replacement and see if it’s a problem with your particular unit.

@DaveC: it’s not because I don’t want to address these issues, it’s because we don’t have the resources needed to delve into them. Coming from academia, I am keen to treat the issues as research problems and get them solved as completely as we can. But this particular issue about network is something that I can’t figure out, other than blaming the underpowered hardware or maybe hardware defects with some specific units. If I had found the issue I would certainly try to address it right away. The fact is that both Samer and I have day jobs, and we have a limited amount of time to work on this project. Making it open-source basically allows a small team like us to develop a useful product, where the users with sufficient skills can give us feedback, make suggestions, and even help us improve the code. I am sorry if this sounds disappointing and that we don’t have a full-blown testing team to perform thorough testing before the product is released. This is just the way a small team works.

Regarding collaborations, it’s the same thing: we have only a limited amount of time to work on collaborative projects. We receive requests every week about extending certain features of the product, customizing the product etc. The thing is we need to protect our own time and avoid making promises that we can’t keep. If users can help us diagnose problems and propose solutions, that’s great; if not, you can beat us all you want, but we only have this much throughput. I am sorry about the situation.

May 24, 2016 at 8:32 am #42611

grahamk
Participant

Curious if the PI option would help alleviate the low performing hardware issue.

May 24, 2016 at 12:14 pm #42614

Ray
Keymaster

In terms of network connectivity, OSPi is generally more flexible because RPi is a Linux-based system: it is more powerful and gives you more options to choose from. You can use a WiFi USB dongle, wired Ethernet, and even a 3G/GPRS dongle if you don’t want high-speed Internet.