OpenSprinkler › Forums › Hardware Questions › OpenSprinkler › Controller lockups / crashes with wired Ethernet module
Tagged: Controller lock up hang crash
- This topic has 168 replies, 18 voices, and was last updated 4 months, 2 weeks ago by Ray.
August 19, 2020 at 3:34 pm #67955
Ray (Keymaster)
The forum does not allow .bin files. You can try zipping it or changing the extension. If that still doesn’t work, you can put it on Google Drive and share a link, or send it to me and I can copy it to an online folder for people to download.
August 19, 2020 at 8:09 pm #67960
Water_my_lawn (Participant)
OK, here is the same file, zipped.
Attachments:
August 20, 2020 at 4:24 pm #67969
Water_my_lawn (Participant)
After about 8 hours of running I got a hang with my debug code.
I now have some obscure numbers to ponder over. Initially I
can say that packets continue to be received during the hang.
This is evidenced by the packet counter incrementing.
However, the counters for the UDP, ICMP, and TCP packets
do not increment. Ray’s reset logic does detect an error
and its counter increments. I have disabled Ray’s reset code, so
the reset is not actually performed and does not disturb
my state recording code.

I will meditate on this result and get back.
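The reasoning in this post can be written down as a small check. This is a minimal sketch in Python, not the author's debug code (which is not shown in the thread); the counter names, the snapshot layout, and the function name `classify` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Snapshot of hypothetical debug counters read from the firmware."""
    packets: int  # every frame handed over by the Ethernet chip
    udp: int      # frames that reached the UDP handler
    icmp: int     # frames that reached the ICMP handler
    tcp: int      # frames that reached the TCP handler


def classify(before: Counters, after: Counters) -> str:
    """Compare two snapshots taken some time apart and describe the state."""
    raw = after.packets - before.packets
    handled = ((after.udp - before.udp)
               + (after.icmp - before.icmp)
               + (after.tcp - before.tcp))
    if raw == 0:
        return "no frames received: link or receive path is silent"
    if handled == 0:
        return "frames received but none reached UDP/ICMP/TCP: stack is stuck"
    return "frames received and processed: stack looks alive"


if __name__ == "__main__":
    # Made-up numbers that mimic the reported symptom: the raw packet
    # counter keeps moving while the protocol counters stay frozen.
    earlier = Counters(packets=1000, udp=40, icmp=5, tcp=700)
    later = Counters(packets=1250, udp=40, icmp=5, tcp=700)
    print(classify(earlier, later))
```

One caveat with a check like this: a quiet network that carries only non-IP frames (ARP, for example) would also leave the protocol counters unchanged, so the snapshots need to span a period in which IP traffic is known to have been sent to the controller.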
August 25, 2020 at 5:12 am #68030
Water_my_lawn (Participant)
Ray;
I have, up to now, been debugging by printing to the TTY port. This has given me a lot
of clues to look at. However, adding printing changes the timing and can make the problem
go away.

I am trying to set up my system to use GDB. I have installed the Sloeber package and compiled
in the stub driver. I have been unable to connect using:
target remote /dev/ttyUSB0

I do issue all the configuration commands, including setting the BAUD rate, but I never
connect successfully.

Do you use GDB? What development environment do you use?
Thanks.
August 25, 2020 at 9:08 am #68035
Ray (Keymaster)
I am not sure how you would use GDB to debug this: the firmware runs on the microcontroller, not as a process on your computer, so I am not sure how to use GDB to, say, set breakpoints and step through the firmware code. There are ways to debug a microcontroller, such as JTAG. That requires hardware support, and I’ve never used JTAG myself. For the moment, serial printf is probably the best way.
August 25, 2020 at 10:58 am #68038
Water_my_lawn (Participant)
There is quite a nice graphical debugger package for the Arduino that has been
adapted for the esp8266 called Sloeber. The CPU inside the esp does not support
JTAG. It does support debugging using the ASYNC port. The chip has one
hardware breakpoint so you can do debugging in read-only memory.

I have used GDB a lot and used JTAG a lot but I have not used a CPU that debugs
through the ASYNC port. To facilitate debugging in your target code you link
in a small piece of code called the stub driver. Early in the program you
set the BAUD rate and call gdbstub_init(). From there GDB running on a PC
should be able to grab hold of the target and control it. It is the
connection that is failing for me.
August 29, 2020 at 6:24 am #68077
Water_my_lawn (Participant)
While debugging the hang problem I was keenly aware that I could brick my OS.
Well, I have done it! I loaded an image of the OS firmware that had a bug
that prevented it from coming up. This prevented using the 192.168.4.1/update
process.

It took me a long time to figure out the flash programming process on the OS.
There is a program capable of communicating with the ESP8266 module using
the ASYNC port available on the 6 pin edge connector on the OS. The
ESP8266 flash write program is called esptool.py. It is available on
github. However, out of a few hundred tries I could only get it to work
3 or 4 times.

The problem turned out to be that GPIO 2 (pin 17 on the ESP8266 module) must
be pulled up by a resistor. It is left floating. This pin is connected
to the “B1” button. I put a 10K resistor from the B1 button to the +3.3
supply. Now I can get into boot mode reliably.

I can now download firmware images. I still cannot get the OS firmware
to run even with known good firmware, the latest release os_219_rev7.bin.
There must be something that I am missing. The display does not light
up and the WiFi does not come up.
August 29, 2020 at 1:25 pm #68080
Ray (Keymaster)
First of all, you can send a support ticket and request a USB-serial programmer and we can send you one. The programmer plugs into the 2×3 card edge slot close to the top of the PCB, so you can program the controller via USB.
Second, if you have your own USB serial adapter (CH340, CP2102, or any other), you can also use that, but you will have to solder wires onto the serial pins. The standard procedure for getting the ESP8266 into flash mode is to keep GPIO0 pulled down to ground when the controller is powered up. GPIO0 is connected to the middle button (B2), so that button needs to be held down when you power up the controller; it will then enter bootloading mode.
The USB programmer that we use has a built-in auto-reset circuit, so there is no need to press B2 when using the programmer.
I am not sure about what you said about GPIO2: GPIO2 is already pulled up internally on the ESP8266, so there is no need to add an external pull-up. Adding an external pull-up won’t hurt either, though.
August 29, 2020 at 4:14 pm #68083
Water_my_lawn (Participant)
How do you program the OS the first time? Are they already flashed with something that
gets you into AP mode that enables updating through the WiFi?

I can reliably flash the released firmware now but it does not work. I suspect that there
is already some firmware in the OS that makes it work that I am missing. Do you know
what that might be? Can you read a full copy of the flash from a working unit using
something like esptool.py?

I had assumed that the pins connected to buttons B1 and B3 were pulled up internally.
But I found a web page that said that GPIO 2 must be pulled up with a resistor to enable
programming mode. With the resistor it always works; without it, it rarely works.
Perhaps it is only my unit.

I am using a CH340, which works fine now. I do have a soldered-on reset switch, however.
With a CP2102 you get extra pins that allow it to handle the reset. I have already ordered
one.
August 30, 2020 at 12:53 am #68086
Ray (Keymaster)
The ESP8266 has a built-in bootloader that supports serial programming. Unlike AVR or other microcontrollers, this bootloader is there to begin with. We use a USB serial (CH340-based) programmer to program it for the first time. With the firmware uploaded, it can then support OTA firmware updates. But USB flashing always works, even if the firmware fails to run.
The information you found about GPIO2 may be referring to the bare ESP8266 chip. We never use bare ESP8266 chips; instead we use the ESP-12 module, which is very common and has a built-in pull-up for GPIO2. Typical circuits for the ESP8266 module only require pull-ups for RST, EN, and GPIO0, and a pull-down for GPIO15. I’ve never seen a pull-up required for GPIO2.
You said that without that pull-up, uploading rarely works. One possibility is that you are using a serial baud rate that’s too high: 230400 baud should be pretty reliable in general. Higher rates generally require the auto-reset circuit: for example, with the USB programmer that I have, I can use 921600 baud because it has a built-in auto-reset circuit.
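For readers following the GPIO0 / GPIO2 / GPIO15 discussion, the pin requirements can be summarised in a few lines. This is a minimal sketch in Python, assuming the commonly documented ESP8266 strapping scheme in which the boot ROM samples GPIO15, GPIO0 and GPIO2 at reset and combines them into a 3-bit boot-mode number. The function names and the text labels are illustrative and are not taken from the OpenSprinkler firmware or from any Espressif tool.

```python
# Illustrative only: decode ESP8266 strapping pins into a boot-mode number.
# Assumed bit order: GPIO15 = bit 2, GPIO0 = bit 1, GPIO2 = bit 0.


def boot_mode(gpio15: int, gpio0: int, gpio2: int) -> int:
    """Return the 3-bit boot-mode value for the sampled pin levels (0 or 1)."""
    for level in (gpio15, gpio0, gpio2):
        if level not in (0, 1):
            raise ValueError("pin levels must be 0 or 1")
    return (gpio15 << 2) | (gpio0 << 1) | gpio2


def describe(mode: int) -> str:
    """Map a boot-mode value to a short description (labels are informal)."""
    if mode == 1:
        return "UART download (serial flashing)"
    if mode == 3:
        return "normal boot from SPI flash"
    if mode >= 4:
        return "SDIO boot (GPIO15 high)"
    return "other (neither serial flashing nor normal flash boot)"


if __name__ == "__main__":
    # B2 held down (GPIO0 low), GPIO2 high, GPIO15 low.
    print(describe(boot_mode(gpio15=0, gpio0=0, gpio2=1)))
    # Same, but GPIO2 happens to read low at reset.
    print(describe(boot_mode(gpio15=0, gpio0=0, gpio2=0)))
    # No button pressed.
    print(describe(boot_mode(gpio15=0, gpio0=1, gpio2=1)))
```

Under this assumption GPIO2 has to read high at reset for serial flashing to start, whether the pull-up is internal, on the module, or external. Whether a marginal level on that pin explains the unreliable flashing reported earlier in the thread is not established here.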
August 30, 2020 at 8:12 am #68087
Water_my_lawn (Participant)
I have not tried such fast BAUD rates; I use just 115200. That seems really fast to me: I learned about ASYNC with an ASR33.
Mechanical decoding at 110 BAUD! I fed miles of paper tape through those machines.

My USB to ASYNC converter board that uses a CH340 chip does not have the extra RS232 lines brought out so I don’t have
the ability to automatically handle reset. My converter board only has TxD, RxD, and ground. I have better boards coming.

When I download the latest release, os_219_rev7.bin, I get these messages:
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0

In the past with a working system the messages would continue and report
the IP address and the time and something about checking for the weather.
By the lack of these messages I assume that the OS firmware is not running.
The WiFi does not come up.

If I download to a base address of 0x0 I get these messages. If I download
to a base address of 0x1000 I get just garbage text. Do I need to position
the download image at some specific address?
September 3, 2020 at 10:21 pm #68147
Water_my_lawn (Participant)
After much pulling of hair and pounding of my head I finally
recovered my OS. I tried loading a bunch of different binaries
that supported OTA updates but none would work to update
the OS firmware. I tried the Arduino IDE, which loaded
binary images OK but none that were useful.

I finally loaded “OpenGarage”, which worked properly and
presented an update screen at 192.168.4.1/update.

To load the OpenGarage firmware I executed:
esptool.py --port /dev/ttyUSB0 write_flash 0x0 ~/Downloads/og_1.1.0.bin

Clearly there is something in the OpenSprinkler firmware
that is not clearing some area of the flash and causing
a read fault and then a reset. Likely an invalid pointer.
Loading OpenGarage seems to clear that condition.

It is likely that OpenSprinkler is sensitive to something
in its configuration data. This is the data that is not
overwritten when a new version is flashed during the update
process.
September 19, 2020 at 4:09 pm #68286
Water_my_lawn (Participant)
In working with the source I wanted to be able to recreate a new source tree.
I wrote a script that automates the instructions that Ray gives for downloading
and compiling the code. Also, the makefiles are written so that they reference
your home directory rather than your current directory. This requires you to
work in your home directory rather than some sub-directory, which precludes
having multiple build environments. This script has a fix for that.

I have attached the script below.
September 19, 2020 at 4:14 pm #68287
Water_my_lawn (Participant)
Second try on the attachment. This is a gzipped tar file.
September 19, 2020 at 4:16 pm #68288
Water_my_lawn (Participant)
Third try.
September 19, 2020 at 4:19 pm #68289
Water_my_lawn (Participant)
Fourth try.
Attachments:
September 23, 2020 at 8:07 am #68327
Water_my_lawn (Participant)
I have captured a hang event with Wireshark. You can see the traffic progressing
normally until the OS stops responding. I have attached the capture file if anyone
wants to have a look and make any comments.

The OS has an IP address of 192.168.209.20; my computer has an IP address of 192.168.209.221.
Load this file in Wireshark and use this display filter to hide unrelated traffic:
ip.addr == 192.168.209.20
Attachments:
September 24, 2020 at 9:49 am #68333
DaveC (Participant)
@Water_my_lawn
I don’t know what the capture environment was, but it doesn’t seem like you are looking at the network from the OS perspective, i.e. everything that …20 is receiving. Correct me if I got this wrong.
Like you, I see the last successful exchange, and the next one doesn’t get off the ground, i.e. the OS doesn’t respond to the connect request. That doesn’t provide much info about what’s going on with the OS. If you suspect that the OS hang is due to something on the network that the OS is choking on, you would need to see everything the OS sees and then correlate suspect activity with the time window in which the OS stops responding.
Do you have a managed switch that can mirror ports? If so, you could mirror the port that the OS is on to a port that a PC running Wireshark can look at.
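The correlation step described in this post can be sketched as follows. This is a minimal illustration in Python over hand-written (time, source, destination) records, not a parser for the attached capture file. The record layout, the function names, and the 5-second window are assumptions; only the two IP addresses come from the thread.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    time: float  # seconds since the start of the capture
    src: str     # source IP address
    dst: str     # destination IP address


def last_reply_time(packets: List[Packet], device: str) -> float:
    """Time of the last packet sent by the device, or -1.0 if it never sent."""
    times = [p.time for p in packets if p.src == device]
    return max(times) if times else -1.0


def suspects(packets: List[Packet], device: str,
             window: float = 5.0) -> List[Packet]:
    """Packets addressed to the device within `window` seconds either side
    of its last transmission, i.e. around the moment it went silent."""
    t_last = last_reply_time(packets, device)
    return [p for p in packets
            if p.dst == device and abs(p.time - t_last) <= window]


if __name__ == "__main__":
    OS = "192.168.209.20"   # controller address from the thread
    PC = "192.168.209.221"  # capturing PC address from the thread
    # Made-up timeline: normal exchanges, then requests with no reply.
    capture = [
        Packet(1.0, PC, OS), Packet(1.1, OS, PC),
        Packet(9.0, PC, OS), Packet(9.1, OS, PC),
        Packet(12.0, PC, OS),
        Packet(30.0, PC, OS),
    ]
    print("last reply at", last_reply_time(capture, OS))
    for p in suspects(capture, OS):
        print("suspect:", p)
```

This only becomes useful with a capture taken on a mirrored switch port, as suggested above, because a capture taken on the PC itself contains only the PC's own traffic to the controller, plus whatever broadcasts reach the PC.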
September 26, 2020 at 11:31 am #68347
Water_my_lawn (Participant)
You are correct in that I am watching the IP traffic from my PC and not at the OS.
I too did not see anything odd in the IP traffic until it stopped.

Mostly I am attempting to insert suitable debug messages in the firmware that will
tell me what is wrong. Doing this type of printf debugging is tricky because
you affect the timing. The printf’s can be a heavy load as they are synchronous.

So far it seems that the OS stops receiving messages when the hang occurs.
I don’t yet know why.

I am using the latest enc28j60 drivers, which have some significant changes
by the developers. I had hoped that this would solve the problem, but it
is no better than the old version with regard to hanging.
October 15, 2020 at 5:27 pm #68490
OpenSprinklerShop Germany (Participant)
Hi,
regarding the enc28j60 module, I made some new builds.
1. The UIPEthernet project updated the Ethernet library to version 2.0.9 (formerly 2.0.8) and fixed some bugs.
--> Firmware https://opensprinklershop.de/firmware/OS220(90).bin
2. The EthernetENC project is a completely new project, implementing the new Ethernet 2.0.0 Arduino library functions.
Faster pings, native integration, better handling.
--> Firmware https://opensprinklershop.de/firmware/OS220(91).bin
Both firmware builds run on OpenSprinkler 3.0 and 3.2.
They also have my WifiSleepmode and ping check extensions.
Please test and report!
October 16, 2020 at 9:58 pm #68501
Water_my_lawn (Participant)
Thanks;
I have compiled OS 3.2 using the 2.0.9-4 version of the Ethernet
library. I have added debugging code that should allow me to
know the state of a bunch of variables if it hangs. I intend
to let this run for a week. This may be long enough to produce
a hang.

I tried getting gdb to run on the OS but it was not successful.
I could single step and look at data but I could not break
execution or stop on a hardware breakpoint. So now I am
back to using printf.
November 10, 2020 at 8:33 am #68636
Water_my_lawn (Participant)
I have completed testing of 3 firmware versions. The first firmware is one that I compiled
and has debugging code that would provide the state of the IP stack if a hang happened.

The other 2 firmware versions, OS220(90).bin and OS220(91).bin, are from the previous post.
I ran each for 1 week in my normal setup. This is just the OS 3.2 sitting on my desk with
only power, the RS232, and the ENC28J60 connected. I have no zone valves connected. I
have a web browser pointed at the OS. This setup would usually produce a hang within
one week.

The result is that all 3 firmware versions ran the full week with no problems.
It seems that the problem is solved by using the latest version of UIPEthernet 2.0.9.
November 18, 2020 at 1:43 pm #68687
Ray (Keymaster)
Thanks for the update. That’s good to know.
November 19, 2020 at 4:34 pm #68694
robin hayman (Participant)
I have two V3 sprinklers ordered under order #67055. After raising a ticket (I think), we decided that the other version of the Ethernet connection, with the W5500 chip, should be tried. That was done, but I never managed to get the W5500 version to work reliably either. It uses firmware os_219_rev4_w5500_jul18.bin.
It seems to fail to get a DHCP connection (while 10 other devices on the same network succeed with no trace of problems). After reboot, the IP ends up 0.0.0.0 and nothing can connect. Sometimes it works correctly.
I use dynamic DHCP but with IPs preassigned to MACs.
I have been messing around with this since my order, last Jul(?) (Grumble!)
I need a speedy resolution.
Meanwhile, my OS Pi (same order) has been chugging away since July all on its own and I can browse to it today.
Do these previous posts mean that the original hardware using the latest Ethernet firmware is now proven
reliable? Should I just dump the W5500 cards?
Thanks
November 20, 2020 at 8:13 am #68706
Water_my_lawn (Participant)
A quick note: my firmware has hung after 10 days. After running the above tests
as described I loaded my debug version and installed my OS back in its normal place.
Since it is after watering season it has nothing to do but sit there powered up.
I checked it once a day, and today it was hung.

Since it is inconvenient to get to, I will have to rig up a debug cable to read out
the log information. I will have to do this without risking removing power.
I will report back when I have done this.