OpenSprinkler Forums Hardware Questions OpenSprinkler Controller lockups / crashes with wired Ethernet module

Viewing 25 posts - 101 through 125 (of 137 total)
  • #67955

    Ray
    Keymaster

    The forum does not allow .bin files. You can try zipping it, or changing it to some other extension. If that still doesn’t work, you can put it on Google Drive and share a link, or send it to me and I can copy it to an online folder for people to download.

    #67960

    Water_my_lawn
    Participant

    OK, here is the same file, zipped.

    #67969

    Water_my_lawn
    Participant

    After about 8 hours of running I got a hang with my debug code.
    I now have some obtuse numbers to ponder over. Initially I
    can say that packets continue to be received during the hang.
    This is evidenced by the packet counter incrementing.
    However the counters for the UDP, ICMP, and TCP packets
    do not increment. Ray’s reset logic does detect an error
    and increments its counter. I have disabled the reset itself
    in Ray’s code so that it does not disturb
    my state recording code.
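
A note on what those counters distinguish: a frame can be counted at the Ethernet level without ever being dispatched to UDP, ICMP, or TCP handling. The sketch below is a Python illustration only (the real debug code lives in the C++ firmware and is not shown in this thread). It classifies raw Ethernet II frames by EtherType and IPv4 protocol number; the function names `classify_frame` and `count_frames` are made up for this example.

```python
import struct
from collections import Counter

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IP_PROTOCOLS = {1: "icmp", 6: "tcp", 17: "udp"}  # IANA protocol numbers


def classify_frame(frame: bytes) -> str:
    """Return a coarse label for one raw Ethernet II frame."""
    if len(frame) < 14:
        return "runt"
    # EtherType is the big-endian 16-bit field after the two MAC addresses.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_ARP:
        return "arp"
    if ethertype != ETHERTYPE_IPV4:
        return "other"
    if len(frame) < 14 + 20:
        return "runt"
    # The protocol field is byte 9 of the IPv4 header.
    return IP_PROTOCOLS.get(frame[14 + 9], "ip-other")


def count_frames(frames):
    """Count frames overall and per class, like a set of debug counters."""
    counts = Counter()
    for frame in frames:
        counts["total"] += 1
        counts[classify_frame(frame)] += 1
    return counts
```

With counters split this way, a rising "total" alongside flat "udp", "icmp", and "tcp" counts would point at frames that are either not IPv4 at all (for example ARP) or are being dropped before protocol dispatch.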

    I will meditate on this result and get back.

    #68030

    Water_my_lawn
    Participant

    Ray;

    I have, up to now, been debugging using printing to the TTY port. This has given me a lot
    of clues to look at. However adding printing changes the timing and can make the problem
    go away.

    I am trying to set up my system to use GDB. I have installed the Sloeber package and compiled
    in the stub driver. I have been unable to connect using:
    target remote /dev/ttyUSB0

    I do issue all the configuration commands including setting the BAUD rate but I never
    connect successfully.

    Do you use GDB? What development environment do you use?

    Thanks.

    #68035

    Ray
    Keymaster

    I am not sure how you would use GDB to debug: the firmware runs on the microcontroller, not as a process on your computer, so I am not sure how to use GDB to, say, set breakpoints and step through the firmware code. There are ways to debug a microcontroller, such as JTAG. That requires hardware support, and I’ve never used JTAG myself. For the moment, serial printf is probably the best way.

    #68038

    Water_my_lawn
    Participant

    There is quite a nice graphical debugger package for the Arduino that has been
    adapted for the esp8266 called Sloeber. The CPU inside the esp does not support
    JTAG. It does support debugging using the ASYNC port. The chip has one
    hardware breakpoint so you can do debugging in read only memory.

    I have used GDB a lot and used JTAG a lot but I have not used a CPU that debugs
    through the ASYNC port. To facilitate debugging in your target code you link
    in a small piece of code called the stub driver. Early in the program you
    set the BAUD rate and call gdbstub_init(). From there GDB running on a PC
    should be able to grab hold of the target and control it. It is the
    connection that is failing for me.

    #68077

    Water_my_lawn
    Participant

    While debugging the hang problem I was keenly aware that I could brick my OS.
    Well, I have done it! I loaded an image of the OS firmware that had a bug
    that prevented it from coming up. This prevented using the 192.168.4.1/update
    process.

    It took me a long time to figure out the flash programming process on the OS.
    There is a program capable of communicating with the ESP8266 module using
    the ASYNC port available on the 6 pin edge connector on the OS. The
    ESP8266 flash write program is called esptool.py. It is available on
    github. However, out of a few hundred tries I could only get it to work
    3 or 4 times.

    The problem turned out to be that GPIO 2 (pin 17 on the ESP8266 module) must
    be pulled up by a resistor. It is left floating. This pin is connected
    to the “B1” button. I put a 10K resistor from the B1 button to the +3.3
    supply. Now I can get into boot mode reliably.

    I can now download firmware images. I still cannot get the OS firmware
    to run even with known good firmware; the latest release os_219_rev7.bin.
    There must be something that I am missing. The display does not light
    up and the WiFi does not come up.

    #68080

    Ray
    Keymaster

    First of all, you can send a support ticket and request a USB-serial programmer and we can send you one. The programmer can plug into the 2×3 card edge slot close to the top of the PCB, that way you can program it via USB.

    Second, if you have your own USB serial (either CH340, or CP2102 or any other serial), you can also use that but you will have to solder wires onto the serial pins, and the standard procedure for getting ESP8266 into flash mode is to keep GPIO0 pulled down to ground when the controller is powered up. GPIO0 is connected to the middle button (B2), so that needs to be pressed down when you power up the controller and then it will enter bootloading mode.

    The USB programmer that we use has built-in auto reset circuit, therefore there is no need to press B2 down when using the programmer.

    I am not sure about what you said about GPIO2 — GPIO2 is already pulled up internally on ESP8266, there is no need to add external pullup. Though, adding an external pullup won’t hurt either.

    #68083

    Water_my_lawn
    Participant

    How do you program the OS the first time? Are they already flashed with something that
    gets you into AP mode that enables updating through the WiFi?

    I can reliably flash the released firmware now but it does not work. I suspect that there
    is already some firmware in the OS that makes it work that I am missing. Do you know
    what that might be? Can you read a full copy of the flash from a working unit using
    something like esptool.py?

    I had assumed that the pins connected to buttons B1 and B3 were pulled up internally.
    But I found a web page that said that GPIO 2 must be pulled up with a resistor to enable
    programming mode. With the resistor it always works, without it it rarely works.
    Perhaps it is only my unit.

    I am using a CH340 which works fine now. I do have a soldered on reset switch however.
    With a CP2102 you get extra pins that allow it to handle the reset. I have already ordered
    one.

    #68086

    Ray
    Keymaster

    The ESP8266 has a built-in bootloader that supports serial programming. Unlike AVR or other microcontrollers, this bootloader is there to begin with. We use a USB serial (CH340-based) programmer to program it for the first time. Once the firmware is uploaded, it can support OTA firmware updates. But USB flashing always works, even if the firmware fails to run.

    The information you found about GPIO2 may be referring to the bare ESP8266 chip. We don’t ever use the bare ESP8266 chips; instead we use the ESP-12 module, which is very common and has a built-in pull-up for GPIO2. Typical circuits for the ESP8266 module only require pull-ups for RST, EN, and GPIO0, and a pull-down for GPIO15. I’ve never seen a pull-up required for GPIO2.
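
For reference, the ESP8266 picks its boot mode from three strapping pins sampled at reset: GPIO15, GPIO0, and GPIO2. The Python sketch below only decodes that 3-bit value into the mode number the ROM prints in its `boot mode:(N,...)` message. It takes no position on whether an external pull-up is needed on any particular board, and the function name `boot_mode` is invented for this example.

```python
def boot_mode(gpio15: int, gpio0: int, gpio2: int):
    """Decode ESP8266 strapping pin levels (0 or 1) sampled at reset.

    Returns (code, name), where code is (GPIO15 << 2) | (GPIO0 << 1) | GPIO2.
    """
    gpio15, gpio0, gpio2 = (int(bool(p)) for p in (gpio15, gpio0, gpio2))
    code = (gpio15 << 2) | (gpio0 << 1) | gpio2
    if code == 1:
        name = "uart_download"   # serial flashing: GPIO15=0, GPIO0=0, GPIO2=1
    elif code == 3:
        name = "flash_boot"      # normal run:      GPIO15=0, GPIO0=1, GPIO2=1
    elif gpio15:
        name = "sdio"            # GPIO15 high selects the SDIO modes
    else:
        name = "other"           # GPIO2 sampled low: neither of the above
    return code, name


if __name__ == "__main__":
    print(boot_mode(0, 0, 1))  # B2 held down, GPIO2 high
    print(boot_mode(0, 0, 0))  # B2 held down, GPIO2 sampled low
```

The point of the second call: with GPIO0 held low, UART download mode is only selected if GPIO2 also reads high at the moment of reset. A GPIO2 level that is not reliably high would therefore be one possible explanation for intermittent entry into flash mode.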

    You said without that pullup uploading rarely works. I think one possibility is that maybe you are using a serial baud rate that’s too high: 230400 baud rate should be pretty reliable in general. Higher than that generally requires the auto-reset circuit: for example, with the USB programmer that I have, I can use 921600 baud rate, because it has built-in auto-reset circuit.

    #68087

    Water_my_lawn
    Participant

    I have not tried such fast BAUD rates, I use just 115200. That seems real fast, I learned about ASYNC with an ASR33.
    Mechanical decoding at 110 BAUD! I fed miles of paper tape through those machines.

    My USB to ASYNC converter board that uses a CH340 chip does not have the extra RS232 lines brought out so I don’t have
    the ability to automatically handle reset. My converter board only has TxD, RxD, and ground. I have better boards coming.

    When I download the latest release, os_219_rev7.bin, I get these messages:

    load 0x4010f000, len 1384, room 16
    tail 8
    chksum 0x2d
    csum 0

    In the past with a working system the messages would continue and report
    the IP address and the time and something about checking for the weather.
    By the lack of these messages I assume that the OS firmware is not running.
    The WiFi does not come up.

    If I download to a base address of 0x0 I get these messages. If I download
    to a base address of 0x1000 I get just garbage text. Do I need to position
    the download image at some specific address?
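
One way to sanity-check a .bin before flashing is to look at its first bytes. The ESP8266 ROM bootloader reads the image starting at flash offset 0x0 and expects an 8-byte header there whose first byte is the magic value 0xE9. The Python sketch below parses that header; it assumes the file is a single combined image meant for offset 0x0 (an assumption about how a given firmware build is packaged), and the name `parse_image_header` is made up for this example.

```python
import struct

ESP_IMAGE_MAGIC = 0xE9
FLASH_MODES = {0: "qio", 1: "qout", 2: "dio", 3: "dout"}


def parse_image_header(data: bytes) -> dict:
    """Parse the 8-byte ESP8266 image header at the start of a .bin file."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes of image header")
    # magic, segment count, flash mode, size/frequency byte, entry point (LE)
    magic, segments, flash_mode, _size_freq, entry = struct.unpack(
        "<BBBBI", data[:8]
    )
    return {
        "looks_bootable": magic == ESP_IMAGE_MAGIC,
        "segments": segments,
        "flash_mode": FLASH_MODES.get(flash_mode, "unknown"),
        "entry_point": hex(entry),
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(parse_image_header(f.read(8)))
```

If `looks_bootable` is true for the file as downloaded, writing it at 0x0 puts the header where the ROM looks for it. Writing the same file at 0x1000 leaves whatever was previously at 0x0 in charge of booting.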

    #68147

    Water_my_lawn
    Participant

    After much pulling my hair and pounding my head I finally
    recovered my OS. I tried loading a bunch of different binaries
    that supported OTA updates but none would work to update
    the OS firmware. I tried the Arduino IDE which loaded
    binary images OK but none that were useful.

    I finally loaded “OpenGarage” which worked properly and
    presented an update screen at 192.168.4.1/update.

    To load the OpenGarage firmware I executed:
    esptool.py --port /dev/ttyUSB0 write_flash 0x0 ~/Downloads/og_1.1.0.bin

    Clearly there is something in the OpenSprinkler firmware
    that is not clearing some area of the flash and causing
    a read fault and then a reset. Likely an invalid pointer.
    Loading OpenGarage seems to clear that condition.

    It is likely that OpenSprinkler is sensitive to something
    in its configuration data. This is the data that is not
    over-written when a new version is flashed during the update
    process.
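
If stale data left in flash is the suspect, one option to try is erasing the whole flash before writing the image, using esptool.py's `erase_flash` command. The Python sketch below only builds the two command lines; it does not run them. Whether a full erase would have cured this particular unit is an assumption, not something established in this thread, and note that a full erase also wipes all saved settings. The helper name `esptool_commands` is invented for this example.

```python
def esptool_commands(port: str, image_path: str, erase_first: bool = True):
    """Build esptool.py argument lists for an optional erase plus a write at 0x0."""
    base = ["esptool.py", "--port", port]
    commands = []
    if erase_first:
        # Wipes the entire flash chip, including any stored configuration.
        commands.append(base + ["erase_flash"])
    # Write the combined firmware image at flash offset 0x0.
    commands.append(base + ["write_flash", "0x0", image_path])
    return commands


if __name__ == "__main__":
    for cmd in esptool_commands("/dev/ttyUSB0", "os_219_rev7.bin"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the controller is in flash mode.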

    #68286

    Water_my_lawn
    Participant

    In my working with the source I wanted to be able to recreate a new source tree.
    I wrote a script that automates the instructions that Ray gives for downloading
    and compiling the code. Also, the makefiles are written so that they reference
    your home directory rather than your current directory. This requires you to
    work in your home directory rather than some sub-directory. This precludes
    having multiple build environments. This script has a fix for that.

    I have attached the script below.

    #68287

    Water_my_lawn
    Participant

    Second try on the attachment. This is a gzipped tar file.

    #68288

    Water_my_lawn
    Participant

    Third try.

    #68289

    Water_my_lawn
    Participant

    Fourth try.

    #68327

    Water_my_lawn
    Participant

    I have captured a hang event with Wireshark. You can see the traffic progressing
    normally until the OS stops responding. I have attached the captured file if anyone
    wants to have a look and make any comments.

    The OS has an IP address of 192.168.209.20; my computer has an IP address of 192.168.209.221.
    Load this file in Wireshark and use this display filter to hide unrelated traffic:
    ip.addr == 192.168.209.20
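
To pin down when the controller went quiet, the filtered packet list can be exported from Wireshark as CSV and scanned for the last packet the OS sent. The Python sketch below assumes the export has "Time", "Source", and "Destination" columns, as in Wireshark's default column layout; adjust the names if your profile differs. The function name `last_seen_from` is made up for this example.

```python
import csv
import io


def last_seen_from(csv_text: str, host: str):
    """Return the largest 'Time' of any packet whose Source is host, or None."""
    last = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Source"] == host:
            t = float(row["Time"])
            if last is None or t > last:
                last = t
    return last


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], newline="") as f:
        print(last_seen_from(f.read(), "192.168.209.20"))
```

Any packets addressed to 192.168.209.20 after that time with no reply mark the start of the hang as seen from the capturing PC.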

    #68333

    DaveC
    Participant

    @Water_my_lawn
    I don’t know what the capture environment was, but it doesn’t seem like you are looking at the network from the OS perspective, i.e. everything that …20 is receiving. Correct me if I got this wrong.

    Like you, I see the last successful exchange and the next one doesn’t get off the ground, i.e. OS doesn’t respond to the connect request. That doesn’t provide much info about what’s going on with OS. If you suspect that the OS hang is due to something on the network that OS is choking on, you would need to see everything OS sees and then correlate suspect activities to a time window where the OS stops responding.

    Do you have a managed switch that can mirror ports? If so, you could mirror the port that OS is on to a port that a PC running Wireshark can look at.

    #68347

    Water_my_lawn
    Participant

    You are correct in that I am watching the IP traffic from my PC and not at the OS.
    I too did not see anything odd in the IP traffic until it stopped.

    Mostly I am attempting to insert suitable debug messages in the firmware that will
    tell me what is wrong. Doing this type of printf debugging is tricky because
    you affect the timing. The printf’s can be a heavy load as they are synchronous.
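
A common way to reduce the timing impact of printf debugging is to record short events into a fixed-size buffer in RAM and print them only after the fault. The sketch below shows the idea in Python purely as an illustration; in the firmware this would be a fixed-size C++ array with a wrapping index. The class name `EventLog` and its fields are made up for this example.

```python
from collections import deque


class EventLog:
    """Fixed-capacity event log that keeps only the most recent entries."""

    def __init__(self, capacity: int = 64):
        # deque with maxlen silently drops the oldest entry when full.
        self._events = deque(maxlen=capacity)

    def record(self, timestamp_ms: int, code: int, value: int = 0) -> None:
        """Store one small event; cheap compared with formatting and printing."""
        self._events.append((timestamp_ms, code, value))

    def dump(self):
        """Return the retained events, oldest first, for printing after a hang."""
        return list(self._events)


if __name__ == "__main__":
    log = EventLog(capacity=3)
    for i in range(5):
        log.record(timestamp_ms=i * 10, code=1, value=i)
    print(log.dump())
```

Recording is a few memory writes per event, so it disturbs timing far less than a synchronous serial print at each step.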

    So far it seems that the OS stops receiving messages when the hang occurs.
    I don’t yet know why.

    I am using the latest enc28j60 drivers which have some significant changes
    by the developers. I had hoped that this would solve the problem, but it
    is no better than the old version with regard to hanging.

    #68490

    Hi,
    regarding the enc28j60 module, I made some new builds.
    1. The UIPEthernet project updated the Ethernet library to version 2.0.9 (formerly 2.0.8) and fixed some bugs.
    --> Firmware: https://opensprinklershop.de/firmware/OS220(90).bin

    2. The EthernetENC project is a completely new project that implements the new Ethernet 2.0.0 Arduino library functions.
    Faster pings, native integration, better handling.
    --> Firmware: https://opensprinklershop.de/firmware/OS220(91).bin

    Both firmware builds run on OpenSprinkler 3.0 and 3.2.
    They also have my WifiSleepmode and ping check extensions.

    Please test and report!

    #68501

    Water_my_lawn
    Participant

    Thanks;

    I have compiled OS 3.2 using the 2.0.9-4 version of the Ethernet
    library. I have added debugging code that should allow me to
    know the state of a bunch of variables if it hangs. I intend
    to let this run for a week. This may be long enough to produce
    a hang.

    I tried getting gdb to run on the OS but it was not successful.
    I could single step and look at data but I could not break
    execution or stop on a hardware breakpoint. So now I am
    back to using printf.

    #68636

    Water_my_lawn
    Participant

    I have completed testing of 3 firmware versions. The first firmware is one that I compiled
    and has debugging code that would provide the state of the IP stack if a hang happened.

    The other two firmware versions, OS220(90).bin and OS220(91).bin, are from the previous post.

    I ran each for 1 week in my normal setup. This is just the OS 3.2 sitting on my desk with
    only power, the RS232, and the ENC28J60 connected. I have no zone valves connected. I
    have a web browser pointed at the OS. This setup would usually produce a hang within
    one week.

    The result is that all 3 firmware versions ran the full week with no problems.

    It seems that the problem is solved by using the latest version of UIPEthernet 2.0.9.

    #68687

    Ray
    Keymaster

    Thanks for the update. That’s good to know.

    #68694

    robin hayman
    Participant

    I have two V3 sprinklers ordered under order #67055. After raising a ticket (I think), we decided that the other version of the Ethernet connection, with the W5500 chip, should be tried. That was done, but I never managed to get the W5500 version to work reliably either. It uses firmware os_219_rev4_w5500_jul18.bin.

    It seems to fail to get a DHCP connection (while 10 other devices on the same network succeed with no trace of problems). After a reboot, the IP ends up as 0.0.0.0 and nothing can connect. Sometimes it works correctly. I use dynamic DHCP but with IPs preassigned to MACs.
    I have been messing around with this since my order last July(?) (Grumble!)

    I need a speedy resolution.

    Meanwhile, my OS Pi (same order) has been chugging away since July all on its own and I can browse to it today.

    Do these previous posts mean that the original hardware, using the latest Ethernet firmware, is now proven
    reliable? Should I just dump the W5500 cards?

    Thanks

    #68706

    Water_my_lawn
    Participant

    A quick note, my firmware has hung after 10 days. After running the above tests
    as described I loaded my debug version and installed my OS back in its normal place.
    Since it is after watering season it has nothing to do but sit there powered up.
    I checked it once a day, and today it was hung.
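
A once-a-day manual check leaves the time of the hang uncertain by up to 24 hours. A small script run on a schedule can narrow that down by testing whether the controller still accepts a TCP connection. This is a minimal Python sketch; the port number 80 in the usage example is an assumption about how the controller's web interface is configured, and the function name `is_reachable` is made up for this example.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    import time

    # Port 80 is an assumption; use the HTTP port your controller is set to.
    ok = is_reachable("192.168.209.20", 80)
    print(time.strftime("%Y-%m-%d %H:%M:%S"), "up" if ok else "NOT RESPONDING")
```

Run from cron every few minutes with output appended to a file, the first "NOT RESPONDING" line gives the hang time to within one polling interval.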

    Since it is inconvenient to get to I will have to rig up a debug cable to read out
    the log information. I will have to do this without risking removing power.
    I will report back when I have done this.
