OpenSprinkler Forums Hardware Questions OpenSprinkler Controller lockups / crashes with wired Ethernet module

  • #67955

    Ray
    Keymaster

    The forum does not allow .bin files. You can try zipping it or changing the extension. If that still doesn’t work, you can put it on Google Drive and share a link, or send it to me and I can copy it to an online folder for people to download.

    #67960

    Water_my_lawn
    Participant

    OK, here is the same file, zipped.

    #67969

    Water_my_lawn
    Participant

    After about 8 hours of running I got a hang with my debug code.
    I now have some obtuse numbers to ponder over. Initially I
    can say that packets continue to be received during the hang.
    This is evidenced by the packet counter incrementing.
    However the counters for the UDP, ICMP, and TCP packets
    do not increment. Ray’s reset logic does detect an error
    and its counter increments. I have disabled the reset itself
    in Ray’s code so that it does not disturb
    my state recording code.

    I will meditate on this result and get back.
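
The comparison described above (total packet counter advancing while the per-protocol counters stay flat) can be sketched as a diff of two counter snapshots. This is only an illustration: the counter names `rx_packets`, `udp`, `icmp`, and `tcp` are hypothetical and are not taken from the actual debug build.

```python
from typing import Dict, List

# Hypothetical names for the per-protocol counters mentioned in the post.
PROTOCOL_COUNTERS = {"udp", "icmp", "tcp"}


def stalled_counters(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return the names of counters that did not advance between two snapshots."""
    return sorted(name for name in before if after.get(name, 0) <= before[name])


def classify(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Summarize what the two snapshots show, without guessing at a root cause."""
    stalled = set(stalled_counters(before, after))
    if "rx_packets" in stalled:
        return "no packets received"
    if PROTOCOL_COUNTERS <= stalled:
        return "packets received but no protocol counter advanced"
    return "traffic is being processed"
```

With the numbers from a hang like the one reported, `classify` would return the second message, which narrows the search to whatever sits between packet reception and protocol dispatch.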

    #68030

    Water_my_lawn
    Participant

    Ray;

    I have, up to now, been debugging using printing to the TTY port. This has given me a lot
    of clues to look at. However adding printing changes the timing and can make the problem
    go away.

    I am trying to set up my system to use GDB. I have installed the Sloeber package and compiled
    in the stub driver. I have been unable to connect using:
    target remote /dev/ttyUSB0

    I do issue all the configuration commands including setting the BAUD rate but I never
    connect successfully.

    Do you use GDB? What development environment do you use?

    Thanks.

    #68035

    Ray
    Keymaster

    I am not sure how you would use GDB here. The firmware runs on the microcontroller and is not a process running on your computer, so I don’t see how GDB could, say, set breakpoints and step through the firmware code. There are ways to debug a microcontroller, such as JTAG. That requires hardware support, and I’ve never used JTAG myself. For the moment, serial printf is probably the best way.

    #68038

    Water_my_lawn
    Participant

    There is quite a nice graphical debugger package for the Arduino that has been
    adapted for the esp8266 called Sloeber. The CPU inside the esp does not support
    JTAG. It does support debugging using the ASYNC port. The chip has one
    hardware breakpoint so you can do debugging in read-only memory.

    I have used GDB a lot and used JTAG a lot but I have not used a CPU that debugs
    through the ASYNC port. To facilitate debugging in your target code you link
    in a small piece of code called the stub driver. Early in the program you
    set the BAUD rate and call gdbstub_init(). From there GDB running on a PC
    should be able to grab hold of the target and control it. It is the
    connection that is failing for me.

    #68077

    Water_my_lawn
    Participant

    While debugging the hang problem I was keenly aware that I could brick my OS.
    Well, I have done it! I loaded an image of the OS firmware that had a bug
    that prevented it from coming up. This prevented using the 192.168.4.1/update
    process.

    It took me a long time to figure out the flash programming process on the OS.
    There is a program capable of communicating with the ESP8266 module using
    the ASYNC port available on the 6 pin edge connector on the OS. The
    ESP8266 flash write program is called esptool.py. It is available on
    github. However, out of a few hundred tries I could only get it to work
    3 or 4 times.

    The problem turned out to be that GPIO 2 (pin 17 on the ESP8266 module) must
    be pulled up by a resistor. It is left floating. This pin is connected
    to the “B1” button. I put a 10K resistor from the B1 button to the +3.3 V
    supply. Now I can get into boot mode reliably.

    I can now download firmware images. I still cannot get the OS firmware
    to run even with known good firmware; the latest release os_219_rev7.bin.
    There must be something that I am missing. The display does not light
    up and the WiFi does not come up.

    #68080

    Ray
    Keymaster

    First of all, you can send a support ticket and request a USB-serial programmer and we can send you one. The programmer can plug into the 2×3 card edge slot close to the top of the PCB, that way you can program it via USB.

    Second, if you have your own USB-serial adapter (CH340, CP2102, or any other), you can use that too, but you will have to solder wires onto the serial pins. The standard procedure for getting the ESP8266 into flash mode is to keep GPIO0 pulled down to ground when the controller is powered up. GPIO0 is connected to the middle button (B2), so hold B2 down while you power up the controller and it will enter bootloading mode.

    The USB programmer that we use has a built-in auto-reset circuit, so there is no need to press B2 when using the programmer.

    I am not sure about what you said about GPIO2. GPIO2 is already pulled up internally on the ESP8266, so there is no need to add an external pull-up, though adding one won’t hurt either.

    #68083

    Water_my_lawn
    Participant

    How do you program the OS the first time? Are they already flashed with something that
    gets you into AP mode that enables updating through the WiFi?

    I can reliably flash the released firmware now but it does not work. I suspect that there
    is already some firmware in the OS that makes it work that I am missing. Do you know
    what that might be? Can you read a full copy of the flash from a working unit using
    something like esptool.py?

    I had assumed that the pins connected to buttons B1 and B3 were pulled up internally.
    But I found a web page that said that GPIO 2 must be pulled up with a resistor to enable
    programming mode. With the resistor it always works; without it, it rarely works.
    Perhaps it is only my unit.

    I am using a CH340 which works fine now. I do have a soldered on reset switch however.
    With a CP2102 you get extra pins that allow it to handle the reset. I have already ordered
    one.

    #68086

    Ray
    Keymaster

    The ESP8266 has a built-in bootloader that supports serial programming. Unlike AVR or other microcontrollers, this bootloader is there to begin with. We use a USB-serial (CH340-based) programmer to program it for the first time. With the firmware uploaded, it can then support OTA firmware updates. But USB flashing always works, even if the firmware fails to run.

    The information you found about GPIO2 may be referring to the bare ESP8266 chip. We never use bare ESP8266 chips. Instead we use the ESP-12 module, which is very common and has a built-in pull-up for GPIO2. Typical circuits for the ESP8266 module only require pull-ups for RST, EN, and GPIO0, and a pull-down for GPIO15. I’ve never seen a pull-up required for GPIO2.

    You said that without the pull-up, uploading rarely works. One possibility is that you are using a serial baud rate that’s too high: 230400 should be pretty reliable in general. Higher rates generally require the auto-reset circuit. For example, with the USB programmer that I have, I can use 921600 because it has a built-in auto-reset circuit.
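
For reference, the strapping pins discussed in this exchange can be combined into the boot mode number that the ESP8266 ROM samples at reset. The sketch below assumes the commonly documented encoding (GPIO15 as bit 2, GPIO0 as bit 1, GPIO2 as bit 0), where mode 1 is UART download and mode 3 is a normal boot from SPI flash. It does not settle whether a given module needs an external pull-up on GPIO2; it only shows what the ROM would see for each pin combination.

```python
def esp8266_boot_mode(gpio15: int, gpio0: int, gpio2: int) -> str:
    """Map the three strapping pin levels (0 or 1) to a boot mode description.

    Assumes the commonly documented encoding: GPIO15 is bit 2, GPIO0 is bit 1,
    and GPIO2 is bit 0 of the boot mode number.
    """
    mode = (gpio15 << 2) | (gpio0 << 1) | gpio2
    if mode == 1:
        return "UART download"
    if mode == 3:
        return "boot from SPI flash"
    if mode >= 4:
        return "SDIO"
    # Modes 0 and 2 are neither the serial download nor the normal flash boot.
    return "other"
```

Under that encoding, holding B2 (GPIO0 low) only yields "UART download" when GPIO2 reads high. If GPIO2 were sampled low at reset, the same button press would give a different mode, which would be consistent with the intermittent behavior reported without the resistor.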

    #68087

    Water_my_lawn
    Participant

    I have not tried such fast BAUD rates, I use just 115200. That seems real fast, I learned about ASYNC with an ASR33.
    Mechanical decoding at 110 BAUD! I fed miles of paper tape through those machines.

    My USB to ASYNC converter board that uses a CH340 chip does not have the extra RS232 lines brought out so I don’t have
    the ability to automatically handle reset. My converter board only has TxD, RxD, and ground. I have better boards coming.

    When I download the latest release, os_219_rev7.bin, I get these messages:

    load 0x4010f000, len 1384, room 16
    tail 8
    chksum 0x2d
    csum 0

    In the past with a working system the messages would continue and report
    the IP address and the time and something about checking for the weather.
    By the lack of these messages I assume that the OS firmware is not running.
    The WiFi does not come up.

    If I download to a base address of 0x0 I get these messages. If I download
    to a base address of 0x1000 I get just garbage text. Do I need to position
    the download image at some specific address?
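
One way to sanity-check a .bin before choosing a base address is to look at its first bytes. The sketch below assumes the standard ESP8266 firmware image header (magic byte 0xE9, segment count, flash mode, flash size/frequency byte, then a 32-bit little-endian entry point). An image that starts with this header is the kind the ROM bootloader expects to find at flash offset 0x0, which would match the readable boot messages seen when writing to 0x0. The function and type names are illustrative only.

```python
import struct
from typing import NamedTuple

ESP_IMAGE_MAGIC = 0xE9  # first byte of a standard ESP8266 firmware image


class ImageHeader(NamedTuple):
    segments: int         # number of segments that follow the header
    flash_mode: int       # SPI flash mode byte
    flash_size_freq: int  # high nibble: flash size, low nibble: flash frequency
    entry_point: int      # address the bootloader jumps to after loading


def parse_image_header(data: bytes) -> ImageHeader:
    """Parse the 8-byte header at the start of an ESP8266 image."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes of image data")
    magic, segments, flash_mode, size_freq, entry = struct.unpack_from("<BBBBI", data, 0)
    if magic != ESP_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic byte 0x{magic:02x}")
    return ImageHeader(segments, flash_mode, size_freq, entry)
```

To use it, read the first 8 bytes of the file, for example `parse_image_header(open(path, "rb").read(8))`, where `path` is whichever image you are about to flash.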

    #68147

    Water_my_lawn
    Participant

    After much pulling my hair and pounding my head I finally
    recovered my OS. I tried loading a bunch of different binaries
    that supported OTA updates but none would work to update
    the OS firmware. I tried the Arduino IDE which loaded
    binary images OK but none that were useful.

    I finally loaded “OpenGarage” which worked properly and
    presented an update screen at 192.168.4.1/update.

    To load the OpenGarage firmware I executed:
    esptool.py --port /dev/ttyUSB0 write_flash 0x0 ~/Downloads/og_1.1.0.bin

    Clearly there is something in the OpenSprinkler firmware
    that is not clearing some area of the flash and causing
    a read fault and then a reset. Likely an invalid pointer.
    Loading OpenGarage seems to clear that condition.

    It is likely that OpenSprinkler is sensitive to something
    in its configuration data. This is the data that is not
    over-written when a new version is flashed during the update
    process.
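
The esptool.py steps from this recovery can be collected in a small helper that builds the command lines. This is a sketch, not a tested recovery procedure: it only assembles argument lists using the `--port`, `--baud`, `write_flash`, `read_flash`, and `erase_flash` options of esptool.py. Whether erasing the whole flash (which would also wipe the stored configuration) fixes the boot failure described above is an untested assumption, and the flash size passed to `read_flash_cmd` has to match your module.

```python
from typing import List


def esptool_cmd(port: str, *action: str, baud: int = 115200) -> List[str]:
    """Build an esptool.py argument list for the given serial port and action."""
    return ["esptool.py", "--port", port, "--baud", str(baud), *action]


def write_flash_cmd(port: str, image: str, address: int = 0x0, baud: int = 115200) -> List[str]:
    """Write an image file to flash at the given address (0x0 as in the post)."""
    return esptool_cmd(port, "write_flash", hex(address), image, baud=baud)


def read_flash_cmd(port: str, size: int, out_file: str, address: int = 0x0,
                   baud: int = 115200) -> List[str]:
    """Dump `size` bytes of flash starting at `address` into out_file."""
    return esptool_cmd(port, "read_flash", hex(address), hex(size), out_file, baud=baud)


def erase_flash_cmd(port: str, baud: int = 115200) -> List[str]:
    """Erase the entire flash chip, including any stored configuration."""
    return esptool_cmd(port, "erase_flash", baud=baud)
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Note that `~` is not expanded when a command is run without a shell, so wrap the image path in `os.path.expanduser` first.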

    #68286

    Water_my_lawn
    Participant

    In my working with the source I wanted to be able to recreate a new source tree.
    I wrote a script that automates the instructions that Ray gives for downloading
    and compiling the code. Also, the makefiles are written so that they reference
    your home directory rather than your current directory. This requires you to
    work in your home directory rather than some sub-directory. This precludes
    having multiple build environments. This script has a fix for that.

    I have attached the script below.

    #68287

    Water_my_lawn
    Participant

    Second try on the attachment. This is a tar GNU zipped file.

    #68288

    Water_my_lawn
    Participant

    Third try.

    #68289

    Water_my_lawn
    Participant

    Fourth try.

    #68327

    Water_my_lawn
    Participant

    I have captured a hang event with Wireshark. You can see the traffic progressing
    normally until the OS stops responding. I have attached the captured file if anyone
    wants to have a look and make any comments.

    The OS has an IP address of 192.168.209.20, my computer has an IP address of 192.168.209.221.
    Load this file in Wireshark and use this display filter to hide unrelated traffic.
    ip.addr == 192.168.209.20
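
Once the capture is filtered, the point where the OS goes quiet can also be found programmatically. The sketch below assumes the packets have already been exported from Wireshark into `(timestamp, source IP, destination IP)` tuples (for instance from a CSV export); that tuple layout and the function names are illustrative and not part of the attached capture.

```python
from typing import Iterable, List, Optional, Tuple

# (timestamp in seconds, source IP, destination IP); a hypothetical export format.
Packet = Tuple[float, str, str]


def last_reply_time(packets: Iterable[Packet], device_ip: str) -> Optional[float]:
    """Return the timestamp of the last packet sent by the device, if any."""
    times = [t for t, src, _ in packets if src == device_ip]
    return max(times) if times else None


def unanswered_after(packets: Iterable[Packet], device_ip: str) -> List[Packet]:
    """Return packets sent to the device after it last transmitted anything."""
    packets = list(packets)
    cutoff = last_reply_time(packets, device_ip)
    if cutoff is None:
        return [p for p in packets if p[2] == device_ip]
    return [p for p in packets if p[2] == device_ip and p[0] > cutoff]
```

The first entry returned by `unanswered_after` marks the start of the window in which to look for whatever the OS received just before it stopped responding.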

    #68333

    DaveC
    Participant

    @Water_my_lawn
    I don’t know what the capture environment was, but it doesn’t seem like you are looking at the network from the OS perspective, i.e. everything that …20 is receiving. Correct me if I got this wrong.

    Like you, I see the last successful exchange and the next one doesn’t get off the ground, i.e. OS doesn’t respond to the connect request. That doesn’t provide much info about what’s going on with OS. If you suspect that the OS hang is due to something on the network that OS is choking on, you would need to see everything OS sees and then correlate suspect activities to a time window where the OS stops responding.

    Do you have a managed switch that can mirror ports? If so, you could mirror the port that OS is on to a port that a PC running Wireshark can look at.
