Forum Replies Created

Viewing 25 posts - 26 through 50 (of 72 total)
    in reply to: Controller lockups / crashes with wired Ethernet module #69165

    Water_my_lawn
    Participant

    An update:

    I have been running it since my last post and have just detected a network subsystem hang.
    With the debug code that I added I can see that uip_process() in the uip.c
    file is receiving packets but always drops them. The main OpenSprinkler
    code never gets the packets. There is clearly something wrong with packet
    handling, since even ping does not work, and ping does not involve the OS code.

    I have been communicating with jandrassy, one of the maintainers of the
    UIPEthernet code. That thread is here:
    https://github.com/UIPEthernet/UIPEthernet/issues/129

    He has been chasing a memory leak in this code.
    There is a memory heap manager for packet buffers called mempool.c,
    and he suspects that the problem may lie there. Since I am seeing receive
    buffer overflow errors in the ENC28J60 chip, the problems could be related.
    I have added code to check for this and will start another run.
    It took 2 months to catch this error, so it may take a while to catch another
    one.
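    To illustrate the kind of leak being chased, here is a minimal sketch of a fixed-block
    packet buffer pool that keeps a count of blocks in use. It is not the UIPEthernet
    mempool.c code; `BlockPool`, `alloc`, `release`, `live` and `handle_packet` are
    hypothetical names. If some path allocates a buffer for a received packet but never
    frees it when the packet is dropped, the pool slowly runs dry, and from then on every
    allocation fails, which would look like a stack that receives packets but drops them all.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-block pool illustrating a packet buffer heap.
// This is NOT the UIPEthernet mempool.c implementation.
template <std::size_t N>
class BlockPool {
 public:
  static constexpr int kNoBlock = -1;

  // Returns a free block index, or kNoBlock when the pool is exhausted.
  int alloc() {
    for (std::size_t i = 0; i < N; ++i) {
      if (!used_[i]) {
        used_[i] = true;
        ++live_;
        return static_cast<int>(i);
      }
    }
    ++failures_;  // each refused allocation means a packet gets dropped
    return kNoBlock;
  }

  // Frees a block; ignores invalid or already-free indices.
  void release(int block) {
    if (block < 0) return;
    std::size_t i = static_cast<std::size_t>(block);
    if (i < N && used_[i]) {
      used_[i] = false;
      --live_;
    }
  }

  std::size_t live() const { return live_; }         // blocks currently held
  std::size_t failures() const { return failures_; }  // allocations refused

 private:
  std::array<bool, N> used_{};
  std::size_t live_ = 0;
  std::size_t failures_ = 0;
};

// Handles one packet; 'leak' models a code path that forgets release().
bool handle_packet(BlockPool<4>& pool, bool leak) {
  int b = pool.alloc();
  if (b == BlockPool<4>::kNoBlock) return false;  // no buffer: packet dropped
  if (!leak) pool.release(b);
  return true;
}
```

    Printing `live()` and `failures()` in a periodic debug line would show a leak as a
    count that only ever climbs, well before the point where every packet is dropped.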

    in reply to: Controller lockups / crashes with wired Ethernet module #68706

    Water_my_lawn
    Participant

    A quick note: my firmware hung after 10 days. After running the tests described
    above, I loaded my debug version and installed my OS back in its normal place.
    Since it is after watering season, it has nothing to do but sit there powered up.
    I checked it once a day, and today it was hung.

    Since it is inconvenient to get to, I will have to rig up a debug cable to read out
    the log information, and I will have to do this without risking a loss of power.
    I will report back when I have done this.

    in reply to: Controller lockups / crashes with wired Ethernet module #68636

    Water_my_lawn
    Participant

    I have completed testing of 3 firmware versions. The first firmware is one that I compiled
    and has debugging code that would provide the state of the IP stack if a hang happened.

    The other two firmware versions, OS220(90).bin and OS220(91).bin, are from the previous post.

    I ran each for 1 week in my normal setup. This is just the OS 3.2 sitting on my desk with
    only power, the RS232, and the ENC28J60 connected. I have no zone valves connected. I
    have a web browser pointed at the OS. This setup would usually produce a hang within
    one week.

    The result is that all 3 firmware versions ran the full week with no problems.

    It seems that the problem is solved by using the latest version of UIPEthernet, 2.0.9.

    in reply to: Controller lockups / crashes with wired Ethernet module #68501

    Water_my_lawn
    Participant

    Thanks.

    I have compiled OS 3.2 using the 2.0.9-4 version of the Ethernet
    library. I have added debugging code that should allow me to
    know the state of a bunch of variables if it hangs. I intend
    to let this run for a week. This may be long enough to produce
    a hang.

    I tried to get gdb running on the OS but was not successful.
    I could single step and look at data, but I could not break
    execution or stop on a hardware breakpoint. So now I am
    back to using printf.

    in reply to: Controller lockups / crashes with wired Ethernet module #68347

    Water_my_lawn
    Participant

    You are correct that I am watching the IP traffic from my PC and not at the OS.
    I also did not see anything odd in the IP traffic until it stopped.

    Mostly I am attempting to insert suitable debug messages in the firmware that will
    tell me what is wrong. This type of printf debugging is tricky because
    you affect the timing. The printf calls can be a heavy load because they are synchronous.
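    A common way to keep the timing impact small is to have the time-critical code store
    compact records in a RAM ring buffer and format them later, from the main loop or once a
    hang is detected. This is a minimal sketch of that general technique, not code from the
    OpenSprinkler or UIPEthernet sources; `EventLog`, `Event` and the field layout are
    hypothetical, and on the board the dump would go to the serial port instead of a
    `std::string`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// One compact log record (hypothetical layout).
struct Event {
  std::uint32_t time_ms;  // timestamp, e.g. from millis()
  std::uint8_t code;      // what happened (caller-defined)
  std::uint16_t value;    // one data word
};

// Deferred logger: record() is only a few stores, so it can be called
// from the network path without the cost of a synchronous printf.
template <std::size_t N>
class EventLog {
 public:
  void record(std::uint32_t t, std::uint8_t code, std::uint16_t value) {
    buf_[head_] = Event{t, code, value};
    head_ = (head_ + 1) % N;
    if (count_ < N) ++count_;  // once full, the oldest entry is overwritten
  }

  // Formats the retained events, oldest first. Call from a non-critical place.
  std::string dump() const {
    std::string out;
    std::size_t start = (head_ + N - count_) % N;
    for (std::size_t i = 0; i < count_; ++i) {
      const Event& e = buf_[(start + i) % N];
      char line[48];
      std::snprintf(line, sizeof line, "%lu %u %u\n",
                    static_cast<unsigned long>(e.time_ms),
                    static_cast<unsigned>(e.code),
                    static_cast<unsigned>(e.value));
      out += line;
    }
    return out;
  }

  std::size_t size() const { return count_; }

 private:
  std::array<Event, N> buf_{};
  std::size_t head_ = 0;   // next slot to write
  std::size_t count_ = 0;  // number of valid entries
};
```

    Because the buffer keeps only the most recent entries, it naturally holds the events
    leading up to a hang, which is usually the part of the history that matters.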

    So far it seems that the OS stops receiving messages when the hang occurs.
    I don’t yet know why.

    I am using the latest ENC28J60 drivers, which include some significant changes
    by the developers. I had hoped that this would solve the problem, but it
    is no better than the old version with regard to hanging.

    in reply to: Controller lockups / crashes with wired Ethernet module #68327

    Water_my_lawn
    Participant

    I have captured a hang event with Wireshark. You can see the traffic progressing
    normally until the OS stops responding. I have attached the captured file if anyone
    wants to have a look and make any comments.

    The OS has an IP address of 192.168.209.20; my computer has an IP address of 192.168.209.221.
    Load this file in Wireshark and use this display filter to hide unrelated traffic.
    ip.addr == 192.168.209.20

    in reply to: Controller lockups / crashes with wired Ethernet module #68289

    Water_my_lawn
    Participant

    Fourth try.

    in reply to: Controller lockups / crashes with wired Ethernet module #68288

    Water_my_lawn
    Participant

    Third try.

    in reply to: Controller lockups / crashes with wired Ethernet module #68287

    Water_my_lawn
    Participant

    Second try on the attachment. This is a gzipped tar file.

    in reply to: Controller lockups / crashes with wired Ethernet module #68286

    Water_my_lawn
    Participant

    While working with the source I wanted to be able to recreate a new source tree,
    so I wrote a script that automates the instructions that Ray gives for downloading
    and compiling the code. Also, the makefiles are written so that they reference
    your home directory rather than your current directory. This requires you to
    work in your home directory rather than in some sub-directory, which precludes
    having multiple build environments. The script includes a fix for that.

    I have attached the script below.

    in reply to: Controller lockups / crashes with wired Ethernet module #68147

    Water_my_lawn
    Participant

    After much pulling of my hair and pounding of my head I finally
    recovered my OS. I tried loading a bunch of different binaries
    that supported OTA updates, but none would work to update
    the OS firmware. I also tried the Arduino IDE, which loaded
    binary images without errors, but none of them were useful.

    I finally loaded “OpenGarage” which worked properly and
    presented an update screen at 192.168.4.1/update.

    To load the OpenGarage firmware I executed:
    esptool.py --port /dev/ttyUSB0 write_flash 0x0 ~/Downloads/og_1.1.0.bin

    Clearly there is something in the OpenSprinkler firmware
    that is not clearing some area of the flash, causing
    a read fault and then a reset, likely through an invalid pointer.
    Loading OpenGarage seems to clear that condition.

    It is likely that OpenSprinkler is sensitive to something
    in its configuration data. This is the data that is not
    overwritten when a new version is flashed during the update
    process.
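    If stale configuration data is the culprit, one defensive pattern is to store a magic
    number, a layout version and a checksum with the settings, and to fall back to defaults
    whenever any of them does not match. This is a sketch of that general technique, not a
    description of what the OpenSprinkler firmware does; `ConfigHeader`, `kMagic`, `kVersion`
    and the simple additive checksum are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header stored in front of the configuration payload.
struct ConfigHeader {
  std::uint32_t magic;     // identifies "this is our config"
  std::uint16_t version;   // layout version the firmware expects
  std::uint16_t length;    // number of payload bytes after the header
  std::uint32_t checksum;  // sum of payload bytes (simple, for illustration)
};

constexpr std::uint32_t kMagic = 0x4F534346;  // arbitrary example value
constexpr std::uint16_t kVersion = 3;         // arbitrary example value

std::uint32_t sum_bytes(const std::uint8_t* p, std::size_t n) {
  std::uint32_t s = 0;
  for (std::size_t i = 0; i < n; ++i) s += p[i];
  return s;
}

// True only if the blob looks like a config this firmware can use.
// On false, a caller would load defaults instead of trusting the data.
bool config_is_valid(const std::vector<std::uint8_t>& blob) {
  if (blob.size() < sizeof(ConfigHeader)) return false;
  ConfigHeader h;
  std::memcpy(&h, blob.data(), sizeof h);  // memcpy avoids misaligned reads
  if (h.magic != kMagic || h.version != kVersion) return false;
  if (blob.size() - sizeof h < h.length) return false;  // truncated payload
  return sum_bytes(blob.data() + sizeof h, h.length) == h.checksum;
}

// Builds a valid blob around a payload (what a save routine would write).
std::vector<std::uint8_t> make_blob(const std::vector<std::uint8_t>& payload) {
  ConfigHeader h{kMagic, kVersion,
                 static_cast<std::uint16_t>(payload.size()),
                 sum_bytes(payload.data(), payload.size())};
  std::vector<std::uint8_t> blob(sizeof h);
  std::memcpy(blob.data(), &h, sizeof h);
  blob.insert(blob.end(), payload.begin(), payload.end());
  return blob;
}
```

    Erased flash reads back as 0xFF bytes, so a blank or foreign region fails the magic
    check immediately rather than being interpreted as settings.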

    in reply to: Controller lockups / crashes with wired Ethernet module #68087

    Water_my_lawn
    Participant

    I have not tried such fast BAUD rates; I just use 115200. That seems really fast to me, since I learned about ASYNC on an ASR33:
    mechanical decoding at 110 BAUD! I fed miles of paper tape through those machines.

    My USB-to-ASYNC converter board, which uses a CH340 chip, does not have the extra RS232 lines brought out, so I don’t have
    the ability to handle reset automatically. My converter board only has TxD, RxD, and ground. I have better boards coming.

    When I download the latest release, os_219_rev7.bin, I get these messages:

    load 0x4010f000, len 1384, room 16
    tail 8
    chksum 0x2d
    csum 0

    In the past with a working system the messages would continue and report
    the IP address and the time and something about checking for the weather.
    From the absence of these messages I assume that the OS firmware is not running.
    The WiFi does not come up.

    If I download to a base address of 0x0 I get these messages. If I download
    to a base address of 0x1000 I get just garbage text. Do I need to position
    the download image at some specific address?

    in reply to: Controller lockups / crashes with wired Ethernet module #68083

    Water_my_lawn
    Participant

    How do you program the OS the first time? Are they already flashed with something that
    gets you into AP mode that enables updating through the WiFi?

    I can reliably flash the released firmware now but it does not work. I suspect that there
    is already some firmware in the OS that makes it work that I am missing. Do you know
    what that might be? Can you read a full copy of the flash from a working unit using
    something like esptool.py?

    I had assumed that the pins connected to buttons B1 and B3 were pulled up internally.
    But I found a web page that said that GPIO 2 must be pulled up with a resistor to enable
    programming mode. With the resistor it always works; without it, it rarely works.
    Perhaps it is only my unit.

    I am using a CH340, which works fine now. I do have a soldered-on reset switch, however.
    With a CP2102 you get extra pins that allow it to handle the reset. I have already ordered
    one.

    in reply to: Controller lockups / crashes with wired Ethernet module #68077

    Water_my_lawn
    Participant

    While debugging the hang problem I was keenly aware that I could brick my OS.
    Well, I have done it! I loaded an image of the OS firmware that had a bug
    that prevented it from coming up, which also ruled out using the
    192.168.4.1/update process.

    It took me a long time to figure out the flash programming process on the OS.
    There is a program capable of communicating with the ESP8266 module using
    the ASYNC port available on the 6 pin edge connector on the OS. The
    ESP8266 flash write program is called esptool.py. It is available on
    github. However, out of a few hundred tries I could only get it to work
    3 or 4 times.

    The problem turned out to be that GPIO 2 (pin 17 on the ESP8266 module) must
    be pulled up by a resistor, but it is left floating. This pin is connected
    to the “B1” button. I put a 10K resistor from the B1 button to the +3.3 V
    supply. Now I can get into boot mode reliably.
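    The reason a floating GPIO 2 matters is that the ESP8266 ROM samples three strapping pins
    at reset to choose its boot mode. The function below encodes the commonly documented table:
    GPIO15 low, GPIO0 low and GPIO2 high select the UART download mode; GPIO15 low, GPIO0 high
    and GPIO2 high boot from flash; GPIO15 high selects SDIO. It is a host-side illustration
    only, and `BootMode` and `boot_mode` are hypothetical names.

```cpp
#include <cassert>

// Boot modes the ESP8266 ROM can select at reset (illustrative enum).
enum class BootMode { UartDownload, Flash, Sdio, Invalid };

// Decodes the strapping pins sampled at reset; true means the pin reads high.
// If GPIO2 floats and happens to read low, pulling GPIO0 low no longer
// yields the UART download combination, so esptool.py cannot connect.
BootMode boot_mode(bool gpio15, bool gpio0, bool gpio2) {
  if (gpio15) return BootMode::Sdio;       // GPIO15 high: SDIO boot
  if (!gpio2) return BootMode::Invalid;    // GPIO2 low: not a documented mode
  return gpio0 ? BootMode::Flash           // GPIO0 high: run from flash
               : BootMode::UartDownload;   // GPIO0 low: serial bootloader
}
```

    A pull-up resistor turns GPIO2 from "whatever it happens to read" into a guaranteed high,
    which matches the observation that boot mode became reliable once the resistor was added.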

    I can now download firmware images, but I still cannot get the OS firmware
    to run, even with known-good firmware (the latest release, os_219_rev7.bin).
    There must be something that I am missing. The display does not light
    up and the WiFi does not come up.

    in reply to: Controller lockups / crashes with wired Ethernet module #68038

    Water_my_lawn
    Participant

    There is quite a nice graphical debugger package for the Arduino, called Sloeber,
    that has been adapted for the ESP8266. The CPU inside the ESP8266 does not support
    JTAG, but it does support debugging over the ASYNC port. The chip has one
    hardware breakpoint, so you can debug code in read-only memory.

    I have used GDB a lot and used JTAG a lot but I have not used a CPU that debugs
    through the ASYNC port. To facilitate debugging in your target code you link
    in a small piece of code called the stub driver. Early in the program you
    set the BAUD rate and call gdbstub_init(). From there GDB running on a PC
    should be able to grab hold of the target and control it. It is the
    connection that is failing for me.

    in reply to: Controller lockups / crashes with wired Ethernet module #68030

    Water_my_lawn
    Participant

    Ray,

    Up to now I have been debugging by printing to the TTY port. This has given me a lot
    of clues to look at. However, adding printing changes the timing and can make the problem
    go away.

    I am trying to set up my system to use GDB. I have installed the Sloeber package and compiled
    in the stub driver. I have been unable to connect using:
    target remote /dev/ttyUSB0

    I do issue all the configuration commands including setting the BAUD rate but I never
    connect successfully.

    Do you use GDB? What development environment do you use?

    Thanks.

    in reply to: Controller lockups / crashes with wired Ethernet module #67969

    Water_my_lawn
    Participant

    After about 8 hours of running I got a hang with my debug code.
    I now have some cryptic numbers to ponder over. So far I
    can say that packets continue to be received during the hang,
    as shown by the packet counter incrementing.
    However, the counters for the UDP, ICMP, and TCP packets
    do not increment. Ray’s reset logic does detect an error,
    and its counter increments. I have disabled the actual reset
    in Ray’s code so that it does not disturb
    my state-recording code.
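    This pattern of counters narrows things down: if the frame counter advances but none of
    the per-protocol counters do, packets are being discarded after they are read from the
    chip but before they reach the ICMP, TCP or UDP handlers. The sketch below shows that
    bookkeeping using the standard IPv4 protocol numbers (ICMP 1, TCP 6, UDP 17); `Counters`,
    `count_packet`, `dropping_before_dispatch` and the `dispatch_ok` flag are hypothetical
    stand-ins for the debug code described above.

```cpp
#include <cassert>
#include <cstdint>

struct Counters {
  std::uint32_t frames = 0;  // every frame read from the Ethernet controller
  std::uint32_t icmp = 0;
  std::uint32_t tcp = 0;
  std::uint32_t udp = 0;
};

// IPv4 protocol numbers (IANA): ICMP 1, TCP 6, UDP 17.
constexpr std::uint8_t kIcmp = 1, kTcp = 6, kUdp = 17;

// 'dispatch_ok' models whether the stack accepted the packet for processing.
void count_packet(Counters& c, std::uint8_t ip_proto, bool dispatch_ok) {
  ++c.frames;
  if (!dispatch_ok) return;  // dropped before protocol handling
  switch (ip_proto) {
    case kIcmp: ++c.icmp; break;
    case kTcp:  ++c.tcp;  break;
    case kUdp:  ++c.udp;  break;
    default: break;
  }
}

// True when frames keep arriving but no protocol handler sees any of them,
// i.e. the symptom observed during the hang.
bool dropping_before_dispatch(const Counters& before, const Counters& after) {
  bool frames_moved = after.frames != before.frames;
  bool protos_moved = after.icmp != before.icmp || after.tcp != before.tcp ||
                      after.udp != before.udp;
  return frames_moved && !protos_moved;
}
```

    Comparing two snapshots, rather than looking at absolute values, is what makes the
    check meaningful: a healthy link moves at least one protocol counter alongside the
    frame counter.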

    I will meditate on this result and get back.

    in reply to: Controller lockups / crashes with wired Ethernet module #67960

    Water_my_lawn
    Participant

    OK, here is the same file zipped.

    in reply to: Controller lockups / crashes with wired Ethernet module #67952

    Water_my_lawn
    Participant

    I have produced a debug version of the code. It should operate no
    differently than the official release. I have added some debug
    information that will appear on a line above the standard display
    and a line that will appear below the standard display.

    The line above will contain 4 hex numbers. The first is the
    flag field of the current packet being handled. This will
    normally be zero.

    The second, third, and fourth are the total packet count,
    the ICMP packet count, and the TCP packet count. These are
    only one-byte counters, so they roll over often. The ICMP
    count will be 0 until you ping the OS.
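    Because the counters are single bytes, comparing two photos of the display needs
    modulo-256 arithmetic, and a difference can only be trusted if fewer than 256 packets
    arrived between the readings. A minimal sketch, assuming the values are shown as
    two-digit hex separated by spaces (the exact on-screen separator is an assumption),
    with hypothetical `format_debug_line` and `delta8` helpers:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats the four one-byte values (flags, packets, ICMP, TCP) as hex.
// The space separator is an assumption about the display layout.
std::string format_debug_line(std::uint8_t flags, std::uint8_t pkts,
                              std::uint8_t icmp, std::uint8_t tcp) {
  char buf[16];
  std::snprintf(buf, sizeof buf, "%02X %02X %02X %02X",
                static_cast<unsigned>(flags), static_cast<unsigned>(pkts),
                static_cast<unsigned>(icmp), static_cast<unsigned>(tcp));
  return buf;
}

// Packets counted between two readings of an 8-bit counter. The result is
// reduced modulo 256, so 0x05 after 0xFA means 11 packets, provided fewer
// than 256 arrived in between.
std::uint8_t delta8(std::uint8_t earlier, std::uint8_t later) {
  return static_cast<std::uint8_t>(later - earlier);
}
```

    For example, a packet count of FA in the first photo and 05 in the second means 11
    packets were received in between, not a drop of 245.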

    Before you communicate with the OS, the top line will
    display “client”. That means that no client has established
    communications. Just point a web browser at the OS and
    the debug counters will appear.

    The bottom line will contain 4 numbers. These are state
    indicators for the 4 levels of code involved in the network
    communication using the ENC28J60 interface.

    I will run this firmware on my system and watch for a hang.
    If other people with the hang problem would like to help
    that would be great.

    If you get a hang I would like to get all the numbers.
    Take a photo to save writing them down.
    Generally a hang is indicated with a “Network error”
    message at the bottom of the OS web page.

    When you see that happen, send me the numbers. Sometimes
    I can ping the OS when it is hung, but mostly ping will fail.
    If you ping it, the numbers may change. Please also send
    the changed numbers.

    Also try refreshing the web page. If the numbers change,
    please send the changed numbers too.

    I have attached the debug version of the firmware.

    in reply to: Controller lockups / crashes with wired Ethernet module #67847

    Water_my_lawn
    Participant

    I just had another hang. This time it was unusual: the web page was hung as with other
    hangs, but the OS responded to pings. The display showed 28|1|4|10.
    The only other time I saw a 28 was when the OS was running OK.

    in reply to: Controller lockups / crashes with wired Ethernet module #67831

    Water_my_lawn
    Participant

    I got the code and can compile it with debugging and load it successfully.
    Now I am ready to try some debugging.

    Here is my take on the situation:

    The ENC28J60 is not interrupt driven. There is an interrupt pin #2 on the
    connector but it is not connected to anything in the OS. It runs in polled
    mode.

    The OS continues to run normally, only the network interface is down. The
    polling loop in main.cpp runs OK because the sprinkler programs continue
    to run normally.

    The interface does not respond to a ping. ICMP packets are handled in the
    UIPEthernet driver; they never get into the OS code. There is no hardware
    support for ICMP packets.

    I suspect that the receive buffer fills and is not being cleared for some
    reason. One possible reason is that incoming packets arrive faster than
    the OS can digest them. Another possible reason could be some
    non-thread-safe code.
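    The first hypothesis, packets arriving faster than the polling loop drains them, can be
    illustrated with a toy model of a fixed-size receive buffer that is emptied only when the
    main loop polls. It models the idea, not the ENC28J60's actual circular buffer; `RxBuffer`,
    `simulate`, the capacity and the per-tick rates are made-up parameters.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a receive buffer that is drained only when polled.
class RxBuffer {
 public:
  explicit RxBuffer(std::size_t capacity) : capacity_(capacity) {}

  // A packet arrives from the wire; it is dropped (and counted) if full.
  void arrive() {
    if (pending_ < capacity_) {
      ++pending_;
    } else {
      ++overflows_;
    }
  }

  // The main loop polls and processes up to 'budget' packets.
  void poll(std::size_t budget) {
    std::size_t n = budget < pending_ ? budget : pending_;
    pending_ -= n;
    processed_ += n;
  }

  std::size_t pending() const { return pending_; }
  std::size_t overflows() const { return overflows_; }
  std::size_t processed() const { return processed_; }

 private:
  std::size_t capacity_;
  std::size_t pending_ = 0;
  std::size_t overflows_ = 0;
  std::size_t processed_ = 0;
};

// Runs 'ticks' loop iterations: 'in' arrivals, then one poll of 'out', per tick.
RxBuffer simulate(std::size_t capacity, int ticks, int in, std::size_t out) {
  RxBuffer rx(capacity);
  for (int t = 0; t < ticks; ++t) {
    for (int i = 0; i < in; ++i) rx.arrive();
    rx.poll(out);
  }
  return rx;
}
```

    In this model an overrun only drops packets while the buffer is full; processing
    continues at the polling rate. A hang that persists after traffic slows down would
    therefore point at state that never recovers, rather than at the arrival rate alone.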

    I am going to put some debug messages into a new version and try to catch
    the problem.

    I went 7 days without a hang then had 2 in succession.

    I have looked at the OLED debug messages for a number of these hang events and cannot
    identify a root cause.

    I would like to produce a new debug version and run it myself. I would also like
    some volunteers who have had these problems to run it. The code will otherwise
    be identical to Ray’s latest release.

    in reply to: Controller lockups / crashes with wired Ethernet module #67774

    Water_my_lawn
    Participant

    Perhaps I was reading your instructions too literally.
    Here are my updated instructions, which seem to work and
    produce the mainArduino.bin file. I have not tried running
    the binary yet.

    PS: I have not had a hang since Aug 1, with no change to the
    firmware and no change on my network!
    —————————————————–

    #Get the code.
    git clone https://github.com/OpenSprinkler/OpenSprinkler-Firmware.git
    #Puts it in ~/OpenSprinkler-Firmware/

    #Get the Arduino code.
    git clone https://github.com/esp8266/Arduino.git esp8266_2.5.2
    #Puts it in ~/esp8266_2.5.2

    #Go into esp8266_2.5.2 and get the correct tag.
    cd esp8266_2.5.2
    git checkout tags/2.5.2

    cd tools
    python get.py

    #Install necessary libraries, including SSD1306, RCSwitch, and UIPEthernet.
    #Download and unzip or git clone these into ~/Arduino/libraries folder.

    mkdir -p ~/Arduino/libraries
    cd ~/Arduino/libraries
    git clone https://github.com/ThingPulse/esp8266-oled-ssd1306.git

    # The latest version of the OLED code is not compatible, backup to 4.1.0
    cd esp8266-oled-ssd1306
    git checkout tags/4.1.0

    # Go back to the libraries folder before cloning the rest.
    cd ~/Arduino/libraries
    git clone https://github.com/sui77/rc-switch.git
    git clone https://github.com/UIPEthernet/UIPEthernet.git

    #And this one which is new.
    git clone https://github.com/knolleary/pubsubclient.git

    cd ~/OpenSprinkler-Firmware

    #There is an error in make.lin32:
    #Replace this line:
    ~/Arduino/libraries/SSD1306 \

    #with this line:
    ~/Arduino/libraries/esp8266-oled-ssd1306 \

    # Remove tests directory, will not compile.
    rm -rf ~/Arduino/libraries/pubsubclient/tests

    make -f make.lin32

    in reply to: Controller lockups / crashes with wired Ethernet module #67760

    Water_my_lawn
    Participant

    I updated the UIPEthernet library from your source but I get the same errors.
    I issue these commands from ~/OpenSprinkler-Firmware.

    make -f make.lin32 clean
    make -f make.lin32

    I have attached the full compiler output showing all the errors that I get.
    I don’t have any errors that refer to “test”.

    Thanks.

    in reply to: Controller lockups / crashes with wired Ethernet module #67745

    Water_my_lawn
    Participant

    I am trying to compile the source. Here is the procedure that I followed which
    is as close as possible to the procedure that you described. However I get
    compile errors.

    ————————————————————-

    #Get the code.
    git clone https://github.com/OpenSprinkler/OpenSprinkler-Firmware.git
    #Puts it in ~/OpenSprinkler-Firmware/

    #Get the ESP8266 for Arduino stuff.
    git clone https://github.com/esp8266/Arduino.git
    #Puts it in ~/Arduino

    git clone https://github.com/esp8266/Arduino.git esp8266_2.5.2
    #Puts it in ~/esp8266_2.5.2

    #Go into esp8266_2.5.2
    cd esp8266_2.5.2
    git checkout tags/2.5.2

    cd tools
    python get.py

    #Install necessary libraries, including SSD1306, RCSwitch, and UIPEthernet.
    #Download and unzip or git clone these into ~/Arduino/libraries folder.

    cd ~/Arduino/libraries
    git clone https://github.com/ThingPulse/esp8266-oled-ssd1306.git
    git clone https://github.com/sui77/rc-switch.git
    git clone https://github.com/UIPEthernet/UIPEthernet.git

    #And this one which is new.
    git clone https://github.com/knolleary/pubsubclient.git

    cd ~/OpenSprinkler-Firmware

    #There is an error in make.lin32:
    #Replace this line:
    ~/Arduino/libraries/SSD1306 \

    #with this line:
    ~/Arduino/libraries/esp8266-oled-ssd1306 \

    make -f make.lin32

    ———————————————–

    I get a series of errors like:

    home/peter/Arduino/libraries/ESP8266WiFi/src/BearSSLHelpers.h:149:34: error: ‘virtual const unsigned char* BearSSL::HashSHA256::oid()’ marked override, but does not override
    virtual const unsigned char *oid() override;

    /home/peter/Arduino/libraries/ESP8266WebServer/src/Parsing-impl.h:139:15: error: ‘class String’ has no member named ‘isEmpty’
    if (req.isEmpty()) break; //no more headers

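    I suspect that there is a version mismatch somewhere. Note that the failing headers are
    under ~/Arduino/libraries, which the plain clone of esp8266/Arduino into ~/Arduino above
    populated from the master branch rather than from the 2.5.2 tag.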

    in reply to: Controller lockups / crashes with wired Ethernet module #67722

    Water_my_lawn
    Participant

    Hello Ray,

    Could you send me the OS files that you modified with the debug code? I would like to take
    a look at them and see if anything catches my eye. I know that I can get the standard source
    from github.

    Oddly, I have not had a hang since Aug 1.
