Reply To: Controller lockups / crashes with wired Ethernet module



After waiting 2 months for a network interface hang, I added debug code to
narrow the focus of my investigation. I started the next run and caught a
hang after 2 days. This time the network interface, while hung, would still
respond to pings, but only about 10% of the responses were valid. I watched
the traffic with Wireshark, and the bad response packets seemed to be corrupted.

I further narrowed my debug code to focus more closely on the receive and
transmit packet code. When I loaded my new code I totally bricked
the device. The recovery method that I had previously used
did not work. The OS was transmitting data continuously from the ASYNC
port, but it would not autobaud, so the data was just garbage.

The default BAUD rate of the ESP8266 with a 26MHz oscillator is 74880 BAUD.
This is non-standard and PuTTY does not support it, even though my USB to ASYNC
adapter does support that odd BAUD rate. I found a terminal emulator which
does support any BAUD rate.
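
For anyone wondering where 74880 comes from, here is a small sketch of the arithmetic. It rests on an assumption that is commonly reported but not something I verified on this board: the boot ROM sets up its UART for 115200 BAUD as if a 40MHz crystal were fitted, so a 26MHz crystal scales the real rate down. The names below are made up for illustration.

```python
# Assumption: the boot ROM programs the UART divider for 115200 baud
# as if a 40 MHz crystal were fitted. With a slower crystal the same
# divider produces a proportionally slower baud rate.
ROM_ASSUMED_CRYSTAL_HZ = 40_000_000
ROM_NOMINAL_BAUD = 115_200


def boot_baud(actual_crystal_hz: int) -> int:
    """Baud rate seen on the wire for a given crystal frequency."""
    return ROM_NOMINAL_BAUD * actual_crystal_hz // ROM_ASSUMED_CRYSTAL_HZ


print(boot_baud(26_000_000))  # 74880
```

Under that assumption, 115200 * 26 / 40 = 74880, which matches the rate at which the boot messages became readable.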

Using this I successfully received the data from the OS. This is what
I got:
ets Jan 8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
Fatal exception 9(LoadStoreAlignmentCause):
epc1=0x401014d7, epc2=0x00000000, epc3=0x00000000, excvaddr=0x0000000a, depc=0x00000000

Exception (9):
epc1=0x401014d7 epc2=0x00000000 epc3=0x00000000 excvaddr=0x0000000a depc=0x00000000


ctx: sys
sp: 3ffff8f0 end: 3fffffb0 offset: 01a0
3ffffa90: 4024c3fa 3ffee5aa 3ffee5aa 3ffeed9c
3ffffaa0: 4024c409 4024c3b6 40105450 c1781c9b
3ffffab0: 00000000 400042db 40105712 000003fd
3ffffac0: 000000ed 00000020 3fffff10 00000001
3ffffad0: 4010570c 40105583 00000003 8667a4e3
3ffffae0: ffffffff ffffffff ffff0002 00000000
3ffffaf0: 00000000 00000000 00000000 00000000
3ffffb00: 00000000 00000000 00000000 00000000
3ffffb10: ffffffff 00ffffff 00000000 00000000
3ffffb20: 00000000 00000000 00000000 00000000
3ffffb30: 00000000 00000000 00000000 00000000

This was sent repeatedly at the maximum rate.
The important message is:

Fatal exception 9

This means that code tried to read a 32-bit value through a
pointer that is not word aligned. The compiler should not do this, so
perhaps the process of flashing my code had an error.
The OS was initializing and taking an exception in
a very tight loop.
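
The alignment claim can be checked directly against the register line in the boot log above. This is only a sketch: the helper names are hypothetical, and it assumes that excvaddr holds the address the faulting load or store tried to access.

```python
import re

# Register line copied from the boot log above.
LINE = ("epc1=0x401014d7, epc2=0x00000000, epc3=0x00000000, "
        "excvaddr=0x0000000a, depc=0x00000000")


def parse_registers(line: str) -> dict:
    """Return {register name: integer value} for each name=0x... pair."""
    pairs = re.findall(r"(\w+)=0x([0-9a-fA-F]+)", line)
    return {name: int(value, 16) for name, value in pairs}


def is_aligned(address: int, size: int = 4) -> bool:
    """True if the address is a multiple of the access size in bytes."""
    return address % size == 0


regs = parse_registers(LINE)
print(hex(regs["excvaddr"]), is_aligned(regs["excvaddr"]))  # 0xa False
```

The address 0x0000000a is 10 decimal, which is not a multiple of 4, so a 32-bit access there is misaligned. Such a small address also looks more like a bad or uninitialized pointer than a real data location, though that is a guess on my part.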

Since the OS was in this loop, the regular tools would
not write new firmware. Even the loader would
not work. On the Espressif web site I found their tool,
flash_download_tool_3.8.5.exe, for programming the device.
That tool is really kludgy but I did manage to overwrite
the flash with the OpenGarage binary. Then the OS did
respond at its IP address. Now I could
fully recover.

Now that the OS is back I will further zoom into the
suspected area and hopefully fix this problem. This was a
struggle; I thought that I had permanently bricked my OS!