Significant milestone - Mower controls externalized




In my last blog post on the robotic lawn mower, I mentioned that I had rewritten the old Python code using AI. Since then I have refactored the entire codebase and made significant changes to the hardware and the corresponding software. I had AI write the summary below, but I will let the videos speak for themselves.


The field test went well. The Nvidia Jetson Orin Nano can now control the lawn mower successfully, based on the control signals sent by the phone via an RF transmitter.


Field test







Garage Test




Explanation of the hardware configuration. Jump to 4:30 for hardware details. 




The summary below is AI-generated and has been reviewed for accuracy.


ZTM Auto-Mower Refactored — Project History

A chronological log of all challenges, approaches, and fixes encountered during the development and integration of the refactored auto-mower control system.


Phase 1: Code Refactoring & Architecture

1.1 Modular Redesign

Goal: Refactor the original monolithic auto-mower app into a modular design with separation of threads.

Approach: Discussed and designed a new architecture with dedicated modules:

  • app/hardware/ — DAC, relays, GPIO
  • app/io/ — NRF radio, Arduino calibration, USB detection
  • app/control/ — stick, motor, heartbeat, calibration training
  • app/web/ — Flask dashboard and API
  • app/ai/ — AI mowing (future)

Removed BLE and unused Arduino references. Added AI mowing feature stub to read predetermined GPS mapping with camera feedback.

1.2 Training Mode

Added a training mode to collect training data via GPS and cameras for future AI-assisted mowing.


Phase 2: Deployment & NRF Communication

2.1 Initial Deployment to Jetson Nano

Challenge: First deploy to the Nano via SSH/rsync. Had to establish remote workflow using sshpass with credentials for vivek@192.168.55.1.

Issues encountered:

  • Connecting Mac to Nano over USB (192.168.55.1)
  • Old app auto-starting and conflicting with new app
  • Setting up systemd service to auto-start the refactored app

2.2 NRF Radio — No ACK

Challenge: After deploying the refactored app, the NRF radio would not send ACK responses back to the Android app. The heartbeat was not being received.

Approaches tried:

  1. Hard-coded USB port assignments via .env file — fragile, ports changed on reboot
  2. Auto-probing all USB ports — cameras and GPS were identifiable, remaining port assigned to NRF
  3. Required Android NRF to be active before Nano boot
  4. Manual rescan from the dashboard

Root cause: The refactored app didn't start NRF communication after rescan. The NRF reader thread needed to be restarted when a new port was assigned.

2.3 USB Port Detection & Assignment

Challenge: Jetson Nano reassigns USB port numbers on reboot, so hard-coded paths broke.

Solution: Built a USB detection panel on the dashboard showing all detected ports, bytes received, device identifiers, and a dropdown to manually reassign if needed. Cameras and GPS were auto-detected by USB VID/PID; the remaining port defaulted to NRF.
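
The assignment rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual detection code: the VID/PID values are placeholders, and PortInfo, KNOWN_DEVICES and assign_roles are hypothetical names. The sketch also assumes that NRF is only chosen automatically when exactly one port is left over, leaving anything ambiguous for the manual dropdown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Placeholder VID/PID pairs (assumption): replace with the values
# reported by the real cameras and GPS receiver.
KNOWN_DEVICES: Dict[Tuple[int, int], str] = {
    (0x1111, 0x0001): "camera",
    (0x2222, 0x0002): "gps",
}


@dataclass
class PortInfo:
    device: str          # e.g. "/dev/ttyUSB0"
    vid: Optional[int]   # USB vendor ID, None if unknown
    pid: Optional[int]   # USB product ID, None if unknown


def assign_roles(ports: List[PortInfo]) -> Dict[str, str]:
    """Map each device path to a role based on its VID/PID."""
    roles: Dict[str, str] = {}
    unknown: List[str] = []
    for port in ports:
        role = KNOWN_DEVICES.get((port.vid, port.pid))
        if role is not None:
            roles[port.device] = role
        else:
            unknown.append(port.device)
    # Default the leftover port to NRF only when the choice is unambiguous.
    for device in unknown:
        roles[device] = "nrf" if len(unknown) == 1 else "unassigned"
    return roles
```

Because the lookup keys on VID/PID rather than on the device path, the result stays the same when the port numbering changes after a reboot.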

2.4 Heartbeat Lost — Emergency Stop

Challenge: Even after NRF was detected, the watchdog would trigger emergency stop after 3.5 seconds of no heartbeat.

[NRF] Parse: H,0,0,0,0,1,54933,0,248
[NRF] HB: steer=0 speed=0 E=0 R=0 cmd=0 relays=11111000
[WATCHDOG] HEARTBEAT LOST (3.5s) - EMERGENCY STOP!

Fixes:

  • Reduced heartbeat timeout from 3s to 1s for faster response
  • Kept heartbeat alive even when receiving junk characters on the NRF serial line
  • Ensured NRF thread restart on port rescan
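
A heartbeat watchdog of this kind can be sketched in a few lines. This is an illustration under assumptions, not the project's code: HeartbeatWatchdog and its method names are hypothetical, and the 1 second default simply mirrors the timeout mentioned in the fixes above. The clock is injectable so the logic can be exercised without waiting in real time.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Tracks how long it has been since the last heartbeat."""

    def __init__(self, timeout_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def feed(self) -> None:
        """Call whenever the radio link shows signs of life."""
        self._last_seen = self._clock()

    def silence_s(self) -> float:
        """Seconds elapsed since the last feed()."""
        return self._clock() - self._last_seen

    def expired(self) -> bool:
        """True once the silence exceeds the timeout."""
        return self.silence_s() > self.timeout_s
```

In this sketch the NRF reader thread would call feed() and a separate watchdog loop would poll expired() and trigger the emergency stop. time.monotonic is used as the default clock because it is not affected by system clock adjustments.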

Phase 3: Thread Deadlocks & State Management

3.1 App Crash on E-STOP Command

Challenge: The app crashed when an emergency stop command was sent from the Android app. Traced to state management issues.

Root cause: Multiple threads (NRF, web, heartbeat, stick) were reading and writing a shared state object with locks, causing deadlocks.

3.2 Deadlock Analysis & Lock-Free Redesign

Challenge: The master state object with locks caused the NRF thread to deadlock, freezing the entire app.

Approach — brainstormed each shared variable:

  • Heartbeat: Written by NRF, read by watchdog and web — no lock needed (single writer)
  • Motor controls: Written by active control mode (NRF, web, or AI), read by DAC — only one writer active at a time
  • Relay bitmask: Written by NRF, web, or e-stop; each write is a single-value overwrite, so a deadlock is unlikely
  • Stick detection: Split into two fields (vertical/horizontal angle), each set by one camera
  • DAC calibration: Written only from Android app, read by DAC module

Solution: Removed the master state lock. Each thread writes only its own data; other threads only read. Eliminated all deadlock scenarios.
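
One way to picture the single-writer idea is the sketch below. It is an assumption-laden illustration rather than the project's state module: SharedState, MotorCommand, publish_motor and read_motor are hypothetical names, and the approach relies on CPython, where rebinding an attribute to a new object is a single atomic step.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MotorCommand:
    """Immutable snapshot, so readers can never observe a half-updated pair."""
    left_pct: int = 0
    right_pct: int = 0


class SharedState:
    """Each field has exactly one writer thread; all other threads only read."""

    def __init__(self) -> None:
        self.heartbeat_ts: float = 0.0              # writer: NRF thread
        self.motor: MotorCommand = MotorCommand()   # writer: active control mode


def publish_motor(state: SharedState, left_pct: int, right_pct: int) -> None:
    # Build the complete snapshot first, then publish it with one rebind.
    state.motor = MotorCommand(left_pct, right_pct)


def read_motor(state: SharedState) -> Tuple[int, int]:
    snapshot = state.motor   # take one reference, then read both fields from it
    return snapshot.left_pct, snapshot.right_pct
```

Grouping values that must change together into one frozen object is the design choice here: a reader either sees the old pair or the new pair, never a mix, and no lock is held at any point.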

3.3 Control Mode Conflicts (Web vs Android)

Challenge: The web dashboard and Android app kept overriding each other's motor controls and relay states. Pressing stop on Android was immediately overwritten by web.

Solution: Added a control mode toggle (WEB / ANDROID) on the dashboard. Only the active controller can write motor commands and relay states. Mode persists through e-stop (previously e-stop reset mode to web).
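
The gating rule can be sketched as a small class. The names ControlMode and MotorGate are hypothetical and the real dashboard logic may differ; the sketch only shows that writes from the inactive controller are dropped and that the mode is left untouched by an e-stop.

```python
from enum import Enum


class ControlMode(Enum):
    WEB = "web"
    ANDROID = "android"


class MotorGate:
    """Accepts motor commands only from the currently active controller."""

    def __init__(self, mode: ControlMode = ControlMode.WEB) -> None:
        self.mode = mode
        self.left_pct = 0
        self.right_pct = 0
        self.estopped = False

    def write(self, source: ControlMode, left_pct: int, right_pct: int) -> bool:
        """Apply the command and return True, or ignore it and return False."""
        if self.estopped or source is not self.mode:
            return False
        self.left_pct, self.right_pct = left_pct, right_pct
        return True

    def emergency_stop(self) -> None:
        # Zero the motors but deliberately keep self.mode as it was.
        self.estopped = True
        self.left_pct = self.right_pct = 0

    def clear_estop(self) -> None:
        self.estopped = False
```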

3.4 Relay Bitmask Sync Between Android and Nano

Challenge: The relay bitmask sent continuously by Android would overwrite Nano-initiated relay changes (like start/stop mower). The relays would flip back within milliseconds.

Solution: Implemented a two-way sync protocol:

  1. Android sends bitmask + change flag
  2. Once Nano's bitmask matches, change flag resets and Android stops overwriting
  3. When Nano changes relays (e.g., e-stop), it sets flag=2
  4. Android accepts the change, updates its switches, and sends the bitmask back to confirm
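
The Nano side of this handshake might look roughly like the sketch below. Only flag=2 comes from the description above; the values 0 and 1, the class name NanoRelaySync, and the rule that a pending Nano change wins over incoming Android bitmasks are assumptions made for illustration.

```python
from typing import Tuple

FLAG_NONE = 0      # assumed: nothing pending
FLAG_ANDROID = 1   # assumed: Android asks the Nano to apply its bitmask
FLAG_NANO = 2      # from the text: the Nano changed the relays itself


class NanoRelaySync:
    """Nano-side view of the two-way relay bitmask sync."""

    def __init__(self, relays: int = 0) -> None:
        self.relays = relays & 0xFF
        self.pending_nano_change = False

    def local_change(self, new_relays: int) -> None:
        """The Nano changes relays on its own, e.g. on an emergency stop."""
        self.relays = new_relays & 0xFF
        self.pending_nano_change = True

    def on_android_packet(self, bitmask: int, flag: int) -> Tuple[int, int]:
        """Process one Android packet; return (relays, flag) for the ACK."""
        if self.pending_nano_change:
            if bitmask == self.relays:
                # Android has echoed the Nano's bitmask: change confirmed.
                self.pending_nano_change = False
                return self.relays, FLAG_NONE
            # Ignore the stale Android bitmask and keep announcing the change.
            return self.relays, FLAG_NANO
        if flag == FLAG_ANDROID:
            self.relays = bitmask & 0xFF
        return self.relays, FLAG_NONE
```

The point of the pending flag is that a continuously repeated Android bitmask can no longer flip the relays back before Android has acknowledged the Nano's change.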

Phase 4: Motor Control & DAC Calibration

4.1 DAC Voltage Mapping

Challenge: Stopping the mower sent motors to 2.8V instead of the calibrated neutral voltages (left=2.44V, right=3.21V).

Root cause: The 0% motor command was hard-coded to 2.8V instead of reading from calibration.

Solution: Designed a proper voltage mapping:

  1. Android sends speed/steer → Nano converts to left/right motor percentage (-100 to +100)
  2. 0% maps to calibrated neutral voltage
  3. -100% and +100% map to calibrated low/high voltages
  4. Exponential acceleration curve between neutral and extremes
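
As a rough sketch of this mapping: the exact curve used in the project is not given, so the normalized exponential below and its steepness k are assumptions, as are the names MotorCal and percent_to_voltage. Only the neutral voltages (left=2.44V, right=3.21V) come from the text; the low and high values in the usage line are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorCal:
    low_v: float       # voltage at -100 %
    neutral_v: float   # voltage at 0 %
    high_v: float      # voltage at +100 %


def percent_to_voltage(pct: float, cal: MotorCal, k: float = 2.0) -> float:
    """Map a motor command in [-100, 100] percent to a calibrated voltage."""
    pct = max(-100.0, min(100.0, pct))
    x = abs(pct) / 100.0
    # Normalized exponential: 0 maps to 0, 1 maps to 1, gentle near neutral.
    shaped = math.expm1(k * x) / math.expm1(k)
    target = cal.high_v if pct >= 0 else cal.low_v
    return cal.neutral_v + shaped * (target - cal.neutral_v)


# Placeholder extremes (assumption); neutral taken from the text.
left_cal = MotorCal(low_v=0.5, neutral_v=2.44, high_v=4.5)
stop_voltage = percent_to_voltage(0, left_cal)   # equals the calibrated neutral
```

Because 0% always resolves to cal.neutral_v, there is no hard-coded stop voltage left in the path. A motor with inverted wiring could be described in this sketch by giving it a low_v that is higher than its high_v.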

4.2 Left/Right Motor Inversion

Challenge: One motor uses inverted voltage (higher voltage = reverse). The inversion was applied inconsistently across the stack, sometimes double-inverting.

Issues found:

  • Android displayed inverted direction for left motor
  • DAC calibration panel showed pre-inversion values
  • Left/right DAC pin assignments were swapped
  • Steering logic was reversed (turning left spun the wrong motor faster)

Fixes applied iteratively:

  • Swapped DAC pin assignments (left↔right)
  • Applied inversion to right motor instead of left
  • Fixed Android steering logic for gyro mode
  • Added "DAC Sent" display showing actual voltage after inversion

4.3 Arduino Calibration System

Challenge: Needed to verify DAC output voltage matched intended values. No way to measure voltages remotely.

Solution: Connected an Arduino Uno to the Jetson via USB with 4 analog inputs:

  • A0 = Mower left handle voltage
  • A1 = Mower right handle voltage
  • A2 = DAC left output
  • A3 = DAC right output

Arduino streams V,mower_l,mower_r,dac_l,dac_r at 10Hz. Dashboard calibration panel shows live comparison.
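
Parsing that stream on the Nano side could look something like this. It is a sketch that assumes the four fields arrive as decimal voltages; the source does not say whether the Arduino sends volts or raw ADC counts, and VoltageSample and parse_voltage_line are hypothetical names.

```python
from typing import NamedTuple, Optional


class VoltageSample(NamedTuple):
    mower_l: float
    mower_r: float
    dac_l: float
    dac_r: float


def parse_voltage_line(line: str) -> Optional[VoltageSample]:
    """Parse one 'V,mower_l,mower_r,dac_l,dac_r' line; None if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 5 or parts[0] != "V":
        return None
    try:
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return None
    return VoltageSample(*values)
```

Returning None for anything malformed lets the caller skip partial lines, which are common right after a serial port is opened mid-stream.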

4.4 Arduino Upload Difficulties

Challenge: Uploading the Arduino sketch was problematic.

Issues:

  • arduino-cli not installed on Nano — installed manually
  • Wrong bootloader selection (atmega328old vs atmega328/new bootloader)
  • Serial port conflicts when app held the Arduino port

Fix: Used arduino:avr:nano (new bootloader) FQBN. Kill the app before upload to release the serial port.

4.5 Arduino Left/Right Values Swapped

Challenge: The Arduino analog readings for left and right (both mower and DAC) were physically swapped compared to what the software expected.

Fix: Swapped the values in the Nano's serial parser (calibration_arduino.py): dac_left = Arduino A3, dac_right = Arduino A2.

4.6 Training Mode & Calibration Verify

Built a calibration training mode that:

  1. Steps through a sequence of motor percentages (-100% to +100%)
  2. Sets DAC to corresponding voltage
  3. User follows instructions on the physical mower handles
  4. Arduino reads actual mower voltage and DAC output
  5. Logs differences for calibration refinement

Added a "Verify" button that sweeps 5 voltage pairs and reports accuracy.

4.7 Calibration Panel — Edit/Save Bug

Challenge: Editing calibration values on the dashboard would be overwritten by the live polling loop before the save button could be pressed.

Fix: Added an explicit Edit/Save/Refresh workflow — live polling pauses while editing, values only commit on Save.


Phase 5: Relay Logic Refinement

5.1 Relay Auto-Toggle on Joystick Movement

Challenge: Moving the joystick while mower was IDLE would auto-enable master kill and motor relays, creating unexpected behavior.

Fix: Removed the auto-relay-on logic. Relays only change via explicit start/stop buttons, emergency stop, or manual relay toggles.

5.2 E-STOP Reset Didn't Restore Relays

Challenge: After clearing an emergency stop, relays didn't return to their pre-e-stop state. Spare relays (used for other mower switches) stayed off.

Fix: On e-stop clear, restore spare relays to their default ON state for mower readiness.

5.3 Start/Stop Mower Relay Mapping

Added relays 7 and 8 (spares 4 and 5) to the start/stop mower sequence: on when running, off when stopped.

5.4 Relay Actor Logging

Added logs for every relay toggle with the responsible actor (NRF, WEB, E-STOP, WATCHDOG, etc.) at log level 5 for debugging relay fights.


Phase 6: Dashboard & Android App Polish

6.1 Camera Preview Toggle

Challenge: Camera thread drained CPU even when not needed.

Solution: Camera only runs in stick mode, AI training, and AI running mode. Added a button to toggle camera preview on the dashboard.

6.2 Android App UI Updates

  • Changed from white-on-black to black-on-white theme
  • Moved NRF TX/RX section above virtual joystick
  • Added split motor bars: "Sent" (from gyro/joystick) vs "Received" (from NRF ACK), side by side
  • Added motor test mode with individual left/right sliders

6.3 Android Motor Value Display

Challenge: Motor values on Android updated with a significant delay, slower than the NRF round trip.

Fix: Separated sent vs received motor values into independent display bars that update at different rates.

6.4 Android Sends Left/Right Instead of Speed/Steer

Challenge: Different control modes (joystick, gyro, motor test) used different formats. Gyro sent speed/steer while motor test sent left/right, causing confusion on the Nano.

Fix: All Android control modes now send left/right motor values. Nano only expects left/right — no more speed/steer conversion needed on the backend.

6.5 Dashboard Server Restart Button

Added a button on the dashboard to restart the Flask server itself, useful for applying code changes without SSH access.

6.6 Log Level Dropdown

Added a runtime log level selector on the dashboard (levels 1-10). Higher levels include all lower-level logs. Avoids restarting the app to change verbosity.

6.7 Log Retention

  • New log file created on each app start
  • Logs older than 30 days auto-deleted
  • Rolling file handler prevents single-file bloat
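
The 30-day cleanup could be done with a helper along these lines. This is only a sketch: prune_old_logs is a hypothetical name, the "*.log" pattern is an assumption about how the files are named, and file age is judged by modification time.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_logs(log_dir: str, max_age_days: int = 30,
                   now: Optional[float] = None) -> List[str]:
    """Delete *.log files older than max_age_days; return the removed names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed: List[str] = []
    for path in Path(log_dir).glob("*.log"):
        # Compare the file's last modification time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling this once at app start, before the new log file is created, would keep the directory bounded. The size limit per file is a separate concern that the standard library's logging.handlers.RotatingFileHandler can cover.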

Phase 7: E-STOP on Reverse & NRF RX Loss

7.1 E-STOP Triggered on Reverse

Challenge: Moving the control from forward to reverse triggered an emergency stop.

Investigation:

  • Checked for divide-by-zero when motor speed crosses from positive to negative
  • Checked for variables that could cause a crash during sign change
  • Examined the DAC voltage mapping at the crossover point

Root cause: The NRF RX stopped entirely when reversing. Android TX continued sending, but Nano stopped responding. The heartbeat watchdog interpreted the silence as a lost connection.

7.2 NRF RX Loss During Reverse

Challenge: The Nano stopped receiving NRF data when motor commands went negative, even though Android kept transmitting.

Investigation: Pulled and analyzed logs showing:

  • NRF parse continued working for forward values
  • RX went silent at the exact moment of sign change
  • No crash or exception in the logs

Root cause & fix: The NRF serial read was failing on certain byte patterns that appeared in negative motor values. Added error handling to keep the NRF reader alive regardless of parse failures, and ensured junk characters don't kill the heartbeat.
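
A defensive reader in the spirit of this fix might look like the sketch below. It is not the project's parser: parse_heartbeat and handle_lines are hypothetical names, the fields are treated as plain integers without assigning them meanings, and counting any received bytes as link activity follows the description above rather than a known implementation.

```python
from typing import Callable, Iterable, List, Optional


def parse_heartbeat(raw: bytes) -> Optional[List[int]]:
    """Return the integer fields of an 'H,...' line, or None for junk."""
    # errors="replace" means undecodable bytes can never raise here.
    text = raw.decode("ascii", errors="replace").strip()
    parts = text.split(",")
    if len(parts) < 2 or parts[0] != "H":
        return None
    try:
        return [int(p) for p in parts[1:]]   # int() accepts "-50" as well
    except ValueError:
        return None


def handle_lines(lines: Iterable[bytes],
                 on_packet: Callable[[List[int]], None],
                 on_activity: Callable[[], None]) -> None:
    """Feed the watchdog for every line, but act only on valid packets."""
    for raw in lines:
        on_activity()              # junk still proves the link is alive
        fields = parse_heartbeat(raw)
        if fields is not None:
            on_packet(fields)
```

The key property is that no input can raise out of the loop, so a malformed line costs one skipped packet instead of a dead reader thread.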


Phase 8: MCP4728 → MCP42010 Migration

8.1 SPI Driver Implementation

Goal: Replace the I2C DAC (MCP4728, channel A broken) with an SPI digital potentiometer (MCP42010, dual 10kΩ).

Approach: Created mcp42010_digipot.py wrapping Python spidev. Voltage divider config: PA→5V, PB→GND, wiper as output. Maps voltage to wiper position: wiper = round(voltage / VDD * 255).
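
The wiper formula above, combined with the two-byte command frame the chip expects, can be sketched as follows. The command byte values reflect my reading of the MCP42010 datasheet's "write data" command and are worth double-checking there; voltage_to_wiper and write_frame are hypothetical names, not the contents of mcp42010_digipot.py.

```python
from typing import List

VDD = 5.0  # PA tied to 5V, PB to GND, as described in the text

# "Write data" command bytes: command bits 01 plus the pot-select bits.
CMD_WRITE_POT0 = 0x11
CMD_WRITE_POT1 = 0x12
CMD_WRITE_BOTH = 0x13


def voltage_to_wiper(voltage: float, vdd: float = VDD) -> int:
    """Convert a target voltage to an 8-bit wiper position (0 to 255)."""
    voltage = max(0.0, min(vdd, voltage))   # clamp to what the divider can do
    return round(voltage / vdd * 255)


def write_frame(command: int, voltage: float, vdd: float = VDD) -> List[int]:
    """Build the [command, data] byte pair for one wiper write."""
    return [command, voltage_to_wiper(voltage, vdd)]


# Example: frame that would set pot 0 to roughly mid-scale.
frame = write_frame(CMD_WRITE_POT0, 2.5)
```

In a driver built on spidev, a frame like this would presumably be handed to a transfer call such as xfer2, which keeps chip select asserted for the whole two-byte transaction.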

8.2 SPI Bus Identification on Jetson Orin Nano

Challenge: Board was actually a Jetson Orin Nano (Tegra234), not a regular Nano. SPI bus naming differs — Tegra SPI1 on header pins 19/21/23/24 maps to spidev0 in Linux.

Approaches:

  • Started with DIGIPOT_SPI_BUS = 0
  • Changed to 1 based on misinterpretation
  • Corrected back to 0 after device tree verification

8.3 Stale __pycache__ on Nano

Challenge: After deploying updated code, Nano still ran old MCP4728 code from cached .pyc files.

Fix: Clear all __pycache__ directories before each deploy.
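
A pre-deploy cleanup step can be as small as the sketch below. The function name clear_pycache is hypothetical, and where it runs (on the Mac before rsync, or on the Nano after it) is left open here.

```python
import shutil
from pathlib import Path


def clear_pycache(root: str) -> int:
    """Remove every __pycache__ directory under root; return how many."""
    removed = 0
    # Materialize the list first so deleting does not disturb the walk.
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    return removed
```

Running it right before the app restarts forces Python to recompile from the freshly deployed sources.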

8.4 MCP42010 Pinout Documentation Error

Challenge: Initial pinout had critical errors — pins 13 (SO) and 14 (VDD) were swapped, and PA/PB terminal assignments were reversed.

Impact: Chip was unpowered or parasitically powered. Wipers read constant ~2.75V (floating).

Fix: Cross-referenced official Microchip datasheet. Moved 5V from pin 13 to pin 14. Readings shifted to ~1.9V/2.1V — chip now powered but still not responding to commands.

8.5 Flat Wiper Readings — Exhaustive Software Testing

Challenge: After fixing VDD, wipers showed ~1.9V and ~2.1V but were completely unresponsive to SPI commands. Sweeping 0→255 produced no change.

All approaches tried (all failed):

  • spidev0.0 at speeds from 10kHz to 1MHz
  • spidev1.0
  • All 4 SPI modes (0, 1, 2, 3)
  • 20x repeated writes
  • Separate pot writes vs simultaneous
  • GPIO bit-bang (manual pin toggling, bypassing spidev)
  • MISO response analysis (returned [0,0] — ambiguous: could be valid chip response or floating pin)
  • spi.no_cs = True — threw OSError: Invalid argument (unsupported on Orin Nano kernel)

8.6 Arduino as Remote Pin Probe

Challenge: No multimeter access remotely. Needed to verify electrical signals on the Jetson header.

Solution: Connected Jetson SPI pins directly to Arduino analog inputs:

  • Jetson pin 19 (MOSI) → Arduino A0
  • Jetson pin 23 (SCLK) → Arduino A1
  • Jetson pin 24 (CE0/CS) → Arduino A2
  • Jetson pin 21 (MISO) → Arduino A3

Toggled each Jetson pin via GPIO and read Arduino to verify signals.

8.7 Root Cause — CS Wire on Wrong Pin

Discovery from probe test:

Pin | Toggle Result | Verdict
Pin 19 (MOSI) | 0V ↔ 3.49V | OK
Pin 23 (SCLK) | 0V ↔ 3.49V | OK
Pin 24 (CE0/CS) | 0V ↔ 0V | STUCK AT 0V

Root cause: The CS wire was on pin 25 (GND) instead of pin 24 (CE0) — adjacent pins on the 40-pin header.

Impact: With CS permanently LOW, the chip was always selected but never latched data. The MCP42010 requires CS to go HIGH→LOW before clocking data, then LOW→HIGH to execute the command. Permanent LOW means the chip just shifts data through without executing.

Fix: Moved the CS wire from pin 25 to pin 24.

8.8 Successful Integration

After the CS fix, the digipot immediately worked:

Wiper | Expected | Left | Right
0 | 0.0V | 0.015V | 0.000V
64 | 1.25V | 1.266V | 1.251V
128 | 2.5V | 2.527V | 2.507V
192 | 3.75V | 3.778V | 3.763V
255 | 5.0V | 4.990V | 4.995V

Calibration verify passed with max error of 20mV across the full 0.5V–4.5V range, well within the 150mV tolerance.


Recurring Cross-Cutting Issues

SSH/SSHPASS Flakiness

SSH commands via sshpass intermittently failed with Permission denied throughout the project. Certain quoting styles and command chaining triggered failures. Workaround: vary quoting, split compound commands, retry.

Remote Development Workflow

All development was done from a MacBook connected to the Jetson Nano over USB at 192.168.55.1. Code was edited locally, deployed via rsync, and logs were pulled via ssh. Arduino sketches were compiled and uploaded from the Nano using arduino-cli.


Key Lessons Learned

  1. Always verify physical wiring electrically — days of software debugging couldn't find a wire on the wrong pin
  2. Use the Arduino as a remote probe — when you can't use a multimeter, analog inputs make an excellent remote voltmeter
  3. Clear __pycache__ on every deploy — stale bytecode is invisible and causes maddening "my fix didn't work" moments
  4. Lock-free state management — single-writer-per-variable eliminates deadlocks in multi-threaded embedded systems
  5. Two-way bitmask sync — when two controllers can both change relay state, an explicit "change pending" flag prevents overwrite races
  6. Verify datasheets directly — don't trust secondary pinout documentation; always cross-reference the manufacturer's PDF
  7. Adjacent pin mistakes are common — pin 24 vs pin 25 on a 40-pin header is an easy error; always double-count