Anonymous 12b476f1fc Ingececkt
2026-03-17 08:32:29 +01:00


ESP32 Android Auto Head Unit

A DIY Android Auto wireless head unit built on the ESP32-S3 (WT32-SC01 Plus), written entirely in Rust.

Implements the Android Auto WiFi protocol from scratch: TCP connection, TLS handshake, protobuf service discovery, video channel (H.264 decode), touch input, navigation events, and sensor reporting — all running on a $20 microcontroller.

Demo

The ESP32 hosts a WiFi AP. The phone joins it and connects via TCP on port 5277. Android Auto renders to the 480×320 LCD with touch input support.

Phone (Android Auto) ──WiFi──► ESP32-S3 AP ──I80 bus──► 480×320 LCD
                      ◄──touch──  FT6336U ◄──I2C──┘

Hardware

Component  Details
Board      WT32-SC01 Plus ($20)
SoC        ESP32-S3R8 — dual-core Xtensa LX7 @ 240MHz
RAM        512KB SRAM + 2MB PSRAM (quad, 80MHz)
Flash      16MB (QIO, 80MHz)
Display    ST7796 480×320 LCD, I80 8-bit parallel bus @ 40MHz
Touch      FT6336U capacitive, I2C @ 400kHz
WiFi       802.11 b/g/n 2.4GHz (built-in)

Pin Assignments

Function   GPIO
LCD D0-D7  9, 46, 3, 8, 18, 17, 16, 15
LCD WR     47
LCD DC     0
LCD RST    4
Backlight  45
Touch SDA  6
Touch SCL  5

Features

Build Modes

Mode        Flag        Description
Full Video  (default)   H.264 decode + downscale 800×480 → 480×320 + display (~3-5 fps)
Crop Video  --crop      Center-crop 480×320 from 800×480, no scaling (faster conversion)
Nav-Only    --nav-only  Text-only turn-by-turn navigation, no video decode. PNG turn arrows.

Protocol Implementation

  • Version handshake — negotiates protocol version with phone
  • TLS 1.2 — mbedtls with hardware AES/SHA acceleration
  • Service discovery — advertises 9 service channels (input, sensor, video, 3× audio, AV input, navigation, media status) alongside the control channel
  • Video channel — accepts H.264 stream, sends VideoFocusIndication, acks frames
  • Touch input — FT6336U → coordinate mapping → protobuf TouchEvent (PRESS/DRAG/RELEASE)
  • Navigation — receives TurnInstruction + DistanceUpdate events (works with OsmAnd, not Google Maps*)
  • Sensors — reports DRIVING_STATUS (unrestricted) and NIGHT_DATA
  • Audio — stubs: accepts setup, discards audio data (no DAC/I2S output)
  • mDNS — advertises _androidauto._tcp for network discovery

* Google Maps renders navigation entirely in the video stream and doesn't send turn-by-turn data over the navigation channel. OsmAnd uses the standard Android Auto Navigation API.
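
The touch path maps LCD coordinates into the 800×480 video coordinate space before building the TouchEvent. The following is a minimal sketch of that mapping, assuming linear scaling in full-video mode and a fixed center offset in crop mode. The function names are hypothetical and not taken from the project source.

```rust
/// Panel and Android Auto video dimensions, as stated in this README.
const LCD_W: u32 = 480;
const LCD_H: u32 = 320;
const VIDEO_W: u32 = 800;
const VIDEO_H: u32 = 480;

/// Full-video mode: the frame is downscaled to the panel, so a touch is
/// scaled back up to video coordinates (integer math, rounds down).
fn map_touch_scaled(x: u32, y: u32) -> (u32, u32) {
    (x * VIDEO_W / LCD_W, y * VIDEO_H / LCD_H)
}

/// Crop mode: the panel shows the center of the frame 1:1, so a touch is
/// shifted by the crop origin instead of being scaled.
fn map_touch_cropped(x: u32, y: u32) -> (u32, u32) {
    (x + (VIDEO_W - LCD_W) / 2, y + (VIDEO_H - LCD_H) / 2)
}
```

Under these assumptions the panel center (240, 160) maps to (400, 240) in full-video mode, and the panel origin (0, 0) maps to (160, 80) in crop mode.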

Video Pipeline (Full Video Mode)

Phone sends H.264 800×480 @ 30fps
    │
    ▼
TCP receive → TLS decrypt → protobuf parse → mpsc channel (depth 4)
    │
    ▼
Decode thread: esp_h264 SW decoder (tinyh264-based, dual-task)
    │  decode: ~100ms per 800×480 frame
    ▼
I420 → RGB565 strip conversion (dual-core: worker + main thread)
    │  40-line strips, bilinear downscale 800×480 → 480×320
    ▼
DMA double-buffered to LCD (38.4KB × 2 staging buffers in internal SRAM)

Performance: ~3-5 fps end to end, depending on scene complexity. The ESP32-S3's software H.264 decoder is the bottleneck: Espressif benchmarks show ~9 fps for 640×480 with dual-task mode. At 800×480 (Android Auto's minimum), expect ~8-9 fps raw decode throughput, before the I420 → RGB565 conversion and LCD transfer are added on top.
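
For readers unfamiliar with the I420 → RGB565 step, here is a minimal per-pixel sketch. It assumes BT.601 limited-range video and the common fixed-point coefficients; the project's actual conversion in decoder.rs may use different coefficients, byte order, or a fused downscale. The function names are hypothetical.

```rust
/// Convert one YUV sample (BT.601 limited range, an assumption) to RGB565.
fn yuv_to_rgb565(y: u8, u: u8, v: u8) -> u16 {
    let c = y as i32 - 16;
    let d = u as i32 - 128;
    let e = v as i32 - 128;
    // Fixed-point BT.601 with 8 fractional bits, clamped to 0..=255.
    let clamp = |x: i32| x.clamp(0, 255) as u16;
    let r = clamp((298 * c + 409 * e + 128) >> 8);
    let g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
    let b = clamp((298 * c + 516 * d + 128) >> 8);
    // Pack as 5 bits red, 6 bits green, 5 bits blue.
    ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
}

/// Convert one row. In I420 each chroma sample covers two horizontal
/// pixels, hence the `x / 2` index. Panics if `y_row` is shorter than
/// `out` or the chroma rows are shorter than half of `out`.
fn convert_row(y_row: &[u8], u_row: &[u8], v_row: &[u8], out: &mut [u16]) {
    for (x, px) in out.iter_mut().enumerate() {
        *px = yuv_to_rgb565(y_row[x], u_row[x / 2], v_row[x / 2]);
    }
}
```

With these coefficients, nominal black (Y=16, U=V=128) packs to 0x0000 and nominal white (Y=235, U=V=128) packs to 0xFFFF.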

Video Pipeline (Crop Mode)

Same as above but the I420 → RGB565 conversion copies the center 480×320 pixels 1:1 instead of downscaling. Eliminates bilinear interpolation overhead.
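
The 1:1 center crop can be sketched as a row-by-row copy from a source plane. This is an illustration only: it allocates a Vec for clarity, whereas the firmware is described as writing straight into DMA staging buffers, and the helper names are hypothetical.

```rust
/// Top-left corner of a centered `dst_w` x `dst_h` window inside the source.
fn center_crop_origin(src_w: usize, src_h: usize, dst_w: usize, dst_h: usize) -> (usize, usize) {
    ((src_w - dst_w) / 2, (src_h - dst_h) / 2)
}

/// Copy a `dst_w` x `dst_h` window starting at `origin` out of a plane that
/// is `src_w` samples wide. No interpolation: each row is a plain slice copy.
fn crop_plane(src: &[u8], src_w: usize, origin: (usize, usize), dst_w: usize, dst_h: usize) -> Vec<u8> {
    let (x0, y0) = origin;
    let mut out = Vec::with_capacity(dst_w * dst_h);
    for row in 0..dst_h {
        let start = (y0 + row) * src_w + x0;
        out.extend_from_slice(&src[start..start + dst_w]);
    }
    out
}
```

For the dimensions in this README, `center_crop_origin(800, 480, 480, 320)` gives (160, 80), so 160 columns are dropped on each side and 80 rows at the top and bottom.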

Nav-Only Mode

No H.264 decoder. Receives navigation events via the AA navigation channel and renders:

  • Turn maneuver + direction (text)
  • Street name
  • Distance to next turn
  • ETA
  • PNG turn arrow image (decoded via miniz_oxide, scaled to 64×64)

Uses strip-based LCD rendering with bitmap font (5×7 base, scalable).
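
A scalable 5×7 bitmap font can be drawn by expanding each set glyph bit into a `scale` × `scale` block of pixels. The sketch below assumes one byte per glyph row with bit 4 as the leftmost pixel; the glyph data and function name are hypothetical and not copied from the project.

```rust
/// One 5x7 glyph: seven rows, low five bits used, bit 4 = leftmost pixel.
type Glyph = [u8; 7];

/// Hypothetical glyph data for the letter 'T'.
const GLYPH_T: Glyph = [0b11111, 0b00100, 0b00100, 0b00100, 0b00100, 0b00100, 0b00100];

/// Draw `glyph` into an RGB565 buffer that is `buf_w` pixels wide, with its
/// top-left corner at (`x0`, `y0`). Pixels outside the buffer are skipped.
fn draw_glyph(buf: &mut [u16], buf_w: usize, x0: usize, y0: usize, glyph: &Glyph, scale: usize, color: u16) {
    for (row, bits) in glyph.iter().enumerate() {
        for col in 0..5usize {
            if (*bits >> (4 - col)) & 1 == 0 {
                continue; // background pixel, leave the buffer untouched
            }
            for dy in 0..scale {
                for dx in 0..scale {
                    let x = x0 + col * scale + dx;
                    let y = y0 + row * scale + dy;
                    if x < buf_w && y * buf_w + x < buf.len() {
                        buf[y * buf_w + x] = color;
                    }
                }
            }
        }
    }
}
```

Because the buffer is just a slice with a width, the same routine can target a 40-line strip rather than a full frame, as long as `y0` is given relative to the strip.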

Building

Prerequisites

  • Podman (or Docker — change sudo podman to docker in build.sh)
  • USB serial access to the WT32-SC01 Plus (/dev/ttyACM0 or /dev/ttyUSB0)
  • espflash for flashing + monitoring

The build uses the official espressif/idf-rust:all_latest container image, which includes:

  • ESP-IDF v5.5.1
  • Rust toolchain for Xtensa (esp channel)
  • All ESP32-S3 build tools

Build Commands

# Full video mode (default)
./build.sh

# Crop video mode (faster conversion, cropped view)
./build.sh --crop

# Nav-only mode (no video, turn-by-turn text only)
./build.sh --nav-only

# Build without flashing
./build.sh --build-only

# Combine flags
./build.sh --crop --build-only

Manual Build (without container)

# Requires esp-idf-sys toolchain configured
cargo build --release                     # full video
cargo build --release --features crop-video  # crop mode
cargo build --release --features nav-only    # nav-only
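
The crop-video and nav-only Cargo features select the build mode at compile time. A minimal sketch of how such a selection might look in Rust is shown here; the enum, the function, and the precedence when both features are enabled are assumptions, not the project's actual code.

```rust
/// The three build modes described in this README.
#[derive(Debug, PartialEq)]
enum BuildMode {
    FullVideo,
    CropVideo,
    NavOnly,
}

/// Resolve the mode from Cargo features. With no feature enabled this
/// falls through to full video, matching the default build.
fn build_mode() -> BuildMode {
    if cfg!(feature = "nav-only") {
        BuildMode::NavOnly
    } else if cfg!(feature = "crop-video") {
        BuildMode::CropVideo
    } else {
        BuildMode::FullVideo
    }
}
```

A real firmware would more likely gate whole modules with `#[cfg(feature = "...")]` so that the H.264 decoder is not linked into nav-only builds; `cfg!` is used here only to keep the sketch short.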

Flashing

# Via build script (prompts after build)
./build.sh

# Manual flash + monitor
espflash flash target/xtensa-esp32s3-espidf/release/esp32-android-auto-nav --monitor

# Monitor only (after flashing)
espflash monitor --port /dev/ttyACM0

Connecting a Phone

  1. Build and flash the firmware to the WT32-SC01 Plus
  2. On the phone, join the WiFi network:
    • SSID: ESP32-AA-HU
    • Password: androidauto123
  3. Open Android Auto on the phone:
    • Go to Android Auto settings → enable Developer mode (tap version 10×)
    • Developer Settings → Start head unit server
  4. The ESP32 scans its DHCP client IPs on port 5277 and connects automatically
  5. The ESP32 also listens on port 5277, so the phone can open the connection instead
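
The scan in step 4 can be pictured as trying each candidate address with a short timeout and keeping the first connection that succeeds. This sketch uses only the standard library; the 192.168.4.x subnet, the host range, and the function names are assumptions for illustration, and the firmware may obtain client IPs from the DHCP lease table instead.

```rust
use std::net::{Ipv4Addr, SocketAddr, TcpStream};
use std::time::Duration;

/// Build candidate addresses for hosts `first_host..=last_host` on an
/// assumed 192.168.4.0/24 AP subnet.
fn dhcp_candidates(first_host: u8, last_host: u8, port: u16) -> Vec<SocketAddr> {
    (first_host..=last_host)
        .map(|h| SocketAddr::from((Ipv4Addr::new(192, 168, 4, h), port)))
        .collect()
}

/// Try each candidate in order and return the first stream that connects
/// within `timeout`, or None if nothing answers.
fn connect_first(candidates: &[SocketAddr], timeout: Duration) -> Option<TcpStream> {
    candidates
        .iter()
        .find_map(|addr| TcpStream::connect_timeout(addr, timeout).ok())
}
```

A call such as `connect_first(&dhcp_candidates(2, 10, 5277), Duration::from_millis(200))` would probe nine addresses in sequence. Since the AP allows only one client, the candidate list can stay short.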

Connection Flow

ESP32 boots → WiFi AP starts → mDNS advertised → listening on :5277
Phone joins WiFi → ESP32 connects to phone:5277 (or phone connects to ESP32:5277)
→ Version handshake → TLS negotiation → Service discovery
→ Video setup → VideoFocusIndication(FOCUSED) → Phone starts streaming
→ Touch events sent back to phone → Video frames displayed

4G Internet While Connected

The ESP32's DHCP server is configured to not advertise a gateway or DNS, so Android keeps using mobile data for internet while connected to the ESP32's WiFi for Android Auto.

For best results, on the phone enable: Developer Options → Mobile data always active

Project Structure

src/
├── main.rs          # Entry point, thread spawning, video decode/display loop,
│                    # touch polling, WiFi AP, connection cycle
├── session.rs       # Android Auto protocol session (message loop, dispatch)
├── frame.rs         # Wire protocol: frame read/write, TLS state (mbedtls)
├── channels.rs      # Channel descriptors, AV message parsing, video/audio/sensor frames
├── control.rs       # Control channel messages (version, TLS, ping, auth, shutdown)
├── common.rs        # Common channel messages (channel open request/response)
├── decoder.rs       # H.264 SW decoder (esp_h264 FFI), I420→RGB565 conversion
├── display.rs       # ST7796 LCD driver (I80 bus, DMA strip rendering)
├── touch.rs         # FT6336U capacitive touch driver (I2C)
├── navigation.rs    # Navigation event parsing (TurnInstruction, DistanceUpdate)
├── config.rs        # Head unit + WiFi configuration
├── cert.rs          # TLS certificate for Android Auto authentication
├── mdns.rs          # mDNS service advertisement (_androidauto._tcp)
├── bluetooth.rs     # BT protocol definitions (unused — ESP32-S3 has no BT Classic)
└── esp_h264_bindings.h  # C header for esp_h264 FFI bindgen

protobuf/
├── Wifi.proto       # Android Auto WiFi protocol messages
└── Bluetooth.proto  # Android Auto BT protocol messages (reference only)

build.sh             # Container-based build script (Podman)
build.rs             # Build script: protobuf codegen + esp_h264 bindgen
Cargo.toml           # Rust dependencies + feature flags
sdkconfig.defaults   # ESP-IDF configuration (CPU, PSRAM, WiFi, TLS, H.264, etc.)
partitions.csv       # Flash partition table (4MB app partition)
espflash.toml        # Flash tool configuration
rust-toolchain.toml  # Xtensa Rust toolchain (esp channel)
idf_component.yml    # ESP-IDF component: espressif/esp_h264 v1.3.0

Architecture Details

Threading Model

Thread          Core  Stack   Purpose
Main            0     16KB    WiFi AP, TCP listener, connection cycle, session protocol
decode-display  0/1   16KB    H.264 decode + strip conversion + DMA to LCD
converter       1     4KB     Dual-core strip helper (scale mode only)
touch-poll      any   4KB     FT6336U I2C polling @ 60Hz
nav-ui          any   8-16KB  Navigation event logging (video mode) or LCD rendering (nav-only)

Memory Layout

Region         Size           Usage
Internal SRAM  ~416KB usable  DMA buffers (38.4KB×2), thread stacks, WiFi, mbedtls, FreeRTOS
PSRAM          2MB            H.264 decoder buffers (~576KB), LWIP buffers, large allocations
Flash          16MB           Firmware (~4MB partition), NVS, PHY calibration
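
The DMA buffer figures above follow directly from the panel geometry. The arithmetic is written out here as a small sketch; the helper names are hypothetical, and 2 bytes per pixel reflects RGB565.

```rust
/// Bytes per RGB565 pixel.
const BYTES_PER_PIXEL: usize = 2;

/// Size in bytes of a buffer holding `lines` rows of `width` pixels.
const fn strip_bytes(width: usize, lines: usize) -> usize {
    width * lines * BYTES_PER_PIXEL
}

/// Number of strips needed to cover `height` rows, rounded up.
const fn strips_per_frame(height: usize, lines: usize) -> usize {
    (height + lines - 1) / lines
}
```

For the 480×320 panel with 40-line strips, `strip_bytes(480, 40)` is 38,400 bytes (38.4KB), two staging buffers take 76,800 bytes, a full frame would take `strip_bytes(480, 320)` = 307,200 bytes, and `strips_per_frame(320, 40)` is 8.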

Android Auto Protocol

The implementation follows the Android Auto WiFi protocol:

  1. Transport: TCP on port 5277, then upgraded to TLS 1.2
  2. Framing: 4-byte header (channel ID, flags, length) + payload
  3. Channels: Multiplexed over single TCP connection, each with a numeric ID
  4. Messages: Protobuf-encoded, prefixed with 2-byte message type
  5. Video: H.264 baseline profile, 800×480 @ 30fps, requires periodic ack
  6. Touch: Timestamped (µs precision), mapped from display coords to AA video coords
  7. Navigation: Protobuf TurnInstruction + DistanceUpdate events
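
The framing in items 2 and 4 can be sketched as two small parsers. The README only states that the header has a channel ID, flags, and a length in 4 bytes, and that messages carry a 2-byte type prefix; the byte layout used below (one byte channel, one byte flags, big-endian 16-bit length, big-endian message type) is an assumption, and the type and function names are hypothetical.

```rust
/// Parsed 4-byte frame header (field widths are an assumption).
#[derive(Debug, PartialEq)]
struct FrameHeader {
    channel: u8,
    flags: u8,
    payload_len: u16,
}

/// Decode a header assuming: byte 0 = channel, byte 1 = flags,
/// bytes 2..4 = big-endian payload length.
fn parse_header(bytes: [u8; 4]) -> FrameHeader {
    FrameHeader {
        channel: bytes[0],
        flags: bytes[1],
        payload_len: u16::from_be_bytes([bytes[2], bytes[3]]),
    }
}

/// Split a decrypted payload into its 2-byte message type (assumed
/// big-endian) and the protobuf body. Returns None if it is too short.
fn split_message(payload: &[u8]) -> Option<(u16, &[u8])> {
    if payload.len() < 2 {
        return None;
    }
    let (ty, body) = payload.split_at(2);
    Some((u16::from_be_bytes([ty[0], ty[1]]), body))
}
```

The body returned by `split_message` is what would then be handed to the protobuf decoder for the channel named in the header.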

Key Design Decisions

  • Strip-based rendering: 40-line strips (38.4KB each) instead of full-frame buffers. Allows DMA double-buffering with only 76.8KB of internal SRAM instead of ~307KB for a single full 480×320 RGB565 frame.
  • No intermediate framebuffer: I420→RGB565 conversion writes directly into DMA staging buffers. Zero-copy from decode to display.
  • Drain-and-skip: When frames queue up, older frames are discarded without decoding. Only the latest frame is decoded and displayed. This prevents the decoder from falling behind.
  • Always FOCUSED: The head unit always reports VideoFocusIndication(FOCUSED) to the phone. Reporting UNFOCUSED causes the phone to stop sending navigation data too.
  • Unsolicited focus kick: After video setup, an unsolicited VideoFocusIndication with unrequested=true is sent to prompt the phone to start streaming. Without this, the phone sends VideoFocusRequest but never StartIndication.
  • Non-fatal video acks: If ack writes fail (TCP buffer full, etc.), the error is logged but doesn't kill the session. The phone tolerates missed acks.
  • DHCP without gateway/DNS: Prevents Android from switching internet to WiFi.
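
The drain-and-skip policy can be sketched with a standard-library channel: block for one frame, then keep replacing it with anything newer that is already queued. This is an illustration of the idea rather than the project's code, and `recv_latest` is a hypothetical name.

```rust
use std::sync::mpsc::Receiver;

/// Wait for a frame, then drain the queue and keep only the newest one.
/// Returns the frame and how many older frames were skipped, or None
/// once the sender has gone away and the queue is empty.
fn recv_latest<T>(rx: &Receiver<T>) -> Option<(T, usize)> {
    let mut latest = rx.recv().ok()?;
    let mut skipped = 0;
    while let Ok(newer) = rx.try_recv() {
        latest = newer; // the older frame is dropped without decoding
        skipped += 1;
    }
    Some((latest, skipped))
}
```

With a bounded channel of depth 4, as in the video pipeline, the decode thread would call `recv_latest` once per iteration and decode only the frame it returns.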

Configuration

WiFi Settings

Edit src/config.rs:

Self {
    ssid: "ESP32-AA-HU".into(),
    password: "androidauto123".into(),
    listen_port: 5277,
}

sdkconfig Tuning

Key settings in sdkconfig.defaults:

Setting                     Value  Purpose
ESP_DEFAULT_CPU_FREQ_MHZ    240    Max CPU for decode performance
SPIRAM_SPEED_80M            y      Max PSRAM bandwidth
ESP32S3_DATA_CACHE_64KB     y      Maximize cache for PSRAM access
ESP_H264_DUAL_TASK          y      Dual-core H.264 decode
ESP_H264_DECODER_IRAM       y      Hot decoder code in IRAM (+22KB)
COMPILER_OPTIMIZATION_PERF  y      -O2 for ESP-IDF C code
MBEDTLS_HARDWARE_AES        y      Hardware AES acceleration
MBEDTLS_HARDWARE_SHA        y      Hardware SHA acceleration

Limitations

  • ~3-5 fps in video mode — the ESP32-S3 software H.264 decoder is the bottleneck. Real head units use dedicated video decoder hardware.
  • No audio output — audio channels are accepted but data is discarded. Would need I2S + DAC/codec.
  • No Bluetooth Classic — ESP32-S3 only has BLE. Phone must manually join WiFi and start head unit server in developer mode.
  • Google Maps renders navigation in the video stream, not via the navigation channel. Use OsmAnd for turn-by-turn text in nav-only mode.
  • 800×480 minimum — Android Auto protocol doesn't allow requesting lower than 480p resolution.
  • Single WiFi client — AP is configured for max 1 connection.

Possible Improvements

  • Raspberry Pi Zero 2W proxy — decode H.264 on RPi (hardware VideoCore decoder, ~1ms/frame), send pre-decoded RGB565 frames to ESP32 via SPI. Could reach 30fps if the link sustains about 9.2MB/s (480×320×2 bytes × 30).
  • Audio output — add I2S DAC for media/nav audio. The audio channel stubs are already in place.
  • WiFi Direct (P2P) — proper AA wireless uses WiFi Direct, which doesn't disable phone's cellular. ESP-IDF supports WiFi P2P but adds complexity.
  • ESP32-P4 — has a hardware H.264 decoder (25fps @ 640×480, 31fps dual-task). Would be a significant upgrade from SW decode.

Dependencies

Crate                     Version  Purpose
esp-idf-svc               0.52     ESP-IDF high-level services (WiFi, NVS, mDNS)
esp-idf-hal               0.46     Hardware abstraction (I2C, GPIO)
esp-idf-sys               0.37     Raw ESP-IDF FFI bindings
protobuf                  3.7      Protocol Buffers (AA protocol messages)
anyhow                    1.0      Error handling
bitfield                  0.19     Frame header bitfield parsing
miniz_oxide               0.7      PNG inflate (nav-only mode, optional)
espressif/esp_h264        1.3.0    H.264 SW decoder (C component, tinyh264-based)

License

LGPL-3.0-or-later