Checked in
@@ -0,0 +1,318 @@
# ESP32 Android Auto Head Unit

A DIY Android Auto wireless head unit built on the **ESP32-S3** (WT32-SC01 Plus), written entirely in **Rust**.

Implements the Android Auto WiFi protocol from scratch: TCP connection, TLS handshake, protobuf service discovery, video channel (H.264 decode), touch input, navigation events, and sensor reporting, all running on a $20 microcontroller.

## Demo

The ESP32 hosts a WiFi AP. The phone joins it and a TCP connection is established on port 5277 (either side can initiate). Android Auto renders to the 480×320 LCD with touch input support.

```
Phone (Android Auto) ──WiFi──► ESP32-S3 AP ──I80 bus──► 480×320 LCD
                     ◄──touch── FT6336U ◄──I2C──┘
```

## Hardware

| Component | Details |
|---|---|
| Board | [WT32-SC01 Plus](https://www.waveshare.com/wiki/WT32-SC01-Plus) ($20) |
| SoC | ESP32-S3R2, dual-core Xtensa LX7 @ 240MHz |
| RAM | 512KB SRAM + 2MB PSRAM (quad, 80MHz) |
| Flash | 16MB (QIO, 80MHz) |
| Display | ST7796 480×320 LCD, I80 8-bit parallel bus @ 40MHz |
| Touch | FT6336U capacitive, I2C @ 400kHz |
| WiFi | 802.11 b/g/n 2.4GHz (built-in) |

### Pin Assignments

| Function | GPIO |
|---|---|
| LCD D0–D7 | 9, 46, 3, 8, 18, 17, 16, 15 |
| LCD WR | 47 |
| LCD DC | 0 |
| LCD RST | 4 |
| Backlight | 45 |
| Touch SDA | 6 |
| Touch SCL | 5 |

## Features

### Build Modes

| Mode | Flag | Description |
|---|---|---|
| **Full Video** | *(default)* | H.264 decode + downscale 800×480 → 480×320 + display (~3-5 fps) |
| **Crop Video** | `--crop` | Center-crop 480×320 from 800×480, no scaling (faster conversion) |
| **Nav-Only** | `--nav-only` | Text-only turn-by-turn navigation, no video decode. PNG turn arrows. |

### Protocol Implementation

- **Version handshake**: negotiates the protocol version with the phone
- **TLS 1.2**: mbedtls with hardware AES/SHA acceleration
- **Service discovery**: advertises 9 service channels (input, sensor, video, 3× audio, AV input, navigation, media status) in addition to the control channel
- **Video channel**: accepts the H.264 stream, sends VideoFocusIndication, acks frames
- **Touch input**: FT6336U → coordinate mapping → protobuf TouchEvent (PRESS/DRAG/RELEASE)
- **Navigation**: receives TurnInstruction + DistanceUpdate events (works with OsmAnd, not Google Maps*)
- **Sensors**: reports DRIVING_STATUS (unrestricted) and NIGHT_DATA
- **Audio**: stubs only; accepts setup and discards audio data (no DAC/I2S output)
- **mDNS**: advertises `_androidauto._tcp` for network discovery

\* *Google Maps renders navigation entirely in the video stream and doesn't send turn-by-turn data over the navigation channel. OsmAnd uses the standard Android Auto Navigation API.*
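
The coordinate-mapping step of the touch path can be sketched as follows. This is a minimal illustration, not the project's code: the `VideoMode` enum and `map_touch` function are hypothetical names, and the assumption is that scaled mode stretches each axis while crop mode adds the offset of the centered window.

```rust
/// Video geometry from this README: the phone renders 800×480,
/// the LCD shows 480×320.
const AA_W: u32 = 800;
const AA_H: u32 = 480;
const LCD_W: u32 = 480;
const LCD_H: u32 = 320;

/// Hypothetical mode switch mirroring the `--crop` build flag.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum VideoMode {
    Scaled,
    Cropped,
}

/// Map an LCD touch point to Android Auto video coordinates.
pub fn map_touch(x: u32, y: u32, mode: VideoMode) -> (u32, u32) {
    // Clamp first so a noisy controller reading cannot leave the panel.
    let x = x.min(LCD_W - 1);
    let y = y.min(LCD_H - 1);
    match mode {
        // Scaled: stretch each axis by the ratio of video size to panel size.
        VideoMode::Scaled => (x * AA_W / LCD_W, y * AA_H / LCD_H),
        // Cropped: the panel shows the center window, so add its offset.
        VideoMode::Cropped => (x + (AA_W - LCD_W) / 2, y + (AA_H - LCD_H) / 2),
    }
}
```

For example, a touch at the panel center `(240, 160)` maps to `(400, 240)` in both modes, which is the center of the 800×480 video.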

### Video Pipeline (Full Video Mode)

```
Phone sends H.264 800×480 @ 30fps
        │
        ▼
TCP receive → TLS decrypt → protobuf parse → mpsc channel (depth 4)
        │
        ▼
Decode thread: esp_h264 SW decoder (tinyh264-based, dual-task)
        │  decode: ~100ms per 800×480 frame
        ▼
I420 → RGB565 strip conversion (dual-core: worker + main thread)
        │  40-line strips, bilinear downscale 800×480 → 480×320
        ▼
DMA double-buffered to LCD (38.4KB × 2 staging buffers in internal SRAM)
```

**Performance**: ~3-5 fps depending on scene complexity. The ESP32-S3's software H.264 decoder is the bottleneck: Espressif benchmarks show ~9 fps for 640×480 with dual-task mode. At 800×480 (Android Auto's minimum), expect at most ~8-9 fps raw decode throughput; the remaining pipeline stages (conversion, scaling, LCD transfer) account for the gap down to the ~3-5 fps actually displayed.
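
The I420 → RGB565 stage in the pipeline above can be sketched per pixel as below. This is an illustration only: it assumes BT.601 limited-range coefficients (the README does not say which matrix the project uses), skips the bilinear downscale, and `yuv_to_rgb565` and `convert_row` are hypothetical names.

```rust
/// Clamp an intermediate value into the 0..=255 range of one color channel.
fn clamp_u8(v: i32) -> u8 {
    v.clamp(0, 255) as u8
}

/// Convert one YUV sample to RGB565 using integer BT.601 limited-range
/// coefficients (an assumption, not confirmed by the project).
pub fn yuv_to_rgb565(y: u8, u: u8, v: u8) -> u16 {
    let c = y as i32 - 16;
    let d = u as i32 - 128;
    let e = v as i32 - 128;
    let r = clamp_u8((298 * c + 409 * e + 128) >> 8) as u16;
    let g = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8) as u16;
    let b = clamp_u8((298 * c + 516 * d + 128) >> 8) as u16;
    // Pack as 5 bits red, 6 bits green, 5 bits blue.
    ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
}

/// Convert row `row` of an I420 frame (even `width` assumed) into `out`,
/// one RGB565 pixel per luma sample. Each chroma sample covers a 2×2 block.
pub fn convert_row(
    y_plane: &[u8],
    u_plane: &[u8],
    v_plane: &[u8],
    width: usize,
    row: usize,
    out: &mut [u16],
) {
    let chroma_w = width / 2;
    let y_row = &y_plane[row * width..(row + 1) * width];
    let c_off = (row / 2) * chroma_w;
    for (x, px) in out.iter_mut().take(width).enumerate() {
        let u = u_plane[c_off + x / 2];
        let v = v_plane[c_off + x / 2];
        *px = yuv_to_rgb565(y_row[x], u, v);
    }
}
```

Writing `out` directly into a DMA staging strip is what lets a pipeline like this avoid an intermediate framebuffer.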

### Video Pipeline (Crop Mode)

Same as above, but the I420 → RGB565 conversion copies the center 480×320 pixels 1:1 instead of downscaling, which eliminates the bilinear interpolation overhead.
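
The center-crop window can be located with a few lines of arithmetic. The sketch below uses hypothetical helper names (`crop_origin`, `cropped_row`) and operates on a single plane; it is not taken from the project's `decoder.rs`.

```rust
/// Top-left corner of a centered `dst`-sized window inside a `src`-sized
/// frame, or `None` if the window does not fit. Sizes are (width, height).
pub fn crop_origin(src: (usize, usize), dst: (usize, usize)) -> Option<(usize, usize)> {
    let (sw, sh) = src;
    let (dw, dh) = dst;
    if dw > sw || dh > sh {
        return None;
    }
    Some(((sw - dw) / 2, (sh - dh) / 2))
}

/// Borrow the part of one plane row that falls inside the crop window.
/// `stride` is the plane's row length in bytes.
pub fn cropped_row(
    plane: &[u8],
    stride: usize,
    origin: (usize, usize),
    dst_w: usize,
    dst_row: usize,
) -> &[u8] {
    let start = (origin.1 + dst_row) * stride + origin.0;
    &plane[start..start + dst_w]
}
```

For the 800×480 source and 480×320 panel, `crop_origin((800, 480), (480, 320))` gives `Some((160, 80))`, so 160 columns are dropped on each side and 80 rows at top and bottom.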

### Nav-Only Mode

No H.264 decoder. Receives navigation events via the AA navigation channel and renders:
- Turn maneuver + direction (text)
- Street name
- Distance to next turn
- ETA
- PNG turn arrow image (decoded via miniz_oxide, scaled to 64×64)

Uses strip-based LCD rendering with bitmap font (5×7 base, scalable).
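
A scalable 5×7 bitmap font can be drawn with nearest-neighbor pixel replication, as in the sketch below. The glyph encoding (seven bytes, low five bits per row), the sample glyph, and the `draw_glyph` name are all assumptions made for illustration; the project's actual font table may differ.

```rust
/// One 5×7 glyph: seven rows, the low five bits of each byte are pixels
/// (bit 4 is the leftmost column).
pub type Glyph = [u8; 7];

/// Illustrative glyph for the digit '1' (not taken from the project's font).
pub const GLYPH_ONE: Glyph = [
    0b00100, 0b01100, 0b00100, 0b00100, 0b00100, 0b00100, 0b01110,
];

/// Draw `glyph` into an RGB565 buffer of width `buf_w`, with its top-left
/// corner at (`x0`, `y0`), enlarging every font pixel to `scale`×`scale`.
/// Pixels that fall outside the buffer are clipped.
pub fn draw_glyph(
    buf: &mut [u16],
    buf_w: usize,
    x0: usize,
    y0: usize,
    glyph: &Glyph,
    scale: usize,
    color: u16,
) {
    if buf_w == 0 {
        return;
    }
    let buf_h = buf.len() / buf_w;
    for (gy, &bits) in glyph.iter().enumerate() {
        for gx in 0..5usize {
            // Skip font pixels that are not set in this row.
            if bits & (0b1_0000u8 >> gx) == 0 {
                continue;
            }
            for dy in 0..scale {
                for dx in 0..scale {
                    let x = x0 + gx * scale + dx;
                    let y = y0 + gy * scale + dy;
                    if x < buf_w && y < buf_h {
                        buf[y * buf_w + x] = color;
                    }
                }
            }
        }
    }
}
```

Because the function only needs a buffer and its width, the same routine works on a single 40-line strip as well as on a full frame.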

## Building

### Prerequisites

- [Podman](https://podman.io/) (or Docker: change `sudo podman` to `docker` in `build.sh`)
- USB serial access to the WT32-SC01 Plus (`/dev/ttyACM0` or `/dev/ttyUSB0`)
- [espflash](https://github.com/esp-rs/espflash) for flashing + monitoring

The build uses the official `espressif/idf-rust:all_latest` container image, which includes:
- ESP-IDF v5.5.1
- Rust toolchain for Xtensa (`esp` channel)
- All ESP32-S3 build tools

### Build Commands

```bash
# Full video mode (default)
./build.sh

# Crop video mode (faster conversion, cropped view)
./build.sh --crop

# Nav-only mode (no video, turn-by-turn text only)
./build.sh --nav-only

# Build without flashing
./build.sh --build-only

# Combine flags
./build.sh --crop --build-only
```

### Manual Build (without container)

```bash
# Requires esp-idf-sys toolchain configured
cargo build --release                          # full video
cargo build --release --features crop-video    # crop mode
cargo build --release --features nav-only      # nav-only
```

### Flashing

```bash
# Via build script (prompts after build)
./build.sh

# Manual flash + monitor
espflash flash target/xtensa-esp32s3-espidf/release/esp32-android-auto-nav --monitor

# Monitor only (after flashing)
espflash monitor --port /dev/ttyACM0
```

## Connecting a Phone

1. **Build and flash** the firmware to the WT32-SC01 Plus
2. On the phone, **join the WiFi network**:
   - SSID: `ESP32-AA-HU`
   - Password: `androidauto123`
3. Open **Android Auto** on the phone:
   - Go to Android Auto settings → enable Developer mode (tap version 10×)
   - Developer Settings → **Start head unit server**
4. The ESP32 scans DHCP client IPs on port 5277 and connects automatically
5. Alternatively, the ESP32 also listens on port 5277 for incoming connections

### Connection Flow

```
ESP32 boots → WiFi AP starts → mDNS advertised → listening on :5277
Phone joins WiFi → ESP32 connects to phone:5277 (or phone connects to ESP32:5277)
  → Version handshake → TLS negotiation → Service discovery
  → Video setup → VideoFocusIndication(FOCUSED) → Phone starts streaming
  → Touch events sent back to phone → Video frames displayed
```

### 4G Internet While Connected

The ESP32's DHCP server is configured to **not advertise a gateway or DNS**, so Android keeps using mobile data for internet while connected to the ESP32's WiFi for Android Auto.

For best results, on the phone enable: *Developer Options → Mobile data always active*

## Project Structure

```
src/
├── main.rs              # Entry point, thread spawning, video decode/display loop,
│                        # touch polling, WiFi AP, connection cycle
├── session.rs           # Android Auto protocol session (message loop, dispatch)
├── frame.rs             # Wire protocol: frame read/write, TLS state (mbedtls)
├── channels.rs          # Channel descriptors, AV message parsing, video/audio/sensor frames
├── control.rs           # Control channel messages (version, TLS, ping, auth, shutdown)
├── common.rs            # Common channel messages (channel open request/response)
├── decoder.rs           # H.264 SW decoder (esp_h264 FFI), I420→RGB565 conversion
├── display.rs           # ST7796 LCD driver (I80 bus, DMA strip rendering)
├── touch.rs             # FT6336U capacitive touch driver (I2C)
├── navigation.rs        # Navigation event parsing (TurnInstruction, DistanceUpdate)
├── config.rs            # Head unit + WiFi configuration
├── cert.rs              # TLS certificate for Android Auto authentication
├── mdns.rs              # mDNS service advertisement (_androidauto._tcp)
├── bluetooth.rs         # BT protocol definitions (unused; ESP32-S3 has no BT Classic)
└── esp_h264_bindings.h  # C header for esp_h264 FFI bindgen

protobuf/
├── Wifi.proto           # Android Auto WiFi protocol messages
└── Bluetooth.proto      # Android Auto BT protocol messages (reference only)

build.sh                 # Container-based build script (Podman)
build.rs                 # Build script: protobuf codegen + esp_h264 bindgen
Cargo.toml               # Rust dependencies + feature flags
sdkconfig.defaults       # ESP-IDF configuration (CPU, PSRAM, WiFi, TLS, H.264, etc.)
partitions.csv           # Flash partition table (4MB app partition)
espflash.toml            # Flash tool configuration
rust-toolchain.toml      # Xtensa Rust toolchain (esp channel)
idf_component.yml        # ESP-IDF component: espressif/esp_h264 v1.3.0
```

## Architecture Details

### Threading Model

| Thread | Core | Stack | Purpose |
|---|---|---|---|
| Main | 0 | 16KB | WiFi AP, TCP listener, connection cycle, session protocol |
| decode-display | 0/1 | 16KB | H.264 decode + strip conversion + DMA to LCD |
| converter | 1 | 4KB | Dual-core strip helper (scale mode only) |
| touch-poll | any | 4KB | FT6336U I2C polling @ 60Hz |
| nav-ui | any | 8-16KB | Navigation event logging (video mode) or LCD rendering (nav-only) |

### Memory Layout

| Region | Size | Usage |
|---|---|---|
| Internal SRAM | ~416KB usable | DMA buffers (38.4KB×2), thread stacks, WiFi, mbedtls, FreeRTOS |
| PSRAM | 2MB | H.264 decoder buffers (~576KB), LWIP buffers, large allocations |
| Flash | 16MB | Firmware (~4MB partition), NVS, PHY calibration |
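
The buffer sizes in the table can be checked with simple arithmetic. The helpers below are a sketch with hypothetical names; the only assumption beyond the README's numbers is that the ~576KB decoder figure corresponds to one decoded 800×480 I420 frame.

```rust
/// Bytes needed for `lines` rows of RGB565 pixels (2 bytes per pixel).
pub const fn rgb565_bytes(width: usize, lines: usize) -> usize {
    width * lines * 2
}

/// Bytes needed for one I420 frame: a full-resolution luma plane plus
/// two chroma planes at half resolution in each direction.
pub const fn i420_bytes(width: usize, height: usize) -> usize {
    width * height + 2 * ((width / 2) * (height / 2))
}

/// One 40-line strip of the 480-pixel-wide LCD.
pub const STRIP_BYTES: usize = rgb565_bytes(480, 40);
/// Two strips for DMA double-buffering.
pub const STAGING_BYTES: usize = 2 * STRIP_BYTES;
/// A full 480×320 RGB565 framebuffer, for comparison.
pub const FULL_FRAME_BYTES: usize = rgb565_bytes(480, 320);
/// One decoded 800×480 I420 frame.
pub const DECODED_FRAME_BYTES: usize = i420_bytes(800, 480);
```

These evaluate to 38,400 bytes per strip, 76,800 bytes for the staging pair, 307,200 bytes for a full LCD frame, and 576,000 bytes for one decoded I420 frame.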

### Android Auto Protocol

The implementation follows the Android Auto WiFi protocol:

1. **Transport**: TCP on port 5277, then upgraded to TLS 1.2
2. **Framing**: 4-byte header (channel ID, flags, length) + payload
3. **Channels**: Multiplexed over single TCP connection, each with a numeric ID
4. **Messages**: Protobuf-encoded, prefixed with 2-byte message type
5. **Video**: H.264 baseline profile, 800×480 @ 30fps, requires periodic ack
6. **Touch**: Timestamped (µs precision), mapped from display coords to AA video coords
7. **Navigation**: Protobuf TurnInstruction + DistanceUpdate events
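
The framing and message layout in steps 2 and 4 can be sketched as below. The README names the header fields but not their widths or byte order, so this assumes a 1-byte channel ID, 1-byte flags, and a 2-byte big-endian length, with a big-endian message type; `FrameHeader` and `split_message` are hypothetical names, not the types in `frame.rs`.

```rust
/// Assumed layout of the 4-byte frame header:
/// byte 0 = channel ID, byte 1 = flags, bytes 2..4 = payload length (big-endian).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameHeader {
    pub channel: u8,
    pub flags: u8,
    pub length: u16,
}

impl FrameHeader {
    /// Serialize the header into its 4-byte wire form.
    pub fn encode(&self) -> [u8; 4] {
        let len = self.length.to_be_bytes();
        [self.channel, self.flags, len[0], len[1]]
    }

    /// Parse a header from the start of `bytes`; `None` if fewer than 4 bytes.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 4 {
            return None;
        }
        Some(Self {
            channel: bytes[0],
            flags: bytes[1],
            length: u16::from_be_bytes([bytes[2], bytes[3]]),
        })
    }
}

/// Split a decrypted payload into its 2-byte message type and protobuf body.
pub fn split_message(payload: &[u8]) -> Option<(u16, &[u8])> {
    if payload.len() < 2 {
        return None;
    }
    let msg_type = u16::from_be_bytes([payload[0], payload[1]]);
    Some((msg_type, &payload[2..]))
}
```

A reader would call `FrameHeader::decode` on the first four bytes, read `length` more bytes, decrypt them once TLS is up, and hand the result to `split_message` before protobuf parsing.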

### Key Design Decisions

- **Strip-based rendering**: 40-line strips (38.4KB each) instead of full-frame buffers. Allows DMA double-buffering with only 76.8KB of internal SRAM instead of the ~307KB a full 480×320 RGB565 frame would need.
- **No intermediate framebuffer**: I420→RGB565 conversion writes directly into DMA staging buffers. Zero-copy from decode to display.
- **Drain-and-skip**: When frames queue up, older frames are discarded without decoding. Only the latest frame is decoded and displayed. This prevents the decoder from falling behind.
- **Always FOCUSED**: The head unit always reports VideoFocusIndication(FOCUSED) to the phone. Reporting UNFOCUSED causes the phone to stop sending navigation data too.
- **Unsolicited focus kick**: After video setup, an unsolicited VideoFocusIndication with `unrequested=true` is sent to prompt the phone to start streaming. Without this, the phone sends VideoFocusRequest but never StartIndication.
- **Non-fatal video acks**: If ack writes fail (TCP buffer full, etc.), the error is logged but doesn't kill the session. The phone tolerates missed acks.
- **DHCP without gateway/DNS**: Prevents Android from switching internet to WiFi.
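
The drain-and-skip decision above can be sketched with the standard library's `mpsc` channel. The `recv_latest` helper is a hypothetical name, and the real decode loop in `main.rs` may be structured differently; this only shows the queue-draining idea.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Block until at least one item is available, then drain the queue and keep
/// only the newest item. Returns the item and how many older ones were
/// skipped, or `None` once the sender is gone and the queue is empty.
pub fn recv_latest<T>(rx: &Receiver<T>) -> Option<(T, usize)> {
    // Wait for the first frame; an error means the channel is closed.
    let mut latest = rx.recv().ok()?;
    let mut skipped = 0;
    loop {
        match rx.try_recv() {
            // A newer frame was already queued: drop the older one.
            Ok(newer) => {
                latest = newer;
                skipped += 1;
            }
            // Nothing else is waiting (or the sender hung up): use what we have.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    Some((latest, skipped))
}
```

With the depth-4 channel from the video pipeline, a decode loop built this way never works on a frame that is more than one queue's worth behind the phone.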

## Configuration

### WiFi Settings

Edit [src/config.rs](src/config.rs):

```rust
Self {
    ssid: "ESP32-AA-HU".into(),
    password: "androidauto123".into(),
    listen_port: 5277,
}
```

### sdkconfig Tuning

Key settings in [sdkconfig.defaults](sdkconfig.defaults):

| Setting | Value | Purpose |
|---|---|---|
| `ESP_DEFAULT_CPU_FREQ_MHZ` | 240 | Max CPU for decode performance |
| `SPIRAM_SPEED_80M` | y | Max PSRAM bandwidth |
| `ESP32S3_DATA_CACHE_64KB` | y | Maximize cache for PSRAM access |
| `ESP_H264_DUAL_TASK` | y | Dual-core H.264 decode |
| `ESP_H264_DECODER_IRAM` | y | Hot decoder code in IRAM (+22KB) |
| `COMPILER_OPTIMIZATION_PERF` | y | -O2 for ESP-IDF C code |
| `MBEDTLS_HARDWARE_AES` | y | Hardware AES acceleration |
| `MBEDTLS_HARDWARE_SHA` | y | Hardware SHA acceleration |

## Limitations

- **~3-5 fps** in video mode: the ESP32-S3 software H.264 decoder is the bottleneck. Real head units use dedicated video decoder hardware.
- **No audio output**: audio channels are accepted but data is discarded. Would need I2S + DAC/codec.
- **No Bluetooth Classic**: ESP32-S3 only has BLE. The phone must manually join WiFi and start the head unit server in developer mode.
- **Google Maps** renders navigation in the video stream, not via the navigation channel. Use **OsmAnd** for turn-by-turn text in nav-only mode.
- **800×480 minimum**: the Android Auto protocol doesn't allow requesting a resolution lower than 480p.
- **Single WiFi client**: the AP is configured for max 1 connection.

## Possible Improvements

- **Raspberry Pi Zero 2W proxy**: decode H.264 on the RPi (hardware VideoCore decoder, ~1ms/frame) and send pre-decoded RGB565 frames to the ESP32 via SPI. Could approach 30fps if the SPI link sustains about 9.2MB/s (480×320×2 bytes × 30).
- **Audio output**: add an I2S DAC for media/nav audio. The audio channel stubs are already in place.
- **WiFi Direct (P2P)**: proper AA wireless uses WiFi Direct, which doesn't disable the phone's cellular data. ESP-IDF supports WiFi P2P but adds complexity.
- **ESP32-P4**: much faster H.264 decode (25fps @ 640×480, 31fps dual-task). Note that the P4's H.264 hardware block is an encoder, so decode would still run in software, but at roughly three times the ESP32-S3's throughput.

## Dependencies

| Crate | Version | Purpose |
|---|---|---|
| esp-idf-svc | 0.52 | ESP-IDF high-level services (WiFi, NVS, mDNS) |
| esp-idf-hal | 0.46 | Hardware abstraction (I2C, GPIO) |
| esp-idf-sys | 0.37 | Raw ESP-IDF FFI bindings |
| protobuf | 3.7 | Protocol Buffers (AA protocol messages) |
| anyhow | 1.0 | Error handling |
| bitfield | 0.19 | Frame header bitfield parsing |
| miniz_oxide | 0.7 | PNG inflate (nav-only mode, optional) |
| espressif/esp_h264 | 1.3.0 | H.264 SW decoder (C component, tinyh264-based) |

## License

LGPL-3.0-or-later
@@ -881,10 +881,12 @@ struct ConvertWork {

// SAFETY: Pointers are valid for the duration of the work item.
// The main thread waits for `done_rx` before touching the buffers again.
#[cfg(not(feature = "nav-only"))]
unsafe impl Send for ConvertWork {}

/// Converter worker thread, pinned to the core opposite the decode thread.
/// Receives half-strip conversion jobs and signals completion.
#[cfg(not(feature = "nav-only"))]
fn converter_worker(
    rx: mpsc::Receiver<ConvertWork>,
    done_tx: mpsc::SyncSender<()>,
@@ -918,6 +920,7 @@ fn converter_worker(
    }
}

#[cfg(not(feature = "nav-only"))]
fn decode_display_loop(decode_rx: mpsc::Receiver<Vec<u8>>, lcd: display::Display) {
    log::info!("Decode+display thread started (display every frame)");
