Checked in

Anonymous
2026-03-17 08:32:29 +01:00
parent 91e3d36f52
commit 12b476f1fc
2 changed files with 321 additions and 0 deletions
+318
@@ -0,0 +1,318 @@
# ESP32 Android Auto Head Unit
A DIY Android Auto wireless head unit built on the **ESP32-S3** (WT32-SC01 Plus), written entirely in **Rust**.
Implements the Android Auto WiFi protocol from scratch: TCP connection, TLS handshake, protobuf service discovery, video channel (H.264 decode), touch input, navigation events, and sensor reporting — all running on a $20 microcontroller.
## Demo
The ESP32 hosts a WiFi AP. The phone joins it and connects via TCP on port 5277. Android Auto renders to the 480×320 LCD with touch input support.
```
Phone (Android Auto) ──WiFi──► ESP32-S3 AP ──I80 bus──► 480×320 LCD
◄──touch── FT6336U ◄──I2C──┘
```
## Hardware
| Component | Details |
|---|---|
| Board | [WT32-SC01 Plus](https://www.waveshare.com/wiki/WT32-SC01-Plus) ($20) |
| SoC | ESP32-S3R8 — dual-core Xtensa LX7 @ 240MHz |
| RAM | 512KB SRAM + 2MB PSRAM (quad, 80MHz) |
| Flash | 16MB (QIO, 80MHz) |
| Display | ST7796 480×320 LCD, I80 8-bit parallel bus @ 40MHz |
| Touch | FT6336U capacitive, I2C @ 400kHz |
| WiFi | 802.11 b/g/n 2.4GHz (built-in) |
### Pin Assignments
| Function | GPIO |
|---|---|
| LCD D0–D7 | 9, 46, 3, 8, 18, 17, 16, 15 |
| LCD WR | 47 |
| LCD DC | 0 |
| LCD RST | 4 |
| Backlight | 45 |
| Touch SDA | 6 |
| Touch SCL | 5 |
## Features
### Build Modes
| Mode | Flag | Description |
|---|---|---|
| **Full Video** | *(default)* | H.264 decode + downscale 800×480 → 480×320 + display (~3-5 fps) |
| **Crop Video** | `--crop` | Center-crop 480×320 from 800×480, no scaling (faster conversion) |
| **Nav-Only** | `--nav-only` | Text-only turn-by-turn navigation, no video decode. PNG turn arrows. |
### Protocol Implementation
- **Version handshake** — negotiates protocol version with phone
- **TLS 1.2** — mbedtls with hardware AES/SHA acceleration
- **Service discovery** — advertises 9 channels (control, input, sensor, video, 3× audio, AV input, navigation, media status)
- **Video channel** — accepts H.264 stream, sends VideoFocusIndication, acks frames
- **Touch input** — FT6336U → coordinate mapping → protobuf TouchEvent (PRESS/DRAG/RELEASE)
- **Navigation** — receives TurnInstruction + DistanceUpdate events (works with OsmAnd, not Google Maps*)
- **Sensors** — reports DRIVING_STATUS (unrestricted) and NIGHT_DATA
- **Audio** — stubs: accepts setup, discards audio data (no DAC/I2S output)
- **mDNS** — advertises `_androidauto._tcp` for network discovery
\* *Google Maps renders navigation entirely in the video stream and doesn't send turn-by-turn data over the navigation channel. OsmAnd uses the standard Android Auto Navigation API.*
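The coordinate-mapping step of the touch path can be sketched as follows. This is a minimal illustration, not the code in `src/touch.rs`: `map_touch` is a hypothetical helper, and it assumes full-video (scaled) mode, where the whole 800×480 frame is shown on the 480×320 panel and each axis scales independently.

```rust
/// Hypothetical helper: map a touch point in LCD coordinates to
/// Android Auto video coordinates by scaling each axis independently.
/// `lcd` and `video` are (width, height) pairs and must be non-zero.
fn map_touch(x: u16, y: u16, lcd: (u16, u16), video: (u16, u16)) -> (u16, u16) {
    // Clamp to the panel so a noisy reading cannot land outside the video.
    let x = x.min(lcd.0 - 1) as u32;
    let y = y.min(lcd.1 - 1) as u32;
    // Integer scaling; u32 avoids overflow in the intermediate product.
    let vx = x * video.0 as u32 / lcd.0 as u32;
    let vy = y * video.1 as u32 / lcd.1 as u32;
    (vx as u16, vy as u16)
}
```

For example, `map_touch(240, 160, (480, 320), (800, 480))` maps the panel centre to the video centre. Crop mode would need a fixed offset instead of a scale factor.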
### Video Pipeline (Full Video Mode)
```
Phone sends H.264 800×480 @ 30fps
TCP receive → TLS decrypt → protobuf parse → mpsc channel (depth 4)
Decode thread: esp_h264 SW decoder (tinyh264-based, dual-task)
│ decode: ~100ms per 800×480 frame
I420 → RGB565 strip conversion (dual-core: worker + main thread)
│ 40-line strips, bilinear downscale 800×480 → 480×320
DMA double-buffered to LCD (38.4KB × 2 staging buffers in internal SRAM)
```
**Performance**: ~3-5 fps end to end, depending on scene complexity. The ESP32-S3's software H.264 decoder is the bottleneck: Espressif's benchmarks show ~9 fps for 640×480 in dual-task mode, and at 800×480 (Android Auto's minimum) raw decode throughput is ~8-9 fps before color conversion and LCD transfer are added.
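The per-pixel part of the I420 → RGB565 step can be sketched as below. It is an illustration only: it assumes BT.601 limited-range video and the common fixed-point coefficients, leaves out the bilinear downscale, and the function names are hypothetical rather than taken from `src/decoder.rs`.

```rust
/// Convert one YUV sample to RGB565.
/// Assumption: BT.601 limited range (Y in 16..=235), 8.8 fixed-point math.
fn yuv_to_rgb565(y: u8, u: u8, v: u8) -> u16 {
    let c = y as i32 - 16;
    let d = u as i32 - 128;
    let e = v as i32 - 128;
    let r = ((298 * c + 409 * e + 128) >> 8).clamp(0, 255) as u16;
    let g = ((298 * c - 100 * d - 208 * e + 128) >> 8).clamp(0, 255) as u16;
    let b = ((298 * c + 516 * d + 128) >> 8).clamp(0, 255) as u16;
    // Pack as 5 bits red, 6 bits green, 5 bits blue.
    ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
}

/// Convert one row of an I420 frame. In I420 each chroma sample covers a
/// 2x2 block of luma samples, so the chroma index is x / 2.
/// `y_row` must be at least as long as `out`, and the chroma rows at
/// least half that (rounded up).
fn convert_row(y_row: &[u8], u_row: &[u8], v_row: &[u8], out: &mut [u16]) {
    for (x, px) in out.iter_mut().enumerate() {
        *px = yuv_to_rgb565(y_row[x], u_row[x / 2], v_row[x / 2]);
    }
}
```

Working a row (or a 40-line strip) at a time lets the output go straight into a DMA staging buffer instead of a full intermediate frame.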
### Video Pipeline (Crop Mode)
Same as above but the I420 → RGB565 conversion copies the center 480×320 pixels 1:1 instead of downscaling. Eliminates bilinear interpolation overhead.
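The crop window arithmetic is small enough to show directly. The sketch below works on a single row-major plane and uses hypothetical helper names; the real conversion presumably crops and converts in one pass.

```rust
/// Top-left corner of a centered crop window inside the source frame.
/// Both arguments are (width, height); `dst` must fit inside `src`.
fn crop_origin(src: (usize, usize), dst: (usize, usize)) -> (usize, usize) {
    ((src.0 - dst.0) / 2, (src.1 - dst.1) / 2)
}

/// Copy the centered `dst`-sized window out of a row-major plane, 1:1.
fn center_crop(src_plane: &[u8], src: (usize, usize), dst: (usize, usize)) -> Vec<u8> {
    let (x0, y0) = crop_origin(src, dst);
    let mut out = Vec::with_capacity(dst.0 * dst.1);
    for row in 0..dst.1 {
        // Start of this row's window inside the source plane.
        let start = (y0 + row) * src.0 + x0;
        out.extend_from_slice(&src_plane[start..start + dst.0]);
    }
    out
}
```

For an 800×480 source and a 480×320 panel, `crop_origin` gives an offset of 160 pixels horizontally and 80 lines vertically.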
### Nav-Only Mode
No H.264 decoder. Receives navigation events via the AA navigation channel and renders:
- Turn maneuver + direction (text)
- Street name
- Distance to next turn
- ETA
- PNG turn arrow image (decoded via miniz_oxide, scaled to 64×64)
Uses strip-based LCD rendering with bitmap font (5×7 base, scalable).
## Building
### Prerequisites
- [Podman](https://podman.io/) (or Docker — change `sudo podman` to `docker` in `build.sh`)
- USB serial access to the WT32-SC01 Plus (`/dev/ttyACM0` or `/dev/ttyUSB0`)
- [espflash](https://github.com/esp-rs/espflash) for flashing + monitoring
The build uses the official `espressif/idf-rust:all_latest` container image which includes:
- ESP-IDF v5.5.1
- Rust toolchain for Xtensa (`esp` channel)
- All ESP32-S3 build tools
### Build Commands
```bash
# Full video mode (default)
./build.sh
# Crop video mode (faster conversion, cropped view)
./build.sh --crop
# Nav-only mode (no video, turn-by-turn text only)
./build.sh --nav-only
# Build without flashing
./build.sh --build-only
# Combine flags
./build.sh --crop --build-only
```
### Manual Build (without container)
```bash
# Requires esp-idf-sys toolchain configured
cargo build --release # full video
cargo build --release --features crop-video # crop mode
cargo build --release --features nav-only # nav-only
```
### Flashing
```bash
# Via build script (prompts after build)
./build.sh
# Manual flash + monitor
espflash flash target/xtensa-esp32s3-espidf/release/esp32-android-auto-nav --monitor
# Monitor only (after flashing)
espflash monitor --port /dev/ttyACM0
```
## Connecting a Phone
1. **Build and flash** the firmware to the WT32-SC01 Plus
2. On the phone, **join the WiFi network**:
- SSID: `ESP32-AA-HU`
- Password: `androidauto123`
3. Open **Android Auto** on the phone:
- Go to Android Auto settings → enable Developer mode (tap version 10×)
- Developer Settings → **Start head unit server**
4. The ESP32 probes each DHCP client's IP on port 5277 and connects automatically
5. The ESP32 also listens on port 5277, so the phone can initiate the connection instead
### Connection Flow
```
ESP32 boots → WiFi AP starts → mDNS advertised → listening on :5277
Phone joins WiFi → ESP32 connects to phone:5277 (or phone connects to ESP32:5277)
→ Version handshake → TLS negotiation → Service discovery
→ Video setup → VideoFocusIndication(FOCUSED) → Phone starts streaming
→ Touch events sent back to phone → Video frames displayed
```
### 4G Internet While Connected
The ESP32's DHCP server is configured to **not advertise a gateway or DNS**, so Android keeps using mobile data for internet while connected to the ESP32's WiFi for Android Auto.
For best results, on the phone enable: *Developer Options → Mobile data always active*
## Project Structure
```
src/
├── main.rs # Entry point, thread spawning, video decode/display loop,
│ # touch polling, WiFi AP, connection cycle
├── session.rs # Android Auto protocol session (message loop, dispatch)
├── frame.rs # Wire protocol: frame read/write, TLS state (mbedtls)
├── channels.rs # Channel descriptors, AV message parsing, video/audio/sensor frames
├── control.rs # Control channel messages (version, TLS, ping, auth, shutdown)
├── common.rs # Common channel messages (channel open request/response)
├── decoder.rs # H.264 SW decoder (esp_h264 FFI), I420→RGB565 conversion
├── display.rs # ST7796 LCD driver (I80 bus, DMA strip rendering)
├── touch.rs # FT6336U capacitive touch driver (I2C)
├── navigation.rs # Navigation event parsing (TurnInstruction, DistanceUpdate)
├── config.rs # Head unit + WiFi configuration
├── cert.rs # TLS certificate for Android Auto authentication
├── mdns.rs # mDNS service advertisement (_androidauto._tcp)
├── bluetooth.rs # BT protocol definitions (unused — ESP32-S3 has no BT Classic)
└── esp_h264_bindings.h # C header for esp_h264 FFI bindgen
protobuf/
├── Wifi.proto # Android Auto WiFi protocol messages
└── Bluetooth.proto # Android Auto BT protocol messages (reference only)
build.sh # Container-based build script (Podman)
build.rs # Build script: protobuf codegen + esp_h264 bindgen
Cargo.toml # Rust dependencies + feature flags
sdkconfig.defaults # ESP-IDF configuration (CPU, PSRAM, WiFi, TLS, H.264, etc.)
partitions.csv # Flash partition table (4MB app partition)
espflash.toml # Flash tool configuration
rust-toolchain.toml # Xtensa Rust toolchain (esp channel)
idf_component.yml # ESP-IDF component: espressif/esp_h264 v1.3.0
```
## Architecture Details
### Threading Model
| Thread | Core | Stack | Purpose |
|---|---|---|---|
| Main | 0 | 16KB | WiFi AP, TCP listener, connection cycle, session protocol |
| decode-display | 0/1 | 16KB | H.264 decode + strip conversion + DMA to LCD |
| converter | 1 | 4KB | Dual-core strip helper (scale mode only) |
| touch-poll | any | 4KB | FT6336U I2C polling @ 60Hz |
| nav-ui | any | 8-16KB | Navigation event logging (video mode) or LCD rendering (nav-only) |
### Memory Layout
| Region | Size | Usage |
|---|---|---|
| Internal SRAM | ~416KB usable | DMA buffers (38.4KB×2), thread stacks, WiFi, mbedtls, FreeRTOS |
| PSRAM | 2MB | H.264 decoder buffers (~576KB), LWIP buffers, large allocations |
| Flash | 16MB | Firmware (~4MB partition), NVS, PHY calibration |
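The buffer sizes in the table follow from the panel geometry. The constants below are named for illustration only and are not taken from the source tree.

```rust
const LCD_WIDTH: usize = 480;
const LCD_HEIGHT: usize = 320;
const STRIP_LINES: usize = 40;
const BYTES_PER_PIXEL: usize = 2; // RGB565

/// One 40-line strip: 480 * 40 * 2 = 38,400 bytes (38.4KB).
const STRIP_BYTES: usize = LCD_WIDTH * STRIP_LINES * BYTES_PER_PIXEL;
/// Two staging buffers for DMA double-buffering: 76,800 bytes.
const STAGING_BYTES: usize = STRIP_BYTES * 2;
/// A single full RGB565 frame, for comparison: 307,200 bytes.
const FULL_FRAME_BYTES: usize = LCD_WIDTH * LCD_HEIGHT * BYTES_PER_PIXEL;
/// Strips needed to cover the whole panel.
const STRIPS_PER_FRAME: usize = LCD_HEIGHT / STRIP_LINES;
/// One 800x480 I420 frame (12 bits per pixel): 576,000 bytes.
const I420_FRAME_BYTES: usize = 800 * 480 * 3 / 2;
```

The last constant lines up with the ~576KB listed for decoder buffers in PSRAM, although the table does not say how that figure is composed.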
### Android Auto Protocol
The implementation follows the Android Auto WiFi protocol:
1. **Transport**: TCP on port 5277, then upgraded to TLS 1.2
2. **Framing**: 4-byte header (channel ID, flags, length) + payload
3. **Channels**: Multiplexed over single TCP connection, each with a numeric ID
4. **Messages**: Protobuf-encoded, prefixed with 2-byte message type
5. **Video**: H.264 baseline profile, 800×480 @ 30fps, requires periodic ack
6. **Touch**: Timestamped (µs precision), mapped from display coords to AA video coords
7. **Navigation**: Protobuf TurnInstruction + DistanceUpdate events
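The framing and message-type steps above can be sketched as a small parser. The field widths and byte order here are assumptions (1-byte channel, 1-byte flags, 2-byte big-endian length, big-endian message type); the list only states that the header is 4 bytes and the type prefix is 2 bytes. The struct and function names are hypothetical and do not mirror `src/frame.rs`, which uses the `bitfield` crate.

```rust
#[derive(Debug, PartialEq)]
struct FrameHeader {
    channel: u8,
    flags: u8,
    payload_len: u16,
}

/// Parse the 4-byte frame header and return it with the remaining bytes.
/// Assumed layout: channel, flags, then a big-endian u16 length.
fn parse_header(buf: &[u8]) -> Option<(FrameHeader, &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let header = FrameHeader {
        channel: buf[0],
        flags: buf[1],
        payload_len: u16::from_be_bytes([buf[2], buf[3]]),
    };
    Some((header, &buf[4..]))
}

/// Split a decrypted payload into its 2-byte message type and protobuf body.
fn split_message(payload: &[u8]) -> Option<(u16, &[u8])> {
    if payload.len() < 2 {
        return None;
    }
    let msg_type = u16::from_be_bytes([payload[0], payload[1]]);
    Some((msg_type, &payload[2..]))
}
```

Returning `Option` keeps short reads from panicking; a caller would loop until the socket has delivered `payload_len` bytes before decrypting and dispatching by channel.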
### Key Design Decisions
- **Strip-based rendering**: 40-line strips (38.4KB each) instead of full-frame buffers. Allows DMA double-buffering with only 76.8KB of internal SRAM instead of the ~307KB that even a single full 480×320 RGB565 frame would need.
- **No intermediate framebuffer**: I420→RGB565 conversion writes directly into DMA staging buffers. Zero-copy from decode to display.
- **Drain-and-skip**: When frames queue up, older frames are discarded without decoding. Only the latest frame is decoded and displayed. This prevents the decoder from falling behind.
- **Always FOCUSED**: The head unit always reports VideoFocusIndication(FOCUSED) to the phone. Reporting UNFOCUSED causes the phone to stop sending navigation data too.
- **Unsolicited focus kick**: After video setup, an unsolicited VideoFocusIndication with `unrequested=true` is sent to prompt the phone to start streaming. Without this, the phone sends VideoFocusRequest but never StartIndication.
- **Non-fatal video acks**: If ack writes fail (TCP buffer full, etc.), the error is logged but doesn't kill the session. The phone tolerates missed acks.
- **DHCP without gateway/DNS**: Prevents Android from switching internet to WiFi.
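The drain-and-skip policy can be sketched with the standard library's channel types. `latest_frame` is a hypothetical helper, not the function used in `src/main.rs`, and it treats every queued item as independently decodable; how skipped H.264 reference frames are handled is outside this sketch.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Block until at least one frame is available, then discard everything
/// except the newest queued frame. Returns the frame and the number of
/// frames skipped, or `None` once the sender has gone away.
fn latest_frame(rx: &Receiver<Vec<u8>>) -> Option<(Vec<u8>, usize)> {
    // Wait for the first frame; an error means the channel is closed.
    let mut latest = rx.recv().ok()?;
    let mut skipped = 0;
    loop {
        match rx.try_recv() {
            // A newer frame was already queued: drop the older one.
            Ok(newer) => {
                latest = newer;
                skipped += 1;
            }
            // Queue drained (or sender gone): decode what we have.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    Some((latest, skipped))
}
```

Paired with a bounded `sync_channel` of depth 4, this bounds how far the decoder can fall behind the incoming stream.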
## Configuration
### WiFi Settings
Edit [src/config.rs](src/config.rs):
```rust
Self {
ssid: "ESP32-AA-HU".into(),
password: "androidauto123".into(),
listen_port: 5277,
}
```
### sdkconfig Tuning
Key settings in [sdkconfig.defaults](sdkconfig.defaults):
| Setting | Value | Purpose |
|---|---|---|
| `ESP_DEFAULT_CPU_FREQ_MHZ` | 240 | Max CPU for decode performance |
| `SPIRAM_SPEED_80M` | y | Max PSRAM bandwidth |
| `ESP32S3_DATA_CACHE_64KB` | y | Maximize cache for PSRAM access |
| `ESP_H264_DUAL_TASK` | y | Dual-core H.264 decode |
| `ESP_H264_DECODER_IRAM` | y | Hot decoder code in IRAM (+22KB) |
| `COMPILER_OPTIMIZATION_PERF` | y | -O2 for ESP-IDF C code |
| `MBEDTLS_HARDWARE_AES` | y | Hardware AES acceleration |
| `MBEDTLS_HARDWARE_SHA` | y | Hardware SHA acceleration |
## Limitations
- **~3-5 fps** in video mode — the ESP32-S3 software H.264 decoder is the bottleneck. Real head units use dedicated video decoder hardware.
- **No audio output** — audio channels are accepted but data is discarded. Would need I2S + DAC/codec.
- **No Bluetooth Classic** — ESP32-S3 only has BLE. Phone must manually join WiFi and start head unit server in developer mode.
- **Google Maps** renders navigation in the video stream, not via the navigation channel. Use **OsmAnd** for turn-by-turn text in nav-only mode.
- **800×480 minimum** — Android Auto protocol doesn't allow requesting lower than 480p resolution.
- **Single WiFi client** — AP is configured for max 1 connection.
## Possible Improvements
- **Raspberry Pi Zero 2W proxy** — decode H.264 on RPi (hardware VideoCore decoder, ~1ms/frame), send pre-decoded RGB565 frames to ESP32 via SPI. Would achieve 30fps.
- **Audio output** — add I2S DAC for media/nav audio. The audio channel stubs are already in place.
- **WiFi Direct (P2P)** — proper AA wireless uses WiFi Direct, which doesn't disable phone's cellular. ESP-IDF supports WiFi P2P but adds complexity.
- **ESP32-P4** — decodes H.264 considerably faster (25fps @ 640×480, 31fps dual-task). Note that the P4's hardware H.264 block is an encoder, so decode there is still software, but it would be a significant upgrade over the S3.
## Dependencies
| Crate | Version | Purpose |
|---|---|---|
| esp-idf-svc | 0.52 | ESP-IDF high-level services (WiFi, NVS, mDNS) |
| esp-idf-hal | 0.46 | Hardware abstraction (I2C, GPIO) |
| esp-idf-sys | 0.37 | Raw ESP-IDF FFI bindings |
| protobuf | 3.7 | Protocol Buffers (AA protocol messages) |
| anyhow | 1.0 | Error handling |
| bitfield | 0.19 | Frame header bitfield parsing |
| miniz_oxide | 0.7 | PNG inflate (nav-only mode, optional) |
| espressif/esp_h264 | 1.3.0 | H.264 SW decoder (C component, tinyh264-based) |
## License
LGPL-3.0-or-later
+3
@@ -881,10 +881,12 @@ struct ConvertWork {
// SAFETY: Pointers are valid for the duration of the work item.
// The main thread waits for `done_rx` before touching the buffers again.
#[cfg(not(feature = "nav-only"))]
unsafe impl Send for ConvertWork {}
/// Converter worker thread — sits on core opposite to the decode thread.
/// Receives half-strip conversion jobs and signals completion.
#[cfg(not(feature = "nav-only"))]
fn converter_worker(
rx: mpsc::Receiver<ConvertWork>,
done_tx: mpsc::SyncSender<()>,
@@ -918,6 +920,7 @@ fn converter_worker(
}
}
#[cfg(not(feature = "nav-only"))]
fn decode_display_loop(decode_rx: mpsc::Receiver<Vec<u8>>, lcd: display::Display) {
log::info!("Decode+display thread started (display every frame)");