The Scale Gap
Most IoT proofs of concept work fine at 100 devices. Architecture decisions that seem harmless at that scale become production crises at 100,000.
After managing platforms with 10M+ connected devices, I've found these are the decisions that matter most.
Decision 1: MQTT vs HTTP for Device Communication
MQTT wins for high-volume, low-power devices. Its persistent connection model avoids paying for a new TLS handshake on every message, and its QoS levels (0: at most once, 1: at least once, 2: exactly once) give you explicit delivery guarantees.
HTTP is still appropriate for:
- Low-frequency reporting devices (< 1 message/minute)
- Devices that need request/response semantics
- Environments where MQTT broker management overhead isn't justified
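The criteria above can be folded into a first-pass triage helper. This is a minimal sketch, not a rule. `DeviceProfile` and `choose_transport` are hypothetical names. Treating any single HTTP criterion as decisive is also a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    # Hypothetical fields mirroring the three HTTP criteria above.
    messages_per_minute: float
    needs_request_response: bool
    broker_overhead_justified: bool


def choose_transport(profile: DeviceProfile) -> str:
    """Return "http" if any HTTP criterion applies, otherwise "mqtt"."""
    if profile.messages_per_minute < 1:
        return "http"  # low-frequency reporting
    if profile.needs_request_response:
        return "http"  # request/response semantics
    if not profile.broker_overhead_justified:
        return "http"  # broker management not worth it
    return "mqtt"  # high-volume device on a managed broker


# Usage: a meter reporting every ten minutes vs. a once-per-second sensor.
meter = DeviceProfile(0.1, needs_request_response=False, broker_overhead_justified=True)
vibration = DeviceProfile(60, needs_request_response=False, broker_overhead_justified=True)
print(choose_transport(meter), choose_transport(vibration))  # http mqtt
```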
Decision 2: Edge vs Cloud Processing
Every byte you send to the cloud costs latency and money. The questions to answer:
- What decisions need to happen in under 100ms? (Edge only)
- What data is high-volume but low-value once it reaches the cloud? (Aggregate at edge)
- What requires historical context or ML inference? (Cloud or hybrid)
Modern IoT architectures run a rule engine at the edge that filters, aggregates, and routes data, sending only what the cloud actually needs.
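An edge rule engine of this kind can be sketched in a few lines. This is a minimal, hypothetical sketch. `EdgeRuleEngine`, the NaN filter, the fixed alert threshold, and the count-based window are illustrative assumptions, not any particular product's rule language. It maps onto the three questions as follows:

- The time-critical decision is taken locally.
- High-volume telemetry is aggregated.
- Only alerts and summaries travel upstream.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Reading:
    sensor: str
    value: float


@dataclass
class EdgeRuleEngine:
    alert_threshold: float                   # hypothetical: above this, act locally
    window_size: int                         # hypothetical: raw readings per cloud summary
    act_locally: Callable[[Reading], None]   # e.g. trip a relay on the gateway
    send_to_cloud: Callable[[dict], None]    # e.g. publish on an uplink topic
    _windows: Dict[str, List[float]] = field(default_factory=dict, init=False)

    def ingest(self, r: Reading) -> None:
        # Filter: a NaN from a faulty read carries no information; drop it.
        if math.isnan(r.value):
            return
        # Route: the time-critical decision is made here, with no cloud round trip;
        # the rare, high-value alert is still forwarded.
        if r.value > self.alert_threshold:
            self.act_locally(r)
            self.send_to_cloud({"kind": "alert", "sensor": r.sensor, "value": r.value})
        # Aggregate: raw telemetry leaves the device only as per-window summaries.
        window = self._windows.setdefault(r.sensor, [])
        window.append(r.value)
        if len(window) >= self.window_size:
            self.send_to_cloud({
                "kind": "summary",
                "sensor": r.sensor,
                "count": len(window),
                "min": min(window),
                "max": max(window),
                "mean": sum(window) / len(window),
            })
            window.clear()


# Usage: five raw readings become two uplink messages.
actions: List[Reading] = []
uplink: List[dict] = []
engine = EdgeRuleEngine(80.0, 3, actions.append, uplink.append)
for v in [20.0, 21.0, float("nan"), 85.0, 22.0]:
    engine.ingest(Reading("temp-1", v))
print([m["kind"] for m in uplink])  # ['alert', 'summary']
```

A production engine would more likely window by time than by count, so that summaries arrive on a predictable schedule even when a sensor goes quiet.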
Decision 3: Device Identity and Security
Device identity is the most under-engineered aspect of IoT platforms. Every device needs a unique identity that:
- Cannot be cloned (hardware attestation or secure element)
- Can be revoked without a firmware update
- Has clearly scoped permissions (a temperature sensor shouldn't publish to motor control topics)
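Scoped permissions and revocation are usually enforced where devices connect, at the broker or a gateway. The sketch below is a minimal, hypothetical illustration. `topic_matches`, `DeviceIdentity`, and `can_publish` are illustrative names, and the topic layout is assumed. The matcher is a simplified version of MQTT wildcard rules, not any broker's ACL engine. Clone resistance depends on hardware (attestation or a secure element) and is outside its scope.

```python
from dataclasses import dataclass
from typing import List, Set


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT-style filter match, assuming well-formed filters:
    '+' matches exactly one level; '#' (as the last level) matches the parent
    level and everything below it. Special rules for '$'-prefixed topics are
    not handled."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


@dataclass
class DeviceIdentity:
    device_id: str
    publish_filters: List[str]  # topics this identity is scoped to


def can_publish(identity: DeviceIdentity, topic: str, revoked: Set[str]) -> bool:
    # Revocation is a server-side lookup, so it takes effect without touching firmware.
    if identity.device_id in revoked:
        return False
    # Scoped permissions: anything outside the device's own filters is denied.
    return any(topic_matches(f, topic) for f in identity.publish_filters)


# Usage: a temperature sensor scoped to its own subtree.
sensor = DeviceIdentity("temp-0042", ["sensors/temp-0042/#"])
revoked: Set[str] = set()
print(can_publish(sensor, "sensors/temp-0042/temperature", revoked))  # True
print(can_publish(sensor, "actuators/motor-7/setpoint", revoked))     # False
revoked.add("temp-0042")
print(can_publish(sensor, "sensors/temp-0042/temperature", revoked))  # False
```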
Decision 4: OTA Update Architecture
Your OTA system is part of your security posture. It must support:
- Delta updates (rather than full firmware images)
- Rollback capability
- Staged rollouts with automatic halt on error rate threshold
- Cryptographic signature verification
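The staged-rollout requirement can be sketched as a small control loop. This is a hypothetical illustration. `staged_rollout`, the cumulative stage percentages, and the per-stage error-rate check are assumptions. `update` stands in for whatever pushes an image to a device and waits for its health report.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RolloutOutcome:
    attempted: int                   # devices that were offered the update
    halted_at_stage: Optional[int]   # 1-based stage that tripped the threshold, or None


def staged_rollout(
    devices: Sequence[str],
    cumulative_percents: Sequence[int],  # hypothetical, non-decreasing, e.g. [1, 10, 100]
    max_error_rate: float,               # hypothetical halt threshold, e.g. 0.05
    update: Callable[[str], bool],       # True if the device updated and reported healthy
) -> RolloutOutcome:
    attempted = 0
    for stage, pct in enumerate(cumulative_percents, start=1):
        # Ceiling division so small fleets still get at least one device per stage.
        target = (len(devices) * pct + 99) // 100
        batch = devices[attempted:target]
        failures = sum(1 for d in batch if not update(d))
        attempted = max(attempted, target)
        # Automatic halt: stop before the next, larger stage if this one is unhealthy.
        if batch and failures / len(batch) > max_error_rate:
            return RolloutOutcome(attempted, stage)
    return RolloutOutcome(attempted, None)


# Usage: a hypothetical build where one device in ten fails its health check.
fleet = [f"dev-{i:04d}" for i in range(1000)]
calls = []

def flaky_update(device_id: str) -> bool:
    calls.append(device_id)
    return not device_id.endswith("7")

print(staged_rollout(fleet, [1, 10, 100], 0.05, flaky_update))
# RolloutOutcome(attempted=10, halted_at_stage=1)
```

The bad build reaches 10 devices instead of 1,000. A real system would also roll back the devices in the halted stage. It would also handle delta packaging and signature verification, which this sketch leaves out.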
The teams that get this wrong discover it when a botched OTA update bricks 50,000 field devices simultaneously.