CONCEPT

Multimodal AI Interfaces

The emerging class of AI systems that accept sketches, gestures, and spatial manipulation alongside natural language — the logical continuation of the interface revolution Tversky's framework predicts.
Current AI collaboration runs primarily through text, which accepts a subset of human spatial thinking — the subset that natural language can encode through prepositions, narrative, and metaphor. Multimodal interfaces extend the acceptable input to include the channels that text discards: the sketch that externalizes a spatial model, the gesture that shows what words cannot tell, the spatial manipulation that demonstrates a relationship without describing it. Tversky's framework predicts that multimodal systems, when mature, will produce cognitive benefits beyond what text-only systems can achieve — not because they are more convenient but because they access channels of thinking that text systematically suppresses.

In The You On AI Field Guide

The first generation of multimodal AI — vision-language models that accept images as input, sketch-to-code systems, gesture-aware interfaces — demonstrates the principle but not yet its full potential. Current systems mostly translate multimodal input into internal text representations before processing, which reintroduces the representational mismatch at a different layer. A true multimodal architecture would preserve spatial
