These are the three major categories of input and output in computing: visual, auditory, and physical. Each input/output pairing is known as a modality, though the practical details differ between how computers take in input and how humans do. We'll go over this, as well as the pros and cons of each category.
VISUAL
Visual includes gestures, poses, graphics, text, UI, screens, and animations.
AUDITORY
Auditory includes music, tones, sound effects, wake words and natural language processing, and diegetic and real-world sounds.
PHYSICAL
Physical includes hardware, physical affordances like buttons and edges, haptics, and any time a user interacts with a real object.
Why do we care about these modalities? Because our users have different needs at different times. Needs can be situational, tied to their level of mastery, or simply personal preference. It is our responsibility to think this through for them.
TL;DR Cheat Sheet: Best Practices
VISUAL
AUDITORY
PHYSICAL
Interactions that 'feel' best are those that make use of all the modalities that are available. You can map out any action to this basic action loop. For every stage of the user journey in a feature, you should decide what modalities you want to use.
In the second wheel here, I've mapped a simple game controller action. The affordance to push the button is mostly visual (pink), with some sound cues (green). The input is simply the button press: all physical (blue). The feedback includes haptic feedback, sounds, and visual confirmation.
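One way to make this mapping concrete is to write the loop down as data. The sketch below is only an illustration of the game controller example: the stage names come from the description above, but the `LoopStage` class, its helper methods, and the numeric weights are all assumptions made up for this example, not measured values or part of any real toolkit.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    VISUAL = "visual"
    AUDITORY = "auditory"
    PHYSICAL = "physical"


@dataclass
class LoopStage:
    """One stage of the action loop and the modalities it relies on."""
    name: str
    # Relative share of each modality in this stage (illustrative guesses).
    mix: dict[Modality, float] = field(default_factory=dict)

    def dominant(self) -> Modality:
        """Return the modality carrying the largest share of this stage."""
        return max(self.mix, key=self.mix.get)

    def unused(self) -> set[Modality]:
        """Return the modalities this stage does not use at all."""
        return {m for m in Modality if self.mix.get(m, 0) == 0}


# Hypothetical weights for the button press described above.
button_press = [
    # Affordance: mostly visual, with some sound cues.
    LoopStage("affordance", {Modality.VISUAL: 0.8, Modality.AUDITORY: 0.2}),
    # Input: the button press itself, all physical.
    LoopStage("input", {Modality.PHYSICAL: 1.0}),
    # Feedback: haptics, sounds, and visual confirmation together.
    LoopStage("feedback", {
        Modality.PHYSICAL: 0.4,
        Modality.AUDITORY: 0.3,
        Modality.VISUAL: 0.3,
    }),
]

for stage in button_press:
    missing = sorted(m.value for m in stage.unused())
    print(f"{stage.name}: dominant={stage.dominant().value}, unused={missing}")
```

Listing the unused modalities per stage is a quick way to spot where an interaction leans on a single sense and might benefit from a second channel.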
Keep in mind this loop will look different depending on where the user is in their life cycle, how familiar they are with the experience, their ability level, and even which device they are on. Start thinking about all of your design decisions this way, and your app's user experience will vastly improve. Promise.
These two top sections are enough to give you the tools to make good decisions for your apps. For more resources, check out the Object States Grid, or the Input Mapping Guide. If you'd like a deeper dive into the pros and cons of each modality with examples, read on.
Any phone app is a fine example of a visual output. Here are some things to note:
Visuals have a lot of advantages, which is why they are the most popular output type by far for most abled users. Here's a rundown of advantages:
Of course, there are disadvantages, too:
Auditory-first inputs and outputs have only recently become a common type of conventional computing. Surgery rooms are a great real-life example: because the surgeon's hands and head must remain focused, the room communicates by talking. Restaurant kitchens are similar. Here are some things to note:
Audio is something of the unsung hero of computer outputs, although funnily enough, it seems humans have always assumed voice input would become the primary way we communicate with our devices—all the way back to the Jetsons, Star Trek, and even Metropolis. Here's a rundown of advantages:
Of course, there are disadvantages, too:
In a sense, everything we've talked about here is physical. Light hitting your eyes, sound hitting your ears; these are physical qualities, too. But our nervous systems are wired to process them separately, hence the five senses.
Instruments are a great example of conventional physical inputs and outputs, though most mechanical objects fit the bill: everything from vehicles to kitchen drawers.
Some things to note about the instrument here:
Lately, with the rise of capacitive touchscreen devices, physical inputs have taken a serious downgrade compared with less sexy but more practical computer peripherals like the mouse and keyboard. Here are the advantages of physical modalities:
Can be very fast & precise as an input, especially peripherals that allow for small, supported finger movements, like keyboards or game controllers.
Of course, there are disadvantages: