Having worked on a few multimodal apps (apps that accept speech, touch, and other input modalities), I’ve learned a thing or two and want to share one idea that keeps popping up.
Try modular design for voice behaviors
This one is tough, but I’ve started thinking I can take a play from Brad Frost’s Atomic Design playbook: modularity. The most recent app I worked on was relatively simple. Even so, we ran into problems we weren’t expecting, and clear communication was one of them. A few recurring themes kept coming up: "How will voice handle this?" and "How will voice handle that?" Sometimes these were valuable points; sometimes they were redundant questions. But the problem I noticed wasn’t with the people asking, it was with the definition of the system and its lacking documentation.
There was documentation that covered voice on a case-by-case basis ("How does voice react on this screen?"), but that approach can breed inconsistencies that aren’t caught until it’s too late. So, on my next project, I’m going to see whether the principles of Atomic Design can be transposed onto voice/speech design, and track the lessons learned. I recognize that voice should operate as random-access navigation (users can say anything at any time, rather than following a fixed path through screens), which will definitely add complexity. But hey, this is why I love what I do!
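To make the idea concrete, here’s a minimal sketch of what "atomic" voice behaviors might look like. Everything here is hypothetical and illustrative (the `VoiceAtom` type, `composeScreen`, and all the handler names are my own invention, not any real voice framework): small behaviors are defined once and composed into screens, so every screen answers "how does voice handle confirm/cancel/help?" the same way by construction.

```typescript
// Hypothetical sketch of modular voice behaviors, borrowing Atomic Design
// vocabulary. Names and structure are illustrative assumptions.

type VoiceAtom = {
  name: string;
  utterances: string[];               // phrases that trigger this behavior
  handle: (phrase: string) => string; // what the app says/does in response
};

// Atoms: small, reusable behaviors documented and defined exactly once.
const confirmAtom: VoiceAtom = {
  name: "confirm",
  utterances: ["yes", "confirm", "ok"],
  handle: () => "Confirmed.",
};

const cancelAtom: VoiceAtom = {
  name: "cancel",
  utterances: ["no", "cancel", "go back"],
  handle: () => "Cancelled.",
};

const helpAtom: VoiceAtom = {
  name: "help",
  utterances: ["help", "what can i say"],
  handle: () => "You can say: yes, no, or help.",
};

// A screen is just a composition of atoms (plus any screen-specific ones).
function composeScreen(atoms: VoiceAtom[]) {
  return (phrase: string): string => {
    const atom = atoms.find((a) =>
      a.utterances.includes(phrase.toLowerCase())
    );
    return atom ? atom.handle(phrase) : "Sorry, I didn't catch that.";
  };
}

// Two screens share confirm/cancel/help behavior by construction, so they
// can't drift out of sync the way per-screen documentation can.
const checkoutScreen = composeScreen([confirmAtom, cancelAtom, helpAtom]);
const settingsScreen = composeScreen([confirmAtom, cancelAtom, helpAtom]);

console.log(checkoutScreen("yes"));     // "Confirmed."
console.log(settingsScreen("Cancel")); // "Cancelled." — same atom, same answer
```

The point isn’t the code itself, it’s that consistency falls out of reuse: if "cancel" needs to change, it changes in one place, and every screen inherits the fix.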