Accessible voice control survey software

Government institution - Custom software development

Weeks to delivery
HIPAA compliance
Device support

We build custom tech solutions that researchers can apply to large-scale studies. In the past few years, we’ve designed and built bespoke platforms with functions such as:

  • “Smart surveys” that support hundreds of question pathways
  • Data export formats built to niche specifications
  • Offline use that protects PII

But this study presented a new constraint: subjects who can’t see a screen due to visual impairment (VI).

The challenge

Think of the last time you used an app. Your sight likely drove the experience: where to click, when a field is selected, and what page you are on all hinge on a visual model of the screen.

And even if you can depend on users’ sense of sight, providing a pleasant e-survey is no small task. This type of app must accept copious data points, convey the subject’s progress, and ask follow-up questions based on answers.

We sought to preserve the rich interaction of a survey without such a key interface. Speech and sound afford a means of two-way contact with a user. We can give context by talking to a user and absorb context by listening to a user. But as the old adage goes, a picture is worth a thousand words: we lose detail when we convert sight to sound.

Our assessment

The WAI-ARIA spec empowers web apps to support a screen reader. Screen readers are built into phones and desktops; they speak the text on a page and emulate a mouse cursor. We sought to leave intact these functions, which VI cohorts deftly use each day.
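A quick illustration of the WAI-ARIA primitive this builds on: a “live region” that screen readers announce without stealing the user’s focus. The helper below is a hypothetical sketch, not code from the study’s platform.

```javascript
// Hypothetical helper: build a WAI-ARIA live region for announcing
// survey progress. role="status" implies aria-live="polite", so the
// screen reader finishes its current sentence before speaking the
// update, rather than interrupting.
function liveRegionHTML(message) {
  return `<div role="status" aria-live="polite">${message}</div>`;
}

// In a real page you would keep one live region in the DOM and update
// its text as the survey advances, e.g.:
//   document.getElementById("announcer").textContent = "Question 3 of 12";
```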

But WAI-ARIA apps still assume a full keyboard, on which a tablet survey may not be able to depend. Thus we designed three modes:

  1. Audio
  2. Single button
  3. Pure Screen Reader

In Audio Mode, the software uses TensorFlow.js to recognize user speech without sending any data off device. For free-response questions, we transcribe speech on device via a WebAssembly port of Whisper, a state-of-the-art speech-recognition model.
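As a sketch of how Audio Mode might route a recognizer’s output to a survey answer: an on-device keyword model emits one confidence score per known word, and low-confidence utterances trigger a re-ask rather than a guess. The vocabulary, scores, and threshold below are illustrative assumptions, not the study’s actual values.

```javascript
// Illustrative vocabulary for a closed-choice question.
const VOCAB = ["yes", "no", "skip", "menu"];

// Route one utterance to an answer. `scores` holds one probability
// per VOCAB entry, as an on-device keyword model would emit.
function routeUtterance(scores, threshold = 0.85) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  // Below the confidence threshold, return null so the app re-asks —
  // refusing to guess is what keeps first-ask accuracy high.
  return scores[best] >= threshold ? VOCAB[best] : null;
}
```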

In Single Button mode, a user can press, hold, or “mash” one key to answer, skip, or bring up a menu.
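The single-button grammar can be sketched as a small gesture classifier over press timestamps. The duration and window thresholds below are illustrative assumptions, not the study’s tuned values.

```javascript
// Illustrative thresholds for the one-key grammar.
const HOLD_MS = 600;         // press this long => "hold" (skip)
const MASH_WINDOW_MS = 1000; // taps counted within this window
const MASH_COUNT = 3;        // this many taps => "mash" (menu)

// `presses` is a list of { downAt, upAt } timestamps in ms, oldest first.
function classifyPresses(presses) {
  const last = presses[presses.length - 1];
  // A long hold skips the question.
  if (last.upAt - last.downAt >= HOLD_MS) return "skip";
  // Several quick taps in a row open the menu.
  const windowStart = last.upAt - MASH_WINDOW_MS;
  const taps = presses.filter((p) => p.upAt >= windowStart).length;
  if (taps >= MASH_COUNT) return "menu";
  // Otherwise, a single short press records an answer.
  return "answer";
}
```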


This study is ongoing, and we don’t expect results for some time. So far we’ve tracked these benchmarks internally:

  1. >95% accuracy for spoken answers on the first ask.
  2. Near-perfect accuracy when we ask the user to confirm each answer.
  3. A modest 70% speed reduction compared to a visual survey.
  4. Support for the vast majority of phones, PCs, and Macs.
  5. Complete anonymity: all speech encoding occurs on device, and end-to-end encrypted results are transmitted to a single workstation.
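To see why the confirm step compounds accuracy, consider the arithmetic: a wrong answer persists only if recognition fails and the confirmation also fails to catch it. The confirm-step figure below is an assumed placeholder for illustration, not a measured result.

```javascript
const firstAsk = 0.95;    // measured: spoken answer heard correctly on first ask
const confirmStep = 0.95; // assumption: chance a confirmation catches an error

// A wrong answer survives only if recognition fails AND the
// confirmation fails to catch it.
const residualError = (1 - firstAsk) * (1 - confirmStep);
// residualError ≈ 0.0025, i.e. roughly 99.75% effective accuracy
```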