The Open Voice Network 1 April 2023
For the latest, detailed information on the work of the Open Voice Network, please visit our website at openvoicenetwork.org and the Open Voice Network GitHub Repository at github.com/open-voice-network/docs.
This report identifies the opportunities to address and the pitfalls to avoid when using voice technology and conversational AI to engage patients and add data points to remote patient monitoring and management (RPM) applications.
Topics discussed include:
- Current practices,
- Future opportunities and challenges,
- Critical ethical questions that are unique to voice and voice analysis, and
- A path forward for technologists and practitioners.
Today, four major market transitions are intersecting to accelerate the use of voice technology and conversational AI as an important new complement to remote patient monitoring and management (RPM):
- Shortage of medical professionals,
- Aging population,
- Development of conversational AI, and
- Rapidly advancing research in the use of voice technologies for clinical purposes, including analysis and early diagnosis.
New research by Juniper Research (Remote Patient Monitoring 2023) forecasts that RPM adoption across both developing and established markets will be worth $72 billion in 2023 and $110 billion in 2027, a growth of 54% over 2023-2027.
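As a rough check on the forecast figures above, the arithmetic can be sketched in a few lines of Python. The only inputs are the rounded endpoints quoted in the text ($72 billion and $110 billion). The function names are illustrative, and the compound annual rate is a derived figure that does not appear in the Juniper forecast as cited here.

```python
def total_growth(start: float, end: float) -> float:
    """Fractional growth from start to end (0.5 means 50%)."""
    return (end - start) / start


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly rate that turns start into end after `years` years."""
    return (end / start) ** (1 / years) - 1


# Rounded figures quoted in the text, in billions of US dollars.
rpm_2023 = 72.0
rpm_2027 = 110.0

growth = total_growth(rpm_2023, rpm_2027)
cagr = compound_annual_growth(rpm_2023, rpm_2027, years=2027 - 2023)

print(f"Total growth 2023-2027: {growth:.1%}")
print(f"Implied compound annual growth: {cagr:.1%}")
```

With the rounded endpoints this gives roughly 53% total growth, close to the 54% quoted; the small gap is presumably due to rounding in the published dollar figures.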
The changing landscape for RPM is not only seen in the voice-enabled applications being developed and deployed, but in the U.S. investment in some biometric databases that go beyond flagging physiological and psychological conditions to diagnosing them.
The U.S. Centers for Medicare and Medicaid Services (CMS) included Remote Therapeutic Monitoring (RTM) reimbursement in the proposed 2023 Medicare Physician Fee Schedule released in 2022. Nixon Gwilt Law (NGL) notes they “have made recommendations to revisit the overall code structure to better align RTM with Remote Physiologic Monitoring and other care management services” (O’Connor, K. et al., 2022).
The sci-fi film, Ad Astra, opens with Brad Pitt as astronaut Roy McBride speaking to his spaceship’s computer:
“I’m calm, steady. I slept well. 8.2 hours. No bad dreams. I am ready to go.”
“Your psychological evaluation has been approved,” the ship’s voice assistant responds.
While such a comprehensive degree of remote psychological evaluation has yet to be achieved in 2023, the evolution of voice technology, audio signal analysis, and natural language processing/understanding (NLP/NLU) methods has opened the way for the next generation of health care applications that facilitate digital-to-human interaction that mimics human speech and conversation. The aim is to strengthen accessible options for improving health, well-being, and risk assessment.
The opportunities to improve personalized remote patient monitoring and management (RPM) and the continuum of care in the 21st century are significant. The new generation of healthcare applications shares critical new data points with clinicians, through biomarkers and other biometric tools, to support patients and caregivers where, when, and how support is needed. Promising applications enable:
1. Interactive, Automated Patient Engagement
- brief verbal diary entries
- conversational apps with patient/caregiver instructions and virtual coaches
- data-gathering/reporting tools that use voice technology to gather and report new data and interact with wearable devices that track fitness and health conditions through blood pressure, glucose, heart rate, and digital ECG/EKG monitors
2. Inclusive patient engagement across multiple languages, dialects, and speech differences
3. Accessible patient engagement with underserved populations, especially rural, impoverished, and language-minority, and
4. Precision-medicine support through AI-enabled voice biomarker analysis to flag changes in a patient’s voice (such as tone, speed, duration, volume, and pitch) associated with specific physical or mental health conditions (Fagherazzi et al., 2021).
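To make the idea of flagging changes in voice characteristics more concrete, the following is a deliberately simplified sketch, not a description of how any clinical voice biomarker system works. It assumes a synthetic sine tone as a stand-in for a recorded voice sample, uses root-mean-square amplitude as a volume measure and a zero-crossing count as a crude pitch proxy, and compares a new sample to a baseline using a hypothetical 15% tolerance. All function names and thresholds are illustrative assumptions; validated tools rely on far richer features and models.

```python
import math


def synth_tone(freq_hz, amplitude, seconds, rate=8000):
    """Synthetic stand-in for a recorded voice sample: a pure sine tone."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def voice_features(samples, rate=8000):
    """Crude duration, volume, and pitch estimates from raw samples."""
    duration = len(samples) / rate
    # Volume: root-mean-square amplitude.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Pitch proxy: a periodic signal changes sign twice per cycle.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    pitch = crossings / (2 * duration)
    return {"duration_s": duration, "volume_rms": rms, "pitch_hz": pitch}


def flag_changes(baseline, current, tolerance=0.15):
    """Names of features that moved more than `tolerance` (fractional) from baseline."""
    return [
        name for name, base in baseline.items()
        if abs(current[name] - base) > tolerance * abs(base)
    ]


# Hypothetical baseline and follow-up samples (same length, lower pitch and volume).
baseline = voice_features(synth_tone(220.0, 0.5, 1.0))
today = voice_features(synth_tone(180.0, 0.3, 1.0))
print(flag_changes(baseline, today))  # ['volume_rms', 'pitch_hz']
```

The design choice here is to compare each person against their own baseline rather than a population norm, which mirrors the text's emphasis on flagging changes in an individual patient's voice.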
The Center for Connected Health Policy (CCHP), a program under the Public Health Institute established in 2008 with funding from the California Health Care Foundation (cchpca.org), defines RPM as “Personal health and medical data collection from an individual in one location, which is transmitted via electronic communication technologies to a provider in a different location for use in care and related support.”
The CCHP notes that RPM programs are a form of telehealth that “...can also help keep people healthy, allow older and disabled individuals to live at home longer and avoid having to move into skilled nursing facilities. RPM can also serve to reduce the number of hospitalizations, readmissions, and lengths of stay in hospital – all of which help improve quality of life and contain costs (CCHP.org, 2010-2023).”
Since the arrival of Apple’s Siri and Amazon Alexa, people have gotten used to communicating with digital devices through natural language. The 2022 Voice Consumer Index, published by Vixen Labs in cooperation with the Open Voice Network, found that in the US, UK, and German markets, most people use voice assistants every day. The authors note: “In all three markets, the percentage of people who use voice assistants daily has approximately doubled since last year.”
As awareness of conversational AI use cases spread, companies such as Orbita, Macadamian, and others experimented with and demonstrated the potential of using voice to create value in healthcare settings, starting with patient service, scheduling, and provision of basic advice.
This early wave of innovation inspired wider exploration. Vocalytics AI software leaned into the digital assistants’ listening capabilities to listen to ambient noise, analyze events and trends, and send real-time alerts to patients and caregivers with their permission. This software is currently being tested in selected East Coast hospitals.
The early pandemic years underscored the value of health care experiences such as telemedicine for continuing to care for patients while limiting their exposure to COVID-19. It was a logical step to extend telemedicine beyond the in-person patient-to-provider relationship to include communication with digital devices that supply data for monitoring chronic conditions or other factors.
According to new research (Next Move Strategy Consulting, 2022), the RPM market size was valued at USD 36.45 billion in 2021 and was predicted to reach USD 77.56 billion by 2030. The global RPM market by product and end user is projected to grow from USD 53.6 billion in 2022 to USD 175.2 billion by 2027. This takes into account providers such as hospitals, clinics, home health care, patients, and payers.
The rising engagement of the medical community in using voice-based RPM to expand patient access to effective, affordable care has reinforced funding for research into voice biomarker analysis tools.
The following convergence of compelling factors has heightened the need to improve and expand RPM so that it supports affordable health care and well-being more equitably:
- Shortage of Physicians in the United States
- A 2021 report published by the Association of American Medical Colleges projects the U.S. could see an estimated shortage of between 37,800 and 124,000 physicians by 2034, including shortfalls in both primary and specialty care (Association of American Medical Colleges, 2021)
- Shortage of health care workers worldwide
- The World Health Organization (WHO) estimates that there will be a shortage of 15 million health professionals in hospitals, nursing homes, and care centers worldwide by 2030 (Stroobants, J. et al., 2022).
- Disparity in health care delivery – underserved populations
- “5 Vulnerable Populations in Healthcare” published in the American Journal of Managed Care (Joszt 2018) cites the following as underserved populations:
1. Chronically ill and disabled;
2. Low-income and/or homeless individuals;
3. Certain geographical communities, such as rural areas of the US and Native Americans living on reservations;
4. LGBTQ+ population;
5. The very young and very old.
- Medically Underserved Areas/Populations (MUA/P) in the US
- MUA/P are designated by the Health Resources and Services Administration as having an insufficient number of primary care providers, high infant mortality, high poverty, or a high elderly population. A report published by the University of Medicine and Health Sciences cites four main communities comprising areas in need of doctors (Harrah 2020):
1. Migrant and Hispanic populations
2. Rural communities
3. African American inner-city areas
4. Asian American and Pacific Islander inner-city areas.
- Aging Populations in the Developed World
- In October 2022, the World Health Organization reported that between 2015 and 2050 the proportion of the world’s population over 60 years will nearly double from 12% to 22% (World Health Organization, 2022).
- Increase in Chronic Conditions
- In the U.S. alone, 86% of healthcare costs go toward an epidemic of chronic health conditions (Holman 2020).
The 2001 publication by the Institute of Medicine (US) Committee on Quality of Health Care in America, Crossing the Quality Chasm: A New Health System for the 21st Century, emphasized that “health care should be supported by systems that are carefully and consciously designed to produce care that is safe, effective, patient-centered, timely, efficient, and equitable” and asserted that information technology “has enormous potential to improve the quality of health care with regard to all six of the aims set forth (Institute of Medicine (US) Committee on Quality of Health Care in America, 2001).”
In response to these factors, and boosted by the need to address the COVID-19 pandemic in 2020, RPM/Telehealth has grown and is now poised to include audible and textual implementations of voice technology often referred to as conversational AI.

The use of natural language technologies can offer timely assistance to underserved populations, shorten diagnostic wait times, enable remote identification of leading indicators of mental and physical illness, and enable ongoing patient engagement.
A review of current practices suggests four primary use cases for voice technology in remote patient care – all of which can lead to better patient outcomes and greater clinical efficiencies:
- Interactive, automated patient engagement – from patient service to automated reminders and virtual conversations
- Inclusive engagement – improving outcomes across languages and cultures
- Accessible engagement – improving outcomes in lower-income and rural populations
- Biomarker precision medicine support – providing vital physiological data points previously unavailable from wearable technologies.
The use of natural language technologies is driving these benefits:
- For patient populations, conversational AI is inclusive and accessible, engaging and serving patients through technology that most already own and use (Pew Research Center, 2021). Conversational AI can be a tool for continuous care and evaluation of chronic conditions, providing:
- Information on leading health indicators
- Automated reminders
- Instruction/education
- Caregiver coordination and decision support
- Companionship
- A trustworthy source of 24/7 information
- For clinicians/providers, conversational AI provides the ability to capture patient-clinician interaction in both direct and automated connection with remote patients. This enables:
- Acoustic analysis and sentiment analysis of voice biomarkers to monitor key health indicators for awareness/diagnosis and continuously evaluate chronic conditions
- Hands-free clinical care
- Fast EHR treatment data entry (130 WPM vs. 40 WPM)
- Accurate transcription
- Next-step action coordination
- For payers, conversational AI saves time and money:
- The use of consumer technology that is already owned and operated
- Preventive care enabling early detection of physical and mental health changes and guidance on the best course of intervention, which can avoid unnecessary emergency room visits
- The ability to automate clinician activity.
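The data-entry comparison cited in the clinician benefits above (130 WPM vs. 40 WPM) can be turned into a per-note time estimate. The sketch below reads 130 WPM as a dictation rate and 40 WPM as a typing rate, which is an interpretation rather than something the text spells out, and it uses a hypothetical 260-word note chosen only for round numbers.

```python
def entry_minutes(word_count: int, words_per_minute: float) -> float:
    """Minutes needed to enter a note of word_count words at a given rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


SPEECH_WPM = 130   # rate quoted in the text, assumed to be dictation
TYPING_WPM = 40    # rate quoted in the text, assumed to be typing
NOTE_WORDS = 260   # hypothetical note length

spoken = entry_minutes(NOTE_WORDS, SPEECH_WPM)
typed = entry_minutes(NOTE_WORDS, TYPING_WPM)

print(f"Dictated: {spoken:.1f} min, typed: {typed:.1f} min")
print(f"Time saved per note: {typed - spoken:.1f} min")
print(f"Speed-up: {SPEECH_WPM / TYPING_WPM:.2f}x")
```

Under these assumptions a 260-word note takes 2.0 minutes to dictate versus 6.5 minutes to type. The estimate ignores correction time for transcription errors, which would reduce the saving in practice.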
“When a voice experience is built well, it can be a democratizing tool,” said Dr. Yaa Kumah-Crystal, MD, MPH, MS, Assistant Professor of Biomedical Informatics and Pediatric Endocrinology at Vanderbilt University Medical Center, whose research focuses on communication and documentation in health care and on developing strategies to improve workflow and patient care delivery.
Vanderbilt’s “Hospital at home” program provides RPM as an affordable option for patients undergoing short-term treatment, such as administration of IV antibiotics, who would otherwise be in a dedicated medical facility.
Voice technology can be used to implement more accessible applications that support patients and caregivers with instructions, data, and the ability to better manage their own health.
Through natural language exchange or brief verbal diary entries, patient data can be collected and analyzed with artificial intelligence for content, e.g., sentiment and biomarkers, for health and well-being implications. These data points inform a more effective patient mental/physical health management program.
Wolters Kluwer Health provides over 20 different Interactive Voice Response (IVR)-based interface programs, called EmmiJourneys, into which care teams can enroll patients. EmmiJourneys “feature mostly post-discharge topics, with the main goal of preventing hospital readmissions, but also educating patients and changing behaviors,” explains Freddie Feldman, Director, Voice & Conversational Interfaces at Wolters Kluwer Health, adding “We’re adding episodes of care and chronic care journeys in the future.”

A digital diabetes coach, designed to draw on more physical data and focused on behaviors that promote healthier outcomes over time, started out by being more engaging than a logbook for tracking blood glucose levels. Particularly when the patient is a child, it can also reinforce attitudes and actions that are life-changing.
In 2017, as Amazon Alexa supported more voice technology skills/apps, Canadian software engineering firm Macadamian (acquired by Emids in early 2021) created My Diabetes Coach to help develop and reward the lifelong compliance habits that could mitigate the health complications that might otherwise affect a child with diabetes. Back in 2018, it was powered by HealthWise.
Timon LeDain, VP, Customer Solutions, Emids, said: “We are currently working with clinicians to streamline the waitlist for patients suffering from BPPV [benign paroxysmal positional vertigo, the most common form of dizziness]. The persona is on a 10-month waiting list to see a specialist. If a chatbot can direct a typical BPPV sufferer to be diagnosed and treated quickly by a physiotherapist knowledgeable in the practice, that would remove them from that waiting list and allow those specialty cases (like dizziness caused by a tumor) to be seen sooner.”
Stay tuned on ChatGPT and other generative AI enhancements to RPM. The possibilities for specialized health coaches, with a bias toward long-term health-promoting outcomes, are promising. Training can make or break these apps. Design begins with intent for the patient and only verified-as-accurate and beneficial content.
Conversation-driven technology is at the heart of remote communication, engagement, and companionship with many seniors (McTear, M. et al 2022).
In their recent conference paper, “Empowering Well-Being Through Conversational Coaching for Active and Healthy Ageing,” Michael McTear and his colleagues present a scalable approach involving a virtual coach for seniors being developed in a 3-year joint European and Japanese funded research project called e-VITA.
The virtual coach, built on the open-source RASA framework, aims to enable older adults to better manage their own health proactively by providing individualized profiling and personalized recommendations on daily activities that support healthy aging in critical areas:
- cognition
- physical activity
- mobility
- mood
- social interaction
- leisure, and
- spirituality.
The virtual coach captures and monitors data using unobtrusive sensors and emotion analysis used within dialogues to assess the user’s situation, such as location, activity, and vital state, from heart rate and mood to surrounding temperature. The virtual coach remotely provides support through natural interactions with 3D-holograms, emotional objects, or robotic technologies using multimodal and spoken dialogue technology, advanced knowledge graph representations, and data fusion (McTear, p 258).
Laurie M. Orlov, Principal Analyst, Aging and Health Technology Watch, has long tracked the evolution of voice technology and its promise for assisting older adults as they age. In her recent report, “The state of voice-AI and older adults 2022,” she noted that in addition to substantial improvements in voice technology software required to enable the older adult population to fully accept and benefit from the more complex capabilities of voice assistance “...users will expect clearer privacy protections to support greater personalization and smarter, more conversational interactions with voice assistance. Conversational AI and voice assistants will help mitigate social isolation among older adults and enable them to securely access health and other personal information easily in the privacy of their home (Orlov, L., 2022).”
Enabling RPM for multiple languages, dialects, and accents expands the clinician’s ability to offer instructions and generally engage a wider range of patients in caring for their health.
Extending conversational AI to help detect signs of depression through a “Befriending Model,” German start-up Bolo (https://boloapp.de, to launch “soon”) is training its emotion analysis app in multiple languages, starting with English, German, Spanish, Hindi, and Bengali. This is an important step toward including more people in the benefits of RPM.
The app facilitates two-way communication by sending and receiving voice messages. Voice notes are labeled according to the emotion levels detected in the speaker’s voice.
“We are using Speech Emotion Recognition (SER), an act of recognizing human emotion and affective states from speech,” explains Bolo’s founder.
- This is based on a voice’s ability to reflect underlying emotion through tone and pitch.
- Helps the listeners understand what kind of measures they need to take to soothe the speaker and avoid any kind of misunderstanding.
- The app provides guidance on the various mental health exercises based on the speaker’s mental state (after consulting with reputable psychologists and life coaches)”.
Support for languages, accents, dialects, and speech differences has come a long way, but given the scope it remains a work in progress.
As of April 2021, Amazon Alexa supported 8 languages: English, French, German, Hindi, Italian, Japanese, Portuguese (Brazilian), and Spanish. According to Google Support, in that same timeframe a Google Assistant device supported Danish, Dutch, English, French, German, Hindi, Italian, Japanese, Korean, Norwegian, Spanish, and Swedish. Apple’s Siri supported 21 languages: Arabic, Cantonese, Danish, Dutch, Finnish, English, French, German, Hebrew, Italian, Japanese, Korean, Malay, Mandarin, Norwegian, Portuguese (Brazil), Russian, Spanish, Swedish, Thai, and Turkish.
Siri also leads in supporting a wide variety of dialects, including dialects of Chinese, Dutch, English, French, German, Italian, and Spanish.
The potential for inclusion through language options is being supported by organizations such as the Bill and Melinda Gates Foundation.
In November 2021, publisher Wolters Kluwer Health announced it would use the Microsoft Azure cloud platform to deliver tools for patient education. Virtual care companies and patient care management also have access.
About 40% of Wolters Kluwer Health’s EmmiJourneys series are offered in both English & Spanish, with the remainder of those EmmiJourneys being released in Spanish during this coming year.
The virtual assistant experience is a priority, with choices and their ethical ramifications considered.
According to Wolters Kluwer Health’s Freddie Feldman:
“Since 2020 we’ve been also offering programs voiced by a Black voice actor, in an effort to better connect with patients of the Black community. My team is preparing to begin a study quantifying the efficacy of voice interfaces where the racial background of the VUI matches that of the patient.”
Good universal design offers choices for everyone. Great universal design tends to make RPM and other products more successful.
Voice applications easily gather and share data from wearable and other devices that track fitness and health conditions through blood pressure, glucose, heart rate, and digital ECG/EKG monitors. This makes them particularly useful for the health and well-being of those who benefit from proactive care and support, especially larger communities such as actively aging and elderly people and those living with chronic diseases.
Because it is flexible and portable, voice technology can be used in the healthcare facility or the patient’s home, nursing home, or assisted living facility.
As Anil Lewis, Executive Director, Jernigan Institute at National Federation of the Blind, said on a late February 2023 recording of the Open Voice Network’s “Future of Voice Podcast,”
“Inclusivity begins at the design and development stage. Good intentions are simply not good enough.”
This is just one of many organizations aiming to make people’s lives better. Their feedback on products designed for their specific constituency is invaluable.
Diversity within teams is the first step toward preventing societal harms, such as favoring one group arbitrarily over another (Li, 2020). Well-rounded teams can also minimize potential bias because of the range of experiences and perspectives that can be drawn from throughout the entire design process (Orduña, 2019). This is important, especially for speech-recognition tools.
Innovators who want to ensure the accessibility of RPM tools they are developing would do well to consult with organizations that serve specific populations, such as the National Federation of the Blind.
Ongoing work to expand disease-specific vocal biomarker databases through public and private research partnerships promises more accurate diagnosis and classification of psychiatric, cognitive, and physical changes through digital devices.
“If we can get this right, RPM might allow for patients to provide more data on how patients are doing,” said Dr. Kumah-Crystal.
Conversation-based applications and digital voice-enabled biometric measurement tools can be used to detect changes in respiratory conditions, depression, schizophrenia, anxiety, bipolar disorder, cancer, diabetes, and rheumatoid arthritis; dementia/Alzheimer’s, multiple sclerosis, and other neurodegenerative disorders; as well as unhealthy activity in the cardiovascular system, such as heart attack or stroke (Tracy et al., 2020; Fagherazzi et al., 2021; Mei et al., 2021; Kwon et al., 2022).
When paired with digital devices such as EKGs, ultrasounds, stethoscopes, data, audio and video, applications using conversational AI to track key health parameters remotely can help fill the gap between patient visits and help turn talk into recommendations for targeted therapeutic action outside of the clinical setting.
Especially for rural and underserved populations, accessing advanced RPM capabilities can facilitate more equitable patient and caregiver support and decision making leading to better outcomes and fewer unnecessary in-person visits to clinicians and costly emergency rooms.
Figure 1 Data that can be processed and analyzed from the sound of a voice.
Note. A graphic describing how voice technology can identify, analyze, and deduce sensitive personal information related to an individual’s identity, physical characteristics, physical health, mental health, socioeconomic status, location, emotions, intent, and more. By Open Voice Network, 2022
Voice technology implemented through applications and voice assistants that mimic conversation with humans on smartphones and other digital devices to track, explain, monitor and address fitness and certain health conditions had already gone mainstream by the time the COVID-19 virus began to spread worldwide.
Interaction with scripted bots was also common. During this time, the Centers for Disease Control and Prevention began delivering timely information and self-assessment mechanisms to the public using Microsoft’s text-based Healthcare Bot service (Yang et al., 2021). Deloitte’s “Connectivity & Mobile Trends 2021” survey noted that the use of devices such as the Apple Watch or Fitbit (acquired by Google) was expanding beyond step and calorie counting to serve as a hub for gathering, explaining, and sharing medical information on conditions such as heart health, sleep quality and duration, stress levels, possible COVID-19 symptoms, and some chronic health condition indicators (Deloitte, 2021).
The Apple Watch Series 7 was released in October 2021 with a larger face that made it easier to read the results of its blood-oxygen sensor, built-in ECG app, and heart rate notifications. It came with fall detection and could use conversational AI to phone for assistance. It is part of a class of smart digital devices, such as Fitus -- designed to appeal to seniors -- that can detect and display some health and fitness irregularities, announce results, transmit reports and use conversational AI to engage patients in better healthcare practices.
Also in 2021, disease-specific wearable monitoring products with conversational AI features were created in partnership with stakeholder users. For example, with the support of the National Blind Association and certified diabetes educators, specialty device maker Prodigy® incorporated conversational AI as a main feature of its Prodigy VOICE Blood Glucose Monitoring System to offer verbal instructions for use and reporting in four languages, which provided greater accessibility for home healthcare and RPM. For those requiring continuous glucose monitoring, wearable sensors such as the Dexcom G6 CGM System display a patient’s glucose numbers on compatible smart devices.
At the same time, major developers of blood pressure monitors began to expand their capabilities and facilitate accessible use by adding conversational AI to provide verbal alerts when pressure is not within range. Offering full verbal instruction in several languages facilitated use outside the clinical setting and served a wider range of patients.
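The verbal out-of-range alert described above can be sketched as a small function that turns a reading into a sentence a text-to-speech engine could speak. This is a hypothetical illustration and not any manufacturer's implementation: the `TargetRange` type, the function name, the wording of the messages, and the example ranges are all assumptions, and the ranges shown are placeholders rather than clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRange:
    """Acceptable band for one reading; values would be set by the care team."""
    low: int
    high: int

    def contains(self, value: int) -> bool:
        return self.low <= value <= self.high


def spoken_alert(systolic: int, diastolic: int,
                 systolic_range: TargetRange,
                 diastolic_range: TargetRange) -> str:
    """Text a device could hand to a text-to-speech engine after a reading."""
    reading = f"Your reading is {systolic} over {diastolic}."
    out_of_range = []
    if not systolic_range.contains(systolic):
        out_of_range.append("systolic")
    if not diastolic_range.contains(diastolic):
        out_of_range.append("diastolic")
    if not out_of_range:
        return f"{reading} This is within your target range."
    which = " and ".join(out_of_range)
    return (f"{reading} Your {which} pressure is outside your target range. "
            "Please follow your care plan.")


# Placeholder ranges for demonstration only, not clinical guidance.
sys_range = TargetRange(low=90, high=130)
dia_range = TargetRange(low=60, high=85)

print(spoken_alert(118, 76, sys_range, dia_range))
print(spoken_alert(152, 80, sys_range, dia_range))
```

Keeping the ranges as parameters rather than constants reflects the idea that targets are set per patient by clinicians; translating the message templates would be one way to support the multiple languages mentioned in the text.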
During this same period, many new conversational AI startups were formed to address the growing patient and healthcare provider demand for RPM and point-of-care solutions. One such startup, RedFox AI, built a web-enabled conversational AI platform, accessible from any smartphone, that guides patients through home-use diagnostic tests and medical devices and helps them troubleshoot. The RedFox platform also helps original equipment manufacturers (OEMs) and providers be present at the point of care, increasing compliance and adherence for home diagnostics products while reducing customer support costs.
According to Nick Myers, CEO & Co-Founder of RedFox AI, “The COVID-19 pandemic really showed us how patients are demanding more flexibility with how they receive their healthcare, and quite frankly there will never be enough healthcare labor available to support every patient in their home, at the point-of-care. With the RPM market rapidly expanding, now is the time for innovative technologies like voice and conversational AI to assist with closing the ever-expanding communication gap between patients, healthcare providers, pharma device manufacturers, and payers.”
New research in Deloitte’s 2022 Connectivity and Mobile Trends Survey (Deloitte Center for Technology, Media & Telecommunications, 2022) reported:
- Virtual medical appointments remain popular, with 49% of surveyed consumers saying they have attended at least one virtual appointment in the past year.
- At least a third of smartphone users are monitoring their health and fitness with their phones, and 1 in 5 use meditation or mental wellness apps.
- Nine in 10 consumers who own these devices use them to track fitness and monitor health metrics.
- The most common uses are to count daily steps, check pulse rate, count calories/nutrition, monitor heart health and track sleep. More than a third of users get reminders or badges to motivate them to exercise.
By early 2021, COVID-19 screening tools facilitated by conversational applications emerged. The Vocalis hyper-rapid COVID-19 screening tool, VocalisCheck, had achieved an accuracy of 81.2 percent in identifying COVID-19 infection based on patients’ voice samples. Collaborating on clinical trials with the Municipal Corporation of Greater Mumbai, Israeli startup, Vocalis, received the CE mark, which indicates that it has been manufactured to the European Economic Area (EEA) health, safety, and environmental protection standards, for its hyper-rapid COVID-19 screening tool (Ganguly S., 2021). Chris Landon, M.D., CEO of Technology Development Center Labs and Studio in Ventura, California observes:
“In the companies that seek our expertise, voice is being explored as a biomarker for exertion and exercise capacity. We’re also in the early stages of trying to understand if changes in voice can be used to predict early exacerbations or even help identify motivation/readiness to begin programs. To avoid the dangers of using artificial intelligence to expand the physician and health care practitioner’s Clinical Decision Support beyond their experience in the last ten cases of history and symptoms still requires vigilance and continued informing of the algorithm and quality improvement.”
In a joint study with Beth Israel Medical Center to facilitate identification of vocal biomarkers for Huntington’s disease, Canary Speech, Inc., used its patented technology with biomarker data to enable analysis that identified more than 1,000 features of speech differentiating healthy patients from Huntington’s Disease patients (O’Connell, 2022).
Co-founders Henry O’Connell and Jeff Adams got a jumpstart in the market a decade ago by combining their expertise in neurology research at the National Institutes of Health and on the founding Amazon Alexa speech AI team, respectively. Canary Speech holds three US Patents and two European patents.
Expanding the available databases of novel vocal biomarkers is an important step toward monitoring patients, diagnosing specific conditions, grading the severity or stages of a disease, and supporting drug development. Like traditional biomarkers, they must be validated analytically and qualified using an evidentiary assessment (Robin et al., 2020; Fagherazzi et al., 2021).
The value of transmitting information obtained from voice biomarkers to healthcare providers using devices most already own, such as a smartphone, is inspiring investment by private and public entities.
Until about 2020, most vocal biomarker research focused on common neurodegenerative diseases such as Parkinson’s, where voice disorders are frequent, occurring 86% of the time (Tracy et al., 2020).
Subsequent work on vocal biomarker data supported by private and public/private research is uncovering opportunities to identify and monitor additional medical conditions with greater accuracy (Fagherazzi et al., 2021).
Public-private partnerships are investing in the tremendous potential for using vocal biomarkers in healthcare.
In September 2022, the National Institutes of Health (NIH) announced the NIH Common Fund’s Bridge to Artificial Intelligence (Bridge2AI) program, which plans to invest $130 million over four years to accelerate the widespread use of artificial intelligence (AI) and best practices by the biomedical and behavioral research communities.
To support ethical use and avoid pitfalls such as unintentional bias, the program assembled team members from diverse disciplines and backgrounds to generate tools, resources, and detailed data that are responsive to AI approaches.
“Generating high-quality ethically sourced data sets is crucial for enabling the use of next-generation AI technologies that transform how we do research,” said Lawrence A. Tabak, D.D.S., Ph.D., Performing the Duties of the Director of NIH. “The solutions to long-standing challenges in human health are at our fingertips, and now is the time to connect researchers and AI technologies to tackle our most difficult research questions and ultimately help improve human health” (National Institutes of Health, 2022).
As one of four inaugural data generation projects funded by NIH’s Bridge2AI, Voice as a Biomarker of Health is led by the University of South Florida and Weill Cornell Medicine and aims to create a large, national databank of de-identified voices linked to selected biomarkers of health. Twelve institutions will collaborate. The project received $3.8 million, with subsequent funding over three years of up to $14 million contingent upon NIH appropriations by Congress.
The goal is the equitable use of the human voice as a tool for diagnosing and treating diseases, from cancer to depression, with an eye toward establishing voice as a biomarker used in clinical care (Gillis, 2022; Long, 2022; National Institute of Allergy and Infectious Diseases, 2020).
The ethically sourced voice samples will go through AI analysis to identify signs of disease, such as slow speech. The research seeks biometric data that can be used to help diagnose and inform earlier treatment in five categories: voice, neurological, respiratory, psychiatric, and children’s speech disorders.
TQIntelligence was awarded a National Science Foundation (NSF) Small Business Innovation Research Phase II grant for $1 million in November 2021. As part of the NIH’s Bridge2AI Voice as a Biomarker of Health project, TQIntelligence uses voice and AI technology to collect information from voice samples to help therapists treat children and adolescents with mental health issues who may be unable, or afraid, to articulate negative emotions such as anger, fear, and sadness.
The goal is to identify kids in crisis for timely decision making about appropriate treatment, monitor treatment effectiveness, and not only support decision making by therapists, but also guide family decisions about getting the right help at the right time.
Dr. Yared Alemu, a behavioral health psychologist and the founder and CEO of voice technology implementation company TQIntelligence, is particularly enthusiastic about advancing ethical voice technology applications that support the mental health and brain development of youth growing up in low-income, BIPOC communities. This population is most often affected by trauma, which is associated with anxiety and depression and can negatively affect brain development and life choices.
Dr. Alemu said:
“The promise of leveraging conversational AI for remote monitoring is providing decision support for families as well as giving access to important information to clinicians. Remote monitoring changes episodic care to continuous care, for example by enabling a kid suffering from anxiety to record and analyze daily diary entries with as little as a 15 second voice sample.”
“Apple knows more about my blood pressure than my doctor does,” said Dr. Alemu. “I need to know how they and other commercial apps are using the data they’re collecting. Personalized medicine vs. personalized commerce respects rights and helps build trust for legitimate conversational AI healthcare apps. To do otherwise hurts our ability to be able to use life-saving apps important for patient healthcare. We have an opportunity to improve quality of life and lower healthcare costs.”
Current and future RPM applications that incorporate voice technology implementations and newer technology, such as augmented reality and virtual reality (AR/VR) features, must be vigilant in following ethical use, privacy, and security practices in line with legal and societal expectations (Open Voice Network, 2023).
It’s not the first time U.S. medical institutions and public-private partnerships have provided support for using the human voice as a biomarker to supplement remote patient care.
Originally funded by a 1999-2004 grant from the National Science Foundation (NSF), work by Carnegie Mellon University and the University of Pennsylvania created AphasiaBank, a shared multimedia database of interactions by 180 aphasic individuals and 140 non-aphasic controls performing a uniform set of discourse tasks. The database supports the study of communication in aphasia, the partial or total loss of the ability to articulate ideas or understand language. The work continues, with AphasiaBank supported by NIH-NIDCD grant R01-DC008524 for 2022-27 (Forbes et al., 2012).
A major international research project, Remote Assessment of Disease and Relapse in Central Nervous System Disorders (RADAR-CNS), ran from 2020 to April 2022 as a collaboration of clinicians, researchers, engineers, computer scientists, and bioinformaticians from 22 organizations across Europe and the U.S. The project aimed to improve people’s quality of life and change how depression, epilepsy, and multiple sclerosis are managed and treated. It used conversational AI and voice data collected on mobile devices to detect changes in behavior, sleep, or mood before the patient perceives them, helping to predict or avoid a relapse (Fagherazzi et al., 2021; Innovative Health Initiative, 2021).
Among its successes, the RADAR-CNS team was awarded the Harald Frey prize by the Michael Foundation in October 2021 for their work on the use of wearables to gauge the likelihood of sudden unexpected death in epilepsy (Innovative Medicines Initiative, 2021).
RADAR-CNS was co-led by Janssen Pharmaceutica and King’s College London and funded by the Innovative Medicines Initiative 2 (IMI2) Joint Undertaking, with support from the European Union’s Horizon 2020 research and innovation programme and EFPIA (Fagherazzi et al., 2021).
In November 2021, the Council of the European Union approved a proposal to build upon the work of the Innovative Medicines Initiative as the Innovative Health Initiative. The new public-private partnership expands the focus to a broader range of cross-sectoral discoveries, such as medical device/drug combinations or diagnostics based on AI (Innovative Health Initiative, 2021).
Among the practical considerations for the future of evolving RPM technology with conversational AI are answers to questions such as: If an app collects data remotely using only conversational AI, should it be billed as a separate class of telehealth (Sezgin et al., 2020)?
Emre Sezgin, who has served as Principal Investigator, Information Technology Research and Innovation at Nationwide Children’s Hospital, Columbus, OH, observed: “To push forward RPM and related voice-enabled solutions, the healthcare systems must be ready to adopt different types of voice technology implementations as an alternative health care delivery modality.”
The push for new rules has already begun. Early in 2022, the U.S. Centers for Medicare and Medicaid Services (CMS) proposed changes to Remote Therapeutic Monitoring reimbursement in the 2023 Medicare Physician Fee Schedule. Nixon Gwilt Law (NGL), which specializes in healthcare and innovation, notes they “have made recommendations to revisit the overall code structure to better align RTM with Remote Physiologic Monitoring and other care management services” (O’Connor et al., 2022).
The best smart apps use conversational applications, including proprietary voice assistants, for ease of use and accessibility. Trusted proprietary platforms are transparent about the data they collect, how long they will retain it, and with whom the information will be shared. Trustworthy apps also respect user privacy, collect only what is needed to perform their function, and keep information and data-rich voice files secure. To avoid incidents of bias, algorithms must be trained with data from diverse sources and maintained over time. Emerging technology, such as augmented reality and virtual reality (AR/VR), opens new opportunities for voice-enabled tools for remote patient assistance. Organizations such as XR Metaverse Safety Institute and the Open Voice Network are calling for closer attention to these challenges.
In March 2023, the Open Voice Network announced TrustMark. The three-part initiative:
- Translates an ethical code that respects the individual’s privacy rights, regulations, and legislation for conversational applications into implementable actions that mitigate risk for developers, clients, and the public and promote values of transparency, accountability, inclusivity, and sustainability;
- Provides public/industry education on critical ethical issues and best practices through training courses, published policies/guidelines, and a self-assessment and maturity model for organizations that strive to implement best practices outlined by the TrustMark initiative; and
- Enables public identification of individuals/organizations (through badges/certifications).
Ethically designed voice technology implementations hold great promise to enhance the next generation of personalized remote patient monitoring and management (RPM) and the continuum of care.
If you are developing a software function that meets the definition of a device (such as a mobile medical app) with an entirely new intended use, the U.S. Food and Drug Administration (FDA) encourages you to contact the agency to discuss what regulatory requirements may apply.
The FDA notes:
“Consistent with FDA’s existing oversight approach that considers functionality of the software rather than platform, FDA intends to apply its regulatory oversight to only those software functions that are medical devices and whose functionality could pose a risk to a patient’s safety if the device were to not function as intended.”
Authored by the Open Voice Network with special thanks to its Health, Wellness, and Life Sciences Community contributors Yared Alemu, Audrey Arbeeny, Yaa Kumah-Crystal, Freddie Feldman, Chris Landon, Shaun Mitra, Nick Myers, Henry O’Connell, Laurie Orlov, Harry Pappas, Janice Mandel (document draft), Emre Sezgin, and Jon Stine.
The Open Voice Network (OVON) is a non-profit industry association dedicated to the development of standards for voice assistance transparency, consent, limited collection, and control of voice data that will make using voice technology worthy of user trust. In any reality, virtual or otherwise, we believe personal privacy should be respected as the default. The Open Voice Network operates as an open-source community within The Linux Foundation. It is independently funded and governed with participation from more than 120 voice practitioners and enterprise leaders from 12 countries.
The Open Voice Network community’s work is open source. We seek inclusive input and like to share our insights. At present, our work is focused in four areas:
- Interoperability, defined as the ability for conversational agents to share dialogs (and accompanying context, control, and privacy);
- Destination registration and management, the ability of users to confidently find a destination of choice through specific requests, and for the providers of goods and services to register a verbal “brand,” similar to the Domain Name System (DNS) of the internet;
- Privacy, with voice-specific guidance for both the protection of individual user data and that of commercial users; and
- Security, with a focus on voice-specific threats and harms.
Please see our papers and support the Open Voice Network by visiting http://www.openvoicenetwork.org/
Founded in 2000, The Linux Foundation is supported by more than 1,000 members and is the world’s leading home for collaboration on open-source software, open standards, open data, and open hardware. Linux Foundation’s projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, and more. The Linux Foundation’s methodology focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at http://www.linuxfoundation.org/
Conversational AI: The implementation of voice technology in audible and/or textual automated interchanges between a human and a digital device that facilitates interactive, automated patient engagement; inclusive engagement; accessible engagement; and biomarker precision medicine.
Remote Patient Monitoring and Management: Personal health and medical data collection from an individual in one location, which is transmitted via electronic communication technologies to a provider in a different location for use in care and related support (CCHP.org, 2010-2023).
Voice Technology: Auditory or textual communication in digital-to-human or digital-to-digital interaction that mimics human speech.
Association of American Medical Colleges (AAMC). (2021, June 11). AAMC report reinforces mounting physician shortage [Press release]. https://www.aamc.org/news-insights/press-releases/aamc-report-reinforces-mounting-physician-shortage
Center for Connected Health Policy (CCHP.org). (2010-2023). What is telehealth? https://www.cchpca.org/what-is-telehealth/?category=remote-patient-monitoring
Chen, A. (2019, March 14). Why companies want to mine the secrets in your voice. The Verge. https://www.theverge.com/2019/3/14/18264458/voice-technology-speech-analysis-mental-health-risk-privacy
Deloitte. (2021, June). How the pandemic has stress-tested the digital home. CMO Today. The Wall Street Journal. https://deloitte.wsj.com/articles/how-the-pandemic-has-stress-tested-the-digital-home01628880858
Deloitte Center for Technology, Media & Telecommunications. (2022). Connectivity and mobile trends, 3rd edition. Deloitte Insights. https://www2.deloitte.com/us/en/insights/industry/telecommunications/connectivity-mobile-trends-survey.html
Fagherazzi G., Fischer A., Ismael M., Despotovic V. (2021, April 16). Voice for health: The use of vocal biomarkers from research to clinical practice. Digital Biomarkers, 5(1), 78-88. https://doi.org/10.1159/000515346
Forbes, M., Fromm, D., MacWhinney, B. (2012, August 01). AphasiaBank: A resource for clinicians. Seminars in Speech and Language, 33(03), 217-222. https://doi.org/10.1055/s-0032-1320041
Ganguly, S. (2021, May 19). Healthtech startup Vocalis’ COVID screening tool 81 pc accurate, gets CE mark. Yourstory. https://yourstory.com/2021/03/healthtech-startup-vocalis-health-covid-screening-tool
Gillis, E. (2022, November 16). Your voice could help diagnose disease, other issues through AI [Video]. Yahoo!Finance. https://finance-yahoo-com.cdn.ampproject.org/c/s/finance.yahoo.com/amphtml/news/voice-could-help-diagnose-disease-011853082.html
Harrah, S. (2020, November 05). Medically underserved areas in the US. University of Medicine and Health Sciences. https://www.umhs-sk.org/blog/medically-underserved-areas-regions-where-u-s-needs-doctors
Holman, H.R. (2020). The Relation of the Chronic Disease Epidemic to the Health Care Crisis. ACR open rheumatology, 2(3), 167-173. https://doi.org/10.1002/acr2.11114
Innovative Health Initiative (IHI). (2022). From IMI to IHI. https://www.ihi.europa.eu/about-ihi/imi-ihi
Innovative Medicines Initiative. (2021, October 9). Wearable can predict risk of fatal epilepsy complication [Press release]. https://www.imi.europa.eu/news-events/newsroom/wearable-can-predict-risk-fatal-epilepsy-complication
Institute of Medicine (US) Committee on Quality of Health Care in America. (2001). Crossing the quality chasm: A new health system for the 21st century. Washington, DC. National Academies Press. https://doi.org/10.17226/10027
Joszt, L. (2018, July 20). 5 vulnerable populations in healthcare. American Journal of Managed Care. https://www.ajmc.com/view/5-vulnerable-populations-in-healthcare
Juniper Research. (2023) Remote Patient Monitoring. https://www.juniperresearch.com/researchstore/healthcare-government/remote-patientmonitoring-research-report
Kwon, N., Kim, S., Predeck, K., Adams, J., O’Connell, H. (2022, August). Depression severity detection using read speech with a divide-and-conquer approach. Canary Speech. https://canaryspeech.com/wp-content/uploads/2022/08/Canary-Speech-Depression-Severity-Detection-Using-Read-Speech-With-A-Divide-And-Conquer-Approach.pdf
Li, M. (2020, October 26). To build less-biased AI, hire a more diverse team. Harvard Business Review. https://hbr.org/2020/10/to-build-less-biased-ai-hire-a-more-diverse-team
Long, A. (2022, September 13). USF Health, Weill Cornell Medicine earn inaugural funding in NIH’s newly launched Bridge2AI initiative, will create artificial intelligence platform for using voice to diagnose disease. Hot News, Morsani College of Medicine, Research, USF Health Lead Story. https://hscweb3.hsc.usf.edu/blog/2022/09/13/usf-health-cornell-earns-inaugural-nih-funding-to-create-artificial-intelligence-platform-for-using-voice-to-diagnose-disease/
McTear, M., Jokinen, K., Dubey, M., Chollet, G., Boudy, J., Lohr, C., Roelen, S. D., Mossing, W., Wieching, R. (2022, June 17). Empowering Well-Being Through Conversational Coaching for Active and Healthy Ageing. ICOST 2022. Lecture Notes in Computer Science, vol 13287. Springer, Cham. https://doi.org/10.1007/978-3-031-09593-1_21
Mei, J., Desrosiers, C., Frasnelli, J. (2021, May 06). Machine learning for the diagnosis of Parkinson’s disease: A Review of Literature. Frontiers in Aging Neuroscience. 13. https://doi.org/10.3389/fnagi.2021.633752
Next Move Strategy Consulting. (2022). Patient monitoring market by product, by type and by end user - Global opportunity analysis and industry forecast 2022-30. In Research and Markets (ID: 5601842). https://www.researchandmarkets.com/reports/5601842/patient-monitoring-market-by-product-by-type?utm_code=fz89hg&utm_exec=jamu273prd
National Institutes of Health. (2022, September 13). NIH launches Bridge2AI program to expand the use of artificial intelligence in biomedical and behavioral research [Press Release]. https://www.nih.gov/news-events/news-releases/nih-launches-bridge2ai-program-expand-use-artificial-intelligence-biomedical-behavioral-research
National Institute of Allergy and Infectious Diseases. (2020, December 16). NIH Joins NSF for Initiative Supporting Smart Health Research. NIH: National Institute of Allergy and Infectious Diseases. [Press Release]. https://www.niaid.nih.gov/grants-contracts/nsf-initiative-smart-health-research
O’Connell, H. (2022, December 7). Finding Biomarkers in Huntington’s Disease. Canary Speech. https://canaryspeech.com/blog/finding-biomarkers-in-huntingtons-disease
O’Connor, K., Nixon, C., Papp, C. (2022, July 11). Proposed changes to remote therapeutic monitoring reimbursement in the proposed 2023 Medicare physician fee schedule. Nixon Gwilt Law. https://nixongwiltlaw.com/nlg-blog/2022/2022/7/11/proposed-changes-to-remote-therapeutic-monitoring-reimbursement-in-the-proposed-2023-medicare-physician-fee-schedule
Open Voice Network. (2023). Ethical guidelines for voice experiences: A case for inclusivity and trustworthiness. https://openvoicenetwork.org/resources
Orlov, L. (2022, June). The state of voice-AI and older adults 2022. Aging and Health Technology Watch. https://www.ageinplacetech.com/files/aip/The%20State%20of%20Voice-AI%202.0%20and%20Older%20Adults%20-%20Final_0.pdf
Orduña, N. (2019, July 16). AI-driven companies need to be more diverse. Here’s why. World Economic Forum. https://www.weforum.org/agenda/2019/07/ai-driven-companiesneed-to-be-more-diverse-here-s-why/
Pew Research Center. (2021, April 7). [Mobile Fact Sheet]. Mobile phone ownership over time. https://www.pewresearch.org/internet/fact-sheet/mobile/#mobile-phone-ownership-over-time
Robin, J., Harrison, J., Kaufman, L., Rudzicz, F., Simpson, W., Yancheva, M. (2020). Evaluation of speech-based digital biomarkers: Review and recommendations. Digital Biomarkers, 4(3), 99-108. https://doi.org/10.1159/000510820
Sezgin, E., Huang, Y., Ramtekkar, U., & Lin, S. (2020). Readiness for voice assistants to support healthcare delivery during a health crisis and pandemic. npj Digital Medicine, 3(1). https://doi.org/10.1038/s41746-020-00332-0
Stroobants, J. et al., (2022, July 28). Severe shortage of caregivers at heart of European healthcare crisis. Le Monde. https://www.lemonde.fr/en/international/article/2022/07/28/severe-shortage-of-caregivers-at-heart-of-european-healthcare-crisis_5991700_4.html
Tracy, J.M., Ozkanca, Y., Atkins, D.C., & Hosseini Ghomi, R. (2020). Investigating voice as a biomarker: Deep phenotyping methods for early detection of Parkinson’s disease. Journal of Biomedical Informatics, 104, 103362. https://doi.org/10.1016/j.jbi.2019.103362
Vixen Labs. (2022). Voice Consumer Index 2022. https://vixenlabs.co/wp-content/uploads/2022/06/VixenLabs_VoiceConsumerIndex2022.pdf
World Health Organization. (2022, October 1). Ageing and Health. [Fact Sheet]. https://www.who.int/news-room/fact-sheets/detail/ageing-and-health
Yang, S., Lee, J., Sezgin, E., Bridge, J., Lin, S. (2021). Clinical advice by voice assistants on postpartum depression: Cross-sectional investigation using Apple Siri, Amazon Alexa, Google Assistant, and Microsoft Cortana. JMIR mHealth and uHealth. https://doi.org/10.2196/24045