AAC Board Speech Recognition API

A speech-to-text API designed for AAC (Augmentative and Alternative Communication) devices and applications. Optimized for low-latency voice command recognition to help developers integrate voice controls into games, apps, and assistive technologies.

Features

Multiple Recognition Engines - Google Speech Recognition + Vosk offline fallback
Command Mode - Optimized for short AAC commands, with faster response times for AAC devices
Confidence Scoring - Filter low-confidence recognitions
Word-Level Timing - Get start/end times for each recognized word
Standardized JSON Responses - Consistent camelCase API format
Request Logging - Optional consent-based logging for analytics
Game Integration Ready - Drop-in JavaScript module included
Privacy Focused - Offline recognition available, logging requires consent

New Features Implemented in This Release

Offline speech recognition fallback using Vosk models
AAC Command Mode for optimized short-utterance recognition
Word-level confidence scoring and timing metadata
Standardized camelCase JSON response formatting
Consent-based logging layer for diagnostic analytics
JavaScript game integration controller module
Docusaurus-based documentation site and embedded demo
Health reporting endpoint for service status visibility
Configurable environment behavior for deployment flexibility
Tic-Tac-Toe voice-controlled demonstration showcasing SDK usage

Quick Start - How to quickly run our API

# Clone the repository
git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

# Install dependencies
npm install
pip install SpeechRecognition vosk numpy scipy --break-system-packages

# Start the server
cd Initial_API
node index.js

The API is now running at http://localhost:8080

Test it:

# Health check
curl http://localhost:8080/health

# Upload audio for transcription
curl -X POST http://localhost:8080/upload \
  -F "audioFile=@your-audio.wav"

Installation

Prerequisites

Requirement	Version	Download
Node.js	16+	nodejs.org
Python	3.8+	python.org
npm	8+	Included with Node.js

Step 1: Clone Repository

git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

Step 2: Install Node.js Dependencies

npm install

Step 3: Install Python Dependencies

# Standard installation
pip install SpeechRecognition vosk numpy scipy

# If you get an externally-managed-environment error (Python 3.11+)
pip install SpeechRecognition vosk numpy scipy --break-system-packages

# Or use a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install SpeechRecognition vosk numpy scipy

Step 4: Download Vosk Model (Optional)

If Vosk fails to set up through the Python installation, you can instead download a model directly and unzip it inside the project folder. This also lets recognition work offline when internet access is a concern.

For offline speech recognition:

mkdir -p model && cd model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
cd ..

Other models available at alphacephei.com/vosk/models

Step 5: Start the Server

cd Initial_API
node index.js

📡 API Reference

Endpoints

Method	Endpoint	Description
`GET`	`/health`	Server health and status
`GET`	`/formats`	Supported audio formats
`POST`	`/upload`	Upload audio for transcription

GET `/health`

Returns server status, uptime, and service availability.

Response:

{
  "status": "ok",
  "timestamp": "2025-01-15T10:30:00.000Z",
  "uptime": 3600,
  "uptimeFormatted": "1h 0m 0s",
  "version": "2.0.0",
  "services": {
    "speechRecognition": true,
    "logging": true
  },
  "supportedFormats": ["WAV", "MP3", "FLAC", "OGG", "M4A"],
  "endpoints": {
    "health": "/health",
    "upload": "/upload",
    "formats": "/formats"
  }
}
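A client may want to confirm the server is ready before sending audio. The sketch below reduces a `/health` payload to the fields a caller usually checks. It relies only on the field names in the sample response above; `summarize_health` and `fetch_health` are hypothetical helper names, not part of the API.

```python
import json
import urllib.request


def summarize_health(payload: dict) -> dict:
    """Reduce a /health payload to the fields a client usually checks."""
    services = payload.get("services", {})
    return {
        "ok": payload.get("status") == "ok",
        "version": payload.get("version"),
        # Names of services the server reports as unavailable
        "unavailable": sorted(name for name, up in services.items() if not up),
    }


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """GET /health and summarize it (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return summarize_health(json.load(resp))
```

Calling `fetch_health()` against a local server should return a small dictionary; an empty `unavailable` list means every reported service is up.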

GET `/formats`

Returns supported audio formats and optimal settings.

Response:

{
  "supportedFormats": ["WAV", "MP3", "FLAC", "AIFF", "OGG", "M4A", "RAW", "PCM"],
  "optimal": {
    "format": "WAV",
    "sampleRate": 16000,
    "bitDepth": 16,
    "channels": 1
  },
  "notes": [
    "WAV format recommended for lowest latency",
    "16kHz sample rate optimal for speech recognition",
    "Mono audio preferred (stereo will be converted)"
  ]
}
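Before uploading, it can help to check a WAV file against the optimal settings reported by `/formats`. The following is a minimal sketch using Python's standard `wave` module. The `OPTIMAL` values are copied from the sample response above, and `check_wav` and `silent_wav` are hypothetical helpers for illustration.

```python
import io
import wave

# Values copied from the "optimal" object in the sample /formats response
OPTIMAL = {"sampleRate": 16000, "bitDepth": 16, "channels": 1}


def check_wav(source) -> list:
    """Return human-readable mismatches; an empty list means the file is optimal.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        actual = {
            "sampleRate": wav.getframerate(),
            "bitDepth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
        }
    return [
        f"{key}: found {actual[key]}, optimal is {expected}"
        for key, expected in OPTIMAL.items()
        if actual[key] != expected
    ]


def silent_wav(rate: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short silent 16-bit WAV in memory, handy for trying check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf
```

For a real recording, `check_wav("recording.wav")` lists any settings that differ from the recommended 16 kHz, 16-bit, mono format.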

POST `/upload`

Upload an audio file for speech-to-text transcription.

Request:

Content-Type: multipart/form-data
Body: audioFile - The audio file to transcribe

Headers (Optional):

Header	Description
`x-command-mode`	Set to `"true"` for AAC command optimization
`x-user-id`	User identifier for logging
`x-session-id`	Session identifier (fallback for user-id)
`x-logging-consent`	Set to `"true"` to enable server-side logging

Example Request:

curl -X POST http://localhost:8080/upload \
  -H "x-command-mode: true" \
  -H "x-user-id: user123" \
  -F "audioFile=@recording.wav"
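The optional headers can be assembled in one place so that flags are only sent when enabled. This is a sketch under the assumption that the client sends `x-session-id` only when no user ID is available, mirroring the "fallback" wording in the table; `build_upload_headers` is a hypothetical helper name.

```python
from typing import Optional


def build_upload_headers(
    command_mode: bool = False,
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    logging_consent: bool = False,
) -> dict:
    """Build the optional /upload headers; flags are sent only when enabled."""
    headers = {}
    if command_mode:
        headers["x-command-mode"] = "true"
    if user_id:
        headers["x-user-id"] = user_id
    elif session_id:
        # Client-side choice (an assumption): send the session ID only
        # when there is no user ID, since it is described as a fallback.
        headers["x-session-id"] = session_id
    if logging_consent:
        headers["x-logging-consent"] = "true"
    return headers
```

The returned dictionary can be passed as the `headers` argument of an HTTP client such as `requests.post`.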

Success Response (200):

{
  "success": true,
  "transcription": "hello world",
  "confidence": 0.92,
  "service": "vosk",
  "processingTimeMs": 245,
  "audio": {
    "filename": "recording.wav",
    "size": 32000,
    "sizeBytes": 32000,
    "format": "WAV",
    "duration": 1.5,
    "sampleRate": 16000,
    "channels": 1,
    "mimeType": "audio/wav"
  },
  "request": {
    "timestamp": "2025-01-15T10:30:00.000Z",
    "device": "Desktop",
    "browser": "Chrome",
    "userAgent": "Mozilla/5.0..."
  },
  "aac": {
    "commandMode": true,
    "commandType": "communication",
    "isCommand": true,
    "suggestedActions": ["send_message", "repeat", "edit"]
  },
  "wordTiming": [
    { "word": "hello", "startTime": 0.12, "endTime": 0.45, "confidence": 0.94 },
    { "word": "world", "startTime": 0.48, "endTime": 0.82, "confidence": 0.90 }
  ]
}
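The `wordTiming` array makes it possible to spot individual words the recognizer was unsure about. The sketch below assumes the entry shape shown in the sample response; the 0.9 default threshold and the `low_confidence_words` name are illustrative choices, not values defined by the API.

```python
def low_confidence_words(result: dict, threshold: float = 0.9) -> list:
    """Return (word, confidence, duration_seconds) for words below the threshold."""
    flagged = []
    for entry in result.get("wordTiming", []):
        if entry["confidence"] < threshold:
            # Duration is derived from the start/end offsets, in seconds
            duration = round(entry["endTime"] - entry["startTime"], 3)
            flagged.append((entry["word"], entry["confidence"], duration))
    return flagged
```

A caller could use the flagged words to ask the user to confirm or repeat only the uncertain part of an utterance.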

Error Response (4xx/5xx):

{
  "success": false,
  "transcription": null,
  "processingTimeMs": 150,
  "error": {
    "code": "AUDIO_QUALITY_ISSUES",
    "message": "Audio appears silent or nearly silent",
    "details": [
      { "service": "google", "error": "Could not understand audio" },
      { "service": "vosk", "error": "No speech detected" }
    ]
  },
  "request": {
    "timestamp": "2025-01-15T10:30:00.000Z",
    "device": "Desktop",
    "browser": "Chrome"
  },
  "warnings": ["Audio volume is low"]
}
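Error responses carry a top-level code and message plus per-service details, which can be flattened into one line for logs. This is a sketch based only on the sample error shape above; `describe_error` is a hypothetical helper, and the fallback strings are assumptions for missing fields.

```python
def describe_error(result: dict) -> str:
    """Flatten an error response into a single log-friendly line."""
    error = result.get("error") or {}
    code = error.get("code", "UNKNOWN")
    message = error.get("message", "no message")
    parts = [f"[{code}] {message}"]
    # One entry per recognition service that was tried
    for detail in error.get("details", []):
        parts.append(f"{detail['service']}: {detail['error']}")
    for warning in result.get("warnings", []):
        parts.append(f"warning: {warning}")
    return " | ".join(parts)
```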

Response Format

All responses use camelCase keys for consistency.

Success Response Structure

Field	Type	Description
`success`	boolean	Whether transcription succeeded
`transcription`	string	Recognized text
`confidence`	number	Recognition confidence (0-1)
`service`	string	Recognition service used (`google`, `vosk`)
`processingTimeMs`	number	Processing time in milliseconds
`audio`	object	Audio file metadata
`request`	object	Request metadata
`aac`	object	AAC-specific information
`wordTiming`	array	Word-level timing (when available)
`user`	object	User identifier (if provided)
`warnings`	array	Non-fatal warnings
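Using the `success` and `confidence` fields, a client can discard uncertain results before acting on them. A minimal sketch follows; the 0.8 default and the `accepted_text` name are assumptions chosen for illustration.

```python
from typing import Optional


def accepted_text(result: dict, min_confidence: float = 0.8) -> Optional[str]:
    """Return the transcription only if the request succeeded with enough confidence."""
    if not result.get("success"):
        return None
    # A missing confidence value is treated as 0 so the result is rejected
    if result.get("confidence", 0.0) < min_confidence:
        return None
    return result.get("transcription")
```

A suitable threshold depends on the application: a game might tolerate lower confidence than a control that triggers an irreversible action.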

AAC Object

Field	Type	Description
`commandMode`	boolean	Whether command mode was enabled
`commandType`	string	Classified command type
`isCommand`	boolean	Whether recognized text is a known command
`suggestedActions`	array	Suggested follow-up actions

Command Types:

navigation - back, next, up, down, etc.
selection - select, choose, yes, no, etc.
communication - hello, thank you, help, etc.
media - play, pause, stop, etc.
freeform - Unclassified speech
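To make the categories concrete, here is a small client-side classifier built only from the example words listed above. It is an illustration, not the server's implementation: the server's full vocabulary and matching rules are not documented here, and `COMMAND_WORDS` and `classify_command` are hypothetical names.

```python
# Example words taken from the list above; the server's real vocabulary
# is larger ("etc.") and is not documented here.
COMMAND_WORDS = {
    "navigation": ("back", "next", "up", "down"),
    "selection": ("select", "choose", "yes", "no"),
    "communication": ("hello", "thank you", "help"),
    "media": ("play", "pause", "stop"),
}


def classify_command(text: str) -> str:
    """Return the first command type whose example phrase appears in the text."""
    normalized = " ".join(text.lower().split())
    # Pad with spaces so phrases match whole words only ("no" must not match "know")
    padded = f" {normalized} "
    for command_type, phrases in COMMAND_WORDS.items():
        if any(f" {phrase} " in padded for phrase in phrases):
            return command_type
    return "freeform"
```

This sketch does not strip punctuation, so input such as "hello, world" would fall through to `freeform`.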

Game Integration

We provide a drop-in JavaScript module for easy game integration.

Quick Integration

<script type="module">
  import { AACGameController } from './aac-voice-control.js';

  const voice = new AACGameController({
    apiUrl: 'http://localhost:8080',
    commandMode: true
  });

  // Map voice commands to game actions
  voice.mapCommand(['jump', 'hop'], () => player.jump());
  voice.mapCommand(['left', 'go left'], () => player.moveLeft());
  voice.mapCommand(['fire', 'shoot'], () => player.attack());

  // Or use common command mappings
  voice.mapCommonCommands({
    up: () => player.moveUp(),
    down: () => player.moveDown(),
    select: () => game.select(),
    pause: () => game.pause()
  });

  // Start listening
  voice.start();
</script>

Module Features

Continuous and single-shot listening modes
Multi-phrase command mapping
Confidence thresholds
Built-in UI panel (optional)
Event-based architecture

See aac-voice-control.js for full documentation.
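The multi-phrase mapping and confidence threshold ideas can also be sketched outside the browser. The Python class below is not the API of `aac-voice-control.js`; `CommandDispatcher`, `map_command`, and `dispatch` are hypothetical names showing one way to route an `/upload` result to a handler.

```python
class CommandDispatcher:
    """Route transcriptions to handlers, ignoring low-confidence results."""

    def __init__(self, min_confidence: float = 0.7):
        self.min_confidence = min_confidence
        self._handlers = {}

    def map_command(self, phrases, handler) -> None:
        """Register one handler for several equivalent phrases."""
        for phrase in phrases:
            self._handlers[phrase.strip().lower()] = handler

    def dispatch(self, result: dict) -> bool:
        """Run the handler for an /upload result; return True if one ran."""
        if not result.get("success"):
            return False
        if result.get("confidence", 0.0) < self.min_confidence:
            return False
        text = (result.get("transcription") or "").strip().lower()
        handler = self._handlers.get(text)
        if handler is None:
            return False
        handler()
        return True
```

Usage mirrors the JavaScript example: register phrases with `map_command(["jump", "hop"], player_jump)` and pass each parsed response to `dispatch`.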

Configuration

Environment Variables

Variable	Default	Description
`PORT`	`8080`	Server port
`VOSK_MODEL_PATH`	`model/vosk-model-small-en-us-0.15`	Path to Vosk model
`AAC_COMMAND_MODE`	`false`	Enable command mode by default
`PRELOAD_VOSK`	`true`	Preload Vosk model on startup
`NODE_ENV`	`development`	Environment (`production` disables auto-consent)

Example:

PORT=3000 VOSK_MODEL_PATH=./my-model node index.js
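A launcher or test script may need to know which values the server would end up with once overrides are applied. The sketch below mirrors the defaults from the table above; how the server itself reads these variables is not shown in this README, and `effective_config` is a hypothetical helper.

```python
import os

# Defaults copied from the Environment Variables table
DEFAULTS = {
    "PORT": "8080",
    "VOSK_MODEL_PATH": "model/vosk-model-small-en-us-0.15",
    "AAC_COMMAND_MODE": "false",
    "PRELOAD_VOSK": "true",
    "NODE_ENV": "development",
}


def effective_config(environ=None) -> dict:
    """Merge the documented defaults with any overrides found in the environment."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}
```

Passing a plain dictionary instead of `os.environ` makes it easy to preview a configuration without changing the real environment.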

Command Mode

Enable for AAC devices to optimize for short commands:

# Via header
curl -H "x-command-mode: true" ...

# Via environment
AAC_COMMAND_MODE=true node index.js

Benefits:

Faster recognition for short phrases
Limited vocabulary reduces errors
Optimized for common AAC commands

Testing

Run API Tests

npm test

Run Python Tests

python test.py                    # Run all tests
python test.py --audio file.wav   # Test specific file
python test.py --record           # Record from microphone
python test.py --command-mode     # Test with command mode

Manual Testing

# Health check
curl http://localhost:8080/health

# Upload test audio
curl -X POST http://localhost:8080/upload \
  -F "audioFile=@tests/TestRecording.wav"

# With command mode
curl -X POST http://localhost:8080/upload \
  -H "x-command-mode: true" \
  -F "audioFile=@tests/TestRecording.wav"

Examples

Basic Transcription (Node.js)

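// Requires the form-data and node-fetch packages.
// Use node-fetch v2 here: v3 is ESM-only and cannot be loaded with require().
const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch');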

async function transcribe(audioPath) {
  const form = new FormData();
  form.append('audioFile', fs.createReadStream(audioPath));

  const response = await fetch('http://localhost:8080/upload', {
    method: 'POST',
    body: form
  });

  const result = await response.json();
  
  if (result.success) {
    console.log('Transcription:', result.transcription);
    console.log('Confidence:', result.confidence);
  } else {
    console.error('Error:', result.error.message);
  }
}

transcribe('recording.wav');

Browser Integration

async function recordAndTranscribe() {
  // Get microphone access
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const chunks = [];

  mediaRecorder.ondataavailable = (e) => chunks.push(e.data);
  
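  mediaRecorder.onstop = async () => {
    // Release the microphone now that recording has finished
    stream.getTracks().forEach((track) => track.stop());

    // Browsers usually record WebM or OGG; check GET /formats to confirm
    // the server accepts the container you upload.
    const blob = new Blob(chunks, { type: 'audio/webm' });
    const formData = new FormData();
    formData.append('audioFile', blob, 'recording.webm');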

    const response = await fetch('http://localhost:8080/upload', {
      method: 'POST',
      headers: { 'x-command-mode': 'true' },
      body: formData
    });

    const result = await response.json();
    console.log(result.transcription);
  };

  // Record for 3 seconds
  mediaRecorder.start();
  setTimeout(() => mediaRecorder.stop(), 3000);
}

Python Client

import requests

def transcribe(audio_path, command_mode=False):
    url = 'http://localhost:8080/upload'
    
    headers = {}
    if command_mode:
        headers['x-command-mode'] = 'true'
    
    with open(audio_path, 'rb') as f:
        files = {'audioFile': f}
        response = requests.post(url, files=files, headers=headers)
    
    result = response.json()
    
    if result['success']:
        print(f"Transcription: {result['transcription']}")
        print(f"Confidence: {result['confidence']:.1%}")
        print(f"Service: {result['service']}")
    else:
        print(f"Error: {result['error']['message']}")
    
    return result

# Usage
transcribe('recording.wav', command_mode=True)

Known Bugs / Limitations in This Release

Speech accuracy may degrade under high background noise
Offline model load may cause a short delay during startup
Voice command misclassification can occur when speaking rapidly
Embedded Tic-Tac-Toe demo UI elements overlap on small screens
Logging may generate empty entries when the consent header is omitted
Some AAC vocabulary categories are limited or minimally trained
Offline model path configuration may require manual adjustment on Windows

Troubleshooting

Common Issues

EADDRINUSE: Port already in use

# Find process using port
lsof -i :8080

# Kill it
kill -9 <PID>

# Or use different port
PORT=8081 node index.js

Python module not found

# Install missing module
pip install SpeechRecognition --break-system-packages

# Or use virtual environment
python -m venv venv
source venv/bin/activate
pip install SpeechRecognition vosk numpy scipy

Vosk model not found

# Download and extract model
mkdir -p model && cd model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

Low recognition accuracy

Enable command mode: -H "x-command-mode: true"
Use WAV format at 16kHz mono
Reduce background noise
Speak clearly at moderate pace
Try the Vosk offline model

CORS errors in browser

The API includes CORS support. If issues persist:

// Ensure you're using the correct URL
const API_URL = 'http://localhost:8080';  // Not 127.0.0.1

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

# Install dependencies
npm install
pip install -r requirements.txt

# Run tests
npm test
python test.py

# Start in development mode
npm run dev

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

SpeechRecognition - Python speech recognition library
Vosk - Offline speech recognition
Express.js - Web framework for Node.js
Lily Ulrey - Created the project logo for the website

Contact

Issues: GitHub Issues
Discussions: GitHub Discussions

[Giovanni Muniz] • [Andrew Blass] • [Eric Smith] • [Kieran Plenn] • [Mohammed Eisa] • [Shrikanth Srenivasan]
