All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Gracefully handle unknown message types
- Moved channel field to root of AddTranscript message for multichannel transcription.
- Deprecating speechmatics-python in favor of speechmatics-rt for real-time transcription and speechmatics-batch for batch transcription.
- Patch WebsocketClient to work with any websockets version >= 10.0
- Support requesting a temporary token (JWT) with region and client ref
- Moved channel_diarization_labels field from realtime transcription config to common class.
- Added missing flag to call_middleware for multichannel mode.
- Fixed non-multichannel sessions bugging out after adding multichannel support
- Fixed microphone transcription tests not working after adding multichan dz support
- Fixed version number not matching release, incremented new version number for hotfix.
- Support RT Multichannel and channel DZ
BREAKING CHANGE: Metrics functionality now requires explicit installation
Previously, all metrics dependencies (pyannote, pandas, jiwer, etc.) were installed by default. This change moves them to an optional '[metrics]' extra to reduce the default installation footprint.
- Move metrics dependencies to requirements-metrics.txt
- Configure extras_require in setup.py for optional installation
- Add graceful error handling in CLI when dependencies are missing
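
A minimal sketch of how an optional dependency can be guarded so a missing '[metrics]' extra yields a readable message instead of a raw traceback. The helper name `require_optional` and the exact wording of the message are illustrative assumptions, not the project's actual CLI code:

```python
import importlib


def require_optional(module_name, extra, package="speechmatics-python"):
    """Import an optional module, or raise an error naming the extra to install.

    Hypothetical helper for illustration; the real CLI may handle this differently.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        raise RuntimeError(
            f"'{module_name}' is required for this command. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc
```

A metrics command could call `require_optional("jiwer", "metrics")` before doing any work, so users who installed only the default package see which extra to add.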
- Support end-of-utterance messages (DEL-24982)
- cli: fix some config options not being set when defined in a config file: topic_detection_config and speaker_diarization_config
- Support for new parameters prefer_current_speaker and speaker_sensitivity in Speaker Diarization
- Support search/replace API (DEL-24399 DEL-24766)
- Introduce Mixed Error Rate to SM Metrics
- Language code getting added to URL query parameter
- Renamed extra_headers to additional_headers in websockets.connect() to support WebSockets version 14.0, as per documentation
- Updated speechmatics-python to require Python >= 3.9, aligning with WebSockets 14.0
- speechmatics-python 2.0.3 is the last version supporting Python 3.8
- Fixed the "unexpected keyword argument 'extra_headers'" error in websockets.connect() by updating requirements to allow websockets versions from 10.0 up to and including 13.1
- Added internal, Speechmatics only client message: GetSpeakers, and server message: SpeakersResult
- Added internal, Speechmatics only client method: send_message
- Refactor mutable default parameters in run function
- Remove deprecated speaker_change, channel_and_speaker_change, and speaker_change_sensitivity diarization options
- Remove speaker change deprecation warning
- Speaker change deprecation warning
- Disfluency option now exposed for batch.
- Support for adding extra headers for RT websocket
- AudioEventsConfig class now defaults to empty dict instead of empty list when types not provided
- Disfluency option is now backwards compatible.
- Support for removing words tagged as disfluency.
- Support for audio_events in Batch CLI.
- Support types whitelist for audio events.
- Support for volume_threshold audio filtering in transcription config
- Add audio_events_config to BatchTranscriptionConfig
- Add audio_events_config to BatchConfig.to_config method
- Proper flag handling for Audio Events
- Support for the Audio Events feature
- Rename metrics to asr_metrics
- Fix import errors for asr_metrics module
- Misc fixes for asr_metrics module
- Add metrics toolkit for transcription and diarization
- Add support for batch auto chapters
1.11.1 - 2023-10-19
- Improve upload speeds for files submitted with the batch client
- Retry requests in batch client on httpx.ProtocolError
- Remove generate-temp-token option from examples and examples in docs
1.11.0 - 2023-08-25
- Add support for batch topic detection
- Add support for batch sentiment analysis
- Add support for transcribing multiple files at once (submit_jobs)
1.9.0 - 2023-06-07
- Fix error when language provided is whitespace
- Add support for transcript summarization
- Example of using notifications
- Pass sdk information to batch and rt requests
- Add support for providing just auth_token to ConnectionSettings
- Use default URLs + .toml config in python sdk
- Fixed an issue in the batch client where jobs with fetch_url were not able to be submitted
- Fixed reading translation config from config file
- TranscriptionConfig.enable_partials defaults to False
- setting TranscriptionConfig.enable_partials bool value to a string raises exception
- Support for batch and realtime urls in config .toml files
- Added support for real-time translation
- Added --enable-translation-partials to enable partials for translation only
- Added --enable-transcription-partials to enable partials for transcription only
- Updated --enable-partials to enable partials for both transcription and translation
- Add support for multiple profiles to the CLI tool
1.7.0 - 2023-03-01
- Add support for language identification
- Fixed an issue where transcription_config was not correctly loaded from the JSON config file
- CLI transcript output now properly handles UTF-8
1.6.4 - 2023-02-14
- printing finals in cli now correctly deletes partials for that segment
1.6.3 - 2023-02-14
- Type annotation for BatchSpeakerDiarizationConfig.speaker_sensitivity
1.6.2 - 2023-02-07
- Always raise an exception on transcriber error
1.6.1 - 2023-02-02
- Fix inconsistency in docs
1.6.0 - 2023-02-02
- Add support for translation
- Raises ConnectionClosedException rather than returning when the websocket connection closes unexpectedly
- Add sphinx-argparse to docs build pipeline to auto-document the CLI tool
- Update the docs / help texts for the CLI tool
1.5.0 - 2023-01-13
- .toml config file support to set the auth token with CLI config set command
- CLI config unset command for removing properties from the toml file
- --generate-temp-token option to the set/unset config command and toml file
- Default URLs for self-service Batch and RT in the CLI
1.4.5 - 2023-01-03
- Documentation for base transcription config class _TranscriptionConfig
- Human-readable error outputs in the CLI
- Improved error types in HTTP requests to capture errors more clearly
- Remove excess logging on errors and allow developer to catch errors
- Use environment variable SM_MANAGEMENT_PLATFORM_URL before defaulting to production MP API URL
1.4.4 - 2022-12-06
- Check for error in submit job response
- URL ending in '/v2/' no longer returns a 404 error
- Perform non-blocking reads when reading chunks from a synchronous stream
- Add --config-file CLI argument to allow passing a whole TranscriptionConfig JSON file to the transcriber
- Changed github workflow trigger to released
- Add --generate-temp-token CLI argument to rt websocket setup to get temp token for rt authentication
- Add generate_temp_token optional boolean kwarg to connection settings, defaults to False
- Add new RT self-service runtime URL for eu2
- Add --print-json CLI argument to enable printing transcripts as JSON rather than text
- Add speechmatics.adapters module with support for performing JSON to text conversion
- Add support for language_pack_info in the RecognitionStarted message
- Restored positional language parameter to TranscriptionConfig.__init__
- Support for enable entities, speaker diarization sensitivity, channel diarization labels in batch
- Transformed command to follow the pattern of RT only for legacy compatibility
- Fix client crashing if 'url' parameter is omitted and now outputting informative message
- Changed diarization option <speaker_and_channel> to <channel_and_speaker_change> as that's what SaaS expects.
- Fix get-results to fetch the transcript
- Update batch delete job to return meaningful response
- Update documentation for RT speaker diarization.
- Add support for speaker diarization in RT, and support the max_speakers parameter
- Remove support for --n_best_limit parameter
- Remove unnecessary Version file use and updated documentation for batch_client
- Added support for Batch ASR client
- Add domain parameter
- Fix an issue with an unhandled task exception when using run_synchronously with a timeout.
- Remove default values from args parser for max-delay-mode and operating-point for backwards compatibility with older versions of RTC.
- Use later version of sphinx to generate docs (supports Python 3.10)
- Update Speechmatics logo
- Allow user to raise ForceEndSession from an event handler or middleware in order to forcefully end the transcription session early.
- Publish to pypi.org not test.pypi.org.
- Update helper text for enable-entities, max-delay, and max-delay-mode
- Support for choosing mode of operation for max_delay via max_delay_mode in transcription config.
- bump websockets dependency to 10.1 to get the fix for an issue it has with Python 3.10
- bump websockets dependency to 9.1
- Support for enabling inverse text normalization (ITN) entities via enable_entities in transcription config.
- operating_point CLI option validation and documentation
- operating_point CLI option and property in TranscriptionConfig
- Fix seq_no persisting across sessions
- Migrate from Travis CI to GitHub Actions
- Added authentication token support for RT-SaaS @rakeshv247.