
✨ (go-speech-transcriber): add Go Speech Transcriber application for real-time speech-to-text transcription #58

Open
jqueguiner wants to merge 2 commits into main from feat/go-sample

Conversation

@jqueguiner
Contributor

@jqueguiner commented Mar 6, 2025

📝 (README.md): create documentation for installation, usage, and features of the Go Speech Transcriber
🔧 (go.mod): add module dependencies for the Go Speech Transcriber application
💡 (go-speech-transcriber.go): implement core functionality for audio recording and transcription using Gladia API
✅ (tests): add tests for key components of the Go Speech Transcriber application

Summary by CodeRabbit

  • New Features

    • Launched a real-time speech-to-text application with multi-language support.
    • Added a user-friendly system tray interface and global keyboard shortcuts to control recording and text input.
  • Documentation

    • Introduced a comprehensive guide with detailed installation instructions, configuration options for API key integration, and troubleshooting tips across Windows, macOS, and Linux.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 6, 2025

Walkthrough

This pull request introduces a new speech transcription application built with Go. It delivers a comprehensive README outlining installation, configuration, and usage details, and adds a main source file implementing real-time speech-to-text conversion using the Gladia API. The implementation features components for audio recording, WebSocket communication, system tray interaction, and keyboard shortcuts. A new Go module file is also provided to manage dependencies and set the required Go version.

Changes

| File | Change Summary |
|------|----------------|
| go/go-speech-transcriber/README.md | Added comprehensive documentation covering installation prerequisites, usage instructions, API key configuration (via .env or command-line), multi-language support, keyboard shortcuts, system tray controls, and troubleshooting guidance. |
| go/go-speech-transcriber/go-speech-transcriber.go | Added core application implementation for real-time speech transcription. Introduces key types (Config, AudioTranscriptionService, SpeechTranscriber, GladiaRecorder, StatusBarApp, KeyListener) and associated methods, including audio recording via PortAudio, WebSocket handling, session initialization with Gladia API, UI updates, and keyboard event processing. |
| go/go-speech-transcriber/go.mod | Introduced a new Go module file that declares the project module (github.com/gladiaio/go-speech-transcriber), specifies Go version 1.20, and lists both direct and indirect dependencies required for audio processing, system tray integration, key event handling, and WebSocket communication. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant KeyListener
    participant StatusBarApp
    participant GladiaRecorder
    participant AudioTranscriptionService
    participant GladiaAPI

    User->>StatusBarApp: Launch application
    StatusBarApp->>KeyListener: Initialize key listener
    User->>KeyListener: Press start key
    KeyListener->>GladiaRecorder: Trigger audio recording
    GladiaRecorder->>AudioTranscriptionService: Send audio stream
    AudioTranscriptionService->>GladiaAPI: Initialize session & transmit audio
    GladiaAPI-->>AudioTranscriptionService: Return transcription results
    AudioTranscriptionService-->>GladiaRecorder: Forward transcription text
    GladiaRecorder->>StatusBarApp: Update UI with transcribed text
    StatusBarApp->>User: Display transcription
```

Poem

I hopped into code with a skip and a beat,
Translating sound to words oh-so-sweet.
With keys that tap and lights that gleam,
My code dances like a dream.
Cheers from a rabbit, happy and fleet! 🐇

Contributor

@coderabbitai (bot) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
go/go-speech-transcriber/README.md (2)

8-9: Avoid repetitive wording around "system tray".
The phrase "system tray" is used twice in close succession, which reads as redundant. Consider rewording one instance or merging the two bullets for clarity.

🧰 Tools
🪛 LanguageTool

[grammar] ~8-~8: This phrase is duplicated. You should probably use “system tray” only once.
Context: ...ge support** with language selection in system tray - System tray controls for easy access - **Keyboard...

(PHRASE_REPETITION)


70-70: Specify a language for the fenced code block.
Markdown guidelines recommend specifying a language (e.g. bash or shell) to improve syntax highlighting and readability.

````diff
-```
+```bash
 GLADIA_API_KEY=your_gladia_api_key
````

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

70-70: Fenced code blocks should have a language specified

(MD040, fenced-code-language)
go/go-speech-transcriber/go-speech-transcriber.go (3)

45-45: Graceful handling of session initialization errors.
Your session initialization uses a short 3-second timeout, which may be too tight for slow connections or high-latency networks. Consider increasing the timeout or making it configurable to avoid sporadic failures.


292-298: Clarify or remove the 'FIX THE BYTE CONVERSION' comment.
The existing logic that converts each 16-bit sample into two bytes in little-endian order looks correct. If there is no outstanding issue, remove or rephrase the comment to avoid confusion.

```diff
-// Convert buffer to bytes - FIX THE BYTE CONVERSION
+// Convert buffer to bytes in little-endian format
```

613-761: Potential over-reliance on platform-specific key codes.
Hardcoding values like cmd_l: 56 or alt: 3675 can cause unexpected behavior on some systems. Consider making these configurable or verifying them at runtime if the gohook library provides enumerations for system keys.
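One way to act on this suggestion, sketched under assumptions: keep the PR's hardcoded values (`cmd_l: 56`, `alt: 3675`) as defaults and let a hypothetical `KEYCODES` setting in comma-separated `name=code` form override them. The setting name, its format, and the `parseKeyCodes` helper are illustrative only; they are not part of the PR or of gohook.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// defaultKeyCodes holds the values hardcoded in the PR.
var defaultKeyCodes = map[string]uint16{
	"cmd_l": 56,
	"alt":   3675,
}

// parseKeyCodes copies the defaults and applies overrides from a
// comma-separated "name=code" string. Malformed entries return an error so
// a typo does not silently leave the wrong key bound.
func parseKeyCodes(spec string) (map[string]uint16, error) {
	codes := make(map[string]uint16, len(defaultKeyCodes))
	for k, v := range defaultKeyCodes {
		codes[k] = v
	}
	for _, entry := range strings.Split(spec, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		name, value, ok := strings.Cut(entry, "=")
		if !ok {
			return nil, fmt.Errorf("key code entry %q: expected name=code", entry)
		}
		code, err := strconv.ParseUint(strings.TrimSpace(value), 10, 16)
		if err != nil {
			return nil, fmt.Errorf("key code entry %q: %w", entry, err)
		}
		codes[strings.TrimSpace(name)] = uint16(code)
	}
	return codes, nil
}

func main() {
	codes, err := parseKeyCodes(os.Getenv("KEYCODES"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(codes)
}
```

Copying the defaults first means an empty or partial setting still yields a complete map, and the package-level defaults are never mutated.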

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75368d7 and dfc2717.

⛔ Files ignored due to path filters (1)
  • go/go-speech-transcriber/go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • go/go-speech-transcriber/README.md (1 hunks)
  • go/go-speech-transcriber/go-speech-transcriber.go (1 hunks)
  • go/go-speech-transcriber/go.mod (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • go/go-speech-transcriber/go.mod
🧰 Additional context used
🪛 golangci-lint (1.62.2)
go/go-speech-transcriber/go-speech-transcriber.go

19-19: could not import github.com/getlantern/systray (-: # github.com/getlantern/systray
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:78:2: undefined: nativeLoop
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:106:2: undefined: registerSystray
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:111:14: undefined: quit
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:136:2: undefined: addSeparator
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:190:2: undefined: hideMenuItem
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:195:2: undefined: showMenuItem
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray.go:220:2: undefined: addOrUpdateMenuItem
/tmp/go/.go-mod-cache/github.com/getlantern/systray@v1.2.2/systray_linux.go:8:2: undefined: SetIcon)

(typecheck)


20-20: could not import github.com/gordonklaus/portaudio (-: build constraints exclude all Go files in /tmp/go/.go-mod-cache/github.com/gordonklaus/portaudio@v0.0.0-20221027163845-7c3b689db3cc)

(typecheck)


22-22: could not import github.com/micmonay/keybd_event (-: # github.com/micmonay/keybd_event
/tmp/go/.go-mod-cache/github.com/micmonay/keybd_event@v1.1.2/keybd_event.go:20:9: undefined: initKeyBD)

(typecheck)


23-23: could not import github.com/robotn/gohook (-: # github.com/robotn/gohook
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:51:10: undefined: addEvent
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:62:7: undefined: Start
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:75:18: undefined: KeyHold
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:83:18: undefined: KeyUp
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:95:22: undefined: KeyUp
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:96:4: undefined: End
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:113:7: undefined: Start
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:121:17: undefined: MouseMove
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:128:22: undefined: MouseDown
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:129:4: undefined: End
/tmp/go/.go-mod-cache/github.com/robotn/gohook@v0.40.0/event.go:129:4: too many errors)

(typecheck)


🔇 Additional comments (2)
go/go-speech-transcriber/go-speech-transcriber.go (2)

145-160: Consider concurrency-safe teardown in Stop().
Sending r.Done <- true and then sleeping relies on timing: if the transcription goroutine is slow to exit, it may still be running when Stop() returns. Consider a sync.WaitGroup or another explicit synchronization mechanism so Stop() returns only after the goroutine has finished.


762-822: Validate library compatibility and environment constraints.
The static analysis errors report undefined symbols in systray, portaudio, keybd_event, and gohook. These are often caused by an incomplete environment setup or missing build flags on certain platforms. Confirm that the required build constraints, OS packages, and cgo dependencies are in place.

Contributor

@coderabbitai (bot) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
go/go-speech-transcriber/README.md (3)

7-12: Avoid Repetitive "System Tray" References

The Features section includes two bullet points (lines 8 and 9) that both reference the system tray ("language selection in system tray" and "System tray controls for easy access"). Consider merging or rephrasing these to reduce redundancy and improve clarity.

🧰 Tools
🪛 LanguageTool

[grammar] ~8-~8: This phrase is duplicated. You should probably use “system tray” only once.
Context: ...ge support** with language selection in system tray - System tray controls for easy access - **Keyboard...

(PHRASE_REPETITION)


70-72: Specify Language for Fenced Code Block

The fenced code block showing the .env configuration does not have a language identifier. Specifying a language (for example, using bash or dotenv) would improve syntax highlighting and readability.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

70-70: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


111-113: Enhance Clarity with a Definite Article

In the Keyboard Controls section, the bullet point starting with "If using double_cmd option, press Right Command key twice quickly to toggle recording" would read more clearly with the insertion of the definite article. Consider revising it to "press the Right Command key twice quickly…".

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfc2717 and 607a6c3.

📒 Files selected for processing (1)
  • go/go-speech-transcriber/README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
go/go-speech-transcriber/README.md

[uncategorized] ~117-~117: You might be missing the article “the” here.
Context: ...cOS) to start/stop recording - If using double_cmd option, press Right Command key twice q...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~117-~117: You might be missing the article “the” here.
Context: ...ing - If using double_cmd option, press Right Command key twice quickly to toggle rec...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
go/go-speech-transcriber/README.md (1)

1-164: Comprehensive and Well-Structured Documentation

The README provides thorough and clear instructions covering the application's purpose, features, prerequisites, installation steps, platform-specific build commands, usage examples, and troubleshooting tips. The structure and level of detail are well-suited for users looking to get started with the Go Speech Transcriber.

Contributor

@nmorel left a comment


Can you change the root folder?
Move it to integrations-examples/speech-transcriber, for example.
The "language" folders show simple Gladia usage, whereas this is a complete tool rather than a simple example of using Gladia from Go.


Labels

None yet

Projects

None yet


2 participants