
feat: Add proxy support to all cloud-based recognizers #867

Open
HenryXiaoYang wants to merge 1 commit into Uberi:master from HenryXiaoYang:master

Conversation

@HenryXiaoYang

Add a proxy_url attribute on Recognizer (matching the existing operation_timeout pattern) and thread it through all network-making code paths. A centralized speech_recognition/proxy.py utility module provides helpers for urllib, httpx, requests, boto3, and gRPC.

  • proxy_url=None uses system/env proxy settings (backward compatible)
  • proxy_url="" explicitly disables proxies
  • proxy_url="http://host:port" uses that proxy
  • proxy_url="socks5://host:port" uses a SOCKS proxy (requires PySocks)
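
The three-way semantics listed above can be sketched with the standard library alone. This is a minimal illustration, not the PR's `speech_recognition/proxy.py`: the helper name `opener_for_proxy_url` is hypothetical, and SOCKS URLs are left out because they need PySocks.

```python
import urllib.request


def opener_for_proxy_url(proxy_url):
    """Return a urllib opener following the proxy_url convention (hypothetical helper)."""
    if proxy_url is None:
        # Default opener: its ProxyHandler picks up system/env proxy settings.
        return urllib.request.build_opener()
    if proxy_url == "":
        # An empty mapping replaces the default ProxyHandler, so no proxy is used.
        return urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # Route both plain and TLS traffic through the explicit HTTP proxy.
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

Keeping `None` as "do whatever urllib did before" is what makes the new attribute backward compatible: callers that never set it see no change in behavior.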

Solve issue:

Copilot AI review requested due to automatic review settings February 11, 2026 06:55

Copilot AI left a comment

Pull request overview

Adds first-class proxy configuration to SpeechRecognition’s cloud recognizers by introducing a Recognizer.proxy_url attribute and routing all network calls through centralized proxy helpers.

Changes:

  • Add Recognizer.proxy_url and thread it through urllib / requests / boto3 / httpx / gRPC call sites.
  • Introduce speech_recognition/proxy.py with helpers for building proxy-aware clients/opener/config and a gRPC env context manager.
  • Add documentation (README + library reference) and unit tests for the proxy utilities and new attribute.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| speech_recognition/proxy.py | New centralized proxy helpers for urllib/httpx/requests/boto3/gRPC. |
| speech_recognition/__init__.py | Adds proxy_url attribute to Recognizer and applies proxy-aware wrappers to legacy cloud recognizers. |
| speech_recognition/recognizers/google.py | Routes Google (legacy endpoint) urllib calls through urlopen_with_proxy. |
| speech_recognition/recognizers/google_cloud.py | Wraps Google Cloud gRPC calls with a proxy env context manager. |
| speech_recognition/recognizers/whisper_api/openai.py | Threads proxy config into OpenAI client creation via httpx client injection. |
| speech_recognition/recognizers/whisper_api/groq.py | Threads proxy config into Groq client creation via httpx client injection. |
| tests/test_proxy.py | Adds unit tests for proxy helper behavior and Recognizer.proxy_url presence. |
| README.rst | Documents proxy_url usage and semantics. |
| reference/library-reference.rst | Adds reference docs for recognizer_instance.proxy_url. |




- def obtain_transcription(request: Request, timeout: int) -> str:
+ def obtain_transcription(request: Request, timeout: int, proxy_url: str | None = None) -> str:

Copilot AI Feb 11, 2026

obtain_transcription() now accepts proxy_url, but timeout is still annotated as int even though callers pass Recognizer.operation_timeout (documented as Union[float, None]). Update the signature to accept float | None to match the public attribute type and avoid type-checker inconsistencies.

Suggested change
- def obtain_transcription(request: Request, timeout: int, proxy_url: str | None = None) -> str:
+ def obtain_transcription(
+     request: Request,
+     timeout: float | None,
+     proxy_url: str | None = None,
+ ) -> str:

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +87
def urlopen_with_proxy(request: Request, timeout: int | None, proxy_url: str | None):
    """Drop-in replacement for ``urlopen()`` that respects *proxy_url*."""
    opener = build_urllib_opener(proxy_url)
    kwargs = {}
    if timeout is not None:

Copilot AI Feb 11, 2026

urlopen_with_proxy types timeout as int | None, but Recognizer.operation_timeout is documented as Union[float, None] and is passed through in several call sites. Update the annotation here (and any related ones) to accept float | None to match the public API and actual usage.
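
As a rough sketch of the annotation this comment asks for, the snippet below forwards a `float | None` timeout to a urllib opener only when one is set. The name `open_with_timeout` and the optional `opener` parameter are assumptions made for illustration; the PR's own function builds its opener from `proxy_url` instead.

```python
from __future__ import annotations

import urllib.request


def open_with_timeout(request, timeout: float | None, opener=None):
    """Open *request*, passing timeout only when one is set (hypothetical helper)."""
    opener = opener or urllib.request.build_opener()
    kwargs = {}
    if timeout is not None:
        # urllib accepts fractional seconds, so float is the natural type here.
        kwargs["timeout"] = timeout
    return opener.open(request, **kwargs)
```

Omitting the keyword when `timeout` is `None` lets urllib fall back to its global default timeout rather than receiving an explicit `None`.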

Comment on lines +107 to +109
return httpx.Client(proxy=None)

return httpx.Client(proxy=proxy_url)

Copilot AI Feb 11, 2026

For proxy_url == "" the intent is to disable proxies, but httpx.Client(proxy=None) is also the default and can still honor environment proxies when trust_env=True (httpx default). Consider constructing the client with trust_env=False (and optionally set trust_env=False when an explicit proxy_url is provided) so the documented semantics are actually enforced.

Suggested change
- return httpx.Client(proxy=None)
- return httpx.Client(proxy=proxy_url)
+ # Explicitly disable proxies: ignore environment proxies as well.
+ return httpx.Client(proxy=None, trust_env=False)
+ # Use the explicit proxy URL and ignore environment proxy settings.
+ return httpx.Client(proxy=proxy_url, trust_env=False)

Comment on lines +57 to +65
proxy_url = getattr(recognizer, "proxy_url", None)
client_kwargs = {}
if proxy_url is not None:
    from speech_recognition.proxy import build_httpx_client

    http_client = build_httpx_client(proxy_url)
    if http_client is not None:
        client_kwargs["http_client"] = http_client

Copilot AI Feb 11, 2026

This creates a new httpx.Client via build_httpx_client() on each call, but the client is never closed. Over time this can leak connections/file descriptors. Prefer reusing a single client per Recognizer (or per module) and closing it when done, or ensure the temporary client is closed after the request.
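
One way to read the "reuse a single client" suggestion is a small cache that rebuilds the client only when the proxy setting changes and closes the one it replaces. The sketch below is an illustration under assumptions: `ProxyClientCache` is a hypothetical name, and `factory` stands in for whatever builds the real client (for example a wrapper around `build_httpx_client`); it only relies on the client having a `close()` method.

```python
class ProxyClientCache:
    """Reuse one HTTP client per proxy_url and close it when replaced (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory  # callable: proxy_url -> client with .close()
        self._proxy_url = None
        self._client = None

    def get(self, proxy_url):
        # Rebuild only when the proxy setting changed since the last call.
        if self._client is None or proxy_url != self._proxy_url:
            self.close()
            self._client = self._factory(proxy_url)
            self._proxy_url = proxy_url
        return self._client

    def close(self):
        # Release the pooled connections held by the current client, if any.
        if self._client is not None:
            self._client.close()
            self._client = None
```

A cache like this could live on the `Recognizer` instance, so repeated transcription calls share one connection pool instead of opening a new one each time.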

Comment on lines +53 to +61
proxy_url = getattr(recognizer, "proxy_url", None)
client_kwargs = {}
if proxy_url is not None:
    from speech_recognition.proxy import build_httpx_client

    http_client = build_httpx_client(proxy_url)
    if http_client is not None:
        client_kwargs["http_client"] = http_client

Copilot AI Feb 11, 2026

Same as OpenAI: a new httpx.Client may be created per call and never closed. Consider reusing a shared/per-recognizer client or explicitly closing the temporary client after the transcription call to avoid resource leaks.

Comment thread tests/test_proxy.py
Comment on lines +50 to +55
opener = build_urllib_opener("socks5://proxy.example.com:1080")
self.assertIsNotNone(opener)
mock_handler.assert_called_once_with(
    2, "proxy.example.com", 1080, username=None, password=None
)

Copilot AI Feb 11, 2026

This test replaces SocksiPyHandler with a plain MagicMock, but urllib.request.build_opener() expects handlers with a numeric handler_order and BaseHandler-like behavior; using a mock here can cause sorting/type errors inside build_opener. Consider patching speech_recognition.proxy.build_opener in the test and asserting it was called with the expected handler args, or set handler_order (int) on the mock handler instance.

Suggested change
- opener = build_urllib_opener("socks5://proxy.example.com:1080")
- self.assertIsNotNone(opener)
- mock_handler.assert_called_once_with(
-     2, "proxy.example.com", 1080, username=None, password=None
- )
+ with patch("speech_recognition.proxy.build_opener") as mock_build_opener:
+     mock_opener = MagicMock()
+     mock_build_opener.return_value = mock_opener
+     opener = build_urllib_opener("socks5://proxy.example.com:1080")
+ self.assertIs(opener, mock_opener)
+ mock_handler.assert_called_once_with(
+     2, "proxy.example.com", 1080, username=None, password=None
+ )
+ mock_build_opener.assert_called_once()

Comment thread tests/test_proxy.py
},
):
result = build_boto3_proxy_config("")
mock_config_cls.assert_called_once_with(proxies={})

Copilot AI Feb 11, 2026

Variable result is not used.

Suggested change
- mock_config_cls.assert_called_once_with(proxies={})
+ mock_config_cls.assert_called_once_with(proxies={})
+ self.assertEqual(result, mock_config_cls.return_value)

Comment thread tests/test_proxy.py
"botocore.config": mock_botocore_config,
},
):
result = build_boto3_proxy_config("http://proxy:8080")

Copilot AI Feb 11, 2026

Variable result is not used.

Comment thread tests/test_proxy.py
    os.environ.pop("http_proxy", None)

def test_restores_on_exception(self):
    original = os.environ.get("http_proxy")

Copilot AI Feb 11, 2026

Variable original is not used.

Comment thread tests/test_proxy.py
with grpc_proxy_env("http://proxy:8080"):
    raise TestError("boom")

self.assertEqual(os.environ.get("http_proxy"), original)

Copilot AI Feb 11, 2026

This statement is unreachable.
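
A possible shape for a restore-on-exception test that avoids the unreachable assertion is sketched below. It assumes the check sat inside an `assertRaises` block after the `raise`; moving it outside that block makes it run. The `proxy_env` context manager here is a hypothetical stand-in for the PR's `grpc_proxy_env`, and the choice of the `http_proxy` and `https_proxy` variables is an assumption.

```python
import os
import unittest
from contextlib import contextmanager


@contextmanager
def proxy_env(proxy_url):
    """Temporarily set proxy env vars, restoring them on exit (hypothetical stand-in)."""
    keys = ("http_proxy", "https_proxy")  # assumed variable names
    saved = {key: os.environ.get(key) for key in keys}
    try:
        if proxy_url is not None:
            for key in keys:
                os.environ[key] = proxy_url
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


class RestoreOnExceptionTest(unittest.TestCase):
    def test_restores_on_exception(self):
        original = os.environ.get("http_proxy")
        with self.assertRaises(RuntimeError):
            with proxy_env("http://proxy:8080"):
                raise RuntimeError("boom")
        # Outside the assertRaises block, so this assertion is reachable.
        self.assertEqual(os.environ.get("http_proxy"), original)
```

Placing the assertion after the `assertRaises` block also puts the `original` variable to use, which addresses the unused-variable note on the same test.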

2 participants