feat: Add proxy support to all cloud-based recognizers #867
HenryXiaoYang wants to merge 1 commit into Uberi:master
Add a `proxy_url` attribute on `Recognizer` (matching the existing `operation_timeout` pattern) and thread it through all network-making code paths. A centralized `speech_recognition/proxy.py` utility module provides helpers for urllib, httpx, requests, boto3, and gRPC.

- `proxy_url=None` uses system/env proxy settings (backward compatible)
- `proxy_url=""` explicitly disables proxies
- `proxy_url="http://host:port"` uses that proxy
- `proxy_url="socks5://host:port"` uses a SOCKS proxy (requires PySocks)
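
As a rough illustration of the three non-SOCKS cases above, the sketch below maps each `proxy_url` value onto a urllib opener. The function name `make_opener` is hypothetical and this is not the PR's `build_urllib_opener`; it only assumes the standard behavior of `urllib.request.ProxyHandler`, where an empty mapping turns off proxy autodetection. SOCKS handling is left out because it needs PySocks.

```python
from __future__ import annotations

from urllib.request import OpenerDirector, ProxyHandler, build_opener


def make_opener(proxy_url: str | None) -> OpenerDirector:
    """Hypothetical helper; an illustration, not the PR's implementation."""
    if proxy_url is None:
        # Default opener: its ProxyHandler reads system/env proxy settings.
        return build_opener()
    if proxy_url == "":
        # An empty mapping overrides autodetection, so no proxy is used.
        return build_opener(ProxyHandler({}))
    # Route both schemes through the explicitly configured proxy.
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))
```

Keeping `None` as "do nothing special" is what makes the attribute backward compatible: code that never sets `proxy_url` gets the same opener it would have had before.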
Pull request overview
Adds first-class proxy configuration to SpeechRecognition’s cloud recognizers by introducing a Recognizer.proxy_url attribute and routing all network calls through centralized proxy helpers.
Changes:
- Add `Recognizer.proxy_url` and thread it through urllib / requests / boto3 / httpx / gRPC call sites.
- Introduce `speech_recognition/proxy.py` with helpers for building proxy-aware clients/opener/config and a gRPC env context manager.
- Add documentation (README + library reference) and unit tests for the proxy utilities and new attribute.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| speech_recognition/proxy.py | New centralized proxy helpers for urllib/httpx/requests/boto3/gRPC. |
| speech_recognition/__init__.py | Adds proxy_url attribute to Recognizer and applies proxy-aware wrappers to legacy cloud recognizers. |
| speech_recognition/recognizers/google.py | Routes Google (legacy endpoint) urllib calls through urlopen_with_proxy. |
| speech_recognition/recognizers/google_cloud.py | Wraps Google Cloud gRPC calls with a proxy env context manager. |
| speech_recognition/recognizers/whisper_api/openai.py | Threads proxy config into OpenAI client creation via httpx client injection. |
| speech_recognition/recognizers/whisper_api/groq.py | Threads proxy config into Groq client creation via httpx client injection. |
| tests/test_proxy.py | Adds unit tests for proxy helper behavior and Recognizer.proxy_url presence. |
| README.rst | Documents proxy_url usage and semantics. |
| reference/library-reference.rst | Adds reference docs for recognizer_instance.proxy_url. |

    - def obtain_transcription(request: Request, timeout: int) -> str:
    + def obtain_transcription(request: Request, timeout: int, proxy_url: str | None = None) -> str:

obtain_transcription() now accepts proxy_url, but timeout is still annotated as int even though callers pass Recognizer.operation_timeout (documented as Union[float, None]). Update the signature to accept float | None to match the public attribute type and avoid type-checker inconsistencies.

Suggested change:

    - def obtain_transcription(request: Request, timeout: int, proxy_url: str | None = None) -> str:
    + def obtain_transcription(
    +     request: Request,
    +     timeout: float | None,
    +     proxy_url: str | None = None,
    + ) -> str:


    def urlopen_with_proxy(request: Request, timeout: int | None, proxy_url: str | None):
        """Drop-in replacement for ``urlopen()`` that respects *proxy_url*."""
        opener = build_urllib_opener(proxy_url)
        kwargs = {}
        if timeout is not None:

urlopen_with_proxy types timeout as int | None, but Recognizer.operation_timeout is documented as Union[float, None] and is passed through in several call sites. Update the annotation here (and any related ones) to accept float | None to match the public API and actual usage.
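
A minimal sketch of the annotation change, assuming a simplified wrapper (the name `open_with_timeout` is hypothetical and it carries no proxy logic). It accepts `float | None` and only forwards a timeout when one was set, so that `None` falls back to urllib's default rather than being passed through explicitly.

```python
from __future__ import annotations

from urllib.request import Request, build_opener


def open_with_timeout(request: Request, timeout: float | None):
    """Hypothetical wrapper illustrating the ``float | None`` annotation."""
    opener = build_opener()
    kwargs = {}
    if timeout is not None:
        # Forward the timeout only when set; otherwise urllib's default applies.
        kwargs["timeout"] = timeout
    return opener.open(request, **kwargs)


# A data: URL is resolved locally by urllib, so this runs without a network.
with open_with_timeout(Request("data:text/plain,hello"), timeout=1.5) as resp:
    body = resp.read()
```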

    return httpx.Client(proxy=None)

    return httpx.Client(proxy=proxy_url)

For proxy_url == "" the intent is to disable proxies, but httpx.Client(proxy=None) is also the default and can still honor environment proxies when trust_env=True (httpx default). Consider constructing the client with trust_env=False (and optionally set trust_env=False when an explicit proxy_url is provided) so the documented semantics are actually enforced.

Suggested change:

    - return httpx.Client(proxy=None)
    - return httpx.Client(proxy=proxy_url)
    + # Explicitly disable proxies: ignore environment proxies as well.
    + return httpx.Client(proxy=None, trust_env=False)
    + # Use the explicit proxy URL and ignore environment proxy settings.
    + return httpx.Client(proxy=proxy_url, trust_env=False)


    proxy_url = getattr(recognizer, "proxy_url", None)
    client_kwargs = {}
    if proxy_url is not None:
        from speech_recognition.proxy import build_httpx_client

        http_client = build_httpx_client(proxy_url)
        if http_client is not None:
            client_kwargs["http_client"] = http_client

This creates a new httpx.Client via build_httpx_client() on each call, but the client is never closed. Over time this can leak connections/file descriptors. Prefer reusing a single client per Recognizer (or per module) and closing it when done, or ensure the temporary client is closed after the request.
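
One possible shape for the "close the temporary client" option is sketched below. `StubHttpClient` is a stand-in invented for this example (it is not `httpx.Client`), and `transcribe` and `send` are hypothetical names; the only point is the pattern of registering `close()` so it runs after the request, including when the request raises.

```python
from contextlib import ExitStack


class StubHttpClient:
    """Stand-in for a real HTTP client; hypothetical, for illustration only."""

    def __init__(self, proxy):
        self.proxy = proxy
        self.closed = False

    def close(self):
        self.closed = True


def transcribe(proxy_url, send):
    """Call ``send(client_kwargs)`` and close any temporary client afterwards."""
    with ExitStack() as stack:
        client_kwargs = {}
        if proxy_url is not None:
            http_client = StubHttpClient(proxy=proxy_url or None)
            # Register close() so it runs even if send() raises.
            stack.callback(http_client.close)
            client_kwargs["http_client"] = http_client
        return send(client_kwargs)
```

Reusing one client per `Recognizer` would avoid repeated connection setup, at the cost of deciding who owns and eventually closes that client; the per-call version above trades some overhead for a simpler lifetime.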

    proxy_url = getattr(recognizer, "proxy_url", None)
    client_kwargs = {}
    if proxy_url is not None:
        from speech_recognition.proxy import build_httpx_client

        http_client = build_httpx_client(proxy_url)
        if http_client is not None:
            client_kwargs["http_client"] = http_client

Same as OpenAI: a new httpx.Client may be created per call and never closed. Consider reusing a shared/per-recognizer client or explicitly closing the temporary client after the transcription call to avoid resource leaks.

    opener = build_urllib_opener("socks5://proxy.example.com:1080")
    self.assertIsNotNone(opener)
    mock_handler.assert_called_once_with(
        2, "proxy.example.com", 1080, username=None, password=None
    )

This test replaces SocksiPyHandler with a plain MagicMock, but urllib.request.build_opener() expects handlers with a numeric handler_order and BaseHandler-like behavior; using a mock here can cause sorting/type errors inside build_opener. Consider patching speech_recognition.proxy.build_opener in the test and asserting it was called with the expected handler args, or set handler_order (int) on the mock handler instance.

Suggested change:

    - opener = build_urllib_opener("socks5://proxy.example.com:1080")
    - self.assertIsNotNone(opener)
    - mock_handler.assert_called_once_with(
    -     2, "proxy.example.com", 1080, username=None, password=None
    - )
    + with patch("speech_recognition.proxy.build_opener") as mock_build_opener:
    +     mock_opener = MagicMock()
    +     mock_build_opener.return_value = mock_opener
    +     opener = build_urllib_opener("socks5://proxy.example.com:1080")
    +     self.assertIs(opener, mock_opener)
    +     mock_handler.assert_called_once_with(
    +         2, "proxy.example.com", 1080, username=None, password=None
    +     )
    +     mock_build_opener.assert_called_once()


        },
    ):
        result = build_boto3_proxy_config("")
        mock_config_cls.assert_called_once_with(proxies={})

Variable result is not used.

Suggested change:

    - mock_config_cls.assert_called_once_with(proxies={})
    + mock_config_cls.assert_called_once_with(proxies={})
    + self.assertEqual(result, mock_config_cls.return_value)


            "botocore.config": mock_botocore_config,
        },
    ):
        result = build_boto3_proxy_config("http://proxy:8080")

Variable result is not used.

            os.environ.pop("http_proxy", None)

    def test_restores_on_exception(self):
        original = os.environ.get("http_proxy")

Variable original is not used.

    with grpc_proxy_env("http://proxy:8080"):
        raise TestError("boom")

    self.assertEqual(os.environ.get("http_proxy"), original)

This statement is unreachable.
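
One way to make the final assertion reachable (and to put `original` to use) is to wrap the raising block in `assertRaises`, so the exception is caught before the check runs. The sketch below is self-contained: `proxy_env` is a hypothetical stand-in for the PR's `grpc_proxy_env`, and the choice of `http_proxy` / `https_proxy` as the variables it sets is an assumption, not something taken from the PR's code.

```python
import contextlib
import os
import unittest


@contextlib.contextmanager
def proxy_env(proxy_url, names=("http_proxy", "https_proxy")):
    """Hypothetical stand-in for ``grpc_proxy_env``; variable names are assumed."""
    if proxy_url is None:
        yield  # Leave the environment untouched.
        return
    saved = {name: os.environ.get(name) for name in names}
    try:
        for name in names:
            if proxy_url == "":
                os.environ.pop(name, None)  # Explicitly disable proxies.
            else:
                os.environ[name] = proxy_url
        yield
    finally:
        # Restore the previous values even if the body raised.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


class RestoreOnExceptionTest(unittest.TestCase):
    def test_restores_on_exception(self):
        original = os.environ.get("http_proxy")
        with self.assertRaises(RuntimeError):
            with proxy_env("http://proxy:8080"):
                raise RuntimeError("boom")
        # Reached because assertRaises caught the exception above.
        self.assertEqual(os.environ.get("http_proxy"), original)
```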
Add a `proxy_url` attribute on `Recognizer` (matching the existing `operation_timeout` pattern) and thread it through all network-making code paths. A centralized `speech_recognition/proxy.py` utility module provides helpers for urllib, httpx, requests, boto3, and gRPC.

Solve issue: