Skip to content

Conversation

@Dwij1704
Copy link
Member

@Dwij1704 Dwij1704 commented Jul 18, 2025


EntelligenceAI PR Summary

  • Migrated all HTTP and authentication flows to async using aiohttp (agentops/client/api/base.py, agentops/client/api/versions/v3.py, agentops/client/api/versions/v4.py, agentops/client/http/http_client.py)
  • Refactored client and SDK for dynamic, non-blocking JWT authentication and telemetry (agentops/client/client.py, agentops/sdk/core.py, agentops/sdk/exporters.py)
  • Improved error handling and logging for API key validation and JWT retrieval (agentops/config.py, agentops/validation.py)
  • Updated tests and dependencies to support async architecture (tests/unit/test_validation.py, pyproject.toml)
  • Enhanced maintainability, flexibility, and async compatibility across the codebase

@codecov
Copy link

codecov bot commented Jul 18, 2025

@areibman
Copy link
Contributor

Review Summary

🏷️ Draft Comments (18)

Skipped posting 18 drafted comments based on your review threshold. Feel free to update them here.

agentops/client/api/base.py (1)

39-39: The BaseApiClient retains a requests.Response object in self.last_response, which is unused and can cause unnecessary memory retention for large responses.

Scores:

  • Production Impact: 2
  • Fix Specificity: 3
  • Urgency Impact: 2
  • Total Score: 7

Reason for filtering: The comment correctly identifies that self.last_response is defined at line 39 and points out a legitimate performance concern. The attribute stores a requests.Response object that could retain large response data in memory unnecessarily. While we can't see the full class implementation to confirm it's completely unused, the comment raises a valid architectural concern about memory management in an async-designed client.

Analysis: This is a valid performance optimization suggestion. The line numbers are accurate (39), and the technical concern about memory retention from unused Response objects is legitimate. Production impact is low (2) since it's a gradual memory issue rather than a crash. Fix specificity is moderate (3) as it provides clear direction to remove the attribute but doesn't show the complete removal process. Urgency is low (2) since this is an optimization rather than a critical bug.


agentops/client/api/versions/v3.py (2)

66-67: fetch_auth_token returns None on any exception, causing silent authentication failures and breaking contract that expects AuthTokenResponse or error.

Scores:

  • Production Impact: 4
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 12

Reason for filtering: The comment correctly identifies a real issue at the specified line numbers. The method signature indicates it should return AuthTokenResponse but the exception handler returns None, which breaks the contract and causes silent failures. The suggested fix is technically sound and improves error handling.

Analysis: This is a valid correctness issue. The method's return type annotation suggests it should return AuthTokenResponse or raise an exception, but the current implementation silently returns None on any error. This can lead to difficult-to-debug authentication failures in production. The suggested fix properly logs the error and re-raises the exception, maintaining the expected contract. High production impact due to authentication being critical, perfect fix specificity with clear implementation, and medium urgency as it affects error handling but doesn't immediately break functionality.


27-67: fetch_auth_token does not validate or sanitize the api_key input, allowing attackers to supply malicious or malformed API keys that could be logged or used in backend requests, potentially leading to sensitive data exposure or backend abuse.

Scores:

  • Production Impact: 3
  • Fix Specificity: 4
  • Urgency Impact: 3
  • Total Score: 10

Reason for filtering: This is a valid security concern. The method accepts an api_key parameter without any validation and passes it directly to the backend request. The line numbers correctly identify the method (27-67), the bug description accurately describes a real security issue, and the commitable suggestion provides proper input validation without breaking existing functionality.

Analysis: The comment identifies a legitimate security vulnerability where unvalidated input could be exploited. While the production impact is moderate (3) since API keys are typically controlled inputs, the security implications are real. The fix is highly specific (4) with clear validation logic, and the urgency is medium (3) as it's a security issue that should be addressed but won't immediately break the system. The suggested validation (type check, length limit, alphanumeric constraint) is appropriate for API key inputs.


agentops/client/api/versions/v4.py (3)

110-115: upload_logfile accepts trace_id: str but the docstring and header construction use str(trace_id), which will silently convert non-string types and may mask bugs if an int is passed.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The line numbers 110-115 are incorrect - they point to docstring content, not the function signature or the problematic str(trace_id) usage which is actually at line 127. The commitable suggestion would break the code by overwriting docstring lines with duplicate function definition code. While the underlying issue about str(trace_id) is valid, the comment fails basic accuracy requirements for line number correspondence.

Analysis: This comment identifies a real issue (unnecessary str() conversion of already-typed string parameter) but fails on implementation details. The incorrect line numbers and unsafe suggestion make it unsuitable for direct application, despite the valid underlying concern about type safety.


66-69: The post method always sends the request body as JSON with a single body key, which may cause unnecessary serialization overhead and limits flexibility for large or already-serialized payloads.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion would break the code. The method signature specifies body: Union[str, bytes], but the suggestion checks for isinstance(body, (dict, list)) which are not part of the allowed types. This creates a logical inconsistency where the code checks for types that should never be passed according to the method's contract.

Analysis: While the comment correctly identifies that wrapping the body in json={'body': body} is unusual, the suggested fix is incompatible with the method's type signature. The method explicitly accepts only str or bytes, but the suggestion assumes dict or list inputs. This mismatch would either cause runtime errors or violate the method's contract. The comment fails the commitable suggestion safety check.


71-106: upload_object and upload_logfile do not validate or sanitize the body parameter, allowing attackers to upload arbitrary content, which could lead to storage of malicious files or code execution if downstream consumers process these files unsafely.

Scores:

  • Production Impact: 3
  • Fix Specificity: 4
  • Urgency Impact: 3
  • Total Score: 10

Reason for filtering: This is a valid security concern. The upload_object method at lines 71-106 does indeed accept arbitrary content without validation or sanitization. The comment correctly identifies that the body parameter (Union[str, bytes]) is processed and uploaded without any content filtering, which could allow malicious content to be stored. The commitable suggestion provides a reasonable approach to mitigate this by adding basic pattern matching for common dangerous content patterns.

Analysis: The security vulnerability is real - the method accepts any content and uploads it without validation. However, the production impact is moderate (3) because the actual risk depends on how downstream systems process the uploaded content. The fix is quite specific (4) with clear validation patterns and proper error handling. Urgency is medium (3) as this is a security issue but not immediately critical unless the system is actively being exploited. The suggested validation approach is practical and preserves existing functionality while adding necessary security checks.


agentops/client/client.py (2)

1015-1016: _fetch_auth_async swallows all exceptions and returns None, causing silent authentication failures and making debugging impossible if the async task fails unexpectedly.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The structured context is empty, making it impossible to verify that the reported issue exists at the specified line numbers (1015-1016) in agentops/client/client.py. Without the actual code context, I cannot confirm that the _fetch_auth_async method exists, that it has the problematic exception handling described, or that the line numbers are correct. This violates the fundamental requirement to verify issues against the structured_context.

Analysis: Cannot evaluate the technical merit of this comment due to missing structured context. The comment describes a legitimate-sounding issue (silent exception swallowing in authentication), but without the actual code to verify against, it must be removed according to the evaluation criteria that prioritize verification against structured_context.


118-146: _start_auth_task creates a new thread and event loop for every call when no event loop is running, which can lead to thread/resource leaks if called repeatedly (e.g., in a server or multi-init context).

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: While the bug description accurately identifies a real thread leak issue in the _start_auth_task method, the commitable suggestion would break the code. The suggestion references self._auth_thread without initializing this attribute in new or init, which would cause AttributeError at runtime when the hasattr checks and thread operations are performed.

Analysis: The comment correctly identifies a thread leak vulnerability where repeated calls to _start_auth_task create new background threads without cleanup. However, the proposed fix introduces a critical bug by referencing an uninitialized self._auth_thread attribute, making the suggestion unsafe to apply.


agentops/client/http/http_client.py (2)

113-169: async_request does not handle HTTP redirects, so requests to endpoints that return 3xx will not follow redirects and may silently fail, breaking compatibility with APIs that use redirects.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 2
  • Total Score: 10

Reason for filtering: Valid comment identifying a real issue. The async_request method at lines 113-169 does not explicitly handle HTTP redirects, which could cause silent failures when APIs return 3xx responses. The commitable suggestion provides a comprehensive fix with proper redirect handling, limits, and logging.

Analysis: Production impact is moderate (3) because while redirect failures could break API integrations, many modern HTTP libraries handle redirects automatically. Fix specificity is excellent (5) with a complete, immediately applicable solution including redirect limits, proper status code handling, and method conversion for 303 redirects. Urgency is low-medium (2) as this is a compatibility issue that should be addressed but won't cause immediate system failures.


112-169: async_request does not validate or sanitize the url parameter, allowing attackers to supply arbitrary URLs and potentially exfiltrate sensitive data or perform SSRF if user input is passed directly.

Scores:

  • Production Impact: 4
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 12

Reason for filtering: This is a valid security concern. The async_request method at lines 112-169 does indeed accept a url parameter without any validation and passes it directly to session.request(). This creates a potential SSRF (Server-Side Request Forgery) vulnerability if user-controlled input reaches this method. The commitable suggestion properly adds URL validation using urllib.parse.urlparse to check for valid scheme and netloc, which is a standard security practice.

Analysis: High production impact (4) because SSRF vulnerabilities can lead to serious security breaches including internal network scanning, data exfiltration, and access to internal services. Perfect fix specificity (5) as the suggestion provides complete, ready-to-use code with proper URL validation. Medium urgency (3) because while this is a security issue, it requires an attack vector where untrusted user input reaches this method, making it not immediately critical but important to address in the near term.


agentops/config.py (2)

193-193: self.default_tags = set(default_tags) will raise a TypeError if default_tags is already a set, as set(set) is valid but may not be intended and could cause subtle bugs if a set of sets is ever passed.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The comment contains a fundamental technical error. The claim that 'set(set)' will raise a TypeError is incorrect. In Python, calling set() on an existing set creates a new set with the same elements - it does not create a 'set of sets' as claimed. The code set(default_tags) works correctly whether default_tags is a list, tuple, set, or any other iterable. The comment is based on a misunderstanding of Python's set() constructor behavior.

Analysis: The review comment demonstrates a misunderstanding of Python's set() constructor. When you call set() on an existing set, it creates a new set with the same elements, not a 'set of sets'. The current code is technically correct and handles all iterable inputs properly. The suggested change adds unnecessary complexity and a type check that provides no benefit. This is a false positive that should be removed to avoid confusing developers.


245-266: The dict() method includes self.default_tags (a set), which is not JSON serializable and can cause repeated serialization overhead or runtime errors for large tag sets.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 2
  • Total Score: 10

Reason for filtering: The comment correctly identifies a real issue: sets are not JSON serializable by default, which could cause runtime errors when this dict is serialized. The line numbers match the actual code location, and the suggestion to convert the set to a list is technically sound and safe.

Analysis: This is a valid technical issue. While Python sets are not JSON serializable by default, the dict() method returns self.default_tags as-is, which could cause TypeError when serialized. The fix is specific and correct (convert to list). Production impact is moderate since it depends on usage context - if this dict is never JSON serialized, no issue occurs, but if it is, it will fail. Urgency is low since it's a potential rather than immediate issue.


agentops/sdk/core.py (2)

132-134: AuthenticatedOTLPExporter is used for traces but not for metrics, so metrics exporter may not refresh JWT if token changes, causing authentication failures for metrics after token rotation.

Scores:

  • Production Impact: 4
  • Fix Specificity: 4
  • Urgency Impact: 3
  • Total Score: 11

Reason for filtering: This is a valid technical issue. The code uses AuthenticatedOTLPExporter for traces (which handles JWT refresh) but uses standard OTLPMetricExporter with static headers for metrics. After JWT token rotation, metrics authentication would fail while traces continue working. The line numbers correctly identify the metrics exporter setup, and the suggested solution is technically sound.

Analysis: High production impact (4) because authentication failures for metrics after token rotation would cause data loss and monitoring gaps. High fix specificity (4) as the suggestion provides a complete, implementable solution with a custom exporter class. Medium urgency (3) because while it's a real issue, it only manifests after JWT token rotation, which may not happen immediately but should be addressed soon to prevent future authentication failures.


272-304: The shutdown method does not clear self._active_traces after ending all traces, causing potential memory retention of trace contexts after shutdown in long-lived processes.

Scores:

  • Production Impact: 2
  • Fix Specificity: 5
  • Urgency Impact: 2
  • Total Score: 9

Reason for filtering: This is a valid memory management issue. The shutdown method ends all active traces but never clears the self._active_traces dictionary, which could retain references to trace contexts after shutdown. The suggested fix properly adds a thread-safe clear operation after ending traces, which is appropriate for cleanup during shutdown.

Analysis: The comment correctly identifies that self._active_traces is not cleared during shutdown, which could cause memory retention in long-lived processes. While this won't cause crashes (production_impact: 2), it's a legitimate memory leak concern. The fix is very specific and immediately applicable (fix_specificity: 5) with proper thread safety. The urgency is low-medium (urgency_impact: 2) as it only affects long-lived processes and won't break functionality immediately.


agentops/sdk/exporters.py (1)

112-129: The export method repeatedly copies and clears the entire self._session.headers dict for every export, which can be expensive under high-throughput tracing workloads.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion is functionally identical to the existing code and provides no performance improvement. Both dict(self._session.headers) and self._session.headers.copy() create a shallow copy of the dictionary with the same computational complexity. The suggestion changes the implementation without addressing the underlying performance concern described in the bug_description.

Analysis: While the comment correctly identifies a potential performance issue with repeatedly copying and clearing headers in high-throughput scenarios, the proposed fix is essentially a no-op change. The suggestion replaces dict(self._session.headers) with self._session.headers.copy() - both methods create shallow copies with identical performance characteristics. A meaningful fix would require a more substantial refactoring to avoid header manipulation entirely or use a more efficient approach, making this suggestion misleading and unhelpful.


agentops/validation.py (3)

91-99: get_jwt_token_sync uses asyncio.run inside a thread pool, which can deadlock or crash if called from an existing event loop (e.g., in async code or Jupyter).

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion contains a critical flaw that would break the code. It suggests using loop.run_until_complete(get_jwt_token(api_key)) when an event loop is already running, but this will cause a RuntimeError because you cannot call run_until_complete on a running event loop. The current implementation correctly uses a ThreadPoolExecutor to avoid this exact issue by running the async code in a separate thread with its own event loop.

Analysis: While the bug description correctly identifies a potential issue with asyncio.run in certain contexts, the proposed solution is fundamentally flawed and would introduce a more severe bug. The current ThreadPoolExecutor approach is actually a valid pattern for running async code from sync contexts. The suggestion would replace working code with broken code that throws RuntimeError when called from async contexts.


72-73: get_jwt_token and get_jwt_token_sync log full exception messages (logger.warning(f"Failed to get JWT token: {e} ...")) which may expose sensitive API keys or internal error details to logs, risking sensitive data exposure if logs are accessible.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 11

Reason for filtering: This is a valid security concern. The code at lines 72-73 does log the full exception object using f-string interpolation (logger.warning(f"Failed to get JWT token: {e} - continuing without authentication")), which could potentially expose sensitive information like API keys or internal system details in the logs. The suggested fix appropriately removes the exception details while maintaining the informational value of the log message.

Analysis: This is a legitimate security issue where exception details could leak sensitive information to logs. The production impact is moderate (3) as it depends on log accessibility and exception content. The fix is very specific (5) with a clear, direct replacement. Urgency is medium (3) as it's a security concern but not immediately system-breaking. The suggested change maintains functionality while improving security posture.


101-101: get_jwt_token_sync logs exception details (logger.warning(f"Failed to get JWT token synchronously: {e}")) which may leak sensitive information to logs if exceptions contain API keys or internal errors.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 11

Reason for filtering: This is a valid security concern. The code at line 101 does log the full exception object using f-string interpolation, which could potentially expose sensitive information like API keys, tokens, or internal system details in the logs. The suggested fix appropriately removes the exception details while maintaining useful logging for debugging purposes.

Analysis: The comment correctly identifies a security issue where exception details are logged, which is a common vector for information leakage. Production impact is moderate (3) as it depends on what sensitive data might be in exceptions and log access controls. Fix specificity is high (5) with a clear, direct replacement. Urgency is medium (3) as it's a security concern but not immediately critical unless logs are compromised or contain highly sensitive data.


@areibman
Copy link
Contributor

Walkthrough

This PR refactors the AgentOps client and SDK to support asynchronous workflows throughout the authentication, HTTP, and telemetry layers. Core API clients, HTTP handling, and validation logic are modernized to use async methods and aiohttp, with improved error handling and dynamic JWT token management. The telemetry exporter and tracing setup now support runtime token refresh. Tests and configuration are updated to align with the new async architecture, and dependency management is updated to include aiohttp.

Changes

File(s) Summary
agentops/client/api/base.py Refactored BaseApiClient to use async HTTP methods and async_request with aiohttp; updated return types and docstrings for async behavior.
agentops/client/api/versions/v3.py Refactored fetch_auth_token to async, using async_request; updated error handling and imports for async support.
agentops/client/api/versions/v4.py Refactored V4 API client: added async post method, updated upload methods to use it, improved error handling, changed trace_id to str, and simplified response handling.
agentops/client/http/http_client.py Refactored HttpClient for async-first HTTP requests with aiohttp; added async session management and async_request; retained sync fallback for legacy support.
agentops/client/client.py Introduced async authentication handling, thread-safe JWT/project ID management, background auth initialization, and removed legacy sync logic.
agentops/config.py Modified configure to log warnings instead of raising on invalid API keys; removed InvalidApiKeyException import.
agentops/sdk/core.py Refactored telemetry to use dynamic jwt_provider, updated tracing setup, added update_config, improved shutdown, and switched to AuthenticatedOTLPExporter.
agentops/sdk/exporters.py Enhanced AuthenticatedOTLPExporter for dynamic JWT via jwt_provider, robust error handling, thread safety, and improved logging.
agentops/validation.py Refactored JWT retrieval to async with aiohttp, added sync wrapper, improved error handling, updated validation and LLM span logic, and enhanced summary reporting.
pyproject.toml Added aiohttp (>=3.8.0,<4.0.0) to dependencies for async HTTP support.
tests/unit/test_validation.py Updated tests to use get_jwt_token_sync, aligned with new error handling, and cleaned up imports.

Sequence Diagram

This diagram shows the interactions between components:

sequenceDiagram
    title BaseApiClient Asynchronous HTTP Operations

    participant Client as "Client Code"
    participant BaseApiClient
    participant HttpClient

    Note over BaseApiClient: Changed from synchronous to<br/>asynchronous HTTP methods

    %% GET request flow
    Client->>BaseApiClient: async get(path, headers)
    activate BaseApiClient
    BaseApiClient->>BaseApiClient: _get_full_url(path)
    BaseApiClient->>HttpClient: async_request(method="get", url, headers, timeout)
    activate HttpClient
    Note right of HttpClient: Replaced direct HTTP client<br/>with static HttpClient class
    HttpClient-->>BaseApiClient: response_data (Dict or None)
    deactivate HttpClient
    BaseApiClient-->>Client: JSON response as dictionary
    deactivate BaseApiClient

    %% POST request flow
    Client->>BaseApiClient: async post(path, data, headers)
    activate BaseApiClient
    BaseApiClient->>BaseApiClient: _get_full_url(path)
    BaseApiClient->>HttpClient: async_request(method="post", url, data, headers, timeout)
    activate HttpClient
    HttpClient-->>BaseApiClient: response_data (Dict or None)
    deactivate HttpClient
    BaseApiClient-->>Client: JSON response as dictionary
    deactivate BaseApiClient

    %% Error handling
    Client->>BaseApiClient: async any_http_method()
    activate BaseApiClient
    BaseApiClient->>HttpClient: async_request()
    activate HttpClient
    Note over HttpClient: Exception occurs
    HttpClient-->>BaseApiClient: Exception
    deactivate HttpClient
    BaseApiClient-->>Client: Raise Exception with details
    deactivate BaseApiClient

    Note over Client, HttpClient: All HTTP methods (get, post, put, delete)<br/>now return Optional[Dict[str, Any]]<br/>instead of requests.Response
Loading

▶️AI Code Reviews for VS Code, Cursor, Windsurf
Install the extension

Note for Windsurf Please change the default marketplace provider to the following in the windsurf settings:

Marketplace Extension Gallery Service URL: https://marketplace.visualstudio.com/_apis/public/gallery

Marketplace Gallery Item URL: https://marketplace.visualstudio.com/items

Entelligence.ai can learn from your feedback. Simply add 👍 / 👎 emojis to teach it your preferences. More shortcuts below

Emoji Descriptions:

  • ⚠️ Potential Issue - May require further investigation.
  • 🔒 Security Vulnerability - Fix to ensure system safety.
  • 💻 Code Improvement - Suggestions to enhance code quality.
  • 🔨 Refactor Suggestion - Recommendations for restructuring code.
  • ℹ️ Others - General comments and information.

Interact with the Bot:

  • Send a message or request using the format:
    @entelligenceai + *your message*
Example: @entelligenceai Can you suggest improvements for this code?
  • Help the Bot learn by providing feedback on its responses.
    @entelligenceai + *feedback*
Example: @entelligenceai Do not comment on `save_auth` function !

Also you can trigger various commands with the bot by doing
@entelligenceai command

The current supported commands are

  1. config - shows the current config
  2. retrigger_review - retriggers the review

More commands to be added soon.

@areibman
Copy link
Contributor

Review Summary

🏷️ Draft Comments (19)

Skipped posting 19 drafted comments based on your review threshold. Feel free to update them here.

agentops/client/api/base.py (1)

39-39: self.last_response is set only in the old sync code and is now unused, but the attribute remains and may hold large response objects, causing unnecessary memory retention in long-lived clients.

Scores:

  • Production Impact: 2
  • Fix Specificity: 4
  • Urgency Impact: 2
  • Total Score: 8

Reason for filtering: The comment correctly identifies line 39 where self.last_response is initialized. The issue is technically accurate - this attribute appears to be unused legacy code that could cause memory retention issues in long-lived clients. The comment provides a clear fix direction (remove the attribute) and addresses a legitimate performance concern.

Analysis: This is a valid performance optimization comment. The line numbers are correct (39), and the technical assessment is accurate - unused attributes that hold response objects can cause memory leaks in long-lived applications. Production impact is low (2) since it's a gradual memory issue rather than a crash. Fix specificity is high (4) with clear removal instructions. Urgency is low (2) as it's an optimization rather than a critical bug.


agentops/client/api/versions/v3.py (3)

66-67: fetch_auth_token returns None on any exception, which can silently mask critical authentication or network errors, making it impossible for callers to distinguish between invalid credentials and system failures.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 11

Reason for filtering: The comment correctly identifies a real issue at the specified line numbers. Lines 66-67 do contain a bare except Exception: that returns None, which masks all exceptions and makes error diagnosis difficult. The commitable suggestion is syntactically correct and would improve error handling by logging and re-raising exceptions.

Analysis: This is a valid code quality issue. The current exception handling swallows all errors and returns None, making it impossible to distinguish between authentication failures and network/system errors. The suggested fix is specific and immediately applicable - it adds proper logging and re-raises exceptions for upstream handling. While this won't cause immediate production crashes, it impacts debugging and error handling capabilities, warranting moderate production and urgency scores.


27-67: fetch_auth_token does not cache or reuse tokens, causing a full network request for every call, which can significantly degrade performance under high-frequency authentication scenarios.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 2
  • Total Score: 10

Reason for filtering: This is a valid performance optimization comment. The line numbers correctly identify the fetch_auth_token method (lines 27-67), the bug description accurately identifies that the method performs a network request on every call without caching, and the commitable suggestion provides a working implementation that adds token caching while preserving all existing functionality.

Analysis: The comment identifies a legitimate performance issue where authentication tokens are fetched via network request on every call. The suggested fix adds simple in-memory caching that checks for an existing cached token before making the network request, and stores the response for future use. This is a moderate production impact (3) as it affects performance but won't cause crashes, has very high fix specificity (5) with a complete, ready-to-apply solution, and low urgency (2) since it's an optimization rather than a critical bug.


27-67: fetch_auth_token does not validate or sanitize the api_key input, allowing attackers to supply malicious payloads that could be logged or used in downstream requests, potentially leading to sensitive data exposure or injection attacks if the key is not strictly validated elsewhere.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion would break the code by introducing overly restrictive validation. API keys are not necessarily alphanumeric - they often contain special characters like hyphens, underscores, or other symbols. The suggestion api_key.isalnum() would reject many valid API keys, causing legitimate authentication to fail.

Analysis: While the security concern about input validation is theoretically valid, the proposed fix is fundamentally flawed. API keys commonly contain non-alphanumeric characters, and the isalnum() check would break legitimate usage. The comment fails the commitable suggestion safety check as it would introduce functional regressions by rejecting valid API keys that contain hyphens, underscores, or other common characters used in API key formats.


agentops/client/api/versions/v4.py (2)

66-69: post method always sends the request body as JSON with a body key, which may not match the API's expected payload format and can cause incorrect uploads or server errors.

Scores:

  • Production Impact: 4
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 12

Reason for filtering: This is a valid comment that correctly identifies a potential API integration issue. The line numbers are accurate, the suggestion is safe to apply, and the technical concern about wrapping the body in an unnecessary dictionary structure is legitimate and could cause server-side parsing issues.

Analysis: High production impact (4) because incorrect API payload format can cause server errors and failed requests. Perfect fix specificity (5) with a clear, direct code change. Medium urgency (3) as this affects API functionality but may not cause immediate system failure if the server happens to accept the current format.


33-52,89,127: prepare_headers is called on every upload, rebuilding the same headers dict for each request, which can be inefficient for high-frequency API usage.

Scores:

  • Production Impact: 2
  • Fix Specificity: 5
  • Urgency Impact: 2
  • Total Score: 9

Reason for filtering: This is a valid performance optimization comment. The line numbers correctly identify the prepare_headers method definition (33-52) and its usage in upload methods (89, 127). The issue accurately describes that headers are rebuilt on every request, and the commitable suggestion provides a safe optimization by caching base headers in init and using .copy() to avoid rebuilding the static parts repeatedly.

Analysis: This is a legitimate performance optimization for high-frequency API usage. The current code rebuilds the User-Agent header string and creates a new dict on every request. The suggested fix caches the base headers and only copies/modifies as needed. Production impact is low (2) since it's an optimization rather than a bug fix. Fix specificity is high (5) with clear, immediately applicable code changes. Urgency is low (2) as it's a performance enhancement that can be addressed when convenient.


agentops/client/client.py (2)

1015-1016: _fetch_auth_async swallows all exceptions and returns None, masking real authentication errors and making debugging impossible.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The structured context is empty, making it impossible to verify that the reported issue exists at the specified line numbers (1015-1016) in agentops/client/client.py. Without being able to confirm the presence of the _fetch_auth_async method or the exception handling code described in the bug_description, this comment cannot be validated and must be removed to avoid false positives.

Analysis: The comment describes a legitimate concern about exception swallowing in authentication code, and the suggested fix is technically sound (adding logging before returning None). However, without the structured context containing the actual code, it's impossible to verify that this issue exists at the specified location. The evaluation process requires confirmation against the structured_context, and since it's empty, the comment must be removed regardless of its potential validity.


118-146: The _start_auth_task method creates a new thread and event loop for async auth if no loop is running, which can lead to thread proliferation and resource waste if called repeatedly in rapid succession.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion contains a logical error that would cause an AttributeError. The condition 'not (self._auth_thread and self._auth_thread.is_alive())' will be evaluated even when hasattr returns False, attempting to access a non-existent attribute. This would break the code when applied.

Analysis: While the bug description correctly identifies a potential thread proliferation issue, the suggested fix has a critical flaw in its conditional logic that would introduce an AttributeError. The suggestion attempts to reference self._auth_thread before ensuring it exists, making it unsafe to apply.


agentops/client/http/http_client.py (3)

113-169: async_request does not handle HTTP redirects, so requests to endpoints that return 3xx will not follow redirects and may silently fail, breaking compatibility with APIs that use redirects.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The comment is technically incorrect. By default, aiohttp's ClientSession.request() automatically follows HTTP redirects (up to 10 redirects). The current implementation DOES handle redirects through aiohttp's default behavior. The bug description falsely claims that redirects are not handled when they actually are.

Analysis: This comment should be removed due to technical inaccuracy. The reviewer misunderstood aiohttp's default redirect handling behavior. The current code already handles redirects automatically through aiohttp's built-in functionality, making the suggested changes unnecessary and the bug report invalid.


46-67: get_session uses a double-checked locking pattern with threading.Lock, but the lock is only acquired after the first check, which can cause race conditions and unnecessary session creation under high concurrency.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The comment misunderstands the double-checked locking pattern. The current implementation is correct: the first check without the lock is an optimization to avoid acquiring the lock when the session already exists, while the second check inside the lock prevents race conditions. The suggested change would remove this performance optimization without providing any correctness benefit.

Analysis: This is a classic double-checked locking pattern that is implemented correctly. The first unlocked check is a performance optimization, and the second locked check prevents race conditions. The comment incorrectly identifies this as a bug when it's actually a well-established concurrency pattern.


160-164: async_request returns the full JSON response to the caller without filtering or sanitizing, which could expose sensitive data if the endpoint returns secrets or credentials in the response.

Scores:

  • Production Impact: 3
  • Fix Specificity: 4
  • Urgency Impact: 3
  • Total Score: 10

Reason for filtering: This is a valid security concern. The async_request method at lines 160-164 does return the full JSON response without any filtering, which could potentially expose sensitive data if API responses contain credentials or secrets. The line numbers correctly identify where the response data is parsed and returned. The commitable suggestion provides a reasonable approach to redact common sensitive field names.

Analysis: The comment identifies a legitimate security concern where sensitive data could be exposed through unfiltered API responses. Production impact is moderate (3) because it depends on what APIs are called and whether they return sensitive data. Fix specificity is high (4) as it provides a clear, implementable solution with specific field names to redact. Urgency is medium (3) since it's a security issue but requires specific conditions (sensitive data in API responses) to be exploitable. The suggestion is technically sound and wouldn't break existing functionality.


agentops/config.py (2)

193-193: self.default_tags is set to a set, but ConfigDict and downstream code may expect a list, causing serialization or type errors.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The comment incorrectly identifies a TypeError risk that doesn't exist. Line 193 is already inside an 'if default_tags is not None:' conditional block (line 192), so default_tags cannot be None at this point. The suggested fix is redundant and the bug description is based on a misunderstanding of the code structure.

Analysis: This is a false positive. The reviewer missed that the assignment is already protected by a None check in the conditional structure. The code correctly handles the None case by not executing this line when default_tags is None, making the suggested change unnecessary and the bug report inaccurate.


172-174: api_key is logged in warning messages with the first 8 characters exposed, risking partial sensitive data leakage in logs if logs are accessible to unauthorized users.

Scores:

  • Production Impact: 4
  • Fix Specificity: 5
  • Urgency Impact: 4
  • Total Score: 13

Reason for filtering: This is a valid security concern. The code at lines 172-174 does expose the first 8 characters of the API key in log messages, which constitutes a security vulnerability as partial sensitive data could be leaked in logs. The commitable suggestion properly removes the API key exposure while maintaining the informative nature of the warning message.

Analysis: High production impact (4) because API key leakage in logs is a serious security vulnerability that could lead to unauthorized access if logs are compromised. Perfect fix specificity (5) as the suggestion provides an exact, immediately applicable replacement that removes the security issue. High urgency (4) because security vulnerabilities should be addressed promptly, though the system won't immediately break if not fixed. The suggested fix maintains logging functionality while eliminating the sensitive data exposure.


agentops/sdk/core.py (2)

1116-1117: AuthenticatedOTLPExporter is used in setup_telemetry, but if jwt_provider is None, it may result in missing authentication headers, potentially causing authentication failures for endpoints that require JWT.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The structured context is empty, making it impossible to verify that the reported issue exists at the specified line numbers (1116-1117) in agentops/sdk/core.py. Without the actual code context, I cannot confirm that AuthenticatedOTLPExporter is being instantiated with a potentially None jwt_provider, or that this would cause the described authentication issues. The comment must be removed due to lack of verifiable context.

Analysis: Cannot evaluate the technical merit of this comment because the structured context is empty. The comment describes a potentially valid concern about JWT authentication, but without the actual code to verify against, it's impossible to confirm the issue exists at the specified location or that the suggested fix is appropriate. All scores are 0 due to removal.


278-280: The shutdown method does not clear self._active_traces after ending all traces, causing potential memory retention of trace contexts in long-lived processes.

Scores:

  • Production Impact: 2
  • Fix Specificity: 5
  • Urgency Impact: 2
  • Total Score: 9

Reason for filtering: This is a valid performance issue. The shutdown method ends all active traces but fails to clear the _active_traces dictionary, which could cause memory retention in long-lived processes. The line numbers are correct, the suggestion is safe and technically sound.

Analysis: Production impact is low-moderate (2) as this affects long-lived processes but won't cause crashes. Fix specificity is excellent (5) with a clear, immediately applicable solution. Urgency is low (2) as it's a gradual memory retention issue rather than an immediate failure. The fix is straightforward and the issue identification is accurate.


agentops/sdk/exporters.py (1)

111-129: export method repeatedly updates and restores self._session.headers for every export, causing thread contention and potential race conditions under high concurrency, which can degrade performance and reliability.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion would break the code by passing a 'headers' parameter to super().export(), which the parent OTLPSpanExporter.export() method does not accept, causing a TypeError.

Analysis: While the comment correctly identifies a real concurrency issue with shared session header mutation, the proposed fix is technically incorrect and would introduce a runtime error. The suggestion assumes the parent export method accepts headers as a parameter, which is not the case for OTLPSpanExporter.


agentops/validation.py (3)

93-99: get_jwt_token_sync uses asyncio.run inside a thread pool, which can deadlock or fail if called from an existing event loop (e.g., in async code or Jupyter), causing runtime failures.

Scores:

  • Production Impact: 0
  • Fix Specificity: 0
  • Urgency Impact: 0
  • Total Score: 0

Reason for filtering: The commitable suggestion contains a critical flaw that would cause deadlocks. The suggested code attempts to use loop.run_until_complete() on a running event loop, which will always fail with 'This event loop is already running' error. The current ThreadPoolExecutor approach is actually a correct way to handle async code from sync contexts.

Analysis: While the comment correctly identifies a potential issue with asyncio.run() in event loop contexts, the proposed solution would introduce worse problems by attempting to use run_until_complete() on an active event loop, causing guaranteed deadlocks. The current implementation using ThreadPoolExecutor is actually a valid pattern for this scenario.


285-353: validate_trace_spans uses a fixed sleep (time.sleep(retry_delay)) in a retry loop, which can cause unnecessary delays and blocks the thread, especially if max_retries is high or called frequently.

Scores:

  • Production Impact: 2
  • Fix Specificity: 4
  • Urgency Impact: 2
  • Total Score: 8

Reason for filtering: This is a valid performance optimization comment. The code does use a fixed sleep delay in the retry loop at line 352 (time.sleep(retry_delay)), and the suggestion to implement exponential backoff is technically sound and would improve the retry behavior by reducing unnecessary waiting time while being more respectful of the API being called.

Analysis: The comment correctly identifies a performance issue where fixed retry delays can cause inefficient waiting patterns. The exponential backoff suggestion is a well-established pattern that would improve API retry behavior. Production impact is low (2) since the current code works but isn't optimal. Fix specificity is high (4) as the exact implementation is provided. Urgency is low (2) since this is an optimization rather than a critical bug.


55-58: get_jwt_token does not validate or sanitize the api_key before sending it in a POST request, allowing an attacker to inject malicious payloads if untrusted input is passed, potentially leading to SSRF or API abuse.

Scores:

  • Production Impact: 3
  • Fix Specificity: 4
  • Urgency Impact: 3
  • Total Score: 10

Reason for filtering: This is a valid security concern. The code at lines 55-58 does send the api_key directly in a POST request without any validation or sanitization. While the API endpoint is controlled (agentops.ai), the lack of input validation could allow malicious payloads to be sent if untrusted input is passed to the function. The suggested validation (checking if it's a string, alphanumeric, and reasonable length) is appropriate for API keys and would prevent potential abuse.

Analysis: The security issue is real - the api_key parameter is used directly in the JSON payload without validation. While this is an internal function and the endpoint is controlled, API keys should still be validated for format and length to prevent potential abuse. The fix is specific and implementable, though the production impact is moderate since it requires malicious input to be passed to the function. The urgency is medium as it's a security concern but requires specific conditions to exploit.


@areibman
Copy link
Contributor

Walkthrough

This PR modernizes the AgentOps client and SDK by introducing asynchronous HTTP handling using aiohttp, refactoring authentication and telemetry flows to support dynamic JWT token management, and improving error handling and flexibility throughout. The update enhances compatibility with async applications, streamlines configuration and validation, and ensures robust, non-blocking authentication and telemetry export. Test and configuration files are updated to align with the new async-first architecture.

Changes

File(s) Summary
agentops/client/api/base.py Refactored BaseApiClient to use async HTTP methods and return JSON or None; updated docstrings and method names for async behavior.
agentops/client/api/versions/v3.py Refactored fetch_auth_token to async, updated to use async_request, simplified error handling, and updated imports.
agentops/client/api/versions/v4.py Refactored V4 API client: added async post method, rewrote upload methods for async, improved error handling, and updated return types.
agentops/client/client.py Added async authentication handling, background token fetching, refactored init for dynamic JWT, improved error handling, and removed legacy session code.
agentops/client/http/http_client.py Refactored HttpClient for async-first HTTP using aiohttp, added async_request, managed async/sync sessions, improved thread safety, and removed old session logic.
agentops/config.py Changed API key validation to log warnings instead of raising exceptions; removed InvalidApiKeyException import.
agentops/sdk/core.py Refactored telemetry to use dynamic JWT provider, replaced OTLP exporter, added update_config, improved shutdown, and updated method signatures.
agentops/sdk/exporters.py Enhanced AuthenticatedOTLPExporter for dynamic JWT, injected JWT per export, improved error handling, ensured thread safety, and removed static JWT/session code.
agentops/validation.py Refactored JWT retrieval to async (aiohttp), added sync wrapper, updated validation logic, improved error handling, and simplified LLM span detection.
pyproject.toml Added 'aiohttp' dependency (>=3.8.0, <4.0.0) for async HTTP support.
tests/unit/test_validation.py Updated tests to use get_jwt_token_sync, refactored mocks, changed failure expectations, and cleaned up imports.

Sequence Diagram

This diagram shows the interactions between components:

sequenceDiagram
    title BaseApiClient Asynchronous HTTP Operations

    participant Client as "Client Code"
    participant BaseApiClient
    participant HttpClient

    Note over BaseApiClient: Changed from synchronous to<br/>asynchronous HTTP methods

    %% POST request flow
    Client->>BaseApiClient: async post(path, data, headers)
    activate BaseApiClient
    BaseApiClient->>BaseApiClient: _get_full_url(path)
    BaseApiClient->>HttpClient: async_request(method="post", url, data, headers, timeout)
    activate HttpClient
    Note right of HttpClient: Replaced direct HTTP client<br/>with HttpClient static method
    HttpClient-->>BaseApiClient: response_data (Dict or None)
    deactivate HttpClient
    BaseApiClient-->>Client: JSON response as dictionary
    deactivate BaseApiClient

    %% GET request flow
    Client->>BaseApiClient: async get(path, headers)
    activate BaseApiClient
    BaseApiClient->>BaseApiClient: _get_full_url(path)
    BaseApiClient->>HttpClient: async_request(method="get", url, headers=headers, timeout)
    activate HttpClient
    HttpClient-->>BaseApiClient: response_data (Dict or None)
    deactivate HttpClient
    BaseApiClient-->>Client: JSON response as dictionary
    deactivate BaseApiClient

    %% Error handling
    Client->>BaseApiClient: async any_http_method()
    activate BaseApiClient
    BaseApiClient->>HttpClient: async_request()
    activate HttpClient
    Note over HttpClient: Exception occurs
    HttpClient-->>BaseApiClient: Exception
    deactivate HttpClient
    
    alt Exception handling
        BaseApiClient-->>Client: Raise Exception with details
    end
    deactivate BaseApiClient

    Note over Client, HttpClient: All HTTP methods (post, get, put, delete)<br/>now return Optional[Dict[str, Any]]<br/>instead of requests.Response
Loading

▶️AI Code Reviews for VS Code, Cursor, Windsurf
Install the extension

Note for Windsurf Please change the default marketplace provider to the following in the windsurf settings:

Marketplace Extension Gallery Service URL: https://marketplace.visualstudio.com/_apis/public/gallery

Marketplace Gallery Item URL: https://marketplace.visualstudio.com/items

Entelligence.ai can learn from your feedback. Simply add 👍 / 👎 emojis to teach it your preferences. More shortcuts below

Emoji Descriptions:

  • ⚠️ Potential Issue - May require further investigation.
  • 🔒 Security Vulnerability - Fix to ensure system safety.
  • 💻 Code Improvement - Suggestions to enhance code quality.
  • 🔨 Refactor Suggestion - Recommendations for restructuring code.
  • ℹ️ Others - General comments and information.

Interact with the Bot:

  • Send a message or request using the format:
    @entelligenceai + *your message*
Example: @entelligenceai Can you suggest improvements for this code?
  • Help the Bot learn by providing feedback on its responses.
    @entelligenceai + *feedback*
Example: @entelligenceai Do not comment on `save_auth` function !

Also you can trigger various commands with the bot by doing
@entelligenceai command

The current supported commands are

  1. config - shows the current config
  2. retrigger_review - retriggers the review

More commands to be added soon.

Dwij1704 and others added 2 commits July 21, 2025 08:31
…ical headers from being overridden by user-supplied values. Update tests to verify protection of critical headers and ensure proper JWT token usage.
@dot-agi dot-agi requested a review from Copilot July 24, 2025 12:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the AgentOps SDK by migrating from synchronous to asynchronous HTTP operations and improving authentication handling. The changes implement dynamic JWT token management, graceful error handling, and better logging throughout the system.

  • Migrated HTTP and authentication flows to async using aiohttp with fallback support
  • Refactored client and SDK for non-blocking JWT authentication with dynamic token updates
  • Enhanced error handling to be more graceful, preventing crashes on authentication failures

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/unit/test_validation.py Updated tests to use synchronous wrapper and mock async functions
tests/unit/sdk/test_exporters.py Enhanced security tests for header protection in OTLP exporter
pyproject.toml Added aiohttp dependency for async HTTP functionality
agentops/validation.py Migrated to async JWT token fetching with graceful error handling
agentops/sdk/exporters.py Implemented dynamic JWT authentication and header security protection
agentops/sdk/core.py Updated telemetry setup to support JWT providers and async operations
agentops/config.py Changed API key validation to log warnings instead of throwing exceptions
agentops/client/http/http_client.py Added async HTTP client methods with aiohttp support
agentops/client/client.py Implemented async authentication task management
agentops/client/api/versions/v4.py Enhanced V4 API client with better error handling
agentops/client/api/versions/v3.py Migrated V3 authentication to async with graceful failures
agentops/client/api/base.py Converted base API client methods to async


@patch("agentops.validation.requests.post")
def test_get_jwt_token_success(self, mock_post):
@patch("tests.unit.test_validation.get_jwt_token_sync")
Copy link

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch path 'tests.unit.test_validation.get_jwt_token_sync' is incorrect. It should patch 'agentops.validation.get_jwt_token_sync' since that's where the function is defined and imported from.

Suggested change
@patch("tests.unit.test_validation.get_jwt_token_sync")
@patch("agentops.validation.get_jwt_token_sync")

Copilot uses AI. Check for mistakes.

@patch("agentops.validation.requests.post")
def test_get_jwt_token_failure(self, mock_post):
@patch("tests.unit.test_validation.get_jwt_token_sync")
Copy link

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch path 'tests.unit.test_validation.get_jwt_token_sync' is incorrect. It should patch 'agentops.validation.get_jwt_token_sync' since that's where the function is defined and imported from.

Suggested change
@patch("tests.unit.test_validation.get_jwt_token_sync")
@patch("agentops.validation.get_jwt_token_sync")

Copilot uses AI. Check for mistakes.
@patch("agentops.get_client")
@patch("agentops.validation.requests.post")
def test_get_jwt_token_from_env(self, mock_post, mock_get_client, mock_getenv):
@patch("tests.unit.test_validation.get_jwt_token_sync")
Copy link

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch path 'tests.unit.test_validation.get_jwt_token_sync' is incorrect. It should patch 'agentops.validation.get_jwt_token_sync' since that's where the function is defined and imported from.

Suggested change
@patch("tests.unit.test_validation.get_jwt_token_sync")
@patch("agentops.validation.get_jwt_token_sync")

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +135
def get_metrics_headers():
token = jwt_provider() if jwt_provider else None
return {"Authorization": f"Bearer {token}"} if token else {}

metric_exporter = OTLPMetricExporter(endpoint=metrics_endpoint, headers=get_metrics_headers())

Copy link

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers are computed once at initialization but the JWT token may change over time. The OTLPMetricExporter should receive a dynamic header provider or be updated to use the same pattern as the AuthenticatedOTLPExporter.

Suggested change
def get_metrics_headers():
token = jwt_provider() if jwt_provider else None
return {"Authorization": f"Bearer {token}"} if token else {}
metric_exporter = OTLPMetricExporter(endpoint=metrics_endpoint, headers=get_metrics_headers())
class DynamicHeaderOTLPMetricExporter(OTLPMetricExporter):
def __init__(self, endpoint: str, jwt_provider: Optional[Callable[[], Optional[str]]]):
self._jwt_provider = jwt_provider
super().__init__(endpoint=endpoint, headers={})
def _update_headers(self):
token = self._jwt_provider() if self._jwt_provider else None
self._headers = {"Authorization": f"Bearer {token}"} if token else {}
def export(self, *args, **kwargs):
self._update_headers()
return super().export(*args, **kwargs)
metric_exporter = DynamicHeaderOTLPMetricExporter(endpoint=metrics_endpoint, jwt_provider=jwt_provider)

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +100
# Close the old session if it exists but is closed
if cls._async_session is not None and cls._async_session.closed:
cls._async_session = None

# Create connector with connection pooling
connector = aiohttp.TCPConnector(
limit=100, # Total connection pool size
limit_per_host=30, # Per-host connection limit
ttl_dns_cache=300, # DNS cache TTL
use_dns_cache=True,
enable_cleanup_closed=True,
)

# Create session with default headers
headers = {
"Content-Type": "application/json",
"User-Agent": f"agentops-python/{get_agentops_version() or 'unknown'}",
}

cls._async_session = aiohttp.ClientSession(
connector=connector, headers=headers, timeout=aiohttp.ClientTimeout(total=30)
)
Copy link

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The session check and creation is not thread-safe. Multiple threads could simultaneously create sessions, potentially causing resource leaks. Consider using the same locking pattern as get_session().

Suggested change
# Close the old session if it exists but is closed
if cls._async_session is not None and cls._async_session.closed:
cls._async_session = None
# Create connector with connection pooling
connector = aiohttp.TCPConnector(
limit=100, # Total connection pool size
limit_per_host=30, # Per-host connection limit
ttl_dns_cache=300, # DNS cache TTL
use_dns_cache=True,
enable_cleanup_closed=True,
)
# Create session with default headers
headers = {
"Content-Type": "application/json",
"User-Agent": f"agentops-python/{get_agentops_version() or 'unknown'}",
}
cls._async_session = aiohttp.ClientSession(
connector=connector, headers=headers, timeout=aiohttp.ClientTimeout(total=30)
)
with cls._async_session_lock:
if cls._async_session is None or cls._async_session.closed: # Double-check locking
# Close the old session if it exists but is closed
if cls._async_session is not None and cls._async_session.closed:
cls._async_session = None
# Create connector with connection pooling
connector = aiohttp.TCPConnector(
limit=100, # Total connection pool size
limit_per_host=30, # Per-host connection limit
ttl_dns_cache=300, # DNS cache TTL
use_dns_cache=True,
enable_cleanup_closed=True,
)
# Create session with default headers
headers = {
"Content-Type": "application/json",
"User-Agent": f"agentops-python/{get_agentops_version() or 'unknown'}",
}
cls._async_session = aiohttp.ClientSession(
connector=connector, headers=headers, timeout=aiohttp.ClientTimeout(total=30)
)

Copilot uses AI. Check for mistakes.
Comment on lines +120 to +128
if self._auth_task and not self._auth_task.done():
return # Task already running

try:
loop = asyncio.get_event_loop()
if loop.is_running():
# Use existing event loop
self._auth_task = loop.create_task(self._fetch_auth_async(api_key))
else:
Copy link

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for auth task status is not thread-safe. Multiple threads could pass the check simultaneously and create multiple auth tasks. Consider adding proper locking around this check.

Suggested change
if self._auth_task and not self._auth_task.done():
return # Task already running
try:
loop = asyncio.get_event_loop()
if loop.is_running():
# Use existing event loop
self._auth_task = loop.create_task(self._fetch_auth_async(api_key))
else:
with self._auth_lock:
if self._auth_task and not self._auth_task.done():
return # Task already running
try:
loop = asyncio.get_event_loop()
if loop.is_running():
# Use existing event loop
self._auth_task = loop.create_task(self._fetch_auth_async(api_key))
else:
# Create new event loop in background thread
def run_async_auth():
asyncio.run(self._fetch_auth_async(api_key))
import threading
auth_thread = threading.Thread(target=run_async_auth, daemon=True)
auth_thread.start()
except RuntimeError:

Copilot uses AI. Check for mistakes.
@areibman areibman merged commit ac3dadc into main Jul 31, 2025
34 of 48 checks passed
@areibman areibman deleted the async-backend branch July 31, 2025 18:11
fenilfaldu pushed a commit that referenced this pull request Aug 4, 2025
* Enhance async functionality and logging in AgentOps SDK

* Enhance LLM span validation and improve V4 API client functionality

* Refactor header handling in AuthenticatedOTLPExporter to prevent critical headers from being overridden by user-supplied values. Update tests to verify protection of critical headers and ensure proper JWT token usage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants